SOOFTWARE

welcome,

Welcome to sooftware.io

Welcome to sooftware.io Hello! sooftware.io…

record,

AI Joker Released

AI Joker Released We have released the A.I. Joker model that we built at our company! Joker is a model that generates Korean hate speech. With so many LLMs coming out recently, the safety of those models…

toolkit, web, chatgpt,

Build a ChatGPT Web Page in Just 30 Lines (Streamlit chat_message)

Build a ChatGPT Web Page in Just 30 Lines (Streamlit chat_message) Streamlit is an open-source, Python-based web UI library. Because it lets you bring up a web page with very little code, simple demos and PoCs…

record,

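Reached 500 GitHub Followers!!

Reached 500 GitHub Followers!! I recently reached 500 followers on GitHub! 🎉🎉 I have not really been active on GitHub since founding the company, yet my followers kept growing steadily and before I knew it, 50…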

toolkit, environment,

Shuffling Terabyte (TB)-Scale Data - terashuf

Shuffling Terabyte (TB)-Scale Data - terashuf On Linux, there are (occasionally) times when you need to shuffle TB-scale data line by line. Writing it yourself is a bigger job than you might expect because you have to worry about memory, speed, and so on, but terashuf…

nlp, rlhf,

RLHF Makes Chatterboxes?! (Does RLHF Breed Verbose Chatterboxes?!)

RLHF Makes Chatterboxes?! (Does RLHF Breed Verbose Chatterboxes?!) RLHF (Reinforcement Learning from Human Feedback): OpenAI's ChatGPT…

book, review,

No Rules Rules

No Rules Rules Netflix, the world's largest OTT (Over The Top) platform, known for ‘Squid Game’, ‘The Glory’, ‘Money Heist’, and more, is now a service that just about everyone in Korea knows. In 2023, 1…

record,

TUNiB Selected for AICA Support of 8 H100 GPUs

TUNiB Selected for AICA Support of 8 H100 GPUs TUNiB has been selected for the ‘2024 AI Data Center Service’ program supported by the Artificial Intelligence Industry Cluster Agency (AICA), so for one year, H100 *…

record,

TUNiB Selected for Samsung C-Lab Outside & Office Move

TUNiB Selected for Samsung C-Lab Outside & Office Move TUNiB has been selected for Samsung C-Lab Outside! 😄 😄 C-Lab Outside is Samsung Electronics' external startup incubation program, and to nurture promising domestic startups, 201…

record,

Don't Be Fooled by Success Peddlers. (feat. 역행자, 자청)

Don't Be Fooled by Success Peddlers. (feat…

toolkit, environment,

tmux - conf

tmux - conf Notes on the tmux configuration I use to set up tmux

book, review,

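Ultra Learning

Ultra Learning It has been about a month since I read it, so the details are already hazy, but I want to leave a review before my memory fades further. Sometimes there is an almost unreasonable amount to study. Twelve years of elementary, middle, and high school plus a university education…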

news,

Why Does Meta Release LLaMA?

Why Does Meta Release LLaMA? Meta (formerly Facebook) is one of the giant IT companies, to the point that there is a term for them: FAANG (Facebook, Apple, Amazon, Netflix, Google). Meta is now LLM…

review,

Atomic Habits

Atomic Habits A perennial seller among self-help books, ‘Atomic Habits…

review,

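The Team Leader's Perspective (팀장의 관점)

The Team Leader's Perspective I read a book called ‘팀장의 관점’ (The Team Leader's Perspective) by 김규철 (Kim Gyu-cheol). Currently at my company, AI…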

nlp,

LLM Paper Abstract - 2023.12

LLM Paper Abstract - 2023.12 LLM…

record,

Throwback 2023

Throwback 2023 Before I knew it, 2023 was over and 2024 had arrived. After writing my 2020 retrospective, I made up my mind to write one every year, but 202…

book, review,

THE ONE THING

THE ONE THING The One Thing is a book published in 2013, and an Amazon bestseller…

nlp, mixtral,

What is MoE? (Mixture of Experts)

What is MoE? (Mixture of Experts) GPT-4, the strongest LLM available today, is reported to use the “MoE (Mixture of Experts)” approach, and Mistral AI, the latest hot topic in the AI world…

book, review,

Your Brain Wants Optimization (당신의 뇌는 최적화를 원한다)

…

news,

Welcome to the Gemini Era

Welcome to the Gemini Era Has Google finally begun its counterattack against OpenAI? Google abruptly unveiled a very large model named ‘Gemini’. It surpasses GPT-4 on the MMLU benchmark, and out of 32, 3…

nlp,

LLM Paper Abstract - 2023.11

LLM Paper Abstract - 2023.11 LLM…

thinking, book, review,

Self Branding

Self Branding…

news,

OpenAI CEO Sam Altman Joins Microsoft

OpenAI CEO Sam Altman Joins Microsoft From last weekend through today (23.11.20), shocking news has hit the AI industry. Sam Altman, CEO of OpenAI, the company behind ChatGPT with its 100M+ MAU…

book, review,

The Mind

The Mind…

book, review,

The Millionaire Fastlane

The Millionaire Fastlane…

nlp,

LLM Paper Abstract - 2023.10

LLM Paper Abstract - 2023.10 LLM…

book, review,

Why They Did Not Become the Final Winners (그들은 왜 최후의 승자가 되지 못했나)

…

book, review,

LEVERAGE

LEVERAGE…

nlp, record,

Findings of EMNLP 2023 Accept

Findings of EMNLP 2023 - Accept “A Korean News Comments Dataset with Target-Specific Offensiveness Ratings”, a paper on which I was a co-first author, Findings of…

record,

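[RECORD] Personal Workout Log - 2023.10.11

Personal Workout Log - 2023.10.11 It has been a while since my last workout log. Fortunately, I have kept exercising consistently, my body has improved a lot, and so has my performance. Since I made up my mind to start exercising…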

tunib,

Dearmate, a Chatbot Service Built on Our Own LLM

Dearmate, a Chatbot Service Built on Our Own LLM I am writing this post to introduce what I have been working on recently. I am a co-founder and AI engineer at TUNiB, a natural language processing tech startup. In 2021…

nlp, paper,

LLaMA2

LLaMA2 Meta (formerly Facebook) released LLaMA2 on July 18. 🎉 Along with a paper covering LLaMA2, they released 7B, 13B, and 70B models. Unlike the previous LLaMA, LLaMA…

toolkit, environment,

GPT-NeoX - DeepSpeed Inference

GPT-NeoX - DeepSpeed Inference With DeepSpeed Inference, you can easily boost model inference performance. Tensor Parallel…

toolkit,

sshpass

Sshpass…

huggingface, nlp, lora,

Huggingface PEFT (Parameter-Efficient Fine-Tuning)

Huggingface PEFT (Parameter-Efficient Fine-Tuning) PEFT is a library from Hugging Face, and LoRA, Prefix Tuning, P-Tuning, Prompt Tuning…

toolkit, environment,

ast - literal_eval Error Notes

ast - literal_eval Error Notes ast's literal_eval…

toolkit, environment,

Docker - GPU Allocation

Docker - GPU Allocation How to allocate GPUs in Docker

toolkit, environment,

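Docker - Connecting a Shared Directory (mount)

Docker - Connecting a Shared Directory (mount) When using Docker, there are times when your code needs to save something or read some data, and in those cases it is convenient to attach a shared directory when you run the container. This is easy to do with the appropriate option.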

record,

3rd Place in the 2022 AI Grand Challenge (Policy Support AI)

2022 AI Grand Challenge - 3rd Place in Policy Support In the 2022 AI Grand Challenge (Policy Support AI)…

nlp, serving,

Sooftware Serving - Kernl

Sooftware Serving - Kernl An organization called ELS-RD (Lefebvre Dalloz Services) has released a great inference engine called Kernl! PyTorch-based Transformer…

record,

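[RECORD] Personal Workout Log - 2022.10.30

Personal Workout Log - 2022.10.30 This is my first log in a month and three weeks. Unfortunately, my performance seems to be the same as in the previous log. The good news is that over the past month I kept my muscle mass and lost only about 3 kg of fat. Restarting exercise after a year-and-a-half break…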

huggingface, nlp, serving,

Sooftware Serving - Huggingface Optimum

Sooftware Serving - Huggingface Optimum An extension library for Transformers from Hugging Face. Its purpose is to make model training and inference faster. Exporting…

nlp, serving,

Sooftware Serving - Terminology

Sooftware Serving - Terminology NLP…

nlp, serving,

Sooftware Serving - Triton Inference Server

Sooftware Serving - Triton Inference Server Triton Inference Server is open-source software that helps with inference for AI models. Various frameworks (TensorRT, TensorFlow…

record,

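[RECORD] Personal Workout Log - 2022.09.07

Personal Workout Log - 2022.09.07 From the time I practiced push-ups at 21 to pass the Marine Corps interview through the first few years after my discharge, I consistently did mostly bodyweight training, but in university…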

record,

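Minister's Award at the 2022 AI Online Competition

Minister's Award at the 2022 AI Online Competition We took 1st place in the machine reading comprehension task of the 2022 AI Competition and, with recognition in the commercialization category as well, placed 1st in the natural language division! As the prize, we received the Minister of Science and ICT Award. Received as part of TUNiB…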

nlp, environment,

Sooftware NLP - Installing Mecab & Adding a User-Defined Dictionary

Installing Mecab & Adding a User-Defined Dictionary Mecab is a widely used morphological analyzer. It is well known as a Korean morphological analyzer, but Mecab was originally, by Japan's Taku Kudo…

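nlp, service,

An AI That Writes N-Line Acrostic Poems (N행시)?! The TUNiBridge N-Line Acrostic Service!

An AI That Writes N-Line Acrostic Poems (N행시)?! The TUNiBridge N-Line Acrostic Service! Anyone from Korea has probably tried writing an N-line acrostic poem at least once. They show up often on variety shows, and at work, in the military, or with a partner's name, N…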

record,

Sooftware Coding - Good Coding Habits (Naming)

Sooftware Coding - Good Coding Habits (Naming) These are the 10 habits I think help in writing clean code. ※ This post reflects my personal opinions. 1. Do not put verbs in variable or class names. Variable and class names, like count…

toolkit, python,

Sooftware Pandas - Nested Dictionaries to a Pandas DataFrame!

Sooftware Pandas - Nested Dictionaries to a Pandas DataFrame! Sometimes, while organizing data one way or another, multi-indexing (Multi-Indexing…

software,

Making Slides with Markdown (Marp for VS Code)

Making Slides with Markdown (Marp for VS Code) Personally, when I organize something, Markdown, where all editing is possible at the text level (Markdown…

nlp,

Sooftware NLP - Let's Analyze the Named Entities in a Sentence! Named Entity Recognition (NER)

Sooftware NLP - Let's Analyze the Named Entities in a Sentence! Named Entity Recognition (NER) With NLP techniques, fairly sophisticated text analysis is possible. One thing that should not be left out of text analysis is named entity recognition (Named Entity…

nlp,

Sooftware NLP - No More Hate! St. Patrick

St. Patrick, the original safety engine by TUNiB, checks if the user text includes any toxic expressions or personal information and provides detailed reports.

nlp,

Sooftware NLP - Korean Pre-trained Language Model

Korean Pre-trained Language Model A record of publicly released Korean pre-trained models. They are grouped into the three model families below, and the model sizes may not be exact. Encoder Model (BERT…

record,

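1st Place in the 2022 AI Online Competition

1st Place in the 2022 AI Online Competition We took 1st place in the machine reading comprehension task of the 2022 AI Competition. 🎉 🎉 Last year I handled all of the leading and coding, but this time…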

nlp,

Sooftware NLP - Huggingface Datasets Methods

Huggingface Datasets Methods A summary of frequently used methods in Hugging Face datasets. load_dataset: used to download a dataset hosted on the Hugging Face server. save_to_disk…

nlp,

Sooftware NLP - Decoding Strategy

Decoding Strategy In this post, I will cover decoding strategies for NLP models. As the name suggests, decoding is the work performed by the decoder. That is, it is not used with encoder models like BERT but with GPT…

record,

Students from Gwangju Software Meister High School Visit TUNiB

…

nlp, paper,

Sooftware NLP - Generation with Retrieval

Generation with Retrieval DeepMind recently released a model called RETRO (Retrieval-Enhanced Transformer). It is a document retrieval + GPT-based model, and despite being a 7B model, 2…

cs,

Basic Computer System - Processing Unit

Basic Computer System ※ This post records what I studied while reading a book. The three components of a computer: Processing unit (CPU) => how much it can compute per second; Memory unit (RAM, Hard-Drive…

book, review,

High Performance Python

High Performance Python As my first development book of the new year, High Performance Python…

record,

2021 Retrospective

2021 Retrospective It feels like I wrote my 2020 retrospective only a short while ago, but already 202…

toolkit,

Slack Bot

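Slack Bot Using Python and the Slack API, we will build a Slack bot that automatically posts messages or replies in a specific channel. It takes two steps: the first is registering a bot with the Slack API, and the second is the registered bot…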

nlp, paper,

Sooftware NLP - Fine-grained Post-training for Improving Retrieval-based Dialogue Systems Paper Review

Fine-grained Post-training for Improving Retrieval-based Dialogue Systems Paper Review Paper: https://aclanthology.org/2021.naacl-main.12…

record,

K-Startups (Korea AI Startups)

K(orea) Startups Intro A list of tech startups in Korea with brief descriptions. It started from a simple thought: ‘How many startups do I actually know?’ Hoon Heo's nlp-startups, Kiwoong Yeom's Korea…

toolkit, web, python, open-source,

Flask

Flask Flask is a ‘micro’ web framework. That is, Django…

book, review,

Guns, Germs, and Steel

Guns, Germs, and Steel…

nlp, parallelism, large-scale, lm,

Sooftware NLP - GPT (Generative Pre-trained Transformer)

GPT (Generative Pre-trained Transformer) 1 We will look at gpt1 first and then gpt2. GPT1 Improving Language Understanding by Generative Pre-Training…

nlp, parallelism, large-scale, lm,

Sooftware NLP - Large Scale LM (2) Distributed Programming

Large Scale LM (2) Distributed Programming (in progress) This material is [this link…

nlp, parallelism, large-scale, lm,

Sooftware NLP - Large Scale LM (1) Background

Large Scale LM (1) Background This material is [this link…

record,

TUNiB Blog Redesign

TUNiB Blog Redesign As of 2021.11.16, the TUNiB blog has been redesigned! Previously, built on Notion, a rather plain, text-centric UI…

record,

TUNiB Raises 3 Billion KRW in Seed Funding

TUNiB Raises 3 Billion KRW in Seed Funding TUNiB has raised 3 billion KRW in seed funding! 😄 😄 In this round, Pearl Abyss Capital (PAC), DSC Investment, Naver D2SF…

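record, presentation,

SW Expert Guest Lecture at the Kwangwoon University SW-Centered University Project Group

SW Expert Guest Lecture at the Kwangwoon University SW-Centered University Project Group On 2021.11.04, for students of the Department of Computer Engineering at my alma mater, Kwangwoon University, SW…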

nlp, parallelism, large-scale,

Sooftware NLP - DeepSpeed Usage

DeepSpeed Usage…

toolkit, logging,

Sooftware ML - Wandb Image Log

Wandb (Weights & Biases) Image Log The Wandb library is one of the most convenient yet powerful logging libraries of late. PyTorch, PyTorch-Lightning, Huggingface, which are widely used in NLP…

nlp, metric,

Sooftware NLP - NLP Metrics

NLP Metrics Confusion Matrix When evaluating a classification model, the confusion matrix contains everything about how precise the model is, how practical its classifications are, and how accurate they are. Accuracy…
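
As a rough illustration of the confusion matrix mentioned in the excerpt above, the sketch below counts the four cells for a binary classifier and derives accuracy, precision, and recall from them. The function names `confusion_counts` and `summarize` and the toy labels are hypothetical and are not taken from the original post.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count the four confusion-matrix cells for binary labels (1 = positive, 0 = negative)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1  # predicted positive, actually positive
        elif truth == 0 and pred == 1:
            counts["fp"] += 1  # predicted positive, actually negative
        elif truth == 1 and pred == 0:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1  # predicted negative, actually negative
    return counts


def summarize(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive accuracy, precision, and recall from the four cells."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
scores = summarize(counts)
print(counts, scores)
```

Keeping the raw counts separate from the derived scores makes it easy to add other metrics (for example F1) without recounting.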

presentation,

2021 SSDC Talk - "Developing TUNiB Electra"

2021 SSDC Talk - “Developing TUNiB Electra” I am honored that at the upcoming SSDC (formerly SOSCON) - Samsung Software Developer Conference…

speech,

Sooftware Speech - Korean Tacotron2

Korean Tacotron2 In this post, I will cover how to build a Korean TTS system with the Tacotron2 architecture. Tacotron2 Tacotron2: in December 2017, Google's NATURAL TTS SYNTHESIS BY…

toolkit, environment,

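What Is Docker?

What Is Docker? Docker is an open-source, container-based virtualization platform. Just as a ship carries goods packed in containers, Docker lets you pack the programs you want into containers and deploy them. Key Docker concepts…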

nlp, algorithm,

Sooftware NLP - What Is Page Rank??

Page Rank How does Google decide the order in which it shows sites?? When you search for a word on Google, it shows a list of sites. Google assigns a score to each of these sites, and these assigned scores are Page…

toolkit,

Sooftware ML - BentoML

BentoML A summary of how to use BentoML, a machine learning serving library. Key features: Online / Offline Serving; 100x the throughput of a Flask-based model server, Adaptive…

nlp,

Sooftware NLP - Uniform Length Batching in PyTorch

Uniform Length Batching in PyTorch A method of forming batches from inputs with similar total token lengths. If batches are simply grouped at random, a batch may have an average length of 10 except for one sample whose length is 10…
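
The idea in the excerpt above, grouping inputs of similar token length into the same batch, can be sketched in plain Python. This is a minimal sketch under assumptions: the helper names `uniform_length_batches` and `padded_cost` and the toy lengths are hypothetical, and a real PyTorch pipeline would typically wrap this logic in a batch sampler and shuffle the resulting batches.

```python
from typing import List, Sequence


def uniform_length_batches(samples: Sequence[Sequence[int]], batch_size: int) -> List[List[int]]:
    """Return batches of sample indices so that each batch holds similar-length samples."""
    # Sort indices by token length, then cut the sorted order into consecutive chunks.
    order = sorted(range(len(samples)), key=lambda i: len(samples[i]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padded_cost(samples: Sequence[Sequence[int]], batches: List[List[int]]) -> int:
    """Total tokens after padding every sample to the longest one in its batch."""
    return sum(max(len(samples[i]) for i in batch) * len(batch) for batch in batches)


# Toy token sequences with made-up lengths: two long outliers among short inputs.
lengths = [10, 100, 9, 11, 98, 10]
samples = [[0] * n for n in lengths]

naive = [[0, 1], [2, 3], [4, 5]]            # batches in original order
batches = uniform_length_batches(samples, batch_size=2)

print(batches)                              # indices grouped by similar length
print(padded_cost(samples, naive))          # padded tokens with naive batching
print(padded_cost(samples, batches))        # padded tokens with length-grouped batching
```

On this toy data the length-grouped batches need far fewer padded tokens, which is the saving the technique aims for.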

toolkit, web,

Deploying a React-Based Personal Web Page (gatsby)

Deploying a React-Based Personal Web Page In this post, react…

speech, nlp, paper,

Sooftware NLP - Textless NLP

Textless NLP: Generating expressive speech from raw audio paper / code / pre-train model / blog Name: Generative Spoken Language Model (GSLM…

huggingface, nlp, record,

TUNiB Electra Released

TUNiB has released the TUNiB Electra model that we worked hard on!! 🎉 🎉 In this release, we published a Korean-English bilingual model and a Korean model, each in Small/Base sizes, and HuggingFace transformers…

toolkit,

Sooftware ML - PyTorch Lightning

PyTorch Lightning There are a few leading deep learning frameworks, and lately more users seem to prefer one of them over the other. PyTorch Lightning is, for PyTorch, a high-level…

nlp,

Sooftware NLP - Tokenizer

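Tokenization Splitting a sentence into meaningful units is called tokenization. Character-level tokenization: tokenizing at the character level. Since there are 11,172 Hangul syllables in total, even accounting for the alphabet, digits, symbols, and so on, the vocabulary size is at most 1…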
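
The 11,172 figure in the excerpt above can be checked against the Unicode Hangul Syllables block (U+AC00 through U+D7A3), and character-level tokenization itself is a one-liner. The sketch below is illustrative only: the function names are hypothetical, and dropping whitespace is an assumption rather than something the original post specifies.

```python
from typing import List

HANGUL_FIRST = 0xAC00  # '가', first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3   # '힣', last precomposed Hangul syllable


def hangul_syllable_count() -> int:
    """Number of precomposed Hangul syllables in Unicode."""
    return HANGUL_LAST - HANGUL_FIRST + 1


def char_tokenize(text: str) -> List[str]:
    """Split text into single-character tokens, skipping whitespace (an assumption)."""
    return [ch for ch in text if not ch.isspace()]


print(hangul_syllable_count())
print(char_tokenize("나는 학교에"))
```

Because every character is its own token, the vocabulary stays small, at the cost of longer token sequences per sentence.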

nlp,

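Regular Expressions (regex)

Regular Expressions A regular expression is a kind of formula for describing characters, and is a technique frequently used to extract sets of strings that follow a specific rule. Mainly in programming languages and text editors…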
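
As a small example of extracting strings that follow a specific rule, the sketch below uses Python's standard `re` module to pull `YYYY.MM.DD`-style dates out of a sentence. The pattern, the helper name `extract_dates`, and the sample text are made up for illustration and are not from the original post.

```python
import re
from typing import List

# Four digits, a dot, two digits, a dot, two digits (for example 2023.10.11).
DATE_PATTERN = re.compile(r"\d{4}\.\d{2}\.\d{2}")


def extract_dates(text: str) -> List[str]:
    """Return every substring of text that matches the date pattern, in order."""
    return DATE_PATTERN.findall(text)


sample = "Workout logs: 2022.09.07, 2022.10.30 and 2023.10.11."
dates = extract_dates(sample)
print(dates)
```

Compiling the pattern once and reusing it keeps the rule in a single place, which helps when the same extraction runs over many lines.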

toolkit, web,

Streamlit

The fastest way to build data apps in Python

presentation,

2021 LangCon Talk - "Korean Speech Recognition: From Developing KoSpeech to Developing OpenSpeech"

2021 LangCon Talk - “Korean Speech Recognition: From Developing KoSpeech to Developing OpenSpeech” I took part as a speaker at LangCon, now in its third year, and “Korean Speech Recognition: From Developing KoSpeech to OpenSpeech…

record, presentation,

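Talk at the Pangyo AI Camp Program

Talk at the Pangyo AI Camp Program At the Pangyo AI Camp program held on August 28, I took the stage as a speaker and shared know-how from winning 1st place in the AI competition.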

huggingface, nlp,

Sooftware NLP - Hugging Face Tokenizers

This post contains code for the library most widely used to build NLP tokenizers these days, and for converting the result to the library most widely used in practice. The content was run on a specific version. Train The code below, wordpiece, char-bpe…

nlp, paper,

Sooftware NLP - Efficient Attention Paper Review

Efficient Attention: Attention with Linear Complexities Shen Zhuoran et al. Abstract Dot-product attention, depending on the length of the incoming input, memory…

presentation,

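Talk: 1st-Place Retrospective on the 2021 AI Online Competition

Talk: 1st-Place Know-How from the 2021 AI Online Competition I gave a talk over Google Meet on how we took 1st place in the elderly conversation sentiment analysis track of the 2021 AI Online Competition. If you are interested, you can watch it via the link above :)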

record,

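1st Place in the 2021 AI Online Competition

1st Place in the 2021 AI Online Competition I represented my company in the conversation sentiment classification task of the 2021 AI Online Competition, and on the Public / Private / Final leaderboards, all…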

book, review,

[REVIEW] Krafton Way

[REVIEW] Krafton Way KRAFTON, well known as the developer of games such as PUBG: Battlegrounds and TERA…

nlp, paper,

Sooftware NLP - Luna: Linear Unified Nested Attention

Luna: Linear Unified Nested Attention USC + CMU + Facebook AI 2021.06 code Abstract The Transformer's Multi-Headed Self-Attention…

toolkit,

Ray: multi-processing library

Ray: multi-processing library…

toolkit,

Sooftware ML - wandb (Weights & Biases)

wandb (Weights & Biases) wandb is a tool that, like TensorBoard, visualizes logs so they are easy to read. TensorFlow, PyTorch, transformers, PyTorch-Lightning…

nlp, paper,

Sooftware NLP - P-Tuning Paper Review

GPT Understands, Too Xiao Liu et al. Tsinghua University etc. arXiv pre-print Abstract Approaches that fine-tune GPT, on Natural Language Understanding (NLU…

speech, paper,

Sooftware Speech - Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition Paper Review

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition Yu Zhang et al., 2020 Google Research, Brain Team Reference…

record,

Leaving Kakao Brain, and Founding a Startup (feat. Graduation)

Leaving Kakao Brain, and Founding a Startup (feat…

speech, toolkit, record,

PORORO Text-To-Speech (TTS)

PORORO Text-To-Speech (TTS) In the PORORO: Platform Of neuRal mOdels for natuRal language prOcessing library that our team released a short while ago, the TTS that I put a lot of work into…

nlp, paper,

Sooftware NLP - Longformer Paper Review

Longformer: The Long-Document Transformer Paper Code Iz Beltagy et al. Introduction Transformers have the limitation that they cannot handle long sequences. The reason is that, in the sequence length, O(n^…

cs,

Computer Architecture Review

Computer Architecture Review A quick refresher on what I learned in computer architecture, to get a feel for it again after a long while. How a computer processes code: read the code, convert it to assembly, execute it on the CPU. A single instruction on the CPU (e.g. Add…

nlp, toolkit, record,

Sooftware NLP - Pororo: A Deep Learning based Multilingual Natural Language Processing Library

Pororo: A Deep Learning based Multilingual Natural Language Processing Library Link: https://github.com/kakaobrain/pororo…

record,

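2020 Retrospective

2020 Retrospective The eventful year 2020 has passed, and the new year 2021 has dawned. 🤗 🤗 Because of the global disaster that is COVID-19, it was a year in which so much changed, starting with daily life. Until now…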

toolkit, nlp,

Sooftware NLP - Fairseq Hydra

Fairseq’s Hydra With its upgrade to version 0.10.1, Fairseq now manages configuration with Hydra. The command line for running Fairseq…

toolkit,

Sooftware ML - Hydra

Hydra: framework for elegantly configuring complex applications An open-source project released by Facebook Research. Complex configuration…

speech, paper,

Sooftware Speech - EMNLP Paper Review: Speech

EMNLP Paper Review: Speech Adaptive Feature Selection for End-to-End Speech Translation (Biao Zhang et al) Incremental Text-to-Speech…

nlp, parallelism, paper,

Sooftware NLP - Megatron LM Paper Review

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Mohammad Shoeybi et al. 2019. NVIDIA Corp. Summary…

software, environment,

Mac iTerm2 + ZSH Setup

Mac iTerm2 + ZSH Setup One of the most important pieces of software in a development environment is the shell. Which shell you use depends on which OS you work on, and iTerm2 and ZSH, the most widely used on macOS…

speech, tts, paper,

Sooftware Speech - One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech Paper Review

One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech Tomáš Nekvinda, Ondřej Dušek Charles University INTERSPEECH, 202…

nlp, paper,

Sooftware NLP - RoBERTa Paper Review

RoBERTa paper / code Abstract Proposes how to train BERT properly. BERT is a great model, but the original BERT paper did not properly experiment with hyperparameters. BERT…

nlp, paper,

Sooftware NLP - Electra Paper Review

Below is just about everything you’ll need to style in the theme. Check the source code to see the many embedded elements within paragraphs…

speech, paper,

Sooftware Speech - Wav2vec 2.0 : A Framework for Self-Supervised Learning of Speech Representations

wav2vec 2.0 : A Framework for Self-Supervised Learning of Speech Representations Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael…

speech, paper,

Sooftware Speech - Conformer Paper Review

Conformer: Convolution-augmented Transformer for Speech Recognition Anmol Gulati et al. Google Inc. INTERSPEECH, 2020 Reference Conformer…

speech,

Sooftware Speech - AI & Speech Processing: Application-2

AI & Speech Processing: Application-2 This post was written after taking the lecture by Professor Hochong Park of the Department of Electronic Engineering, Kwangwoon University. Speaker Verification and Identification Verification…

speech,

Sooftware Speech - AI & Speech Processing: Application-1

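AI & Speech Processing: Application-1 This post was written after taking the lecture by Professor Hochong Park of the Department of Electronic Engineering, Kwangwoon University. Speech/audio/sound…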

speech, dsp,

Sooftware Speech - AI & Speech Processing: DSP for Audio

AI & Speech Signal Processing Lecture : DSP for Audio This post was written after taking the lecture by Professor Hochong Park of the Department of Electronic Engineering, Kwangwoon University. Now let's move on to DSP specialized for audio. Short-Time…

speech, dsp,

Sooftware Speech - AI & Speech Processing: DSP-2

AI & Speech Processing: DSP-2 This post was written after taking the lecture by Professor Hochong Park of the Department of Electronic Engineering, Kwangwoon University. DFT (Discrete Fourier Transform) For digital processing, time and frequency…

speech, dsp,

Sooftware Speech - AI & Speech Processing: DSP-1

AI & Speech Processing: DSP-1 This post was written after taking the lecture by Professor Hochong Park of the Department of Electronic Engineering, Kwangwoon University. DSP Review Time-to-Frequency transform Continuous-Time Fourier…

speech, paper,

Sooftware Speech - ClovaCall Paper Review

ClovaCall: Korean Goal-Oriented Dialog Speech Corpus for Automatic Speech Recognition of Contact Centers Paper link 2020-04-2…

nlp,

Sooftware NLP - Beam Search

Sooftware NLP - Beam Search This post focuses less on the fundamental concept of “beam search” itself than on Encoder-Decoder models (Seq2seq…

speech, paper,

Sooftware Speech - STATE-OF-THE-ART SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODEL Paper Review

「STATE-OF-THE-ART SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODEL」 Review title https://arxiv.org/abs/1712.0176…

nlp,

Sooftware NLP - Attention Mechanism

Sooftware NLP - Attention Mechanism To follow this post, it helps to first understand the following posts. RNN (Recurrent Neural Network) LSTM & GRU (Long…

nlp,

Sooftware NLP - Seq2seq (Sequence to sequence)

Sooftware NLP - Seq2seq (Sequence to sequence) To follow this post, it helps to first understand the following posts. RNN (Recurrent Neural Network) LSTM & GRU (Long…

nlp,

Sooftware NLP - Teacher Forcing

Sooftware NLP - Teacher Forcing To follow this post, it helps to first understand the following posts. RNN (Recurrent Neural Network) LSTM & GRU (Long Short…

nlp,

Sooftware NLP - LSTM & GRU

Sooftware NLP - LSTM & GRU To follow this post, it helps to first understand the post below. RNN (Recurrent Neural Network) Background of LSTM RNN…

speech, paper,

Sooftware Speech - Attention-Based Models for Speech Recognition Paper Review

Attention-Based Models for Speech Recognition Paper Review title http://papers.nips.cc/paper/5847-attention-based-models-for-speech…

speech, paper,

Sooftware Speech - SpecAugment Paper Review

SpecAugment: 「A Simple Data Augmentation Method for Automatic Speech Recognition」 Review title https://arxiv.org/abs/1904.08779 Abstract…

nlp,

Sooftware NLP - RNN (Recurrent Neural Network)

Sooftware NLP - RNN (Recurrent Neural Network) To follow this post, it helps to first understand feedforward networks. Background of RNN Before looking at RNNs, RNN…

record,

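NAVER 2019 Hackathon - Speech: Advanced to the Finals

NAVER 2019 Hackathon - Speech: Advanced to the Finals We placed 11th out of 100 teams in the preliminary round of the NAVER 2019 Hackathon - Speech competition and made it through the preliminaries!! Until this morning we were in 10th place, so Top 1…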

speech, paper,

Sooftware Speech - DeepSpeech Paper Review

Deep Speech: Scaling up end-to-end speech recognition title https://arxiv.org/pdf/1412.5567.pdf (Awni Hannun et al. 2014) Abstract…

speech, paper,

Sooftware Speech - Listen, Attend and Spell Paper Review

「Listen, Attend and Spell」 Review title https://arxiv.org/abs/1508.01211 (William Chan et al. 2015) Introduction Attention-based Seq2seq…

dsp, speech,

Sooftware Speech - MFCC (Mel-Frequency Cepstral Coefficient)

MFCC (Mel-Frequency Cepstral Coefficient) Based on the paper ‘Voice Recognition Using MFCC Algorithm’. What is MFCC? In speech recognition, MFCC, Mel-Spectrogram…

Subscribe to SOOFTWARE

SOOFTWARE

Co-founder | A.I. lead @tunib.ai | Open-source contributor