
Koala

Affiliation
UC Berkeley
Commercial
Fine-tuning Method
SFT
Note
- homepage, demo
- github code
- A model fine-tuned from LLaMA on dialogue data collected from the web
- Uses EasyLM (built on JAX/Flax) for pre-training, fine-tuning, serving, and evaluation
- Training: 8 A100 GPUs (a single Nvidia DGX), 6 hours for 2 epochs, <$100
- Evaluation: better than Alpaca, and comparable to ChatGPT on about half of the queries
- Result: ChatGPT > Koala ~ Alpaca
 . test set (link): 180 queries, human evaluation
- When training on comparison data, the model is conditioned on "a helpful answer" for the positive response and "an unhelpful answer" for the negative one before generating the answer.
 . Data without human feedback is tagged with "a helpful answer".
 . "a helpful answer" is also used as the condition at evaluation time.
Data
[ChatGPT Distillation Data]
- ShareGPT: 30k cleaned English conversations obtained from ChatGPT (out of ~60k original examples)
 . multi-turn
- HC3 (paper): 60k human answers and 27k ChatGPT answers to 24k questions (87k QA examples in total)
 . single-turn
[Open Source Data]
- OIG: Open Instruction Generalist, built by LAION; 30k examples from the "grade-school-math-instructions, poetry-to-songs, plot-screenplay-books-dialogue" subsets
 . single-turn
- Alpaca dataset (link)
 . single-turn
- Anthropic HH: ~160k human-rated examples (the preferred response of each pair, judged on harmfulness & helpfulness)
- OpenAI WebGPT: ~20k comparisons (a question, a pair of model answers, and metadata; humans rate each comparison with a preference score)
- OpenAI Summarization: ~93k examples
 . comparison part: picking the best of 2 summaries
 . axis part: rating quality on a Likert scale (overall, accuracy, coverage, coherence, compatible)
[Test Set]
- test set (link): 180 queries (own data + Alpaca test set)
Model Size
13B
Newly Provided Resources
InstructData
Model
Training/Inference Pipeline
Release Date
2023-04-03

ShareGPT Instruct Data example

Instruct Data
instruction - input - output : "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Identify the odd one out. ### Input: Twitter, Instagram, Telegram ### Response: Telegram"
instruction - output : "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: What are the three primary colors? ### Response: The three primary colors are red, blue, and yellow."
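The two templates above can be reproduced with a small helper. The template strings below are copied from the examples as shown (single-line form; the original Alpaca repository separates sections with blank lines), while the function and constant names are ours.

```python
# Alpaca-style prompt templates, matching the two examples above.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request. ### Instruction: {instruction} ### Input: {input} ### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request. ### Instruction: {instruction} ### Response:"
)

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Pick the with-input or no-input template depending on whether input_text is given."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)
```

For example, `build_prompt("Identify the odd one out.", "Twitter, Instagram, Telegram")` yields the first template filled in, ready for the model to complete after "### Response:".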