LLaMA

Affiliation
Meta AI
Commercial
Fine-tuning Method
SFT
Note
- Paper
- Architecture: pre-normalization (RMSNorm), SwiGLU, rotary positional embeddings (first used in GPT-Neo & GPT-J, later adopted across LLMs); see the sketch below
- Efficient implementation: efficient attention computation (xformers library) and backward pass (FlashAttention for self-attention); backward function implemented by hand instead of autograd (activation values saved ahead of time); model & sequence parallelism to reduce memory
- Resources for 65.2B: 380 tokens/sec/GPU on 2048 A100 GPUs with 80 GB of RAM; training on 1.4T tokens takes about 21 days
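A minimal PyTorch sketch of the three architecture choices listed above (pre-normalization via RMSNorm, the SwiGLU feed-forward, and rotary positional embeddings). This is illustrative only, not Meta's implementation; the shapes, names, and the channel ordering in the RoPE helper are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Pre-normalization: normalize the *input* of each sub-layer by its RMS."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight


class SwiGLU(nn.Module):
    """SwiGLU feed-forward: a SiLU-gated linear unit instead of ReLU/GELU."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary PE: rotate each channel pair of q/k by a position-dependent angle.

    Channel ordering differs from reference implementations, but relative-position
    dot products are preserved as long as q and k use the same mapping.
    """
    seq_len, head_dim = x.shape[-3], x.shape[-1]  # x: (batch, seq, heads, head_dim)
    freqs = base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    angles = torch.arange(seq_len).float()[:, None] * freqs[None, :]  # (seq, head_dim/2)
    cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


# Toy usage: rotate query/key tensors before computing attention scores.
q = torch.randn(1, 16, 8, 64)  # (batch, seq, heads, head_dim)
k = torch.randn(1, 16, 8, 64)
q_rot, k_rot = apply_rope(q), apply_rope(k)
```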
Data
- English CommonCrawl (67%): CCNet pipeline quality filtering plus an additional filtering model
- C4 (15%): CCNet pipeline
- GitHub (4.5%): Apache, BSD, and MIT licenses
- Wikipedia (4.5%): bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk
- Gutenberg and Books3 (4.5%)
- ArXiv (2.5%): LaTeX files
- Stack Exchange (2%): high-quality Q&A data
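A hypothetical sketch of how the mixture weights above could drive source-level sampling. The paper reports these as sampling proportions with per-source epoch counts; the snippet below only draws sources in proportion to the listed weights, and the corpus names are placeholders.

```python
import random

# Sampling proportions copied from the data list above.
MIXTURE = {
    "commoncrawl": 0.670,
    "c4": 0.150,
    "github": 0.045,
    "wikipedia": 0.045,
    "books": 0.045,   # Gutenberg + Books3
    "arxiv": 0.025,
    "stackexchange": 0.020,
}

def pick_source(rng: random.Random) -> str:
    """Draw one source according to the sampling proportions."""
    return rng.choices(list(MIXTURE), weights=list(MIXTURE.values()), k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in MIXTURE}
for _ in range(10_000):
    counts[pick_source(rng)] += 1
print(counts)  # roughly reproduces the 67 / 15 / 4.5 / ... split
```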
Model Sizes
- 6.7B (1.0T tokens)
- 13.0B (1.0T tokens)
- 32.5B (1.4T tokens)
- 65.2B (1.4T tokens)
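A quick sanity check of the 65.2B throughput figure from the Note (380 tokens/sec/GPU on 2048 GPUs) against the 1.4T-token budget; a rough calculation that ignores restarts and other overhead.

```python
# Rough check: cluster throughput vs. the 1.4T-token budget.
cluster_tokens_per_sec = 380 * 2048         # ~778k tokens/sec across 2048 A100s
seconds = 1.4e12 / cluster_tokens_per_sec   # ~1.8e6 seconds
print(round(seconds / 86_400, 1))           # ~20.8 days, consistent with the ~21 days cited
```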
Newly Provided Resources
Model
Release Date
2023-02

References

2023-03-12: Run LLaMA and Alpaca on your computer.
2023-03-13: Runs on a Pixel 6.