Paper
- Architecture : Pre-Norm, SwiGLU, Rotary PE (first used in GPT-Neo & GPT-J, then adopted across LLMs); a minimal block sketch follows this list
- Efficient Implementation : memory-efficient causal attention via the xformers library (FlashAttention-style self-attention backward); backward function implemented by hand instead of PyTorch autograd, saving the expensive activations (linear-layer outputs) from the forward pass; model & sequence parallelism to reduce memory (sketch below)
- Resources for 65.2B : ~380 tokens/sec/GPU on 2048 A100 GPUs with 80 GB of RAM; training on 1.4T tokens takes ~21 days (sanity check below)
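To make the three architectural changes concrete, here is a minimal PyTorch sketch of one transformer block. Dimensions are illustrative, LayerNorm stands in for LLaMA's RMSNorm, and the RoPE helper uses the rotate-half formulation; this is a shape-level sketch, not the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def apply_rope(x, base=10000.0):
    """Rotary PE, rotate-half formulation (as in GPT-NeoX/LLaMA-style code).
    x: (batch, heads, seq, head_dim) with even head_dim."""
    b, h, s, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32, device=x.device) / half)
    ang = torch.arange(s, dtype=torch.float32, device=x.device)[:, None] * freqs  # (s, half)
    cos, sin = ang.cos(), ang.sin()            # broadcast over batch and heads
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

class SwiGLU(nn.Module):
    """FFN(x) = W2(SiLU(W1 x) * W3 x)."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w3 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)
    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

class PreNormBlock(nn.Module):
    """Pre-Norm: normalize sub-layer *inputs* (LLaMA uses RMSNorm; LayerNorm here for brevity)."""
    def __init__(self, dim, n_heads):
        super().__init__()
        self.n1, self.n2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        self.ffn = SwiGLU(dim, hidden=int(2 / 3 * 4 * dim))  # LLaMA's 2/3 * 4d hidden size
        self.n_heads = n_heads
    def attn(self, x):
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, s, self.n_heads, -1).transpose(1, 2) for t in (q, k, v))
        q, k = apply_rope(q), apply_rope(k)    # RoPE is applied to queries/keys only
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, s, d))
    def forward(self, x):
        x = x + self.attn(self.n1(x))
        return x + self.ffn(self.n2(x))
```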
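A sketch of what the efficiency tricks look like in code, assuming xformers is installed and a CUDA device is available. `torch.utils.checkpoint` is used here only as a generic stand-in for the paper's hand-written backward; the paper instead keeps the expensive linear-layer outputs and recomputes only the cheap activations:

```python
import torch
from torch.utils.checkpoint import checkpoint
import xformers.ops as xops

# Causal attention without materializing the (seq x seq) score matrix.
# xformers expects (batch, seq, heads, head_dim) tensors.
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)
y = xops.memory_efficient_attention(q, k, v, attn_bias=xops.LowerTriangularMask())
# (torch.nn.functional.scaled_dot_product_attention(..., is_causal=True) is a
#  built-in alternative that dispatches to FlashAttention-style kernels.)

# Trading recomputation for memory: checkpoint() re-runs the block in backward
# instead of storing all intermediate activations.
ffn = torch.nn.Sequential(
    torch.nn.Linear(512, 2048), torch.nn.GELU(), torch.nn.Linear(2048, 512)
).cuda().half()
x = torch.randn(2, 1024, 512, device="cuda", dtype=torch.float16, requires_grad=True)
out = checkpoint(ffn, x, use_reentrant=False)
out.sum().backward()
```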
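The ~21-day figure follows directly from the throughput numbers; a quick sanity check:

```python
tokens_per_sec = 380 * 2048             # per-GPU throughput x GPU count = 778,240 tokens/s
days = 1.4e12 / tokens_per_sec / 86400  # 1.4T training tokens, 86,400 s per day
print(round(days, 1))                   # 20.8 -> matches the reported ~21 days
```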
Data (mixture proportions in parentheses; sampling sketch after the list)
- English CommonCrawl (67%) : CCNet pipeline for deduplication + quality filtering, plus an extra linear classifier that keeps pages resembling those cited as references by Wikipedia (toy sketch after the list)
- C4 (15%) : deduplication and language identification similar to CCNet, but quality filtering relies mostly on heuristics
- GitHub (4.5%) : only projects under permissive Apache, BSD, and MIT licenses
- Wikipedia (4.5%) : 20 languages using Latin or Cyrillic scripts: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk
- Gutenberg and Books3 (4.5%) : book corpora, deduplicated at the book level
- ArXiv (2.5%) : LaTeX source files (everything before the first section, comments, and the bibliography removed; macros inline-expanded)
- Stack Exchange (2%) : high-quality Q&A data (HTML tags removed, answers sorted by score)
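The percentages above are sampling proportions over disjoint sources. A minimal sketch of how such a mixture can drive training-time sampling; the corpus names are just labels for the list above, not real dataset handles:

```python
from collections import Counter
import numpy as np

# Disjoint sampling proportions from the list above (sum to 1.0).
mixture = {
    "CommonCrawl": 0.67, "C4": 0.15, "GitHub": 0.045, "Wikipedia": 0.045,
    "Books": 0.045, "arXiv": 0.025, "StackExchange": 0.02,
}
names, probs = list(mixture), list(mixture.values())
rng = np.random.default_rng(0)

def sample_source() -> str:
    """Pick which corpus the next training sequence is drawn from."""
    return str(rng.choice(names, p=probs))

# Roughly 67% of sequences should come from CommonCrawl:
print(Counter(sample_source() for _ in range(1000)))
```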
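And a toy sketch of the shape of the extra CommonCrawl filter: a linear classifier trained to keep pages that look like those Wikipedia cites as references. The features and training examples here are placeholders, since the paper does not spell out the exact setup:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder examples: positives are pages Wikipedia cites as references,
# negatives are randomly sampled crawl pages.
pos = ["peer reviewed study of population trends ...", "official statistics report ..."]
neg = ["buy cheap pills now!!!", "click here to win a prize ..."]
texts, labels = pos + neg, [1] * len(pos) + [0] * len(neg)

vec = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = LogisticRegression().fit(vec.transform(texts), labels)

def keep(page_text: str, threshold: float = 0.5) -> bool:
    """Keep a crawled page only if the classifier scores it as reference-like."""
    return clf.predict_proba(vec.transform([page_text]))[0, 1] >= threshold
```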