
BLOOMZ

Affiliation
BigScience
EleutherAI
Commercial
Fine-tuning Method
SFT
Note
- paper, huggingface
- Language adaptation with adapters for languages unseen during pre-training, followed by zero-shot evaluation
- BLOOMZ: the multi-task fine-tuned version of BLOOM
- Adapter architectures tested: (IA)^3 (element-wise rescaling of inner transformer block activations), continued pretraining, bottleneck adapters, BitFit, LoRA, FishMask, etc. (a rough (IA)^3 sketch follows below)
- Conclusion: language adaptation with BLOOM + MAD-X (bottleneck adapters) gives the best zero-shot performance
- Conclusion: BLOOMZ (the fine-tuned model) loses its prompting capability, so the adaptation methods actually lower its performance
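As a rough illustration of the parameter-efficient methods listed above, the sketch below attaches (IA)^3 modules (learned element-wise rescaling vectors on inner transformer activations) to a small BLOOM checkpoint with the Hugging Face peft library. This is not the paper's exact setup (its best result used MAD-X-style bottleneck adapters), and the target module names are assumptions based on BLOOM's layer naming.

```python
# Minimal, assumption-laden sketch: (IA)^3 adaptation of a small BLOOM checkpoint
# with Hugging Face peft. Module names "query_key_value" and "mlp.dense_4h_to_h"
# are assumed from BLOOM's architecture; the paper's best setup used MAD-X
# bottleneck adapters instead of (IA)^3.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import IA3Config, TaskType, get_peft_model

model_name = "bigscience/bloom-560m"  # smallest BLOOM checkpoint, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# (IA)^3 config: learn rescaling vectors on attention key/value projections
# and on the feed-forward output projection.
ia3_config = IA3Config(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value", "mlp.dense_4h_to_h"],
    feedforward_modules=["mlp.dense_4h_to_h"],
)
model = get_peft_model(model, ia3_config)
model.print_trainable_parameters()  # only the small rescaling vectors are trainable

# Continued causal-LM training on text in the unseen language would go here
# (e.g. with transformers' Trainer); afterwards, evaluate zero-shot prompting.
```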
Data
- xP3 (multilingual task mixture; cross-lingual & cross-task paper)
Model Size
- 300M (mT5 fine-tuning)
- 580M (mT5 fine-tuning)
- 1.2B (mT5 fine-tuning)
- 3.7B (mT5 fine-tuning)
- 13B (mT5 fine-tuning)
- 560M (BLOOM fine-tuning)
- 1.1B (BLOOM fine-tuning)
- 1.7B (BLOOM fine-tuning)
- 3B (BLOOM fine-tuning)
- 7.1B (BLOOM fine-tuning)
- 176B (BLOOM fine-tuning)
Newly Provided Resources
Model, Data
Release Date
2022-12