
BLOOM

Affiliation
BigScience
Commercial
Fine-tuning Method
Note
- data description: huggingface
- code: modified Megatron-LM & DeepSpeed
  . PyTorch (pytorch-1.11 w/ CUDA-11.5; see GitHub link)
  . apex (GitHub link)
- model: Megatron-LM GPT2 architecture
  . Stable Embedding (layer norm applied to the word embeddings layer; code, paper)
  . ALiBi positional encoding (paper)
  . GeLU activation functions
  . BPE: the BLOOM tokenizer (link); a simple pre-tokenization rule, no normalization
  . (a minimal sketch of the Stable Embedding and ALiBi bias follows this note)
- training: March 11, 2022 to July 5, 2022 (version 1.3)
  . 384 A100 80GB GPUs (48 nodes)
  . 32 A100 80GB GPUs (4 nodes) in reserve
  . 8 GPUs per node using NVLink 4 inter-gpu connects, 4 OmniPath links
  . GPU memory: 640GB per node
  . CPU memory: 512GB per node
  . NCCL-communications network
  . inter-node connect: Omni-Path Architecture (OPA)
  . disc IO network: shared network with other types of nodes
  . training throughput: ~150 TFLOPs per GPU
  . epochs: 1 (95,000 iterations)
  . total tokens: 366B
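The two less common model details in the note are the Stable Embedding (a LayerNorm applied to the word-embedding output) and ALiBi attention biases. The sketch below is a minimal, illustrative PyTorch version of both; the class and function names are assumptions rather than the official Megatron-DeepSpeed code, and the slope formula shown assumes a power-of-two number of attention heads.

```python
# Minimal sketch (not the official BLOOM/Megatron-DeepSpeed code) of:
#   1) Stable Embedding: LayerNorm applied to the word-embedding output
#   2) ALiBi: per-head linear distance penalties added to attention scores
import torch
import torch.nn as nn


class StableEmbedding(nn.Module):
    """Word embedding followed by LayerNorm, used to stabilize training."""

    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        return self.norm(self.word_embeddings(input_ids))


def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Build a (num_heads, seq_len, seq_len) ALiBi bias for attention scores.

    For query position i attending to earlier key position j, head h adds
    -m_h * (i - j), so more distant tokens are penalized more strongly.
    """
    # Geometric slopes 2^(-8/n), 2^(-16/n), ... (valid when num_heads is a power of two)
    slopes = torch.tensor([2 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]      # (seq, seq), equals j - i
    bias = slopes[:, None, None] * distance[None, :, :]     # negative for past positions
    # Mask out future positions (causal attention only uses j <= i).
    return bias.masked_fill(distance > 0, float("-inf"))
```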
Data
- Dataset: ROOTS
  ◦ 46 natural languages (multilingual); Korean is not included
  ◦ 13 programming languages
Model Size
560M, 1.1B, 1.7B, 3B, 7.1B, 176B (see the loading sketch at the end of this entry)
Newly Provided Resource
Model
Release Date
2022-11
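The released checkpoints are hosted on the Hugging Face Hub under the bigscience organization, one repo per model size listed above. The sketch below is a minimal loading and generation example assuming the transformers library; the prompt and generation settings are illustrative, and the full 176B checkpoint (bigscience/bloom) is far too large for a single GPU, so the smallest size is used here.

```python
# Minimal usage sketch: load a BLOOM checkpoint from the Hugging Face Hub and generate text.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Released sizes and their Hub repo IDs.
BLOOM_CHECKPOINTS = {
    "560M": "bigscience/bloom-560m",
    "1.1B": "bigscience/bloom-1b1",
    "1.7B": "bigscience/bloom-1b7",
    "3B": "bigscience/bloom-3b",
    "7.1B": "bigscience/bloom-7b1",
    "176B": "bigscience/bloom",
}

model_name = BLOOM_CHECKPOINTS["560M"]
tokenizer = AutoTokenizer.from_pretrained(model_name)   # byte-level BPE, no normalization
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("BLOOM is a multilingual language model that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```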