Junseong’s AI Blog / LLM Ecosystem: Open-Source Model/Data/Code (since ChatGPT)

BLOOM
Affiliation: BigScience

Note
- data : description, huggingface
- code : Megatron-LM & DeepSpeed, modified for BLOOM
. PyTorch (pytorch-1.11 w/ CUDA-11.5; see Github link)
. apex (Github link)
- model : Megatron-LM GPT2 architecture
. Stable Embedding (layer norm applied to the word embeddings layer; code, paper)
. ALiBi positional encoding (paper)
. GeLU activation functions
. BPE : the BLOOM tokenizer (link); a simple pre-tokenization rule, no normalization
- training : March 11, 2022 ~ July 5, 2022 (version 1.3)
. 384 A100 80GB GPUs (48 nodes), plus 32 A100 80GB GPUs (4 nodes) in reserve
. 8 GPUs per node, using NVLink 4 inter-GPU connects and 4 OmniPath links
. GPU memory : 640GB per node
. CPU memory : 512GB per node
. NCCL communications network
. inter-node connect : Omni-Path Architecture (OPA)
. disc IO network : shared with other node types
. training throughput : ~150 TFLOPs per GPU
. epochs : 1 (95,000 iterations)
. total tokens : 366B
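To make the ALiBi choice above concrete, here is a minimal sketch of the head-specific additive attention bias: each head gets a slope from a geometric sequence, and the bias grows linearly with query-key distance. This is an illustration of the technique, not BLOOM's actual implementation, and it assumes the head count is a power of two (the simple case from the ALiBi paper):

```python
def alibi_slopes(n_heads):
    # Geometric sequence starting at 2^(-8/n_heads);
    # assumes n_heads is a power of two.
    start = 2 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(n_heads, seq_len):
    # Additive bias added to attention scores before softmax:
    # -slope * (i - j) for query position i attending to key position j.
    slopes = alibi_slopes(n_heads)
    return [
        [[-s * (i - j) for j in range(seq_len)] for i in range(seq_len)]
        for s in slopes
    ]

# e.g. with 8 heads, the first head's slope is 0.5, so a key
# 2 positions back gets a bias of -1.0.
bias = alibi_bias(8, 4)
```

Because the bias depends only on relative distance, no learned position embeddings are needed, which is part of why ALiBi extrapolates to longer sequences at inference time.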
Data
- dataset : ROOTS
◦ 46 natural languages (multilingual) : no Korean
◦ 13 programming languages
Model sizes
560M, 1.1B, 1.7B, 3B, 7.1B, 176B
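As a sanity check on the largest size, the standard transformer estimate 12·L·h² + V·h lands almost exactly on 176B, assuming the model shape reported in the BLOOM paper (70 layers, hidden size 14336, vocabulary 250,680):

```python
# Rough parameter count for BLOOM-176B.
# Shape assumed from the BLOOM paper: 70 layers, hidden 14336, vocab 250680.
n_layers = 70
hidden = 14336
vocab = 250_680

# Per layer: ~4h^2 (attention QKV + output proj) + 8h^2 (MLP with 4h
# intermediate) = 12h^2, ignoring biases and layer norms.
block_params = 12 * n_layers * hidden ** 2
embed_params = vocab * hidden  # input embeddings (tied with the output head)

total = block_params + embed_params
print(f"{total / 1e9:.1f}B")  # prints "176.2B"
```

The large vocabulary (~250k tokens, to cover 46 natural and 13 programming languages) contributes only ~3.6B of the total; almost all parameters sit in the transformer blocks.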
Newly provided resources : Model

Release date : 2022-11