Intro
A family of lightweight, state-of-the-art open models (commercial use permitted) built from the same research and technology used to create the Gemini models.
References
• Main Page: https://ai.google.dev/gemma?hl=en
• Hugging Face (2B, 7B; pre-trained and instruction-finetuned models; 16-bit, 8-bit, 4-bit): https://huggingface.co/google (see the loading sketch after this list)
  ◦ Demo: https://huggingface.co/chat
• Official Guidelines
  ◦ PyTorch (via kagglehub; 2B & 7B): https://ai.google.dev/gemma/docs/pytorch_gemma (see the kagglehub sketch after this list)
  ◦ LoRA fine-tuning: https://ai.google.dev/gemma/docs/lora_tuning (see the PEFT sketch after this list)
• Kaggle Notebook examples: https://www.kaggle.com/models/google/gemma/code/
  ◦ Fine-tuning & inference: https://www.kaggle.com/code/nilaychauhan/keras-gemma-distributed-finetuning-and-inference
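As referenced in the Hugging Face bullet above, here is a minimal loading-and-generation sketch with the transformers library. The google/gemma-2b-it checkpoint ID is an assumption, and the repositories are gated: accept the license on Hugging Face and log in with a token first.

# Minimal sketch: load a Gemma checkpoint from Hugging Face and generate.
# Assumes transformers + accelerate are installed and `huggingface-cli login`
# has been run (the Gemma repos sit behind a license click-through).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 16-bit weights; 8/4-bit loading is sketched under Released Models
    device_map="auto",           # needs the accelerate package
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))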
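For the PyTorch guide above, a hedged sketch of fetching the weights with kagglehub; the handle string follows the pattern used in that guide and requires Kaggle credentials.

# Sketch: download Gemma PyTorch weights from Kaggle via kagglehub.
# Assumes Kaggle API credentials are configured and the model license
# has been accepted on Kaggle; the handle is per the linked guide.
import kagglehub

weights_dir = kagglehub.model_download("google/gemma/pyTorch/2b")
print(weights_dir)  # local directory holding the checkpoint and tokenizer
# The checkpoint is then loaded with the gemma_pytorch package, as walked
# through in the linked PyTorch guide.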
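The LoRA guide above is Keras-based; as a rough equivalent in the Hugging Face ecosystem, here is a hedged sketch with the peft library. The rank, alpha, and target modules are illustrative assumptions, not values from the guide.

# Sketch: wrap a Gemma base model with trainable LoRA adapters via peft.
# The base weights stay frozen; only the small adapter matrices train.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
lora_config = LoraConfig(
    r=8,                                  # adapter rank (assumption)
    lora_alpha=16,                        # scaling factor (assumption)
    target_modules=["q_proj", "v_proj"],  # attention projections (assumption)
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # shows the small trainable fraction
# ...then fine-tune `model` with a standard transformers Trainer loop.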
Released Models
• Size: 2B, 7B
• Training Type: pre-trained model, instruction-finetuned model (uses its own prompt format; see the prompt sketch after this list)
• Precisions: 16-bit, 8-bit, 4-bit (see the quantized-loading sketch after this list)
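As noted in the training-type bullet, the instruction-finetuned checkpoints expect Gemma's own turn-based prompt format; a sketch of composing it by hand (control tokens per the Gemma docs, example message illustrative):

# Gemma's instruction-tuned prompt format: turns delimited by the
# <start_of_turn> / <end_of_turn> control tokens, roles "user" and "model".
prompt = (
    "<start_of_turn>user\n"
    "Write a haiku about autumn.<end_of_turn>\n"
    "<start_of_turn>model\n"
)
# With transformers, tokenizer.apply_chat_template(messages, tokenize=False,
# add_generation_prompt=True) renders the same format from a chat-message list.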
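For the 8-bit and 4-bit variants, one route is on-the-fly quantization with bitsandbytes through transformers; a sketch assuming a CUDA GPU and the bitsandbytes package, with the compute dtype as an assumption:

# Sketch: load Gemma 7B in 4-bit (or 8-bit) precision via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use load_in_8bit=True for 8-bit instead
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype (assumption)
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",           # assumed checkpoint ID
    quantization_config=bnb_config,
    device_map="auto",
)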
License
• Gemma Terms of Use: permits responsible commercial use and distribution for organizations of all sizes.
Training Data
• Totals: 2T tokens (2B) / 6T tokens (7B)
• Data Types
1. Web Documents: A diverse collection of web text ensures the model is exposed to a broad range of linguistic styles, topics, and vocabulary; the content is primarily English-language.
2. Code: Exposing the model to code helps it learn the syntax and patterns of programming languages, which improves its ability to generate code and understand code-related questions.
3. Mathematics: Training on mathematical text helps the model learn logical reasoning, symbolic representation, and how to address mathematical queries.
• Data Preprocessing
  ◦ CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content.
  ◦ Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets.
Performance
• Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
• Comparison with other open models
Limitations
Gemma 2B and 7B are trained on 2T and 6T tokens respectively of primarily English data from web documents, mathematics, and code. Unlike Gemini, these models are not multimodal, nor are they trained for state-of-the-art performance on multilingual tasks.