Mixtral 8x7B

Affiliation

Mistral AI

Commercial

Fine-tuning Method

SFT

DPO

Note

Data

Model size

46.7B (8x7B; 12.9B active params)

Newly released resources

Model

Release date

2023-12

Mistral AI

•

A French AI company founded in April 2023 by former Meta and Google researchers.

•

Raised $415M in funding in Oct. 2023.

•

Valued at more than $2B as of Dec. 2023.

•

Run the model

◦

Mixtral 8x7B quantized down to 4-bit and running locally (the guide assumes a Debian-based system)

[23/12] Mixtral 8x7B model release, platform release, and embedding model release

Just as GPT-4 is rumored to combine eight 166B models in an MoE, Mistral AI released an MoE model.

•

Mixtral 8x7B

◦

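Sparse Mixture-of-Experts (SMoE) decoder-only architecture

▪

Built from eight 7B-scale experts; in each layer, a router network selects 2 of the 8 experts for every token and combines their outputs.

▪

46.7B total params (12.9B active params)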

▪

min. GPU RAM for inference : 100GB
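The per-token routing described above can be sketched in a few lines. This is a toy illustration, not Mistral AI's implementation: the names `top2_moe`, `gate_w`, and `experts` are hypothetical, the experts are arbitrary callables rather than feed-forward blocks, and the gate weights are normalized with a softmax over the two selected logits, which is an assumption about how the two outputs are mixed.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a short list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_moe(
    x: Vector,
    gate_w: List[Vector],
    experts: List[Callable[[Vector], Vector]],
) -> Tuple[Vector, List[int]]:
    """Route one token vector through the 2 highest-scoring experts.

    gate_w holds one weight row per expert (hypothetical router parameters).
    Returns the mixed output and the indices of the chosen experts.
    """
    # Router logits: one dot product per expert.
    logits = [sum(w * v for w, v in zip(row, x)) for row in gate_w]
    # Keep the two experts with the largest logits.
    top2 = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:2]
    # Normalize only over the selected experts (assumed mixing rule).
    weights = softmax([logits[i] for i in top2])
    out = [0.0] * len(x)
    for w, i in zip(weights, top2):
        y = experts[i](x)  # only 2 of the experts are ever evaluated
        out = [o + w * y_j for o, y_j in zip(out, y)]
    return out, top2
```

Because only two experts run per token, compute per token tracks the active parameter count rather than the total, even though all eight experts still have to be held in memory.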

◦

Performance

▪

Mixtral matches or outperforms the Llama 2 models and the GPT-3.5 base model on most benchmarks.

•

Outperforms Llama 2 70B on most benchmarks, with 6x faster inference

•

Matches or outperforms GPT-3.5 on most standard benchmarks

•

MMLU 70.6% (Llama 2 70B 69.9%, GPT 3.5 70.0%)

▪

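Compared with Llama 2 70B, Mixtral gives more truthful answers (73.9% vs. 50.2% on the TruthfulQA benchmark) and shows less bias on the BBQ benchmark.

▪

Handles English/French/Italian/German/Spanish

▪

The strongest open-weight model with a permissive license, and the best model in terms of cost/performance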

▪

32k token context length

▪

Shows strong performance in code generation

◦

Instruct

▪

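Mixtral 8x7B Instruct was optimized for instruction following through supervised fine-tuning (SFT) and direct preference optimization (DPO).

▪

Scores 8.30 on MT-Bench, making it the best open-source model, with performance comparable to GPT-3.5.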

•

Platform release (API) : https://console.mistral.ai/ & https://mistral.ai/product/

◦

API : https://docs.mistral.ai/api/ & https://docs.mistral.ai/platform/endpoints/

▪

create chat completion

▪

create embeddings

▪

list available models

◦

model endpoints

▪

tiny : Mistral-7B-v0.2

▪

small : Mixtral-8x7B-v0.1

▪

medium : internal prototype model

▪

embed : embedding models

1024 dimensions; achieves a retrieval score of 55.26 on MTEB.
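As a rough sketch of how the chat completion and embeddings calls listed above might be prepared, the snippet below builds the HTTP requests with the standard library and does not send them. The base URL `https://api.mistral.ai/v1`, the paths `/chat/completions` and `/embeddings`, the model identifiers `mistral-small` and `mistral-embed`, and the payload field names are assumptions not stated in these notes and should be checked against the API docs linked above; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List

# Assumed base URL; verify against https://docs.mistral.ai/api/
API_BASE = "https://api.mistral.ai/v1"


def build_request(path: str, payload: Dict, api_key: str) -> urllib.request.Request:
    """Create a JSON POST request with bearer-token auth (not sent here)."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat_request(prompt: str, api_key: str, model: str = "mistral-small") -> urllib.request.Request:
    """Request for the chat completion endpoint (assumed path and fields)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return build_request("/chat/completions", payload, api_key)


def embeddings_request(texts: List[str], api_key: str, model: str = "mistral-embed") -> urllib.request.Request:
    """Request for the embeddings endpoint (assumed path and fields)."""
    payload = {"model": model, "input": list(texts)}
    return build_request("/embeddings", payload, api_key)
```

Passing either returned object to `urllib.request.urlopen` would send the call; a valid API key from the console is required for that step.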

•

Open-weight models

◦

https://docs.mistral.ai/models/

◦

Mistral-7B-v0.1: Hugging Face // raw_weights (md5sum: 37dab53973db2d56b2da0a033a15307f).

▪

Min. GPU RAM for inference : 16GB

◦

Mistral-7B-Instruct-v0.2: Hugging Face // raw_weights (md5sum: fbae55bc038f12f010b4251326e73d39).

◦

Mixtral-8x7B-v0.1: Hugging Face.

▪

Min. GPU RAM for inference : 100GB

◦

Mixtral-8x7B-Instruct-v0.1: Hugging Face // raw_weights (md5sum: 8e2d3930145dc43d3084396f49d38a3f).
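The 46.7B total / 12.9B active parameter counts and the 100GB inference figure quoted in these notes can be sanity-checked with some arithmetic. The sketch below assumes the hyperparameters published for Mixtral 8x7B (hidden size 4096, 32 layers, feed-forward size 14336, 32 query heads, 8 key/value heads, head size 128, vocabulary 32000); none of these values appear in the notes, and the names `MoEConfig`, `count_params`, and `weight_memory_gb` are hypothetical. The memory estimate covers weights only and ignores the KV cache and activations.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # Assumed values from the published Mixtral 8x7B configuration.
    dim: int = 4096
    n_layers: int = 32
    hidden_dim: int = 14336
    n_heads: int = 32
    n_kv_heads: int = 8
    head_dim: int = 128
    vocab_size: int = 32000
    n_experts: int = 8
    experts_per_token: int = 2


def count_params(cfg: MoEConfig, experts_used: int) -> int:
    """Count parameters when `experts_used` experts per layer are included."""
    # Attention: query and output projections plus smaller key/value projections.
    attn = 2 * cfg.dim * cfg.n_heads * cfg.head_dim
    attn += 2 * cfg.dim * cfg.n_kv_heads * cfg.head_dim
    expert = 3 * cfg.dim * cfg.hidden_dim  # three matrices per gated feed-forward expert
    gate = cfg.dim * cfg.n_experts         # router weights
    norms = 2 * cfg.dim                    # two normalization layers per block
    per_layer = attn + experts_used * expert + gate + norms
    # Input embedding, output projection, and the final normalization layer.
    return cfg.n_layers * per_layer + 2 * cfg.vocab_size * cfg.dim + cfg.dim


def weight_memory_gb(n_params: int, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


cfg = MoEConfig()
total = count_params(cfg, cfg.n_experts)           # every expert is stored
active = count_params(cfg, cfg.experts_per_token)  # experts used per token
print(f"total params : {total / 1e9:.1f}B")
print(f"active params: {active / 1e9:.1f}B")
print(f"16-bit weights: {weight_memory_gb(total, 16):.1f} GB")
print(f"4-bit weights : {weight_memory_gb(total, 4):.1f} GB")
```

Under these assumptions the 16-bit weights alone come to a little over 93 GB, which is consistent with the 100GB minimum listed for Mixtral. The same rule of thumb (about 2 bytes per parameter) puts a 7B model at roughly 14 GB, in line with the 16GB figure for Mistral-7B.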

•

Tokenization format for chat prompts

[START_SYMBOL_ID] + 
tok("[INST]") + tok(USER_MESSAGE_1) + tok("[/INST]") +
tok(BOT_MESSAGE_1) + [END_SYMBOL_ID] +
…
tok("[INST]") + tok(USER_MESSAGE_N) + tok("[/INST]") +
tok(BOT_MESSAGE_N) + [END_SYMBOL_ID]

<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
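A minimal sketch of assembling that prompt string from a list of turns, assuming the template shown above. `build_prompt` is a hypothetical helper; it works at the string level only, whereas the scheme above inserts the start and end symbols as token IDs, so a real tokenizer should be used for actual inference.

```python
from typing import List, Optional, Tuple

BOS, EOS = "<s>", "</s>"  # string stand-ins for the start/end symbol IDs


def build_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Format (user_message, bot_message) turns in the [INST] ... [/INST] layout.

    The bot message of the final turn may be None, meaning the model is
    expected to generate it.
    """
    parts = [BOS]
    for i, (user, bot) in enumerate(turns):
        parts.append(f"[INST] {user} [/INST]")
        if bot is not None:
            # Completed answers are closed with the end symbol.
            parts.append(f" {bot}{EOS}")
        elif i != len(turns) - 1:
            raise ValueError("only the final turn may omit the model answer")
    return "".join(parts)


prompt = build_prompt([
    ("Instruction", "Model answer"),
    ("Follow-up instruction", None),
])
print(prompt)
```

With the two turns shown, the printed string matches the example line above character for character.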

Characteristics of Mistral Models (previous version)

General guidance on models

•

The OpenAI models are the most reliable.

•

Mistral models have the most natural writing style. OpenAI models have a distinct “ChatGPT style.”

•

For simple tasks, the cheaper models are going to work just as well as more expensive ones like GPT-4. For more complex tasks that require high-level reasoning, GPT-4 is unmatched.

•

For long-form writing, Mistral Medium works great. It’s reasonably priced, powerful, and has a very natural writing style.

•

Mistral models only support English, French, Italian, German, and Spanish.

•

If you have a very high quality knowledge base, you can get away with a cheaper model.

•

Experiment with all of the models to find what works best for your use case.