Fine-tuning Method
- blog, github
- model input size: 512
- supervised learning (applies only stage 1 of the 3 RLHF stages)
- training cost: A100 x 8 for 3 hours, about $100; augmented data generation cost about $500
- data generated via Self-Instruct (modified Self-Instruct)
  . instructions are generated 20 at a time
  . the evaluation step is skipped (originally, task identification was used to choose input-first or output-first generation depending on the task class)
  . one response is generated per instruction
- Alpaca Instruct Data (link): 52K instruction-following demonstrations generated from text-davinci-003 (GPT-3.5) responses (175 Self-Instruct seed tasks plus augmentation)
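For supervised fine-tuning, each of the 52K records is rendered into a single training prompt. A minimal sketch of that formatting step, assuming the Alpaca-style template with `instruction`/`input`/`output` fields (the exact template wording should be checked against the public repo):

```python
def format_example(example: dict) -> str:
    """Render one instruction-following record into a supervised
    fine-tuning prompt, following the Alpaca-style template.
    (Template wording is an assumption; verify against the repo.)"""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )
```

Records whose input field is empty (the `<noinput>` case from the generation prompt) use the shorter template without the `### Input:` section.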
Model size
Newly provided resources


Reproduction GitHub
Repositories related to Alpaca and LLaMA.
Seminar video


modified Self-Instruct (instructions generated 20 at a time, evaluation step skipped, one response generated per instruction)
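The modified pipeline can be sketched as a loop over a growing task pool. This is a structural sketch only: `generate_batch` stands in for the actual text-davinci-003 call plus output parsing, and the demo count is a hypothetical parameter, not a value from the source.

```python
import random

def run_pipeline(seed_tasks, generate_batch, target_size, batch=20, k_demos=3):
    """Modified Self-Instruct loop: sample a few tasks from the pool as
    in-context examples, ask the model for `batch` new instructions in one
    call, and keep a single response per instruction. There is no separate
    evaluation step and no classification/non-classification split.
    `generate_batch` and `k_demos` are hypothetical stand-ins."""
    pool = list(seed_tasks)
    while len(pool) < target_size:
        demos = random.sample(pool, k=min(k_demos, len(pool)))
        # One API call yields up to `batch` new (instruction, input, output)
        # records, which go straight back into the pool.
        pool.extend(generate_batch(demos, batch))
    return pool
```

Batching 20 instructions per call and skipping the evaluation step are what make the $500 data-generation budget plausible relative to the original Self-Instruct pipeline.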

We built on the data generation pipeline from self-instruct and made the following modifications:
We used text-davinci-003 to generate the instruction data instead of davinci.
We wrote a new prompt (prompt.txt) that explicitly gave the requirement of instruction generation to text-davinci-003. Note: there is a slight error in the prompt we used, and future users should incorporate the edit in #24
You are asked to come up with a set of 20 diverse task instructions. These task instructions will be given to a GPT model and we will evaluate the GPT model for completing the instructions. Here are the requirements:
1. Try not to repeat the verb for each instruction to maximize diversity.
2. The language used for the instruction also should be diverse. For example, you should combine questions with imperative instrucitons.
3. The type of instructions should be diverse. The list should include diverse types of tasks like open-ended generation, classification, editing, etc.
2. A GPT language model should be able to complete the instruction. For example, do not ask the assistant to create any visual or audio output. For another example, do not ask the assistant to wake you up at 5pm or set a reminder because it cannot perform any action.
3. The instructions should be in English.
4. The instructions should be 1 to 2 sentences long. Either an imperative sentence or a question is permitted.
5. You should generate an appropriate input to the instruction. The input field should contain a specific example provided for the instruction. It should involve realistic data and should not contain simple placeholders. The input should provide substantial content to make the instruction challenging but should ideally not exceed 100 words.
6. Not all instructions require input. For example, when a instruction asks about some general information, "what is the highest peak in the world", it is not necssary to provide a specific context. In this case, we simply put "<noinput>" in the input field.
7. The output should be an appropriate response to the instruction and the input. Make sure the output is less than 100 words.
List of 20 tasks:
We adopted much more aggressive batch decoding, i.e., generating 20 instructions at once, which significantly reduced the cost of data generation.
We simplified the data generation pipeline by discarding the difference between classification and non-classification instructions.
We only generated a single instance for each instruction, instead of 2 to 3 instances as in [1].
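Since a batched call returns all 20 tasks in one completion, the text must be split back into individual records. A sketch of that parsing step, assuming a numbered layout with `Instruction:`/`Input:`/`Output:` fields and the `<noinput>` marker from the prompt (the real repo's parsing regexes differ in detail):

```python
import re

def parse_batch(text: str) -> list[dict]:
    """Split a batched completion into task records. Assumes each task is
    numbered ("1. ", "2. ", ...) and carries Instruction/Input/Output
    fields; "<noinput>" marks tasks without an input, as required by the
    generation prompt. The layout is an assumption for illustration."""
    records = []
    # Split on task numbers at the start of a line.
    chunks = re.split(r"(?m)^\s*\d+\.\s*", text)
    for chunk in chunks:
        m = re.search(
            r"Instruction:\s*(?P<inst>.*?)\s*"
            r"Input:\s*(?P<inp>.*?)\s*"
            r"Output:\s*(?P<out>.*)",
            chunk,
            flags=re.S,
        )
        if not m:
            continue  # skip malformed or empty chunks
        inp = m.group("inp").strip()
        records.append({
            "instruction": m.group("inst").strip(),
            "input": "" if inp == "<noinput>" else inp,
            "output": m.group("out").strip(),
        })
    return records
```

Because only one instance is generated per instruction (modification above), each parsed record maps directly to one training example with no further instance expansion.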