šŸ Edition 15 : LLM Customisation Technique

What are Prompt Tuning, Fine-Tuning, and RLHF?

In the enterprise, we need to customize LLMs for our specific business use cases.

Custom models empower enterprises to create personalized solutions that align with their brand voice, optimize workflows, provide more precise insights, and deliver enhanced user experiences, ultimately driving a competitive edge in the market.

I have tried to summarize this information in a diagram.

  • Prompt engineering: Manipulates the prompt sent to the LLM but doesn't alter the parameters of the LLM in any way. It is light in terms of data and compute requirements.

  • Prompt learning: Uses prompt and completion pairs to impart task-specific knowledge to the LLM through virtual tokens. This process requires more data and compute but provides better accuracy than prompt engineering.

  • Parameter-efficient fine-tuning (PEFT): Introduces a small number of parameters or layers to the existing LLM architecture and trains them on use-case-specific data, providing higher accuracy than prompt engineering and prompt learning, while requiring more training data and compute.

  • Fine-tuning: Involves updating the pretrained LLM weights, unlike the three customization techniques outlined above, which keep these weights frozen. Fine-tuning therefore requires the most training data and compute of all these techniques, but it also provides the highest accuracy for specific use cases, justifying the cost and complexity.

🦚 Prompt Engineering

Prompt engineering involves customization at inference time with show-and-tell examples. An LLM is provided with example prompts and completions, and detailed instructions that are prepended to a new prompt to generate the desired completion. The parameters of the model are not changed.

  • Few-shot prompting: This approach prepends a few sample prompt and completion pairs to the prompt, so that the LLM learns how to generate responses for a new, unseen prompt.

  • Chain-of-thought reasoning: Just as humans decompose bigger problems into smaller ones and apply chain of thought to solve problems effectively, chain-of-thought reasoning is a prompt engineering technique that helps LLMs improve their performance on multi-step tasks.

  • System prompting: This approach adds a system-level prompt alongside the user prompt to give the LLM specific and detailed instructions on how it should behave. All three techniques are combined in the sketch below.
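To make this concrete, here is a minimal sketch that combines a system prompt, few-shot examples, and a chain-of-thought cue. The classification task, the message format, and the placeholder LLM call are illustrative assumptions on my part; only the prompt assembly matters here, and no model parameters are touched.

```python
# A minimal prompt-engineering sketch: system prompt + few-shot examples
# + a chain-of-thought cue, assembled into a chat-style request.
# The task and the (commented-out) client call are hypothetical.

SYSTEM_PROMPT = (
    "You are a support assistant for an e-commerce store. "
    "Answer politely and classify each ticket as BILLING, SHIPPING, or OTHER."
)

# Few-shot examples: prompt/completion pairs prepended so the model
# learns the expected output format without any weight updates.
FEW_SHOT_EXAMPLES = [
    {"prompt": "My card was charged twice for order #1123.", "completion": "BILLING"},
    {"prompt": "The parcel tracking page has not updated in 5 days.", "completion": "SHIPPING"},
]

def build_messages(user_ticket: str) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for ex in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": ex["prompt"]})
        messages.append({"role": "assistant", "content": ex["completion"]})
    # Chain-of-thought cue: ask the model to reason step by step before answering.
    messages.append(
        {"role": "user", "content": f"{user_ticket}\nLet's think step by step, then give the label."}
    )
    return messages

if __name__ == "__main__":
    msgs = build_messages("I was billed for a subscription I cancelled last month.")
    for m in msgs:
        print(f"[{m['role']}] {m['content']}")
    # response = call_llm(msgs)  # hypothetical client call to whichever LLM you use
```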

🦤 Prompt Learning

Prompt learning is an efficient customization method that makes it possible to use pretrained LLMs on many downstream tasks without needing to tune the pretrained modelā€™s full set of parameters. It includes two variations with subtle differences called p-tuning and prompt tuning; both methods are collectively referred to as prompt learning.
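As a rough illustration of the "virtual tokens" idea, here is a minimal PyTorch sketch (my own assumption, not any specific framework's API): a small matrix of trainable embeddings is prepended to the frozen model's input embeddings, and only that matrix receives gradient updates.

```python
# A minimal prompt-tuning sketch: trainable "virtual token" embeddings
# prepended to a frozen embedding layer. Sizes are illustrative.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens: int, embed_dim: int):
        super().__init__()
        # Only these embeddings are trained during prompt tuning.
        self.virtual_tokens = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen LLM embedding layer
        batch = input_embeds.size(0)
        prompt = self.virtual_tokens.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Usage sketch with a stand-in for the pretrained embedding table:
frozen_embed = nn.Embedding(32000, 768)
frozen_embed.weight.requires_grad_(False)        # pretrained weights stay frozen
soft_prompt = SoftPrompt(num_virtual_tokens=20, embed_dim=768)

token_ids = torch.randint(0, 32000, (2, 16))     # dummy batch of token ids
embeds = soft_prompt(frozen_embed(token_ids))    # (2, 36, 768), fed to the frozen transformer
optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=1e-3)  # trains only the soft prompt
```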

🦃 Parameter-Efficient Fine-Tuning

Parameter-efficient fine-tuning (PEFT) techniques use clever optimizations to selectively add and update a few parameters or layers to the original LLM architecture. Using PEFT, model parameters are trained for specific use cases. Pretrained LLM weights are kept frozen and significantly fewer parameters are updated during PEFT using domain and task-specific datasets. This enables LLMs to reach high accuracy on trained tasks.

  • Adapter Learning: Introduces small feed-forward layers in between the layers of the core transformer architecture. Only these layers (adapters) are trained at fine-tuning time for specific downstream tasks. 

  • LoRA: Injects trainable low-rank matrices into transformer layers to approximate weight updates. Instead of updating the full pretrained weight matrix W, LoRA trains its low-rank decomposition, reducing the number of trainable parameters by up to 10,000 times and the GPU memory requirement by 3x compared to full fine-tuning (see the sketch below).
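Here is a minimal PyTorch sketch of the LoRA idea, with illustrative layer sizes and hyperparameters: the frozen weight W is left untouched and a scaled low-rank product B·A is added on top, so only the two small matrices are trained.

```python
# A minimal LoRA sketch: y = base(x) + (x A^T B^T) * (alpha / r),
# i.e. the frozen W plus a trainable low-rank update B A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)               # pretrained W (and bias) stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init => no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Usage sketch: wrap one projection of a hypothetical transformer layer.
proj = nn.Linear(768, 768)
lora_proj = LoRALinear(proj, r=8, alpha=16)
out = lora_proj(torch.randn(2, 16, 768))
trainable = sum(p.numel() for p in lora_proj.parameters() if p.requires_grad)
print(f"trainable LoRA parameters: {trainable}")  # 12,288 vs 589,824 in the full 768x768 W
```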

šŸ§ Fine-tuning

When data and compute resources have no hard constraints, customization techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) are great alternatives to PEFT and prompt engineering. Fine-tuning can achieve the best accuracy across a range of use cases compared to the other customization approaches.

  • Supervised fine-tuning: SFT is the process of fine-tuning all the model's parameters on labeled input-output data, which teaches the model domain-specific terms and how to follow user-specified instructions. It is typically done after model pretraining.

  • Reinforcement learning from human feedback: RLHF is a customization technique that aligns LLMs more closely with human values and preferences, using a reward signal learned from human rankings of model outputs.

  • DPO (Direct Preference Optimization): DPO is a method introduced to achieve precise control over LLM behaviour. It treats the constrained reward-maximization problem as a classification problem on human preference data, which makes it stable, efficient, and computationally lightweight: there is no separate reward model to fit, no extensive sampling, and far less hyperparameter tuning. A sketch of the loss follows below.
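For intuition, here is a minimal sketch of the DPO objective, assuming you already have per-sequence log-probabilities of the preferred and rejected completions under both the policy being trained and a frozen reference model; the tensor values and beta are illustrative.

```python
# A minimal DPO loss sketch:
# -log sigmoid(beta * ((log pi(y_w) - log pi_ref(y_w)) - (log pi(y_l) - log pi_ref(y_l))))
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    # Implicit reward margins of the policy relative to the reference model
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_rewards - rejected_rewards)
    # Binary-classification view of preference pairs: push logits to be positive
    return -F.logsigmoid(logits).mean()

# Usage sketch with dummy log-probabilities for a batch of 4 preference pairs
loss = dpo_loss(torch.tensor([-12.0, -9.5, -11.0, -8.0]),
                torch.tensor([-13.5, -10.0, -12.5, -9.0]),
                torch.tensor([-12.5, -9.8, -11.2, -8.5]),
                torch.tensor([-13.0, -9.9, -12.0, -8.8]))
print(loss.item())
```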

**

I will publish the next Edition on Thursday.

This is the 15th Edition. If you have any feedback, please don't hesitate to share it with me, and if you love my work, do share it with your colleagues.

It takes time to research and document all of this. Please become a paid subscriber and support my work.

Cheers!!

Raahul

**
