Edition 36 🌸

Good Morning! ☀️ Enjoy last week’s AI snapshot! 🚀

🌸 LLM

Is synthetic data still useful? Yes, but the web is already so big and diverse that synthetic data makes the most sense in domain-specific areas where the right data is lacking, such as reasoning or math.

Now, right as they were excited by these new discoveries and results, they were joined by a new intern, Elie, who proved to be a great specialist in various training techniques. Together they decided to push the experiments to the limit in terms of model size, going from 1.7B down to 360M and even 170M parameters, roughly the sizes of the original GPT-1, BERT, and GPT-2, to see how small a model could be while still performing well.

🌼 Hermes 3

Nous Research has just unveiled Hermes 3, a cutting-edge, open-source model packed with game-changing enhancements. 🌟

What’s New?

  • Enhanced Capabilities:

    • 🚀 Major upgrades in roleplaying, agentic tasks, and function calling (a generic tool-dispatch sketch follows this list).

    • 🗣️ Improved multi-turn conversations and long context coherence.

  • Model Variants:

    • 🧠 Available in three sizes: 8B, 70B, and 405B.

    • 🏆 The 405B parameter model sets a new standard, outperforming other open models.

  • Instruct-Tuned Precision:

    • 🎯 Trained to faithfully follow user requests and adhere closely to system prompts.

    • 🔥 Delivers similar or superior performance compared to Meta’s Llama-3.1 405B.

  • Versatile Application:

    • 💡 Excels in judgment, reward modeling, interpretable problem-solving, code generation, and tool use.
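
To give a flavor of what function calling involves, here is a minimal JSON-dispatch sketch in Python. This is a generic illustration, not Hermes 3's actual chat template or tool format, and the tool name and schema are our own inventions:

```python
import json

# Hypothetical tool registry; the tool name and arguments are illustrative.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub implementation

TOOLS = {"get_weather": get_weather}

# Suppose the model, given the tool schema in its prompt, emits a JSON call:
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # the result is fed back to the model as the tool's response
```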

🌻 Code

In the world outside of sci-fi films, AI hasn’t yet caused real-world disasters or massive cyberattacks. But some lawmakers in California want to act before that dystopian future becomes a reality. SB 1047, a bill designed to prevent AI systems from causing catastrophic harm, is now headed for a final vote in the state senate later this month. 🗳️

The Goal of SB 1047:

  • Preventing AI Disasters:

    • 🛑 Aims to stop large AI models from being weaponized to cause “critical harms” to humanity, such as mass-casualty events or massive cyberattacks.

  • Holding Developers Accountable:

    • Makes developers (the companies behind these AI models) responsible for implementing safety protocols to prevent these disastrous outcomes.

  • Targeting the Biggest AI Models:

    • Only applies to AI models that cost at least $100 million and require 10^26 FLOPs of training compute: essentially the largest and most powerful models.

Pruning and distillation in model training offer significant advantages (a minimal distillation sketch follows this list):

  • Improved Performance: Boosts MMLU scores by 16% compared to training from scratch.

  • Token Efficiency: Requires fewer training tokens, around 100B tokens per model, with up to a 40x reduction.

  • Cost Savings: Reduces compute costs by up to 1.8x when training a family of models, compared to training each from scratch.

  • Competitive Results: Achieves performance on par with models like Mistral 7B, Gemma 7B, and Llama-3 8B, which are trained on significantly more data, up to 15 trillion tokens.
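
As a rough illustration of the distillation half of this recipe, here is a minimal sketch of a distillation loss in PyTorch. The function name, temperature, and toy shapes are our own choices, not the exact recipe behind the numbers above:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 as in standard knowledge distillation."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Toy usage with random logits standing in for real model outputs.
student = torch.randn(4, 32000, requires_grad=True)  # (batch, vocab)
teacher = torch.randn(4, 32000)
loss = distillation_loss(student, teacher)
loss.backward()  # would update the pruned student in a real training loop
```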

🌸 LLM Security

  • Model Merging (MM) enhances model performance across tasks by merging multiple fine-tuned models. However, it also opens the door to new vulnerabilities.

  • An adversary only needs to contribute one backdoored task-specific model to compromise the entire merged model.

  • How It Works:

    • BadMerging uses a two-stage attack mechanism and a novel feature-interpolation-based loss to ensure the backdoors remain effective despite changes in merging parameters (a toy merging sketch follows this list).
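
To see why a single contributor can affect the whole merged model, here is a toy task-arithmetic merge in PyTorch. The function, `alpha`, and the simple averaging scheme are illustrative assumptions about generic model merging, not BadMerging’s actual method:

```python
import torch

def merge_task_vectors(base_state, finetuned_states, alpha=1.0):
    """Average the task vectors (fine-tuned minus base weights) and add
    them back to the base model; every contributed checkpoint touches
    every merged parameter."""
    merged = {}
    for name, base_w in base_state.items():
        deltas = [ft[name] - base_w for ft in finetuned_states]
        merged[name] = base_w + alpha * sum(deltas) / len(deltas)
    return merged

# Toy weights standing in for real model state dicts.
base = {"w": torch.zeros(3)}
clean = {"w": torch.tensor([0.1, 0.0, 0.0])}
backdoored = {"w": torch.tensor([0.0, 5.0, 0.0])}  # one bad contributor
print(merge_task_vectors(base, [clean, backdoored]))  # backdoor delta survives
```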

🌸 Engineering

Passkeys have been a thing for a little while now, and if you know me you know that I’m a big fan of the technology. They bring the power of public-key cryptography to individual website authentication, and they do it in a way that (usually) feels like magic. It’s pretty awesome stuff.
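
For intuition, here is the core sign-a-challenge idea in Python using the `cryptography` package. Real passkeys go through WebAuthn and platform authenticators with attestation, so treat this as a conceptual sketch only:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The key pair lives on the user's device; only the public key is registered.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

challenge = b"random-server-nonce"        # the site sends a fresh challenge
signature = private_key.sign(challenge)   # device signs; the secret never leaves

public_key.verify(signature, challenge)   # raises InvalidSignature on mismatch
print("authenticated")
```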

Prompt caching, which enables developers to cache frequently used context between API calls, is now available on the Anthropic API. With prompt caching, customers can provide Claude with more background knowledge and example outputs—all while reducing costs by up to 90% and latency by up to 85% for long prompts. Prompt caching is available today in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus coming soon.
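
A minimal sketch of how this looks with the Anthropic Python SDK. The model ID, beta header, and placeholder document reflect the public beta as announced; check the current docs before relying on them:

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

LONG_REFERENCE_DOCUMENT = "..."  # placeholder for the large reused context

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta opt-in
    system=[
        {
            "type": "text",
            "text": LONG_REFERENCE_DOCUMENT,
            # Marks this prefix as cacheable; identical prefixes in later
            # calls are read from the cache instead of being reprocessed.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the key points."}],
)
print(response.content[0].text)
```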

🌸 Miscellaneous

Thanks for reading Musings on AI! This post is public so feel free to share it.

Love MusingsOnAI? Tell your friends and get rewards!

If your company is interested in reaching an audience of AI professionals and decision makers, reach us.

If you have any comments or feedback, just respond to this email!

Thanks for reading,
Raahul
