The 3rd Edition

Petals - The future!!!

πŸ‘¨β€πŸ’» Developers Arena πŸ‘¨β€πŸ’»

⚡️ A 14-Line Text Classification Code ⚡️

📌 Imagine a compact, 0-parameter, 14-line Python script going head-to-head with a colossal 345-million-parameter transformer model, and winning! Yes, you read that right! In an extraordinary turn of events, a new study has shattered preconceived notions about machine learning and reaffirmed the irreplaceable value of science and deep thinking. Say hello to the future of text classification!

📌 Introducing the paper that's causing a stir in the scientific community: "Low-Resource" Text Classification: A Parameter-Free Classification Method with Compressors. Brace yourself for a mind-bending journey through the extraordinary power of information theory and the boundless potential of compression algorithms.

📌 Here's how it works: the code defines the distance between two pieces of text, x1 and x2, as how much the gzip-compressed length of the concatenation x1x2 exceeds the smaller of the two individual compressed lengths, normalized by the larger one. This quantity is known as the normalized compression distance (NCD). For x1 = x2 the distance is close to zero: once you know x1, you also know x2, so the second copy compresses almost for free. The more the information content of x1 differs from that of x2, the less "compressible" x1x2 becomes and the larger the distance. A test document is then labeled by a k-nearest-neighbors vote over this distance.
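To make the idea concrete, here is a minimal standard-library sketch of that distance plus a k-nearest-neighbors vote. It is a reimplementation of the method as described above, not the paper's exact 14 lines; the `ncd` and `classify` names and the `k=3` default are my own choices.

```python
import gzip
from collections import Counter

def ncd(x1: str, x2: str) -> float:
    """Normalized compression distance between two strings, using gzip's
    compressed length as a computable stand-in for information content."""
    c1 = len(gzip.compress(x1.encode()))
    c2 = len(gzip.compress(x2.encode()))
    c12 = len(gzip.compress((x1 + " " + x2).encode()))
    # If x1 and x2 share a lot of information, compressing them together
    # costs little more than compressing the shorter one alone.
    return (c12 - min(c1, c2)) / max(c1, c2)

def classify(x: str, training_set: list[tuple[str, str]], k: int = 3) -> str:
    """Label x by a majority vote among its k nearest (text, label)
    training pairs under the NCD."""
    neighbors = sorted(training_set, key=lambda pair: ncd(x, pair[0]))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]
```

Note that there is no training step at all: every prediction is just compression calls and a sort, which is exactly where the "0-parameter" claim comes from.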

📌 But wait, there's more! We won't ignore the practical considerations. It's worth noting that the runtime of the 14-line code can get sluggish on extensive datasets. Don't fret though: this is a general issue with k-nearest neighbors (KNN), which must compare every test document against every training document, and researchers are actively exploring workarounds to optimize its performance. Nevertheless, even if the code is best suited to smaller datasets where runtime is not a major concern, the results are undeniably groundbreaking.

📌 So buckle up, fellow thinkers and innovators, because the future of text classification just got a lot more exciting. Stay tuned for further updates and brace yourselves for a paradigm shift that will shape the way we analyze and understand text.

📜 LLM Only 📜

🌸 Petals: Collaborative Inference and Fine-tuning of Large Models 🌸

📌 Training large models over the Internet has become a reality.

📌 Is it slow? Check for yourself.

📌 This chatbot runs on machines distributed around the world.

📌 Speed:

  • With a 65B model, inference can reach speeds of 5 tokens/sec.

  • This method can even be faster than running the model locally with offloading.

📌 Colab Link

✨ Last 15 Days in AI ✨

🚀 Code Interpreter For All 🚀

📌 Code Interpreter is the latest feature in OpenAI ChatGPT (specifically, with the GPT-4 model) that allows you to run Python code in a live working environment. It's basically a sandboxed Python environment where you can execute Python code to perform any task you like. It may sound like a feature built for coders, but it can also help general users with many tasks.

📌 ChatGPT Code Interpreter can access over 300 Python packages. Here's the complete list, with a short description of each generated by ChatGPT. Source.

🚀 Claude 2.0: 100K Token Window 🚀

📌 Anthropic released the next version of its chatbot.

📌 You can feed it content (like a PDF) of up to 100K tokens. It scored above the 90th percentile on the GRE reading and writing exams.

📌 Compared to GPT-4, there's lots to like about Claude 2:

  • Free!

  • 75K-word input.

  • Data updated to early 2023.

  • Can upload several documents.

So it has been generating a lot of excitement.


Author's Note

Substack recommends shorter newsletters, which is why I divided the full newsletter into three parts to keep each one quick.

I will publish the other parts this week. The 4th Edition will reach your inbox by Thursday.

This is the 3rd Edition. If you have any feedback, please don't hesitate to share it with me. And if you love my work, do share it with your colleagues.

Cheers!!

Raahul

