The Musings On AI
🌻 E38: Jevons Paradox, Agent Updates & HelixFold3

Raahul Dutta
September 03, 2024

🌸 Good Morning - The Jevons Paradox of LLMs

Over the past few years, we've witnessed a remarkable 79% reduction in LLM costs: the price per million tokens has fallen even faster than Moore's Law predicts computing power will grow.

Let me take you back to the 19th century. In his 1865 book The Coal Question, the British economist William Stanley Jevons (1835-1882) described what would later be known as the Jevons Paradox. James Watt's steam engine used significantly less coal than earlier designs, so many expected coal consumption to fall. Instead, the opposite happened: coal consumption in the UK soared.

Similarly, consider Apple’s iPod and iTunes, which many thought would simply make music more accessible. Instead, they revolutionized the music industry, leading to an unprecedented surge in music consumption.

Source: https://www.deeplearning.ai/the-batch/issue-264/
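A common textbook way to see when the paradox kicks in is a constant-elasticity demand model (a standard rebound-effect illustration, not taken from the source above). If a unit of service costs price / efficiency and demand for the service has price elasticity ε, total resource use scales as efficiency^(ε − 1), so efficiency gains *increase* consumption whenever demand is elastic (ε > 1). The elasticity values below are hypothetical, chosen only to show the three regimes.

```python
def resource_use(efficiency: float, elasticity: float,
                 resource_price: float = 1.0, scale: float = 1.0) -> float:
    """Total resource consumed under constant-elasticity demand for the service.

    One unit of service (steam power, LLM output) costs resource_price / efficiency.
    Demand for the service is scale * service_price ** -elasticity, and each unit
    of service consumes 1 / efficiency units of the underlying resource.
    """
    service_price = resource_price / efficiency
    service_demand = scale * service_price ** (-elasticity)
    return service_demand / efficiency


# Doubling efficiency: resource use scales as efficiency ** (elasticity - 1).
for elasticity in (0.5, 1.0, 1.5):  # hypothetical elasticities
    ratio = resource_use(2.0, elasticity) / resource_use(1.0, elasticity)
    print(f"elasticity={elasticity}: resource use x{ratio:.2f}")

# A 79% price cut is a 1 / 0.21 efficiency gain. With elastic demand
# (elasticity 1.5, assumed), total spend on tokens still rises.
print(f"token spend x{resource_use(1 / 0.21, 1.5):.2f}")
```

With an assumed elasticity of 1.5, a 79% cheaper token leads to roughly 2.18x more total spend on tokens, which is the Jevons Paradox in miniature.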

The same dynamic explains the widespread adoption of large language models: as tokens get cheaper, we use far more of them, and everyone is buzzing about AI agents.

However, in real-world production, it's not enough to deploy a single agent; we need a whole flock of them, 100 or more, working together seamlessly. In my experience over the past year, agent integration succeeds 90-95% of the time, yet the end-to-end pipeline still breaks down often because full automation is lacking. That gap is the real challenge, and the company that cracks this 'last mile' will be the next unicorn.
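Why is 90-95% per agent not enough? A back-of-the-envelope sketch, assuming a strictly sequential pipeline where every agent must succeed and failures are independent (both simplifying assumptions; real pipelines may branch, retry, or have correlated failures):

```python
def pipeline_success(step_success: float, num_steps: int) -> float:
    """End-to-end success if every step must succeed and failures are independent."""
    return step_success ** num_steps


def required_step_success(target: float, num_steps: int) -> float:
    """Per-step success rate needed to hit a target end-to-end success rate."""
    return target ** (1 / num_steps)


for p in (0.90, 0.95):
    for n in (1, 10, 100):
        print(f"per-step {p:.0%}, {n:3d} steps -> end-to-end {pipeline_success(p, n):.2%}")

print(f"90% end-to-end over 100 steps needs {required_step_success(0.90, 100):.2%} per step")
```

Under these assumptions, 95% per step over 100 steps leaves only about 0.59% end-to-end success, and keeping the whole pipeline at 90% requires roughly 99.9% reliability from every single step.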

Prompt: A man sleeps and does not dream!!!

🌸 From The Agent Community

🌼 A Text2SQL Debugger Agent

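Database mismatches, such as condition mismatches and stricter constraint mismatches, often cause errors in real-world Text-to-SQL systems. Unlike execution errors, they don't raise exceptions, so feedback from the database alone doesn't catch them.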

To tackle this, a tool-assisted agent framework for SQL inspection and refinement has been proposed, equipping LLMs with a retriever and a detector to diagnose and correct these mismatches.

The authors also introduce Spider-Mismatch, a dataset that reflects real-world condition mismatches. Their method significantly outperforms baselines on Spider-Mismatch and achieves the highest averaged few-shot performance on the Spider and Spider-Realistic datasets.

Tool-Assisted Agent on SQL Inspection and Refinement in Real-World Scenarios

Recent Text-to-SQL methods leverage large language models (LLMs) by incorporating feedback from the database management system. While these methods effectively address execution errors in SQL queries, they struggle with database mismatches -- errors that do not trigger execution exceptions. Database mismatches include issues such as condition mismatches and stricter constraint mismatches, both of which are more prevalent in real-world scenarios. To address these challenges, we propose a tool-assisted agent framework for SQL inspection and refinement, equipping the LLM-based agent with two specialized tools: a retriever and a detector, designed to diagnose and correct SQL queries with database mismatches. These tools enhance the capability of LLMs to handle real-world queries more effectively. We also introduce Spider-Mismatch, a new dataset specifically constructed to reflect the condition mismatch problems encountered in real-world scenarios. Experimental results demonstrate that our method achieves the highest performance on the averaged results of the Spider and Spider-Realistic datasets in few-shot settings, and it significantly outperforms baseline methods on the more realistic dataset, Spider-Mismatch.

arxiv.org/abs/2408.16991
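To make "condition mismatch" concrete, here is a toy sketch using Python's sqlite3 and difflib: the query executes cleanly but returns nothing because it filters on 'USA' while the table stores 'United States'. The table, data, and the functions detect_mismatch and retrieve_values are hypothetical stand-ins for the paper's detector and retriever (an empty-result check and a string-similarity lookup), not the authors' actual tools.

```python
import difflib
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical toy table: stored values use full country names.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?)",
        [("Ana", "United States"), ("Ben", "France"), ("Chen", "United Kingdom")],
    )
    return conn


def detect_mismatch(conn: sqlite3.Connection, sql: str) -> bool:
    # Detector stand-in: the query runs without raising, yet returns no rows,
    # which hints at a condition mismatch rather than an execution error.
    return not conn.execute(sql).fetchall()


def retrieve_values(conn: sqlite3.Connection, table: str, column: str,
                    literal: str, k: int = 2) -> list[str]:
    # Retriever stand-in: rank stored values by case-insensitive string similarity.
    # Table/column names come from our own schema here, never from user input.
    stored = [row[0] for row in conn.execute(f"SELECT DISTINCT {column} FROM {table}")]
    lowered = {value.lower(): value for value in stored}
    matches = difflib.get_close_matches(literal.lower(), list(lowered), n=k, cutoff=0.0)
    return [lowered[m] for m in matches]


conn = build_demo_db()
sql = "SELECT name FROM singer WHERE country = 'USA'"
if detect_mismatch(conn, sql):
    suggestions = retrieve_values(conn, "singer", "country", "USA")
    # Naive string substitution for the demo; a real system should use bound parameters.
    fixed_sql = sql.replace("'USA'", f"'{suggestions[0]}'")
    print(suggestions[0], conn.execute(fixed_sql).fetchall())
```

String similarity is the simplest possible retriever; a production system would more likely use embeddings or value indexes, and would let the LLM decide whether a suggested value actually matches the user's intent.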

🌼 WebPilot - Autonomous Multi-Agent System for Web Task Execution

LLM-based autonomous agents often struggle with complex web tasks due to the unpredictable nature of these environments. Traditional agents rely on rigid, expert-designed policies, limiting their adaptability to new tasks. Unlike humans, who adapt through exploration, these agents lack flexibility.

WebPilot is a multi-agent system that enhances Monte Carlo Tree Search (MCTS) with a dual optimization strategy. In the Global Optimization phase, it breaks tasks into subtasks and refines its plan as new observations arrive.

In the Local Optimization phase, it uses a tailored MCTS to handle uncertainty and refine decisions. A 93% relative increase in success rate on WebArena marks a significant advancement in autonomous agent capabilities. The code doesn't appear to be available yet, but I want to explore it further.

WebPilot

Autonomous Multi-Agent System for Web Task Execution

yaoz720.github.io/WebPilot
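WebPilot's Local Optimization phase builds on Monte Carlo Tree Search. For readers new to MCTS, here is a minimal generic UCT sketch showing the four classic steps (selection, expansion, random rollout, backpropagation) on a hypothetical toy task: starting at 0, reach the number 10 within four actions. The actions, reward, depth limit, and exploration constant c = 1.4 are all assumptions; this is not WebPilot's tailored variant, nor its global/local planning.

```python
import math
import random

# Hypothetical toy task standing in for a web environment: start at 0 and reach
# TARGET within MAX_DEPTH actions. Reward is 1.0 on success, 0.0 otherwise.
ACTIONS = ("+1", "+3", "*2")
TARGET, MAX_DEPTH = 10, 4


def step(state: int, action: str) -> int:
    return state * 2 if action == "*2" else state + int(action[1:])


def is_terminal(state: int, depth: int) -> bool:
    return state == TARGET or depth == MAX_DEPTH


class Node:
    def __init__(self, state: int, depth: int, parent=None, action=None):
        self.state, self.depth = state, depth
        self.parent, self.action = parent, action
        self.children = {}  # action -> Node
        self.visits, self.value = 0, 0.0


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """UCB1 applied to trees: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(root_state: int, iterations: int, rng: random.Random) -> str:
    root = Node(root_state, 0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCT score.
        while not is_terminal(node.state, node.depth) and len(node.children) == len(ACTIONS):
            parent = node
            node = max(parent.children.values(), key=lambda ch: uct(ch, parent.visits))
        # 2. Expansion: add one untried action as a new child.
        if not is_terminal(node.state, node.depth):
            action = rng.choice([a for a in ACTIONS if a not in node.children])
            child = Node(step(node.state, action), node.depth + 1, node, action)
            node.children[action] = child
            node = child
        # 3. Simulation: random rollout until a terminal state.
        state, depth = node.state, node.depth
        while not is_terminal(state, depth):
            state, depth = step(state, rng.choice(ACTIONS)), depth + 1
        reward = 1.0 if state == TARGET else 0.0
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most visited first action.
    return max(root.children.values(), key=lambda ch: ch.visits).action


best = mcts(0, iterations=2000, rng=random.Random(0))
print("best first action:", best)
```

In this toy task, "*2" from 0 wastes a step and can never reach 10, so its value stays at zero and UCT quickly shifts visits toward "+1" and "+3", the two first actions that can still succeed.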

🌸 Choice Cuts

🌼 HelixFold3, a PaddlePaddle-based replication of AlphaFold3, has been released as open source. Baidu's PaddleHelix team reports that it matches AlphaFold3's performance and has made it available to the public.

PaddleHelix/apps/protein_folding/helixfold3 at dev · PaddlePaddle/PaddleHelix

Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning ("Propeller" (螺旋桨) bio-computing toolkit) - PaddlePaddle/PaddleHelix

github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold3

🌼 The Mamba in the Llama

🌼 GPT-4 is costly for scraping.

Source

🌼 100M-Token Context Window - A 100M-token context window means a model could probably hold everything you've ever told it, over years of use.

100M Token Context Windows

Research update on ultra-long context models, our partnership with Google Cloud, and new funding.

magic.dev/blog/100m-token-context-windows
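How many years is 100M tokens? A quick arithmetic sketch; the daily token volumes below are hypothetical usage levels, not figures from the Magic blog post.

```python
CONTEXT_TOKENS = 100_000_000


def years_of_history(tokens_per_day: int, context_tokens: int = CONTEXT_TOKENS) -> float:
    """How many years of conversation fit in the window at a given daily volume."""
    return context_tokens / tokens_per_day / 365


for daily in (5_000, 20_000, 100_000):  # hypothetical daily usage
    print(f"{daily:>7,} tokens/day -> {years_of_history(daily):5.1f} years")
```

Even at a heavy, assumed 100,000 tokens per day, the window holds about 2.7 years of history; at 20,000 tokens per day it is closer to 13.7 years.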

🌸 Podcasts

Love MusingsOnAI? Tell your friends!

📮Want to Advertise with us?

If your company is interested in reaching an audience of AI professionals and decision-makers, reach out to us.

Musings on AI | Passionfroot

Let's work together! Visit my Passionfroot page for more info and to book your slot.

www.passionfroot.me/musings-on-ai

If you have any comments or feedback, just respond to this email!

Thanks for reading! Let's explore the world together!

Raahul
