🌻 E46: Next-Gen Agent or a True Leap in AI Consciousness?

Happy 11/11

Developers will tell you: AI agents are still too early. They’re expensive, unpredictable, and sometimes unreliable. They’re not wrong. Developers are feeling the heat in production.

But there’s a spark of something promising in recent research. A glimpse of the future. And it suggests that the future will be AGENTIC in the next 1-2 years.

Today, let’s dive into a paper from Stanford University titled “ALIGNING AI AGENTS VIA INFORMATION-DIRECTED SAMPLING”.

Why?

AI alignment is a critical challenge in developing superintelligent agents. The goal? Ensuring these agents align with human values and interests.

Human values differ, human interests differ, and designing an agent that aligns perfectly with all human preferences? Nearly impossible.

Think about a scenario: you are in a hospital, and some robotic agents are welcoming you.

The AI-powered agents analyze your clinical data, past health records, maybe even credit details. They’re efficient. But then, a human nurse approaches, asking how you’re feeling today, bringing a sense of warmth and understanding that’s hard to replace.

To ensure AI agents make decisions that truly serve us, they must integrate human preferences into their actions. But here’s the catch: many real-world problems are uncertain and partially observable.

Enter Stanford’s New Approach.

The Stanford paper (see reference below) proposes a novel approach to alignment. It frames the task as a class of bandit alignment problems, in which an AI agent must balance exploring an environment with querying human preferences to maximize long-term rewards.

Traditional methods like Thompson Sampling or “explore-then-exploit” strategies attempt to manage this balance. But these often fall short in dynamic environments, accumulating high regret (in other words, a large cumulative gap between the reward the agent actually earned and what the best choices would have earned).
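To make “regret” concrete, here is a minimal sketch of standard Beta-Bernoulli Thompson Sampling on a toy bandit. It is not the paper’s bandit alignment setup: the function name, the arm probabilities, and the step count are all made up for illustration. Regret here is the summed gap between the best arm’s expected reward and the expected reward of the arm actually pulled.

```python
import random


def thompson_sampling_regret(true_means, steps, seed=0):
    """Run Beta-Bernoulli Thompson Sampling and return cumulative regret.

    true_means: hypothetical success probability of each arm (toy values).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta(1, 1) prior on every arm
    failures = [1] * n_arms
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(steps):
        # Sample a plausible mean for each arm from its posterior.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)

        # Pull the chosen arm and update its posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward

        # Regret grows whenever a worse-than-best arm is pulled.
        regret += best_mean - true_means[arm]

    return regret


print(thompson_sampling_regret([0.3, 0.5, 0.7], steps=2000))
```

The number printed depends on the seed. The point is only that every pull of a worse arm adds to the total, and that total is what an alignment algorithm tries to keep small.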

Stanford’s solution? Information-Directed Sampling (IDS).

IDS takes alignment to a new level by minimizing regret and optimizing learning. It’s an algorithm that chooses actions based on two goals: maximizing knowledge gain while staying reward-focused. The magic lies in its information ratio, computed at every step, which weighs the expected cost of an action right now against how much the agent expects to learn from it.

This allows IDS to achieve sublinear regret, meaning its average per-step regret shrinks toward zero over time, so the agent keeps improving without sacrificing alignment with human goals.
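Here is a rough sketch of the information ratio idea. It assumes the common definition from the IDS literature (squared expected regret divided by expected information gain), not necessarily the exact formulation in the Stanford paper. The function names and numbers are hypothetical, and full IDS randomizes over actions, while this simplified version just picks the single action with the lowest ratio.

```python
import math


def information_ratio(expected_regret, info_gain):
    """Squared expected regret per unit of expected information gain."""
    if info_gain <= 0.0:
        # An action that teaches nothing is only acceptable if it costs nothing.
        return 0.0 if expected_regret == 0.0 else math.inf
    return expected_regret ** 2 / info_gain


def pick_action(expected_regrets, info_gains):
    """Simplified, deterministic IDS-style choice: lowest information ratio wins."""
    ratios = [information_ratio(r, g) for r, g in zip(expected_regrets, info_gains)]
    return min(range(len(ratios)), key=ratios.__getitem__)


# Toy numbers (assumptions, not from the paper).
# Action 1 could stand for something like "ask the human a question":
# costly right now, but very informative.
expected_regrets = [0.10, 0.30, 0.05]
info_gains = [0.01, 0.50, 0.001]

print(pick_action(expected_regrets, info_gains))
```

A purely greedy agent would take action 2 because it has the smallest immediate regret. The ratio-based rule prefers action 1, because the extra short-term cost buys far more information.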

Tests show IDS consistently outperforms traditional methods, making it a scalable, efficient approach for aligning AI in uncertain, ever-changing environments.

I’m excited and want to see it in production soon.

Ready to Level up your work with AI?

HubSpot’s free guide to using ChatGPT at work is your new cheat code to go from working hard to hardly working.

HubSpot’s guide will teach you:

  • How to prompt like a pro

  • How to integrate AI in your personal workflow

  • 100+ useful prompt ideas

All in order to help you unleash the power of AI for a more efficient, impactful professional life.

🌸 From The Agent Community

In this survey, researchers have consolidated cutting-edge work on (M)LLM-based GUI agents, focusing on three main areas:

  • Datasets and Benchmarks: First, they dive into the key datasets and benchmarks that serve as the foundation for training and evaluating these agents. These datasets allow agents to learn GUI interaction patterns and user expectations, creating a baseline for improvement.

  • Frameworks and Taxonomy: Next, the survey presents a unified framework, capturing the essential components that researchers use across studies. A detailed taxonomy further categorizes these components, mapping out the variety of approaches and helping future research build on a shared understanding.

  • Commercial Applications: The practical impact of these advances can already be seen in industries ranging from customer service automation to software testing and beyond. Companies are using (M)LLM-based agents to handle user instructions and perform tasks in real time, reducing the need for manual intervention.

🌸 Choice Cuts

🌼 A new adaptive gradient method named ADOPT.

As AI agents become more common, we could see a shift toward a “dual-use web” where websites serve both humans and bots. While websites remain essential for human users due to the effectiveness of visual UIs, AI agents might increasingly interact directly with data through APIs or automated browser interfaces.

Companies may prioritize their own web interfaces over accommodating third-party AI assistants, leading to a blend of human-friendly UIs and agent-friendly APIs that support both types of users in parallel.

🌼 Hugging Face introduced SmolLM2, a fully open-source, small, high-performance language model. SmolLM2 pushes the limits for language models under 2B parameters with three optimized sizes: 135M, 360M, and 1.7B parameters.

The future of on-device and in-browser models is open, and it's incredibly exciting!

🌼 The developers are looking for a stable common chunking library, and we found it.

🌸 Podcasts

There’s a lot more I could write about but I figure very few people will read this far anyways. If you did, you’re amazing and I appreciate you!

Love MusingsOnAI? Tell your friends!

If your company is interested in reaching an audience of AI professionals and decision-makers, reach us.

If you have any comments or feedback, just respond to this email!

Thanks for reading. Let’s explore the world together!

Raahul
