🌻 E41: OpenAI Dev Day Edition

Smaller models are the future.


🌼 Realtime API

Developers can now build fast speech-to-speech experiences into their applications

Photo courtesy: OpenAI Community

OpenAI has just introduced its Realtime API, now in public beta, allowing developers to build low-latency, multimodal experiences, particularly speech-to-speech applications.

What does this mean? You can now integrate ChatGPT's voice controls directly into apps, enabling real-time, natural conversations. A perfect use case for call centers.

OpenAI demoed the Realtime API on Wanderlust, a travel planning app originally showcased last year.

With the Realtime API, you can chat with the app, plan trips by speaking naturally, and even interrupt mid-sentence, creating a conversational flow that mirrors human dialogue.

But travel planning is just the tip of the iceberg. The Realtime API opens doors to a wide range of applications, from customer service to education and accessibility tools. Imagine voice-controlled apps that respond instantly and feel more like a conversation than a command.
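To make this concrete, here is a minimal sketch of driving the Realtime API over a WebSocket from Python. The endpoint, headers, and event names follow the launch-time beta docs, and the model snapshot name is illustrative, so treat it as a starting point rather than a reference implementation:

import asyncio
import json
import os

import websockets  # pip install websockets

# Launch-time beta endpoint; the model is passed as a query parameter.
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # Newer websockets releases name this parameter additional_headers.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model for a response; add "audio" to modalities for speech out.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Say hello."},
        }))
        # The server streams events; stop once the response is complete.
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())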

“We focus on both startups and enterprises.” - OpenAI

Now, the API isn't exactly cheap:

  • $0.06 per minute for audio input

  • $0.24 per minute for audio output

The graphic design and call center industries are already feeling the impact of large language models (LLMs). Which industry will be next? Legal?

Writer RAG tool: build production-ready RAG apps in minutes

  • Writer RAG Tool: build production-ready RAG apps in minutes with simple API calls.

  • Knowledge Graph integration for intelligent data retrieval and AI-powered interactions.

  • Streamlined full-stack platform eliminates complex setups for scalable, accurate AI workflows.

🌼 Vision Fine-Tuning

Vision fine-tuning lets developers tailor the model's visual understanding using both images and text, opening up new possibilities for a variety of industries.

Images in, text out. Each training example is a chat-formatted JSON object with the image supplied by URL:

{
  "messages": [
    { "role": "system", "content": "You are an assistant that identifies uncommon cheeses." },
    { "role": "user", "content": "What is this cheese?" },
    { "role": "user", "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/3/36/Danbo_Cheese.jpg"
          }
        }
      ] 
    },
    { "role": "assistant", "content": "Danbo" }
  ]
}
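To use this at scale, you batch examples like the one above into a JSONL file, upload it, and start a fine-tuning job. A minimal sketch with the Python SDK, where the file name is made up and the GPT-4o snapshot is the one named at launch, so check the current docs before relying on it:

from openai import OpenAI

client = OpenAI()

# Upload the training set: one JSON object like the example above per line.
training_file = client.files.create(
    file=open("cheese_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the vision fine-tune against a GPT-4o snapshot.
job = client.fine_tuning.jobs.create(
    model="gpt-4o-2024-08-06",
    training_file=training_file.id,
)
print(job.id, job.status)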

The upshot is an easy-to-use pattern: users supply image URLs from around the web, then interrogate the content conversationally with follow-up questions.

Think autonomous vehicles, medical imaging, and visual search, all of which rely heavily on precise interpretation of visual data.

One standout early adopter is Grab, a leading Southeast Asian food delivery and rideshare company. Using vision fine-tuning, Grab was able to significantly improve its mapping services. With just 100 examples, the company saw a 20% increase in lane count accuracy and a 13% boost in speed limit sign localization. These impressive results demonstrate how small batches of visual training data can lead to dramatic improvements in AI-powered systems.

A fleet of UI agents built on vision fine-tuning, combined with features like retrieving information from websites, could automate and streamline workflows in unprecedented ways.

Pricing is also cheap:

  • Training: 1M tokens per day are free through October 31, 2024 for fine-tuning GPT-4o with images; after that, training costs $25 per 1M tokens.

  • Inference: $3.75 per 1M input tokens and $15 per 1M output tokens.

🌼 Prompt Caching

Anthropic introduced prompt caching for Claude, claiming it can cut API costs by up to 90% and improve latency by up to 80%.

Google also launched its equivalent, context caching for Gemini, at the last Google I/O.

So a version from OpenAI was only a matter of time.

This system automatically applies a 50% discount on input tokens that the model has recently processed, potentially leading to substantial savings for applications that frequently reuse context.
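Caching kicks in automatically on prompts longer than about 1,024 tokens, and only a prompt's prefix can hit the cache, so the practical advice is to put stable content (system prompt, few-shot examples) first and per-request content last. A sketch of what that looks like; the cached-token usage field follows the launch-time SDK shape, so verify against current docs:

from openai import OpenAI

client = OpenAI()

# A long, stable prefix (comfortably past the ~1,024-token minimum).
LONG_SYSTEM_PROMPT = "You are a meticulous support agent for Acme Corp. " * 200

for question in ["How do I reset my password?", "How do I close my account?"]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": LONG_SYSTEM_PROMPT},  # identical prefix, cacheable
            {"role": "user", "content": question},              # varies per request
        ],
    )
    details = response.usage.prompt_tokens_details
    print(question, "->", details.cached_tokens, "cached prompt tokens")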

“We’ve been pretty busy,” said Olivier Godement, OpenAI’s head of product for the platform, at a small press conference at the company’s San Francisco headquarters kicking off the developer conference. “Just two years ago, GPT-3 was winning. Now, we’ve reduced [those] costs by almost 1000x. I was trying to come up with an example of technologies who reduced their costs by almost 1000x in two years, and I cannot come up with an example.”

🌼 Model Distillation

Have you noticed a pattern lately in how models are rolled out? OpenAI often starts with a larger model, like GPT-4o, and then follows up with a smaller version, like GPT-4o mini.

They did something similar with the ‘o1-preview’ model, and Meta followed suit with Llama 3.1, shipping the high-end 405B alongside smaller versions like the 8B.

The logic behind this is simple: smaller models are cheaper to operate, require less computational power, and offer lower latency. But there's a trade-off: performance. Smaller models often don't match their larger counterparts. Distillation aims to close that gap.

With distillation, you generate completions from a larger model, like o1-preview or GPT-4o, and then fine-tune a smaller model, such as GPT-4o mini, on those completions.

The expectation is that the smaller model will perform similarly to the larger model for the specific task.
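The first step is collecting those completions. In the sketch below (the metadata keys are just examples), store=True flags the request so the completion is retained on the platform for later use: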

from openai import OpenAI

client = OpenAI()

# store=True saves the request and response as a stored completion that can be
# reviewed in the dashboard and reused as distillation training data; metadata
# makes stored completions easy to filter.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "what's the capital of the USA?"}
            ],
        }
    ],
    store=True,
    metadata={"username": "user123", "user_id": "123", "session_id": "123"},
)
print(response.choices[0].message.content)
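Once enough completions are stored, they appear in the platform dashboard, where you can filter by that metadata, turn them into a training set for fine-tuning GPT-4o mini, and use Evals to check that the distilled model holds up on your specific task.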

🌸 Podcasts

There's a lot more I could write about, but I figure very few people will read this far anyways. If you did, you're amazing and I appreciate you!

Love MusingsOnAI? Tell your friends!

If your company is interested in reaching an audience of AI professionals and decision-makers, reach out to us.

If you have any comments or feedback, just respond to this email!

Thanks for reading. Let's explore the world together!

Raahul
