E41: OpenAI Dev Day Edition
Smaller models are the future.
Realtime API
Developers can now build fast speech-to-speech experiences directly into their applications.
Photo Courtesy: OpenAI Community
OpenAI has just introduced its Realtime API, now in public beta, allowing developers to build low-latency, multimodal experiences, particularly in speech-to-speech applications.
What does this mean? You can now integrate ChatGPT's voice controls directly into apps, enabling real-time, natural conversations. A perfect use case for call centers.
OpenAI demoed the Realtime API on Wanderlust, a travel-planning app originally showcased last year.
With the Realtime API, you can chat with the app, plan trips by speaking naturally, and even interrupt mid-sentence, creating a conversational flow that mirrors human dialogue.
But travel planning is just the tip of the iceberg. The Realtime API opens doors to a wide range of applications, from customer service to education and accessibility tools. Imagine voice-controlled apps that respond instantly and feel more like a conversation than a command.
"We focus on both startups and enterprises," says OpenAI.
Now, while the API isn't exactly cheap:
- $0.06 per minute for audio input
- $0.24 per minute for audio output
At those rates, a ten-minute call split roughly evenly between listening and speaking comes to about $1.50.
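Under the hood, the Realtime API is a persistent WebSocket session that exchanges JSON events in both directions. Here is a minimal sketch, assuming the third-party websockets package and the beta model snapshot name from the announcement; event shapes may change while the API is in beta.

import asyncio
import json
import os

import websockets  # third-party: pip install websockets

# Beta endpoint and headers from OpenAI's announcement.
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def say_hello():
    # On websockets <= 12 the keyword is extra_headers;
    # newer releases renamed it to additional_headers.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model for a spoken (and textual) response.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text", "audio"],
                "instructions": "Greet the listener in one sentence.",
            },
        }))
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])  # e.g. streaming response.audio.delta chunks
            if event["type"] == "response.done":
                break

asyncio.run(say_hello())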
The graphic design and call center industries are already feeling the impact of large language models (LLMs). Which industry will be next? Legal?
Writer RAG Tool: build production-ready RAG apps in minutes with simple API calls.
Knowledge Graph integration for intelligent data retrieval and AI-powered interactions.
Streamlined full-stack platform eliminates complex setups for scalable, accurate AI workflows.
Vision Fine-Tuning
This feature enables developers to tailor the model's visual understanding capabilities using both images and text, opening up exciting new possibilities for a variety of industries.
Images in, text out. Each training example is a JSON record in the familiar chat-messages format:
{
  "messages": [
    { "role": "system", "content": "You are an assistant that identifies uncommon cheeses." },
    { "role": "user", "content": "What is this cheese?" },
    { "role": "user", "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/3/36/Danbo_Cheese.jpg"
          }
        }
      ]
    },
    { "role": "assistant", "content": "Danbo" }
  ]
}
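To train on a batch of such examples, you upload them as a JSONL file and start a fine-tuning job. A minimal sketch, assuming the official openai Python SDK; the filename is hypothetical, while gpt-4o-2024-08-06 is the snapshot OpenAI pointed to for vision fine-tuning.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file in which each line is one training example
# like the record above (the filename here is hypothetical).
training_file = client.files.create(
    file=open("cheese_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a vision fine-tuning job on a GPT-4o snapshot.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id, job.status)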
In practice, this means a user can supply the URL of an image from the web and interrogate its content conversationally with a set of questions.
Think autonomous vehicles, medical imaging, and visual search, all of which rely heavily on precise interpretation of visual data.
One standout early adopter is Grab, a leading Southeast Asian food delivery and rideshare company. Using vision fine-tuning, Grab was able to significantly improve its mapping services. With just 100 examples, the company saw a 20% increase in lane count accuracy and a 13% boost in speed limit sign localization. These impressive results demonstrate how small batches of visual training data can lead to dramatic improvements in AI-powered systems.
A fleet of UI agents using vision fine-tuning, combined with features like retrieving information from websites, could automate and streamline workflows in unprecedented ways.
Pricing is also attractive:
- Training: 1M tokens free until October 31, 2024 for fine-tuning GPT-4o with images, and $25 per 1M tokens after that.
- Inference: $3.75 per 1M input tokens and $15 per 1M output tokens.
Prompt Caching
Anthropic introduced prompt caching for Claude, claiming it can reduce API costs by up to 90% and improve latency by up to 80%.
Google also rolled out a similar feature at the last Google I/O, so a matching move from OpenAI was expected.
This system automatically applies a 50% discount on input tokens that the model has recently processed, potentially leading to substantial savings for applications that frequently reuse context.
"We've been pretty busy," said Olivier Godement, OpenAI's head of product for the platform, at a small press conference at the company's San Francisco headquarters kicking off the developer conference. "Just two years ago, GPT-3 was winning. Now, we've reduced [those] costs by almost 1000x. I was trying to come up with an example of technologies who reduced their costs by almost 1000x in two years, and I cannot come up with an example."
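Caching kicks in automatically on prompt prefixes of 1,024 tokens or more, so the practical trick is prompt ordering: keep the stable content first and the variable user input last. A small sketch, assuming a recent openai Python SDK that reports cached tokens in the usage details; LONG_SYSTEM_PROMPT stands in for a long, reused instruction block.

from openai import OpenAI

client = OpenAI()

# Stand-in for a long, stable instruction block that repeats across calls.
LONG_SYSTEM_PROMPT = "..."

for question in ["What is RAG?", "What is model distillation?"]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            # Stable prefix first so repeated requests share a cacheable prefix;
            # only the final user message varies between calls.
            {"role": "system", "content": LONG_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    # Recent SDKs report how many input tokens were served from the cache.
    print(response.usage.prompt_tokens_details.cached_tokens)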
Model Distillation
Have you noticed a pattern lately in how models are rolled out? OpenAI often starts with a larger model, like GPT-4o, and then follows up with a smaller version, like GPT-4o-mini.
They did something similar with the o1-preview model, and Meta followed suit with Llama, launching the higher-end Llama 3.1 models alongside smaller ones like the 8B.
The logic behind this is simple: smaller models are cheaper to run, require less computational power, and offer lower latency. But there's a trade-off: performance. The smaller models often don't perform as well as their larger counterparts.
OpenAI's new Model Distillation workflow leans into this: you generate completions from larger models like o1-preview or GPT-4o, then fine-tune a smaller model, such as GPT-4o-mini, on those completions.
The expectation is that the smaller model will perform similarly to the larger model for the specific task.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 1: store the larger model's completions. store=True keeps this
# completion in your OpenAI account for later use as distillation training
# data, and metadata makes the stored completions easy to filter.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "what's the capital of the USA?"
                }
            ]
        }
    ],
    store=True,
    metadata={"username": "user123", "user_id": "123", "session_id": "123"},
)
print(response.choices[0].message.content)
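Step 2 is to turn those stored completions into training data for the smaller model. A hedged sketch of that final step: the JSONL filename is a hypothetical export, and in practice you can also launch the job from the stored-completions view in the OpenAI dashboard.

from openai import OpenAI

client = OpenAI()

# Hypothetical JSONL export of the stored completions collected above.
training_file = client.files.create(
    file=open("stored_completions.jsonl", "rb"),
    purpose="fine-tune",
)

# Fine-tune the smaller "student" model on the larger model's outputs.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.status)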
Podcasts
There's a lot more I could write about, but I figure very few people will read this far anyway. If you did, you're amazing and I appreciate you!
Love MusingsOnAI? Tell your friends!
If your company is interested in reaching an audience of AI professionals and decision-makers, reach us.
If you have any comments or feedback, just respond to this email!
Thanks for reading. Let's explore the world together!
Raahul