Show HN: openai-realtime-embedded-SDK – Build AI assistants on microcontrollers
51 points | 3 days ago | 4 comments | github.com
Hi HN! This is an SDK for ESP32s (microcontrollers) that runs against OpenAI's new WebRTC service [0]. My hope is that people can easily add AI to lots of 'real' devices: wearables, speakers around the house, toys, etc. You don't have to write any code; just buy a device and set some env variables.

If you have any feedback/questions, I would love to hear them! I hope this kicks off a generation of new interesting devices. If you aren't familiar with WebRTC, it can do some magical things. Check out WebRTC for the Curious [1]; I would love to talk about all the cool things it can do as well.

[0] https://platform.openai.com/docs/guides/realtime-webrtc

[1] https://webrtcforthecurious.com

kaycebasques
19 hours ago
Took a bit of poking to figure out what the use case is. It doesn't seem to be mentioned in the README (the usage section is empty) or the intro above. Looks like the main use case is speech-to-speech, which makes sense since we're talking about embedded products, and text-to-speech (for example) wouldn't usually be relevant (most embedded products don't have a keyboard interface). Congrats on the launch! Cool to see WebRTC applied to the embedded space. Streaming speech-to-speech with WebRTC could make a lot of sense.
Sean-Der
18 hours ago
Sorry I forgot to put use cases in! Here are the ones I am excited about.

* Making a toy. I have had a lot of fun putting a silly/sarcastic voice in toys. My 4-year-old thinks it is VERY funny.

* Smart speaker/assistant. I want to put one in each room. If I am in the kitchen, it has a prompt to assist with recipes.

I have A LOT more I want to do in the future. The microcontrollers I was using can't do video yet, BUT the ESP32 line does have newer chips that can. When I pull that off, I can do smart cameras, and then it gets really fun :)

kaycebasques
18 hours ago
"Use case" perhaps wasn't the right word for me to use. Maybe "applications" would have been a better word. What this enables is speech-to-speech applications in embedded devices. (From my quick scan) it doesn't seem to do anything around other ML applications that OpenAI could potentially be involved in, such as speech-to-text, text-to-speech, or computer vision.

But yeah, once I figured out that this enables streaming speech-to-speech applications on embedded devices, then it's easy to think up use cases.

swatcoder
16 hours ago
It doesn't help that this was posted to HN with the "Usages" section of the README left blank; that alone would probably have addressed your question. The submission is just a little premature.

Beyond that, while its primary vision does seem to be speech-to-speech interfaces, it could easily be stretched to do things like send a templatized text prompt constructed from toggle states, sensor readings, etc., and (optimistically) ask for a structured response that could control lights or servos or whatever.
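
A minimal sketch of that pattern in plain C. The sensor helpers, the prompt wording, and the canned response here are all hypothetical; in practice the prompt would go out over the SDK's data channel and the response would come back the same way:

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical hardware helpers -- stand-ins for real GPIO/ADC reads. */
    static int  read_light_percent(void) { return 23; }   /* ambient light sensor */
    static bool read_motion(void)        { return true; } /* PIR toggle state     */
    static void set_lamp(bool on)        { printf("lamp -> %s\n", on ? "on" : "off"); }

    int main(void)
    {
        /* Build a templatized text prompt from current device state. */
        char prompt[256];
        snprintf(prompt, sizeof prompt,
                 "Ambient light is %d%%, motion detected: %s. "
                 "Reply with exactly LAMP_ON or LAMP_OFF.",
                 read_light_percent(), read_motion() ? "yes" : "no");

        /* In the real SDK this would be sent over the WebRTC data channel;
         * here we just pretend the model answered. */
        const char *response = "LAMP_ON";

        /* Optimistically parse the structured response into an actuation. */
        if (strstr(response, "LAMP_OFF"))
            set_lamp(false);
        else if (strstr(response, "LAMP_ON"))
            set_lamp(true);

        return 0;
    }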

Generally, this looks like a very early stage of a hobby project (the code practices fall short of my expectations for good embedded work, it would be better presented as a library than as an application, the README needs lots of work, etc.), but something more sophisticated isn't too far out of reach.

Sean-Der
14 hours ago
I will work on making it better! This was announced Tuesday [0], and I still need to give it lots of love.

Even though the README isn’t completely done, give it a chance; I bet you can have fun with it :)

[0] https://youtu.be/14leJ1fg4Pw?t=625&si=aqHm1UAdDEz91TnD

jonathan-adly
18 hours ago
Here is a nice use case: put this in a pharmacy, have people hit a button, and let them ask questions about over-the-counter medications.

Really, any physical place where people are easily overwhelmed would be a nice fit for something like this.

With some work, you could probably even run RAG on the questions and answer esoteric things like where the food court is in an airport or where the ATM is in a hotel.

swatcoder
16 hours ago
> Put this in a pharmacy - have people hit a button, and ask questions about over-the-counter medications.

Even if you trust OpenAI's models more than your trained, certified, and insured pharmacist -- the pharmacists, their regulators, and their insurers sure won't!

They've got a century of sunk costs to consider (and maybe even some valid concern over the answers a model might give on their behalf...)

Don't expect anything like that in a traditional, regulated medical setting any time soon.

dymk
15 hours ago
The last few doctor’s appointments I’ve had, the clinician used a service to record and summarize the visit. It was using some sort of speech-to-text and an LLM to do so. It’s already in medical settings.
swatcoder
15 hours ago
Transcription and summarization are vastly different things from providing medical advice to patients.
pixelsort
17 hours ago
Thanks for digging that out. Yes, that makes sense to me as someone who made a fully local speech-to-speech prototype with Electron, including VAD and AEC. It was responsive but taxing; I had to use a mix of specialty models over ONNX/WASM in the renderer and llama.cpp in the main process. One day, multimodal models will just do it all.
roland35
13 hours ago
Favorited and starred! I wonder if the real power of this could be in integrating large, low-cost sensor networks. With things like video and audio it might make more sense to bump up to a single-board Linux computer, but maybe the AI could help parse sensor readings or create notifications based on them, and push events back to the real world (lights, solenoids, etc.).

I think it would help to either have a FreeRTOS example or, if you want to go real crazy, create a Zephyr integration! It would be a lot of fun to work on this AI and microcontroller combination. What a cool niche!
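
As a rough illustration of the sensor-network side, here is a minimal FreeRTOS sketch using standard ESP-IDF headers; read_co2_ppm and the task that would feed readings into the AI session are hypothetical stand-ins:

    #include "freertos/FreeRTOS.h"
    #include "freertos/task.h"
    #include "freertos/queue.h"

    static QueueHandle_t sensor_queue;

    /* Hypothetical sensor read -- replace with a real driver. */
    static int read_co2_ppm(void) { return 612; }

    /* Sample the sensor once a minute and queue readings for whatever
     * task owns the AI session to fold into its prompt. */
    static void sensor_task(void *arg)
    {
        for (;;) {
            int ppm = read_co2_ppm();
            xQueueSend(sensor_queue, &ppm, 0);
            vTaskDelay(pdMS_TO_TICKS(60 * 1000));
        }
    }

    void app_main(void)
    {
        sensor_queue = xQueueCreate(16, sizeof(int));
        xTaskCreate(sensor_task, "sensor", 2048, NULL, 5, NULL);
    }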

Sean-Der
13 hours ago
I’m very curious about what an LLM could deduce if you sent in lots of sensor data.

I love my Airthings. I don’t know if it’s actionable, but it would be cool to see what conclusions come out of sending CO2 and radon readings in. It could make understanding your home a lot easier.
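
For example, a snapshot of such readings could be serialized with cJSON (bundled with ESP-IDF) and folded into the session's prompt; the field names and values below are made up:

    #include <stdio.h>
    #include "cJSON.h"

    /* Build a compact JSON snapshot of air-quality readings that could be
     * folded into the session's text prompt. Values here are hard-coded
     * stand-ins for real Airthings-style sensor reads. */
    int main(void)
    {
        cJSON *snapshot = cJSON_CreateObject();
        cJSON_AddNumberToObject(snapshot, "co2_ppm", 743);
        cJSON_AddNumberToObject(snapshot, "radon_bq_m3", 21);
        cJSON_AddNumberToObject(snapshot, "humidity_pct", 41);

        char *text = cJSON_PrintUnformatted(snapshot);
        printf("Latest readings: %s. Anything I should act on?\n", text);

        cJSON_free(text);
        cJSON_Delete(snapshot);
        return 0;
    }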

johanam
19 hours ago
Love this! Excited to give it a try.
Sean-Der
18 hours ago
Thank you! If you run into problems, shoot me a message. I really want to make this easy enough for everyone to build with.

I have talked with incredibly creative developers who are hampered by domain-knowledge requirements. I hope to see an explosion of cool projects if we get this right :)
