“Oh but they only run on local hardware…”
Okay, but that doesn't mean every aspect of our lives needs to be recorded and analyzed by an AI.
Are you okay with private and intimate conversations and moments (including of underage family members) being saved for replaying later?
Have all your guests consented to this?
What happens when someone breaks in and steals the box?
What if the government wants to take a look at the data in there and serves a warrant?
What if a large company comes knocking and makes an acquisition offer? Will all the privacy guarantees still stand in the face of the $$$?
The non-privacy-conscious will just use Google/etc.
My response was: no, I don't get any of that, because I disable that technology; it is always listening and can never be trusted. There is no privacy in those services.
They did not like that response.
Typically not how these things work. Speech is processed using ASR (automatic speech recognition) and then run through a prompt that checks for appropriate tool calls.
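A minimal sketch of that pipeline; `transcribe` and `llm_complete` are hypothetical stand-ins for whatever local ASR model and LLM you run, and the tool registry and prompt are made up for illustration:

```python
import json

# Hypothetical tool registry; names and behavior are illustrative only.
TOOLS = {
    "set_timer": lambda minutes: f"timer set for {minutes} min",
    "add_reminder": lambda text: f"reminder saved: {text}",
}

def transcribe(audio: bytes) -> str:
    """Stand-in for a local ASR model (e.g. a Whisper-class model)."""
    return "remind me to call the dentist"

def llm_complete(prompt: str) -> str:
    """Stand-in for a local LLM; fakes a structured tool-call reply."""
    return '{"tool": "add_reminder", "args": {"text": "call the dentist"}}'

def handle_utterance(audio: bytes) -> str | None:
    """ASR first, then a prompt asking the model to pick a tool call."""
    text = transcribe(audio)
    prompt = (
        "Given the transcript below, reply with JSON "
        '{"tool": <name>, "args": {...}} or {"tool": null} if nothing fits.\n'
        f"Transcript: {text}"
    )
    reply = json.loads(llm_complete(prompt))
    tool = TOOLS.get(reply.get("tool") or "")
    return tool(**reply["args"]) if tool else None

print(handle_utterance(b""))  # -> "reminder saved: call the dentist"
```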
I've been meaning to basically make this myself but I've been too lazy lately to bother.
I actually want a lot more functionality from a local only AI machine, I believe the paradigm is absurdly powerful.
Imagine an AI reminding you that you've been on HN too long, offering to save off the comment you're working on for later, and then moving the browser window to a different tab.
Having idle thoughts in the car about things you need to do and being able to just say them out loud, knowing important topics won't be forgotten.
I understand that for people who aren't neurodiverse, the idea of just forgetting to do something incredibly critical to one's health and well-being isn't something that happens (often), but for plenty of other people a device that just helps them remember important things can be dramatically life-changing.
> Having idle thoughts in the car about things you need to do and being able to just say them out loud, knowing important topics won't be forgotten.
> I understand that for people who aren't neurodiverse, the idea of just forgetting to do something incredibly critical to one's health and well-being isn't something that happens (often), but for plenty of other people a device that just helps them remember important things can be dramatically life-changing.
Those don't sound like things that you need AI for.
I do sometimes wish it would be seen as an enlightened policy to legislate that personal private information held in technical devices is legally treated the same as information held in your brain. Especially for people for whom assistive technology is essential (deaf, blind, etc). But everything we see says the wind is blowing the opposite way.
Some of our decisions in this direction:
- Minimize how long we have "raw data" in memory
- Tune the memory extraction to be very discriminating and err on the side of forgetting (https://juno-labs.com/blogs/building-memory-for-an-always-on-ai-that-listens-to-your-kitchen)
- Encrypt storage with hardware-protected keys (we're building on top of the Nvidia Jetson SOM); a rough sketch of the idea is below
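A rough sketch of that last point in Python using the `cryptography` package; the key-loading step is a pure software placeholder for what a hardware keystore would actually provide (the Jetson specifics aren't shown here):

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def load_device_key() -> bytes:
    """Placeholder: on real hardware this key would be unwrapped by the
    SoC's secure keystore, never generated or held in plain software."""
    return os.urandom(32)  # 256-bit AES key

def encrypt_record(key: bytes, plaintext: bytes, record_id: str) -> bytes:
    """AES-GCM with the record id as associated data, so ciphertexts
    can't be silently swapped between records on disk."""
    nonce = os.urandom(12)  # must be unique per encryption
    return nonce + AESGCM(key).encrypt(nonce, plaintext, record_id.encode())

def decrypt_record(key: bytes, blob: bytes, record_id: str) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, record_id.encode())
```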
We're always open to criticism on how to improve our implementation around this.

My problem is Siri doesn't do any of this stuff well. I'd really love to just get it out of the way so someone can build it better.
One of our core architecture decisions was to use a streaming speech-to-text model. At any given time, about 80ms of actual audio is in memory and about 5 minutes of transcribed audio (text) is in memory (this helps the STT model know the context of the audio for higher transcription accuracy).
Of these 5-minute transcripts, those that don't become memories are forgotten, so only selected, extracted memories are durably stored. Currently we store the transcript alongside the memory (this was a request from our prototype users to help them build confidence in the transcription accuracy), but we'll continue to iterate based on feedback on whether this is the correct decision.
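In rough Python terms, that rolling-window behavior looks something like the sketch below; the 16 kHz sample rate and the `extract_memories` stub are assumptions for illustration, not the actual implementation:

```python
import time
from collections import deque

SAMPLE_RATE = 16_000                              # assumed 16 kHz mono input
AUDIO_WINDOW_SAMPLES = int(SAMPLE_RATE * 0.080)   # ~80 ms of raw audio
TRANSCRIPT_WINDOW_SECS = 5 * 60                   # ~5 min of text context

class RollingContext:
    """Holds only a sliver of raw audio plus a sliding transcript window.
    Anything that isn't promoted to a memory ages out and is forgotten."""

    def __init__(self):
        self.audio = deque(maxlen=AUDIO_WINDOW_SAMPLES)
        self.transcript: deque[tuple[float, str]] = deque()

    def push_audio(self, samples: list[int]) -> None:
        self.audio.extend(samples)  # maxlen silently drops samples older than ~80 ms

    def push_text(self, text: str) -> None:
        now = time.monotonic()
        self.transcript.append((now, text))
        # Expire transcript segments older than the 5-minute window.
        while self.transcript and now - self.transcript[0][0] > TRANSCRIPT_WINDOW_SECS:
            self.transcript.popleft()

def extract_memories(transcript: list[str]) -> list[str]:
    """Stand-in for the discriminating extraction step; errs toward
    returning nothing, per the design decision above."""
    return []
```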
Also agree with paxys that the social implications here are deep and troubling. Having ambient AI in a home, even if it's caged to the home, has tricky privacy problems.
I really like the explorations of this space done in Black Mirror's The Entire History of You[1] and Ted Chiang's The Truth of Fact short story[2].
My bet is that the home and other private spaces almost completely yield to computer surveillance, despite the obvious problems. We've already seen this happen with social media and home surveillance cameras.
Just as spaces were 'invaded' by writing in Chiang's story, AI will fill the world, and those opting out will occupy the same marginal positions as dumb-phone users and people without home cameras or televisions.
Interesting times ahead.
1. https://en.wikipedia.org/wiki/The_Entire_History_of_You
2. https://en.wikipedia.org/wiki/The_Truth_of_Fact,_the_Truth_o...
Friends at your house who value their privacy probably won’t feel great knowing you’ve potentially got a transcript of things they said just because they were in the room. Sure, it's still better than also sending everything up to OpenAI, but that doesn’t make it harmless or less creepy.
Unless you’ve got super-reliable speaker diarization and can truly ensure only opted-in voices are processed, it’s hard to see how any always-listening setup ever sits well with people who value their privacy.
This is something we call out under the "What we got wrong" section. We're currently collecting an audio dataset that should help us create a speech-to-text (STT) model that incorporates speaker identification, and that tag will be woven into the core of the memory architecture.
> The shared household memory pool creates privacy situations we’re still working through. The current design has everyone in the family sharing the same memory corpus. Should a child be able to see a memory their parents created? Our current answer is to deliberately tune the memory extraction to be household-wide with no per-person scoping, because a kitchen device hears everyone equally. But “deliberately chose” doesn’t mean “solved.” We’re hoping our in-house STT will allow us to do per-person memory tagging, and then we can experiment with scoping memories to certain people or groups of people in the household.
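A minimal sketch of what per-person scoping could look like once the STT emits speaker tags; the `visibility` field and the scoping policy here are illustrative assumptions, not the shipped design:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    speakers: frozenset[str]                   # tags from a diarizing STT
    visibility: frozenset[str] | None = None   # None = household-wide (current design)

def memories_visible_to(memories: list[Memory], person: str) -> list[Memory]:
    """Household-wide memories are visible to everyone; scoped memories
    only to the listed household members."""
    return [m for m in memories if m.visibility is None or person in m.visibility]

corpus = [
    Memory("dentist appointment Tuesday", frozenset({"parent_a"})),
    Memory("surprise party plan", frozenset({"parent_a", "parent_b"}),
           visibility=frozenset({"parent_a", "parent_b"})),
]
print([m.text for m in memories_visible_to(corpus, "child")])
# -> ['dentist appointment Tuesday']
```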
I wrote a blog post about this exact product space a year ago. https://meanderingthoughts.hashnode.dev/lets-do-some-actual-...
I hope y'all succeed! The potential use cases for locally hosted AI dwarf what can be done with SaaS.
I hope the memory crisis isn't hurting you too badly.
Feel free to reach out. Would love to swap notes and send you a prototype.
> I hope the memory crisis isn't hurting you too badly.
Oh man, we've had to really track our bill of materials (BOM) and average selling price (ASP) estimates to make sure everything stays feasible. Thankfully these models quantize well and the size-to-intelligence frontier is moving out all the time.
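Back-of-envelope, the quantization math looks something like this; the parameter count and overhead factor are illustrative assumptions, not our actual model:

```python
def model_footprint_gb(params_billions: float, bits_per_weight: float,
                       overhead: float = 1.2) -> float:
    """Rough RAM estimate: weights plus ~20% for KV cache and activations
    (the overhead factor is a ballpark assumption)."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# An 8B-parameter model on a fixed-RAM module, fp16 vs 4-bit quantized:
print(f"{model_footprint_gb(8, 16):.1f} GB")  # ~19.2 GB, won't fit on small SOMs
print(f"{model_footprint_gb(8, 4):.1f} GB")   # ~4.8 GB, feasible
```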
Apple? [1]
For once, we (as the technologists) have a free translator into layman's speak via the frontier LLMs, which can be an opportunity to educate the masses as to the exact world on the horizon.
It is actually both a technology and regulation/law issue.
What can be solved with the former should be; what is left, solved with the latter. The best cases are where both consistently and redundantly uphold our rights.
I want legal privacy protections, consistent with privacy preserving technology. Inconsistencies create technical and legal openings for nefarious or irresponsible powers.
This is like a shitty Disney movie.
I have little hope that is true. Don't expect privacy laws and boycott campaigns. That very same elite control the law via bribes to US politicians (and indirectly the laws of other countries via those politicians' threats; see the ongoing watering down of EU laws). They also directly control public discourse via ownership of the media and mainstream communication platforms. What backlash can they really suffer?
Is your argument that these affected parties are not users and that the GDPR does not require their consent?
Don't take this as hostility. I am 100% for local inference. But that is the way I understand the law, and I do think it benefits us to hold companies to a high standard. Because even such a device could theoretically be used against a person, or could have other unintended consequences.
If there's a camera in an AI device (like Meta Ray-Ban glasses) then there's a light when it's on, and they are going out of their way to engineer it to be tamper-resistant.
But audio seems to be on the other side of the line. Passively listening to ambient audio is being treated as something that doesn't need active consent, flashing lights, or other privacy-preserving measures. And it's true, it's fundamentally different: I have to make a proactive choice to speak, but I can't avoid being visible. So you can construct a logical argument for it.
I'm curious how this will really go down as these become pervasively available. Microphones are pretty easy to embed almost invisibly into wearables; a lot of them already have them. They don't use a lot of power, so it won't be too hard to just have them always on. If we settle on this as the line, what's it going to mean that everything you say, everywhere, will be presumed recorded? Is that OK?
That’s not accurate. There are plenty of states that require everyone involved to consent to a recording of a private conversation. California, for example.
Voice assistants today skirt around that because of the wake word, but always-on recording obviously negates that defense.