Day-to-day Japanese is fine for me. But neighborhood meetings were a completely different level.
People speak fast. There's local dialect. Someone references a flood from 1987, a land boundary dispute from 1994, and three people I've never met but everyone else knows. I would walk out feeling like I understood maybe 5% of what happened.
So I built a tool for myself to help follow those conversations.
Live Kaiwa listens to Japanese speech and, in real time, shows:
* Japanese transcription
* English translation
* a running summary of what's being discussed
* suggested responses you can say back
The idea is to help you stay oriented in complex conversations.
You can try it here: https://livekaiwa.com
---
## How it works
When you start a session, the browser microphone captures the conversation and streams audio.
The pipeline looks roughly like this:
1. Audio streaming - Browser microphone → WebRTC → server
2. Speech to text - Kotoba Whisper runs a fast first-pass transcription.
3. Multi-pass correction - Buffered audio is re-transcribed by a slower, higher-accuracy pass, which replaces the earlier draft text.
4. LLM processing - Each transcript batch is sent to an LLM that generates English translations, summary bullets, and suggested replies (with TTS audio).
5. Live UI updates - Everything streams back to the browser in (mostly) real time.
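The multi-pass correction step is the interesting part. I don't know Live Kaiwa's internals, but the "replaces earlier text" behavior can be sketched as a transcript store keyed by time window: the fast pass writes provisional segments, and the later high-accuracy pass overwrites any draft covering the same stretch of audio. A minimal sketch (all names and the overlap rule are my assumptions, not the actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    text: str      # transcribed Japanese text
    final: bool    # True once the high-accuracy pass has produced it

class Transcript:
    """Holds transcript segments; later passes overwrite overlapping drafts."""

    def __init__(self) -> None:
        self.segments: list[Segment] = []

    def add(self, seg: Segment) -> None:
        # Drop any earlier segment that overlaps the new one's time window,
        # then insert the new segment and keep everything ordered by time.
        self.segments = [
            s for s in self.segments
            if s.end <= seg.start or s.start >= seg.end
        ]
        self.segments.append(seg)
        self.segments.sort(key=lambda s: s.start)

    def text(self) -> str:
        return "".join(s.text for s in self.segments)

# Fast first pass produces rough draft segments...
t = Transcript()
t.add(Segment(0.0, 2.0, "きょうは", final=False))
t.add(Segment(2.0, 4.0, "かいぎです", final=False))
# ...then the buffered high-accuracy pass replaces the same window.
t.add(Segment(0.0, 4.0, "今日は会議です", final=True))
print(t.text())  # → 今日は会議です
```

The nice property of this shape is that the UI can render `text()` after every update: viewers see a rough transcript almost immediately, and it quietly improves in place a few seconds later.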
Session data stays in the browser; nothing is stored server-side.
Why I built it, in short: even if you speak Japanese reasonably well, fast multi-person discussions can become overwhelming. Seeing the conversation transcribed and summarized as it happens helps.