Show HN: I built a desktop app combining Claude, GPT, Gemini with local Ollama
I built a desktop app (PyQt6, Windows) that orchestrates multiple AI models in a 3-phase pipeline:

Phase 1 – A cloud LLM (Claude/GPT/Gemini) decomposes the prompt into structured sub-tasks.

Phase 2 – Local Ollama models process each sub-task (free, private, runs on your GPU).

Phase 3 – The cloud LLM integrates the results into a coherent final answer.
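As a minimal sketch of how the three phases compose (the function names here are hypothetical, not the app's actual API; `cloud_llm` and `local_llm` stand in for wrappers around the cloud and Ollama clients):

```python
import json

def run_pipeline(prompt, cloud_llm, local_llm):
    """Three-phase pipeline sketch: cloud_llm and local_llm are
    caller-supplied callables (prompt string in, text out)."""
    # Phase 1: the cloud model decomposes the prompt into sub-tasks.
    plan = cloud_llm(f"Decompose into a JSON list of sub-tasks: {prompt}")
    subtasks = json.loads(plan)
    # Phase 2: each sub-task is handled by a local model; the cloud
    # model never sees the raw data processed in this phase.
    partials = [local_llm(task) for task in subtasks]
    # Phase 3: the cloud model synthesizes the partial answers.
    return cloud_llm("Integrate these results:\n" + "\n".join(partials))
```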

The motivation: cloud APIs are great at reasoning and structure but cost money. Local Ollama models are free but sometimes inconsistent. The pipeline lets you use each where it's strongest.

Also includes:

- FastAPI + React web UI (accessible from LAN/mobile)

- SQLite chat history

- ChromaDB-based RAG

- Discord webhook notifications

Stack: Python, PyQt6, FastAPI, React, Ollama, Anthropic/OpenAI/Google APIs. MIT license.

tsunamayo
Some questions I'm anticipating:

*"How is this different from Open WebUI / AnythingLLM?"*

Both are excellent tools, but they're manual model selectors — you pick one model and chat. Helix automates the routing: the cloud model never sees your raw data during Phase 2, and local models never need to handle planning/synthesis. The pipeline runs with a single button press.

*"Why PyQt6 desktop instead of Electron or a pure web app?"*

The desktop shell is intentional. The WebSocket server and React UI are embedded, so you get LAN access from phones/tablets on the same network without any Docker or separate server setup. The desktop app is the server. This lets it work offline (for local-only mode) while still being accessible network-wide.

*"Do I need Ollama / a GPU?"*

No. You can run it in cloudAI-only mode (direct chat with Claude/GPT/Gemini). Ollama and a GPU are only needed for the mixAI pipeline's Phase 2. A mid-range GPU (8-12GB VRAM) handles 7-14B models fine for most tasks.
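For anyone curious what the Phase 2 local call amounts to, a one-shot request against Ollama's `/api/generate` endpoint is just a small JSON POST (the daemon listens on port 11434 by default; model name below is only an example):

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_request(model, prompt):
    """Payload for Ollama's /api/generate endpoint. stream=False
    requests one complete JSON response instead of chunked tokens."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()

def ask_local(model, prompt, url=OLLAMA_URL):
    """One-shot completion from a local model. Requires a running
    Ollama daemon with the model already pulled."""
    req = request.Request(
        url,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```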

*"What's the catch?"*

Windows-only for now (PyQt6 plus some Windows-specific paths). Phase 2 quality depends on your local model selection — a 4B model won't match a 27B one. And you still pay for two cloud API calls per pipeline run, which adds up if you run hundreds of queries.
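To put a rough number on that last point, here's the back-of-envelope math (the token count and per-million-token price below are illustrative assumptions, not measured usage):

```python
def pipeline_cost(runs, tokens_per_call=2000, price_per_mtok=3.00):
    """Rough monthly cost estimate: each pipeline run makes two cloud
    calls (Phase 1 decomposition + Phase 3 synthesis). Token counts
    and pricing are assumptions for illustration only."""
    calls = runs * 2
    return calls * tokens_per_call / 1_000_000 * price_per_mtok

# e.g. 500 runs/month under these assumptions:
# pipeline_cost(500) -> 6.0 (dollars)
```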

Happy to answer other questions.
