Phase 1 – A cloud LLM (Claude/GPT/Gemini) decomposes the prompt into structured sub-tasks
Phase 2 – Local Ollama models process each sub-task (free, private, runs on your GPU)
Phase 3 – The cloud LLM integrates the results into a coherent final answer
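The three phases can be sketched roughly like this — all three functions are hypothetical stand-ins (the real Phase 1/3 would call a cloud API, and `run_local` would POST to Ollama's `/api/generate` endpoint), so this only shows the routing shape, not the actual implementation:

```python
def decompose(prompt: str) -> list[str]:
    """Phase 1 (cloud): split the prompt into structured sub-tasks."""
    # Placeholder: a real call would ask Claude/GPT/Gemini for a task list.
    return [s.strip() for s in prompt.split(" and ")]

def run_local(subtask: str) -> str:
    """Phase 2 (local): process one sub-task with an Ollama model."""
    # Placeholder: a real call would POST to http://localhost:11434/api/generate.
    return f"[local result for: {subtask}]"

def synthesize(prompt: str, results: list[str]) -> str:
    """Phase 3 (cloud): merge the partial results into one answer."""
    # Placeholder: a real call would send only these summaries to the cloud model.
    return " ".join(results)

def pipeline(prompt: str) -> str:
    subtasks = decompose(prompt)                 # Phase 1 – cloud sees the prompt
    results = [run_local(t) for t in subtasks]   # Phase 2 – raw work stays local
    return synthesize(prompt, results)           # Phase 3 – cloud integrates results
```

The point of the structure: the expensive cloud calls bracket the pipeline (one at each end), while the per-sub-task work fans out to free local models in the middle.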
The motivation: cloud APIs are great at reasoning and structure but cost money. Local Ollama models are free but sometimes inconsistent. The pipeline lets you use each where it's strongest.
Also includes:
- FastAPI + React web UI (accessible from LAN/mobile)
- SQLite chat history
- ChromaDB-based RAG
- Discord webhook notifications
Stack: Python, PyQt6, FastAPI, React, Ollama, Anthropic/OpenAI/Google APIs. MIT license.
*"How is this different from Open WebUI / AnythingLLM?"*
Both are excellent tools, but they're manual model selectors — you pick one model and chat. Helix automates the routing: the cloud model never sees your raw data during Phase 2, and local models never need to handle planning/synthesis. The pipeline runs with a single button press.
*"Why PyQt6 desktop instead of Electron or a pure web app?"*
The desktop shell is intentional. The WebSocket server and React UI are embedded, so you get LAN access from phones/tablets on the same network without any Docker or separate server setup. The desktop app is the server. This lets it work offline (for local-only mode) while still being accessible network-wide.
*"Do I need Ollama / a GPU?"*
No. You can run it in cloudAI-only mode (direct chat with Claude/GPT/Gemini). Ollama and a GPU are only needed for the mixAI pipeline's Phase 2. A mid-range GPU (8-12GB VRAM) handles 7-14B models fine for most tasks.
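Detecting whether the mixAI pipeline is even possible reduces to probing the local Ollama server. `/api/tags` is Ollama's model-listing endpoint, so a quick request there works as a health check; the fallback logic below is a sketch, not Helix's actual code:

```python
import json
import urllib.error
import urllib.request

def ollama_available(base_url: str = "http://localhost:11434") -> bool:
    """Return True if a local Ollama server responds on /api/tags."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            json.load(resp)  # a valid JSON body means a healthy server
            return True
    except (urllib.error.URLError, OSError, ValueError):
        return False

# Pick a mode at startup: full pipeline if Ollama is up, cloud-only otherwise.
mode = "mixAI" if ollama_available() else "cloudAI-only"
```

If the probe fails, nothing is lost: the app simply stays in cloudAI-only mode and talks to Claude/GPT/Gemini directly.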
*"What's the catch?"*
Windows-only for now (PyQt6 + some Windows-specific paths). Phase 2 quality depends on your local model selection — a 4B model won't match a 27B one. And you still pay for 2 cloud API calls per pipeline run, which adds up if you run hundreds of queries.
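For the cost caveat, a back-of-envelope estimate is easy to parametrize. The token count and per-million-token price below are illustrative placeholders, not any provider's real pricing:

```python
def pipeline_cost(runs: int,
                  tokens_per_call: int = 2000,
                  price_per_mtok: float = 3.0) -> float:
    """Rough cloud spend for `runs` pipeline executions.

    Each run makes exactly 2 cloud calls (Phase 1 decompose + Phase 3
    synthesize); Phase 2 is free because it runs locally. Both default
    values are made-up example numbers.
    """
    calls = 2 * runs
    return calls * tokens_per_call * price_per_mtok / 1_000_000

# e.g. 500 runs at the placeholder rate:
# pipeline_cost(500) -> 2 * 500 * 2000 * 3.0 / 1e6 = 6.0
```

So the per-run cost is small, but it scales linearly with usage, which is exactly the "adds up if you run hundreds of queries" caveat.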
Happy to answer other questions.