Over the past few months, I've been working on building Omni - a workplace search and chat platform that connects to apps like Google Drive/Gmail, Slack, Confluence, etc. Essentially an open-source alternative to Glean, fully self-hosted.
I noticed that some orgs find Glean expensive and not very extensible. I wanted to build something that small to mid-size teams could run themselves, so I decided to build it all on Postgres (ParadeDB, to be precise) and pgvector. No Elasticsearch, no dedicated vector database. I figured Postgres is more than capable of handling the level of scale required.
To bring up Omni on your own infra, all it takes is a single `docker compose up`, and some basic configuration to connect your apps and LLMs.
What it does:
- Syncs data from all connected apps and builds a BM25 index (ParadeDB) and HNSW vector index (pgvector)
- Hybrid search combines results from both
- Chat UI where the LLM has tools to search the index - not just basic RAG
- Traditional search UI
- Users bring their own LLM provider (OpenAI/Anthropic/Gemini)
- Connectors for Google Workspace, Slack, Confluence, Jira, HubSpot, and more
- Connector SDK to build your own custom connectors
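For the curious, the hybrid-search step is conceptually just rank fusion. Here's a simplified, self-contained sketch using reciprocal rank fusion (a common approach; this is illustrative, not the production code, and the names are made up):

```python
# Minimal sketch of hybrid search fusion via Reciprocal Rank Fusion (RRF).
# Inputs are doc IDs ranked by BM25 score and by vector similarity
# respectively; all names here are illustrative, not Omni's actual API.

def rrf_fuse(bm25_ids, vector_ids, k=60):
    """Combine two ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranking):
            # 1 / (k + rank) dampens the influence of lower-ranked hits
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that appear near the top of both lists float to the top of the fused ranking, which is why hybrid search tends to beat either index alone.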
Omni is in beta right now, and I'd love your feedback, especially on the following:
- Has anyone tried self-hosting workplace search and/or AI tools, and what was your experience like?
- Any concerns with the Postgres-only approach at larger scales?
Happy to answer any questions!
The code: https://github.com/getomnico/omni (Apache 2.0 licensed)
* "Bring Your Own LLM: Anthropic, OpenAI, Gemini, or open-weight models via vLLM."
With so many newcomers wanting these kinds of services, it might be worth adjusting that first bullet to say: "No data leaves your network, as long as you don't use any Anthropic, OpenAI, or Gemini models over the network, of course."
Does each user do their own auth and the ingest runs for each user using stored user creds, perhaps deduplicating the data in the index, but storing permissions metadata for query time filtering?
Or is there a single "team" level integration credential that indexes everything in the workspace and separately builds a permissions model based on the ACLs from the source system API?
In general, the goal is to use an org-wide installation method wherever possible, and record the identity of the user we are impersonating when ingesting data in the ACL. There are some gaps in the permission-gathering step in some of the connectors; I'm still working on fixing those.
The part that's easy to overlook: your search index is transactionally consistent with everything else. No stale results because some background sync job fell over at 3am.
With 3000+ schemas I'd keep an eye on GIN index bloat. The per-index overhead across that many schemas adds up and autovac has trouble keeping pace.
- Their rebranded Onyx launch: https://news.ycombinator.com/item?id=46045987
- Their original Danswer launch: https://news.ycombinator.com/item?id=36667374
I also started to build something similar for us, as a PoC/alternative to Glean. I'm curious how you handle data isolation, so that each user sees only the messages from their own Slack channels, or only Jira tickets from workspaces they have access to? Managing user mapping was also super painful in AWS Q for Business.
Currently permissions are handled in the app layer - it's simply a WHERE clause filter that restricts access to only those records that the user has read permissions for in the source. But I plan to upgrade this to use RLS in Postgres eventually.
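Concretely, the app-layer check boils down to something like this (a simplified sketch; the table and column names are made up for illustration, not Omni's actual schema):

```python
# Sketch of app-layer permission filtering: the search query is restricted
# to rows whose ACL overlaps the requesting user's principals.
# Column name (allowed_principals) is hypothetical, not Omni's real schema.

def build_permission_filter(user_id, group_ids):
    """Return a parameterized WHERE fragment plus its parameters."""
    principals = [user_id] + list(group_ids)
    # Postgres && is the array-overlap operator: true if any element matches
    clause = "allowed_principals && %(principals)s"
    return clause, {"principals": principals}

clause, params = build_permission_filter("u_42", ["g_eng", "g_all"])
# The fragment gets ANDed onto the main hybrid-search query.
```

Moving this to RLS would enforce the same predicate inside Postgres itself, so no query path can forget to apply it.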
For Slack specifically, right now the connector only indexes public channels. For private channels, I'm still working on full permission inheritance - capturing all channel members, and giving them read permissions to messages indexed from that channel. It's a bit challenging because channel members can change over time, and you'll have to keep permissions updated in real-time.
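The membership-tracking part mostly reduces to diffing member sets on each sync run and applying only the changes; roughly (an illustrative sketch, not the actual connector code):

```python
# Sketch of keeping a private channel's ACL current: diff the member set
# the Slack API reports now against what was stored last sync, then apply
# only the grants and revokes. Illustrative, not Omni's connector code.

def diff_members(stored, current):
    """Return (to_grant, to_revoke) between old and new member sets."""
    return current - stored, stored - current

to_grant, to_revoke = diff_members({"U01", "U02"}, {"U02", "U03"})
# to_grant == {"U03"}, to_revoke == {"U01"}
```

The hard part isn't the diff; it's getting a timely signal that membership changed (e.g. Slack's member_joined_channel / member_left_channel events) rather than waiting for the next full sync.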
I haven't directly compared against Elasticsearch yet, but I plan to do that next and publish some numbers. There's a benchmark harness set up already: https://github.com/getomnico/omni/tree/master/benchmarks, but there are a couple of issues with it that I need to address before I do a large-scale run (the ParadeDB index settings need some tuning).
I started parsing its system logs to create entries in our system automatically to book my time, just to avoid dealing with their silly REST API requirements.
Typical RAG implementations I've seen take the user query and run it directly against the full-text search and embedding indexes. This produces sub-par results because the raw query embedding doesn't fully capture what the user is actually looking for.
A better solution is to send the user query to the LLM, and let it construct and run queries against the index via tool calling. Nothing too ground-breaking tbh, pretty much every AI search agent does this now. But it produces much better results.
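As a rough sketch, the loop looks something like this (the LLM and the search tool are both stubbed out here; every name is illustrative, not Omni's actual code):

```python
# Sketch of LLM-driven search via tool calling: the model decides which
# queries to run, the app executes them and feeds results back. The LLM
# itself is stubbed out; everything below is illustrative.

def search_index(query):
    # Stand-in for the real hybrid BM25 + vector search
    corpus = {"q3 roadmap": ["doc-17", "doc-42"], "oncall runbook": ["doc-9"]}
    return corpus.get(query, [])

TOOLS = {"search_index": search_index}

def run_tool_call(call):
    """Dispatch a tool call emitted by the LLM, e.g. {'name': ..., 'args': ...}."""
    return TOOLS[call["name"]](**call["args"])

# Instead of embedding the raw user question, the LLM reformulates it
# into a sharper query, possibly issuing several calls in a loop:
result = run_tool_call({"name": "search_index",
                        "args": {"query": "q3 roadmap"}})
```

The win comes from the loop: the model can inspect the first batch of results, reformulate, and search again, which a single embedding lookup can't do.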