You search for "small code model with tool-calling." HF gives you Qwen2.5-Coder-32B and a 480B model. Sorted by popularity. It can't express "small" as a direction — it can only filter by tags and sort by likes.
ModelAtlas treats model space as a coordinate system. Eight signed dimensions (architecture, capability, efficiency, domain, etc.) plus 166 semantic anchors. You query by direction: "I want something small (+efficiency), code-focused (+capability), with tool-calling." The scoring is multiplicative — a model that nails efficiency but misses capability gets zero, not fifty percent.
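To make the multiplicative part concrete, here's a minimal sketch of what directional scoring could look like. The dimension names and the [-1, 1] scaling are my illustrative assumptions, not ModelAtlas's actual internals:

```python
def score(model, query):
    """Multiplicative directional scoring (a sketch).

    model: dict mapping dimension name -> signed value in [-1, 1]
    query: dict mapping dimension name -> desired direction (+1 or -1)

    Each queried dimension contributes a factor clamped to [0, 1];
    because the factors multiply, failing any one constraint zeroes
    the whole score instead of averaging out.
    """
    s = 1.0
    for dim, direction in query.items():
        # Agreement with the desired direction, clamped at zero.
        s *= max(0.0, model.get(dim, 0.0) * direction)
    return s

# A model that nails efficiency but points the wrong way on capability
# scores zero, not fifty percent:
strong_but_wrong = {"efficiency": 0.9, "capability": -0.2}
balanced = {"efficiency": 0.9, "capability": 0.5}
query = {"efficiency": +1, "capability": +1}

score(strong_but_wrong, query)  # 0.0
score(balanced, query)          # 0.45

# A NOT constraint is just a negative direction: querying -code zeroes
# out anything code-heavy.
score({"chat": 0.8, "code": 0.7}, {"chat": +1, "code": -1})  # 0.0
```

The point of the product over a weighted sum: a sum lets one strong dimension paper over a hard miss on another, which is exactly the failure mode directional queries are trying to avoid.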
The README has a three-level comparison against HuggingFace search (same queries, both systems, real results):
- Level 1: Common queries — both systems return the right models. This is the baseline.
- Level 2: Directional queries ("small," "fast," "medical classifier") — HF starts returning noise. "Fast embedding model" returns nothing because "fast" isn't a tag.
- Level 3: Multi-constraint queries ("multilingual chat, NOT code/math") — HF can't express these at all. ModelAtlas finds a 100M-parameter TTS model, a genomics classifier, a 0.8B model distilled from Claude Opus. All real models on HuggingFace, all invisible to keyword search.
It's an MCP tool, so it works inside Claude Code, Cursor, VS Code, or any other MCP client. The idea is to give a model that calls the tool a kind of subconscious vibe about what's out there on HuggingFace. Instead of guessing from stale training data or making you browse HF yourself, the LLM gets structured awareness of ~30K models in one tool call (~500 tokens, <100ms). It just knows what model fits your use case.
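Wiring it into a client is standard MCP plumbing. The config below is hypothetical — the `modelatlas-mcp` command name is purely illustrative, so check the repo for the real install and launch instructions:

```json
{
  "mcpServers": {
    "modelatlas": {
      "command": "modelatlas-mcp",
      "args": []
    }
  }
}
```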
The network ships as a pre-built SQLite file via GitHub Releases. It covers the community-validated portion of HuggingFace (models with 10+ likes) and is mostly up to date — this is a fun personal project so it's not always updated to the minute, but it's MIT licensed so anyone can run the ingestion pipeline themselves or fork it however they want.
Repo: https://github.com/rohanvinaik/ModelAtlas
Happy to answer questions about the scoring math, the extraction pipeline, or anything else!