Liquid AI reveals 8B-A1B MoE trained on 38T
152 points | 9 hours ago | 16 comments | liquid.ai
onlyrealcuzzo
3 hours ago
[-]
I just tested this on a bug fixing benchmark I'm working on.

It did not perform as well as I expected. Qwen2.5-Coder-3B (2 years old) outperformed it by a wide margin, fixing ~50% of bugs whereas this model fixed only ~12%.

Granted, it's not a coder-specific model, but given its benchmark performance relative to Gemma models, and that it's two years newer, and that it's an MoE with 8B total params, I expected it to be more competitive.

reply
walrus01
19 minutes ago
[-]
I personally find any model smaller than something like Qwen 3.6 35B-A3B (8-bit quantization, about 49GB memory usage when loaded into llama.cpp) to be too "stupid" for reliable use.

I would much rather not run the model on my local laptop hardware and offload that to some system sitting under my desk in my home office, accessible via VPN, than take the risk of using an unreliable and flaky tool for the convenience of having it on the same hardware on my lap.

I pay very little attention to 8 billion or whatever (or even much smaller) models these days and I don't feel like I'm missing much.

reply
debazel
1 hour ago
[-]
I tried it with OpenCode and it is borderline incapable of using tool calls, so that might be why it is doing so badly on your test.
reply
peder
1 hour ago
[-]
I just did the same. Absolutely awful. I assume OpenCode's heavy context is a problem, and it's probably better to use Liquid's own OpenCode alternative for this.
reply
XCSme
1 hour ago
[-]
I will test it when it's accessible via OpenRouter, but the previous LFM2 model (lfm-2-24b-a2b) didn't do well on my tests: it got only 1/20 questions/tasks right, way below Gemma 31B or Qwen 35b-a3b (those get like 10/20 right).
reply
HanClinto
3 hours ago
[-]
Some of the coding-specific fine-tunes were really impressive boosts. Qwen2.5-3B-Instruct is also available [0] -- if it's not too much to ask, I'd be curious how more general models stack up in your benchmark?

[0] - https://huggingface.co/Qwen/Qwen2.5-3B-Instruct

reply
mlmonkey
6 hours ago
[-]
Question: I have a dirty car and the car wash is just 50 meters away. Should I walk or drive to the carwash?

Answer: . . . . So, unless you have a compelling reason not to, walk to the car wash.

reply
nine_k
5 hours ago
[-]
You did not express any desire to wash the car. After nudging, it's funnier:

> Is my desire to make my car clean related to a car wash in any way? To driving to it?

> Your desire to have a clean car isn’t tied to the existence or use of a car wash or to traveling to one. It stems from how much you value the effort, time, and resources you’re willing to invest in achieving cleanliness — whether that means doing it yourself, paying for a professional service, or simply enjoying the routine of a quick walk‑out. Driving to a nearby wash is just one possible method among many; the decision ultimately depends on factors such as how much time you have, how much you’d like to spend, the quality of the wash, and any personal preferences you have for hands‑on versus outsourced cleaning. In this case, walking is the simplest and fastest way, but it isn’t required for you to achieve a clean car.

Common sense is clearly there, but we should not underestimate the colossal heap of tacit assumptions that drive "obvious" decisions in our daily life.

reply
enaaem
1 hour ago
[-]
The AI would be an awful husband
reply
andai
4 hours ago
[-]
Reminder that the human baseline for the car wash question is also pretty bad.
reply
MPSimmons
2 hours ago
[-]
Oh, I'm interested - do you have any docs with human responses to that?
reply
cwnyth
5 hours ago
[-]
I'm surprised these models haven't picked this up yet in the training data. Both Claude and ChatGPT missed that one when I posed the question to them last year.
reply
treis
3 hours ago
[-]
ChatGPT still says walk but adds:

>The main reasons to drive such a short distance would be if you're bringing the car specifically to be washed, carrying something heavy, or the weather or walking conditions make it impractical.

>If your goal is to get your car washed, you'll need the car there—so driving makes sense. If you're just going to talk to someone at the car wash or check it out, walking is probably faster.

reply
tingletech
5 hours ago
[-]
Why would a model know that one washes cars at a car wash? We don't clean our bodies at the body wash or clean the kitchen at the kitchen wash.
reply
shepardrtc
5 hours ago
[-]
There's meaning in the term "car wash" that it understands. But I don't suspect anyone has taught it that for 99.9% of people, going to a car wash ONLY means that you're going to wash your car and that it should make that implicit assumption.

What if you're the car wash owner? Or a maintenance technician? Pretty easy to just walk over there if you're just 50 meters away.

reply
jjtheblunt
5 hours ago
[-]
to your point, when my Aussie friends first mentioned a "car park" to my north american born self, i wondered _momentarily_ what that was, then realized it's sort of a fun name for what i would call a parking lot.
reply
nl
1 hour ago
[-]
I've never thought of it as a fun term before.

We use "park" as "I will park the car" not park as in "amusement park"

reply
jjtheblunt
29 minutes ago
[-]
yeah but syntactically "car park" gets used like a noun phrase, not verb phrase, which was (to your point really) what had me think "huh?" momentarily.
reply
SequoiaHope
4 hours ago
[-]
Every model knows what a car wash is.
reply
purerandomness
2 hours ago
[-]
If it doesn't, what's the point of using it? Trusting it with your workflows, your code?
reply
sroussey
4 hours ago
[-]
I walk to the gas station more often than I drive there.
reply
deklesen
4 hours ago
[-]
Yeah, but you are not washing yourself there, I suppose?

The whole twist here is that to wash your car, you need your car, so you cannot go by foot.

reply
strangegecko
1 hour ago
[-]
His analogy is that a gas station is for putting gas into your car. But he walks there often, so the assumption that you need your car if you go to the gas station isn't inevitable.

You could conceivably walk to a car wash that has similar sundries as a gas station.

reply
sroussey
1 hour ago
[-]
Indeed, the little market there is why I walk there. There is also one at the car wash another 2 blocks away. I’d walk there for a 7up if it were closer!
reply
dominotw
5 hours ago
[-]
doesn't seem unreasonable.
reply
halJordan
5 hours ago
[-]
These faux questions always have a valid interpretation that the asker doesn't admit (for some reason). The model is then castigated for not making an opinionated choice.
reply
kennywinker
4 hours ago
[-]
That’s not what’s happening.

The question is revealing that the model has a model of language but not of reality. It knows what words go together, but not real-world concepts.

reply
SubiculumCode
5 hours ago
[-]
Anybody use their localcowork [1] before? That is where the demo lives. Or not?

[1] https://github.com/Liquid4All/cookbook/tree/main/examples/lo...

reply
adityashankar
6 hours ago
[-]
This is super interesting, I'm particularly excited for this one as it may allow teams to scale this architecture for VLAs (vision language action models), and having sparser models means more real-time actions on a locally hosted model

demo link for anyone that wants to try this out https://playground.liquid.ai/chat?model=cmppnbgse000004l4bc8...

reply
Ifkaluva
4 hours ago
[-]
Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model
reply
andai
4 hours ago
[-]
What's the downside? Don't they stop when they hit diminishing returns?
reply
Ifkaluva
1 hour ago
[-]
You’d think so, but I haven’t seen it explicitly discussed in their papers, and nobody else that I know of trains on that many tokens
reply
frankdlc222
2 hours ago
[-]
Look at the accuracy numbers and these things clearly don't know much yet, and I'm not about to hand one my hardest work. But you can see where it's going. As quantization and the MoE stuff keep getting better, "good enough to just run on my own machine" keeps eating into more of what I'm currently paying a frontier lab for. Once a local model can handle like 80% of what I need, the math stops making sense for the subscription.
reply
chabes
6 hours ago
[-]
The small models are getting really impressive.

I recently realized that Qwen3.5:4B is way more capable than I thought a model that size could be.

Combine that with the work Liquid puts into RL and fine tuning, and you get models that perform extremely well on minimal hardware.

Combine that with your own fine tuning, and you get a specialized tool that is fast, private, and doesn’t require internet connection.

reply
r0b05
6 hours ago
[-]
What did you use qwen3.5 4b for?
reply
steve_adams_86
3 hours ago
[-]
I use it for triaging my messages and emails and reminding me how all of it ties together. It uses Obsidian to know where to put stuff and how to connect information. It isn't perfect. It's very slow (using a 32GB M2 Max) but fast enough for my needs.

A good example of how it's helpful is that it will make certain things relatively frictionless. Like, I need to pay property taxes. I hate this stuff. I got the email reminder from my municipality and it made an entry in my TODOs which points to a page with instructions to pay the taxes, including my folio and access numbers for when I log in. That was taken from the email and a document which contains past property tax information. I have it all there, but it compiles relevant data into dedicated TODO pages.

I'm so bad at doing all of this myself. I really don't enjoy it. Send me to buy a carrot at the store and I'll happily walk 30 minutes there and back to do it. It isn't the effort so to speak; it's how unrewarding, inefficient, and bureaucratic it all is. I'm allergic to it. Why isn't it baked into my income taxes? Why are we still doing this?

Sometimes it does a really bad job of making TODOs. Like my wife messaged me about what our dinner plan was, so Qwen went ahead and made a plan for chicken meatball soup based on messages from a week earlier. It totally fabricated the recipe. Yet, I don't know, it was still helpful to be reminded that I'm in charge of dinner.

It's probably best at scaffolding responses to emails I don't want to send. I will write it, but I appreciate basic information being fleshed out so I can write it without jumping around looking for files or numbers or whatever constantly.

I use it with a custom harness. It could be a lot better. Everything about it could be better. The model is remarkably good for its size and price, though.

Letting Sonnet 4.6 do it instead always yields much better results, much faster, but it's kind of like using a new phone vs a super old one. They can both get you there. The sound quality and camera might be worse, it doesn't look as fancy, but the new one is $1200 and the old one is free on marketplace if you're handy with a screwdriver and a fresh battery. Sounds great to me.

Worth noting: this was all vibe-coded using Opus 4.6 and 4.7. It's the only project I've built that is strictly vibe-coded. It's simultaneously exciting and disgusting. I'm not sure if I'll ever 'software engineer' it, or I'll just let it be slop. It works.

reply
cjtrowbridge
4 hours ago
[-]
it's really good at agentic tasks
reply
sroussey
4 hours ago
[-]
I find it works well in the browser.
reply
irthomasthomas
4 hours ago
[-]
Woah, chinchilla scaling is 20 x active_params. I think mistral was 2 x Chinchilla. This is 1800 x
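
For anyone who wants to redo the arithmetic, here is a rough sketch. It assumes the 20-tokens-per-parameter rule of thumb quoted above, takes "38T" from the headline, and reads "8B-A1B" as exactly 8B total and 1B active parameters (both are assumptions; the real counts may differ). Whether the rule should be applied to active or total parameters for an MoE is also a judgment call, so both are printed. Under these assumptions the active-parameter figure comes out around 1,900x, in the same ballpark as the number above.

```python
# Back-of-the-envelope check of the "multiple of Chinchilla" comparison.
# All constants are assumptions taken from the thread, not official figures.
CHINCHILLA_TOKENS_PER_PARAM = 20  # rule of thumb quoted in the comment

def chinchilla_multiple(train_tokens: float, params: float) -> float:
    """Return training tokens divided by the 20-tokens-per-parameter budget."""
    return train_tokens / (CHINCHILLA_TOKENS_PER_PARAM * params)

TRAIN_TOKENS = 38e12   # "38T" from the headline
ACTIVE_PARAMS = 1e9    # assumption: "A1B" read as exactly 1B active params
TOTAL_PARAMS = 8e9     # assumption: "8B" read as exactly 8B total params

if __name__ == "__main__":
    active = chinchilla_multiple(TRAIN_TOKENS, ACTIVE_PARAMS)
    total = chinchilla_multiple(TRAIN_TOKENS, TOTAL_PARAMS)
    print(f"relative to active params: {active:,.0f}x")
    print(f"relative to total params:  {total:,.1f}x")
```

Swapping in a different active-parameter count only rescales the first number, so the conclusion (far beyond the 20x rule either way) does not hinge on the exact value.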
reply
elorant
6 hours ago
[-]
Wow, this is fucking phenomenal. I fed it a long transcript asking it to create a summary and it executed it extremely well. For an 8B model this is quite impressive.
reply
SubiculumCode
5 hours ago
[-]
I gave it a 2000-line Python program that does some fairly sophisticated geodesic calculations on surfaces, and asked it to review the code. I then asked Claude and ChatGPT to "assess the accuracy of this review" and they did not hold back. That said, it's a very small model, and very fast.
reply
kilroy123
4 hours ago
[-]
Hmm, I asked it who made it, and it says Google?
reply
bee_rider
6 hours ago
[-]
They seem… much better than all the models they compared against? What’s the catch?
reply
FuckButtons
5 hours ago
[-]
They only showed the benchmarks where they outperformed?
reply
andai
4 hours ago
[-]
It's twice the size?
reply
ramshanker
6 hours ago
[-]
Guess we can run this even on CPU!
reply
HenryMulligan
6 hours ago
[-]
Why does this not have (day-one) support for Ollama? The previous model is on there? Is it related to the ongoing refactor work or are people abandoning Ollama for other LLM engines?
reply
TobTobXX
6 hours ago
[-]
Ollama is just llama.cpp but with their own interface on top. Liquid does support llama.cpp, but Ollama is slow in updating its llama.cpp dependency.
reply
garo-pro
6 hours ago
[-]
It does, ollama pull maternion/lfm2.5
reply
zmmmmm
4 hours ago
[-]
No vision support?
reply
jauntywundrkind
3 hours ago
[-]
I really love how fast it is! The comparisons in their press release on Strix Halo and M5 Max are impressive. It going twice as fast on GPU benchmarks even more so!
reply
gmuslera
6 hours ago
[-]
Homeopathic AI
reply