I think devstral-latest should be it, no? So I write to support and get an answer 12 hours later that says oh no, devstral 2 is definitely called devstral 2, and then a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it is referring to don't exist and never did.
devstral-2512, devstral-latest and devstral-medium-latest are all devstral 2: https://docs.mistral.ai/models/devstral-2-25-12
labs-devstral-small-2512 and devstral-small-latest are devstral small 2
devstral-medium-2507 is devstral 1.0
and devstral-small-2507 is devstral small 1.1
This is really why Mistral has any support.
The models are bottom of the barrel, but it's the best Europe has...
Although you could use Chinese models on European servers.
Or it could just be a Google-like problem, where one part of a big company doesn't talk to the other.
How? The largest providers that are trying to win devs are locked in a competition to get the devs to continue using the models for free!
The best way to win B2B contracts is to solve the problems that plague business, not those that plague devs. The devs are fickle, have no stickiness and will jump providers to the next free provider, to self-hosted, etc.
Selling to business using Mistral's approach is, I feel, just a good business plan.
"Giving away some credits for free, then making a loss on subscribers" is an absolutely terrible business plan.
I feel we are way less protectionist than most other economic regions, including the USA, which is very protectionist but always claims otherwise.
It's not as if B2B sales are more technical-merit-based or individual-contributor-led elsewhere.
It's always the same: depending on the field, individual contributors can have some flexibility in picking tools (so a developer in a mid-sized company would be able to pick whatever, an accountant would probably be more constrained, and a developer at a big bank would have no choice at all). But for strategic software choices that impact the whole company, where standardisation makes sense or is even mandatory to get actual value, you need to sell to high-level decision makers, not individual contributors. A CTO or a VP of X can decide to buy and mandate the implementation of something as impactful, workflow-changing and potentially time- and money-saving as a company-wide AI platform. A dev can't.
Not everyone is obsessed with code generation. There is a whole world out there.
The problem they have is that this is not a moat - their approach is easily reproducible.
If they can pull ahead in having the most number of pre-trained models (one for this ERP, one for that CRM, etc) and then being able to close sales to companies using these products and sell them on post-trained (give us your specific ERP customisations and we'll give you access to a model that is tailored to your business), then THAT is a moat.
But they need to do this without fanfare. Just close sales, and keep closing, basically. After all, even if other AI providers copy the process, the moat would already have been established for Mistral.
My 2ct: currently the moat may be that they are not US-American, which is not reproducible by any of the US alternatives.
I hope you are right (I am in the process of finalising a product and one of the top-5 selling points contains "outside the jurisdiction of the US"), but in my experience, companies only pay lip service to ethics unless it hits their bottom line.
Sure, Mistral AI is certainly not the market leader and probably never will be but we're not talking about being a market leader but about having a moat.
I instantly believe you when you tell me that many companies do not care. On the other hand there are companies that do. At least partially: ASML, Stellantis, AXA, BNP Paribas, the French ministry of defense, Helsing, SNCF, ... are all Mistral AI customers.
Hang on, where are you getting the numbers from? I looked and I couldn't find any numbers on enterprises who opened their wallets for custom-trained models.
I looked, and because I believed that it might be a good business opportunity to explore, I did spend a bit of time trying to find numbers. I came away with the feeling that the winner in the AI space is going to be whoever successfully whitelabels their offering.
Right now that is Mistral, I think.
How do you measure "usage" in an enterprise/commercial context where no data on usage is available to you? I don't expect Mistral AI to make its money on OpenRouter.
If you are in Iran, you don't want to give your data to your government.
If you are in France, you don't want to give your data to your government.
etc
If you are in France and you host your e-mails in a datacenter in Hong Kong, well, good luck to the authorities getting at it.
If you host it in "secure France", on paper you will have more privacy and laws behind you, but in reality you are jumping into the mouth of the shark.
This is why governments keep promoting it: "yes yes, host here, don't worry, we will protect you."
"We want your data on X, here's a warrant."
"No."
"You are now under arrest for contempt of court."
People have some oddly silly views on what government can and can't do to people living in their territories.
And companies really really don't care if the government has their data.
> host your e-mails in a datacenter in Hong-Kong
Now China has it, gives it a competitor in China and your market share drops like a stone. Congrats! Great choice!
The trick is to host your data in a country with a strong rule of law, and avoid illegal / geopolitical lines. If you're an American company hosting stuff in Russia, you can bet the GRU/SVR would be very happy to abuse it. If you're running a torrent site in Ukraine, you can bet the US would be very happy to claim extraterritorial magic jurisdiction and get you extradited from Poland.
As a French company, you're already beholden to French law and French legal decisions. "Data is hosted in Hong Kong" doesn't matter in the slightest, it only exposes you to more risk.
I have not seen that, actually. I still see most companies who want to jump into AI for the business sort of try RAG, but more often they just buy Chat accounts for their users.
The only place that harnesses appear to be used is in software development, but most companies aren't doing that either.
Isn't the entire deal with LLMs that they are trained as megaliths? How can bespoke modelling overcome the treasure trove of knowledge that megaliths can generically bring in, even in bespoke scenarios?
When generating images most services will have a small agent that rewrites your request and hands it off to the generative image model.
So from the treasure trove point of view, optimized agents have their place. From companies building pipelines, they also have their place.
Right, but this was done to value-optimize the product, i.e. try to always give you the shittiest (cheapest) model you can bear, because otherwise people would always choose the smartest (most expensive) model for any query.
Taking away the model choice from the user introduces a lot of ways to cut down costs, but one thing it does not do is make the product give users better/more reliable answers.
Think of it as a base model (the megalith) which then has the weights adjusted towards a specific use-case (SAP, for example).
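The "adjusting weights toward a use case" idea can be sketched with a toy model. This is purely illustrative (nothing to do with Mistral's actual pipeline, and real fine-tuning operates on billions of transformer weights): a "base model" is a set of weights fitted to broad data, and fine-tuning simply continues gradient descent from those weights on a narrow domain dataset.

```python
# Toy illustration of fine-tuning: same weights, same training loop,
# but the second phase starts FROM the base weights on domain data.

def sgd(weights, data, lr=0.1, steps=200):
    """One-feature linear regression trained by plain per-sample SGD."""
    w, b = weights
    for _ in range(steps):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# "Pretraining" on broad, slightly noisy data: roughly y = 2x
broad = [(0.0, 0.1), (1.0, 2.0), (2.0, 3.9), (3.0, 6.1)]
base = sgd((0.0, 0.0), broad)

# "Fine-tuning": continue from the base weights on domain data (y = 2x + 1)
domain = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
tuned = sgd(base, domain, steps=100)

print("base:", base)    # slope near 2, intercept near 0
print("tuned:", tuned)  # slope kept, intercept shifted toward the domain
```

The point of the toy: fine-tuning doesn't rebuild the model, it nudges the already-learned weights toward the target distribution, which is why it's so much cheaper than pretraining.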
It's just not good. It's the bottom floor for LLMs.
My University also migrated to OpenExchange
And they should. Because the US is not behaving rationally at all.
https://nltimes.nl/2026/02/10/rabobank-ing-abn-amro-seek-eur...
https://www.theregister.com/2025/11/13/gartner_cio_cloud_sov...
https://www.independent.co.uk/news/world/europe/europe-zoom-...
https://www.theglobeandmail.com/business/commentary/article-...
https://sherwood.news/tech/europe-wants-to-break-up-with-us-...
Well I have even more personal experience that contradicts yours, and this isn't true at all. Everyone uses Claude / Gemini / OpenAI. Mistral isn't even on the table.
Having an option at the back of your mind is all it takes right now, until push comes to shove of course.
Proof: Most big EU companies use Claude or Gemini or OpenAI, not Mistral. That choice was made recently.
Things have changed in the loud echo chambers of the internet, maybe (but not really, since people were saying that EU data sovereignty was happening any time now since 2016).
Of course, it will be slow and painful and Europeans will need to use their own services for them to grow and mature.
Is a statement with no supporting facts considered "proof"? Just the public list of Mistral customers (https://mistral.ai/customers) is proof alone that quite a few big EU companies are _not_ in fact using OpenAI or Claude or Gemini at the strategic level.
Contrast with Anthropic's Europe-based customers, the majority of which are small companies (the only big one I can identify from a skim is L'Oreal): https://claude.com/customers?f80ce999_sort_date=desc&f80ce99...
Or OpenAI's customers, of which the only big European ones I can spot are Scania and Philips: https://openai.com/stories/
Note: I'm talking about strategic enterprise AI deployments for the company or at least a division, not individual developers being allowed to use Claude Code etc. The moat and the money will be in the former, not the latter.
Their API is consistently among the most used on OpenRouter. While I can’t vouch for it myself, I think this is a decent proxy for capability. You can definitely see glimmers of greatness in their chat interface, it just feels like the system prompts are focused on something that doesn’t interest me.
Grok is nice for asking morally gray questions. ChatGPT will lie in these cases.
> Post-training methods allow teams to refine model behavior for specific tasks and environments.
How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?
There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
Post-training means everything else: SFT, DPO, RL, etc. Anything that involves things like prompt/response pairs, reward models, or benefits from human feedback of any kind.
Pre-training: refining the weights in an existing model using more training data.
Post-training: Adding some training data to the prompt (RAG, basically).
I have been finding Voxtral useful though.
https://generativehistory.substack.com/p/gemini-3-solves-han...
Which one's the best?
next, it sounds like it's going to be .eu
but what about ai.eu
oh, .. why?
I like a lot what they are doing and I'll be watching them a lot more closely. I'd love to work for them btw!
Even with the coding use-case you would still likely want to build a similarity search engine because searching through plain symbols isn't enough to build a contextual understanding of higher-level concepts in the code.
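A minimal sketch of why similarity search beats plain symbol search: the query below shares no identifier with the retry snippet and never mentions the function name in the auth snippet, yet the conceptually-closer snippet still ranks first. A real system would use a learned embedding model; here the "embeddings" are just token-count vectors, purely for illustration.

```python
# Toy similarity search over code snippets using bag-of-words "embeddings"
# and cosine similarity. Illustrative only; real systems use learned embeddings.
import math
import re
from collections import Counter

def embed(text):
    """Fake embedding: counts of identifier-like tokens."""
    return Counter(re.findall(r"[a-zA-Z_]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = {
    "auth": "def check_password(user, password): return hash(password) == user.password_hash",
    "retry": "def retry_with_backoff(fn, attempts): ...",
}
query = "where do we verify a user's password hash"
ranked = sorted(corpus, key=lambda k: cosine(embed(query), embed(corpus[k])), reverse=True)
print(ranked[0])  # "auth"
```

Swapping the token-count vectors for real code embeddings keeps the same retrieval structure while capturing the higher-level concepts the comment is talking about.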
But seriously, RAG/retrieval is thriving. It'll be part of the mix alongside long context, reranking, and tool-based context assembly for the foreseeable future.
> Of course you would have to set a temperature of 0 to prevent abuse from the operator, and also assume that an operator has access to the pre-prompt
Doesn't the fact that LLMs are still non-deterministic at a temperature of 0 render all of this moot? And why was I compelled to read a random blog post on the unsolved issue of validating natural language? It's SQL injection, except without a predetermined syntax to validate against, and thus an NP problem we've yet to solve.
But the OP's blog is more about ZK than about NFTs, and crypto is the only place funding work on ZK. It's kind of a devil's bargain, but I've taken crypto money to work on privacy preserving tech before and would again.
So it'd be alive in the making decisions sense, not in a "the technology is thriving" sense.
It's certainly different data, but one could argue that real humans have been trained on 3.5 billion years of evolution data.
I'm probably really out of date at this point, but my impression was that fine-tuning never really worked that well for knowledge acquisition, and that some variety of RAG is the way to go here. Fine-tuning can affect the "voice", but not really the knowledge.
I recall that even at Google - with its own search engine and so on - the best way to understand anything was to read the code or to reach out to those who wrote it. I don't know how it works in places that work with the "real world" like ASML.
Often the issue is not even about documentation - it's just that it's extremely hard to include all the nuances in text and still have it be readable (code-documentation comes to mind).
Interestingly, I strongly feel that this is also where LLMs (and some of our more textually-obsessed academics) fail.
It's feasible for small models, but I thought small models were not reliable for factual information?
Foundational:
- Pretraining
- Mid/post-training (SFT)
- RLHF or alignment post-training (RL)
And sometimes...
- Some more customer-specific fine-tuning.
Note that any supervised fine-tuning following the Pretraining stage is just swapping the dataset and maybe tweaking some of the optimiser settings. Presumably they're talking about this kind of pre-RL fine-tuning instead of post-RL fine-tuning, and not about swapping out the Pretraining stage entirely.
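One concrete way to see "SFT is just swapping the dataset": supervised pairs are flattened into the same next-token sequences pretraining uses, with the loss masked out on the prompt tokens. The helper below is a hedged sketch of that data formatting (the -100 ignore-label is a common convention, e.g. in Hugging Face Transformers, not anything specific to Mistral).

```python
# Sketch: turning a (prompt, response) pair into a pretraining-style
# token sequence. The training loop and loss stay the same; only the
# labels are masked so the model is graded on the response alone.
def to_sft_example(prompt_ids, response_ids):
    tokens = list(prompt_ids) + list(response_ids)
    # -100 marks positions excluded from the loss (conventional ignore index)
    labels = [-100] * len(prompt_ids) + list(response_ids)
    return tokens, labels

tokens, labels = to_sft_example([5, 9, 2], [7, 1])
print(tokens)  # [5, 9, 2, 7, 1]
print(labels)  # [-100, -100, -100, 7, 1]
```

Since the objective is unchanged, moving from pretraining to SFT really is mostly a dataset (and learning-rate) swap, which is the point the comment above is making.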
That said I think we will see more efforts also on the business side to have models that can help you build a knowledge base in some kind of standardized way that the model is trained to read. Or synthesize some sort on instructions how to navigate your knowledge base.
Currently e.g. Copilot tries to navigate a hot mess of a MS knowledge graph that is very different for each company. And due to its amnesia it has to repeat the discovery in every session. No wonder that does not work. We have to either standardize or store somewhere (model, instructions) how to find information efficiently.
Disappointing.
I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)
Are you suggesting that it's an aberration that from ~2019 to ~2026 the AI field has been working on general intelligence (I assume this is what you mean by "achieving benevolent knowledge")?
Personally I think it's remarkable how much a simple transformer model can do when scaled up in size. LLMs are an incredible feat of generalization. I don't see why the trajectory should change back towards specialization now.
Don't get me wrong, general intelligence will always be important and should be a part of specialist models to a degree for understanding, but it doesn't make sense to use an 800B+ parameter model to help write an email or do research on company trends. Hell, look at what China has been able to do. Qwen 3.5 9B exceeds Claude 3.5 Haiku and nears Sonnet 3.5 levels. The 27B variation of Qwen 3.5 is superior to both in many ways and even rivals newer models. There is obviously an inherent lag behind, but we will gradually see a shift as these models become more capable.
Right now we are chasing 1-2% improvements at the cost of billions. Local models are already absurdly capable (more and more by the day - same with the cloud, of course) and smarter than most people in specific areas. Can we honestly say most jobs require a PhD-level understanding to perform? We're chasing something that matters less and less from a day-to-day perspective. AGI is outstanding, but not practical (at least today). I think we'll get there anyway at our current trajectory (though it's dangerous), but I suspect things will shift.
Would love to take it for a spin, if that is even possible.
... for humans.
...learn a thing or two from NVIDIA or gtfo
Is it possible to retrain daily or hourly as info changes?