llm install llm-mistral
llm mistral refresh
llm -m mistral/devstral-2512 "Generate an SVG of a pelican riding a bicycle"
https://tools.simonwillison.net/svg-render#%3Csvg%20xmlns%3D...Pretty good for a 123B model!
(That said I'm not 100% certain I guessed the correct model ID, I asked Mistral here: https://x.com/simonw/status/1998435424847675429)
So far though, the models good at bike pelican are also good at kayak bumblebee, or whatever other strange combo you can come up with.
So if they are trying to benchmaxx by making SVG generation stronger, that's not really a miss, is it?
I may be stupid, but _why_ is this prompt used as a benchmark? I mean, pelicans _can't_ ride a bicycle, so why is it important for "AI" to show that they can (at least visually)?
The "wine glass problem"[0] - and probably others - seems to me to be a lot more relevant...?
[0] https://medium.com/@joe.richardson.iii/the-curious-case-of-t...
Honestly though, the benchmark was originally meant to be a stupid joke.
I only started taking it slightly more seriously about six months ago, when I noticed that the quality of the pelican drawings really did correspond quite closely to how generally good the underlying models were.
If a model draws a really good picture of a pelican riding a bicycle there's a solid chance it will be great at all sorts of other things. I wish I could explain why that was!
If you start here and scroll through and look at the progression of pelican on bicycle images it's honestly spooky how well they match the vibes of the models they represent: https://simonwillison.net/2025/Jun/6/six-months-in-llms/#ai-...
So ever since then I've continue to get models to draw pelicans. I certainly wouldn't suggest anyone take serious decisions on model usage based on my stupid benchmark, but it's a fun first-day initial impression thing and it appears to be a useful signal for which models are worth diving into in more detail.
Why?
If I hired a worker that was really good at drawing pelicans riding a bike, it wouldn't tell me anything about his/her other qualities?!
It's not a human intelligence - it's a totally different thing, so why would the same test that you use to evaluate human abilities apply here?
Also more directly the "all sorts of other things" we want llms to be good at often involve writing code/spatial reasoning/world understanding which creating an svg of a pelican riding a bicycle very very directly evaluates so it's not even that surprising?
Yes it's like the wine glass thing.
Also it's kind of got depth. Does it draw the pelican and the bicycle? Can the penguin reach the peddles? How?
I can imagine a really good AI finding a funny or creative or realistic way for the penguin to reach the peddles.
An slightly worse AI will do an OK job, maybe just making the bike small or the legs too long.
An OK AI will draw a penguin on top of a bicycle and just call it a day.
It's not as binary as the wine glass example.
> Yes it's like the wine glass thing.
No, it's not!
That's part of my point; the wine glass scenario is a _realistic_ scenario. The pelican riding a bike is not. It's a _huge_ difference. Why should we measure intelligence (...) in regards to something that is realistic and something that is unrealistic?
I just don't get it.
It is unrealistic because if you go to a restaurant, you don't get served a glass like that. It is frowned upon (alcohol is a drug, after all) and impractical (wine stains are annoying) to fill a glass of wine as such.
A pelican riding a bike, on the other hand, is realistic in a scenario because of TV for children. Example from 1950's animation/comic involving a pelican [1].
[1] https://en.wikipedia.org/wiki/The_Adventures_of_Paddy_the_Pe...
I may have missed something but where are we saying the website should be recreated with 1996 tech or specs? The model is free to use any modern CSS, there is no technical limitations. So yes I genuinely think it is a good generalization test, because it is indeed not in the training set, and yet it is easy an easy task for a human developer.
Browsers are able to parse a webpage from 1996. I don't know what the argument in the linked comment is about, but in this one, we discuss the relevance of creating a 1996 page vs a pelican on a a bicycle in SVG.
Here is Gemini when asked how to build a webpage from 1996. Seems pretty correct. In general I dislike grand statements that are difficult to back up. In your case, if models have only a cursory knowledge of something (what does this mean in the context of LLMs anyway), what exactly they were trained on etc.
The shortened Gemini answer, the detailed version you can ask for yourself:
Layout via Tables: Without modern CSS, layouts were created using complex, nested HTML tables and invisible "spacer GIFs" to control white space.
Framesets: Windows were often split into independent sections (like a static sidebar and a scrolling content window) using Frames.
Inline Styling: Formatting was not centralized; fonts and colors were hard-coded individually on every element using the <font> tag.
Low-Bandwidth Design: Visuals relied on tiny tiled background images, animated GIFs, and the limited "Web Safe" color palette.
CGI & Java: Backend processing was handled by Perl/CGI scripts, while advanced interactivity used slow-loading Java Applets.
I'd be curious about that actually, feel like W3C specifications (I don't mean browser support of them) rarely deprecate and precisely try to keep the Web running.
Yes, SVG is code, but not in a sense of executable with verifiable inputs and outputs.
(Surely they won't release it like that, right..?)
That looks like the next flagship rather than the fast distillation, but thanks for sharing.
Google should be punishing these sites but presumably it's too narrow of a problem for them to care.
Or at least a profit model. I don't see either on that page but maybe I'm missing something
edit: Mea culpa. I missed the active vs dense difference.
Devstral 2 is 123B dense. Deepseek is 37B Active. It will be slower and more expensive to run inference on this than dsv3. Especially considering that dsv3.2 has some goodies that make inference at higher context be more effective than their previous gen.
It spent about half an hour, correctly identified what the program did, found two small bugs, fixed them, made some minor improvements, and added two new, small but nice features.
It introduced one new bug, but then fixed it on the first try when I pointed it out.
The changes it made to the code were minimal and localized; unlike some more "creative" models, it didn't randomly rewrite stuff it didn't have to.
It's too early to form a conclusion, but so far, it's looking quite competent.
I'm a bit saddened by the name of the CLI tool, which to me implies the intended usage. "Vibe-coding" is a fun exercise to realize where models go wrong, but for professional work where you need tight control over the quality, you can obviously not vibe your way to excellency, hard reviews are required, so not "vibe coding" which is all about unreviewed code and just going with whatever the LLM outputs.
But regardless of that, it seems like everyone and their mother is aiming to fuel the vibe coding frenzy. But where are the professional tools, meant to be used for people who don't want to do vibe-coding, but be heavily assisted by LLMs? Something that is meant to augment the human intellect, not replace it? All the agents seem to focus on off-handing work to vibe-coding agents, while what I want is something even tighter integrated with my tools so I can continue delivering high quality code I know and control. Where are those tools? None of the existing coding agents apparently aim for this...
This is exactly the CLI I'm referring to, whose name implies it's for playing around with "vibe-coding", instead of helping professional developers produce high quality code. It's the opposite of what I and many others are looking for.
A surprising amount of programming is building cardboard services or apps that only need to last six months to a year and then thrown away when temporary business needs change. Execs are constantly clamoring for semi-persistent dashboards and ETL visualized data that lasts just long enough to rein in the problem and move on to the next fire. Agentic coding is good enough for cardboard services that collapse when they get wet. I wouldn't build an industrial data lake service with it, but you can certainly build cardboard consumers of the data lake.
But there is nothing more permanent that a quickly hacked together prototype or personal productivity hack that works. There are so many Python (or Perl or Visual Basic) scripts or Excel spreadsheets - created by people who have never been "developers" - which solve in-the-trenches pain points and become indispensable in exactly the way _that_ xkcd shows.
This is what we're building at Brokk: https://brokk.ai/
Quick intro: https://blog.brokk.ai/introducing-lutz-mode/
Claude Code not good enough for ya?
Still, I do use Claude Code and Codex daily as there is nothing better out there currently. But they still feel tailored towards vibe-coding instead of professional development.
Err, doesn’t it have /review?
Imagine a GUI built around git branches + agents working in those branches + tooling to manage the orchestration and small review points, rather than "here's a chat and tool calling, glhf".
What matters is high quality specifications including test cases
Says the person who will find themselves unable to change the software even in the slightest way without having to large refactors across everything at the same time.
High quality code matters more than ever, would be my argument. The second you let the LLM sneak in some quick hack/patch instead of correctly solving the problem, is the second you invite it to continue doing that always.
I have a feeling this will only supercharge the long established industry practice of new devs or engineering leadership getting recruited and immediately criticising the entire existing tech stack, and pushing for (and often succeeding) a ground up rewrite in language/framework de jour. This is hilariously common in web work, particularly front end web work. I suspect there are industry sectors that're well protected from this, I doubt people writing firmware for fuel injection and engine management systems suffer too much from this, the Javascript/Nodejs/NPM scourge _probably_ hasn't hit the PowerPC or 68K embedded device programming workflow. Yet...
In my mind, it's somewhat orthogonal to code quality.
Waterfall has always been about "high quality specifications" written by people who never see any code, much less write it. Agile make specs and code quality somewhat related, but in at least some ways probably drives lower quality code in the pursuit of meeting sprint deadlines and producing testable artefacts at the expense of thoroughness/correctness/quality.
If you babysit every interaction, rather than reviewing a completed unit of work of some size, you're wasting your time second-guessing that the model won't "recover" from stupid mistakes. Sometimes that's right, but more often than not it corrects itself faster than you can.
And so it's far more effective to interact with it far more async, where the UI is more for figuring out what it did if something doesn't seem right, than for working live. I have Claude writing a game engine in another window right now, while writing this, and I have no interest in reviewing every little change, because I know the finished change will look nothing like the initial draft (it did just start the demo game right now, though, and it's getting there). So I review no smaller units of change than 30m-1h, often it will be hours, sometimes days, between each time I review the output, when working on something well specified.
The chat interface is optimal to me because you often are asking questions and seeking guidance or proposals as you are making actual code changes. On reason I do like it is that its default mode of operation is to make a commit for each change it makes. So it is extremely clear what the AI did vs what you did vs what is a hodge podge of both.
As others have mentioned, you can integrate with your IDE through the watch mode. It's somewhat crude but still useful way. But I find myself more often than not just running Aider in a terminal under the code editor window and chatting with it about what's in the window.
> The chat interface
Seems very much not, if it's still a chat interface :) Figuring out a chat UX is easy compared to something that was creating with letting LLM fill in some parts from the beginning. I guess I'm searching for something with a different paradigm than just "chat + $Something".
It's all very fluffy and theoretical of course.
"I want you to do feature X. Analyse the code for me and make suggestions how to implement this feature."
Then it will go off and work for a while and typically come back after a bit with some suggestions. Then iterate on those if needed and end with.
"Ok. Now take these decided upon ideas and create a plan for how to implement. And create new tests where appropriate."
Then it will go off and come back with a plan for what to do. And then you send it off with.
"Ok, start implementing."
So sure. You probably can work on this to make it easier to use than with a CLI chat. It would likely be less like an IDE and more like a planning tool you'd use with human colleagues though.
So you'd write a function name and then tell it to flesh it out.
function factorial(n) // Implement this. AI!
Becomes: function factorial(n) {
if (n === 0 || n === 1) {
return 1;
} else {
return n \* factorial(n - 1);
}
}
Last I looked Aider's maintainer has had to focus on other things recently, but aider-ce is a fantastic fork.I'm really curious to try Mistral's vibe, but even though I'm a big fanboi I don't want to be tied to just one model. Aider lets tier your models such that your big, expensive model can do all the thinking and then stuff like code reviews can run through a smaller model. It's a pretty capable tool
Edit: Fix formatting
Very much this for me - I really don't get why, given a new models are popping out every month from different providers, people are so happy to sink themselves into provider ecosystems when there are open source alternatives that work with any model.
The main problem with Aider is it isn't agentic enough for a lot of people but to me that's a benefit.
While True:
0. Context injected automatically. (My repos are small.)
1. I describe a change.
2. LLM proposes a code edit. (Can edit multiple files simultaneously. Only one LLM call required :)
3. I accept/reject the edit.
What kind of hardware do you have to be able to run a performant GPT-OSS-120b locally?
There are many platforms out there that can run it decently.
AMD strix halo, Mac platforms. Two (or three without extra ram) of the new AMD AI Pro R9700 (32GB of RAM, $1200), multi consumer gpu setups, etc.
Here is what I think about the bigger model: It sits between sonnet 4 and sonnet 4.5. Something like "sonnet 4.3". The response sped was pretty good.
Overall, I can see myself shifting to this for reguar day-to-day coding if they can offer this for copetitive pricing.
I'll still use sonnet 4.5 or gemini 3 for complex queries, but, for everything else code related, this seems to be pretty good.
Congrats Mistral. You most probably have caught up to the big guys. Not there yet exactly, but, not far now.
Even the Gemini 3 announcement page had some bit like "best model for vibe coding".
If you're actually making sure it's legit, it's not vibe coding anymore. It's just... Backseat Coding? ;)
There's a level below that I call Power Coding (like power armor) where you're using a very fast model interactively to make many very small edits. So you're still doing the conceptual work of programming, but outsourcing the plumbing (LLM handles details of syntax and stdlib).
Maybe common usage is shifting, but Karpathy's "vibe coding" was definitely meant to be a never look at the code, just feel the AI vibes thing.
Also, we’re both “people in tech”, we know LLMs can’t conceptualise beyond finding the closest collection of tokens rhyming with your prompt/code. Doesn’t mean it’s good or even correct. So that’s why it’s vibe coding.
sorry to disappoint you but that is also been considered vibecoding. It is just not pejorative.
Imo, if you read the code, it's no longer vibecoding.
I've personally decided to just rent systems with GPUs from a cloud provider and setup SSH tunnels to my local system. I mean, if I was doing some more HPC/numerical programming (say, similarity search on GPUs :-) ), I could see just taking the hit and spending $15,000 on a workstation with an RTX Pro 6000.
For grins:
Max t/s for this and smaller models? RTX 5090 system. Barely squeezing in for $5,000 today and given ram prices, maybe not actually possible tomorrow.
Max CUDA compatibility, slower t/s? DGX Spark.
Ok with slower t/s, don't care so much about CUDA, and want to run larger models? Strix Halo system with 128gb unified memory, order a framework desktop.
Prefer Macs, might run larger models? M3 Ultra with memory maxed out. Better memory bandwidth speed, mac users seem to be quite happy running locally for just messing around.
You'll probably find better answers heading off to https://www.reddit.com/r/LocalLLaMA/ for actual benchmarks.
That's a good idea!
Curious about this, if you don't mind sharing:
- what's the stack ? (Do you run like llama.cpp on that rented machine?)
- what model(s) do you run there?
- what's your rough monthly cost? (Does it come up much cheaper than if you called the equivalent paid APIs)
I am usually just running gpt-oss-120b or one of the qwen models. Sometimes gemma? These are mostly "medium" sized in terms of memory requirements - I'm usually trying unquantized models that will easily run on an single 80-ish gb gpu because those are cheap.
I tend to spend $10-$20 a week. But I am almost always prototyping or testing an idea for a specific project that doesn't require me to run 8 hrs/day. I don't use the paid APIs for several reasons but cost-effectiveness is not one of those reasons.
Here are my lazy notes + a snippet of the history file from the remote instance for a recent setup where I used the web chat interface built into llama.cpp.
I created an instance gpu_1x_gh200 (96 GB on ARM) at lambda.ai.
connected from terminal on my box at home and setup the ssh tunnel.
ssh -L 22434:127.0.0.1:11434 ubuntu@<ip address of rented machine - can see it on lambda.ai console or dashboard>
Started building llama.cpp from source, history:
21 git clone https://github.com/ggml-org/llama.cpp
22 cd llama.cpp
23 which cmake
24 sudo apt list | grep libcurl
25 sudo apt-get install libcurl4-openssl-dev
26 cmake -B build -DGGML_CUDA=ON
27 cmake --build build --config Release
MISTAKE on 27, SINGLE-THREADED and slow to build see -j 16 below for faster build 28 cmake --build build --config Release -j 16
29 ls
30 ls build
31 find . -name "llama.server"
32 find . -name "llama"
33 ls build/bin/
34 cd build/bin/
35 ls
36 ./llama-server -hf ggml-org/gpt-oss-120b-GGUF -c 0 --jinja
MISTAKE, didn't specify the port number for the llama-server 37 clear;history
38 ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking -c 0 --jinja --port 11434
39 ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking.gguf -c 0 --jinja --port 11434
40 ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking-GGUF -c 0 --jinja --port 11434
41 clear;history
I switched to qwen3 vl because I need a multimodal model for that day's experiment. Lines 38 and 39 show me not using the right name for the model. I like how llama.cpp can download and run models directly off of huggingface.Then pointed my browser at http//:localhost:22434 on my local box and had the normal browser window where I could upload files and use the chat interface with the model. That also gives you an openai api-compatible endpoint. It was all I needed for what I was doing that day. I spent a grand total of $4 that day doing the setup and running some NLP-oriented prompts for a few hours.
48GB of vram and lots of cuda cores, hard to beat this value atm.
If you want to go even further, you can get an 8x V100 32GB server complete with 512GB ram and nvlink switching for $7000 USD from unixsurplus (ebay.com/itm/146589457908) which can run even bigger models and with healthy throughput. You would need 240V power to run that in a home lab environment though.
Fuck nvidia
How is it? I'd guess a bunch of the MoE models actually run well?
nix run github:numtide/llm-agents.nix#mistral-vibe
The repo is updated daily.As long as it doesn't mean 10x worse performance, that's a good selling point.
In work, where my employer pays for it, Haiku tends to be the workhorse with Sonnet or Opus when I see it flailing. On my own budget I’m a lot more cost conscious, so Haiku actually ends up being “the fancy model” and minimax m2 the “dumb model”.
> this model is worse (but cheaper)
> use it to output 10x the amount of trashier trash
You've lost me.
I'm team Anthropic with Claude Max & Claude Code, but I'm still excited to see Mistral trying this. Mistral has occasionally saved the day for me when Claude refused an innocuous request, and it's good to have alternatives... even if Mistral / Devstral seems to be far behind the quality of Claude.
That was very helpful, thanks!
The competition is much smoother. Where are the subscriptions which would give users the coding agent and the chat for a flat fee and working out of the box?..
Going to start hacking on this ASAP
[1] https://openhands.dev/blog/devstral-a-new-state-of-the-art-o...
This tech is simply too critical to pretend the military won’t use it. That’s clearer now than ever, especially after the (so far flop-ish) launch of the U.S. military’s own genAI platform.
- https://helsing.ai/newsroom/helsing-and-mistral-announce-str... - https://sifted.eu/articles/mistral-helsing-defence-ai-action... - Luxembourg army chose Mistral: https://www.forcesoperations.com/la-pepite-francaise-mistral... - French army: https://www.defense.gouv.fr/actualites/ia-defense-sebastien-...
Not sure you've kept up to date, US have turned their backs on most allies so far including Europe and the EU, and now welcome previous enemies with open arms.
They did.
core/prompts/cli.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
core/prompts/compact.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/bash.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/grep.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/read_file.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/write_file.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/search_replace.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
.../prompts/todo.md https://github.com/mistralai/mistral-vibe/blob/v1.0.4/vibe/c...
Here's n example of the kinds of things I do with Claude Code now: https://gistpreview.github.io/?b64d5ee40439877eee7c224539452... - that one involved several from-scratch rewrites of the history of an entire Git repo just because I felt like it.
Surprising and good is only: Everything including graphics fixed when clicking my "speedreader" button in Brave. So they are doing that "cool look" by CSS.
There's a scan lines affect they apply to everything that's "cool", but gets old after a minute.
Uh, the "Modified MIT license" here[0] for Devstral 2 doesn't look particularly permissively licensed (or open-source):
> 2. You are not authorized to exercise any rights under this license if the global consolidated monthly revenue of your company (or that of your employer) exceeds $20 million (or its equivalent in another currency) for the preceding month. This restriction in (b) applies to the Model and any derivatives, modifications, or combined works based on it, whether provided by Mistral AI or by a third party. You may contact Mistral AI (sales@mistral.ai) to request a commercial license, which Mistral AI may grant you at its sole discretion, or choose to use the Model on Mistral AI's hosted services available at https://mistral.ai/.
[0] https://huggingface.co/mistralai/Devstral-2-123B-Instruct-25...
If you want to use something, and your company makes $240,000,000 in annual revenue, you should probably pay for it.
I do not mind having a license like that, my gripe is with using the terms "permissive" and "open source" like that because such use dilutes them. I cannot think of any reason to do that aside from trying to dilute the term (especially when some laws, like the EU AI Act, are less restrictive when it comes to open source AIs specifically).
Good. In this case, let it be diluted! These extra "restrictions" don't affect normal people at all, and won't even affect any small/medium businesses. I couldn't care less that the term is "diluted" and that makes it harder for those poor, poor megacorporations. They swim in money already, they can deal with it.
We can discuss the exact threshold, but as long as these "restrictions" are so extreme that they only affect huge megacorporations, this is still "permissive" in my book. I will gladly die on this hill.
Yes, they do, and the only reason for using the term “open source” for things whose licensing terms flagrantly defy the Open Source definition is to falsely sell the idea that using the code carries the benefits that are tied to the combination of features that are in the definition and which are lost with only a subset of those features. The freedom to use the software in commercial services is particularly important to end-users that are not interested in running their own services as a guarantee against lock-in and of whatever longevity they are able to pay to have provided even if the original creator later has interests that conflict with offering the software as a commercial service.
If this deception wasn't important, there would be no incentive not to use the more honest “source available for limited uses” description.
It also makes life harder for individuals and small companies, because this is not Open Source. It's incompatible with Open Source, it can't be reused in other Open Source projects.
Terms have meanings. This is not Open Source, and it will never be Open Source.
I'm amazed at the social engineering that the megacorps have done with the whole Open Source (TM) thing. They engineered a whole generation of engineers to advocate not in their own self-interest, nor for the interest of the little people, but instead for the interest of the megacorps.
As soon as there is even the tiniest of restrictions, one which doesn't affect anyone besides a bunch of richiest corporations in the world, a bunch of people immediately come out of the woodwork, shout "but it's not open source!" and start bullying everyone else to change their language. Because if you even so much as inconvenience a megacorporation even a little bit it's not Open Source (TM) anymore.
If we're talking about ideals then this is something I find unsettling and dystopian.
I hard disagree with your "It also makes life harder for individuals and small companies" statement. It's the opposite. It gives them a competitive advantage vs megacorps, however small it may be.
Whatever name they come up with for a new license will be less useful, because I'll have to figure out that this is what that is
"Open Source" is nebulous. It reasonably works here, for better or worse.
No it isn't it is well defined. The only people who find it "nebulous" are people who want the benefits without upholding the obligations.
Open source has a well understood meaning, including licenses like MIT and Apache - but not including MIT but only if you make less than $500million dollars, MIT unless you were born on a wednesday, etc.
And honestly it wasn't a good hill to begin with: if what you are talking about is the license, call it "open license". The source code is out in the open, so it is "open source". This is why the purists have lost ground to practical usage.
As someone who was born and raised on FOSS, and still mostly employed to work on FOSS, I disagree.
Open source is what it is today because it's built by people with a spine who stand tall for their ideals even if it means less money, less industry recognition, lots of unglorious work and lots of other negatives.
It's not purist to believe that what built open source so far should remain open source, and not wanting to dilute that ecosystem with things that aren't open source, yet call themselves open source.
With all due respect, don't you see the irony in saying "people with a spine who stand tall for their ideals", and then arguing that attaching "restrictions" which only affect the richest megacorporations in the world somehow makes the license not permissive anymore?
What ideals are those exactly? So that megacorporations have the right to use the software without restrictions? And why should we care about that?
Anyone can use the code for whatever purpose they want, in any way they want. I've never been a "rich megacorporation", but I have gone from having zero money to having enough money, and I still think the very same thing about the code I myself release as I did from the beginning, it should be free to be used by anyone, for any purpose.
Because instead of making the point "this license isn't as permissive as it could/should be" (easy to understand), instead the point being made is "this isn't real open source", which comes across to most people as just some weird gate-keeping / No True Scotsman kinda thing.
Though given the stance you are taking in this conversation, I'm not surprised you want to quibble over that.
¯\_(ツ)_/¯
> if what you are talking about is the license, call it "open license".
If you want to build something proprietary, call it something else. "Open Source" is taken.
well we don't really want to open that can of worms though, do we?
I don't agree with ceding technical terms to the rest of the world. I'm increasingly told we need to stop calling cancer detection AI "AI" or "ML" because it is not the 'bad AI' and confuses people.
I guess I'm okay with being intransigent.
Who gives a shit what we call "cancer AI", what matters is the result.
Whenever anybody tries to claim that a non-commercial licenses is open-source, it always gets complaints that it is not open-source. This particular word hasn’t been watered down by misuse like so many others.
There is no commonly-accepted definition of open-source that allows commercial restrictions. You do not get to make up your own meaning for words that differs from how other people use it. Open-source does not have commercial restrictions by definition.
Looking up open-source in the dictionary does include definitions that would allow for commercial restrictions, depending on how you define "free" (a matter that is most certainly up for debate).
The term "open-source" exists for the purposes of a particular movement. If you are "for" the misuse and abuse of the term, you not only aren't part of that movement, but you are ignorant about it and fail to understand it— which means you frankly have no place speaking about the meanings of its terminology.
Unless this authority has some ownership over the term and can prevent its misuse (e.g. with lawsuits or similar), it is not actually the authority of the term, and people will continue to use it how they see fit.
Indeed, I am not part of a movement (nor would I want to be) which focuses more on what words are used rather than what actions are taken.
People can also say 2+2=5, and they're wrong. And people will continue to call them out on it. And we will keep doing so, because stopping lets people move the Overton window and try to get away with even more.
The same is not true for "open source", which is a purely linguistic construct.
And whenever they do so, this pointless argument will happen. Again, and again, and again. Because that’s not what the word means and your desired redefinition has been consistently and continuously rejected over and over again for decades.
What do you gain from misusing this term? The only thing it does is make you look dishonest and start arguments.
This kind of thing is how people try to shift the Overton window. No.
How is that a measure of model size? It should either be parameter size, activated parameters, or cost per output token.
Looks like a typo because the models line up with reported param sizes.
If Mistral is so permissive they could be the first ones, provided that hardware is then fast/cheap/efficient enough to create a small box that can be placed in an office.
Maybe in 5 years.
The Apple offerings are interesting but the lack of x86, Linux, and general compatibility make it hard sell imo.
...so it won't ever happen, it'll require wifi and will only be accessible via the cloud, and you'll have to pay a subscription fee to access the hardware you bought. obviously.
The only thing I found is a pay-as-you-go API, but I wonder if it is any good (and cost-effective) vs Claude et al.
With pricing so low I don't see any reason why someone would buy sub for 200 EUR. These days those subs are so much limited in Claude Code or Cursor than it used to be (or used to unlimited). Better pay-as-you-go especially when there are days when you probably use AI less or not at all (weekends/holidays etc.) as long as those credits don't expire.
After querying the model about .NET, it seems that its knowledge comes from around June 2024.
Why does every AI provider need to have its own tool, instead of contributing to existing tools like Roo Code or Opencode?
Because they couldn't do it by contributing to existing opensource tools?
Just call it Mistral License & flush it down