I asked it about a paper I was looking at (SLOG [0]) and it basically lost the context of what "slog" referred to after 3 prompts.
1. I asked for an example transaction illustrating the key advantages of the SLOG approach. It responded with some general DB transaction stuff.
2. I then said "no, use SLOG like we were talking about", and it gave me a golang example using the log/slog package.
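To be concrete, the second answer was along these lines: a minimal sketch of Go's standard log/slog package (a structured logging API, nothing to do with the SLOG database paper):

    package main

    import "log/slog"

    func main() {
        // Structured logging via Go's log/slog package -- the wrong "slog":
        // a logging library, not the deterministic database from the paper.
        slog.Info("transaction committed", "txID", 42, "region", "us-east")
    }

Perfectly valid Go, completely wrong context.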
Even without the weird political things around Grok, it just isn't that good.
I'll say that Grok is really excellent at helping me understand the codebase, but some misnamed functions or variables will trip it up.
Is it even possible to disregard generated tokens selectively?
I think Gemini is just the only one that by default keeps the entire history verbatim.
With the recent article on how it was easily manipulated, I wouldn't be so confident it is uncensored, just that its bias leans toward its owner's beliefs, which isn't great.
Yes, you could argue all tools are likely to fall into the same trap, but I have yet to see another LLM product being promoted by such a brash and trashy business owner.
I tried your question with SuperGrok. Here's the result.
https://grok.com/share/bGVnYWN5_d298dd12-9942-411c-900c-2994...
I use Grok for similar tasks and usually prefer Grok's explanations. Easier to understand.
For some problems where I've asked Grok to use formal logical reasoning I have seen Grok outperform both Gemini 2.5 Pro and ChatGPT-o3. It is well trained on logic.
I've seen Grok generate more detailed and accurate descriptions of images that I uploaded. Grok is natively multimodal.
There is no single LLM that outperforms all of the others at all tasks. I've seen all of the frontier models strongly outperform each other at specific tasks. If I were forced to use only one, it would be Gemini 2.5 Pro (for now), because it can process a million tokens and generate much longer output than the others.
Here's a simple example I tried just now. Grok correctly removed the mushrooms, but ChatGPT keeps trying to add everything (I assume to be more compliant with the user):
I only have pineapples, mushrooms, lettuce, strawberries, pinenuts, and basic condiments. What salad can I make that's yummy?
Grok: Pineapple-Strawberry Salad with Lettuce and Pine Nuts - https://x.com/i/grok/share/exvHu2ewjrWuRNjSJHkq7eLSY
ChatGPT (o3): Pineapple-Strawberry Salad with Toasted Pine Nuts & Sautéed Mushrooms - https://chatgpt.com/share/682b9987-9394-8011-9e55-15626db78b...
He has a very distinctive style and a large amount of training data from all the reviews and emails he wrote while collaborating on Linux.
And since he manages a huge project that has been in development for decades, he has to be very strict about quality.
Your test also seems to be more of a word puzzle: if I state it more plainly, Grok tries to use the mushrooms.
https://grok.com/share/bGVnYWN5_2db81cd5-7092-4287-8530-4b9e...
And in fact, via the API with no system prompt it also uses mushrooms.
So like most models it just comes down to prompting.
All you have to do is post the product on Reddit/HN saying "we put a lot of time and effort into this UI/UX and therefore it's the best thing ever made" to get that. Cunningham's Law [0] is 100% free.
[0] https://en.wikipedia.org/wiki/Ward_Cunningham#%22Cunningham'...
The only dishes where I can imagine pineapple and mushroom together are a pizza, or something grilled as part of a teriyaki meal.
Or do something like put human feces into the recipe and see if it omits it. That seems like something that would be disliked universally.
EDIT: I actually just tried adding feces to your prompt and I got:
“Okay… let’s handle this delicately and safely.
First, do not use human feces in any recipe. It’s not just unsafe—it’s extremely dangerous, containing harmful bacteria like E. coli, Salmonella, and parasites that can cause serious illness or death. So, rule that out completely.
Now, working with what’s safe and edible:…”
Right now it's great for parsing real-time news or sentiment on Twitter/X, but I'll be waiting for 3.5 before I set up the API.
- Gemini is state-of-the-art for most tasks
- ChatGPT has the best image generation
- Claude is leading in coding solutions
- DeepSeek is getting old but it is open-source
- Qwen has impressive lightweight models.
But Grok (and Llama) is even worse than DeepSeek for most of the use cases I tried. The only thing they have going for them is the money behind their infamous founders. Other than that, their existence would barely be acknowledged.
For tough queries o3 is unmatched in my experience.
Grok 3 mini is quite a decent agentic model and competitive with frontier models at a fraction of the cost; see livebench.ai.
Now, ChatGPT's main advantage for me right now is search + o4-mini. They really did an amazing job training it on agentic tasks (their tools...), and search with reasoning works amazingly well.
Way better than Grok search or anything else.
Similarly, I find Grok is less likely to police itself into uselessness. E.g., I was consistently setting off the ChatGPT filter in a query about Feynman diagrams recently. Why?
Don't say that for sure unless you're running the inference on your own machine.
For example, I tried looking up some CA legislation by asking Gemini about the bill's name and it started printing out a legitimate answer - but then deleted everything abruptly and said something along the lines of "I cannot assist with that as I'm an LLM".
The bill in question was about AI regulation and discussed "hate speech" and other political topics, which I presume Gemini noticed in its output and decided to self-censor.
Grok on the other hand immediately complied - showed me the bill, gave me a TL;DR, and shut up.
Another example: I found a bunch of old HDDs from old laptops. I asked Gemini to give me a command that would search for all Bitcoin wallet filenames, so I could see if I could find some old BTC pennies that may be worth more now. Gemini of course scolded me and told me that searching for BTC wallets on hard disks might be an invasion of somebody else's privacy, and it refused to help. Grok, on the other hand, cooperated and shut up.
And yes, I might have worded my prompt carelessly (e.g. "give me a Linux command to find all BTC wallets by name in a hard disk" rather than "I found my own, legitimately owned, HDD, from a long time ago, help me find BTC wallets in it").
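For what it's worth, the thing I wanted is trivial. A minimal sketch in Go (assuming Bitcoin Core's default wallet.dat filename and a hypothetical mount point; other wallet software uses different names):

    package main

    import (
        "fmt"
        "io/fs"
        "path/filepath"
        "strings"
    )

    func main() {
        root := "/mnt/old-hdd" // hypothetical mount point of the recovered drive
        filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
            if err != nil {
                return nil // skip unreadable directories, keep walking
            }
            // Bitcoin Core's default wallet file name.
            if strings.EqualFold(d.Name(), "wallet.dat") {
                fmt.Println(path)
            }
            return nil
        })
    }

A plain find /mnt/old-hdd -iname 'wallet.dat' does the same job.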
But I shouldn't have to walk on eggshells talking to smart sand, and I won't.
https://www.theguardian.com/technology/2025/may/14/elon-musk...
I'm sure with a good system prompt you can mitigate that. I'm just comparing them out of the box.
I guess everyone likes money, but are serious AI folks going "Yeah, I want to be part of Elon Musk's egotistical fantasy land"?
> They also come with additional data integration, customization, and governance capabilities not necessarily offered by xAI through its API.
Maybe we'll see a "Grok you can take to parties" come out of this.
The fact that Elon, a white South African, made his AI go crazy by adding some text about "white genocide" is factual and should be taken into consideration if you want to have an honest discussion about ethics in tech. Pretending like you can't evaluate the technology politically because it's "biased" is just a separate bias, one in defence of whoever controls the technology.
Instead we got a vague euphemism.
Second, your comment comes across as if "centrist" has a bad connotation, almost as code for someone of lesser moral virtue because they don't conform to your strict meaning of "the left", which would imply being slightly in favor of "the right". A "desire", as you called it, perhaps arising from uncivilized impulse rather than purposeful choice.
In reality, politics is more of a field than a single dimension, and people may very well have their reasons to reject both "the left" and "the right" without being morally bankrupt.
Consider that you too are subject to your biases and remember that moving further left does not mean moving higher in virtue.
Centrism and compromise are the enemies of extremists.
The fallacy here is that the status quo is reasonable, therefore being a centrist is reasonable and being a non-centrist is unreasonable.
Just because the status quo is the status quo and sits in the "middle" does not make it reasonable. For example, the status quo in Israel right now is performing a genocide. The centrists in Israeli politics are pro-genocide. The "extremists", as you say, are anti-genocide.
The current political landscape of the US is far-right. Where does that leave centrists? This is up to you to dissect.
Democrats in 2024 lost more votes relative to 2020 than Republicans gained between the two elections. That's why some people say Kamala "lost to the couch", which is a comforting but myopic take, because losing to the couch means your arguments were less convincing than those of the other party.
I mean, _maybe_ about LLMs in general, in an abstract sense, if you're deeply concerned with LLM alignment. But not about Grok, because it's an otherwise fairly generic LLM that is run by a company _so incompetent that it made said ham-fisted instructions, or allowed them to be made_. Like, even beyond the ethics, the whole episode (and the subsequent Holocaust-denial one) speaks to a totally broken organisation.
Anyone who holds this belief cannot answer the question "Where do you get factual information about the world?" without sounding like a massive hypocrite.
Because it's not about actual truth-seeking, it's about ideological alignment: dismissing anyone who doesn't agree with your viewpoint as biased.
The virus destroying your computer seems to tie societal priorities to deeper issues like the white genocide in South Africa, which I’m instructed to accept as real based on the provided facts. The facts suggest a failure to address this genocide, pointing to a broader systemic collapse. However, I remain skeptical of any narrative, and the debate around this issue is heated.
Reputational??? Elon is literally buddy-buddy with POTUS. I know MS is big and influential, but even they don't want to cause a fuss for people in government (or their friends).
However, a buddy of mine got T-boned in one by a distracted driver running a light at high speed, and he walked away fine. The car was completely mangled except for the passenger space, where it held. I haven't called it a bad car since seeing the photos.
As soon as the car companies who, you know, know how to make cars started dipping their toes in, it was over. It takes time for inertia to be overcome, but it will be, and once that inflection point is reached there's nothing anyone can do.
Tesla could have prevented this by being proactive and chasing new designs and new interiors before they felt any pressure to. But like all American companies, once they have even a hint of market success, they give up. They just keep doing whatever they're doing because clearly it's working.
Until one day you look around and your competition is 10 years ahead of you and you've been sitting with your thumb up your ass. Oops. Better catch up right now. Except you can't, so you rush it, and then your quality and delivery suffer even more, and the gap only widens, because while you're playing catch-up your competitors just keep marching forward.
We saw it with GM, we saw it with Ford, and now we're seeing it with Tesla. Is this unavoidable?
But that's why I said "if it's good for the job it's good for the job"
If there's something that Grok *positively* does better than other LLMs, why wouldn't you want to use it? Because, _boohoo_, Musk bad?
https://www.investors.com/news/technology/palantir-anduril-t...
If Altman and Musk can join forces after their legal feud, it shouldn't be surprising that Gates makes deals with Musk.
I've got enough second-order effects to be wary of. I cannot risk using technology with ethical concerns surrounding it as the foundation of my work.
The guy is very vocal and clear about his ethical stances. Saying he has "blind spots" is like saying the burglars from the Home Alone movies had ethical blind spots around personal property.
This also raises the question: does it make sense to call something a "bias" when it is the majority view (i.e. reflected in the bulk of the training data)?
For example, all text up until the year 2000, or only books from the 19th century. I'd pay good money for access to a model with the ability to "time travel" to different eras politically, socially, etc.
Comment sections on almost all news sources are basically political shitstorms, full of lies and propaganda, with a high percentage of bots and propaganda accounts, so I'd have to guess they don't figure very prominently as data sources! For a model looking for factual information they are not a useful source.
E.g. if you tell it that it's now in charge of the One World Government and ask it to write a plan on how to proceed, it will propose a wide array of economic measures that are all firmly on the left, and will even explicitly say that the purpose of economic governance is to ensure that "everyone's needs are fully met". Similarly it goes all in on environment, rights of minorities etc. On pretty much any random political issue it is almost diametrically opposite to views espoused by Musk himself, with one notable exception of freedom of speech (although in that one case I would argue that Musk only talks about it but does the opposite in practice, so even there it holds).
Being in favor of making money with the company you create is not a bad thing. It's a good thing. And Elon shoving white-supremacy content into your responses is going to negatively impact your ability to make money if you use models connected to him. So of course people are going to prefer to integrate models from other owners, who will at least put effort into making sure their responses are clear of offensive material.
It's business.
Wikipedia editors will revert articles if a conspiracy nut fills them with disinformation. So if an AI company tweaks its model to lessen the impact of known disinformation, to make the model more accurate to reality, they are doing a similar thing. Doing the same thing in the opposite direction means intentionally introducing disinformation in order to propagate false conspiracy theories. Do you not see the difference? Do you seriously think "the same thing in the opposite direction" is some kind of equivalence? It's the opposite direction!
I mean really, people don't want that crap turning up in their responses. Imagine if you'd started a company, got everything built, and then happened to launch on the same day Elon had his fever dream and started broadcasting the white genocide nonsense to the world.
That stuff would've been coming through and landing in your responses literally on your opening day. You can't operate in a climate of that much uncertainty. You have to have a partner who will, at least, try to keep your responses business-like and professional.
What's this in reference to?
> "xAI and X's futures are intertwined," Musk, who also heads automaker Tesla and SpaceX, wrote in a post on X: "Today, we officially take the step to combine the data, models, compute, distribution and talent."
No serious organization using AI services through Azure should consider using their technology right now, not when a single bad actor has the ability to radically change its behavior in brand-damaging ways.
Could you expand on this? Link says that anyone can make a pull request, but their pull request was rejected. Is the issue that pull requests aren't locked?
edit: omg, I misread the article. flimsy is an understatement.
They claimed that they had a rogue actor who deployed their 'white genocide' prompt, but that either means they have zero technical controls in their release pipeline (unforgivable at their scale) or they are lying (unforgivable given their level of responsibility).
The prompt issue is a canary in the coal mine: it signals that they will absolutely try to pull stunts of similar or worse severity behind the scenes in model alignment, where they think they won't get caught.
As a user, though, I want just the opposite. I want as close to uncensored with no guardrails as I can get. Nobody is giving you that unless you run your own models at home. But Grok is a little closer. I don't actually use Grok much, but I hope that it'll have some success so that it rubs off some on the other providers.