https://www.reuters.com/technology/googles-gemini-co-lead-no...
It gives some context on the contributions of each of the authors. About Shazeer, from the article:
Shazeer’s joining the group was critical. “These theoretical or intuitive mechanisms, like self-attention, always require very careful implementation, often by a small number of experienced ‘magicians,’ to even show any signs of life,” says Uszkoreit. Shazeer began to work his sorcery right away. He decided to write his own version of the transformer team’s code. “I took the basic idea and made the thing up myself,” he says. Occasionally he asked Kaiser questions, but mostly, he says, he “just acted on it for a while and came back and said, ‘Look, it works.’” Using what team members would later describe with words like “magic” and “alchemy” and “bells and whistles,” he had taken the system to a new level.
Ok, these people have all gotten extensive training on how to hype for the non-technical crowd without saying anything of substance.
He also saw that LLMs would replace search before anyone else did, and it is really something to look at LaMDA's or GPT-1's output and think: yeah, this will answer all of our questions one day.
Uszkoreit wanted to build a more efficient/scalable language/seq2seq model that could take advantage of GPU parallelism (replacing RNNs which were the main approach to sequence modelling at that time).
Uszkoreit's insight was that although language appears sequential, it is in fact part parallel and part hierarchical, as can be seen in linguists' sentence parse trees, where at each level there is parallelism/independence between the branches of the tree, with them getting combined at the next level up. This is what gave rise to the idea of a model that consisted of a stack of parallel processing layers (transformer layers). I believe that attention was also part of the plan from day one, as this had already been proven to be valuable (Bahdanau) with RNN seq2seq modelling.
So, this is what Uszkoreit wanted to build, but by his own account he failed to come up with an implementation that matched or outperformed the prevailing RNN approach that he wanted to replace. At this point, Uszkoreit mentioned the idea to Shazeer, who got on board and eventually arrived at a performant architecture, which was then pared back by an ablation process, resulting in the initial encoder-decoder Transformer architecture. Shazeer later came up with the mixture-of-experts architecture, and also other optimizations after he left to found character.ai.
I'm talking from plenty of group project experience here.
Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other person involved in nearly every detail. Niki designed, implemented, tuned and evaluated countless model variants in our original codebase and tensor2tensor. Llion also experimented with novel model variants, was responsible for our initial codebase, and efficient inference and visualizations. Lukasz and Aidan spent countless long days designing various parts of and implementing tensor2tensor, replacing our earlier codebase, greatly improving results and massively accelerating our research.
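For anyone who wants to see what "scaled dot-product attention" from that footnote amounts to, here is a minimal sketch in plain Python. It is an illustration under my own assumptions, not the authors' code: the function names, the list-of-lists layout, and the lack of masking, batching, and multiple heads are all simplifications I chose. Each query is scored against every key, the scores are divided by sqrt(d_k), passed through a softmax, and used to average the values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention on plain lists (hypothetical helper, no masking).

    queries: list of query vectors, each of length d_k
    keys:    list of key vectors, each of length d_k
    values:  list of value vectors, one per key
    Returns one output vector per query.
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    outputs = []
    for q in queries:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        d_v = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        )
    return outputs


# Two identical keys get equal weight, so the output is the mean of the values.
result = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

Every query is handled independently of the others, which is the property that makes this easy to parallelize compared with an RNN stepping through the sequence.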
In any case, if the authors considered their contributions equal, that's good enough for me.
He left Google in 2021 to co-found Character.AI. In 2024, Google brought him and some Character.AI researchers back via a licensing/talent deal with Character.AI (reportedly around $2.7B). He was then made a Gemini co-lead.
Now he’s leaving Google again for OpenAI.
Exciting times!
What is exciting about this?
Google lost three critical years chasing AGI, and got acquired by SpaceX, now a Dyson Sphere startup whose pitch deck is just: "What if we put a paywall around the Sun?"
I hope this is not accurate but I'm afraid it is: https://x.com/signulll/status/2067446889956430273
Sama, and any other founder, will always have a difficult fight against bureaucracy, and once you let a little bit in, the bureaucracy's sole purpose becomes to grow itself.
If the issue is inefficiency, tons of meetings, too much team alignment, etc., then that's the issue you need to tackle, and these issues can already appear in a 50-100 employee company. Sure, that's an easier problem to solve at a smaller size, but unless you hired people for no reason, these people have a very specific set of problems to tackle and are often, in these companies, the best in class to tackle them. Culling half of the company isn't going to make things better.
(And X rehired part of the laid-off engineers)
What percentage of Google employees are engineers...
The leaps forward need bloat. A startup can execute on a specific direction way better.
Now back to your point, what did X deliver with its lean ops? It seems that it needed 2 bailouts (one from xAI, and one from SpaceX).
You could cut Google's size by 40% and they'd still have more corporate employees than Apple.
(Google has ~190k employees, Apple has ~160k but 50k of those are retail staff, so ~110k corporate)
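A quick back-of-the-envelope check of that claim, using only the rough headcounts quoted above. The variable names are mine and the figures are the commenter's approximations, not verified numbers.

```python
# Rough headcounts as quoted in the comment above (approximate, unverified).
google_total = 190_000
apple_total = 160_000
apple_retail = 50_000
cut_percent = 40

# Apple's corporate headcount excludes retail staff.
apple_corporate = apple_total - apple_retail

# Google's headcount after a hypothetical 40% cut (integer arithmetic).
google_after_cut = google_total * (100 - cut_percent) // 100

print(f"Apple corporate:       {apple_corporate:,}")
print(f"Google after 40% cut:  {google_after_cut:,}")
print(f"Google still larger:   {google_after_cut > apple_corporate}")
```

With these inputs the cut leaves Google at about 114k against roughly 110k for Apple's corporate side, so the statement holds, though only by a few thousand.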
and tens of losing companies that make balloons or whatnot
We don't hear about Tom from MySpace.
Sadly the gap between reality and satire has shrunk.
But yes. I also wish that show would come back.
Noam Shazeer would be Google's Head Dreamer.
Noam is the real deal, he was pretty legendary within old-time ('00s) Google engineering. Paul Buchheit had a story about interviewing him with the "how to write a spellchecker" question and then him coming up with something better than the state-of-the-art, then basically delivering Google's spell corrector in his first 2-week Noogler project.
"Google and Character.AI agree to settle lawsuits over teen suicides" - https://www.axios.com/2026/01/07/google-character-ai-lawsuit...
Be aware...very disturbing: https://www.judiciary.senate.gov/imo/media/doc/e2e8fc50-a9ac...
Considering what character.ai is, maybe he should have at least taken a shot at it.
Seems like there are some insights here!
edit: it seems the post has been removed but comments are viewable.
One-line summary:
To put it lightly, the dude was politically outspoken and held strong beliefs.
> The League of Nations gave Britain mandatory power over Palestine in 1922. British rule and Arab efforts to prevent Jewish migration led to growing violence between Arabs and Jews, causing the British to announce its intention to terminate the Mandate in 1947. The UN General Assembly recommended partitioning Palestine into two states: Arab and Jewish. However, the situation deteriorated into a civil war. The Arabs rejected the Partition Plan, the Jews ostensibly accepted it, declaring the independence of the State of Israel in May 1948 upon the end of the British mandate. Nearby Arab countries invaded Palestine, Israel not only prevailed, but conquered more territory than envisioned by the Partition Plan. During the war, 700,000, or about 80% of all Palestinians fled or were driven out of territory Israel conquered and were not allowed to return, an event known as the Nakba (Arabic for 'catastrophe') to Palestinians. Starting in the late 1940s and continuing for decades, about 850,000 Jews from the Arab world immigrated ("made Aliyah") to Israel.
Also, it was Ottoman territory for hundreds of years up to WWI. I've had friends tell me for some reason about how Palestine was an independent country before... literally wasn't.
To some it still means favoring any existence of a Jewish state. The inertia isn't there because, aside from the original partition plan being pushed by the UK, other countries have attacked Israel several times later in ways it wouldn't have withstood without outside support.
Besides that, Google is in a pretty good position, they're not bleeding money on AI like Anthropic/OpenAI, and they own product verticals where they can integrate it. Plus they have a mature ads-model which is what might actually drive a bit of revenue for LLMs.
That's their moat.
Maybe also stolen copyrighted content that cannot be found anywhere else now, so they are the only ones who can train on it.
Don't we all want to (automatically and passively) invest in a company losing billions of dollars?
At least we can diversify our portfolio from SpaceX.
Grabbing market share works if you have investors who are ready to burn cash indefinitely. Find a hot niche, buy a banana for 1 USD, sell it for 0.10 USD.
Example: Cursor. They became popular because they were selling unlimited ChatGPT for 20 USD / month.
When they launched, it was just a reskinned VS Code, yet the "fastest growing AI company".
No coincidence they were bought by SpaceX, which wants to consolidate revenue, even if it is nonsense, as long as it helps other investors to exit. It shows rapid growth.
Profit is the real moat.
One example: Nvidia. Proprietary tooling, proprietary IP, proprietary hardware, no alternative, expensive.
You don't know what Cursor's game plan was. Maybe acquisition was their plan.
Buying at $1 and selling for $0.1 is still viable as long as they have money in the bank, until they achieve their goals. Most startups start out that way. Even giving away their services for free.
Obviously there will be failures. Doesn't mean they have no moat. Can you say a business with 100 customers and $1000 debt is less viable than one with a single customer and no debt?
Possibly true. Any smart innovations developed by one organization will be smuggled into others.
Training, inference, and data-collection infrastructure are definitely moats. High-volume usage feedback is also hard to come by for new entrants.
Noam has a deep expertise in these systems at every level, both algorithmically and at production scale, and knows how to leverage things at different levels.
It's not like Google won't have anyone else who can do what he does, but at the same time, it's an implicit criticism of Google's culture, operations, development, and overall AI program. Shazeer is well past the point where the paycheck is the deciding factor, although I'm certain he is very well paid. Having the freedom to innovate and build free from the corporate fuckery of Google and Facebook is probably more valuable than the pay raise he got with the move, and OAI has the advantage of not having to cope with decades of corporate cruft and inertia. They'll get there - all corporations do - but they're still young enough to be nimble.
As do thousands of people at this point. You think the head of DeepSeek doesn't?
1. There are already multiple "sota" models on the market that compete with only marginal gains between them (OpenAI, Anthropic, Google/Gemini) and some that are catching up (DeepSeek, Qwen, ...).
2. The fact that something is a hard engineering problem does not mean it's generating revenue. So while what you said is true, deep expertise is required to push the industry forward, I don't think that is going to matter for the bottom line of these companies. Hence why I think the models don't give a company any 'moat' in a capitalist economy.
Karpathy to Anthropic, now Noam to OpenAI.
Question two: Why are OpenAI spending that money taking talent from Google, who can definitely outspend them for talent, and not Anthropic, who are leading the market and are at least somewhat financially constrained?
But I'm sure for at least some folks, this is true, given recent valuations.
I always appreciated Jeff having a level head ... which this article seems to confirm:
https://www.yahoo.com/news/articles/google-cracks-down-posts...
What they're working on is just making people's jobs and skills obsolete and trying to invent machines that will concentrate the world's wealth into the hands of the people who own those machines.
Popular entertainment and unique progress of human civilization can't really be compared either.
It's funny, but with the AI hires/moves it feels more like satire now.
I wouldn't expect OpenAI to start releasing open weight competitive models again, but I could be wrong.
As an outsider, I'd be really curious to understand why, given how well positioned they seem to be in the AI battle:
- huge, quasi unmatched data war chest
- huge, quasi unmatched, planet-scale infrastructure
- native AI chip design and production (TPU)
- the core ideas for what we now know as "AI" were invented there
- deepmind, enough said
- pretty much the deepest pocket of all the AI players with the possible exception of MSFT
- a massively large user base and reach to deploy AI to (Android, YT, Cloud, Search, Email, ...)
- supposedly one of the best engineering cultures in the valley
Why do the best people leave?
Why do their AI products always come in 3rd place?
Why can't they seem to take the lead, either in terms of product design or in terms of raw LLM performance?
The only answer I can think of is:
- culture is completely broken
- management sucks something fierce
- company is so fat and rich no one is actually interested in winning anymore
Google at its core is not a dev tools company and it has become evident that is where the money is given the verifiable nature of software. Hixie's reflections on his tenure at Google still ring in my head to this day, though I have never worked there[1].
The people at the helm of Google no longer see the company's identity as something which must be channeled through a product or an experience. Some will point to the DoubleClick acquisition, others will point to Google Reader, or Pichai's ascension. Despite his very short tenure, MBA/McKinsey-brain is a very real phenomenon and it's no mistake that it shaped the "promotion packaged as a product launch" culture that steered Google away from seriously betting on anything that wasn't ads. To quote the signull tweet linked elsewhere in this thread, you can have everything at Google, except for permission.
Most importantly--I don't think there's a single tech product where I can point and say "Google wouldn't do that". You can contrast this with, say, other Alphabet companies which don't suffer from this remotely as much. It is VERY clear what Waymo and YouTube are trying to accomplish, and while it frequently makes a ton of sense for the companies to share infrastructure and product knowledge, YouTube does an exceptional job on the product side of making it very clear what they would and wouldn't do. They have experimented and shut down experimental features before (is their MOOC functionality still around?), but since it's fairly clear Google specifically is no longer working in service to the mission of providing the world's best digital portal for accessing information, I think it would behoove them to figure out what their mission is.
Also, why didn't they nail him down contractually when they bought character.ai ... isn't that pretty standard with this type of superstar (re)hire?
OpenAI is in a unique position right now to grant pre-IPO options (probably in the form of RSUs). And they wanted him badly enough to grant the extra options necessary to effectively 'buy out' whatever unvested Google bonus he's walking away from.
LOL.
I doubt that the money had anything to do with it.
I also doubt that the state of the technology at OAI vs. Google had much to do with it. Google is behind, no doubt, but the gap is not, as far as we know, insurmountable.
I suspect that this is a leadership clash. Noam was working in GDM. GDM somehow went away from coding and RSI into "world models" and that has played out very poorly. Who made that call? Who was still playing politics?
Given this is Noam, the list of people that could be pissing him off is very small: Demis, Sergey (?!), a couple of VPs in GDM.
What the hell happened?