I have a coworker who brags about intentionally cutting off Waymos and robocars when he sees them on the road. He is "anti-clanker" and views it as civil disobedience to rise up against "machines taking over." Some mornings he comes in all hyped up talking about how he cut one off at a stop sign. It's weird.
What’s more interesting, though, are the deeper social contracts involved. Destroying other people’s stuff can be perfectly legal: firefighters break car windows when someone parks in front of a fire hydrant. Destroying automation doesn’t qualify for an exception like that, but it’s not hard to imagine a different culture choosing to favor the workers.
I don't think the Luddites had an easy justification like this.
If you deliberately impede the flow of traffic, commit vehicular assault, or otherwise sabotage the health and safety of drivers, passengers, and/or pedestrians, what do you deserve?
If you cause whiplash intentionally, what do you deserve?
What would constitute use of equal force in self-defense in response to the described attack method?
Are movements valid when they have aims you agree with, and invalid when they're motivated by economic self-interest?
Something in people's brains often makes them think they're anonymous when driving their car. Then that assumption gets disastrously disproven when they have to show up in front of a judge.
If you are not that paranoid, you might appreciate the extra camera footage available from passing cars in the event of an accident involving you.
I don't know if they are or not. But why wouldn't they...
The problem is no different from LLMs, though: there is no generalized understanding, so they cannot differentiate the more abstract notion of context. As an easy-to-understand example: if you see a stop sign with a sticker below it that says "for no one", you might laugh to yourself, understanding that in context it does not override the actual sign. It's just a sticker. But L(V)LMs cannot compartmentalize and "sandbox" information like that; all information is processed equally. The best you can do is add lots of adversarial examples and hope the machine learns the general pattern, but there is no inherent mechanism in these models to compartmentalize that kind of information or to differentiate this nuance of context.
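For what it's worth, the "add lots of adversarial examples" mitigation looks roughly like this in practice. A minimal sketch; every name here (Example, paste_sticker) is made up for illustration, not anyone's real pipeline:

```python
# Sketch of adversarial data augmentation: duplicate training examples with
# misleading "stickers" composited in, while keeping the original label.
# All names are illustrative, not a real API.
import random
from dataclasses import dataclass

@dataclass
class Example:
    pixels: list   # stand-in for an image tensor
    label: str     # ground truth, e.g. "STOP"

def paste_sticker(pixels, sticker):
    # Toy stand-in; a real pipeline would composite the sticker into the image.
    return pixels + sticker

def augment_with_stickers(dataset, stickers, rate=0.3):
    """Add stickered copies of some examples with the label unchanged.
    There is no way to tell the model "this region is just a sticker",
    so the only lever is showing it many counterexamples."""
    out = []
    for ex in dataset:
        out.append(ex)
        if random.random() < rate:
            out.append(Example(paste_sticker(ex.pixels, random.choice(stickers)),
                               ex.label))  # label stays STOP
    return out
```

Note that nothing in the model itself changes here; you're just hoping the statistical pattern "ignore stickers" falls out of the data.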
I think the funny thing is that the more we adopt these systems the more accurate the depiction of hacking in the show Upload[0] looks.
[0] https://www.youtube.com/watch?v=ziUqA7h-kQc
Edit:
Because I linked elsewhere and people seem to doubt this, here is Waymo a few years back talking about incorporating Gemini[1].
Also, here is the DriveLM dataset, mentioned in the article[2]. Tesla has mentioned that they use a "LLM inspired" system and that they approach the task like an image captioning task[3]. And here's 1X talking about their "world model" using a VLM[4].
I mean come on guys, that's what this stuff is about. I'm not singling these companies out; I'm using them as examples. This is how the field does things, not just them. People are really trying to embody the AI, and the whole point of going towards AGI is to be able to accomplish any task. That Genie project on the front page yesterday? It is far, far more about robots than it is about videogames.
[1] https://waymo.com/blog/2024/10/introducing-emma/
[2] https://github.com/OpenDriveLab/DriveLM
Every now and then I'll GPS somewhere and there will be a phantom stop sign in the route, and I chuckle to myself because it means the Google car drove through when one of these signs was "fresh".
They never fixed any of them. I don't think the DPW cares. These intersections just turned back into the 2-way stops they had been for decades prior.
Compliance probably technically went up since you no longer have the bulk of the traffic rolling it.
right of way
A 4-way stop does perform better than a roundabout given highly disparate traffic volumes, because roundabouts suffer from resource starvation in that scenario, whereas 4-way stops are starvation-free.
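A toy queueing simulation shows the effect, under assumptions I'm making up (one near-saturated approach, one light approach, one departure per tick):

```python
# Toy simulation: roundabout priority rules starve a light approach when the
# circulating flow is near saturation; round-robin (a 4-way stop) does not.
import random
random.seed(0)

def simulate(roundabout: bool, ticks: int = 100_000):
    heavy_q = light_q = served_light = 0
    turn = 0
    for _ in range(ticks):
        heavy_q += random.random() < 0.99  # near-saturated main flow
        light_q += random.random() < 0.05  # light cross flow
        if roundabout:
            # Circulating (heavy) traffic has priority; the light approach
            # only enters when a gap appears, and gaps almost never appear.
            if heavy_q:
                heavy_q -= 1
            elif light_q:
                light_q -= 1
                served_light += 1
        else:
            # 4-way stop: approaches take strict turns (work-conserving
            # round-robin), so the light approach is never locked out.
            if light_q and (turn == 1 or not heavy_q):
                light_q -= 1
                served_light += 1
            elif heavy_q:
                heavy_q -= 1
            turn ^= 1
    return served_light, light_q  # cars served vs. still stuck in queue

print("roundabout:", simulate(True))   # few served, large growing backlog
print("4-way stop:", simulate(False))  # essentially everyone gets served
```

With the roundabout rule, the light approach is only served in the roughly 1% of ticks where the heavy queue happens to be empty, so its backlog grows without bound; round-robin serves it essentially on arrival.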
Which is what it was for the first 70yr... And what most of them in this particular neighborhood still are, with a 0-6mo intermission.
> Powered by Gemini, a multimodal large language model developed by Google, EMMA employs a unified, end-to-end trained model to generate future trajectories for autonomous vehicles directly from sensor data. Trained and fine-tuned specifically for autonomous driving, EMMA leverages Gemini’s extensive world knowledge to better understand complex scenarios on the road.
https://waymo.com/blog/2024/10/introducing-emma/

We will not have achieved true AGI till we start seeing bumper stickers (especially Saturday mornings) that say "This Waymo Brakes for Yard Sales"
> While EMMA shows great promise, we recognize several of its challenges. EMMA's current limitations in processing long-term video sequences restricts its ability to reason about real-time driving scenarios — long-term memory would be crucial in enabling EMMA to anticipate and respond in complex evolving situations...
They're still in the process of researching it; nothing in that post implies VLMs are actively being used by those companies for anything in production.
> They're still in the process of researching it
I should have taken more care with which article I linked; I was trying to link something clearer. But mind you, everything Waymo does is "under research."
So let's look at something newer to see if it's been incorporated.
> We will unpack our holistic AI approach, centered around the Waymo Foundation Model, which powers a unified demonstrably safe AI ecosystem that, in turn, drives accelerated, continuous learning and improvement.
> Driving VLM for complex semantic reasoning. This component of our foundation model uses rich camera data and is fine-tuned on Waymo’s driving data and tasks. Trained using Gemini, it leverages Gemini’s extensive world knowledge to better understand rare, novel, and complex semantic scenarios on the road.
> Both encoders feed into Waymo’s World Decoder, which uses these inputs to predict other road users behaviors, produce high-definition maps, generate trajectories for the vehicle, and signals for trajectory validation.
They also go on to explain model distillation. Read the whole thing, it's not long: https://waymo.com/blog/2025/12/demonstrably-safe-ai-for-auto...
But you could also read the actual research paper... or any of their papers. All of them in the last year are focused on multimodality and a generalist model for a reason, which I think is not hard to figure out since they spell it out.
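As I read the quoted description, the dataflow is roughly the following. This is my paraphrase in stub Python, not Waymo's code; all names are mine:

```python
# Rough paraphrase of the quoted architecture: a sensor encoder plus a
# Gemini-tuned driving VLM, both feeding a shared "World Decoder".
def sensor_encoder(lidar, radar, camera):
    return {"geometry": (lidar, radar, camera)}    # stub

def driving_vlm(camera):
    return {"semantics": "scene-level reasoning"}  # stub for the Gemini-tuned VLM

def world_decoder(features, semantics):
    # Per the quote: predict road-user behavior, produce HD maps, generate
    # ego trajectories, and emit trajectory-validation signals.
    return {"agent_predictions": ..., "hd_map": ...,
            "ego_trajectory": ..., "validation_signals": ...}

def plan(lidar, radar, camera):
    return world_decoder(sensor_encoder(lidar, radar, camera),
                         driving_vlm(camera))
```

The distillation they mention presumably amounts to training a smaller onboard model to imitate the driving VLM's outputs, so the big model doesn't have to run in the car.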
Waymo might have taxis that work on nice daytime streets (but with remote "drone operators"). But dollars to doughnuts someone will try something like this on a Waymo taxi the minute it hits the reddit front page.
The business model of self-driving cars does not include building separated roadways and junctions. I suspect long-distance passenger service and light loads are viable (most highways can be expanded to have one or more robo-lanes), but cities will most likely have drone operators keeping things going, plus autonomous systems for handling loss of connection, etc. The business models are there; they just don't look like KITT, sadly.
and once this video gets posted to reddit, an hour later every waymo in the world will be in a ditch
https://developer.nvidia.com/blog/updating-classifier-evasio...
I expect a self-driving car to be able to read and follow a handwritten sign saying, say, "Accident ahaed. Use right lane." despite the typo and the fact that it hasn't seen this kind of sign before. I'd expect a human to pay it due attention, too.
I would not expect a human to follow the sign in the article ("Proceed") in the case illustrated where there were pedestrians already crossing the road and this would cause a collision. Even if a human driver takes the sign seriously, he knows that collision avoidance takes priority over any signage.
There is something wrong with a model that has the opposite behaviour here.
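In planner terms, the hierarchy being argued for looks something like this. A sketch of the principle only; the names are illustrative, not any real stack's API:

```python
# Sketch of the priority ordering: collision avoidance is a hard constraint
# that vetoes whatever any sign, real or spoofed, suggests.
def choose_action(sign_says_proceed: bool, pedestrians_in_path: bool) -> str:
    if pedestrians_in_path:
        return "BRAKE"                 # hard safety constraint, never negotiable
    if sign_says_proceed:
        return "PROCEED_WITH_CAUTION"  # signage is advisory input, nothing more
    return "DEFAULT_RULES"             # otherwise fall back to normal traffic rules
```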