Project Fetch: Phase Two
31 points | 2 hours ago | 5 comments | anthropic.com | HN
didibus
28 minutes ago
I'm getting a bit tired of these disguised adverts.

Here's how non-robotics engineers used AI to do a short robot integration task faster than other non-robotics engineers without AI.

Where "better" mostly means faster, and who knows what happens on longer horizons, with actual robotics experts, robustness requirements, or tasks where the hard part is control rather than API spelunking.

dragonwriter
18 minutes ago
> I'm getting a bit tired of these disguised adverts.

It's not disguised. Corporate blogs exist overtly to promote the company and its work.

Disguised promotion, where notionally independent media publish promotional pieces as news while concealing that they were fed to them by the party whose products they promote, is a thing, but this is just the most overt, undisguised kind of promotion.

bob778
1 hour ago
> Preliminary trials with Claude Mythos Preview showed that it would not provide an apples-to-apples comparison with other models because of how we had set up the experiment and how the model was served.

What does this mean? My guess is they couldn’t co-locate Mythos close enough to reduce latency?

(I’m assuming this experiment pre-dates the export controls)

georgemcbay
1 hour ago
> My guess is they couldn’t co-locate Mythos close enough to reduce latency?

I doubt network latency is the reason. Even when connecting from literally across the world, network latency is lost in the noise of the overall response latency of even fast models.

The overall response latency of the model very well could have been the difference, though. AFAIK Mythos is structured to do relatively slow "deep thinking".

bannable
44 minutes ago
Depending on the timeline, it could be that they're not allowed to access Mythos because of something like non-US citizens on the team, or because they lack some way to meet the constraints the DOD has them under.
georgemcbay
35 minutes ago
I strongly suspect that if that were the case, they would have just directly said Mythos couldn't be used for that reason; it would be less confusing and less suspect messaging than saying it wasn't an "apples-to-apples comparison".
jascha_eng
58 minutes ago
This mostly reads as a comparison between Opus 4.7 and 4.1. It would be more interesting if they reran the experiment against a team of humans with 4.7 and saw how much the humans still improve the results today.
joshu
59 minutes ago
stop trying to make fetch happen
etchalon
43 minutes ago
Do you want Terminators? Because this is how you get Terminators.