An Introduction to YOLO26
72 points | 7 hours ago | 14 comments | blog.roboflow.com | HN
pzo
1 hour ago
[-]
FWIW, there are many more alternatives today with better licenses. Here is a good meta repo for object detection with different model variants:

https://github.com/LibreYOLO/libreyolo

reply
esquire_900
5 hours ago
[-]
We've been running YOLO for a number of years (since v5) on soccer videos. None of the recent iterations have been significantly better, with v26 scoring worse than v9 and v11 on our tasks. Makes me wonder why this version is being pushed by Roboflow and Ultralytics.
reply
yfontana
2 hours ago
[-]
Can't speak for v26, but a year ago I worked on a project that migrated from v5 to v11 because of improved image segmentation capabilities. My understanding is that the newer versions don't necessarily have better precision/recall, but they tend to be faster for equivalent results and have more capabilities.
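
For readers comparing versions on their own data, here is a minimal sketch of how precision and recall can be computed for box detections on a single image. It is not taken from any YOLO codebase: the function names, the greedy one-to-one matching, and the 0.5 IoU threshold are illustrative assumptions, and real benchmarks usually report mAP across several thresholds instead.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes.

    Assumes `predictions` is already sorted by descending confidence.
    The 0.5 threshold is an illustrative default, not a standard you must use.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        # Pick the still-unmatched ground-truth box that overlaps most.
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    false_negatives = len(unmatched)
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = (
        true_positives / (true_positives + false_negatives)
        if ground_truth
        else 0.0
    )
    return precision, recall
```

Running the same function over the outputs of two model versions on the same labeled frames gives a like-for-like comparison, which is roughly what "equivalent results" means here.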
reply
teruakohatu
4 hours ago
[-]
When I was working with YOLO models, it did seem like there was little practical improvement between the spinoff models. It seemed people were pushing new models for personal recognition since the original creator stopped working on it.

That said, many of the claimed improvements in this model are efficiency-related.

reply
GL26
2 hours ago
[-]
What I find cool is not the model itself, but the architectures and training methods that make the model better. They open up new possibilities for other fields of AI (notably if you want to fine-tune other CV models).
reply
Onavo
4 hours ago
[-]
The original YOLO author quit long ago for ethical reasons.
reply
utopiah
4 hours ago
[-]
Despite having a very memorable paper on the topic, I believe they now work at Ai2.
reply
geuis
3 hours ago
[-]
Was evaluating YOLO26 within the last month for its on-device (iPhone 16 Pro) segmentation capabilities. It's decent, but its biggest limitation is that it's only trained on 80 COCO classes (meaning pre-labeled images). If whatever is in your images isn't in the 80 classes, it's invisible to YOLO26. Conversely, I have SAM2 running on-device and it's my current workhorse. The biggest benefit with SAM2 for me is that it does fine-grained segmentation masks but isn't trained on labeled images. This was a specific requirement for the app I'm building. SAM2 isn't anywhere near as speedy as the native Vision framework APIs, but it is more capable across a vastly wider array of potential image targets.
reply
larodi
2 hours ago
[-]
I would prefer GroundingDINO, which is a sort of SAM and DINO combo that does open vocabulary.
reply
geuis
17 minutes ago
[-]
Doesn't work for my use case. GroundingDINO is a text-to-bounding-box model. SAM2 supports coordinate-based masks (user taps or clicks somewhere in an image), which is what my research app needs.
reply
alex_duf
46 minutes ago
[-]
I'm sure the model is capable, but I find it funny that the sample image that contains three bears gets detected as two elephants.
reply
speedgoose
4 hours ago
[-]
I found that while CLIPSeg is slower than YOLOn, it is still pretty fast and it gave me much, much better results without training.

If you want to detect objects and speed is important, so you can't use an LLM architecture, you can give it a try too.

reply
maelito
53 minutes ago
[-]
Can it measure the speed of a car in a video?
reply
MaxikCZ
43 minutes ago
[-]
Same question, same answer: In pixels/second? Sure!

What are you trying to accomplish with these questions? Are you genuinely asking, or just baiting? If the former, didn't the answers to your previous question make it clear that your question makes less sense than you might assume?
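
To make the "pixels/second" point concrete, here is a rough sketch of what a speed estimate on top of a detector could look like. A detector only returns boxes per frame, so everything below is assumed rather than provided by YOLO26: `track` stands for the box centers of one car already linked across consecutive frames by some tracker, and `meters_per_pixel` is a hypothetical calibration constant that only holds when the car moves in a plane at a roughly constant distance from the camera.

```python
import math


def speed_px_per_s(track, fps):
    """Average speed of a tracked box center in pixels per second.

    track: list of (x, y) centers, one per consecutive video frame.
    fps:   frame rate of the video.
    """
    if len(track) < 2:
        return 0.0
    # Sum the frame-to-frame displacements along the path.
    distance = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    elapsed = (len(track) - 1) / fps
    return distance / elapsed


def speed_kmh(track, fps, meters_per_pixel):
    """Convert to km/h using an assumed, externally measured scale factor."""
    return speed_px_per_s(track, fps) * meters_per_pixel * 3.6
```

Without that calibration step (a known road marking length, camera geometry, or similar), pixels per second is the only speed the video alone can give you.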

reply
larodi
3 hours ago
[-]
One thing I don't get is why the article is credited to 'Contributing Author'.

Meanwhile, their very own Peter Skalski already does a super job with his write-ups and examples of all sorts of YOLO models, and is well respected.

reply
yurimo
4 hours ago
[-]
Wow I'm old, I still remember working with YOLOv2.
reply
Tepix
4 hours ago
[-]
With some previous versions of YOLO, I've found pages that run it in real time locally in your browser, analyzing the webcam.

Is there a demo like that available for YOLO26?

reply
GL26
2 hours ago
[-]
reply
pritambarhate
2 hours ago
[-]
Is the license for this AGPL? Can someone please confirm?
reply
Alles
3 hours ago
[-]
Reminder that Ultralytics is pushing AGPL in a very overreaching way with their models, which is why they are not available in Frigate:

https://github.com/blakeblackshear/frigate/pull/10717

reply
ktallett
5 hours ago
[-]
I am curious why there is no desire to produce a paper showcasing key details.
reply
teleforce
4 hours ago
[-]
Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models:

https://arxiv.org/abs/2606.03748

reply
steinvakt2
1 hour ago
[-]
Just a reminder that RF-DETR is better than YOLO26.
reply
m00dy
4 hours ago
[-]
I've used YOLO26 in one of my projects. It was very easy to train on our custom dataset and also very easy to deploy, even in Rust with AVX2 support. The model is indeed fast and can be used for near-real-time inference.
reply