Detecting AV1-encoded videos with Python (alexwlchan.net)
19 points | 4 days ago | 7 comments
breve
3 hours ago
[-]
> I’ve saved some AV1-encoded videos that I can’t play on my iPhone.

Sure you can. Install VLC on your phone and you'll be able to play the AV1 videos. Even the iPhone 7 released in 2016 can play AV1 video.

Don't agonise over battery life. The dav1d decoder for AV1 is great:

https://www.reddit.com/r/AV1/comments/1cf7eti/av1_dav1d_play...

https://www.reddit.com/r/AV1/comments/1cg2wv4/dav1d_battery_...

https://www.reddit.com/r/AV1/comments/1cgyace/dav1d_battery_...

https://www.reddit.com/r/AV1/comments/1chpz2r/dav1d_battery_...

reply
monster_truck
2 hours ago
[-]
It's not just great. It's so good that even on Android phones much older than the ones tested in those links, screen brightness has a bigger impact on battery life than the decoding does.

This is by design, so that even extremely dated smart TVs and the like can also benefit from the bandwidth savings.

Fun fact: I can't say which, but some of the oldest devices (smart TVs, home security products, etc.) work around their dated hardware decoders by buzzsawing 4K video in half, running each piece through the decoder at a resolution it supports, then stitching the halves back together.

reply
zahlman
3 hours ago
[-]

    av1_videos = {
        p
        for p in glob.glob("**/*.mp4", recursive=True)
        if is_av1_video(p)
    }

    assert av1_videos == set()
Building a set just to check if it's empty is a bit more complexity than necessary. A more direct way that also bails out early:

    assert not any(is_av1_video(p) for p in glob.glob("**/*.mp4", recursive=True))
Equivalently (de Morgan's law):

    assert all(not is_av1_video(p) for p in glob.glob("**/*.mp4", recursive=True))
reply
KwanEsq
3 hours ago
[-]
> A more direct way that also bails out early

If it bails out early it is of no use to them.

> This means that if the test fails, I can see all the affected videos at once. If the test failed on the first AV1 video, I’d only know about one video at a time, which would slow me down.

reply
Scaevolus
2 hours ago
[-]
Note that ffprobe can output JSON, which is much easier to handle than CSV. I have this snippet in my bashrc:

    ffpj() { for f in "$@"; do ffprobe -v quiet -print_format json -show_format -show_streams "$f"; done }
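
And for doing the same check from Python rather than the shell, here's a rough sketch (the `has_av1_stream` helper name is mine, not the article's) that parses ffprobe's JSON output and flags AV1 streams:

    import json
    import subprocess

    def has_av1_stream(path):
        # Ask ffprobe for stream metadata as JSON (assumes ffprobe is on PATH).
        out = subprocess.check_output([
            "ffprobe", "-v", "quiet",
            "-print_format", "json", "-show_streams",
            path,
        ])
        streams = json.loads(out).get("streams", [])
        return any(s.get("codec_name") == "av1" for s in streams)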

reply
wolttam
4 hours ago
[-]
Somehow I thought this was going to be about detecting AV1 based on the decoded video frames, which would have been interesting!
reply
avidiax
3 hours ago
[-]
Yeah, I would think that the simulated grain of AV1 might be characterizable, even though, IIRC, it is pretty sophisticated.
reply
crazygringo
2 hours ago
[-]
This is a perfectly fine blog post, but it's about something so basic that I don't understand why it was submitted to HN.

Yes, ffprobe and mediainfo are the two common tools for this. It just feels like something that belongs as the answer to an everyday StackOverflow question, not on the front page of HN.

reply
avidiax
3 hours ago
[-]
My first question is, where is this guy getting AV1 videos? Never seen these on the high seas.

Also, given that these videos are going to be reencoded, which is tremendously expensive, I feel that any optimization in this step is basically premature. Naively launching ffprobe 10,000 times is probably still less heavyweight than 1 reencode.

reply
breve
3 hours ago
[-]
YouTube encodes video to AV1.

Right click on a YouTube video and select "Stats for Nerds" to see which format it's using in your browser. AV1 will be something like "av01.0.09M.08".

You've probably watched a lot of AV1 video without realising it.

reply
monster_truck
2 hours ago
[-]
I exclusively download AV1 encodes from places like tbp. It has fantastic quality for the filesize, and AV1 also benefits the most from the trick of encoding SDR content in 10 bit (more accurate quantization at a smaller size). Crazy that we can fit ~two hours of 1080p video at better than Netflix quality (they bias their PSNR/etc. a little low for my eyes) on a single CD.

I'm not sure it's fair to call reencodes expensive. Sure, it's expensive relative to using ffprobe, but any 40-series Nvidia GPU with 2 NVENC engines can handle five(?) simultaneous realtime encodes, or will get up to near 180fps if it isn't being streamed. Our "we have AJA at home" box with four of them churned through something like 20,000 hours of video in just under two weeks.

reply
avidiax
26 minutes ago
[-]
My understanding is that you shouldn't be using HW accelerated encoding for any archival purpose except realtime capture.

The quality per bit (PSNR at a given bitrate) is much lower for HW encodes, but the encode rate is typically realtime or better. That's a great tradeoff if you are transcoding so that a device with limited bandwidth can receive the video while streaming, or so that you can encode a raw livestream from a video capture or camera. It's not so great if you are saving to disk and planning to watch multiple times.

reply
KwanEsq
3 hours ago
[-]
Sounds like you're just sailing the wrong seas. Some have plenty of AV1. Though those tend to be more obviously advertised as such, I believe, so perhaps this is about downloads from YouTube.
reply
senand
3 hours ago
[-]
Off-topic, but it’s actually a she
reply
01HNNWZ0MV43FF
3 hours ago
[-]
Maybe he transcoded them. I know some archivers who download in H.264 but then transcode to H.265 to save on disk. (I guess they don't seed?)
reply
nick238
3 hours ago
[-]
Is launching an ffmpeg process so heavyweight that there's a reason to avoid it? If anything, it feels like it would trivialize parallelism, which is probably a feature, not a bug, if you have a bunch of videos to go through.
reply
zahlman
3 hours ago
[-]
TFA claims:

> This is shorter than the ffprobe code, and faster too – testing locally, this is about 3.5× faster than spawning an ffprobe process per file.

And the calls to the MediaInfo wrapper are not really harder to parallelize. `subprocess.check_output` is synchronous, so that code would have to be adapted to spawn in a loop and then collect the results in a queue or something. With the wrapper you basically end up doing the same thing, but with `multiprocessing` instead. And you can then just reuse a few worker processes for the entire job.
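
For concreteness, a rough sketch of that shape (reusing the article's `is_av1_video` from the snippet quoted above; the pool size and glob pattern here are arbitrary):

    import glob
    from multiprocessing import Pool

    def find_av1_videos(pattern="**/*.mp4"):
        paths = glob.glob(pattern, recursive=True)
        # A small pool of worker processes is reused for the whole job,
        # each running the MediaInfo-based is_av1_video() check in parallel.
        with Pool(processes=4) as pool:
            flags = pool.map(is_av1_video, paths)
        return {p for p, flagged in zip(paths, flags) if flagged}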

reply
01HNNWZ0MV43FF
3 hours ago
[-]
Python must have libav bindings somewhere; you could certainly run that check in-process.

Off the top of my head, it's probably in the container metadata, so you'd just need libavformat and not even libavcodec. Pass it a path, open it, scan the list of streams and check the codec magic number?
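
PyAV is one such binding. A minimal sketch along those lines (assuming PyAV reports the codec name as "av1"), which only reads container metadata rather than decoding frames:

    import av  # PyAV: Python bindings for FFmpeg's libav* libraries

    def is_av1_video(path):
        # Open the container and inspect its stream metadata; no frames are decoded.
        with av.open(path) as container:
            return any(
                stream.codec_context.name == "av1"
                for stream in container.streams.video
            )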

reply