It actually depends on the uArch; Apple silicon doesn't seem to have this restriction: https://news.ycombinator.com/item?id=43888005
> In a non-inlined, non-PGO scenario the compiler doesn't have enough information to tell whether the optimization is suitable.
I guess you're talking about stores and loads across function boundaries?
Trivia: X86 LLVM creates a whole Pass just to prevent this partial-store-to-load issue on Intel CPUs: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Targ...
Would that failure be significantly worse than separate loading?
Just negating the optimization wouldn't be much of a reason against doing it. A single load is simpler and, in the general case, faster.
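For concreteness, here's a rough Rust sketch of the two strategies under discussion (the struct and function names are made up, this is not anything from rav1d): comparing the two u16 fields separately versus packing them and comparing once.

    #[derive(Clone, Copy)]
    #[repr(C)]
    struct Pair {
        a: u16,
        b: u16,
    }

    // Two separate 16-bit loads and compares, field by field.
    fn eq_fields(x: Pair, y: Pair) -> bool {
        x.a == y.a && x.b == y.b
    }

    // Pack both fields into one u32 and compare once. Whether this ends up as
    // a single 32-bit load, and whether such a load stalls right after a
    // 16-bit store to one half (the store-to-load forwarding issue above),
    // depends on the compiler and the uArch.
    fn eq_packed(x: Pair, y: Pair) -> bool {
        let pack = |p: Pair| (p.a as u32) | ((p.b as u32) << 16);
        pack(x) == pack(y)
    }

    fn main() {
        let p = Pair { a: 1, b: 2 };
        let q = Pair { a: 1, b: 2 };
        assert_eq!(eq_fields(p, q), eq_packed(p, q));
    }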
Rust is absolutely an improvement over C in every way.
Given this C code:
    typedef struct { uint16_t a, b; } pair;

    int eq_copy(pair a, pair b) {
        return a.a == b.a && a.b == b.b;
    }

    int eq_ref(pair *a, pair *b) {
        return a->a == b->a && a->b == b->b;
    }
Clang generates clean code for the eq_copy variant, but complex code for the eq_ref variant. GCC emits pretty complex code for both variants. For example, here's eq_ref from gcc -O2:
    eq_ref:
            movzx   edx, WORD PTR [rsi]
            xor     eax, eax
            cmp     WORD PTR [rdi], dx
            je      .L9
            ret
    .L9:
            movzx   eax, WORD PTR [rsi+2]
            cmp     WORD PTR [rdi+2], ax
            sete    al
            movzx   eax, al
            ret
Have a play around: https://c.godbolt.org/z/79Eaa3jYf

https://github.com/memorysafety/rav1d/issues/1294
Is that explained in replies? I only see the original tweet as I'm not logged in.
https://code.videolan.org/videolan/dav1d/-/merge_requests/17...
I mean sure, max performance is great if you control every part of your pipeline, but if you're accepting untrusted data from users at large, ffmpeg has at least a half-dozen remotely exploitable CVEs a year. Better make sure your sandbox is tight.
https://ffmpeg.org/security.html
I feel like there's a middle ground where everyone works towards a secure and fast solution, rather than whatever position they've staked out here.
What I have found is that they (like many others who do great work) have very little tolerance for random junior language fanboys criticizing their decades of work without even understanding what they're talking about and constantly throwing out silly rewrite ideas.
"Because substantial amounts of human and financial resources go into these rust ports that are inferior to the originals. Orders of magnitude more resources than the originals which remain extremely understaffed/underfunded." -- https://x.com/FFmpeg/status/1924149949949775980
"... And we get this instead: <xz backdoor subtweet>" -- https://x.com/FFmpeg/status/1924153020352225790
"They [rust ports] are superior in the same way Esperanto is also superior to English." -- https://x.com/FFmpeg/status/1924154854051557494
It's kind of sad to see that snarky attitude. Clearly the corporate sponsors _want_ a more secure decoder. Maybe they should try and work _with_ the system instead of wasting energy on sarcasm on Twitter?
The SQLite folks, half of Linux, and other maintainers have encountered the same kind of zealotry. Dealing with language supremacism is annoying, and I don’t blame ffmpeg for venting.
In fact, I’d even say that twitter thread is informative, because it demonstrates how big tech funds its own pet projects over the actual maintainers.
What's the alternative?
ffmpeg is a monopoly in the space which means that you either take the exact set of tradeoffs they offer, or... well, you have no alternatives, so take it.
Of course the alternatives are never going to be as good as the originals until they've had more effort put into them. It took _years_ until the Rust gzip/zip libraries surpassed the C ones while being more secure overall.
† If you're a human. If you're an ostrich this is not impressive, but on the whole ostriches aren't competing in the Olympic 100 metre sprint.
Have you tried manually defining the alignment of the Rust struct?
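In case it helps, a minimal sketch of what that could look like (names are illustrative, not taken from rav1d): raising the alignment to 4 so a reference to the struct always permits a single aligned 32-bit access.

    // Hypothetical example: force 4-byte alignment on a 2x u16 struct so the
    // whole thing can be read and compared as one aligned 32-bit unit.
    #[derive(Clone, Copy, PartialEq)]
    #[repr(C, align(4))]
    struct Pair {
        a: u16,
        b: u16,
    }

    fn main() {
        assert_eq!(std::mem::align_of::<Pair>(), 4);
        assert_eq!(std::mem::size_of::<Pair>(), 4);
    }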
† Or an equivalent special purpose language, but WUFFS is right there
I suspect the future of video compression will also include frame generation, like what is currently being done for video games. Essentially, you have, say, 12 fps video, but your video card can fill in the intermediate frames via what is basically generative AI, so you get 120 fps output with smooth motion. I imagine that will never be something that WUFFS is best suited for.
All of these things are bounded for actual codecs. AV1 allows storing at most 8 reference frames. The sequence header will specify a maximum allowable resolution for any frame. The number of motion vectors is fixed once you know the resolution. Film grain requires only a single additional buffer. There are "levels" specified which ensure interoperability at common operating points (e.g., 4k) without even relying on the sequence header (you just reject sequences that fall outside the limits). Those are mostly intended for hardware, but there is no reason a software decoder could not take advantage of them. As long as codecs are designed to be implemented in hardware, this will be possible.
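As a back-of-the-envelope illustration (my own numbers, assuming an 8-bit 4:2:0 layout; this is not how dav1d/rav1d actually size their buffers): once the sequence header caps the resolution, every buffer has a computable upper bound.

    const MAX_REF_FRAMES: usize = 8; // AV1 keeps at most 8 reference frames

    // One 8-bit 4:2:0 frame: full-res luma plus two quarter-size chroma planes.
    fn worst_case_frame_bytes(max_w: usize, max_h: usize) -> usize {
        max_w * max_h * 3 / 2
    }

    // 8 reference frames + the frame being decoded + one film-grain buffer.
    fn worst_case_decoder_bytes(max_w: usize, max_h: usize) -> usize {
        worst_case_frame_bytes(max_w, max_h) * (MAX_REF_FRAMES + 2)
    }

    fn main() {
        // e.g. a sequence header capping the stream at 4K
        println!("{} MiB", worst_case_decoder_bytes(3840, 2160) / (1 << 20));
    }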
That's how most video codecs work already. They try to "guess" what the next frame will be, based on past (for P-frames) and future (for B-frames) frames. The difference is that the codec encodes some metadata to help with the process and also the difference between the predicted frame and the real frame.
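As a toy illustration of that last sentence (a sketch on a single 8-bit plane; real codecs predict per block via motion vectors and transform/quantize the residual):

    // Encoder side: store only the difference between the real frame and the
    // prediction.
    fn residual(actual: &[u8], predicted: &[u8]) -> Vec<i16> {
        actual.iter().zip(predicted)
            .map(|(&a, &p)| a as i16 - p as i16)
            .collect()
    }

    // Decoder side: rebuild the frame from its own prediction plus the residual.
    fn reconstruct(predicted: &[u8], residual: &[i16]) -> Vec<u8> {
        predicted.iter().zip(residual)
            .map(|(&p, &r)| (p as i16 + r).clamp(0, 255) as u8)
            .collect()
    }

    fn main() {
        let predicted = vec![100u8, 120, 130, 140]; // the "guess" from frame N-1
        let actual = vec![101u8, 119, 131, 142];    // the real frame N
        let r = residual(&actual, &predicted);
        assert_eq!(reconstruct(&predicted, &r), actual);
    }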
As for using AI techniques to improve prediction, it is not a new thing at all. Many algorithms optimized for compression ratio use neural nets, but these tend to be too computationally expensive for general use. In fact the Hutter prize considers text compression as an AI/AGI problem.
In any case I get what you're saying and I understand why codecs are going to be dynamically allocating memory, so thanks for that.
But to do that they have to keep state and do computations on that state. If you've got frame 47 being a P frame, that means you need frame 46 to decode it correctly. Or frame 47 might be a B frame in which case you need frame 46 and possibly also frame 48 - which means you're having to unpack frames "ahead" of yourself and then keep them around for the next decode.
I think that all counts as "dynamic state"?
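A crude sketch of that state (types and capacity are made up, not from any real decoder): a small pool of recently decoded frames that later P/B frames can reference.

    use std::collections::VecDeque;

    struct Frame {
        number: u64,
        pixels: Vec<u8>,
    }

    struct RefPool {
        max_refs: usize,         // bounded, e.g. AV1's 8 reference slots
        frames: VecDeque<Frame>, // recently decoded frames kept for prediction
    }

    impl RefPool {
        fn new(max_refs: usize) -> Self {
            Self { max_refs, frames: VecDeque::new() }
        }

        // Look up an already-decoded frame (earlier for P, possibly later for B).
        fn get(&self, number: u64) -> Option<&Frame> {
            self.frames.iter().find(|f| f.number == number)
        }

        // Keep a newly decoded frame around, evicting the oldest when full.
        fn push(&mut self, frame: Frame) {
            if self.frames.len() == self.max_refs {
                self.frames.pop_front();
            }
            self.frames.push_back(frame);
        }
    }

    fn main() {
        let mut pool = RefPool::new(8);
        pool.push(Frame { number: 46, pixels: vec![0; 16] });
        // Frame 47 (a P frame) can now be predicted from frame 46.
        assert!(pool.get(46).is_some());
    }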
I'm not a codec developer, I'm only coming at this from an outside/intuitive perspective. Generally, performance-concerned parties want to minimize heap allocations, so I'm interested in how this applies to codec architecture. Codecs seem so complex to me, with so much inscrutable shit going on, but then heap allocations aren't optimized out? Seems like there has to be a very good reason for this.
That's not quite the case for encoding - that's where things get murky, since you have way more freedom in what you can do to compress better.
There's a heck of a lot of distance from “not a lot” to “zero”, though.
Edit: If I had read the next paragraph, I'd have learned about [1] before commenting
SVT-AV1-PSY is particularly interesting to read up on as well.
Nicholas Nethercote's "How to speed up the Rust compiler" writings[1] fall into this same category for me.
Any others?
Real is about the only other codec I see that could be a name, but nobody uses that anymore.
[1] https://github.com/Voultapher/sort-research-rs/blob/main/wri...
I wouldn't say your article is too technical; it does go a bit deeper into details, but new concepts are explained well and at a level I found suitable for myself. Having said that, several times I felt that the text was a bit verbose. Using more succinct phrasing needs, of course, a lot of additional effort, but… I guess it's a kind of optimization as well. :)
I've been trying to find that article ever since but I'm not able to. Anyone knows the article I'm talking about?
AV1 hardware decoders are still rare so your device was probably resorting to software decoding, which is not ideal.
I don't know instagram, but I would expect any provider to be able to handle almost any container/codec/resolution combination going (they likely use ffmpeg underneath) and generate their different output formats at different bitrates for different playback devices.
Either instagram won't accept av1 (seems unlikely) or they just haven't processed it yet as you infer.
I'd love to know why your comment is greyed out.
With the bitrate set to 100MB/s it happily encodes 2160p or even 3240p, the maximum resolution available when using Virtual Super Resolution (which renders above native res and downsamples; it's awesome for titles without resolution scaling when you don't want to use TAA).
https://goughlui.com/2024/01/07/video-codec-round-up-2023-pa...
The ITU standards have had a lot better record of inclusion in devices that people actually have; and often using hardware encode/decode takes care of licensing. But hardware encode doesn't always have the same quality/bitrate as software and may not be able to do fancier things like simulcast or svc. Some of the hardware decoders are pretty picky about what kinds of streams they'll accept too.
IMHO, if you're looking at software h.264 vs software vp9, I think vp9 is likely to give you better quality at a given bitrate, but will take more cpu to do it. So, as always, it depends.
That's a pretty messy way to measure. h.264 with more CPU can also beat h.264 with less CPU.
How does the quality compare if you hold both bitrate and CPU constant?
How does the CPU compare if you hold both bitrate and quality constant?
AV1 will do significantly better than h.264 on both of those tests. How does VP9 do?
They shifted to h.264 successfully, but I haven't heard of any more conferences to move forward in over a decade.
Currently "The Last of US S02E06" only has one AV1 - https://thepiratebay.org/search.php?q=The+Last+of+Us+S02E06 same THMT - https://thepiratebay.org/search.php?q=The+Handmaids+Tale+S06... These are low quality at only ~600MB, not really early adopter sizes.
AV1 beats h.265 but not h.266 - https://www.preprints.org/manuscript/202402.0869/v1 - People disagree with this paper on default settings
Things like getting hardware to The Scene for encoding might help, but I'm not sure of the bottleneck, it might be bureaucratic or educational or cultural.
[edit] "Common Side Effects S01E04" AV1 is the strongest torrent, that's cool - https://thepiratebay.org/search.php?q=Common+Side+Effects+S0...
There is one large exception, but I don't know the current scene well enough to know if it matters: sources that are grainy. I have some DVDs and Blu-rays with high grain content, and AV1 can work wonders with those thanks to the in-loop grain filter and synthesis -- we are talking half the size for a high-quality encode. If I were to encode them for AVC at any reasonable bitrate, I would probably run a grain-removal filter, which is very finicky if you don't want to end up with something that is overly blurry.
In my case, I get both 4k (h265) and 1080p (h264) blurays and let the client select.