FFmpeg merges WebRTC support
877 points | 4 months ago | 28 comments | git.ffmpeg.org
Sean-Der
4 months ago
[-]
I am so incredibly excited for WebRTC broadcasting. I wrote up some reasons in the Broadcast Box[0] README and the OBS PR [1].

Now that GStreamer, OBS and FFmpeg all have WHIP support, we finally have a ubiquitous protocol for video broadcasting across all platforms (mobile, web, embedded, broadcasting software, etc.).

I have been working on Open Source + WebRTC Broadcasting for years now. This is a huge milestone :)

[0] https://github.com/Glimesh/broadcast-box?tab=readme-ov-file#...

[1] https://github.com/obsproject/obs-studio/pull/7926

reply
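Now that the muxer is merged, a WHIP publish from FFmpeg is a one-liner. A minimal sketch (the endpoint and stream key are made up, `-f whip` assumes a build with the new muxer enabled, and the command is printed rather than executed):

```python
# Hypothetical WHIP publish command; the endpoint and stream key are placeholders.
cmd = [
    "ffmpeg", "-re", "-i", "input.mp4",      # read input at native frame rate
    "-c:v", "libx264", "-preset", "veryfast",
    "-tune", "zerolatency",                  # typical low-latency encoder flags
    "-c:a", "libopus",                       # Opus is the usual WebRTC audio codec
    "-f", "whip",                            # select the newly merged WHIP muxer
    "https://broadcast.example.com/whip?streamKey=demo",
]
print(" ".join(cmd))  # pass `cmd` to subprocess.run(cmd) to actually publish
```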
bradly
4 months ago
[-]
That pr is really great work both technically and interpersonally. A fun read for sure. Great work and thank you for your determination.
reply
maxmcd
4 months ago
[-]
Thanks for all your work Sean! It's been a delight to use your webrtc libs and see your impact across a broad range of technical efforts.
reply
Sean-Der
4 months ago
[-]
Thank you :)

When are you coming back to the WebRTC space? Lots more cool stuff you could be doing :) I really loved [0]; it's so cool that a user can access a server behind a firewall/NAT without setting up a VPN or having SSH constantly listening.

[0] https://github.com/maxmcd/webtty

reply
monocularvision
4 months ago
[-]
Your work in this area has been phenomenal. Thank you! I use broadcast-box all the time.
reply
echelon
4 months ago
[-]
What sort of infrastructure do you need for scaling WebRTC multicast?

Are we entering an era where you don't need Amazon's budget to host something like Twitch?

reply
Sean-Der
4 months ago
[-]
Yes we are :) When OBS merges the PR [0] things are going to get very interesting.

Before, you needed to run expensive transcoding jobs to be able to support heterogeneous clients. Once we get Simulcast, the only cost will be bandwidth.

With Hetzner I am paying $1 a TB. With AV1 or H.265 + Simulcast I am getting 4K for hundreds of users on just a single server.

We will have some growing pains, but I am not giving up until I can make this accessible to everyone.

[0] https://github.com/obsproject/obs-studio/pull/10885

reply
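A back-of-the-envelope check of the bandwidth claim above (the ~12 Mbps 4K AV1 bitrate is an assumption for illustration, not a figure from the post):

```python
# Rough egress cost for one hour of a 4K stream to 300 viewers.
bitrate_mbps = 12          # assumed 4K AV1 bitrate
viewers = 300
seconds = 3600

# Megabytes per viewer per hour, then total terabytes for all viewers.
mb_per_viewer = bitrate_mbps / 8 * seconds
egress_tb = mb_per_viewer * viewers / 1e6
cost_usd = egress_tb * 1.0  # the quoted ~$1/TB

print(f"{egress_tb:.2f} TB -> ${cost_usd:.2f} per streamed hour")
```

At these assumed numbers, an hour of 4K to 300 viewers is under two dollars of bandwidth, which is what makes "only cost will be bandwidth" interesting.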
matt-p
4 months ago
[-]
I have found it's hard to get past ~18 Gbps on commodity servers and ~90 Gbps on carefully specced high-end servers. I presume you find the same?
reply
michaelt
4 months ago
[-]
A Twitch 720p stream is only 4 Mbps. 1080p? 6-8 Mbps.

So if you've got ~18 Gbps of upload bandwidth you're ready for 10,000-20,000 viewers.

reply
miki123211
4 months ago
[-]
With how ubiquitous gigabit symmetric is becoming, I wonder if you could even do P2P nowadays.

Assuming a single gigabit symmetric connection could dedicate at most 100 Mbps of upload bandwidth, you'd need one such viewer for every 25 viewers with a worse connection. This feels achievable.

You'd have 1 server that broadcasts to at most 10K tier-1 viewers. Tier-2 viewers get 3 tier-1 IPs for failover, and pre-negotiate a connection with them, e.g. through STUN, to get sub-second failovers in case their primary source quits the stream. A central control plane load-balances these.

With something like 15s of buffer (acceptable for a gaming stream, not so much for sports, where your neighbors might be watching on satellite and cheer), this feels achievable.

reply
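The fan-out in the scheme above works out like this (all figures are the commenter's assumptions, including ~4 Mbps per stream to get the 25:1 ratio):

```python
# One origin feeds tier-1 viewers; each tier-1 peer with spare upload
# re-serves tier-2 viewers (100 Mbps of upload / ~4 Mbps per stream).
tier1 = 10_000
fanout = 100 // 4          # streams one tier-1 peer can re-serve
tier2 = tier1 * fanout

print(tier1 + tier2)       # total audience reachable from a single origin
```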
messe
4 months ago
[-]
> With how ubiquitous gigabit symmetric is becoming, I wonder if you could even do P2P nowadays.

CGNAT is going to make that a hassle.

reply
chaz6
4 months ago
[-]
Ideally any ISP resorting to CGN would be providing IPv6 support, but Tailscale shows that NAT hole-punching can work well [1]. I'm not sure if that's feasible to implement in a web browser though.

[1] https://tailscale.com/blog/how-nat-traversal-works

reply
messe
4 months ago
[-]
It can work okay, but still not perfectly. Before I asked my ISP for a static address, tailscale connections between my place and my partner's only managed to maintain a direct connection half the time. The other half of the time, they required a relay.
reply
naikrovek
4 months ago
[-]
yep. I'm still kind of shocked at how little ipv6 has been deployed.

i'm so effing tired of NAT here and NAT there and NATs in between NATs. NAT complicates things for almost no reason now that ipv6 was released almost 30 years ago. Wait, what? 30 years? Now I'm VERY shocked at how little ipv6 has been deployed.

reply
littlestymaar
4 months ago
[-]
> With how ubiquitous gigabit symmetric is becoming, I wonder if you could even do P2P nowadays.

You don't even need gigabit connections to make P2P work: a video stream is usually 1-10 Mbps, which you can serve peer-to-peer even on a regular connection.

(I used to work for a start-up doing P2P streaming using WebRTC in the browser; we got up to 90% efficiency on live streaming back in 2015, and 80% for VoD.)

I'm still very confused that this technology hasn't become mainstream (there are privacy concerns, but that doesn't usually stop the tech market…).

reply
matt-p
4 months ago
[-]
In theory I think you're right, but you do need a really smart control plane. Just because a connection is fast doesn't mean it's not metered (let's say fast 5G, starlink, rural fiber) and so on.

Are all of the issues P2P brings really worth it?

I'd say this definitely opens up streaming from your desktop, with 1 CPU core handling an SFU for, say, 100 viewers (≈500 Mbps), or from a $5/month VPS if you've not got the capacity at home. That's pretty awesome; for most people there's no need to use P2P.

reply
esseph
4 months ago
[-]
It's not nearly as "ubiquitous" as you may think
reply
matt-p
4 months ago
[-]
Not sure I follow your maths there.

If we assumed an average of 6 Mbps per stream that's 3,000 streams, practically speaking a little lower.

It's all relative I guess but it's not that high.

reply
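For reference, the arithmetic behind this correction, using the two figures already in the thread (an ~18 Gbps uplink and an average 6 Mbps per viewer):

```python
uplink_mbps = 18 * 1000    # ~18 Gbps of server uplink
per_viewer_mbps = 6        # average 1080p-ish stream bitrate

viewers = uplink_mbps // per_viewer_mbps
print(viewers)             # concurrent viewers one such server can feed
```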
ablob
4 months ago
[-]
Multicast enters the room
reply
Evidlo
4 months ago
[-]
Does multicast actually work from a home network to another set of homes over the internet? I thought this traffic would just get dropped by one of the hops along the way if you tried it.

https://networkengineering.stackexchange.com/questions/47994...

reply
ablob
4 months ago
[-]
I think it never took off and is not supported by many, if not most, routers. However, it is an efficient solution for sending live-stream data over the web. I think eventually the pressure to send packets at most once per port, rather than once per consumer, will take over and force providers to implement this or a better solution.
reply
matt-p
4 months ago
[-]
It won't even make it through your home "gateway", in practice it's usable at layer 2 only.
reply
matt-p
4 months ago
[-]
You can't multicast over the internet.
reply
naikrovek
4 months ago
[-]
this is an outrage. i suggest we riot.
reply
esseph
4 months ago
[-]
Finally
reply
bryancoxwell
4 months ago
[-]
I just want to say I am loving your enthusiasm
reply
NetOpWibby
4 months ago
[-]
Thanks to a post on here a few months ago about Cloudflare R2, I finally felt motivated enough to work on a video platform idea I have. I still don’t understand ffmpeg but my transcoder works!

Now I see this news?! Perfect timing. (:

reply
naikrovek
4 months ago
[-]
> I am not giving up until I can make this accessible to everyone.

I like this guy.

reply
xmprt
4 months ago
[-]
Working in the events broadcasting space, this opens up OBS to being a viable alternative to professional software like vMix. Especially the P2P support and support for broadcasting multiple scenes seem extremely valuable to have.
reply
WhyNotHugo
4 months ago
[-]
Are there any video players which can play a webrtc stream? Last I checked, VLC and other popular tools still don’t support it.
reply
numpad0
4 months ago
[-]
[1]:

  gst-launch-1.0 playbin3 uri="gstwebrtc://localhost:8443?peer-id=<webrtcsink-peer-id>"

WebRTC is normally used in bidirectional use cases like video chat with text options, so I don't think it's so odd that VLC doesn't outright support it. VLC does not support dialing into an Asterisk server, either.

[1] https://gstreamer.freedesktop.org/documentation/rswebrtc/web...

reply
RedShift1
4 months ago
[-]
That's impossible, VLC supports everything. If VLC doesn't support it, it doesn't exist.
reply
carlhjerpe
4 months ago
[-]
While that might be true, I've found mpv more approachable when dealing with weird inputs.
reply
mey
4 months ago
[-]
XAVC HS 4k 10Bit HEVC 4:2:2 on Windows.

Plex and ffmpeg, perfectly fine. VLC is not a fan.

reply
oskenso
4 months ago
[-]
I wish vlc supported usf, 2sf and minigsf
reply
byteknight
4 months ago
[-]
Amen.
reply
mortoc
4 months ago
[-]
I'd guess VLC will get support for it soon now that ffmpeg supports it.
reply
Gormo
4 months ago
[-]
Possibly, but VLC maintains its own codec libraries and doesn't rely on FFmpeg.
reply
bilekas
4 months ago
[-]
Maybe I'm wrong, but in this case couldn't you create your own middleware server that consumes the WebRTC stream feed and then streams it out as a regular VLC-consumable feed? I'm guessing there will be some transcoding on the fly, but that should be trivial.
reply
shmerl
4 months ago
[-]
Should ffplay support it if ffmpeg added support for it in general?
reply
rmoriz
4 months ago
[-]
Any plans to add multipath/failover-bonding support? e.g. mobile streaming unit connected with several 5G modems. Some people use a modified SRT to send H.265 over multiple links.
reply
Sean-Der
4 months ago
[-]
Absolutely! Some people have modified libwebrtc to do this today, but it wasn't upstreamed.

ICE (protocol for networking) supports this today. It just needs to get into the software.

reply
1oooqooq
4 months ago
[-]
i was using vnc for remote dosbox gaming on the phone. now i can sink an infinite amount of time into trying to do an input-handler webapp and using this+obs instead! thanks!
reply
athrun
4 months ago
[-]
I've also been trying (and mostly failing) to build such a setup over the last few weeks. What are you thinking in terms of the overall building blocks to get this to work?

I've been struggling to get a proper low-latency screen+audio recording going (on macOS) and streaming that over WebRTC. Either the audio gets de-synced, or the streaming latency is too high.

reply
1oooqooq
4 months ago
[-]
games i plan to play don't care about latency, which solves most of your problems :)

but this+obs+a webapp for input+ydotool to pass the input to dosbox. then i can just open a page on the browser on the phone.

reply
jauntywundrkind
4 months ago
[-]
Not the SCTP parts! It's implementing the WebRTC-HTTP Ingestion Protocol (WHIP), a commonly used low-latency HTTP protocol for talking to a gateway that talks actual WebRTC to peers over WebRTC's SCTP-based protocol. https://www.ietf.org/archive/id/draft-ietf-wish-whip-01.html

I hope some day we can switch to a QUIC- or WebTransport-based P2P protocol rather than use SCTP. QUIC does the SCTP job very well atop existing UDP, rather than adding such wild complexity and variance. One candidate is Media over QUIC (MoQ), but the browser doesn't have P2P QUIC and progress on that stalled out years ago. https://quic.video/ https://datatracker.ietf.org/group/moq/about/

reply
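For the curious, the WHIP part being discussed really is just one HTTP exchange. A minimal sketch of the offer request per draft-ietf-wish-whip (the endpoint and token are made up, and the request is built but not sent):

```python
import urllib.request

def build_whip_offer(endpoint: str, sdp_offer: str, token: str) -> urllib.request.Request:
    """Build (but don't send) the single POST that WHIP uses for signaling.

    Per the draft, the body is an SDP offer; the server replies 201 Created
    with an SDP answer and a Location header used later for teardown (DELETE).
    """
    return urllib.request.Request(
        endpoint,
        data=sdp_offer.encode(),
        headers={
            "Content-Type": "application/sdp",   # required media type
            "Authorization": f"Bearer {token}",  # bearer auth per the draft
        },
        method="POST",
    )

req = build_whip_offer("https://example.com/whip", "v=0\r\n...", "secret")
print(req.get_method(), req.get_header("Content-type"))
```

Everything harder (ICE, DTLS, SRTP) happens after this exchange, which is why the gateway ends up holding all the opinionated decisions.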
Sean-Der
4 months ago
[-]
How would you like to see/use the SCTP parts? I am not sure how to expose them since the WHIP IETF draft makes no mention/suggestion of it.

Most 'WHIP providers' also support DataChannel, but it isn't a standardized thing yet.

reply
jauntywundrkind
4 months ago
[-]
Actual WebRTC's complexity is very high. WHIP seems to be the standard path for most apps to integrate, but it does rely on an exterior service to actually do anything.

Hypothetically, ffmpeg could be an ICE server for peer-connecting, do SDP for stream negotiation (possibly with a side of WHEP, the egress protocol, as well), and do SCTP for the actual stream transfer, such that it could act as a standalone peer rather than offload that work to a gateway service.

Worth noting that GStreamer and OBS are also WHIP-based and rely on an external gateway for their WebRTC support. There's no one clear way to do a bunch of the WebRTC layer cake (albeit WHEP is fairly popular at this point, I think?), so WHIP is a good way to support sending video without having to make a bunch of other decisions that may or may not jive with how someone wants to implement WebRTC in their system; those decisions all live in the WHIP gateway. It may be better to decouple and not try to do it all, which would require specific opinionated approaches.

reply
qwertox
4 months ago
[-]
What does this mean? That websites could connect directly to an FFmpeg instance and receive an audio- and/or video-stream?

Phoronix has a somewhat more informative page: https://www.phoronix.com/news/FFmpeg-Lands-WHIP-Muxer

reply
bigfishrunning
4 months ago
[-]
It means that programs that use the FFmpeg libraries (looks like libavformat specifically) can consume webrtc streams
reply
okdood64
4 months ago
[-]
I still don't understand any practical use cases. Can you give some examples? (I'm not being obtuse here I'm genuinely curious what this can enable now.)
reply
darkvertex
4 months ago
[-]
WebRTC excels at sub-second latency peer to peer, so it's useful anywhere you need near-realtime video.

Say you wanted to do a virtual portal installation connecting views from two different cities with live audio: you could have ffmpeg feed off a professional cinema or DSLR camera with a clean audio feed and stream that over WebRTC into a webpage-based live viewer.

Or say you wanna do a webpage that remote controls a drone or rover robot, it would be great for that.

reply
Sesse__
4 months ago
[-]
The irony is that you don't _actually_ need WebRTC to get subsecond latency; you can fairly reliably get ~100–200ms (plus network latency) with a completely normal TCP stream.[1] But since browsers have effectively standardized on HLS, whose design is completely antithetical to low-latency (you _can_ do low-latency HLS, but only with heroic efforts), low-latency streaming video has never really been part of their bread and butter. So instead, we abuse a _much_ more complicated protocol (WebRTC), because that happens to hit a path that was meant for low-latency videoconferencing.

(I did sub-100ms glass-to-glass streaming with VLC back in the day, so it is eminently possible. But the browser is in your way.)

[1] Much less than that is going to be tricky under non-perfect network conditions, because once you start having any sort of packet drop, you want to move away from TCP's retransmission regime and instead start dropping packets, accept the artifacts for a little while, and then move on.

reply
numpad0
4 months ago
[-]

  node1# nc -u node2 12345 < /dev/fb0
  node2# nc -lu 12345 > /dev/fb0

The "sub-second latency" thing is the standard punchline coming from WebRTC folks, but yes, it's confusing. Nothing can make video flow faster than the above; the only thing you can do by inventing a new standard is minimize the overhead you must add for your purposes.
reply
Sean-Der
4 months ago
[-]
I saw this also. WebRTC just is the path of least resistance/highest adoption at this point.

We could go make a better/simpler standard for video streaming over TCP. What a giant battle that would be, though, and it would never see adoption.

I have accepted/embraced the challenge of making WebRTC as accessible as possible, with stuff like WebRTC for the Curious, in hopes of making it less painful for everyone dealing with the complexity :)

reply
Sesse__
4 months ago
[-]
> We could go make a better/simpler standard for video streaming over TCP. What a giant battle that would be, though, and it would never see adoption.

What do you mean? <video> over HTTP against a stream works; you don't need a new standard. But it's not a low-latency path (you cannot control the buffer).

reply
raymond_goo
4 months ago
[-]
Explanation: HTTP Live Streaming slices the bitstream into “segments” (traditionally 6–10 s each) and only starts playing after it has downloaded several of them. Out of the box that means 30–60 s of startup and live-edge latency—fine for linear TV, terrible for anything interactive.

Apple’s LL-HLS spec shrinks those segments into “partial segments” and uses CMAF to let the player start decoding while a chunk is still arriving. With careful encoder tuning, HTTP/2/3 push, CDN support, and a compatible player you can reach 2–5 s, sometimes a bit lower—but every link in the chain has to cooperate, so implementations are still called “heroic” for a reason.

Safari plays HLS natively and on other browsers you can bolt on hls.js via Media Source Extensions. DASH, RTMP, SRT, etc. all need extra code or plugins, so HLS became the “safe default” for on-demand and broadcast-style streaming even though it isn’t low-latency friendly.

reply
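The latency gap described above is mostly buffering arithmetic. A sketch (segment and part durations below are typical values, not taken from any spec):

```python
# Classic HLS: players buffer roughly 3 full segments before starting.
segment_s = 6
classic_latency = 3 * segment_s   # 18 s at the live edge, before encoder
                                  # and CDN overhead push it higher

# LL-HLS: buffer a few sub-second "partial segments" instead.
part_s = 0.5
ll_latency = 3 * part_s           # ~1.5 s, plus the same overheads

print(classic_latency, ll_latency)
```

The extra encode, packaging, and CDN hops are what stretch these floors out to the 30-60 s and 2-5 s figures quoted above.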
_flux
4 months ago
[-]
And in the real world (e.g. mobile networks) there is going to be packet loss, so TCP is a non-starter for production-quality real-time streaming.
reply
lmm
4 months ago
[-]
My first thought is a nice way to save a stream in whatever format you want (e.g. transcoding for watching on an old phone or something on your commute): just ffmpeg -i <stream> and then all your usual video format options, instead of having to download it and then convert it afterwards.

ffmpeg also has some processing abilities of its own, so you could e.g. greenscreen (chroma key) from a stream onto an existing video background.

ffmpeg is a pretty low-level building block and as others have said, it's mostly used as a library - a lot of video players or processing tools can now add support for stream inputs easily, and that's probably where the biggest impact is.

reply
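The "transcode as you receive" idea above would look something like this once stream input exists (only the WHIP sending side has landed in ffmpeg so far, so the input URL here is hypothetical; the command is printed, not run):

```python
# Hypothetical: pull a stream and transcode it down for an old phone.
cmd = [
    "ffmpeg",
    "-i", "https://example.com/stream",        # placeholder stream input
    "-c:v", "libx264", "-vf", "scale=-2:480",  # downscale to 480p for the commute
    "-c:a", "aac", "-b:a", "96k",
    "commute.mp4",
]
print(" ".join(cmd))  # pass `cmd` to subprocess.run(cmd) to execute
```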
MintPaw
4 months ago
[-]
You can only really get a video stream out of Unreal Engine using WebRTC, so now clients can at least use ffmpeg/avconv instead of something even worse like libdatachannel.
reply
jcelerier
4 months ago
[-]
I want my desktop app https://ossia.io, which uses ffmpeg, to be able to send & receive video to another computer over the internet without having to fiddle with opening ports on each other's routers. This, combined with a server like vdo.ninja, solves that.
reply
ninkendo
4 months ago
[-]
My guess is you could more easily build an open-source client for whatever video-conferencing system you want that uses WebRTC (most services like Teams, Discord, Zoom, etc. seem to use WebRTC as a fallback for browsers, if not wholesale for everything, although there may be countermeasures to block unofficial clients).
reply
dark-star
4 months ago
[-]
Are there any popular/well-known WebRTC senders (or servers)? I'm pretty sure this is not for YouTube etc., right? So what would I watch through WebRTC?
reply
Sean-Der
4 months ago
[-]
Twitch supports WHIP today. Lots of WebRTC services support WHIP (Cloudflare, LiveKit, Dolby...)

webrtcHacks has an article on it[0] kind of old, but captures the spirit of it!

[0] https://webrtchacks.com/tag/simulcast/

reply
qwertox
4 months ago
[-]
So it's only the receiving part of WebRTC, now being able to use WHIP in order to ask a server for a stream?
reply
Sean-Der
4 months ago
[-]
Currently only the sending part!

WHIP is 'pushing media via WebRTC' - https://www.ietf.org/archive/id/draft-ietf-wish-whip-01.html

WHEP is 'pulling media via WebRTC' - https://github.com/wish-wg/webrtc-http-egress-protocol/blob/...

WHEP isn't standardized/still changing a bit. After it lands I will try and get it into OBS/FFmpeg (and more)

reply
jcelerier
4 months ago
[-]
Hmm, what does that mean, for instance, for workloads that use gstreamer's whepsrc? Is there a risk that a WHEP server running today will be incompatible with next year's WebRTC?
reply
msgodel
4 months ago
[-]
That should make self hosting streams/streaming CDNs way easier.

If you know how to use it, ffmpeg is such an amazing stand-alone/plug-and-play piece of media software.

reply
Sean-Der
4 months ago
[-]
It's so exciting.

Especially with Simulcast it will make it SO cheap/easy for people.

I made https://github.com/Glimesh/broadcast-box in a hope to make self-hosting + WebRTC a lot easier :)

reply
eigenvalue
4 months ago
[-]
LLMs really know how to use it incredibly well. You can ask them to do just about any video related task and they can give you an ffmpeg one liner to do it.
reply
rietta
4 months ago
[-]
Wow, you are not wrong. I just asked Gemini "how can I use ffmpeg to apply a lower third image to a video?" and it gave a very detailed explanation of using an overlay filter. Have not tested its answer yet but on its face it looks legit.
reply
Ajedi32
4 months ago
[-]
It could very well be legit, but if you "have not tested its answer yet" the fact that it can generate something that looks plausible doesn't really tell you much. Generating plausible-sounding but incorrect answers is like the #1 most common failure mode for LLMs.
reply
asadm
4 months ago
[-]
In recent usage, that only happens 10% of the time for me. Usually the results are grounded and work fine.
reply
bigfishrunning
4 months ago
[-]
Could you imagine if any other software failed silently and plausibly 10% of the time? It would never get off the ground. VC money is a hell of a drug
reply
refulgentis
4 months ago
[-]
It's amazing --- I cut my teeth in software engineering with ffmpeg-related work 15 years ago, and LLMs generating CLI commands with filters etc. are right up there with "bash scripts" as things LLMs turned from "theoretically possible, but no thanks unless you're paying me" into fun, easy, and regular.

Yesterday I asked it for a command to take a 14-minute video, play the first 10 seconds in realtime, and the rest at 10x speed. The ffmpeg CLI syntax always seemed able to do anything if you could keep it all in your head, but I was still surprised to see that ffmpeg could do it all in one command.

reply
rietta
4 months ago
[-]
I never found bash scripting disagreeable. I have thousands of scripts for both work and my everyday computer usage. I keep a ~/bin folder in my path where I place useful scripts that I have written for my own use. One thing that keeps me using bash for this purpose over Python or Ruby (which I use for serious programming) is that I can take a command line invocation that I manually constructed and tested and put it in a script without modification.
reply
sahilagarwal
4 months ago
[-]
Me too! I am a data engineer, so whenever I have pipeline jobs running, I have a script that monitors them. When the jobs finish, an audio clip plays stating the job has finished and its status. That's just one convenient script out of dozens.

Makes life much easier when I can play video games or read books without having to check status every 20 mins.

Though I haven't created as many as you have. Would you mind sharing some of them?

reply
rietta
4 months ago
[-]
Thousands is probably hyperbole, but there are many, many!

One that I find regular use for, when copy+paste does not work because I am connected through a terminal emulator, VM, or something of the like, is typeitforme. It takes the contents of a text file and sends them through the keyboard buffer after a few seconds' delay (which allows me time to switch focus to the window I want the typing done in).

I currently have it as an entry in my ~/.bash_aliases file.

alias typeitforme='sleep 3 && xdotool type --file '

This works in stock Ubuntu Linux. You can check out the xdotool documentation for ideas how to refine it to do more.

reply
karel-3d
4 months ago
[-]
"Have not tested its answer yet but on its face it looks legit."

That's LLMs for you

reply
rietta
4 months ago
[-]
Fair point, but I only had limited HN commenting time budgeted, not getting a video set up to test this idea. I did confirm that the overlay feature exists via the official ffmpeg documentation.
reply
65
4 months ago
[-]
It can't be a Hacker News thread without at least one mention of LLMs, even if the thread is completely unrelated.
reply
jmuguy
4 months ago
[-]
It really is, this comic always comes to mind https://xkcd.com/2347/
reply
esbeeb
4 months ago
[-]
Gajim, the XMPP client, has been awaiting this for a long time! Their Audio/Video calling features fell into deprecation, and they've been patiently waiting for FFmpeg to make it much easier for them to add Audio/Video calling features back again.
reply
dedosk
4 months ago
[-]
Gajim and XMPP are still used out there? I miss the days when I could use Pidgin for chat apps.

Now it's all walled garden/app-per-service.

reply
rw_grim
4 months ago
[-]
There's plugins for most of the modern stuff at https://pidgin.im/plugins
reply
NicuCalcea
4 months ago
[-]
I'm quite happy with Beeper, it still has some bugs and isn't open source, but it saves me from remembering where different contacts live.
reply
matt3210
4 months ago
[-]
I love seeing the Anubis graphics unexpectedly. I’ve seen it at ffmpeg and gnu so far (among others)
reply
crabmusket
4 months ago
[-]
I do too, but this time it won't let me in :/
reply
autoexec
4 months ago
[-]
Hopefully this doesn't make it more dangerous to keep ffmpeg on our systems. WebRTC security flaws are responsible for a lot of compromises. It's one of the first things I disable after installing a browser
reply
Sean-Der
4 months ago
[-]
What security flaws?

This implementation is very small. I feel 100% confident we are giving users the best thing possible.

reply
autoexec
4 months ago
[-]
most recently: https://cyberpress.org/critical-libvpx-vulnerability-in-fire..., but you can have your pick from any year https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=webrtc

You're right that the biggest reason people usually recommend disabling it is to prevent your IP from leaking when using a VPN https://www.techradar.com/vpn/webrtc-leaks, but not having to worry about RCE or DoS is a nice bonus.

I'm not sure how much this will impact ffmpeg users. Considering that WebRTC has a bad track record in terms of security, though, I do worry a little that its inclusion in one more place on our systems could increase attack surface.

reply
Sean-Der
4 months ago
[-]
Those are issues in multiple implementations, though! Lots of them are just issues in Chromium around JavaScript (the WebRTC code hadn't even started yet).

That would be like claiming 'HTTP is insecure' by posting this https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=http

IP Leaking has been fixed since 2019[0]. ICE/P2P is still a huge attack surface though. I have seen lots of other tricks being tried.

[0] https://www.youtube.com/watch?v=SqcW8kJAMJg

reply
globie
4 months ago
[-]
I assume autoexec is referring to the plethora of WebRTC vulnerabilities which have affected browsers, messengers, and any other software which implements WebRTC for client use. Its full implementation is seemingly difficult to get right.

Of course, you're right that this implementation is very small. It's very different from a typical client implementation, so I don't share the same concerns. It's also only the WHIP portion of WebRTC, and anyone processing user input through ffmpeg is hopefully compiling a version enabling only the features they use, or at least "--disable-muxer=whip" and others at configure time. Or, you know, you could specify everything explicitly at runtime so ffmpeg won't load features based on variable user input.

reply
gruez
4 months ago
[-]
>I assume autoexec is referring to the plethora of WebRTC vulnerabilities which have affected browsers, messengers, and any other software which implements WebRTC for client use. Its full implementation is seemingly difficult to get right.

Like what? I did a quick search and most seem to be stuff like ip leaks and fingerprinting, which isn't relevant in ffmpeg.

reply
globie
4 months ago
[-]
Here's a (very) small sample gathered from a search for "webrtc" on cve.org and picking high-severity CVEs affecting browsers:

* CVE-2015-1260

* CVE-2022-4924

* CVE-2023-7010

* CVE-2023-7024

* CVE-2024-3170

* CVE-2024-4764

* CVE-2024-5493

* CVE-2024-10488

Of course, I agree that it's not relevant to ffmpeg. But seeing "WebRTC" triggers the same part of the brain that looks out for unescaped SQL statements. Good opportunity to point out the difference in this implementation.

reply
therealpygon
4 months ago
[-]
So you searched "WebRTC", and then took the extraordinary step of… not actually reading any of them, while simultaneously using them as supporting points? Quick question, since you seem to know a lot about these CVEs and have spent a fair amount of time understanding them: how many of those were browser implementation issues?

This is like searching CVE for "node" and then claiming Node is terrible because some node packages have vulnerabilities. Low effort, and intended to fit evidence to an opinion instead of evaluating evidence. "Linux" has 17,000 results; by your critical lens, all Linux is insecure.

reply
globie
4 months ago
[-]
When writing this inflammatory post, you seem to have forgotten the bigger picture of the thread you are flaming.

We're discussing whether it's right to take a second look from a security standpoint when a software implements WebRTC. In this case, it's nuanced, and the implementation in FFmpeg is very different than the more complete implementations you find in browsers. And when browsers have implemented WebRTC, many vulnerabilities have followed.

So the double-take is justified here, even if only in principle. No one is saying WebRTC is insecure, or FFmpeg, or node, or Linux..........

I did a cursory read of each CVE. Wherever you got the idea I did not, you must have forgotten to include it in your post. Just now, I picked one at random. It reports "Multiple WebRTC threads could have claimed a newly connected audio input leading to use-after-free."

Does that exactly qualify as an "implementation bug"? I don't know, and I don't care, because how you taxonomize a CVE has nothing to do with whether it's a vulnerability that was introduced when implementing WebRTC. And it is.

reply
therealpygon
4 months ago
[-]
I forgot nothing, but you seemed to forget whose comment you were attempting to bolster. I "flamed" the useless injection of CVEs that attempts to legitimize someone's point about the insecurity of a protocol, when that tiny set of CVEs, for a technology the world uses quite heavily, almost unanimously points to implementation-specific issues, none of which informs the security or risk of the protocol itself; it adds useless data that doesn't further a conversation on security.

“No one is saying webrtc is insecure”? That is literally what the comment was doing, which you attempted to legitimize by listing browser-specific CVEs.

Someone pointed to a car fire and said gasoline caused the fire, and you posted pictures of car fires. There is a reason a fire investigator (as a security researcher would) considers the difference between what started a fire and an accelerant. WebRTC was not the cause of these vulnerabilities, as you are trying to imply and as the opinion you attempted to legitimize claimed.

“I don’t care” — clearly, if you couldn’t take the time to understand the difference, I’m not surprised.

reply
fc417fc802
4 months ago
[-]
> stuff like ip leaks and fingerprinting, which isn't relevant in ffmpeg.

If ffmpeg implements WHEP in the future then I'd certainly be concerned about both of those things when viewing a stream. Probably less so for serving a stream up, particularly via a gateway (the current implementation IIUC).

reply
lpln3452
4 months ago
[-]
This is exactly the question I have.

While WebRTC causes fingerprinting risks in browsers, isn’t that unrelated to running ffmpeg?

reply
codedokode
4 months ago
[-]
Leaking local IP addresses?
reply
morepedantic
4 months ago
[-]
ffmpeg is high performance code dealing with esoteric codecs and binary formats in C, so don't sweat it.
reply
dylan604
4 months ago
[-]
Is this something one could compile out with a --without-whip type of argument if you don't want/need it? That would be ideal.
reply
marxisttemp
4 months ago
[-]
Yes, pretty much every bit of ffmpeg can be enabled or disabled when compiling.
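There's no literal `--without-whip` flag, but configure's component switches cover the same ground. A sketch, assuming (per the merged patch) the component is registered as a muxer named "whip" and following ffmpeg's usual `--disable-muxer=NAME` convention:

```shell
# Strip just the WHIP muxer from an otherwise full build:
./configure --disable-muxer=whip

# Or start from nothing and whitelist only what you need:
./configure --disable-everything \
            --enable-muxer=mp4 \
            --enable-encoder=aac \
            --enable-protocol=file
```

`./configure --list-muxers` should print what your source tree actually supports.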
reply
mschuster91
4 months ago
[-]
> Hopefully this doesn't make it more dangerous to keep ffmpeg on our systems.

ffmpeg has had so many issues in the past [1], it's best practice anyway to keep it well contained when dealing with user input. Create a docker image with nothing but ffmpeg and its dependencies installed and do a "docker run" for every transcode job you get. Or maybe add ClamAV, OpenOffice and ImageMagick to the image as well if you also need to create thumbnails of images and documents.
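A minimal sketch of that per-job "docker run", with hypothetical image and path names; dropping the network and capabilities limits what a compromised ffmpeg can reach:

```shell
# One ephemeral, locked-down container per transcode job:
#   --network none  -> no exfiltration channel if ffmpeg is exploited
#   --read-only     -> immutable root filesystem
#   --cap-drop ALL  -> no Linux capabilities
docker run --rm --network none --read-only --cap-drop ALL --user nobody \
    -v "$PWD/in:/in:ro" \
    -v "$PWD/out:/out" \
    ffmpeg-sandbox \
    ffmpeg -i /in/upload.mp4 -c:v libx264 -c:a aac /out/result.mp4
```

Here `ffmpeg-sandbox` stands in for a minimal image containing only ffmpeg and its dependencies; `/in` is the only readable input and `/out` the only writable location.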

And personally, I'd go a step further and keep any servers that do more with user-generated files than accepting and serving them in their own, heavily locked down VLAN (or Security Group if you're on AWS).

That's not a dumbass criticism of any of these projects mentioned by the way. Security is hard, especially when dealing with binary formats that have inherited a lot of sometimes questionably reverse engineered garbage. It's wise to recognize this before getting fucked over like 4chan was.

[1] https://ffmpeg.org/security.html

reply
xxpor
4 months ago
[-]
if you're worried about arbitrary code exec from an ffmpeg vuln, docker is not a sufficient security boundary.
reply
baggy_trough
4 months ago
[-]
What is?
reply
dividuum
4 months ago
[-]
I've built a custom thumbnail/metadata extraction toolkit based on libavcodec/libavformat that runs the decoding in seccomp's strict mode and communicates the results through a linear RGB stdout stream. It works pretty well, with low overhead and complexity.

Full transcoding would be a bit more complex, but assuming decoding is done in software, I think that should also be possible.

reply
ethersteeds
4 months ago
[-]
Full virtualization. Docker implies a shared kernel attack surface, that's what you want to avoid.
reply
afiori
4 months ago
[-]
Kernel level exploits are more dangerous but also way less common, for a lot of places docker is sorta okay as a security boundary
reply
ta1243
4 months ago
[-]
It's layers. Docker is better than nothing, but a VM is better still, and even better is docker on a dedicated VM on dedicated hardware on a dedicated network segment.
reply
mschuster91
4 months ago
[-]
That's sacrificing an awful lot of latency cost for each transcode job though.
reply
xxpor
4 months ago
[-]
Firecracker says it can start a VM in 125 ms, for most transcode jobs that seems like it'd be a trivial cost.
reply
ethersteeds
4 months ago
[-]
Each job sends a provisioning ticket to a thermal printer. 1 business day turnaround, unless we need to order more servers
reply
afiori
4 months ago
[-]
To make a bit of a strawman of what you are saying: even better still would be an unplugged power cable, as a turned-off machine is (mostly) unhackable.

To be more serious, security is often in conflict with simplicity, efficiency, usability, and many other good things.

A baseline level of security (and avoidance of insecurities) should be expected everywhere, docker allows many places to easily reach it and is often a good enough tradeoff for many realities.

reply
endre
4 months ago
[-]
that escalated quickly.

but I agree.

reply
FrostKiwi
4 months ago
[-]
OMG YEEEEES. I'm building a web-based remote control, and if this lets me do ffmpeg gdigrab, turn that into a WebRTC stream, and have it consumed by a client without the ExpressJS gymnastics I do right now, I'll be over the moon.
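If the merged WHIP muxer works the way the OBS/GStreamer WHIP outputs do, that whole pipeline could collapse into one command. A sketch; the endpoint URL is a placeholder and the exact options your build exposes may differ (check `ffmpeg -h muxer=whip`):

```shell
# Windows screen capture (gdigrab) straight to a WHIP/WebRTC ingest,
# no Node/ExpressJS relay in between:
ffmpeg -f gdigrab -framerate 30 -i desktop \
    -c:v libx264 -preset ultrafast -tune zerolatency -pix_fmt yuv420p \
    -f whip "https://broadcast-box.example/api/whip"
```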
reply
tyre
4 months ago
[-]
Interesting I keep getting blocked by the bot detection on iOS safari, both from our work WiFi and cellular data.

Anubis let me go

reply
jsheard
4 months ago
[-]
Are you getting the "access denied" page, or an infinite challenge loop?
reply
kairosisme
4 months ago
[-]
FWIW I also can’t pass the Anubis challenge on iOS Safari, even though I can on any other site. I see the Anubis success screen for a moment before it switches to the “invalid response” screen.

edit: Trying again a few minutes later worked

reply
__turbobrew__
4 months ago
[-]
I got stuck on access denied. Canada IPv4. Safari on iOS.
reply
xena
4 months ago
[-]
Do you happen to have a dual-stack network?
reply
dyl000
4 months ago
[-]
Anubis isn’t letting me through ;(
reply
Mofpofjis
4 months ago
[-]
A commit "co-authored-by" 6+ people with three thousand lines of code: this is a total wreck of a development workflow. This feature should have been implemented as a series of about 20 patches. Awful.
reply
Daemon404
4 months ago
[-]
(long time FFmpeg dev here)

You are being downvoted, but you are entirely correct. This is also explicitly not allowed in FFmpeg, but it was pushed after many months, with no heads-up on the list, no final review sign-off, and with some developers expressing (and continuing to express) reservations about its quality on the list and on IRC.

reply
bigfishrunning
4 months ago
[-]
That's really unfortunate to hear. I'm a huge fan of Webrtc and Pion, and was very excited to get some ffmpeg integration -- hopefully some of the quality issues will be ironed out before the next ffmpeg release
reply
Daemon404
4 months ago
[-]
There's quite some time until the next release, I believe, so it should be.

The biggest thing missing right now is NACK support, and one of the authors has said they intend to do this (along with fixing old OpenSSL version support, and supporting other libraries). Until that is done, it isn't really "prod ready", so to speak.

For some context: in the past, half-supported things have been pushed into FFmpeg by companies or people who just need some subset of $thing, and vendors have used that to sell their products with "FFmpeg isn't good enough" marketing while the feature is either brought up to standard or, in some cases, removed as the original authors vanish. So it's perhaps a touchy subject for us :) (and why my post was perhaps unnecessarily grumpy).

As for the git / premature push stuff, I strongly believe it is a knock-on effect of mailing list based development - the team working on this support did it elsewhere, and had a designated person send it to the list, meaning every bit of communication is garbled. But that is a whole different can of worms :D.

reply
jpk
4 months ago
[-]
I mean, it probably was a branch that several people contributed commits to that was squashed prior to merge into mainline. Folks sometimes have thoughts about whether there's value in squashing or not, but it's a pretty common and sensible workflow.
reply
fc417fc802
4 months ago
[-]
> common and sensible

Perhaps "common and technically works" would be a better way to put that (similarly for rebase). I suspect people would stop squashing if git gained the ability to tag groups of commits with topics in either a nested or overlapping manner.

reply
sylware
4 months ago
[-]
OMG, this is not completely brain damaged c++ code lost in the middle of one of the web engines from the whatwg cartel??? or C code with one billion dependencies with absurd SDKs???

Quick! Quick! I need to find something bad about it... wait... AH!

Does it compile with the latest libressl? Hopefully not (like python _ssl.c) and I can start talking bad about it.

;P

Ofc, that was irony.

We all know the main issue with WebRTC is not its implementations, but WebRTC itself.

All that said, it is exactly at this very time that twitch.tv chose to break ffmpeg HLS (its current beta HLS streams completely break ffmpeg's HLS support...).

reply
pkz
4 months ago
[-]
Does this mean that ffmpeg now can record a Jitsi video meeting audio stream?
reply
throwpoaster
4 months ago
[-]
What’s ffmpeg security auditing like? Seems reactive from their site.
reply
SeriousM
4 months ago
[-]
I can't wait to see this in Jellyfin implemented!
reply
colordrops
4 months ago
[-]
What would this provide?
reply
leland-takamine
4 months ago
[-]
Anyone been able to successfully build ffmpeg from source to include whip support? Struggling to figure out the right ./configure options
reply
_Manch
4 months ago
[-]
You need --enable-muxer=whip and --enable-openssl
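A sketch of a full build-and-broadcast round trip with those flags; the endpoint URL is a placeholder, and `--enable-libopus` is my assumption since WebRTC audio is normally Opus:

```shell
./configure --enable-openssl --enable-muxer=whip --enable-libopus
make -j"$(nproc)"

# Push a file to a WHIP endpoint as a live WebRTC broadcast:
./ffmpeg -re -i input.mp4 \
    -c:v libx264 -preset ultrafast -tune zerolatency \
    -c:a libopus \
    -f whip "https://example.com/whip/endpoint"
```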
reply
leland-takamine
4 months ago
[-]
Seems to work, but it fails with a 500 when streaming to Cloudflare Stream: https://gist.github.com/Leland-Takamine/1a13f31c7521d0223624...
reply
leland-takamine
4 months ago
[-]
Resolved by adding an audio stream
reply
cranberryturkey
4 months ago
[-]
Can someone ELI5 what this means? i've been using ffmpeg for over a decade.
reply
esbeeb
4 months ago
[-]
WebRTC is very, very hard to code for. But if FFmpeg abstracts that complexity away, then WebRTC becomes much easier to add to a software project wishing to benefit from what WebRTC offers.
reply
cranberryturkey
4 months ago
[-]
I guess I still don't understand. You don't really "code" with ffmpeg. It's just used to transform media formats or publish to a public streaming endpoint.
reply
marxisttemp
4 months ago
[-]
All of ffmpeg’s functionality is accessible from C (and transitively most other programming languages) via libavformat, libavcodec etc. FFmpeg supporting WebRTC means that projects using these libraries gain support for WebRTC in code.
reply
shmerl
4 months ago
[-]
Does it allow more realtime streaming than SRT on LAN?

I'm still waiting for ffmpeg CLI tool to merge pipewire + xdg-desktop-portal support. You still can't record a screen or window on Wayland with it.

reply
Sean-Der
4 months ago
[-]
With WebRTC you can expect ~100ms with zero optimizations on your LAN.

With bitwhip[0] I got it even lower than that.

[0] https://github.com/bitwhip/bitwhip

reply
shmerl
4 months ago
[-]
That's nice. I had a hard time getting low latency with SRT, but managed to get to roughly just under one second using gpu-screen-recorder on one end and ffplay with low-latency flags on the other.
reply
chompychop
4 months ago
[-]
I have a beginner question - Can WebRTC be used as an alternative to sending base64-encoded images to a backend server for image processing? Is this approach recommended?
reply
Sean-Der
4 months ago
[-]
Depends on your needs!

https://github.com/pion/webrtc/tree/master/examples/save-to-... here is an example of a server that does what you need on the backend

reply
actionfromafar
4 months ago
[-]
Now I have the question - when does one send base64-encoded images to a backend server?
reply
Spivak
4 months ago
[-]
The OpenAI API is a pretty high-profile example of this existing in the real world. You use it in their conversations interface when you want to include images in the conversation. Discord also uses it for attachments https://discord.com/developers/docs/reference#image-data. More generally it's when you want to send image data as part of a larger JSON blob.
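The pattern itself is tiny; a sketch with made-up field names (none of these match any particular API):

```python
import base64
import json

# Stand-in for real PNG bytes read from disk or an upload.
pixels = bytes([137, 80, 78, 71, 13, 10, 26, 10])

# Sender: embed the image in a larger JSON blob as a base64 data URI.
payload = json.dumps({
    "prompt": "describe this image",
    "image": "data:image/png;base64,"
             + base64.b64encode(pixels).decode("ascii"),
})

# Receiver: strip the data-URI prefix and decode back to raw bytes.
decoded = base64.b64decode(json.loads(payload)["image"].split(",", 1)[1])
assert decoded == pixels
```

The cost is the usual one: base64 inflates the payload by about a third, which is one reason a binary channel like WebRTC can be attractive for high-volume media.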
reply
ec109685
4 months ago
[-]
Why doesn’t a PR of that magnitude come with tests?
reply
bigfishrunning
4 months ago
[-]
Using Pion no less! very cool!
reply
Xeoncross
4 months ago
[-]
I assume you mean https://github.com/pion/webrtc. I don't see any Go; I thought they just fixed a compatibility bug with Pion.
reply
wang_zuo
4 months ago
[-]
The author seems to be an undergraduate from China. Very impressive!
reply
mrheosuper
4 months ago
[-]
so RTC is real time communication, not real time clock...
reply
karlkloss
4 months ago
[-]
"Sadly, you must enable JavaScript to get past this challenge."

Nope. Get lost. Running random code from websites you don't know is asking for disaster.

reply
theobr
4 months ago
[-]
Absolutely huge
reply
alexfromapex
4 months ago
[-]
It would be cool to have a chat too
reply
quantadev
4 months ago
[-]
Public Service Announcement: There's a reddit topic for WebRTC that doesn't get enough action, imo! Get in there, y'all...

https://www.reddit.com/r/WebRTC

reply
spartanatreyu
4 months ago
[-]
No.

Reddit lost their own community's trust when the CEO ejected the community's moderators.

Information posted there is now far less likely to be high quality compared to other places, so what's the point of going there?

reply
quantadev
4 months ago
[-]
Where's the best place that people are talking about WebRTC then? Hacker News doesn't have rooms by topic does it?
reply
Sean-Der
4 months ago
[-]
Broadcast Box has a discord https://discord.gg/An5jjhNUE3

You could also join the Pion one https://pion.ly/discord

Other place is video-dev Slack https://www.video-dev.org/

X also has a great community of hackers!

reply
_flux
4 months ago
[-]
Are Discord, Slack and X now better options than Reddit?

Too bad nobody is using open forums (e.g. Discourse, Matrix, Mastodon).

reply
quantadev
4 months ago
[-]
I think for every platform you could name there will be a certain sub-culture of people boycotting it for one reason or another. Sometimes it's about politics, sometimes about censorship, etc.

I do it myself too. I boycotted StackOverflow for the past 13 years, never sharing another solution with anyone, because the admins on that site were hounding me about posting CORRECT answers without first checking whether a similar answer already existed. I vowed never to help anyone on there again after that. I told them, even back then, that the more correct info they collected for free the better, because some day AI would be sifting through it, and that they were being stupid. As of 2025, I was finally proven right.

reply
spartanatreyu
4 months ago
[-]
Discord is better for ephemeral style questions. Mastodon is better for sharing experimentations and explorations.
reply
quantadev
4 months ago
[-]
Thanks for the links! Yeah, I'm on X as my primary news source for tech, but I mainly only follow AI-related people and orgs. I should search for WebRTC.
reply
MrThoughtful
4 months ago
[-]
I know there are JavaScript ports of FFmpeg and I would love to use them. But so far, I never got one working. I tried it with AI and this prompt:

    Make a simple example of speeding up an mp4
    video in the browser using a version of ffmpeg
    that runs in the browser. Don't use any server
    side tech like node. Make it a single html file.
But so far every LLM I tried failed to come up with a working solution.
reply
bastawhiz
4 months ago
[-]
If you visit the ffmpeg.wasm documentation, the first example on the Usage page does almost exactly this:

https://ffmpegwasm.netlify.app/docs/getting-started/usage

It transcodes a webm file to MP4, but making it speed up the video is trivial: just add arguments to `ffmpeg.exec()`. Your lack of success comes from trusting an LLM to know about cutting-edge libraries and how to use them, not from a lack of progress in the area.
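For instance, starting from that Usage example, the 2x speed-up is just two extra filter arguments. This is a sketch against the `@ffmpeg/ffmpeg` 0.12.x API as documented there; it runs in a browser/bundler context, not standalone Node:

```javascript
import { FFmpeg } from '@ffmpeg/ffmpeg';
import { fetchFile } from '@ffmpeg/util';

const ffmpeg = new FFmpeg();
await ffmpeg.load();

// Pull the source file into ffmpeg's in-memory filesystem.
await ffmpeg.writeFile('input.mp4', await fetchFile('input.mp4'));

// 2x speed-up: halve video timestamps, double audio tempo.
await ffmpeg.exec([
  '-i', 'input.mp4',
  '-filter:v', 'setpts=0.5*PTS',
  '-filter:a', 'atempo=2.0',
  'output.mp4',
]);

const data = await ffmpeg.readFile('output.mp4');
```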

reply
MrThoughtful
4 months ago
[-]
The problem is that they don't provide the full code that can run in the browser. I have not managed to get the function they show in the first example to run in the browser.
reply
Matheus28
4 months ago
[-]
You don’t need an LLM to do that. The code in there is almost complete…
reply
vel0city
4 months ago
[-]
Listen buddy, I need an LLM to tie my shoes, don't be so judgemental.
reply
bastawhiz
4 months ago
[-]
That's just wrong. The example is live: you can run it right there on the page. If the code isn't working when you write it, you're probably importing something incorrectly (or you're not running it in an environment with React, which is where the `use*` functions come from). You can even click on the source of the log lines when the example is running (on the right edge of the Chrome console) to jump into the hot-loaded code and see the exact code that's running it.
reply
MrThoughtful
4 months ago
[-]
I think there is some kind of misunderstanding here.

You say "an environment with React". My environment is the browser.

I don't know how one is supposed to run that nameless function on that page. What I am looking for is a simple, complete example in HTML that can run standalone when opened in the browser. Without any server side processing involved.

reply
jmtulloss
4 months ago
[-]
If you want to copy/paste, try taking the first example and asking the llm to refactor the code to run in a browser with no dependencies. It should be able to strip out the react stuff, or at least get it close and you can fix it from there.
reply
MrThoughtful
4 months ago
[-]
I have tried that a bunch of times and a bunch of ways and did not get ffmpeg to work.

It might have to do with these two strange comments at the top:

    // import { FFmpeg } from '@ffmpeg/ffmpeg';
    // import { fetchFile, toBlobURL } from '@ffmpeg/util';
The rest of the code seems to assume "FFmpeg", "fetchFile" and "toBlobURL" are somehow magically available. Neither I nor any LLM has yet managed to bring these into existence.
reply
jmtulloss
4 months ago
[-]
OK, to your credit, your original request was to get this all working in a single HTML file. That is not possible with the easy paths documented for ffmpeg.wasm.

By default, the build relies on web workers which need to load their code from somewhere (and usually it has to be the same origin as the code making the request)

Through much mastery of JS build systems that I would not wish on my enemies, I bet you could get it working on localhost, but you’ll have a much better time of it if you set up vite or something for a local build. You can still easily do a “serverless” deploy with GitHub pages or similar but you do need an http server correctly configured for asset requests.

reply
numpad0
4 months ago
[-]
I just threw that prompt into the free ChatGPT, looks like it'll have a few versioning as well as CORS issues...
reply
simlevesque
4 months ago
[-]
Don't try to do cutting edge stuff with a brain that doesn't know anything past a certain date.
reply
colechristensen
4 months ago
[-]
Trying to do things off the beaten path with LLMs is rarely successful, especially if there's a related much more popular option.

I'm convinced that programmers' bias towards LLMs is strongly correlated with the weirdness of their work. Very often, my strange ideas pushed through LLMs produce what look like solutions but are really broken, hallucinated attempts that only vaguely resemble what needs to be done.

reply
bigfishrunning
4 months ago
[-]
> I'm convinced that programmers' bias towards LLMs is strongly correlated with the weirdness of their work.

This is an extremely astute observation; my work has always been somewhat weird and I've never found LLMs to be more than an interesting party trick

reply
minimaxir
4 months ago
[-]
The JS ports of FFmpeg (or WASM port if you want the in-browser approach) are very old and would be more than present in modern LLM training datasets, albeit likely not enough of a proportion for LLMs to understand it well.

https://github.com/Kagami/ffmpeg.js/

https://github.com/ffmpegwasm/ffmpeg.wasm

reply
rvz
4 months ago
[-]
> But so far every LLM I tried failed to come up with a working solution.

Maybe you need to actually learn how it works instead of deferring to LLMs that have no understanding of what you are specifically requesting.

Just read the fine documentation.

reply
prophesi
4 months ago
[-]
Entered the same prompt with Sonnet 4. I just needed to paste the two errors from the console (it tried to load from the CDN, which won't work since it uses a web worker, and it hallucinated an ffmpegWasm function) and it output an HTML file that worked.
reply
MrThoughtful
4 months ago
[-]
Can you put it on jsfiddle or some other codebin? I would love to see it.
reply
jsheard
4 months ago
[-]
I'm sorry, but if you give up on something you would "love to use" just because LLMs are unable to oneshot it then you might be a bit too dependent on AI.
reply
minimaxir
4 months ago
[-]
Time is a finite resource, and there's an opportunity cost. If an easy PoC for a complex project can't be created using AI and it would take hours/days to create a PoC organically that may not even be useful, it's better project management to just do something else entirely if it's not part of a critical path.
reply
bastawhiz
4 months ago
[-]
I can't disagree with this take more vehemently. This isn't an "easy PoC". This is "copy and paste it from the docs"-level effort:

https://ffmpegwasm.netlify.app/docs/getting-started/usage/

If you can't be arsed to google the library and read the Usage page and run the _one command_ on the Installation page to come up with a working example (or: tweak the single line of the sample code in the live editor in the docs to do what you want it to do), how do you expect to do anything beyond "an easy PoC"? At what point does your inability/unwillingness to do single-digit-minutes of effort to explore an idea really just mean you aren't the right person for the job? Hell, even just pasting the code sample into the LLM and asking it to change it for you would get you to the right answer.

reply
minimaxir
4 months ago
[-]
I was commenting on the general assertion of the GP's comment, not this specific instance.
reply
bastawhiz
4 months ago
[-]
Another commenter showed how they were able to use Claude to do this in two messages: one to write the code, a second to paste the error that comes out so Claude can fix it. The exact word of the comment you replied to was "oneshot": if you're going to outsource 100% of the thinking involved in the task to a machine and can't even be bothered to copy over the error you're getting after the first response, my response remains the same.
reply
MrThoughtful
4 months ago
[-]
Can you link to that comment? I don't see a mention of "Claude" anywhere in this thread. Also nobody here showed they were able to "do this" yet.
reply
MrThoughtful
4 months ago
[-]
If there is a "copy and paste" way to get that to run in the browser, can you copy and paste it to a jsfiddle and post the link to the fiddle here?
reply
Matheus28
4 months ago
[-]
You’re basically asking people to do your homework for you at this point…
reply
MrThoughtful
4 months ago
[-]
He said it is a matter of copy+paste, not work.

I don't think it is, as I did not get it to run. And if he really can accomplish it with copy+paste, why wouldn't he demonstrate it?

reply
AndriyKunitsyn
4 months ago
[-]
Because he doesn't want to do that for you for free I guess :)

"Tap with a hammer: $1. Knowing where to tap: $9999."

reply
bastawhiz
4 months ago
[-]
As long as you make sure the npm package is available, you can! If you can't figure out how to do it, I'm sorry but I literally can't think of a way to make it less effort. The problem you described in another comment with the import statements is literally explained on the Installation page of the documentation.
reply
MrThoughtful
4 months ago
[-]
As I said in my original comment that started this thread, I don't use any server side tech. So there is no "npm package". I am only using a browser.
reply
bastawhiz
4 months ago
[-]
You don't need to be on the server to use NPM. NPM just downloads the code. I'm honestly not sure if you're just trolling at this point
reply
ch_sm
4 months ago
[-]
If you're really interested in doing that, I'm certain you can with a bit of effort. There are plenty of docs and examples online.
reply
mort96
4 months ago
[-]
You know there's ... documentation, right?
reply
ycombinatrix
4 months ago
[-]
LLM is my eyes. LLM is my ears. LLM is my documentation. I am LLM.
reply