They said it didn’t matter, because new data was flowing in so fast that the old data was just a drop in the bucket.
Imagine trying to pay for all that content; nobody on earth would be able or willing to supply it.
Searching hn.algolia.com will yield numerous examples.
https://news.ycombinator.com/item?id=23758547
https://bsky.app/profile/sinevibes.bsky.social/post/3lhazuyn...
Basically being given videos to watch all day, especially ones coming from the Middle East (this was the ISIS era, so any video from the region had someone watching it as soon as it was uploaded).
Needless to say, according to him there's endless gold among the no-view videos.
It's also interesting that it was no secret: already in 2018 they were all told that they were essentially training machines to do their job.
The original upload would likely still be stored, but not available for viewing.
If they really wanted to compress, they could take out every other frame and regenerate those frames with a neural decoder. But I don't know why that would be worth the effort for a stable number of low-res files either.
For example, say in year N YouTube gets f(N) new videos, and let's assume f(N) = cN^2. That's a crazy rate of growth, far faster than the real-world YouTube, which grew roughly linearly.
But the count of "videos that are older than 5 years" still grows faster than that, because it is cubic instead of quadratic. Unless growth is truly exponential (it isn't), "videos that are older than 5 years" will eventually surpass "new videos this year".
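To make the arithmetic concrete, here's a tiny sketch of my own (toy numbers, made-up constant c): if yearly uploads grow as cN^2, the backlog of videos at least 5 years old grows roughly as cN^3/3 and eventually dwarfs any single year's uploads.

    # Toy numbers, not real YouTube stats: yearly uploads f(N) = c * N^2.
    c = 1.0

    def new_videos(year: int) -> float:
        """Hypothetical uploads during a single year."""
        return c * year ** 2

    def backlog(year: int, age: int = 5) -> float:
        """Total videos that are at least `age` years old in `year`."""
        return sum(new_videos(n) for n in range(1, year - age + 1))

    for year in (10, 20, 50, 100):
        print(year, new_videos(year), backlog(year))
    # By year 100 the 5-year-plus backlog (~290,000 * c) is about 29x that
    # year's new uploads (10,000 * c), and the ratio keeps widening.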
Such a weird strawman argument that you are making up. You've overthought this so much that you are missing the forest for the trees.
Maybe it could be used to train a neural network. Maybe it contains dirt on a teenager, who might become a politician two decades from now. Maybe it contains an otherwise lost historical event.
https://www.youtube.com/shorts/mrOXqgShzI0
This shit is the reason I can't afford a new HDD.
Source?
One day, somebody is going to be tasked with deciding what gets deleted. It won't be pretty. Old and unloved video will fade into JPEG noise as the compression ratio gets progressively cranked, until all that remains is a textual prompt designed to feed an AI model that can regenerate a facsimile of the original.
Those would be the worst of the lot in terms of historical value, for example. Engaging BS content...
I met a user from an antique land
Who said: Two squares of a clip of video
Stand in at the end of the search. Near them,
Lossily compressed, a profile with a pfp, whose smile,
And vacant eyes, and shock of content baiting,
Tell that its creator well those passions read
Which yet survive, stamped on these unclicked things,
The hand that mocked them and the heart that fed:
And on the title these words appear:
"My name is Ozymandias, Top Youtuber of All Time:
Look on my works, ye Mighty, and like and subscribe!"
No other video beside remains. Round the decay
Of that empty profile, boundless and bare
The lone and level page stretch far away.

I may have gotten incredibly neurotic about online text since 2022.
I actually considered using an LLM, but in my experience they "warp" the content too much for anything like this. The effort required to get one to produce something to my taste would take longer than just writing the poem myself. (Although tbf it's been a while since I've asked an LLM to do parody work, so I could be wrong.)
The energy bill for scanning through the terabytes of metadata would be comparable to that of several months of AI training, not to mention the time it would take. Then deleting a few million random 360p videos and putting MrBeast in their place would result in insane fragmentation of the new files.
It might really just be cheaper to keep buying new HDDs.
They allow search by timestamp; I'm sure YouTube can write an algo to find videos with <=1 views.
The YouTube Shorts thing is buggy as shit: it'll just stop working a lot of the time, just won't load a video. Sometimes you have to go back and forth a few times to get it to load. It'll often desync the comments from the video, so you're seeing comments from a different video. Sometimes the sound from one short plays over the visuals of another.
It only checks for notifications when you open the website from a new tab, so if you want to see if you have any notifications you have to open youtube in a new tab. Refreshing doesn't work.
Seems like all the competent developers have left.
https://en.wikipedia.org/wiki/GMail_Drive
When Google launched Gmail (2004) with a huge 1GB storage quota, Richard Jones released GMailFS to mount a Gmail account as a filesystem.
None of us, in the original discussion threads, knew of it being done before then IIRC.
Honestly, if you aren't taking full advantage of workarounds like this within the constraints of the law, you're basically losing money. Like not spending your entire per diem budget on a business trip.
Which do you think has more value to me? (a) I save some money by exploiting the storage loophole. (b) A cultural repository of cat videos, animated mathematics explainers, and long video essays continues to be available to (some parts of) humanity (for the near future).
Anyway, in this situation it's less that YouTube is providing us a service and more that it has captured a treasure trove of our cultural output and sold it back to us. Siphoning back as much value as we can is ethical. If YouTube goes away, we'll replace it - PeerTube or other federated options are viable. The loss of the corpus of videos would be sad but not catastrophic - some of it is backed up. I have ~5 TB of YouTube backed up, most of it smaller channels.
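(Tangentially, for anyone wanting to do the same kind of channel backup, a minimal sketch using yt-dlp's Python API; the channel URL, output template, and options below are placeholder choices, not a recommendation of specific settings.)

    # Minimal channel-archiving sketch with yt-dlp's Python API.
    from yt_dlp import YoutubeDL

    opts = {
        # Keep filenames stable and self-describing.
        "outtmpl": "%(channel)s/%(upload_date)s - %(title)s [%(id)s].%(ext)s",
        # Remember what was already fetched so re-runs only grab new uploads.
        "download_archive": "downloaded.txt",
        # Also keep the metadata, which is often as valuable as the video itself.
        "writeinfojson": True,
    }

    with YoutubeDL(opts) as ydl:
        ydl.download(["https://www.youtube.com/@SomeSmallChannel/videos"])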
I agree generally with you that the word "value" is overencompassing to the point of absurdity though. Instrumental value is equated with moral worth, personal attachment, and distribution of scarcity. Too many concepts for one word.
I feel the same way. (Although, I am less sure of it.) However, I think backing up important parts of YouTube, as you have done, is a much better approach towards doing this.
OTOH I'm 100.0% sure that Google has a plan, has been expecting this for years, and in particular has prior experience with free Gmail accounts being used for storage.
Hmmm, isn't the "free-ness" of YouTube because they were determined to outspend and outlast any potential competitors (i.e. supported by the Search business), in order to create a monopoly to then extract $$$ from?
I'm kind of expecting the extracting part is only getting started. :(
Looking at the Wikipedia page for "Commons" [0] the first meaning of commons "accessible to all members of a society" is not really true, unless "on the whim of the YT platform". The second meaning of "natural resources that groups of people (communities, user groups) manage for individual and collective benefit" is also not really true. There is no understanding that google will take any other than their own benefit into account. The third meaning of commons on that page is closest I guess to what is needed:
> Commons can also be defined as a social practice of governing a resource not by state or market but by a community of users that self-governs the resource through institutions that it creates.
And that is certainly not what YouTube can be considered to be. YouTube videos are not in the commons; they are kept on a proprietary platform where the proprietor is the sole decider of what happens to their availability there.
Exactly which countries could they buy?
Let me guess: you haven’t actually asked gemini
> Encoding: Files are chunked, encoded with fountain codes, and embedded into video frames
Wouldn't YouTube just compress/re-encode your video and ruin your data (assuming you want bit-by-bit accurate recovery)?
If you have some redundancy to counter this, wouldn't it be super inefficient?
(Admittedly, I've never heard of "fountain codes", which is probably crucial to understanding how it works.)
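For what it's worth, the core idea of a fountain code is that no particular encoded symbol has to survive; the receiver just needs enough symbols, and the sender can always generate more. Here's a toy LT-style sketch of my own (not the linked project's actual scheme) that XORs random subsets of chunks and recovers the data with a simple peeling decoder:

    import os
    import random

    # Toy LT-style fountain code: every encoded symbol is the XOR of a random
    # subset of source chunks.  The file can be rebuilt from any sufficiently
    # large collection of symbols, so it doesn't matter which symbols get
    # mangled, only that enough of them survive intact.

    CHUNK = 32  # bytes per source chunk (tiny, for demonstration)

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(chunks, seed):
        """One encoded symbol: (indices of the XOR-ed chunks, their XOR)."""
        rng = random.Random(seed)
        degree = rng.choice([1, 1, 2, 2, 2, 3, 4])   # crude degree distribution
        idx = rng.sample(range(len(chunks)), min(degree, len(chunks)))
        sym = chunks[idx[0]]
        for i in idx[1:]:
            sym = xor(sym, chunks[i])
        return set(idx), sym

    def decode(symbols, n_chunks):
        """Peeling decoder: resolve symbols covering exactly one unknown chunk."""
        known = {}
        progress = True
        while progress and len(known) < n_chunks:
            progress = False
            for idx, sym in symbols:
                unknown = idx - set(known)
                if len(unknown) == 1:
                    i = unknown.pop()
                    for j in idx - {i}:
                        sym = xor(sym, known[j])
                    known[i] = sym
                    progress = True
        return known if len(known) == n_chunks else None

    data = os.urandom(CHUNK * 20)
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

    symbols = [encode(chunks, seed) for seed in range(40)]  # generate extras
    survivors = symbols[5:]                                 # pretend 5 were destroyed

    recovered = decode(survivors, len(chunks))
    if recovered is None:
        print("not enough symbols survived -- a real sender would just emit more")
    else:
        assert b"".join(recovered[i] for i in range(len(chunks))) == data
        print(f"rebuilt {len(data)} bytes from {len(survivors)} surviving symbols")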
It only supports 32k parts in total (which in practice means 16k parts of source and 16k parts of parity).
Let's take 100GB of data (relatively large, but within the realm of what someone might reasonably want to protect); that means each part will be ~6MB in size. You might think that since you also created 100GB of parity data (6MB * 16384 parity parts), you're well protected. You're wrong.
Now let's say you have 20,000 random single-byte errors over that 100GB. Not a lot of errors, but guess what: par will not be able to protect you (assuming those 20,000 errors end up spread over more than 16,384 distinct blocks). So at the simplest level, 20KB of errors can be unrecoverable.
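The back-of-the-envelope math for that scenario, using the block counts above and assuming the scattered errors land across both the data and its parity files (PAR2 needs roughly one intact recovery block per damaged block):

    data_bytes      = 100 * 10**9   # 100 GB of source data
    source_blocks   = 16_384        # half of the ~32k block budget
    recovery_blocks = 16_384        # the other half: another ~100 GB of parity

    block_size = data_bytes / source_blocks
    print(f"block size ~ {block_size / 1e6:.1f} MB")          # ~6.1 MB

    damaged_blocks = 20_000         # 20k scattered single-byte errors (~20 KB),
                                    # each landing in a different block
    print("recoverable:", damaged_blocks <= recovery_blocks)  # False
    # Only 32,768 - 20,000 = 12,768 blocks remain intact, short of the 16,384
    # needed to rebuild the data, even though only ~0.00002% of the bytes changed.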
par2 was created for Usenet when a) the binaries being posted weren't so large, b) the article parts being posted weren't so large, and c) the error model it was protecting against was whole articles not coming through (or, equivalently, having errors). In the olden days of Usenet binary posting you would see many "part repost requests"; those basically disappeared with the introduction of par (then quickly par2). It fails badly with many other error models.
In practice a DVD-like PI/PO model would be best for many people: protect each 1GB part, like you said, with 5-10% redundancy, and then protect all 100 1GB parts together with another 5-10% redundancy. The inner parity (PI) will repair as much as it can within each 1GB part, while the outer parity (PO) will be able to repair 1GB parts that can't be repaired otherwise.
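Something like that can be hacked together today by driving stock par2cmdline from a script; a rough sketch (the archive name, 1GB split size, and 10% figures are just examples):

    import subprocess
    from pathlib import Path

    archive = Path("backup.tar")           # hypothetical large archive
    parts_dir = Path("parts")
    parts_dir.mkdir(exist_ok=True)

    # 1. Split the archive into ~1 GB parts (each part is an inner protection domain).
    PART_SIZE = 1_000_000_000
    parts = []
    with archive.open("rb") as f:
        i = 0
        while chunk := f.read(PART_SIZE):
            part = parts_dir / f"part{i:03d}.bin"
            part.write_bytes(chunk)
            parts.append(part)
            i += 1

    # 2. "PI": 10% parity for each part on its own.
    for part in parts:
        subprocess.run(["par2", "create", "-r10", f"{part}.par2", str(part)],
                       check=True)

    # 3. "PO": 10% parity across all parts together, so a part that its own
    #    parity can't fix (or that is lost entirely) can be rebuilt from the rest.
    subprocess.run(["par2", "create", "-r10", str(parts_dir / "outer.par2"),
                    *map(str, parts)], check=True)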
It would be interesting if par2 or something like it could implement this natively, without people having to hack together their own one-off solutions.
we can't have nice things
There's no cloud-based backup service that's competitive with tape.
(YouTube video for this project: https://www.youtube.com/watch?v=l03Os5uwWmk)
AI tools can use this as a messaging service with deniability. Pretty sure humans already use it in this way. In the past, classifieds in newspapers were a similar messaging service with deniability.
https://www.tapeheads.net/threads/storing-data-on-your-analo...
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_cor...
This is one of those seemingly "smart" but actually dumb ideas.
Your comment seems very sad to me. If you want your data to be safe, though, you could just use physical storage and save the data there, on redundant physical hard disks in distributed locations, in various encodings.
You could also try to add even more redundancy by using an audio track with the bit sequences as spoken words, combined with a video track that is resilient to low-bandwidth encoding, for example a news show where every segment takes place in front of an infographic representing one or two bytes per segment. Could be a giant pie chart for variable-precision floating point numbers or a giant still frame of an alphanumeric character to represent raw bytes.
Add some engaging current events to the coverage to make sure the videos stay relevant.
Use large fonts to keep them resilient to video compression.
Combine YouTube, Twitch, Vimeo and at least two disk storage arrays to get five-nines enterprise-grade reliability.
The overhead for encoding and decoding is easily outweighed by the cost-neutral added redundancy.