FilterHN

I think I remember he did a video explaining that changing the title and thumbnail gives a significant bump in views so that a video can become evergreen. CGP Grey mentioned doing this and hitting 1B total views years earlier than he expected

▲

jgalt212

2 days ago

[-]

Yeah, I didn't think he was doing it for his health.

▲

Alifatisk

2 days ago

[-]

You nailed it, I wanted to post the actual article but found that they where behind a paywall, and I don't know if pirated content is allowed to share here but it's definitely worth a read though!

▲

danlitt

2 days ago

[-]

It is often not piracy to find open copies of papers; the authors often publish them directly (on their personal pages or on arxiv). A commenter further down found a public link: https://math.uchicago.edu/~shmuel/Network-course-readings/Ma...

▲

denverllc

1 day ago

[-]

Yeah, most authors are not concerned about their papers being shared because what they care about is citations, not money. It’s the publishing companies that want to put up the paywall (and the ones who would make any money from it anyway).

▲

0xDEAFBEAD

2 days ago

[-]

Honestly a pretty bad video by Veritasium standards.

Consider the nuclear reaction metaphor. It's clearly not memoryless. Eventually you'll run out of fissile material.

The diagram for that example is bad as well. Do arrows correspond to state transitions? Or do they correspond to forks in the process where one neutron results in two?

▲

mrlongroots

2 days ago

[-]

> Consider the nuclear reaction metaphor. It's clearly not memoryless. Eventually you'll run out of fissile material.

I think no real process is memoryless: time passes/machines degrade/human behaviors evolve. It is always an approximation that is found/assumed to hold within the modeled timeframe.

▲

0xDEAFBEAD

2 days ago

[-]

Sure but it's still the case that if you treat the reaction as memoryless, your model's predictions will be wildly wrong. It's not a good model in this case.

▲

mrlongroots

2 days ago

[-]

It seems that most approaches treat neutron diffusion as memoryless (or at least stochastic):

https://en.wikipedia.org/wiki/Neutron_transport

▲

baxtr

2 days ago

[-]

3.4 Mio views in 4 days. Wow…

▲

stronglikedan

2 days ago

[-]

Veritasium is quality content. Those eyes don't hurt nothing either.

▲

tbalsam

2 days ago

[-]

They unfortunately recently (last few years) sold out to private equity (which tends to glaze over fundamentals and tries to pump out massive content using previous brand quality to give it credence), so beware of quality in more recent vids:

https://youtu.be/hJ-rRXWhElI?si=Zdsj9i_raNLnajzi

▲

fn-mote

2 days ago

[-]

Yesterday I was reading comments about how the market could pay for research and avoid the “distorting effects” of public funding.

Is there any way to get a better outcome for the public here, or is “do good stuff then sell out” the way it’s always going to be?

▲

edwardbernays

2 days ago

[-]

What distorting effects of public funding? What about the distortionary effects of the market? I'll offer the suggestion that what you read is brainrotting private market propaganda designed to erode the public institutions that make America happier, healthier, and wealthier.

▲

tomrod

2 days ago

[-]

In economics discussions regarding public funding policy, the concern of "crowding out" commercial firms or nonprofits is a real concern. It's definitely an observed, measured, and reported phenomenon.

In the end, incentives matter.

https://en.wikipedia.org/wiki/Crowding_out_(economics)

▲

edwardbernays

2 days ago

[-]

There is no private market entity with an incentive to provide research to the public, so in this sense there is no crowding out. Providing research to the public enables the discovery of new products which would otherwise have not been created. Public research is a public good that makes our nation happier, healthier, and wealthier.

▲

tomrod

2 days ago

[-]

Let's ignore FOSS contributions for a moment, which very much contradict your claim that private companies don't contribute research to the public.

Outside software technology: there is a series of papers from Grossman (going back to the 80s!) that analyzes basic versus applied research in a macroeconomic framework. Basic research _can_ be a public good, applied research can be crowded out. Combined with microeconomic research that monopolies can be dynamically efficient (investing in applied and basic R&D, like Bell Labs) and you get several examples and theories that contradict your statement that "there is no private market entity with an incentive to provide research to the public."

Another real world example in hardware that contradicts this claim is the evolution of building control systems. Before the advent of IOT, so, circa 1980s - 2010s, you saw increasing sharing and harmonization of competing electronics standards because it turned out to be more efficient to be modular, not have to re-hire subcontractors at exorbitant rates to maintain or replace components that go haywire, etc.

▲

edwardbernays

2 days ago

[-]

Including FOSS software is so wild in this conversation that it's ridiculous. You mean creating a product as a loss leader to get people into an ecosystem, farm social capital, create a sales funnel, or get free labor from the community to provide QA? The creation and release of software is NOWHERE NEAR the same category as "doing actual real scientific research" that it just smells of incredibly bad faith argumentation.

Economic analysis? Another intelligence product that requires essentially no staff, no actual R&D, no equipment besides computers? Brother, you have to be kidding me.

The hardware thing is just companies evolving to a shared standard.

Do you have even a little bit of a clue how hard it is to do good pharmacological research? Toxicological? Biological? Chemical? Physical? You have mentioned intelligence products with 0 investment cost and 0 risk of failure.

This is perhaps one of the most fart-sniffing tech-centric perspectives I have ever been exposed to. Go read some actual research by actual scientists and come back when you can tell me why, for instance, Eli Lilley would ever make their data or internal R&D public.

Jonas Salk did it. He is an extremely rare exception, and his incentive was public health. Notice that his incentive was markedly not financial.

Market entities with a financial incentive, whose entire business model and success is predicated on their unique R&D results, have 0 incentive to release research to the public.

▲

tomrod

1 day ago

[-]

I apologize that my points in my prior comment were so easy to misunderstand. Your response here shows a dramatic superficiality in understanding each of these areas I brought up, when not missed entirely. My hope is you can move past your rhetorical stumbling blocks in the conversation -- if that proves impossible, I'm happy to leave things as this being my last comment in our shared thread.

(1) FOSS is not only the next hyped front-end framework or modern data stack funnel. I encourage you to do more research for what European universities and organizations are doing. Not everyone follows the American or Chinese extractive approaches to software.

Further, while many corporations do indeed farm social capital and perform other appreciably maladative and cynic-inducing behaviors, the universe and the space of organizations is large. There are a great many examples of governments adopting public research and development released by private entities -- in FOSS and in other contexts.

Additionally, the fact that FOSS-product-focused companies tend to launch _after_ FOSS becomes successful to support the FOSS offering with associated services is quite a bit different from what is perhaps a FAANG-induced cynicism. To reiterate - the universe and the space of organizations is large.

(2) You interpreted that I did pointed to economic analysis as public vs. private R&D. This is a misinterpretation on your part and I encourage a re-read. I pointed to findings and studies to help you understand where the organizational and market frameworks for analysis stand.

(3) I am a researcher and regularly publish my findings, under the banner of the university I support, under non-profits I support, and under the company I run. I appreciate your experience has made you cynical. Lets break down this section.

> This is perhaps one of the most fart-sniffing tech-centric perspectives I have ever been exposed to.

This was not received as a good-faith statement, and further discussion on it will only engender argument. I suggest we move beyond trivial digs.

> Eli Lilley would ever make their data or internal R&D public.

Not to shill for them, but your point on Eli Lilly is incorrect. Eli Lilly has worked towards more transparent release of information -- they voluntarily launched an online clinical trial registry starting in 2002 (for Phase II–IV trials initiated on/after October 15, 2002) and extended full trial registration (including Phase I) from October 1, 2010.[0] Since 2014, Lilly has published clinical study results (Phase 2/3) regardless of outcome, adhering to PhRMA/EFPIA transparency principles. Patient-level data on marketed-approved indications is available to qualified researchers via a controlled-access third-party portal.[1] Beginning in 2021, Lilly has also produced plain‑language summaries of Phase 2–4 results in English, and more recently extended plain‑language summaries to Phase 1 trials in the EU in compliance with new regulations.[1]

Especially the third point is relevant -- good government regulation leads to better sharing and transparency. Smart companies take regulation as an innovation opportunity.

> Jonas Salk did it. He is an extremely rare exception, and his incentive was public health. Notice that his incentive was markedly not financial.

Aye, and I wish that all medical and life-enhancing research could be accomplished as relatively cheaply or as magnanimously as Jonas Salk.

> Market entities with a financial incentive, whose entire business model and success is predicated on their unique R&D results, have 0 incentive to release research to the public.

Please refer to (2) for studies and theory for why this is untrue.

The number of market entities who only are built on unique R&D tend to fail due to poor delivery of product, so their incentive to release their R&D to the world is more or less moot. I do acknowledge existence of market entities who are built solely on operationalizing R&D -- I challenge the implicit claim that all market entities fall into this category.

[0] https://www.lilly.com/au/policies-reports/transparency-discl...

[1] https://sustainability.lilly.com/governance/business-ethics

▲

ricardobeat

1 day ago

[-]

You could argue that Bell Labs was essentially government funded, as the monopoly/concession of the entire US telephony infrastructure is what made it possible, and research at universities was not funded anywhere near current levels.

They were also forced in the 1950s to license all their innovations freely, as compensation for holding a monopoly. Which only strengthens the parent’s point that private institutions have little incentive to work for public benefit.

▲

egberts1

18 hours ago

[-]

Galileo wants a word with you ... from heaven.

▲

Ma8ee

1 day ago

[-]

That whole discussion is based on the assumption that commercial firms or nonprofits are better in some way than publicly funded research. That is the stupid neoliberal dogma that private and market economy always are better than things that are run by our elected officials. That dogma has to die.

▲

tomrod

1 day ago

[-]

Price as a market signal precedes neoliberalism by several decades to several millennia, depending on which economic historian you speak with. Is your argument that basic research which has no immediately attributable applications is better handled by publicly funded research? I mostly agree to that. Applied research is definitely handled better by commercial firms and nonprofits when handling is defined by what people are willing to value (pay for).

If we're talking about applied technology in the public goods space, then it can be a toss up. Sustainability research, for example, can be quite blurry as to whether the market is pricing it in or not as applied or basic research -- really depends on how a government handles externalities and regulatory capture!

I'll 100% agree to government entities as well as some well-chartered public entities being absolutely awesome at setting up incentive structures for desired outcomes. There is actually a whole field of research dedicated to the topic of incentive structuring called mechanism design -- think of it as the converse to Game Theory and strategic behavioral analysis -- that policy design and analysis learn from.

I'll also note that governments aren't structured to efficiently provide benefits or just-in-time delivery in most situations. Though the discussion has made me more curious about how operationally efficient the DOD is for civilian goods distribution, given it supports a massive population.

▲

Ma8ee

22 hours ago

[-]

I'm pointing out that there is an implied assumption that private always is better than public, and that assumption in many cases is just plain wrong. Not in all cases, market economy works great for many things, but there are also many cases it plainly sucks. When you warn that private initiatives might be crowded out, it is implicit that those are more desirable than public initiatives.

This kind of discussion is a bit off topic here, but I think it is important to remind people that the idea that private always is better than public is ideological dogma, not science. But your latest comment makes me believe you agree with that.

▲

tomrod

21 hours ago

[-]

Yep, we agree in total. You often hear the opposite dogma too, that governments are wonderfully efficient and all markets are broken.

A moderate path, like what we see in the Scandinavian countries, looks to be a better model.

▲

edwardbernays

1 day ago

[-]

Completely agree. Neoliberalism and its consequences have been a disaster for mankind.

▲

0xDEAFBEAD

2 days ago

[-]

What's the easiest way to reliably check if a Youtube channel was sold to private equity? Is that info always a matter of public record?

▲

tbalsam

1 day ago

[-]

I'm not entirely sure, to be honest. If you look at the linked video, they state that it's oftentimes not in the best interest of the private equity group's moneymaking capabilities to announce that a channel has been sold out to them.

How that is in practice, I'm not sure, and I'm sure with some sleuthing it would be possible to find out at least some of it. But on the whole, I'm honestly not sure beyond that.

▲

aarond0623

2 days ago

[-]

He's had a couple of misleading videos over the last few years that finally made me unsubscribe. Specifically the lightbulb with a 1 light second wire and the more recent video about light taking infinite paths.

There was also the Waymo ad and the Rods from the Gods video where he couldn't bother to use a guide wire to aim.

▲

deadso

2 days ago

[-]

What was wrong with the 1 light second wire and the light taking infinite paths videos?

▲

aarond0623

2 days ago

[-]

The first one was portrayed in a clickbait "everything you know about electricity is wrong" way. There have been several response videos to it that lay out why it's misleading that explain it better than I can, but suffice to say that the lightbulb does not turn on immediately like he claims.

There second one takes a mathematical model for the path integral for light and portrays it like that's actually what is happening, with plenty of phrases like light "chooses" the path of least action that imply something more going on. Also, the experiment at the end with the laser pointer is awful. The light we are seeing is scattering from the laser pointer's aperture, not some evidence that light is taking alternate paths.

▲

ChadNauseam

2 days ago

[-]

> suffice to say that the lightbulb does not turn on immediately like he claims

Many people said this, but he set up an experiment to test it and the light does turn on instantly as claimed: https://www.youtube.com/watch?v=oI_X2cMHNe0

> There second one takes a mathematical model for the path integral for light and portrays it like that's actually what is happening

I know nothing about this. Is there a more accurate mathematical model available than the one he uses? Otherwise, I think it seems sensible to portray our best mathematical model as "what's really going on". And I didn't get the sense that light was "choosing" anything when watching the video, I got the sense that the amplitudes of all possible paths were cancelling out except for the shortest path (or something along those lines)

▲

evanb

1 day ago

[-]

There are many equivalent formulations of quantum mechanics, the one the above post takes issue with is the path integral formulation. But because you can show an exact mathematical equivalence it makes all the same predictions as, for example, Hamiltonian evolution.

The words people like to use for the path integral is a sum over histories---that corresponds tightly with the ingredients in the path integral. So in this formulation it's what's "actually happening". But in other mathematical formulations other words are more appealing and what someone claims is "actually happening" sounds different.

▲

keeda

2 days ago

[-]

+1 I would like to know too. Especially the experimental demo of infinite paths -- I'm a complete noob in quantum physics, and the video made sense of so many topics I "learned" in college but never managed to grok. It'd be good to know what the alternative explanation is.

▲

QuadmasterXLII

2 days ago

[-]

Private equity baby! not just for shitting up your dentists and toy stores anymore

▲

teaearlgraycold

2 days ago

[-]

The 1 light second wire video is kinda set up to bamboozle you. But it's still correct and taught me about EM.

▲

javier2

2 days ago

[-]

Especially the way he (or the team) responded to the criticism they got for doing those «sponsored content» pieces put me off hard enough to unsubscribe.

▲

ww520

2 days ago

[-]

Yeah, the light propagation videos are just high on misleading theories.

▲

jjraden

1 day ago

[-]

Speaking of eyes, Veritasium's occasional collaborator Dianna Cowern (Physics Girl) is doing much better after complications from long covid left her bedridden for two years. It's good to see her up and at 'em.

Her latest video, showing her out of bed and going for short walks, is here: https://youtu.be/vqeIeIcDHD0?si=WoxpqZOuRTWD2XYd

▲

edwardbernays

2 days ago

[-]

Nah, he transparently accepted money from waymo to peddle propaganda. Once somebody takes propaganda money, there's no trusting them anymore. From then on out, everything they do is in service to propaganda paymasters. Even just doing regular, good quality work can only be viewed through the lens of acquiring social capital to liquidate into financial capital later.

See: Brian Keating licking Eric Weinstein's jock strap in public and then offering mild criticism on Piers Morgan.

▲

rowanG077

2 days ago

[-]

You are essentially saying any creator that has ever done sponsored content becomes a creator non-grata. I somewhat disagree with that. Sponsored content is a perverse incentive but it's also important to understand that creators can pick and choose for what they make sponsored content. So if you have an ethical creator can create sponsored content of a product they agree is actually that good. Well now the question is "How can you tell". And I don't think you generally can. Some people are really good at lying. In the end it's really about do you trust this creator or not. Which is what's it's about regardless if they took a sponsorship or not.

▲

alpaca128

1 day ago

[-]

> Well now the question is "How can you tell". And I don't think you generally can.

You can, actually, with a simple rule of thumb: if it's being advertised on YouTube, it's statistically low quality or a scam. The sheer number of brands that sponsor videos just to be exposed later for doing something shady is just too high.

▲

rowanG077

1 day ago

[-]

That's the point. You can't tell. Applying a low resolution filter like you are proposing will filter out a ton of worthwhile products. Here is just a small list of products you can no longer buy if you subscribe to your philosophy: apple, dell, HP, framework, tuxedo and basically almost all laptop manufacturers. The same goes for smartphones. No GPUs at all for you. The filter is so crude it fails spectacularly at what it should be doing.

▲

alpaca128

1 day ago

[-]

Perhaps a more fitting variant would be that the trustworthiness decreases with increasing number of sponsored channels and ad frequency. Although I have never seen any video directly sponsored by Apple, Dell or HP for example, same for GPUs and many smartphone brands. They provide free units for review at most, and those at least go to channels with fitting content and the trustworthiness can be judged more easily. Whereas some new brand you never heard of aggressively sponsoring videos of every major channel for months basically guarantees there's something wrong with the company, product or both.

I thought it goes without saying that I don't mean ads shown directly by YouTube, if you don't already block those in 2025 I don't know what to say.

▲

edwardbernays

1 day ago

[-]

Sponsored content is fine. Sponsored content with improper public disclosure, or with irresponsible claims that do not reflect reality, is not fine. Super simple standard: if they lie or substantively misrepresent for a sponsor, they can no longer be trusted.

▲

pstuart

2 days ago

[-]

> transparently accepted money from waymo to peddle propaganda

If transparent enough (and not from an abhorrent source), I'd be ok with his product. He's even allowed to make the occasional mistake as long as he properly owns up to it.

Theres been a lot of valuable learning from him and it would be a pity to dismiss it all over a single fumble.

▲

edwardbernays

1 day ago

[-]

Lying or misrepresenting a product for a paycheck is not a fumble. It's a propagandist making a bag. Once they have put effort into creating a polished piece of propaganda, which they then release, it can not be considered a fumble any longer. It is something that they endorse. If they rescind it within some critical window that meaningfully impacts their bottom-line, maybe then I can believe them. Otherwise? No, I see no reason to offer them the benefit of the doubt. There are many people doing actually good work. Veritasium is not unique in their content or quality. We should not reward propagandists.

▲

anitil

2 days ago

[-]

I agree that Waymo video was probably a poor decision. He does say that the video is sponsored but it just comes off a bit odd. It's not uncommon that youtubers are paid either in money or access - Destin for example gets access to military sites and technology on his channel with it being a semi-explicit tool for recruiting.

▲

mapontosevenths

2 days ago

[-]

I assumed he read the Illuminatus! Trilogy and wondered where Markoff Chaney's name originated... There might be something wrong with me, now that I think about it. /s

▲

reverendsteveii

2 days ago

[-]

That's where I first learned about them.

NO SMOKING. NO SPITTING. MGMT

▲

postit

2 days ago

[-]

Markov Chains are a gateway drug to more advanced probabilistic graphic models which are worth exploring. I still remember working throughout Koller&Friedman cover to cover as one of the best learning experiences I’ve ever had.

▲

roadside_picnic

2 days ago

[-]

PGMs are a great topic to explore (and Koller+Friedman is a great book), but, as a word of caution to anyone interested: implementation of any of these more advanced models remains a major challenge. For anyone building production facing models, even if your problem is a pretty good match for the more interesting PGMs, the engineering requirements alone are a good reason not to go too far down that path.

The PGM book is also structured very clearly for researchers in PGMs. The book is laid out in 3 major section: the models, inference techniques (the bulk of the book), and learning. Which means, if you follow the logic of the book, you basically have to work through 1000+ pages of content before you can actually start running even toy versions of these models. If you do need to get into the nitty-gritty of particular inference algorithms, I don't believe there is another textbook with nearly the level of scope and detail.

Bishop's section on PGMs from Pattern Recognition and Machine Learning is probably a better place to start learning about these more advanced models, and if you become very interested then Koller+Friedman will be an invaluable text.

It's worth noting that the PGM course taught by Koller was one of the initial, and still very excellent, Coursera courses. I'm not sure if it's still free, but it was a nice way to get a deep dive into the topic in a reasonably short time frame (I do remember those homeworks as brutal though!)[0].

0. https://www.coursera.org/specializations/probabilistic-graph...

▲

yanovskishai

2 days ago

[-]

Played with Bayesian nets a bit in grad school—Pearl’s causality stuff is still mind-blowing—but I’ve almost never bumped into a PGM in production. A couple things kept biting us: Inference pain. Exact is NP-hard, and the usual hacks (loopy BP, variational, MCMC) need a ton of hand-tuning before they run fast enough.

The data never fits the graph. Real-world tables are messy and full of hidden junk, so you either spend weeks arguing over structure or give up the nice causal story.

DL stole the mind-share. A transformer is a one-liner with a mature tooling stack; hard to argue with that when deadlines loom.

That said, they’re not completely dead - reportedly Microsoft’s TrueSkill (Xbox ranking), a bunch of Google ops/diagnosis pipelines, some healthcare diagnosis tools by IBM Watson built on Infer.NET.

Anyone here actually shipped a PGM that beat a neural baseline? Would really love to appreciate your war stories.

▲

growthwtf

2 days ago

[-]

Me either. I have heard stories of it happening, but never personally seen one live. It's really a tooling issue. I think the causal story is super important and will only become more so in the future, but it would be basically impossible to implement and maintain longer-term with today's software.

Kind of like flow-based programming. I don't think there are any fundamental reason why it can't work, it just hasn't yet.

▲

ChadNauseam

2 days ago

[-]

> Pearl’s causality stuff is still mind-blowing

Could you link me to where I could learn more about this?

▲

vjaswal

1 day ago

[-]

From recollection, I believe, "The Book of Why", https://a.co/d/aYehsnO, is the general audience introduction to Pearl's approach to causality.

"Causality: Models, Reasoning and Inference", https://a.co/d/6b3TKhQ, is the technical and researcher audience book.

▲

notrealyme123

2 days ago

[-]

Just bought it last week. This book just makes me happy.

It feels like bishops pattern recognition but with a clearer tone (and a different field of course)

▲

AnotherGoodName

2 days ago

[-]

I feel like we need a video on Dynamic Markov Chains. It's a method to create a markov chain from data. It's used in all the highest compression winners in the Hutter Prize (a competition to compress data the most).

▲

atiedebee

1 day ago

[-]

You mean the algorithm used in hook[0]? These are not really top performers anymore. PPM has generally performed better and nowadays it's LLMs and context mixers that are at the top of text compression[1]

[0]: https://mattmahoney.net/dc/dce.html#Section_421 [1]: https://mattmahoney.net/dc/text.html

▲

AnotherGoodName

1 day ago

[-]

DMC is used in pretty much all of these, just not alone (that alg column is too small to capture this).

As in go into the first open source entry which is #2 in this list, cmix, unzip the files, go into paq8.cpp and search for DMC. See "Model using DMC (Dynamic Markov Compression)" and associated code. In these cases DMC is one model mixed in with others and the best model for the current context is used.

Hook exclusively uses DMC for outstanding results but the others use DMC as one of the prediction models.

▲

jadbox

2 days ago

[-]

Make your own video then :)

▲

mindcrime

2 days ago

[-]

Heh, somebody watched that same Veritasium video about Markov Chains and Monte Carlo methods. Great video; lots of interesting historical stuff in there that I didn't know (like the feud between Markov and Nekrasov).

▲

carlmcqueen

2 days ago

[-]

For awhile I was working on Monte Carlo sims in my job in finance in my early career. Just re-building a existing archaic excel monster in python to be more flexible to new investment models and implement and allow for more levers. Since I was already working with them daily I begin applying Monte Carlo models to a lot more problems I was thinking about. It truly is a fun tool to play with, especially when you're in the thick of designing them.

▲

amelius

2 days ago

[-]

If you're holding a hammer every problem looks like a nail ...

▲

tempodox

2 days ago

[-]

Yea, if it were true, then Markov chains on steroids (LLMs) would be super at problem solving, but they can't even reason. And they get worse if you add random facts about cats: https://news.ycombinator.com/item?id=44724238

▲

CrazyStat

2 days ago

[-]

That doesn’t follow at all, but I guess don’t pass up any opportunity to bash LLMs.

▲

ducktective

2 days ago

[-]

LLMs are Markov Chains on steroids?

By the way, does anyone know which model or type of model was used in winning gold in IMO?

▲

roadside_picnic

2 days ago

[-]

> LLMs are Markov Chains on steroids?

It's not an unreasonable view, at least for decoder-only LLMs (which is what most popular LLMs are). While it may seem they violate the Markov property since they clearly do make use of their history, in practice that entire history is summarized in an embedding passed into the decoder. I.e.just like a Markov chain their entire history is compressed into a single point which leaves them conditionally independent of their past given their present state.

It's worth noting that this claim is NOT generally applicable to LLMs since both encoder/decoder and encoder-only LLMs do violate the Markov property and therefore cannot be properly considered Markov chains in a meaningful way.

But running inference on decoder only model is, at a high enough level of abstraction, is conceptually the same as running a Markov chain (on steroids).

▲

hxtk

2 days ago

[-]

A Markov process is any process where if you have perfect information on the current state, you cannot gain more information about the next state by looking at any previous state.

Physics models of closed systems moving under classical mechanics are deterministic, continuous Markov processes. Random walks on a graph are non deterministic, discrete Markov processes.

You may further generalize that if a process has state X, and the prior N states contribute to predicting the next state, you can make a new process whose state is an N-vector of Xs, and the graph connecting those states reduces the evolution of the system to a random walk on a graph, and thus a Markov process.

Thus any system where the best possible model of its evolution requires you to examine at most finitely many consecutive states immediately preceding the current state is a Markov process.

For example, an LLM that will process a finite context window of tokens and then emit a weighted random token is most definitely a Markov process.

▲

magicalhippo

2 days ago

[-]

> LLMs are Markov Chains on steroids?

Might be a reference to this[1] blog post which was posted here[2] a year ago.

There has also been some academic work linking the two, like this[3] paper.

[1]: https://elijahpotter.dev/articles/markov_chains_are_the_orig...

[2]: https://news.ycombinator.com/item?id=39213410

[3]: https://arxiv.org/abs/2410.02724

▲

_Algernon_

1 day ago

[-]

LLMs aren't Markov chains unless they have a context window of 1.

>In probability theory and statistics, a Markov chain or Markov process is a stochastic process describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event

▲

mr_wiglaf

1 day ago

[-]

The tricky thing is you get to define the state. So if the "state" is the current word _and_ the previous 10 it is still "memoryless". So an LLM's context window is the state. It doesn't matter whether _we_ see parts of the state as called history, the markov chain doesn't care (they are all just different features).

Edit: I could be missing important nuance that other people are pointing out in this thread!

▲

nyeah

2 days ago

[-]

I mean, often, sure. Also problem solving is often a matter of making a spreadsheet full of +, -, *, / operations. Problem solving is often a matter of counting permutations and combinations. It's often a matter of setting up the right system of ordinary differential equations. It's often a matter of setting up the right linear algebra problem.

It's often a matter of asking the right person what technique works. It's often a matter of making a measurement before getting lost in the math. It's often a matter of finding the right paper in the literature.

▲

1vuio0pswjnm7

2 days ago

[-]

Note to file: No one is complaining about the absence of [PDF] from the title. Possible that time of submission is a factor.

For example, https://news.ycombinator.com/item?id=44574033

▲

gnabgib

2 days ago

[-]

Wasn't the submitted URL: https://news.ycombinator.com/item?id=44734162

▲

naasking

2 days ago

[-]

Unless the problem is "quantum mechanics", then that reduces to non-Markovian processes with unistochastic laws, and ironically, this makes for a causally local QM:

https://arxiv.org/abs/2402.16935v1

▲

theturtletalks

2 days ago

[-]

The Veritasium video brought up an interesting point about how LLMs, if trained too much on their own content, will fall victim to the Markov Chain and just repeat the same thing over and over.

Is this still possible with the latest models being trained on synthetic data? And if it possible, what would that one phrase be?

▲

roadside_picnic

2 days ago

[-]

That original model collapse paper has largely been misunderstood and in practice, this is only true if you're not curating the generated data at all. The original paper even specifies (emphasis mine):

> We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models, in which tails of the original content distribution disappear. [0]

In practice nobody is "indiscriminately" using model output to fine-tune models since that doesn't even make sense. Even if you're harvesting web data generated by LLMs, that data has in fact been curated by it's acceptance on whatever platform you found it on is a form of curation.

There was a very recent paper Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) [1] whose content is pretty well summarized by the title. So long as the data is curated in some way, you are providing more information to the model and the results should improve somewhat.

0. https://www.nature.com/articles/s41586-024-07566-y

1. https://www.arxiv.org/pdf/2507.12856

edit: updated based on cooksnoot's comment

▲

cootsnuck

2 days ago

[-]

There's multiple papers on model collapse. Being able to avoid model collapse is different from it "being disproven".

If you just mean its risk has been over exaggerated and/or over simplified then yea, you'd have a point.

▲

roadside_picnic

2 days ago

[-]

Fair point, I've updated the post to highlight that even the original paper specifics "indiscriminate" use of model outputs.

Having spent quite a bit of time diving into many questionable "research" papers (the original model collapse paper is not actually one of these, it's a solid paper), there's a very common pattern of showing that something does or does not work under special conditions but casually making generalized claims about those results. It's so easy with LLMs to find a way to get the result you want that there are far too many papers out there that people quickly take as fact when the claims are much, much weaker than the papers let on. So I tend to get a bit reactionary when addressing many of these "facts" about LLMs.

But you are correct that with the model collapse paper this is much more the public misunderstanding the claims of the original paper than any fault with that paper itself.

▲

Pamar

2 days ago

[-]

I am on the move so I cannot check the video (but I did skim the pdf). Is there any chance to see an example of this technique? Just a toy/trivial example would be great, TIA!

▲

mindcrime

2 days ago

[-]

For the Monte Carlo Method stuff in particular[1], I get the sense that the most iconic "Hello, World" example is using MC to calculate the value of pi. I can't explain it in detail from memory, but it's approximately something like this:

Define a square of some known size (1x1 should be fine, I think)

Inscribe a circle inside the square

Generate random points inside the square

Look at how many fall inside the square but not the circle, versus the ones that do fall in the circle.

From that, using what you know about the area of the square and circle respectively, the ratio of "inside square but not in circle" and "inside circle" points can be used to set up an equation for the value of pi.

Somebody who's more familiar with this than me can probably fix the details I got wrong, but I think that's the general spirit of it.

For Markov Chains in general, the only thing that jumps to mind for me is generating text for old school IRC bots. :-)

[1]: which is probably not the point of this essay. For for muddying the waters, I have both concepts kinda 'top of mind' in my head right now after watching the Veritasium video.

▲

jldugger

2 days ago

[-]

> From that, using what you know about the area of the square and circle respectively, the ratio of "inside square but not in circle" and "inside circle" points can be used to set up an equation for the value of pi.

Back in like 9th grade, when Wikipedia did not yet exist (but MathWorld and IRC did) I did this with TI-Basic instead of paying attention in geometry class. It's cool, but converges hilariously slowly. The in versus out formula is basically distance from origin > 1, but you end up double sampling a lot using randomness.

I told a college roommate about it and he basically invented a calculus approach summing pixels in columns or something as an optimization. You could probably further optimize by finding upper and lower bounds of the "frontier" of the circle, or iteratively splitting rectangle slices in infinitum, but thats probably just reinventing state of the art. And all this skips the cool random sampling the monte carlo algorithm uses.

▲

shagie

2 days ago

[-]

Piet: Programming language in which programs look like abstract paintings (2002) - https://news.ycombinator.com/item?id=40141777 https://www.dangermouse.net/esoteric/piet.html

In the sample programs there's a big red one... https://www.dangermouse.net/esoteric/piet/samples.html

There's also the IOCCC classic https://www.ioccc.org/1988/westley/index.html

▲

Pamar

1 day ago

[-]

Sorry, I should have been more specific maybe: I do know about Montecarlo, and yeah, the circle stuff is a more or less canonical example - but I wanted to know more about the Markov Chains, because, again, I only know these in terms of sequence generators and I have some problems imagining how this could "solve problems" unless your problem is "generate words that sorta sound like a specific language but it is just mostly gibberish".

▲

abrefeld

2 days ago

[-]

I’ve always loved this example. I implemented the Monte Carlo pi estimation on a LEGO Mindstorms NXT back in high school. Totally sparked my interest in programming, simulations, etc. Also the NXT’s drag-and-drop, flowchart programming interface was actually a great intro to programming logic. Made it really easy to learn real programming in later on.

▲

shagie

2 days ago

[-]

A Pseudorandom Number Sequence Test Program - https://www.fourmilab.ch/random/

    Monte Carlo Value for Pi

    Each successive sequence of six bytes is used as 24 bit X and Y co-ordinates within a square. If the distance of the randomly-generated point is less than the radius of a circle inscribed within the square, the six-byte sequence is considered a “hit”. The percentage of hits can be used to calculate the value of Pi. For very large streams (this approximation converges very slowly), the value will approach the correct value of Pi if the sequence is close to random. A 500000 byte file created by radioactive decay yielded:

    Monte Carlo value for Pi is 3.143580574 (error 0.06 percent).

▲

ilyakaminsky

2 days ago

[-]

TIL, thanks! I asked Claude to generate a simulator [1] based on your comment. I think it came out well.

[1] https://claude.ai/public/artifacts/1b921a50-897e-4d9e-8cfa-0...

▲

mindcrime

2 days ago

[-]

Righteous!

▲

stevenAthompson

2 days ago

[-]

Here's a direct link to a PDF.

http://math.uchicago.edu/~shmuel/Network-course-readings/Mar...

▲

tomhow

2 days ago

[-]

Updated, thanks!

▲

tantalor

2 days ago

[-]

(2007)

▲

Iwan-Zotow

2 days ago

[-]

And? MC didn't change much, did they?

▲

taeric

2 days ago

[-]

Putting the date is often a good affordance to help people scanning the headlines know if they have seen this one before. It is not, necessarily, meant as a knock against it.

▲

gertlex

2 days ago

[-]

Especially with academic papers where publish dates are often not easy to find in a super-quick skim.