Bayesian statistics for confused data scientists

65 points

by speckx

3 days ago

| 4 comments

| nchagnet.pages.dev

statskier

3 hours ago

I went through grad school in a very frequentist environment. We “learned” Bayesian methods but we never used them much.

In my professional life I’ve never personally worked on a problem that I felt wasn’t adequately approached with frequentist methods. I’m sure other people’s experiences are different depending on the problems you gravitate towards.

In fact, I tend to get pretty frustrated with Bayesian approaches because when I do turn to them it tends to be in situations that are already quite complex and large. In basically every such instance I’ve been unable to make the Bayesian approach work: either it won’t converge or the sampler says it will take days and days to run. I can almost always just resort to some resampling method that might take a few hours, but it runs and gives me sensible results.

I realize this is heavily biased by basically only attempting them on super-complex problems, but it has sort of soured me on even trying anymore.

To be clear I have no issue with Bayesian methods. Clearly they work well and many people use them with great success. But I just haven’t encountered anything in several decades of statistical work that I found really required Bayesian approaches, so I’ve really lost any motivation I had to experiment with it more.

nextos

2 hours ago

> I’ve never personally worked on a problem that I felt wasn’t adequately approached with frequentist methods

Multilevel models are one example of a problem where Bayesian methods are hard to avoid, as inference is otherwise unstable, particularly when observations are not abundant. Multilevel models should be used more often, as shrinkage of effect sizes is important for making robust estimates.
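
To illustrate the shrinkage idea in the paragraph above, here is a minimal sketch. It is not a full Bayesian multilevel model: it uses a simple empirical-Bayes (method-of-moments) approximation to partial pooling, on simulated data. The group count, `tau`, and `se` are made-up values chosen only for illustration.

```python
import random
import statistics

random.seed(0)

n_groups = 2000
tau = 0.5   # assumed sd of the true group effects (illustrative)
se = 1.0    # assumed known standard error of each raw group estimate

# Simulate true effects and one noisy raw estimate per group.
true_effects = [random.gauss(0.0, tau) for _ in range(n_groups)]
raw = [t + random.gauss(0.0, se) for t in true_effects]

# Method-of-moments estimate of the between-group variance:
# Var(raw) = tau^2 + se^2, so tau^2 is estimated by Var(raw) - se^2.
grand_mean = statistics.fmean(raw)
total_var = statistics.pvariance(raw, mu=grand_mean)
tau2_hat = max(total_var - se**2, 0.0)

# Partial pooling: pull each raw estimate toward the grand mean.
weight = tau2_hat / (tau2_hat + se**2)
shrunk = [grand_mean + weight * (y - grand_mean) for y in raw]

def mse(estimates, truth):
    """Mean squared error of estimates against the simulated truth."""
    return statistics.fmean([(e - t) ** 2 for e, t in zip(estimates, truth)])

print("shrinkage weight:", round(weight, 3))
print("MSE raw:   ", round(mse(raw, true_effects), 3))
print("MSE shrunk:", round(mse(shrunk, true_effects), 3))
```

When the noise in each group estimate is large relative to the spread of true effects, as assumed here, the shrunk estimates should land closer to the truth on average than the raw ones.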

Lots of flashy results published in Nature Medicine and similar journals turn out to be statistical noise when you look at them from a rigorous perspective with adequate shrinking. I often review for these journals, and it's a constant struggle to try to inject some rigor.

From a more general perspective, many frequentist methods fall prey to Lindley's Paradox. In simple terms, their inference is poorly calibrated for large sample sizes. They often mistake a negligible deviation from the null for a "statistically significant" discovery, even when the evidence actually supports the null. This is quite typical in clinical trials. (Spiegelhalter et al, 2003) is a great read to learn more even if you are not interested in medical statistics [1].

[1] https://onlinelibrary.wiley.com/doi/book/10.1002/0470092602
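
To make the Lindley's paradox remark above concrete, here is a small sketch for a normal mean with known variance. It holds the test statistic fixed at z = 2.5 and compares the two-sided p-value with the Bayes factor for H0: theta = 0 against H1: theta ~ Normal(0, tau^2). The choice of `sigma = 1`, `tau = 1`, and the sample sizes are assumptions for illustration, not values from the thread.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def bayes_factor_null(z, n, sigma=1.0, tau=1.0):
    """Bayes factor BF01 (null over alternative).

    H0: theta = 0, H1: theta ~ Normal(0, tau^2), with the sample mean
    distributed as Normal(theta, sigma^2 / n) and z = sqrt(n) * mean / sigma.
    """
    r = n * tau**2 / sigma**2
    return math.sqrt(1.0 + r) * math.exp(-0.5 * z**2 * r / (1.0 + r))

z = 2.5
print("p-value:", round(two_sided_p_value(z), 4))
for n in (10, 1_000, 100_000, 10_000_000):
    print(f"n = {n:>10,}  BF01 = {bayes_factor_null(z, n):.3f}")
```

With z held fixed, the p-value does not change with n, while BF01 under this particular prior grows across these sample sizes and ends up favouring the null. The result depends on the prior chosen for the alternative.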

jmalicki

5 minutes ago

Thank you for Lindley's paradox! TIL

getnormality

44 minutes ago

The evidence "actually supports the null" over what alternative?

In a Bayesian analysis, the result of an inference, e.g. about the fairness of a coin as in Lindley's paradox, depends completely on the distribution of the alternative specified in the analysis. The frequentist analysis, for better and worse, doesn't need to specify a distribution for the alternative.
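
A quick sketch of that dependence, using the coin example. The data (5,100 heads in 10,000 flips) and the two Beta priors are hypothetical, picked only to show how the same data can favour either hypothesis depending on the alternative.

```python
import math

def log_beta(a, b):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bf01_coin(k, n, a):
    """Bayes factor for H0: p = 0.5 against H1: p ~ Beta(a, a).

    k heads in n flips. The binomial coefficient cancels in the ratio.
    """
    log_m0 = n * math.log(0.5)
    log_m1 = log_beta(k + a, n - k + a) - log_beta(a, a)
    return math.exp(log_m0 - log_m1)

k, n = 5_100, 10_000  # hypothetical data
for a in (1, 10, 100, 1_000):
    print(f"alternative Beta({a}, {a}):  BF01 = {bf01_coin(k, n, a):.3f}")
```

Under the flat Beta(1, 1) alternative, BF01 is well above 1 (evidence for the fair coin); under the tightly concentrated Beta(1000, 1000) alternative, it drops below 1.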

statskier

1 hour ago

I agree Bayesian approaches to multilevel modeling situations are clearly quite useful and popular.

Ironically, this has been one of the primary cases where, in my personal experience and on the problems I have worked on, frequentist mixed and random effects models have worked just fine. On rare occasions I have encountered a situation where the data was particularly complex or I wanted to use an unusual compound probability distribution and thought Bayesian approaches would save me. Instead, I have routinely ended up with models that never converge or take impractical amounts of time to run. Maybe it’s my lack of experience, from jumping into Bayesian methods only on super hard problems. That’s totally possible.

But I have found many frequentist approaches to multilevel modeling perfectly adequate. That does not, of course, mean that will hold true for everyone or all problems.

One of my hot takes is that people seriously underestimate the diversity of data problems such that many people can just have totally different experiences with methods depending on the problems they work on.

nextos

1 hour ago

These days, the advantage is that a generative model can be cleanly decoupled from inference. With probabilistic languages such as Stan, Turing or Pyro it is possible to encode a model and then perform maximum likelihood, variational Bayes, approximate Bayesian inference, as well as other more specialized approaches, depending on the problem at hand.
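
As a toy illustration of that decoupling (plain Python, not how Stan, Turing or Pyro are implemented), the sketch below writes a model once as a log-density function and hands it to two unrelated inference routines: a mode finder and a random-walk Metropolis sampler. The data, the prior, and the helper names `find_mode` and `metropolis` are all hypothetical.

```python
import math
import random

data = [1.2, 0.8, 1.9, 1.4, 0.7]  # made-up observations

def log_joint(mu):
    """Model: mu ~ Normal(0, 10^2), y_i ~ Normal(mu, 1), up to a constant."""
    log_prior = -0.5 * (mu / 10.0) ** 2
    log_lik = sum(-0.5 * (y - mu) ** 2 for y in data)
    return log_prior + log_lik

def find_mode(log_density, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for the maximiser of a unimodal log density."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_density(m1) < log_density(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def metropolis(log_density, start=0.0, steps=20_000, step_size=1.0, seed=0):
    """Random-walk Metropolis; returns draws after a 10% burn-in."""
    rng = random.Random(seed)
    x, lp = start, log_density(start)
    samples = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        lp_proposal = log_density(proposal)
        # Accept with probability min(1, exp(lp_proposal - lp)).
        if rng.random() < math.exp(min(0.0, lp_proposal - lp)):
            x, lp = proposal, lp_proposal
        samples.append(x)
    return samples[steps // 10:]

# Same model, two inference strategies.
mode = find_mode(log_joint)        # posterior mode (MAP)
draws = metropolis(log_joint)      # approximate posterior samples
posterior_mean = sum(draws) / len(draws)
print("MAP estimate:       ", round(mode, 3))
print("MCMC posterior mean:", round(posterior_mean, 3))
```

Passing a likelihood-only function to `find_mode` would give a maximum likelihood estimate instead; the model code itself would not need to change.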

If you have experienced problems with convergence, give Stan a try. Stan is really robust, polished, and simple. Besides, models are statically typed and it warns you when you do something odd.

Personally, I think once you start doing multilevel modeling to shrink estimates, there's no way back. At least in my case, I now see it everywhere. Thanks to efficient variational Bayes methods built on top of JAX, it is doable even on high-dimensional models.

storus

2 hours ago

A large portion of generative AI is based on Bayesian statistics, like stable diffusion, regularization, LLM as a learned prior (though trained with frequentist MLE), variational autoencoders etc. Chain-of-thought and self-consistency can be viewed as Bayesian as well.

jhbadger

3 hours ago

I think Rafael Irizarry put it best over a decade ago -- while historically there was a feud between self-declared "frequentists" and "Bayesians", people doing statistics in the modern era aren't interested in playing sides, but use a combination of techniques originating in both camps: https://simplystatistics.org/posts/2014-10-13-as-an-applied-...

therobots927

1 hour ago

That’s Bayesian propaganda

jmalicki

6 minutes ago

Huh? Are there really any pure frequentists post Stein's paradox? At least ones that are aware of it and maintain objections to fusing the fields?
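
For readers who have not met Stein's paradox: when three or more normal means are estimated jointly under total squared error, the raw observations are dominated by a shrinkage estimator. A minimal simulation sketch, assuming unit variance and an arbitrary made-up vector of 50 true means:

```python
import random

def james_stein(x):
    """James-Stein estimate: shrink x toward zero (unit variance assumed)."""
    p = len(x)
    norm_sq = sum(v * v for v in x)
    factor = 1.0 - (p - 2) / norm_sq
    return [factor * v for v in x]

def squared_error(estimate, truth):
    """Total squared error across all coordinates."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))

rng = random.Random(0)
p = 50
theta = [rng.gauss(0.0, 1.0) for _ in range(p)]  # arbitrary fixed true means

reps = 2000
mle_loss = 0.0
js_loss = 0.0
for _ in range(reps):
    # One noisy observation per coordinate; the MLE is the observation itself.
    x = [t + rng.gauss(0.0, 1.0) for t in theta]
    mle_loss += squared_error(x, theta)
    js_loss += squared_error(james_stein(x), theta)

mle_risk = mle_loss / reps
js_risk = js_loss / reps
print("average loss, raw observations:", round(mle_risk, 2))
print("average loss, James-Stein:     ", round(js_risk, 2))
```

The raw estimator's average loss should sit near p, and the James-Stein average loss should come out lower.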

hawtads

14 minutes ago

I think it would be interesting if frequentist stats could come up with more generative models. Current high-level generative machine learning models all rely on Bayesian modeling.

jmalicki

7 minutes ago

I'm not well versed enough, but what would a frequentist generative model even mean?

The entire generative concept implicitly assumes that parameters have probability distributions themselves that naturally give rise to generative models...

You could do frequentist inference on a generative model, sure, but generative modelling seems fundamentally alien to frequentist thinking?

hawtads

5 minutes ago

I am more familiar with Bayesian than frequentist stats, but given that they are mathematically equivalent, shouldn't frequentist stats have an answer to e.g. the loss function of a VAE? Or is generative machine learning inherently impossible to model in frequentist stats?

Though if you think about it, a diffusion model is somewhat (partially) frequentist.

jmalicki

3 minutes ago

They do!

https://arxiv.org/pdf/2510.18777

But that doesn't mean a frequentist views a VAE as a generative model!

Putting it another way, Gaussian processes originated as a frequentist technique! But to a frequentist they are not generative.

hawtads

1 minute ago

Ooh good find, thanks for the link. This will be my bedtime reading for this week :)

lottin

43 minutes ago

> In Bayesian statistics, on the other hand, the parameter is not a point but a distribution.

To be more precise, in Bayesian statistics a parameter is a random variable. But what does that mean? A parameter is a characteristic of a population (as opposed to a characteristic of a sample, which is called a statistic). A quantity, such as the average cars per household right now. That's a parameter. To think of a parameter as a random variable is like regarding reality as just one realisation of an infinite number of alternate realities that could have been. The problem is we only observe our reality. All the data samples that we can ever study come from this reality. As a result, it's impossible to infer anything about the probability distribution of the parameter. The whole Bayesian approach to statistical inference is nonsensical.