┌───────────────┬───────────┬──────────────┐
│               │ iteration │ no iteration │
├───────────────┼───────────┼──────────────┤
│ informative   │ pragmatic │ subjective   │
│ uninformative │ -         │ objective    │
└───────────────┴───────────┴──────────────┘
My main disagreement with this model is the empty bottom-left box - in fact, I think that's where most self-labeled Bayesians in industry fall:

- Iterating on the functional form of the model (and therefore the assumed underlying data generating process) is generally considered obviously good and necessary, in my experience.
- Priors are usually uninformative or weakly informative, partly because data is often big enough to overwhelm the prior.
The need for iteration feels so obvious to me that the entire "no iteration" column feels like a straw man. But the author, who knows far more academic statisticians than I do, explicitly says that he had the same belief and "was shocked to learn that statisticians didn’t think this way."
The iteration itself is sometimes viewed directly as a problem. The “garden of forking paths”, where the analysis depends on the data, is viewed as a direct cause for some of the statistical and epistemological crises in science today.
Iteration itself isn’t inherently bad. It’s just that the objective function usually isn’t what we want from a scientific perspective.
To those actually doing scientific work, I suspect iterating on their models feels like they’re doing something unfaithful.
Furthermore, I believe a lot of these issues are strongly related to the flawed epistemological framework on which many scientific fields seem to have converged: p<0.05 means it’s true, otherwise it’s false.
edit:
Perhaps another way to characterize this discomfort is by the number of degrees of freedom that the analyst controls. In a Bayesian context where we are picking priors either by belief or previous data, the analyst has a _lot_ of control over how the results come out the other end.
I think this is why fields have trended towards a set of ‘standard’ tests instead of building good statistical models. These take most of the knobs out of the hands of the analyst, and generally are more conservative.
> Iteration itself isn’t inherently bad. It’s just that the objective
> function usually isn’t what we want from a scientific perspective.
I think this is exactly right and touches on a key difference between science and engineering.

Science: Is treatment A better than treatment B?
Engineering: I would like to make a better treatment B.
Iteration is harmful for the first goal yet essential for the second. I work in an applied science/engineering field where both perspectives exist (and are necessary!). Which specific path is taken for any given experiment or analysis will depend on which goal one is trying to achieve. Conflict will sometimes arise when it's not clear which of these two objectives is the important one.
E.g.: profiling an existing application and tuning its performance is comparing two products, it just so happens that they’re different versions of the same series. If you compared it to a competing vendor’s product you should use the same mathematical analysis process.
That being said, it's completely fair to use cross-validation and then run models on train, iterate with test and then finally calculate p-values with validation.
The problem with that approach is that you need to collect much, much more data than people generally would. Given that most statistical tests were developed for a small data world, this can often work but in some cases (medicine, particularly) it's almost impossible and you need to rely on the much less useful bootstrapping or LOO-CV approaches.
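The three-way split described in the comments above can be sketched as follows; the dataset, effect size, and split proportions are all invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical dataset: outcome y and a binary treatment indicator t.
n = 3000
t = rng.integers(0, 2, size=n)
y = 0.2 * t + rng.normal(size=n)

# Three-way split: iterate freely on train/test, touch validation once.
idx = rng.permutation(n)
train, test, valid = np.split(idx, [n // 2, 3 * n // 4])

# ... model selection and iteration happen here, using only train/test ...

# Final, single hypothesis test on the untouched validation split.
t_stat, p_value = stats.ttest_ind(y[valid][t[valid] == 1],
                                  y[valid][t[valid] == 0])
print(p_value)
```

The point of the sketch is only that the p-value is computed once, on data that never influenced any modeling decision — which is exactly why the approach demands so much more data than usual.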
I guess the core problem is that the methods of statistical testing assume no iteration, but actually understanding data requires iteration, so there's a conflict here.
If the scientific industry was OK with EDAs being published to try to tease out work for future experimental studies then we'd see more of this, but it's hard to get an EDA published so everyone does the EDA, and then rewrites the paper as though they'd expected whatever they found from the start, which is the worst of both worlds.
Personally, I think that as long as you're generating data constantly (through some kind of software/hardware process), then you'd be well served to keep your sets pure and build the model finally only on data not used in the original process. This is often wildly impractical (and is probably controversial even within the field), but it's safer.
(If you train on the entire internet, this may not even be possible.)
I have a feeling I'm just totally barking up the wrong tree, but don't know where my thinking/understanding is just off.
I do think ML practitioners in general align with the "iteration" category in my characterization, though you could joke that that miscategorizes people who just use (boosted trees|transformers) for everything.
[0] https://projecteuclid.org/journals/statistical-science/volum...
I don't think that is so niche. Murphy's vol II, a mainstream book, starts with this quote:
"Intelligence is not just about pattern recognition and function approximation. It’s about modeling the world." — Josh Tenenbaum, NeurIPS 2021.
Goodman & Tenenbaum have written e.g. https://probmods.org, which is very much about modeling data-generating processes.
The same can be said about large parts of Murphy's book, Lee & Wagenmakers or Lunn et al. (the BUGS book).
The article is very succinct and even explains why my Bayesian professors had different approaches to research and analysis. I never knew about the third camp, Pragmatic Bayes, but it is definitely in line with a professor's research that was very thorough on probability fit and the many iterations needed to get the prior and joint PDF just right.
Andrew Gelman has a very cool talk "Andrew Gelman - Bayes, statistics, and reproducibility (Rutgers, Foundations of Probability)", which I highly recommend for many Data Scientists
In fact, the whole talk series (https://foundationsofprobabilityseminar.com/) and channel (https://www.youtube.com/@foundationsofprobabilitypa2408/vide...) seem interesting.
- subjective Bayes is the strawman that frequentist academics like to attack
- objective Bayes is a naive self-image that many Bayesian academics tend to possess
- pragmatic Bayes is the approach taken by practitioners that actually apply statistics to something (or in Gelman’s terms, do science)
- Statistical significance testing and hypothesis testing are two completely different approaches with different philosophies behind them developed by different groups of people that kinda do the same thing but not quite and textbooks tend to completely blur this distinction out.
- The above approaches were developed in the early 1900s in the context of farms and breweries where 3 things were true - 1) data was extremely limited, often there were only 5 or 6 data points available, 2) there were no electronic computers, so computation was limited to pen and paper and slide rules, and 3) the cost in terms of time and money of running experiments (e.g., planting a crop differently and waiting for harvest) were enormous.
- The majority of classical statistics was focused on two simple questions - 1) what can I reliably say about a population based on a sample taken from it and 2) what can I reliably say about the differences between two populations based on the samples taken from each? That's it. An enormous mathematical apparatus was built around answering those two questions in the context of the limitations in point #2.
The data-poor and computation-poor context of old school statistics definitely biased the methods towards the "recipe" approach scientists are supposed to follow mechanically, where each recipe is some predefined sequence of steps, justified based on an analytical approximations to a sampling distribution (given lots of assumptions).
In modern computation-rich days, we can get away from the recipes by using resampling methods (e.g. permutation tests and bootstrap), so we don't need the analytical approximation formulas anymore.
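As a minimal sketch of the resampling idea, here is a two-sample permutation test; the data and group sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two hypothetical samples; null hypothesis: same distribution.
a = rng.normal(0.0, 1.0, size=50)
b = rng.normal(0.5, 1.0, size=50)

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

# Permutation test: reshuffle group labels to build the null
# distribution directly - no analytical approximation to a
# sampling distribution needed.
n_perm = 10_000
diffs = np.empty(n_perm)
for i in range(n_perm):
    perm = rng.permutation(pooled)
    diffs[i] = perm[:50].mean() - perm[50:].mean()

p_value = np.mean(np.abs(diffs) >= abs(observed))
print(p_value)
```

The only assumption is exchangeability under the null, which replaces the normality and variance assumptions baked into the classical recipes.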
I think there is still room for small sample methods though... it's not like biological and social sciences are dealing with very large samples.
The article gave me the same vibe, nice, short set of labels for me to apply as a heuristic.
I never really understood this particular war, I'm a simpleton, A in Stats 101, that's it. I guess I need to bone up on Wikipedia to understand what's going on here more.
I got all riled up when I saw you wrote "correct", I can't really explain why... but I just feel that we need to keep an open mind. These approaches to data are choices at the end of the day... Was Einstein a Bayesian? (spoiler: no)
A classic example is analyzing data on mind reading or ghost detection. Your experiment shows you that your ghost detector has detected a haunting with p < .001. What is the probability the house is haunted?
A person getting 50.1% accuracy on an ESP experiment with a p-value less than some threshold doesn't cut it. But that doesn't mean the prior is insurmountable.
The closing down of loopholes in Bell inequality tests is a good example of a pretty aggressive prior being overridden by increasingly compelling evidence.
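A quick sketch of why the prior dominates in the ghost-detector example, using Bayes' rule; all the numbers (base rate, error rates) are made up for illustration:

```python
# Bayes' rule for the haunted-house example: a "significant" detector
# reading means little when the prior probability of a haunting is tiny.
prior = 1e-9            # assumed prior probability a house is haunted
sensitivity = 0.999     # assumed P(detection | haunted)
false_positive = 0.001  # assumed P(detection | not haunted), matching p < .001

posterior = (sensitivity * prior) / (
    sensitivity * prior + false_positive * (1 - prior))
print(posterior)  # ~1e-6: the evidence barely moves the prior
```

The detection raises the odds by a factor of roughly a thousand, yet the posterior is still about one in a million — which is why a single p < .001 result cannot settle the question, while accumulating evidence (as in the Bell-test example) eventually can.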
Me 6 months ago would have written: "this comment is unhelpful and boring, but honestly, that's slightly unfair to you, as it just made me realize how little help the article is, and it set the tone. is this even a real argument with sides?"
For people who want to improve on this aspect of themselves, like I did for years:
- show, don't tell (ex. here, I made the oddities more explicit, enough that people could reply to me spelling out what I shouldn't.)
- Don't assert anything that wasn't said directly, ex. don't remark on the commenter, or subjective qualities you assess in the comment.
What I don’t understand is the war between naive Bayes and pragmatic Bayes. If it is real, it seems like an extension of philosophers vs. engineers. Scientists should see value in both. Naive Bayes is important to the philosophy of science, without which a lot of junk science would go unscrutinized for far too long, and engineers should be able to see the value of philosophers saving them work by debunking wrong science before they start to implement theories which simply will not work in practice.
I don’t get what all the hate for subjective Bayesianism is about. It seems the most philosophically defensible approach, in that all it assumes is our own subjective judgements of likelihood, the idea that we can quantify them (however inexactly), and the idea (avoid Dutch books) that we want to be consistent (most people do).
Whereas, objective Bayes is basically subjective Bayes from the viewpoint of an idealised perfectly rational agent - and “perfectly rational” seems philosophically a lot more expensive than anything subjective Bayes relies on.
I’m still convinced that Americans tend to dislike the frequentist view because it requires a stronger background in mathematics.
I think it’s useful to break down the anti-Bayesians into statisticians and non-statistician scientists.
The former are mathematically savvy enough to understand Bayes but object on philosophical grounds; the latter don't care about the philosophy so much as they feel like an attack on frequentism is an attack on their previous research, and they take it personally.
There are some cases, that do arise in practice, where you can’t impose a prior, and/or where the “Dutch book” arguments to justify Bayesian decisions don’t apply.
The p-hacking exposures of the 1990s only cemented the notion that it is very easy to get away with junk science using frequentist methods to unjustly validate your claims.
That said, frequentists are still the default statistics in social sciences, which ironically is where the damage was the worst.
But what I gathered after moving to Seattle is that Bayesian statistics are a lot more trendy (accepted, even) here west of the ocean. Frequentism is very much the default, especially in hypothesis testing, so you are not wrong. However I’m seeing a lot more Bayesian advocacy over here than I did back in Iceland. So I’m not sure my parent is wrong either, that Americans tend to dislike frequentist methods, at least more than Europeans do.
But I think the crux of the matter is that bad science has been demonstrated with frequentist methods and is now part of our history. So people must either find a way to fix the frequentist approaches or throw them out for something different. Bayesian statistics is that something different.
The first statement assumes that parameters (i.e. a state of nature) are random variables. That's the Bayesan approach. The second statement assumes that parameters are fixed values, not random, but unknown. That's the frequentist approach.
Reading this book would be much better than applying "Hanlon's Razor"[2] just because you see no other explanation.
In reality, on a human level, it doesn't work like that because, when you have disagreements on the very foundations of your field, although both camps can agree that their results do follow, the fact that their results (and thus terminology) are incompatible makes it too difficult to research both at the same time. This basically means, practically speaking, you need to be familiar with both, but definitely specialize in one. Which creates hubs of different sorts of math/stats/cs departments etc.
If you're, for example, working on constructive analysis, you'll have to spend a tremendous amount of energy on understanding contemporary techniques like localization etc. just to work around a basic logical axiom, which is likely irrelevant to a lot of applications. Really, this is like trying to understand the mathematical properties of binary arithmetic (Z/2Z) but day-to-day studying group theory in general. Sure, Z/2Z is a group, but really you're simply interested in a single, tiny, finite abelian group, yet now you need to do a whole bunch of work on non-abelian groups, infinite groups, non-cyclic groups etc. just to ignore all those facts.
I’m not following your example about binary and group theory either. Nobody looks at the properties of binary and stops there. If you are interested in number theory, group theory will be a useful part of your toolbox for sure.
To understand both camps I summarize like this.
Frequentist statistics has very sound theory but is misapplied through many heuristics, rules of thumb, and prepared tables. It's very easy to use any method and hack the p-value away to get statistically significant results.
Bayesian statistics has an interesting premise and inference methods, but until recently with the advancements of computing power, it was near impossible to do simulations to validate the complex distributions used, the goodness of fit and so on. And even in the current year, some bayesian statisticians don't question the priors and iterate on their research.
I recommend using both methods whenever it's convenient and they fit the problem at hand.
Regardless, the idea that frequentist stats requires a stronger background in mathematics is just flat out silly though, not even sure what you mean by that.
There is a reason why conjugate priors were a thing.
The opposite is true. Bayesian approaches require more mathematics. The Bayesian approach is perhaps more similar to PDE where problems are so difficult that the only way we can currently solve them is with numerical methods.
This three cultures idea is a bit of sleight of hand in my opinion, as the "pragmatic" culture isn't really exclusive of subjective or objective Bayesianism and in that sense says nothing about how you should approach prior specification or interpretation or anything. Maybe Gelman would say a better term is "flexibility" or something but then that leaves the question of when you go objective and when you go subjective and why. Seems better to formalize that than leave it as a bit of smoke and mirrors. I'm not saying some flexibility about prior interpretation and specification isn't a good idea, just that I'm not sure that approaching theoretical basics with the answer "we'll just ignore the issues and pretend we're doing something different" is quite the right answer.
Playing a bit of devil's advocate too, the "pragmatic" culture reveals a bit about why Bayesianism is looked at with a bit of skepticism and doubt. "Choosing a prior" followed by "seeing how well everything fits" and then "repeating" looks a lot like model tweaking or p-hacking. I know that's not the intent, and it's impossible to do modeling without tweaking, but if you approach things that way, the prior just looks like one more degree of freedom to nudge things around and fish with.
I've published and edited papers on Bayesian inference, and my feeling is that the problems with it have never been in the theory, which is solid. It's in how people use and abuse it in practice.
In an early chapter it outlines, rather eloquently, the distinctions between the Frequentist and Bayesian paradigms and in particular the power of well-designed Frequentist or likelihood-based models. With few exceptions, an analyst should get the same answer using a Bayesian vs. Frequentist model if the Bayesian is actually using uninformative priors. In the worlds I work in, 99% of the time I see researchers using Bayesian methods they are also claiming to use uninformative priors, which makes me wonder if they are just using Bayesian methods to sound cool and skip through peer review.
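A toy illustration of that near-equivalence for a binomial proportion, assuming a flat Beta(1,1) prior; the counts are invented:

```python
# With a flat Beta(1,1) prior, the Bayesian point estimate for a
# binomial proportion essentially reproduces the frequentist MLE -
# a quick check of the claim that uninformative priors give the
# same answer. Counts are hypothetical.
k, n = 437, 1000                # successes out of trials

mle = k / n                     # frequentist estimate
post_mean = (k + 1) / (n + 2)   # posterior mean under Beta(1, 1)
post_mode = k / n               # posterior mode equals the MLE exactly

print(mle, post_mean)           # 0.437 vs ~0.4371: practically identical
```

With any reasonable amount of data the two estimates differ only in the third decimal place, which is the sense in which the flat-prior Bayesian answer and the frequentist answer coincide.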
One potential problem with Bayesian statistics lies in the fact that for complicated models (100s or even 1000s of parameters) it can be extremely difficult to know if the priors are truly uninformative in the context of a particular dataset. One has to wait for models to run, and when systematically changing priors this can take an extraordinary amount of time, even when using high powered computing resources. Additionally, in the Bayesian setting it becomes easy to accidentally "glue" a model together with a prior or set of priors that would simply bomb out and give a non-positive definite hessian in the Frequentist world (read: a diagnostic telling you that your model is likely bogus and/or too complex for a given dataset). One might scoff at models of this complexity, but that is the reality in many applied settings, for example spatio-temporal models facing the "big n" problem or for stuff like integrated fisheries assessment models used to assess status and provide information on stock sustainability.
So my primary beef with Bayesian statistics (and I say this as someone who teaches graduate level courses on Bayesian inference) is that it can very easily be misused by non-statisticians and beginners, particularly given the extremely flexible software programs that currently are available to non-statisticians like biologists etc. In general though, both paradigms are subjective and Gelman's argument that it is turtles (i.e., subjectivity) all the way down is spot on and really resonates with me.
Unlike frequentist statistics? :-)
--
Misinterpretations of P-values and statistical tests persists among researchers and professionals working with statistics and epidemiology
"Correct inferences to both questions, which is that a statistically significant finding cannot be inferred as either proof or a measure of a hypothesis’ probability, were given by 10.7% of doctoral students and 12.5% of statisticians/epidemiologists."
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9383044/
--
Robust misinterpretation of confidence intervals
"Only 8 first-year students (2%), no master students, and 3 postmasters researchers (3%) correctly indicated that all statements were wrong."
https://link.springer.com/article/10.3758/s13423-013-0572-3
--
P-Value, Confidence Intervals, and Statistical Inference: A New Dataset of Misinterpretation
"The data indicates that 99% subjects have at least 1 wrong answer of P-value understanding (Figure 1A) and 93% subjects have at least 1 wrong answer of CI understanding (Figure 1B)."
https://www.frontiersin.org/journals/psychology/articles/10....
But it's also a statistics problem because ethically you should incorporate your assumptions into the model. If the assumptions are statistical, then you can incorporate them in a prior.
I once read a Gelman blog post or paper that argued Frequentists should be more Frequentist (i.e., repeat experiments more often than they currently do) and Bayesians should be more Bayesian (i.e., be more willing to use informative priors and or make probability statements beyond 95% credible intervals). Or something like that, as I am paraphrasing. That always seemed reasonable. Either way, the dueling--and highly simplified--caricatures of Bayesians vs. Frequentists vs. likelihood folks is largely silly to me. Use the tool that works best for the job at hand, and if you can answer a problem effectively with a well designed experiment and a t-test so be it.
Consider the statement p(X) = 0.5 (probability of event X is 0.5). What does this actually mean? Is it a proposition? If so, is it falsifiable? And how?
If it is not a proposition, what does it actually mean? If someone with more knowledge can chime in here, I'd be grateful. I've got much more to say on this, but only after I hear from those with a rigorous grounding the theory.
Every probability is defined in terms of three things: a set, a set of subsets of that set (in plain language: a way of grouping things together), and a function which maps the subsets to numbers between 0 and 1. To be valid, the set of subsets, aka the events, need to satisfy additional rules.
All your example p(X) = 0.5 says is that some function assigns the value of 0.5 to some subset which you've called X.
That it seems to be good at modelling the real world can be attributed to the origins of the theory: it didn't arise ex nihilo, it was constructed exactly because it was desirable to formalize a model for seemingly random events in the real world.
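That triple can be written down concretely; a toy sketch for a single fair-die roll (the die and the events are of course invented for illustration):

```python
from fractions import Fraction

# A tiny finite probability space: the triple of a sample space,
# a set of events (here: all subsets), and a measure.
omega = {1, 2, 3, 4, 5, 6}

def p(event):
    """Uniform measure: P(A) = |A| / |Omega|, for any event A within Omega."""
    assert event <= omega  # events must be subsets of the sample space
    return Fraction(len(event), len(omega))

# The event X = "the roll is even" gets probability 1/2.
X = {2, 4, 6}
print(p(X))                # 1/2
print(p(omega), p(set()))  # 1 and 0, as the axioms require
```

In this formal sense, p(X) = 0.5 says nothing about the world by itself; it is a statement about a function on subsets, and the connection to reality comes from how the space was chosen to model it.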
I have privately come to the conclusion that probability is a well-defined and testable concept only in settings where we can argue from certain exact symmetries. This is the case in coin tosses, games of chance and many problems in statistical physics. On the other hand, in real-world inference, prediction and estimation, probability is subjective and much less quantifiable than statisticians (Bayesians included) would like it to be.
> However, might it be leading us astray?
Yes, I think so. I increasingly feel that all sciences that rely on statistical hypothesis testing as their primary empirical method are basically giant heaps of garbage, and the Reproduciblity Crisis is only the tip of the iceberg. This includes economics, social psychology, large swathes of medical science, data science, etc.
> Consider the statement p(X) = 0.5 (probability of event X is 0.5). What does this actually mean? Is it a proposition? If so, is it falsifiable? And how?
I'd say it is an unfalsifiable proposition in most cases. Even if you can run lots of cheap experiments, like with coin tosses, a million runs will "confirm" the calculated probability only to within ~0.1%. This is just lousy by the standards of the exact sciences, and it only goes downhill if your assumptions are less solid, the sample space more complex, or reproducibility more expensive.
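For a rough check of that precision claim, the standard error of a frequency estimate is sqrt(p(1-p)/n):

```python
import math

# Back-of-envelope precision of a frequency estimate: with a million
# fair-coin tosses, the standard error of the observed frequency is
# sqrt(p * (1 - p) / n), giving roughly a 0.1% half-width at ~95%
# confidence.
n = 1_000_000
p = 0.5
se = math.sqrt(p * (1 - p) / n)
print(se)         # 0.0005
print(1.96 * se)  # ~0.00098: about one part in a thousand
```

So even a million trials pin the probability down only to about the third decimal place, and the precision improves only as the square root of the sample size.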
Probability isn’t a single concept, it is a family of related concepts - epistemic probability (as in subjective Bayesianism) is a different concept from frequentist probability - albeit obviously related in some ways. It is unsurprising that a term looks like an “ill-defined, unfalsifiable concept” if you are mushing together mutually incompatible definitions of it.
> Consider the statement p(X) = 0.5 (probability of event X is 0.5). What does this actually mean?
From a subjective Bayesian perspective, p(X) is a measure of how much confidence I - or any other specified person - have in the truth of a proposition, or my own judgement of the weight of evidence for or against it, or my judgement of the degree of my own knowledge of its truth or falsehood. And 0.5 means I have zero confidence either way, I have zero evidence either way (or else, the evidence on each side perfectly cancels each other out), I have a complete lack of knowledge as to whether the proposition is true.
> Is it a proposition?
It is a proposition just in the same sense that “the Pope believes that God exists” is a proposition. Whether or not God actually exists, it seems very likely true that the Pope believes he does.
> If so, is it falsifiable? And how?
And obviously that’s falsifiable, in the same sense that claims about my own beliefs are trivially falsifiable by me, using my introspection. And claims about other people’s beliefs are also falsifiable if we ask them, assuming they are happy to answer and we have no good reason to think they are being untruthful.
> From a subjective Bayesian perspective, p(X) is a measure of how much confidence I - or any other specified person - have in the truth of a proposition, or my own judgement of the weight of evidence for or against it, or my judgement of the degree of my own knowledge of its truth or falsehood.
See how inexact and vague all these measures are. How do you know your confidence is (or should be) 0.5 ( and not 0.49) for example? Or, how to know you have judged correctly the weight of evidence? Or how do you know the transition from "knowledge about this event" to "what it indicates about its probability" you make in your mind is valid? You cannot disprove these things, can you?
Unless you want to say the actual values do not actually matter, but the way the probabilities are updated in the face of new information does. But in any case, the significance of new evidence still has to be interpreted; there is no objective interpretation, is there?
Well, you don't, but does it matter? The idea is it is an estimate.
Let me put it this way: we all informally engage in reasoning about how likely it is (given the evidence available to us) that a given proposition is true. The idea is that assigning a numerical estimate to our sense of likelihood can (sometimes) be a helpful tool in carrying out reasoning. I might think "X is slightly more likely than ~X", but do I know whether (for me) p(X) = 0.51 or 0.501 or 0.52? Probably not. But I don't need a precise estimate for an estimate to be helpful. And that's true in many other fields, including things that have nothing to do with probability – "he's about six feet tall" can be useful information even though it isn't accurate to the millimetre.
> Or, how to know you have judged correctly the weight of evidence?
That (largely) doesn't matter from a subjective Bayesian perspective. Epistemic probabilities are just an attempt to numerically estimate the outcome of my own process of weighing the evidence – how "correctly" I've performed that process (per any given standard of correctness) doesn't change the actual result.
From an objective Bayesian perspective, it does – since objective Bayesianism is about, not any individual's actual sense of likelihood, rather what sense of likelihood they ought to have (in that evidential situation), what an idealised perfectly rational agent ought to have (in that evidential situation). But that's arguably a different definition of probability from the subjective Bayesian, so even if you can poke holes in that definition, those holes don't apply to the subjective Bayesian definition.
> Or how do you know the transition from "knowledge about this event" to "what it indicates about its probability" you make in your mind is valid?
I feel like you are mixing up subjective Bayesianism and objective Bayesianism and failing to carefully distinguish them in your argument.
> But in any case, the significance of new evidence still has to be interpreted; there is no objective interpretation, is there?
Well, objective Bayesianism requires there be some objective standard of rationality, subjective Bayesianism doesn't (or, to the extent that it does, the kind of objective rationality it requires is a lot weaker, mere avoidance of blatant inconsistency, and the minimal degree of rationality needed to coherently engage in discourse and mathematics.)
For example, say Nate Silver and Andrew Gelman both publish probabilities for the outcomes of all the races in the election in November. After the election results are in, we can’t say any individual probability was right or wrong. But we will be able to say whether Nate Silver or Andrew Gelman was more accurate.
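One standard way to make that comparison concrete is a proper scoring rule such as the Brier score; the forecasts and outcomes below are entirely made up for illustration:

```python
# Scoring probabilistic forecasters after the fact, as in the election
# example: lower Brier score = better-calibrated forecasts overall,
# even though no single probability can be called right or wrong.

def brier(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

outcomes = [1, 0, 1, 1, 0]                # hypothetical race results
forecaster_a = [0.8, 0.3, 0.6, 0.9, 0.2]  # hypothetical forecasts
forecaster_b = [0.6, 0.4, 0.7, 0.8, 0.4]  # hypothetical forecasts

print(brier(forecaster_a, outcomes))  # 0.068
print(brier(forecaster_b, outcomes))  # 0.122: forecaster A was sharper
```

Proper scoring rules reward both calibration and sharpness, which is what lets us rank forecasters in aggregate without judging any single probability.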
If you saw a sequence of 1000 coin tosses at say 99% heads and 1% tails, you were convinced that the same process is being used for all the tosses and you had an opportunity to bet on tails with 50% stakes, would you do it?
This is a pragmatic answer which rejects P(X)=0.5. We can try to make sense of this pragmatic decision with some theory. (Incidentally, being exactly 0.5 is almost impossible; it makes more sense to verify that it lies in an interval like (0.49, 0.51).)
The law of large numbers says that the probability of X can be obtained by conducting independent trials: in the limit, the fraction of trials in which X occurs will approach p(X).
However, 'limit' implies an infinite number of trials, so any initial sequence doesn't determine the limit. You would have to choose a large N as a cutoff and then take the average.
But, is this unique to probability? If you take any statement about the world, "There is a tree in place G", and you have a process to check the statement ("go to G and look for a tree"), can you definitely say that the process will successfully determine if the statement is true? There will always be obstacles("false appearances of a tree" etc.). To rule out all such obstacles, you would have to posit an idealized observation process.
For probability checking, an idealization which works is infinite independent observations which gives us p(X).
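A small simulation of that idealization, with an arbitrary cutoff N standing in for the infinite limit (the true probability and N are invented):

```python
import random

random.seed(1)

# Law-of-large-numbers idealization: the running frequency of an event
# approaches p(X) over independent trials. We stop at a finite N.
p_true = 0.5
n = 100_000
hits = sum(random.random() < p_true for _ in range(n))
estimate = hits / n
print(estimate)  # close to 0.5 for n = 100,000

# Checking membership in an interval such as (0.49, 0.51), rather than
# exact equality, is what a finite N can actually support.
print(0.49 < estimate < 0.51)
```

The choice of N is exactly the idealization being discussed: no finite prefix determines the limit, so in practice we settle for an interval check at some large but arbitrary cutoff.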
PS: I am not trying to favour frequentism as such, just that the requirement of an ideal of observation process shouldn't be considered as an overwhelming obstacle. (Sometimes, the obstacles can become 'obstacles in principle' like position/momentum simultaneous observation in QM and if you had such obstacles, then indeed one can abandon the concept of probability).
It's a measure of plausibility - enabling plausible reasoning.
https://www.lesswrong.com/posts/KN3BYDkWei9ADXnBy/e-t-jaynes...
> Consider the statement p(X) = 0.5 (probability of event X is 0.5). What does this actually mean?
It means X is a random variable from some sample space to a measurable space and P is a probability function.
> If so, is it falsifiable? And how?
Yes, by calculating P(X) in the given sample space. For example, if X is the event "you get 100 heads in a row when flipping a fair coin" then it is false that P(X) = 0.5.
It's a bit like asking whether 2^2 = 4 is falsifiable.
There are definitely meaningful questions to ask about whether you've modeled the problem correctly, just as it's meaningful to ask what "2" and "4" mean. But those are separate questions from whether the statements of probability are falsifiable. If you can show that the probability axioms hold for your problem, then you can use probability theory on it.
There's a Wikipedia article on interpretations of probability here: https://en.wikipedia.org/wiki/Probability_interpretations. But it is pretty short and doesn't seem quite so complete.
I think you haven't thought about this deeply enough yet. You take it as self evident that P(X) = 0.5 is false for that event, but how do you prove that? Assuming you flip a coin and you indeed get 100 heads in a row, does that invalidate the calculated probability? If not, then what would?
I guess what I'm driving at is this notion (already noted by others) that probability is recursive. If we say p(X) = 0.7, we mean the probability is high that in a large number of trials, X occurs 70% of the time. Or that the proportion of times that X occurs tends to 70% with high probability as the number of trials increase. Note that this second order probability can be expressed with another probability ad infinitum.
On the contrary, I've thought about it quite deeply. Or at least deeply enough to talk about it in this context.
> You take it as self evident that P(X) = 0.5 is false for that event, but how do you prove that?
By definition a fair coin is one for which P(H) = P(T) = 1/2. See e.g. https://en.wikipedia.org/wiki/Fair_coin. Fair coin flips are also by definition independent, so you have a series of independent Bernoulli trials. So P(H^k) = P(H)^k = 1/2^k. And P(H^k) != 1/2 unless k = 1.
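That arithmetic can be checked directly (a minimal sketch; the function name is mine):

```python
from fractions import Fraction

def p_all_heads(k: int) -> Fraction:
    # Independent fair-coin flips, each with P(H) = 1/2, so P(H^k) = (1/2)^k.
    return Fraction(1, 2) ** k

print(p_all_heads(1))    # 1/2 -- the only k for which P(H^k) = 0.5
print(p_all_heads(100))  # 1/1267650600228229401496703205376
```

Using exact rationals avoids any floating-point quibble about whether the result is "really" different from 0.5.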
> Assuming you flip a coin and you indeed get 100 heads in a row, does that invalidate the calculated probability? If not, then what would?
Why would that invalidate the calculated probability?
> If not, then what would?
P(X) = 0.5 is a statement about measures on sample spaces. So any proof that P(X) != 0.5 falsifies it.
I think what you're really trying to ask is something more like "is there really any such thing as a fair coin?" If you probe that question far enough you eventually get down to quantum computation.
But there is some good research on coin flipping. You may like Persi Diaconis's work. For example his Numberphile appearance on coin flipping https://www.youtube.com/watch?v=AYnJv68T3MM
But that's a circular tautology, isn't it?
You say a fair coin is one where the probability of heads or tails are equal. So let's assume the universe of coins is divided into those which are fair, and those which are not. Now, given a coin, how do we determine it is fair?
If we toss it 100 times and get all heads, do we conclude it is fair or not? I await your answer.
No, it's not a tautology... it's a definition of fairness.
> If we toss it 100 times and get all heads, do we conclude it is fair or not?
This is covered in any elementary stats or probability book.
> Now, given a coin, how do we determine it is fair?
I addressed this in my last two paragraphs. There's a literature on it and you may enjoy it. But it's not about whether statistics is falsifiable, it's about the physics of coin tossing.
No, it is really not. That you are avoiding giving me a straightforward answer says a lot. If you mean this:
> So any proof that P(X) != 0.5 falsifies it
Then the fact that we got all heads does not prove P(X) != 0.5. We could get a billion heads and still that is not proof that P(X) != 0.5 (although it is evidence in favor of it).
> I addressed this in my last two paragraphs...
No, you did not. Again you are avoiding giving a straightforward answer. That tells me you are aware of the paradox and are simply avoiding grappling with it.
ants_everywhere is also correct that the coin-fairness calculation is something you can find in textbooks; it's example 2.1 in "Data Analysis: A Bayesian Tutorial" by D. S. Sivia. What it shows is that after many coin flips, the posterior for the coin's bias converges to roughly a Gaussian around the observed ratio of heads and tails, and the width of that Gaussian narrows as more flips accumulate. The result depends on the prior as well, but with enough flips the data will overwhelm any initial prior confidence that the coin was fair.
The probability is nonzero everywhere (except P(H) = 0 and P(H) = 1, assuming both heads and tails were observed at least once), so no particular ratio is ever completely falsified.
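For concreteness, here is a sketch of that calculation (in the spirit of Sivia's example; the flip counts are made up). With a uniform Beta(1, 1) prior, the posterior for the bias after h heads and t tails is Beta(h + 1, t + 1), whose mean and standard deviation have closed forms:

```python
from math import sqrt

def posterior_summary(heads: int, tails: int):
    # Beta(h+1, t+1) posterior under a uniform prior:
    # mean = a/(a+b), std = sqrt(ab / ((a+b)^2 (a+b+1))).
    a, b = heads + 1, tails + 1
    mean = a / (a + b)
    std = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, std

print(posterior_summary(6, 4))      # after 10 flips: wide posterior
print(posterior_summary(600, 400))  # after 1000 flips: narrow, near 0.6
```

The standard deviation shrinks roughly like 1/sqrt(n), which is the "narrowing Gaussian" described above; and since the Beta density is nonzero on the open interval (0, 1), no particular bias value is ever completely ruled out.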
> I'm not really sure it contributes to the discussion, but it's true
I guess maybe it doesn't, but the point I was trying to make is the distinction between modeling a problem and statements within the model. The original claim was "my theory is that probability is an ill-defined, unfalsifiable concept."
To me that's a bit like saying the sum of angles in a triangle is an ill-defined, unfalsifiable concept. It's actually well-defined, but it starts to seem poorly defined if we confuse that with the question of whether the universe is Euclidean. So I'm trying to separate the questions of "is this thing well-defined" from "is this empirically the correct model for my problem?"
Even today, most of the classical machine learning toolbox is not generative.
I think this is a strength of Bayesianism. Any statistical work is infused with the subjective judgement of individual humans. I think it is more objective to not shy away from this immutable fact.
I guess that means I'm in the pragmatist school in this article's nomenclature (I'm a big fan of Gelman and all the other stats folks there), but what one thinks is pragmatic is also subjective.
See Breiman's classic "Two Cultures" paper that this post's title is referencing: https://projecteuclid.org/journals/statistical-science/volum...
Edit: corrected my sentence, but see 0xdde reply for better info.
[1] https://www.microsoft.com/en-us/research/publication/pattern...
Giving Bishop's book a cursory look, I see that I am wrong, as it has deep roots in Bayesian inference as well.
On another note, I find it very interesting that there isn't a bigger emphasis on using the correct distributions in ML models, as the methods are much more concerned with optimizing objective functions.
In particular, variational inference is a family of techniques that makes these kinds of problems computationally tractable. It shows up everywhere from variational autoencoders, to time-series state-space modeling, to reinforcement learning.
If you want to learn more, I recommend reading Murphy's textbooks on ML: https://probml.github.io/pml-book/book2.html
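As a toy illustration (my own sketch, not from Murphy's book; the Beta(60, 40) target and all grid ranges are made up), variational inference picks the best member of a tractable family, here a Gaussian q chosen by minimizing the reverse KL divergence KL(q || p) on a grid:

```python
import numpy as np
from scipy import stats

# Target posterior: Beta(60, 40), e.g. a coin bias after 59 heads / 39 tails.
theta = np.linspace(1e-4, 1 - 1e-4, 4000)
dx = theta[1] - theta[0]
log_p = stats.beta.logpdf(theta, 60, 40)

# Brute-force search over Gaussian candidates q = N(mu, sigma).
best_kl, best_mu, best_sigma = np.inf, None, None
for mu in np.linspace(0.4, 0.8, 41):
    for sigma in np.linspace(0.01, 0.2, 40):
        q = stats.norm.pdf(theta, mu, sigma)
        q /= q.sum() * dx                                  # renormalize on grid
        kl = np.sum(q * (np.log(q + 1e-300) - log_p)) * dx # KL(q || p)
        if kl < best_kl:
            best_kl, best_mu, best_sigma = kl, mu, sigma

print(best_mu, round(best_sigma, 3))  # lands near the Beta's mean and std
```

Real VI replaces the grid search with gradient-based optimization of the ELBO, but the objective is the same idea: find the closest tractable distribution to an intractable posterior.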
The tighter the range of this function, the more confidence you have in the result.
You can never learn anything if you absolutely refuse to have a prior: with zero prior mass on every hypothesis, the evidence term in Bayes' rule is zero and the posterior becomes 0/0.
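A sketch of that degenerate case (my own example, with made-up likelihood values): a prior of exactly zero on a hypothesis can never be revived, and zero likelihood under every hypothesis makes the denominator literally zero.

```python
def bayes_update(prior_h: float, lik_h: float, lik_not_h: float) -> float:
    """Posterior P(H|D) = P(D|H)P(H) / [P(D|H)P(H) + P(D|~H)P(~H)]."""
    evidence = lik_h * prior_h + lik_not_h * (1 - prior_h)
    return lik_h * prior_h / evidence  # ZeroDivisionError if evidence == 0

# A zero prior stays zero no matter how strongly the data favors H:
p = 0.0
for _ in range(100):
    p = bayes_update(p, lik_h=0.99, lik_not_h=0.01)
print(p)  # still 0.0
```

This is Cromwell's rule in miniature: assign probability zero (or one) only to statements that are logically impossible (or certain), or no evidence can ever move you.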
> I’m not sure if anyone ever followed this philosophy strictly, nor do I know if anyone would register their affiliation as subjective Bayesian these days.
lol the lesswrong/rationalist "Bayesians" do this all the time.
* I have priors
* YOU have biases
* HE is a toxoplasmotic culture warrior