robwwilliams 15 hours ago

Yes, it's old, but even worse, it is not a well-argued review. Yes, Bayesian statistics are slowly gaining the upper hand at higher levels of statistics, but you know what should be taught to first-year undergrads in science? Exploratory data analysis! One of the first books I voluntarily read in stats was Mosteller and Tukey's gem, Data Analysis and Regression. Another great book is Judea Pearl's Book of Why.

  • nxobject 14 hours ago

    On the subject of prioritizing EDA:

    I need to look this up, but I recall that in the 90s a social psychology journal briefly had a policy of "if you show us you're handling your data ethically, you can just show us a self-explanatory plot instead of NHST when you're conducting simple comparisons". That came after some early discussions about statistical reform in the 90s; Cohen's "The Earth is round (p < .05)", I think, kicked things off.

  • wiz21c 15 hours ago

    Definitely. It always amazes me that in many situations, I'm applying some stats algorithm just to conclude: let's look at these data some more...

  • jononor 13 hours ago

    Yes. And the same for DS/ML people too, please. The number of ML people who can meaningfully drill down and actually understand the data is surprisingly low sometimes. It's even worse when it comes to understanding a phenomenon _using data_.

    • Charon77 6 hours ago

      When you have a lot of fancy metrics/models/bootstraps to throw at a problem, people just see what sticks.

usgroup 4 hours ago

I consider myself an applied statistician, amongst other things, and I find this to be mostly an ideological take.

When we do statistics, we are first doing applied mathematics, which we then extend to account for uncertainty in our particular problem. Whether your final model is good will largely depend on how well it serves the task it was built for and/or how likely its critics believe it is to be falsified in its alternative hypothesis space. That is, a particular uncertainty extension is neither necessary nor sufficient.

For less usual examples: engineers may use interval arithmetic to deal with uncertainty propagation, quants might use maximin to hedge a portfolio, and management science makes use of scenario analysis (deterministic models under different scenarios). All deal with uncertainty; none necessarily invokes either frequentist or Bayesian intuitions.
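
A minimal sketch of the interval-arithmetic idea, in Python; the `Interval` class and the torque example are purely illustrative, not taken from any particular engineering library:

```python
# Minimal sketch of interval arithmetic for uncertainty propagation.
# The Interval class and the example quantities are illustrative only.

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# A force known only to within bounds, times an uncertain lever arm:
force = Interval(90.0, 110.0)   # newtons
arm = Interval(1.9, 2.1)        # metres
print(force * arm)              # torque is guaranteed to lie in [171.0, 231.0]
```

Note that no probability distribution appears anywhere: only hard bounds are propagated, which is the point about handling uncertainty without frequentist or Bayesian commitments.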

So, in my opinion, the most useful thing to teach neophytes is how to model with maths. The second is how to make a case for the model under uncertainty.

hnuser123456 13 hours ago

Okay, apparently this is the core of the debate?:

Frequentists view probability as a long-run frequency, while Bayesians view it as a degree of belief.

Frequentists treat parameters as fixed, while Bayesians treat them as random variables.

Frequentists don't use prior information, while Bayesians do.

Frequentists make inferences about parameters, while Bayesians make inferences about hypotheses.

---

If we state the full nature of our experiment, what we controlled and what we didn't... how can it be a "degree of belief"? Sure, it's impossible to be 100% objective, but it is easy to add enough background info to your paper so people can understand the context of your experiment and why you got your results: "we found that at our college, in this year, when you ask random students on the street this question, 40% say this, 30% say that..." Then, considering how the college campus sample might not fully represent the desired larger population... what is different? You can confidently say something about the students you sampled, less so about the town as a whole, and less so still about the state as a whole.

I don't know, I finished my science degree after 10 years and apparently have an even mix of these philosophies.

Would love to learn more if someone's inclined.

  • joshjob42 3 hours ago

    Well even for simple things there's a large difference. Say you toss a coin N times and observe heads x times. What is the probability of your next toss coming up heads?

    A frequentist would arguably say the question doesn't really have any meaning, since probabilities are about long-run frequencies of things occurring. They might do various tests, or tell you the probability of that outcome under various assumed probabilities of heads.

    A Bayesian would make an initial assumption about how plausible each possible value of the heads probability is (a prior), then compute a posterior using the same likelihood function the frequentist would use, and give you a distribution describing what you should believe the true probability of heads on your next toss to be.

    In general, the latter is more meaningful and informative. There are also pretty good arguments that any coherent method of representing credences is isomorphic to probability; see Cox's theorem.
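
    A minimal sketch of the Bayesian calculation for the coin example, assuming a uniform Beta(1, 1) prior over the heads probability (the prior and the numbers N and x are assumptions for illustration, not something the comment specifies):

    ```python
    # Beta-Binomial sketch: uniform Beta(1, 1) prior, binomial likelihood.
    # N and x are made-up numbers purely for illustration.
    from scipy import stats

    N, x = 20, 14                                # tosses and observed heads

    posterior = stats.beta(1 + x, 1 + (N - x))   # Beta is conjugate to the binomial
    print(posterior.mean())                      # posterior mean = (x + 1) / (N + 2)
    print(posterior.interval(0.95))              # central 95% credible interval

    # Predictive probability that the next toss is heads (the posterior mean):
    print((x + 1) / (N + 2))
    ```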

  • usgroup 3 hours ago

    I think Bayesian methods have gained ground in sciences such as sociology, psychology and ecology, which are mostly observational but still attempt to build models with interpretable parameters.

    With observational studies, representing confounders and uncertainty is a primary concern, because they are the most important source of defeaters. Here, Bayesian software such as brms, Stan and pyMC becomes a flexible way to integrate many sources of uncertainty, although I suspect methods like SEM still dominate for their use cases.
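
    A minimal sketch of what such a model might look like in PyMC, one of the packages named above; the simulated data, variable names and priors are made up for illustration, and brms or Stan would express the same idea differently:

    ```python
    # Sketch of a regression that adjusts for one measured confounder.
    # Data, names and priors are illustrative assumptions only.
    import numpy as np
    import pymc as pm

    rng = np.random.default_rng(0)
    confounder = rng.normal(size=100)
    treatment = 0.5 * confounder + rng.normal(size=100)
    outcome = 1.0 * treatment + 2.0 * confounder + rng.normal(size=100)

    with pm.Model():
        intercept = pm.Normal("intercept", 0, 1)
        b_treat = pm.Normal("b_treatment", 0, 1)
        b_conf = pm.Normal("b_confounder", 0, 1)   # the confounder adjustment
        sigma = pm.HalfNormal("sigma", 1)
        mu = intercept + b_treat * treatment + b_conf * confounder
        pm.Normal("y", mu=mu, sigma=sigma, observed=outcome)
        idata = pm.sample(1000, tune=1000)         # joint posterior over parameters
    ```

    Extra sources of uncertainty (measurement error on the confounder, group-level variation) can be bolted on by adding further distributions to the same model block, which is what makes these tools flexible for observational work.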

    Personally, I find myself using Bayesian methods in the same bag-of-tricks way that I use frequentist methods, mostly because it's difficult to believe that complex phenomena are well described by either, so I use whatever makes the case best.

getnormality an hour ago

What I hear when I read this: the way we do things today has definite and well-known problems. Wouldn't it be wonderful to do things in a different way whose problems are not yet well-understood or widely known?

perrygeo 15 hours ago

Frequentist stats aren't wrong. It's just a special case that has been elevated to unreasonable standards. When the physical phenomenon in question is truly random, frequentist methods can be a convenient mathematical shortcut. But should we be teaching scientists the "shortcut"? Should we be forcing every publication to use these shortcuts? Statistics' role in the scientific reproducibility crisis says no.

  • tgv 3 hours ago

    NHST, which is part of frequentist statistics, is wrong, plain and simple. It answers the wrong question (the probability of the data given the hypothesis, rather than the probability of the hypothesis given the data), and will favor H1 under conditions that can be manipulated in advance.
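
    A toy numeric illustration of that distinction; the 50/50 prior and the likelihood under H1 are assumptions chosen only to make the arithmetic concrete:

    ```python
    # p(data | H0) is not p(H0 | data). All numbers here are made up for illustration.
    p_h0 = 0.5                  # assumed prior probability of the null
    p_data_given_h0 = 0.05      # a "significant" result under H0
    p_data_given_h1 = 0.30      # the same result under the alternative

    p_data = p_data_given_h0 * p_h0 + p_data_given_h1 * (1 - p_h0)
    p_h0_given_data = p_data_given_h0 * p_h0 / p_data
    print(p_h0_given_data)      # ~0.14, not 0.05: the two questions have different answers
    ```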

    There is a total lack of understanding of how it works, but people think they know how to use it. There are numerous articles out there containing statements like "there were no differences in age between the groups (p > 0.05)". Consequently, it is the wrong thing to teach.

    That's apart from the more philosophical question: what does it mean when I say that there's a 40% chance that team A will beat team B in the match tomorrow?

    • StopDisinfo910 28 minutes ago

      NHST is not wrong. It’s widely misused by people who barely understand any statistics.

      Reducing frequentist statistics to testing and p-values is a huge mistake. I have always wondered whether that's how it is introduced to some people, and whether that's why they don't get the point of the frequentist approach.

      Estimation theory makes a lot of sense; to me, a lot more than pulling priors out of thin air. It's also a lot of relatively advanced mathematics if you want to teach it well, as defining random variables properly requires a fair bit of measure theory. I think the perceived gap comes from there: people have a somewhat hand-wavy understanding of sampling and an overall poor grounding in theory, and then think Bayes is better because it looks simpler at first.

  • kccqzy 14 hours ago

    Frequentist methods are strictly less general. For example, Laplace used probability theory to estimate the mass of Saturn. But under a frequentist interpretation we have to imagine a large number of parallel universes where everything remains the same except for the mass of Saturn. That's overly prescriptive of what probability means, whereas in Bayesian statistics the meaning of probability is strictly more general: you can manipulate probabilities even without fully defining them (maximum entropy), subject to intuitive rules (the sum rule, the product rule, Bayes' theorem), and the results of such manipulation are still correct and useful.

    • roenxi an hour ago

      > But with a frequentist interpretation we have to imagine a large number of parallel universes where everything remains the same except for the mass of Saturn. That's overly prescriptive of what probability means.

      That isn't much of an argument to mathematicians. Nobody ever came up with a compelling explanation of what -1 sheep looks like, and yet negative numbers turned out to be extremely practical. If something is absurd but provably works, the math community can roll with it.

    • StopDisinfo910 13 hours ago

      Laplace's is a typical use of inferential statistics to build an estimator. I don't really understand your point about parallel universes here. They are absolutely not necessary for any of the sampling to make sense. Every time you try to measure anything, you are indeed taking a sample from the set of measurements you could have gotten given the tools you are using.

      I fear you operate under the illusion that frequentist statistics are somehow limited to hypothesis testing. It is absolutely not the case.

    • perrygeo 14 hours ago

      Drawing a sample of Saturns from an infinite set of Saturns! It's completely absurd, but that's what you get when you take a mathematical tool for coin flips and apply it to larger scientific questions.

      I wonder if the generality of the Bayesian approach is what's prevented its wide adoption? Having a prescribed algorithm ready to plug in data is mighty convenient! Frequentism lowered the barrier and let anyone run stats, but more isn't necessarily a good thing.

      • IshKebab 13 hours ago

        I dunno about you guys but I have no problems imagining randomly sampling Saturn.

        • mitthrowaway2 3 hours ago

          What do you mean by "randomly sampling" here?

          • IshKebab 3 hours ago

            I mean, Saturn was formed by some process, right? And it must be sensitive to some initial conditions that, although maybe not really random, we can treat as random. Now imagine going back in time and changing those conditions a bit so that Saturn ended up differently. Do that 1000 times, giving you 1000 different Saturns. Now pick one randomly.

  • wenc 12 hours ago

    Frequentist methods are unintuitive and seemingly arbitrary to a beginner (hypothesis testing, 95% confidence, p=0.05).

    Bayesian methods are more intuitive, and fit how most people reason when they reason probabilistically. Unfortunately, Bayesian computational methods are often less practical to use in non-trivial settings (they usually involve some MCMC).

    I'm a Bayesian reasoner, but I happily use frequentist computational methods (maximum likelihood estimation) because they're just more tractable.
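
    A small sketch of what "more tractable" can mean in practice: for a normal model the maximum likelihood estimates have a closed form, whereas a full posterior would typically need priors and MCMC. The data here are simulated purely for illustration:

    ```python
    # Closed-form MLE for a normal model; no sampler required.
    import numpy as np

    rng = np.random.default_rng(42)
    data = rng.normal(loc=10.0, scale=2.0, size=500)   # simulated measurements

    mu_hat = data.mean()            # MLE of the mean
    sigma_hat = data.std(ddof=0)    # MLE of the standard deviation (no Bessel correction)
    print(mu_hat, sigma_hat)

    # A fully Bayesian treatment would put priors on mu and sigma and
    # usually sample the joint posterior (e.g. with MCMC) to get credible intervals.
    ```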

NewsaHackO 17 hours ago

It's weird how random people can submit non-peer-reviewed articles to preprint repos. Why not just use a blog site, Medium or Substack?

  • jxjnskkzxxhx 17 hours ago

    > Why not just use a blog site, medium or substack?

    Because it looks more credible, obviously. In a sense it's cargo cult science: people observe that this is the style of science, and so they copy just the style; to a casual observer it appears to be science.

    • nickpsecurity 14 hours ago

      Professional science has been doing that for a long time, if one considers that many published works were never independently tested and replicated. If it comes from a scientist and uses scientific language, many just repeat it from there.

      • jxjnskkzxxhx 14 hours ago

        Overly reductionistic. At the same time a proper rebuttal isn't worth the time for someone who's clearly not looking to understand.

  • groceryheist 16 hours ago

    Two reasons:

    1. Preprint servers create DOIs, making works easier to cite.

    2. Preprint servers are archives, ensuring works remain accessible.

    My blog won't outlive me for long. What happened to GeoCities could also happen to Medium.

    • SoftTalker 16 hours ago

      Who would want to cite a random unreviewed preprint?

      • mitthrowaway2 16 hours ago

        You don't get a free pass to not cite relevant prior literature just because it's in the form of an unreviewed preprint.

        If you're writing a paper about a longstanding math problem and the solution gets published on 4chan, you still need to cite it.

        • NooneAtAll3 15 hours ago

          tbf, you cite the paper that described and discussed said solution in the more appropriate form

          • mousethatroared 14 hours ago

            You cite the form you encountered, and if you're any good as a researcher you will have encountered the original 4chan anon post, Borges' short story, or Chomsky's linguistics paper.

      • jononor 13 hours ago

        Anyone who found something useful in it and is writing a new paper.

        That something is unreviewed does not mean that it is bad or useless.

      • bowsamic 15 hours ago

        It happens way more than you'd expect. During my PhD I used to cite unreviewed preprints that were essential to my work but, for whatever reason, simply hadn't been pushed to publication. It's more common for long review-like papers.

      • amelius 16 hours ago

        Maybe other pseudoscientists who agree with the ideas presented and want to create a parallel universe with alternative facts?

        • mousethatroared 14 hours ago

          And people who care more about gatekeeping will stick to academic echo chambers. The list of community-driven medical discoveries that encountered entrenched professional opposition is quite long.

          Both models are fallible, which is why discernment is so important.

        • jononor 13 hours ago

          You can do that with reviewed papers too :)

  • billfruit 17 hours ago

    Why the gatekeeping? Only what is said matters, not who says it.

    • tsimionescu 16 hours ago

      That's a cute fantasy, but it doesn't work beyond a tiny scale. Credentials are critical to help filter data - 8 billion people all publishing random info can't be listened to.

      • SoftTalker 16 hours ago

        > 8 billion people all publishing random info can't be listened to.

        Yet it's what we train LLMs on.

        • tsimionescu 16 hours ago

          It's what we train LLMs on to make them learn language, a thing that all healthy adult human beings are experts on using. It's definitely not what we train LLMs on if we want them to do science.

        • verbify 15 hours ago

          There's a paper Textbooks are all you need - https://arxiv.org/abs/2306.11644

          > We introduce phi-1, a new large language model for code, with significantly smaller size than competing models: phi-1 is a Transformer-based model with 1.3B parameters, trained for 4 days on 8 A100s, using a selection of ``textbook quality" data from the web (6B tokens) and synthetically generated textbooks and exercises with GPT-3.5 (1B tokens). Despite this small scale, phi-1 attains pass@1 accuracy 50.6% on HumanEval and 55.5% on MBPP. It also displays surprising emergent properties compared to phi-1-base, our model before our finetuning stage on a dataset of coding exercises, and phi-1-small, a smaller model with 350M parameters trained with the same pipeline as phi-1 that still achieves 45% on HumanEval

          We train on the internet because, for example, I speak a fairly niche English dialect influenced by Hebrew, Yiddish and Aramaic, and there are no digitised textbooks or dictionaries that cover this language. I assume the base weights of models are still using high quality materials.

        • birn559 16 hours ago

          Which are known to be unreliable beyond basic things that most people with some relevant experience get right anyway.

      • billfruit 8 hours ago

        GitHub lets anyone upload code. It works perfectly fine.

        • tsimionescu 6 hours ago

          There's no problem in letting anyone upload. The problem is in claiming that we should give the same amount of attention to the work of anyone, that "only what is said matters". Just like we don't run random code off github, we have no reason to read random papers on arxiv. And, even on github, anyone using a project knows that "who is maintaining this project" is a major decision factor.

          • billfruit 5 hours ago

            My objection was to the concept of gatekeeping/barriers to entry for posting/uploading. Not that everything uploaded demands the same attention.

    • BlarfMcFlarf 16 hours ago

      Peer review specifically checks that what is being said passes scrutiny by experts in the field, so it is very much about what is being said.

      • SJC_Hacker 16 hours ago

        Then why isn't it double-blind?

        • BDPW 15 hours ago

          Reviewing is often done double-blind for exactly this reason. This can be difficult in small fields where you can more or less guess who's working on what, but the intent is definitely there.

        • mcswell 15 hours ago

          I've reviewed computational linguistics papers in the past (I'm retired now, and the field is changing out from under me, so I don't do it any more). But all the reviews I did were double blind.

    • birn559 16 hours ago

      Whether what is said has any merit can be very hard to judge beyond things that are well known.

      In addition, peer reviews are anonymous for both sides (as far as possible).

      • ujkiolp 16 hours ago

        i would filter your dumb shit

    • jxjnskkzxxhx 14 hours ago

      > news.ycombinator.com/user?id=billfruit

      > Why the gatekeeping. Only what is said matters, not who says it.

      Tell me you have zero media literacy without telling me you have zero media literacy.

    • watwut 15 hours ago

      Yeah, that is why 4chan became famous for being the source of trustworthy and valuable scientific research. /s

      • randomNumber7 4 hours ago

        Science is in a strange state, but I don't think the current HN audience (of inexperienced ai script kids) is the crowd to have a valuable discussion about it.

      • billfruit 8 hours ago

        GitHub works on a similar model, without any barrier to entry, and it works well.

  • constantcrying 14 hours ago

    >It’s weird how random people can submit non peer reviewed articles to preprint repos.

    It is weird how people use a platform exactly how it is supposed to be used.

throwaway81523 5 hours ago

Some time back I remember reading a blog post about stuff you could do straightforwardly with frequentist statistics that was much more difficult to do with Bayesian methods. I thought I had bookmarked it, but I have no idea where it is now. I half remember it being on Andrew Gelman's blog, but I spent a while looking there for it. No luck.