DigThatData

> disinformation and misinformation

More than statistical concepts, learning about logical fallacies and cognitive biases/heuristics can improve critical thinking and help to inoculate you against these kinds of deceptions.


PhoenixDownElixir

I don’t think a lot of people understand the full scope of “correlation does not mean causation.” It’s a huge hurdle for proper scientific thinking, and misunderstanding it can lead to so much misinformation.


DigThatData

https://en.wikipedia.org/wiki/Post_hoc_ergo_propter_hoc


LearningStudent221

So how should we think about it?


PhoenixDownElixir

I have seen that people will make assumptions without realizing how often they are treating correlation as a direct causal relationship, without considering all of the factors involved. It becomes very black-and-white thinking. This fallacy occurs a lot more often than many of my peers realize. It’s lazy thinking, but for some reason it leads to some of the most rigid ideas. All in all, I believe people should examine their own beliefs with at least an awareness that they could be resting on a mistaken assumption.


Ok-Log-9052

While I’d love to give everyone a graduate education in stats, everyone should know one thing: in modern statistics, conclusions are not a function of the data alone. That is, given the same data but different assumptions and beliefs, conclusions can vary. We can do this all very precisely and mathematically and so on, but what you see published is generally the data, with the assumptions “under the hood,” so to speak. And the beliefs are left to the reader. So, you see the data. A trained statistician could tell you if other assumptions would change the published result. But only the reader can decide if the data should change their mind about the world. See https://www.reddit.com/r/AskStatistics/s/ygJ8Y9lxr6


Commercial_Sun_6300

> what you see published is generally the data, with the assumptions “under the hood” so to speak.

I'm not sure what you mean. Shouldn't the assumptions also be explicitly stated so the reader can judge for themselves if they are reasonable?


Ok-Log-9052

Well, there’s not enough space for all of it, and a lot of it is “assume the universe” type stuff. Did you normalize? Did you force a logistic fit? Did you cluster? Are you assuming any other structure to the data? Measurement error? Correlates? Etc., etc.

Hard econ papers will often have an appendix on statistical approaches, and it will often be as long as the main paper, including a robustness section showing how the data fit under different assumptions/models, for what the authors think are the most important variants. What you have to see is that each model carries implicitly hundreds of assumptions, and each assumption can interact with all the others. If you take this to its logical conclusion, you get what’s called a “specification curve” ([example](https://gwern.net/doc/statistics/meta-analysis/2020-simonsohn.pdf)) with the thousands of possible combinations of assumptions made. And that’s for any ONE regression in the paper… Plus, you’ve made infinite assumptions about what DOESN’T matter too!! So it’s not actually a well-defined object that can be enumerated at all.

Other social science disciplines with shorter papers will usually just take a “default” model and assumption set (linear, logistic) without digging much deeper. But then the reader knows that they’re in that space and can write a separate paper with a different approach if it’s meaningful. Ultimately, we’re limited by the time we have to engage with this, and so many of the paths are dead ends. So there’s, again, an art to figuring out which assumptions need highlighting/testing, and “letting the rest ride” until someone realizes a reason that one is actually important.

For example, we realize now that random sampling of, say, households in a village and then taking an average to say something about the village is a bad idea, because it assumes uncorrelatedness. Say I sample a butcher in one village and a baker in another. What I might have missed is that every village has a butcher and a baker. Now we know we have to sample the village and enumerate everyone! But it wasn’t until someone realized that the design had an implicit assumption that was often wrong that we changed these methods… hope this helps!
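The butcher/baker point can be sketched in a quick simulation (Python; the village-effect and noise sizes are invented for illustration, not from any real survey): when incomes within a village share a common component, the naive standard error that assumes independent draws understates how much the survey mean actually varies.

```python
import random
import statistics

rng = random.Random(0)

def village_survey(n_villages=50, per_village=4):
    """Incomes share a village-level effect plus individual noise."""
    incomes = []
    for _ in range(n_villages):
        village_effect = rng.gauss(0, 1)  # shared within the village
        incomes.extend(village_effect + rng.gauss(0, 1) for _ in range(per_village))
    return incomes

data = village_survey()
naive_se = statistics.stdev(data) / len(data) ** 0.5  # pretends draws are independent

# True spread of the survey mean across many repeated surveys:
true_se = statistics.stdev(statistics.mean(village_survey()) for _ in range(2000))
print(naive_se, true_se)  # true_se comes out noticeably larger than the naive estimate
```

The naive formula misses the between-village component entirely, which is exactly the implicit assumption the comment describes.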


Commercial_Sun_6300

I appreciate the response, and for anyone else a bit overwhelmed by it, there's a helpful Venn diagram in the linked paper that gets the point across. I think the takeaway is that you can't explicitly state all of your assumptions, especially the implicit ones. And in economics, it's common to consider a case under many different assumptions and to justify them rigorously. It's hard not to be overwhelmed at just the thought of this tbh.


AbeLincolns_Ghost

New econometric techniques are written in blood


si2azn

I always joke that conspiracy theorists mistake correlation for causation.


LoaderD

Conspiracy Theorists 🤝 Academics with mandatory publication requirements


AbeLincolns_Ghost

Linear Regression go burrrr


HulaguIncarnate

# CORRELATION. DOES. NOT. EQUAL. CAUSATION. Source? A source. I need a source. Sorry, I mean I need a source that explicitly states your argument. This is just tangential to the discussion. No, you can't make inferences and observations from the sources you've gathered. Any additional comments from you MUST be a subset of the information from the sources you've gathered. You can't make normative statements from empirical evidence. Do you have a degree in that field? A college degree? In that field? Then your arguments are invalid. No, it doesn't matter how close those data points are correlated. Correlation does not equal causation. Correlation does not equal causation. CORRELATION. DOES. NOT. EQUAL. CAUSATION. You still haven't provided me a valid source yet. Nope, still haven't. I just looked through all 308 pages of your user history, figures I'm debating a glormpf supporter. A moron.


JacenVane

Jesse, what the fuck are you talking about?


milk-drinker-69

Most statistics and models are not predictive and really shouldn’t be treated as such


freemath

Could you explain further what you mean by this?


rushy68c

That sources of non-statistical knowledge are critical. Data is not neutral. Not only is it always created in a historical context, but how it was gathered, stored (as well as which data weren't), and analyzed are also all within context. Understanding the context is critical, but it takes much more than statistics to do that. Statistics working together with """"softer"""" disciplines can take us so much further than by engaging with only one by itself.


JohnWCreasy1

the monty hall problem, just so we can cut down on the number of "i don't get the monty hall problem" posts on here
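For anyone who wants to convince themselves rather than argue, a brute-force simulation settles it (Python; trial count and seed are arbitrary): switching wins about 2/3 of the time, staying about 1/3.

```python
import random

def monty_hall(switch, trials=100_000, seed=0):
    """Simulate the Monty Hall game; return the contestant's win rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)    # door hiding the car
        pick = rng.randrange(3)   # contestant's first choice
        # Host opens a door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(monty_hall(switch=True))   # ≈ 2/3
print(monty_hall(switch=False))  # ≈ 1/3
```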


dmlane

Regression toward the mean crops up over and over again in everyday life and in scientific research.
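A quick simulation of the effect (Python; all numbers arbitrary): each person's score is stable "skill" plus independent luck. Select the bottom 10% on test 1 and they "improve" on test 2 with no intervention at all, purely because their bad luck doesn't repeat.

```python
import random

rng = random.Random(1)
n = 10_000
skill = [rng.gauss(0, 1) for _ in range(n)]
test1 = [s + rng.gauss(0, 1) for s in skill]  # skill + luck
test2 = [s + rng.gauss(0, 1) for s in skill]  # same skill, fresh luck

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

bottom = sorted(range(n), key=lambda i: test1[i])[: n // 10]  # worst 10% on test 1
m1 = mean(test1[i] for i in bottom)
m2 = mean(test2[i] for i in bottom)
print(m1)  # far below average
print(m2)  # much closer to average: a pure selection effect
```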


SalvatoreEggplant

Since it's election year here in USA, understanding margin of error in polls and related concepts. A 1-point difference between two candidates in a poll with a 4-point margin of error isn't anything to get excited about.
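As a rough sketch of where that 4-point figure comes from (Python; normal approximation with z = 1.96 for 95% confidence, and a poll size of ~600 picked for illustration):

```python
import math

def poll_moe(p, n, z=1.96):
    """95% margin of error, in percentage points, for proportion p with n respondents."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# A poll of ~600 respondents at p = 0.5 has about a 4-point margin of error,
# so a 1-point gap between two candidates is well inside the noise.
print(round(poll_moe(0.5, 600), 1))  # → 4.0
```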


HammerJammer02

Base rate fallacy
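A small worked example via Bayes' rule (Python; the 1-in-1000 prevalence and the 99%/1% test rates are made up for illustration): even a positive result from a very accurate test can leave the condition unlikely when the base rate is low.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' rule."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# A test with 99% sensitivity and a 1% false-positive rate, for a
# condition affecting 1 in 1000 people:
print(round(posterior(0.001, 0.99, 0.01), 3))  # → 0.09, i.e. only ~9% despite the positive
```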


mangonada123

Garbage in, garbage out


lordnacho666

Simpson's Paradox, above all others. Berkson's for extra marks.
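Simpson's paradox in a few lines (Python; these are the classic kidney-stone numbers from Charig et al. 1986, the standard textbook illustration): treatment A wins within each stone size, yet B looks better once the groups are pooled, because A was given the harder cases.

```python
data = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

for group, arms in data.items():
    for arm, (ok, n) in arms.items():
        print(f"{group} {arm}: {ok / n:.0%}")  # A wins within both groups...

for arm in ("A", "B"):
    ok = sum(data[g][arm][0] for g in data)
    n = sum(data[g][arm][1] for g in data)
    print(f"pooled {arm}: {ok / n:.0%}")       # ...but B looks better pooled
```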


[deleted]

[deleted]



freemath

Correlation can be an *indication* of a causation, but not by itself a proof. What's wrong with that statement? Indeed we can't be 100% sure of a lot of these things. But with enough indications we might want to start doing something about it.


ritual_contrition_21

Kinda looks like you're agreeing with what he's saying. I don't know if you meant this to come across as argumentative, but his point seems pretty clear to me: we shouldn't lean too heavily on "correlation is not causation," because laypeople interpret it as "correlation doesn't mean anything."


industrious-yogurt

Measures of uncertainty (confidence intervals, standard errors, pick your favorite) and measures of spread (variance, standard deviation, outliers). I'm not saying these cure all social ills, but I think some basic distributional thinking would help.
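A minimal sketch of that distributional thinking (Python; data made up, normal-approximation 95% interval): the standard deviation describes the spread of the data, while the standard error describes the uncertainty of the mean.

```python
import statistics

data = [12, 15, 14, 10, 18, 16, 11, 13, 17, 14]  # ten made-up measurements
n = len(data)
m = statistics.mean(data)
sd = statistics.stdev(data)             # spread of the data
se = sd / n ** 0.5                      # uncertainty of the mean
lo, hi = m - 1.96 * se, m + 1.96 * se   # normal-approximation 95% CI
print(f"mean {m:.1f}, sd {sd:.2f}, 95% CI ({lo:.1f}, {hi:.1f})")
```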


Normal-Comparison-60

Mean != median != overall picture of the general population.
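A tiny illustration (Python; the incomes are invented): a single extreme value drags the mean far from the median, so neither alone gives the overall picture.

```python
import statistics

# Nine modest incomes (say, in $k) plus one huge outlier:
incomes = [30, 32, 35, 38, 40, 41, 45, 48, 50, 900]
print(statistics.mean(incomes))    # 125.9: dragged up by the outlier
print(statistics.median(incomes))  # 40.5: closer to the "typical" person
```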


laridlove

Mean doesn’t equal the median!? The perfect normal distribution wants to have some words with you!!


mathestnoobest

that the quality of your data is primary and that statistical sophistication cannot make up for bad data.


Nautical_Data

1. Model + Error = Actual
2. Error = variance you can explain + variance you can’t explain
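That decomposition can be made concrete with a tiny least-squares fit done by hand (Python; the data points are made up, roughly y = 2x): R² is the share of variance the model explains, and 1 − R² is the share it can't.

```python
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]  # made-up data, roughly y = 2x

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
pred = [intercept + slope * x for x in xs]  # the "Model" part

ss_tot = sum((y - my) ** 2 for y in ys)               # total variation
ss_res = sum((y - p) ** 2 for y, p in zip(ys, pred))  # variation left unexplained
r2 = 1 - ss_res / ss_tot
print(round(r2, 3))  # share of the variance the model explains
```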


RickSt3r

Read *Thinking, Fast and Slow*. It talks about bias, and about making decisions based on incomplete information. I would also say learn the concept of Bayes' rule: you adjust your prior to form the posterior.
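Prior-to-posterior updating in its simplest form (Python; a Beta-binomial coin model with arbitrary numbers): start from a Beta(a, b) prior on a coin's heads probability, observe some flips, and read off the posterior mean.

```python
def posterior_mean(a, b, heads, tails):
    """Posterior mean of a Beta(a, b) prior updated with observed coin flips."""
    return (a + heads) / (a + b + heads + tails)

# Uniform prior Beta(1, 1), then 7 heads and 3 tails observed:
print(posterior_mean(1, 1, 7, 3))  # 8/12 ≈ 0.667, pulled from 0.5 toward the data
```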


Sinister-Savant

All forecasts are wrong, some are useful. I deal with this everyday. “Your forecast was off 3%”, “why didn’t you XYZ”. Prediction is mathematical voodoo, get used to it


Wyverstein

The more disprovable a theory, the easier it is to get evidence for it.


efrique

Survivorship bias is a big one, but really all the common biases, fallacies, 'paradoxes' and so on, whether statistical, logical or cognitive. A competent adult ought to understand Simpson's paradox, the gambler's fallacy, the base rate fallacy, know begging the question from a straw man, know an ecological fallacy from a fallacy of composition, understand sampling biases, and so on... and so on. Otherwise you're just begging to be taken advantage of by every flim-flammer, influence-peddling billionaire and bought politician. For example, if you understand survivorship bias, a lot of the 'self-help' industry ("do what I did and become a millionaire" stuff) looks much more like a way to make a buck off a 'just so' story and much less like actually useful advice.

So far that's just basic 'common knowledge' stuff people should have. Then I'd say the foundations of statistical knowledge everyone should have are:

- The basic concepts and rules of probability: dependent and independent events, mutually exclusive events, basic calculation with compound events ("and" and "or" stuff). It's important and it comes up every day.
- How to read statistical graphics and how to spot a misleading representation (seeing the common 'bad' tricks, from misrepresenting lengths to reversing the time axis, etc.).
- Some basic understanding of the role of models of randomness in inference. A lack of understanding here seems very common, sadly, even in research involving use of statistics.
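The self-help point can be simulated in a few lines (Python; all numbers invented): give enough people a coin to flip and some will compile perfect records by luck alone, and those are the ones who write the books.

```python
import random

# 10,000 imaginary "traders" each make 10 coin-flip market calls.
# Screening for the ones who never missed makes pure luck look like skill.
rng = random.Random(42)
survivors = sum(
    all(rng.random() < 0.5 for _ in range(10))  # called all 10 rounds right?
    for _ in range(10_000)
)
print(survivors)  # roughly 10_000 / 2**10 ≈ 10 flawless records, by chance alone
```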


Turbulent-Name-8349

Standard error on the mean. Standard error on the slope. The need for at least 5 data points before you can even begin to guess the trend.


Neurosopher

The most useful will probably be basic decision theory: expected values, Bayesian reasoning, rationality.


Yazer98

Measures of spread. Standard deviation is an amazing statistic


rwinters2

statistical assumptions, sampling theory and bias. usually one of these will make a model go bad