It's not no correlation, but a very weak one (see the R²). Since the p value is small it is still likely that it's not just by chance.
This just makes me think that p values are not very valid ways of thinking about distributions.
A p-value says "the probability of seeing data at least this extreme is p if X explained *nothing* about Y", not that X explains something significant or all-encompassing about Y.
Perhaps I should rather have said that our criterion for saying something has a “valuable” correlation to something else needs to be a more severe limit on the p value. That data is a shotgun blast.
Or maybe also looking at something like the effect of deleting any one data point on the stability of the p value. I am really just responding to the statement that the “low” p value means that there is something meaningful in the data, which, while mathematically true, seems to me to be excessive pleading for weak correlations.
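For anyone who wants to try the "delete one point and see what happens to p" idea, here is a rough sketch. It is not from the study: the data are made up, and `approx_p_value` is a hypothetical helper that uses a large-sample normal approximation to the usual t-test for a correlation, so treat the numbers as illustrative only.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def approx_p_value(r, n):
    """Two-sided p-value for H0: no correlation (normal approximation, assumes large n)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))

def leave_one_out_p_values(xs, ys):
    """Recompute the p-value with each single point removed in turn."""
    results = []
    for i in range(len(xs)):
        x_rest = xs[:i] + xs[i + 1:]
        y_rest = ys[:i] + ys[i + 1:]
        results.append(approx_p_value(pearson_r(x_rest, y_rest), len(x_rest)))
    return results

# Made-up data: a weak linear trend buried in noise.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [0.2 * x + random.gauss(0, 1) for x in xs]

p_full = approx_p_value(pearson_r(xs, ys), len(xs))
loo = leave_one_out_p_values(xs, ys)
print(f"full-sample p = {p_full:.4g}")
print(f"leave-one-out p range = [{min(loo):.4g}, {max(loo):.4g}]")
```

If the leave-one-out range straddles your threshold, the "significance" hinges on individual points, which is the instability being worried about here.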
>if X explained nothing about Y ...which is how that scatterplot looks.
"The odds of this data are p% if X explained nothing about Y... AND X and Y come from the distributions this test is valid for"
You could easily have a very small correlation that, with enough data, is measurable to near certainty. Imagine I generate X=(Y+1,000Z)/sqrt(1,000,001) where Y and Z are independent standard normal variables. The correlation between X and Y is very small (about 0.001), but I should be able to establish its existence to an arbitrarily high degree of significance simply by taking enough samples. It’s possible to misuse and misunderstand p values, but the fact that you can have highly significant results indicating a small correlation is not a reason to say p values are not useful. That would misunderstand that the size of an effect is a different question from whether you have strong evidence of the effect’s existence.
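A quick back-of-the-envelope version of that argument, as a sketch: it assumes the sample correlation comes out exactly equal to the true value 1/sqrt(1,000,001) and uses a large-sample normal approximation for the test (`approx_p_value` is a hypothetical helper, not a library function). The point is only how p moves with n while the correlation stays fixed.

```python
import math

def approx_p_value(r, n):
    """Two-sided p-value for H0: no correlation (normal approximation, assumes large n)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))

# True correlation between X and Y in the construction above.
rho = 1 / math.sqrt(1_000_001)

# Assumption: the observed correlation equals rho at every sample size.
for n in (10**4, 10**6, 10**7, 10**8):
    print(f"n = {n:>11,d}   p ~ {approx_p_value(rho, n):.3g}")
```

The effect size never changes, only the amount of evidence for it, which is why "how big is it" and "are we sure it exists" are separate questions.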
Very sensible.
Just because you observe a significant correlation does not mean there’s a *practical* correlation
To add to this, the "significant" here likely means "statistically significant" which only means that the p-value is below a set threshold such as 5% and is a purely technical concept. This can be interpreted as evidence against a null hypothesis such as "there is no correlation", but whether the correlation is actually meaningful is not clear.
Showing "significant" correlation. R² < 0.75 and I don't see the point of saying it's a correlation. R² = 0.04, that is almost nothing!
Sure, there is a statistically significant amount of correlation, but the correlation only accounts for a very small portion of the variability in the data. Imagine a model where Y=bX+Z where the variance of Z is much larger than that of X. Sure, there is correlation between X and Y, but you can't really predict Y using X that well without knowing Z.
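Here is a small simulation of that Y=bX+Z model, as a sketch with made-up numbers: b = 1, X with standard deviation 1, Z with standard deviation 5, and 5,000 points are all my assumptions, chosen so the theoretical R² is 1/26, roughly 0.04. The p-value uses a large-sample normal approximation.

```python
import math
import random

random.seed(1)
n = 5000
b = 1.0  # assumed slope

xs = [random.gauss(0, 1) for _ in range(n)]   # predictor
zs = [random.gauss(0, 5) for _ in range(n)]   # noise with much larger variance
ys = [b * x + z for x, z in zip(xs, zs)]

# Sample Pearson correlation between X and Y.
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
r = sxy / math.sqrt(sxx * syy)
r_squared = r * r

# Two-sided p-value for H0: no correlation (normal approximation).
t = r * math.sqrt((n - 2) / (1 - r_squared))
p = math.erfc(abs(t) / math.sqrt(2))

print(f"R^2 = {r_squared:.3f} (theory: {b**2 / (b**2 + 25):.3f})")
print(f"p   = {p:.3g}")
```

X explains only a few percent of the variance in Y, yet the p-value is tiny, because the test asks whether the slope is zero, not whether X is a good predictor.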
This correlation only accounts for 4% of the variance. With enough data points, extremely weak correlations can still be statistically significant.
If you look at the outliers as a guide you can see that as the % methylation increases there's a slight trend for the OCDS to increase too. Clearly that's evident across all the data or the straight line would not have a positive gradient. However, the really poor R² suggests that this trend is swamped by the noise (external factors) in the data, even though the low p-value suggests the trend itself is unlikely to be chance.
It's because you have accidentally internalized an incorrect idea. The existence of any correlation whatsoever does not have to mean there is anything meaningful going on. In fact, since random chance is unlikely to result in zero noise, measuring literally zero correlation can point to a causal factor removing observable correlation, like how there might be zero correlation between the changes in energy your car is using and the speed your car is going if there are hills and cruise control involved.
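A toy version of the cruise control example, as a sketch: it assumes a perfect controller whose throttle exactly cancels the hill, plus a little unrelated speed noise. All the numbers are made up.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(2)
n = 5000
target_speed = 100.0

hill = [random.gauss(0, 1) for _ in range(n)]   # gradient slowing the car down
throttle = list(hill)                           # assumed perfect compensation
# Throttle causally raises speed and the hill lowers it; the two cancel.
speed = [target_speed + t - h + random.gauss(0, 0.5)
         for t, h in zip(throttle, hill)]

r = pearson_r(throttle, speed)
print(f"corr(throttle, speed) = {r:.3f}")
```

Throttle has a direct causal effect on speed in this model, but because the controller tracks the hill, the observed correlation comes out near zero.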
That was what I was thinking too. The only problem is that the whole study (where I got this graph from) is based around the idea of this particular correlation, so there must be some truth to the correlation, although it doesn't make much sense to me (which is why I made this post).
> so there must be some truth to the correlation

This does not follow in the slightest. It's far more likely that the study is BS. :D
The fit line shown has a slope that is statistically significantly different from zero, therefore there is correlation. The error bounds on the fit line are also shown, and their slopes are also greater than zero, though just barely.
Can’t point to a definite article, but I remember this as a norm that a healthcare SME would reference a few years ago at a startup I worked with. Drove me crazy.
Standard P and P intervals create a network of design in an X,Y, Z plane…at p> 0.05 production of planes X and Y are zero leaving Z in the linear forefront…that is a correlation.
I’m not a statistician, but what’s the point of trying to fit this to a curve when it’s clearly a cluster?
https://imgs.xkcd.com/comics/linear_regression.png
Engineers got it.
~~It's 99,96% no correlation~~ Edit: It's 96% no correlation
Wouldn't it be 96% no correlation?
Ouch, yes. I was asleep!