From the same guy who brought us Spurious Correlations: a fun way to show that correlation is not causation, using graphs of correlations between very different things that do not cause each other.
I did attach an image, but because of a Lemmy/Mbin issue I don’t think I can add actual alt text, so here is the alt text.
A website, whose title is “spurious scholar”, with the subtitle “Because if p < 0.05, why not publish?”
Step 1: Gather a bunch of data.
Step 2: Dredge that data to find random correlations between variables.
Step 3: Calculate the correlation coefficient, confidence interval, and p-value to see if the connection is statistically significant.
Step 4: If it is, have a large language model draft a research paper.
Step 5: Remind everyone that these papers are AI-generated and are not real. Seriously, just pick one and read the lit review section.
Step 6: …publish:
Then there are two screenshots from papers generated with this method.
Also, clicking the note for step 2 has some pretty educational content on being naughty with data, at least for me, someone who is not an academic:
“Dredging data” means taking one variable and correlating it against every other variable just to see what sticks. It’s a dangerous way to go about analysis, because any sufficiently large dataset will yield strong correlations completely at random.
Fun fact: the chart used on the wikipedia page to demonstrate data dredging is also from me. I’ve been being naughty with data since 2014.
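If you want to see the "any sufficiently large dataset will yield strong correlations completely at random" part for yourself, here is a minimal Python sketch. It is my own illustration, not anything from the site: the function names `pearson_r` and `dredge`, the 10 data points per series, and the 5000 variables are all assumptions I picked for the demo. Every series is pure random noise, so any "strong" correlation it finds is meaningless by construction.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def dredge(n_points=10, n_variables=5000, seed=0):
    """Correlate one random 'target' series against many unrelated random series.

    Hypothetical helper for illustration: all data is Gaussian noise.
    Returns (r, variable_index) pairs sorted by |r|, strongest first.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    results = []
    for i in range(n_variables):
        other = [rng.gauss(0, 1) for _ in range(n_points)]
        results.append((pearson_r(target, other), i))
    results.sort(key=lambda pair: abs(pair[0]), reverse=True)
    return results

results = dredge()
best_r, best_index = results[0]
strong = sum(1 for r, _ in results if abs(r) > 0.8)
print(f"strongest |r| = {abs(best_r):.2f} (variable #{best_index})")
print(f"{strong} of {len(results)} pure-noise variables have |r| > 0.8")
```

With short series and thousands of candidates, some noise variables should line up with the target very well just by luck. Checking each pair at p < 0.05 without correcting for how many comparisons you ran is what turns that luck into a "finding".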
Feel free to nix this for linking to a website that links to AI-generated stuff and not a real paper. Although I think it’s okay here, since it spouts bullshit on purpose as a humorous lesson on bad academic methods.