Fox stats FAQs

I was kinda surprised by the level of interest in my recent analysis of foxes in furry porn, and really pleased to get loads of questions about it! I thought I’d put a quick follow-up post together to address some of the issues raised (a ‘Fox FAQ’, if you will), as I think it’s super important to try and make this sort of thing as clear as possible. The following are based on interactions I had both on and off Twitter, and I’ll keep this post updated if other questions come my way. I haven’t identified anyone in particular in association with specific comments (I thought that might seem a bit personal), so instead a general thanks to all of those who commented!

“Is a sample size of 207 really enough?”

A few people raised concerns about the sample size, understandably wondering whether, given the quantity of furry porn out there, 207 images could really be enough to support any meaningful conclusions. In retrospect I probably should have made things clearer on the image I attached to my original Tweet: while the basic stat that only 32% of the images featured a fox on top is fun, my primary objective was to carry out what’s called a hypothesis test, which is where the sample size really comes in.

So, a bit of ‘stats 101’. A hypothesis test is where we propose some ‘truth’ (formally the ‘null hypothesis’), and then see if the data we collect are consistent with that truth. Here, the hypothesis I was testing was whether foxes are equally likely to top or bottom in the type of images I sampled. In other words, whether the probability a fox tops in any given picture is 50%. In a sample of 207 there were 66 foxes topping, which would be very unlikely if it were truly the case that foxes were just as likely to top as bottom. (Another way to think about it: if we tossed a fair coin 207 times, and got 66 or fewer heads, we’d be very surprised - that’s what the p-value tells us.) As such, a sample of 207 is plenty for testing this particular hypothesis, a fact also reflected in the confidence interval for the proportion of foxes who bottom: 61%-74%. The precise definition of a confidence interval is kinda awkward, but you can think of this as a ‘plausible range’ of values for the proportion of fox bottoms in all art of this type.
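
If you fancy checking my numbers, a test like this is a one-liner in R. This is a sketch rather than my exact analysis script (there are several ways to compute the confidence interval, so the digits may differ slightly), but base R’s binom.test reproduces the numbers above (141 bottoms is just 207 minus the 66 tops):

```r
# 66 of the 207 images had the fox topping, so 141 had the fox bottoming.
# Exact binomial test of the null hypothesis p = 0.5 (equally likely to top or bottom):
binom.test(x = 141, n = 207, p = 0.5)
# p-value ~ 2e-07, so we soundly reject the null;
# the 95% confidence interval for the bottoming proportion is roughly 0.61 to 0.74.
```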

I’m not entirely sure where people get a sense of what a ‘large enough’ sample size is, but in reality this will be different for every analysis (I regularly work with datasets smaller than this, for instance). What a larger sample size does is give us more certainty about our result: it makes it easier to reject a hypothesis (if it’s truly false to begin with), and gives us a narrower confidence interval. My main interest was just testing the hypothesis that foxes top and bottom with equal probability, a hypothesis that was soundly rejected. A larger sample would (most likely) not change whether that hypothesis is rejected; it’d just make the confidence interval smaller (so we’d have a narrower range of plausible values for the true proportion of foxes who top or bottom).
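
If you don’t believe me, a quick simulation sketch shows it (here I’m plugging in the 32% estimate from my sample as the ‘true’ topping proportion, which is of course an assumption):

```r
set.seed(42)  # for reproducibility

# Chance of rejecting 'foxes top 50% of the time' at the 5% level,
# if the true topping proportion is the 32% estimated from my sample.
power_sim <- function(n, p_true = 0.32, reps = 5000) {
  tops <- rbinom(reps, size = n, prob = p_true)
  pvals <- vapply(tops, function(x) binom.test(x, n)$p.value, numeric(1))
  mean(pvals < 0.05)
}

power_sim(207)   # ~1: we essentially always reject at the original sample size
power_sim(2070)  # still ~1: ten times the data, same verdict
```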

To get even more statsy, in this context the width of your confidence interval is inversely proportional to the square root of your sample size. That square root is important: it means you get diminishing returns with larger samples (for n = 100, the square root is 10; for n = 1000, it’s about 32; so increasing the sample size by a factor of 10 only narrows the confidence interval by a factor of about 3). You may have learned in a stats class that for a test such as this we should aim for a sample size of around 1,000 (this is very common in electoral polling, for example), but that calculation is based on some fairly arbitrary requirements about how wide the confidence interval should be, and it also assumes a ‘worst-case’ scenario where the true proportion you’re trying to estimate is close to 0.5. I had a pilot study of sorts to base my sample size calculations on (I gathered a slightly different dataset a while back which gave a similar estimate), so I was happy with a smaller sample.
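
For the curious, the usual normal-approximation interval makes the square-root behaviour easy to see; here’s a minimal sketch in R, using that ‘worst-case’ p = 0.5:

```r
# Width of the usual normal-approximation 95% CI for a proportion:
# 2 * z * sqrt(p * (1 - p) / n), i.e. proportional to 1 / sqrt(n).
ci_width <- function(n, p = 0.5, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  2 * z * sqrt(p * (1 - p) / n)
}

ci_width(100)                   # ~0.20
ci_width(1000)                  # ~0.06
ci_width(100) / ci_width(1000)  # ~3.2, i.e. sqrt(10)
```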

“This doesn’t really tell us that fox furries are sluts, does it?”

There are a couple of aspects to this question. One is that the results don’t generalize much beyond the dataset I worked with (i.e., furry art featuring a fox and another species, with the various additional constraints I mentioned in the write-up). The other is that the analysis addresses foxes topping and bottoming, rather than the more general ‘slut’ stereotype. These are both really important issues!

With regards to the ‘slut’ stereotype, I hope most readers appreciated that the analysis was limited to tops and bottoms, and that I moved away from the term ‘slut’ very early on. (I have some ideas for how to look at this more directly, but most of them require a lot of time spent gathering data.) This is a good example of how, when reporting (and reading) statistical results, it’s essential that the phrasing be incredibly clear and precise. A very common statistical crime is to run an analysis and then report an interpretation of the results that isn’t objectively justified by the data. Here, for example, my actual conclusions were framed explicitly in terms of topping and bottoming, rather than the more general ‘foxes are sluts lol’. Always watch out for this when you see statistics being reported (such as in the media) - formal tests don’t usually make snappy headlines!

The other important issue is that this work was limited to visual art. I note this in the limitations (and make it fairly explicit in the summary at the start of the write-up), but it bears repeating. I don’t make any claims about the behaviour of fox furries in general, only what is seen in this dataset. This is another essential part of interpreting statistical results: always pay very close attention to the dataset under study, and how this may (or may not) extrapolate to a wider population.

“What’s your background?”

Understandably, a few people wondered about my background and experience, so here’s a very short summary! I’m currently a junior statistics professor (so I have a PhD in stats, and a couple more degrees in math and statistics), and spend most of my time engaged in largely theoretical research (although I do a bit of undergraduate teaching). While my day-to-day work is pretty abstract (I basically stare at algebra all day), I have a number of collaborations with scientists from other specialties, so I’m regularly involved in the nitty gritty of ‘real world’ data analysis. I like to stress to people that analyses such as this rely much more on extensive experience with real-world data analysis than on fancy degrees - as I mention in the write-up, the hardest part is forming the research question and the dataset; no algebra required!

“Pie charts are bad and you should feel bad.”

I was incredibly pleased someone took issue with my using a pie chart in the original Tweet. If you’re surprised by this, a little background: pie charts are renowned among statisticians (and data scientists in general) for being horrible, awful, very not good ways to display data. This is mainly because it’s hard to visually compare relative areas. We’re pretty good at comparing lengths, so bar charts are great, but area is really hard (there are some good references on Wikipedia for this). An amusing side note is that in the statistical programming language I use for analysis (called R), the help page for the pie chart command itself says “Pie charts are a very bad way of displaying information”! I made an exception here because pie charts are just about ok when you’re comparing two categories, and I put the percentages on anyway, but rest assured I do not condone pie charts and am not in the pocket of big pie.
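
For anyone who wants to judge for themselves, here’s a quick base R sketch comparing the two (the counts are from my sample; the labels are just my shorthand for the two categories):

```r
# Counts from the sample: 66 fox-top, 141 fox-bottom.
positions <- c(`fox on top` = 66, `fox on bottom` = 141)

pie(positions, main = "Just about ok with two categories")
barplot(positions / sum(positions), ylab = "proportion of images",
        main = "Lengths are easier to judge than areas")
```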

“What next?”

I mentioned a few future directions in the original post, but a big limitation is getting data (it’d be good if we could train computers to distinguish folfs and foxes and count penises, but I think we’re a little way off that). My glamorous assistant is currently putting the finishing touches to a wolf dataset which is looking fairly interesting, and we might have a go at one or two other species in a similar vein. I have a bunch of other little side projects that have been sitting on the back burner for far too long, however, so keep an eye out for those as well!

Written on January 2, 2017