Let's say I'm a dairy farmer in Wisconsin. I look at some data about milk output from my dairy and find that output is down 50% from last year. This is a remarkable drop, and it must be explained! What on earth is causing this? I need to call a vet to check that there isn't some infection afflicting the cows. I need to send our feed for analysis and make sure it's properly balanced. I need to have our milking machinery serviced and checked. This is all going to be very time-consuming and expensive, but it's very important to get to the bottom of it.
If our Stack Exchange cows are our answerers, the analysis presented in OP is suggesting that our very best cows, the ones that produce 3 or more answers in a given week, are being particularly affected. They're dropping faster than the overall answers! The proportion of answers by the >=3 answerers is down! But, before we go about an expensive way of figuring out what is targeting our best producers, is there any simpler explanation?
I run the simulation 1000 times. The mean total answers in the simulation is 21000, 62.6% of the original count (good reality check there).
It looks to me like, indeed, the reduction in top answerers, at least comparing just these two weeks of data, is more than what you'd expect just from an across-the-board drop. The simulation predicts 1461 users with >=3 answers, but the actual April data only had 1360 such users. If you were just expecting the number to drop in proportion to the number of answers, though, you would have predicted 2399 * 0.626 = 1502 users, so some of the drop is accounted for by the general effect filtered through the threshold, rather than the specific one.
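For anyone who wants to poke at the threshold effect without the SEDE data, here is a rough Python sketch of this kind of simulation. It is not the code behind the numbers above. The per-user answer counts are made up (drawn from a geometric distribution purely for illustration), and modelling the across-the-board drop as every answer independently surviving with probability 0.626 is an assumption; the function name `simulate_thinning` is hypothetical too.

```python
import numpy as np


def simulate_thinning(counts, keep_prob, threshold=3, n_sims=1000, seed=0):
    """Thin every user's answers uniformly and count who still meets the threshold.

    ASSUMPTION: each answer survives independently with probability keep_prob,
    so a user's new count is Binomial(old count, keep_prob).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    # One row per simulated week, one column per user.
    thinned = rng.binomial(counts, keep_prob, size=(n_sims, counts.size))
    mean_total = thinned.sum(axis=1).mean()
    mean_top = (thinned >= threshold).sum(axis=1).mean()
    return mean_total, mean_top


# Made-up baseline week: answers per user (NOT real SEDE data).
baseline = np.random.default_rng(1).geometric(0.5, size=2000)
keep_prob = 0.626  # overall drop quoted above

mean_total, mean_top = simulate_thinning(baseline, keep_prob)

baseline_top = int((baseline >= 3).sum())
naive_top = baseline_top * keep_prob  # "just scale it down" prediction

print(f"baseline users with >=3 answers: {baseline_top}")
print(f"naive proportional prediction:   {naive_top:.0f}")
print(f"simulated mean after thinning:   {mean_top:.0f}")
print(f"simulated mean total answers:    {mean_total:.0f}")
```

With this toy distribution the simulated count of >=3 answerers lands below the naive proportional prediction, because users sitting right at the threshold fall under it when they lose even one answer. How big that gap is depends heavily on the shape of the real per-user distribution, so the sketch only shows the direction of the effect, not its size.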
I am very open to feedback and criticism on this approach. I didn't have a lot of time to play around in SEDE, but if someone wants to make a data set that has week-by-week data instead of just one week extracted, I'd be happy to tweak the rest of my code to plot this out over time.