My summary of your nine-google-doc-page question
- You tested how the ChatGPT-detectors perform on human-written posts and were not satisfied with their accuracy.
- You came up with a heuristic that you use to determine the number of possible ChatGPT-posts on the site.
- Your heuristic did not show a surge of ChatGPT-posts, but the number of suspended users of a certain segment increased.
- The number of answers continued to decline, even though the flow of questions did not.
Putting all this together, you concluded that the moderators incorrectly suspended 16 times more users than they should have.
My thoughts
If the community is deliberately fighting something, it will find more instances of it than when users merely stumble upon problematic posts by accident
For the past few months, the community has been specifically looking for ChatGPT-posts. When we deliberately look for something, we will find much more of it by definition. Look at the data for any other community-driven initiatives. For example, look at the data for the plagiarism detection initiative that the community held recently.
Users get suspended for low-quality content, plagiarism, and more, not just for ChatGPT-posts
When the community deliberately works on something, other types of activity usually increase as well. While searching for potential ChatGPT-posts, users will inevitably find plagiarism, low-quality posts, and so on. Much of this results in additional suspensions. Moreover, we will see users suspended now for their old posts.
The number of answers is a lagging metric
Questions and answers are like eggs and chickens: it takes a while for an egg to become a chicken. In addition, the number of answerers has declined faster over the last few years than the number of askers, especially among users who post many answers.
Your heuristic may not be correct. And it is definitely not correct during the period of "active cleaning" of the site
According to the information in your question, all conclusions are based on some heuristic about the time it takes a user to create a post. Have you considered that it may be incorrect or inaccurate, especially while the community is proactively working on cleaning the site?
Moreover, have you tried at least normalizing it by the number of flags processed over the period, and presenting the data not by the suspension date but by the date the violating content was posted? I think you would see a different picture.
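To make the suggested normalization concrete, here is a minimal sketch. All field names and numbers below are invented purely for illustration; they are not taken from the actual site data:

```python
# Each hypothetical suspension record: when the violating post was created,
# and when the suspension happened.
suspensions = [
    {"posted": "2022-11", "suspended": "2023-01"},
    {"posted": "2022-12", "suspended": "2023-01"},
    {"posted": "2023-01", "suspended": "2023-01"},
]

# Flags the moderators processed per month (invented numbers).
processed_flags = {"2022-11": 400, "2022-12": 500, "2023-01": 2000}

def normalized_counts(records, flags, key):
    """Count suspensions per month (grouped by `key`) and divide by flag volume."""
    counts = {}
    for r in records:
        month = r[key]
        counts[month] = counts.get(month, 0) + 1
    return {m: c / flags[m] for m, c in counts.items()}

# Grouping by suspension date piles everything into the cleanup month...
by_suspension = normalized_counts(suspensions, processed_flags, "suspended")
# ...while grouping by posting date spreads it back over the months
# when the violations actually happened.
by_posting = normalized_counts(suspensions, processed_flags, "posted")
```

Even with these toy numbers, the two groupings tell different stories: the first shows a spike in the cleanup month, the second shows violations distributed over time.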
With ChatGPT, users who would not otherwise qualify as "active users" will end up in that group
And when they are suspended, they create the very anomaly you write about.
The term “false-positive” has a specific definition
To talk about a false-positive rate, you need to know exactly how many posts the moderators handled incorrectly out of the total they processed. But you did not analyze their work! You cannot talk about a "false-positive rate"; you simply do not have the data for that.
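For reference, the standard definition fits in a few lines. The numbers below are invented solely to show what data would be required to compute it:

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN): the share of legitimate posts wrongly acted on.

    Computing this requires knowing, for every moderator decision,
    whether it was right or wrong -- exactly the data that is missing here.
    """
    return false_positives / (false_positives + true_negatives)

# Invented numbers purely for illustration: out of 1000 legitimate posts
# reviewed, suppose 50 were wrongly flagged and 950 were correctly cleared.
rate = false_positive_rate(50, 950)  # 0.05
```

Without the per-decision labels (which suspensions were wrong), neither the numerator nor the denominator is known, so no heuristic on posting speed can substitute for them.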
To be honest, I spent an hour and a half reading your post, but could not find the meaning behind many of your "scientific" dictums.
What I see
Your data is a set of facts taken out of context that are in no way connected with each other, let alone with your conclusions.