
We've deployed machine learning auto-flagging

Given the positive reception, we've activated machine learning anti-spam's auto-flagging feature, as detailed below. Staff will be monitoring its work very closely over the next few weeks to ensure accuracy, especially with regard to automatic binding flags that result in instant deletion.

We're excited to see how big an impact this feature makes on the site's spam influx, and we'll be back to share data on the ML model's effectiveness in a couple of months.

On behalf of the moderation tooling team, thank you all for your feedback and analysis of our work; your voice is an essential part of the work that we do!


We’ve started using a machine learning model to automatically identify, flag, and delete spam, and we think this site would be a great place to activate it. So far it’s been quite effective on Super User, the only site where it’s currently flagging automatically. We’ve improved the model since we last discussed it on Meta Stack Exchange, and we’d like to share its effectiveness here to determine whether it should guard your site as well.

How do the anti-spam capabilities work?

When a post is created, or a post author edits their post, our systems run a spam evaluation. Posts currently go through two checks: a similarity evaluation and a pass through our ML model. The ML model is trained on several cumulative years’ worth of deleted spam from across the network, and yields a confidence score between 0% and 100%. At a high level of “spam confidence”, we raise a non-binding spam flag, which counts toward the four spam flags required for automatic deletion as spam. At a very high “spam confidence” level, we automatically delete the post with a binding spam flag.
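The decision logic described above can be sketched as a simple two-stage check. This is an illustrative reconstruction only: the actual threshold values are not public, so `FLAG_THRESHOLD` and `DELETE_THRESHOLD` below are placeholder assumptions.

```python
# Hypothetical sketch of the two-stage auto-flagging decision described above.
# The real thresholds are not public; these values are illustrative only.
FLAG_THRESHOLD = 0.85    # "high" spam confidence -> non-binding flag (assumed)
DELETE_THRESHOLD = 0.99  # "very high" confidence -> binding flag + deletion (assumed)

def evaluate_post(similarity_is_spam: bool, ml_confidence: float) -> str:
    """Return the action taken on a new or author-edited post."""
    if similarity_is_spam:
        return "similarity-flag"          # caught by the similarity check
    if ml_confidence >= DELETE_THRESHOLD:
        return "binding-flag-delete"      # deleted immediately
    if ml_confidence >= FLAG_THRESHOLD:
        return "non-binding-flag"         # counts toward the 4 flags needed
    return "no-action"

print(evaluate_post(False, 0.995))  # binding-flag-delete
print(evaluate_post(False, 0.90))   # non-binding-flag
```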

Automatic non-binding spam flags do not affect the post’s score and are largely invisible to non-moderators. They can be dismissed by moderators like any other flag if they’re found to be unhelpful. Automatic binding spam flags attach a unique post notice, which links to an explanatory help center article. Our hope with this help center article is that legitimate posts have a visible and easy path to undeletion with the help of a handling moderator.

For more detailed information about how ML anti-spam operates, please review the network-wide announcement.

How effective would ML anti-spam be on Server Fault?

Over our year-end break, from December 19th, 2025 to January 12th, 2026, we ran the ML model in silent observation mode across the network. During this timeframe, Server Fault saw 47 spam posts. ML anti-spam, had it been enabled, would’ve flagged 36 of them, with 30 of those flags being binding flags resulting in instant deletion. There would also have been 2 false positive non-binding flags, which a moderator would have been able to dismiss upon review. This means that ML anti-spam would’ve caught 76.6% of all spam during this time period with a 94.7% accuracy rating.

Here’s the ML model’s theoretical flagging summary data in a table:

Total spam    Autoflagged (%)    Non-binding TP flags    Binding TP flags
47            36 (76.6%)         6                       30

It is worth noting that this system will work alongside Similarity anti-spam, which is already flagging on Server Fault, so reviewing that detector’s effectiveness over the same time period feels appropriate. Since December 10th, 2025, Similarity anti-spam has caught 21 spam posts on Server Fault, or 44.6% of spam, and has so far observed no false positives here. While there is some detection overlap, it’s clear that ML anti-spam detects significantly more spam, at the cost of a couple of false positive non-binding flags.

Takeaways

Our analysis of this data indicates that ML anti-spam would be a great addition to Server Fault’s defenses, and should help lighten the load on the community’s flags. Do you feel differently? Do you observe any issues with the data as we’ve laid it out? Does anything else problematic stick out to you?

We’ll be monitoring this post for feedback until Wednesday, January 21st 2026. Our goal is to move forward with enabling ML anti-spam’s flagging capabilities after the end of the feedback window. If you see any serious issues that could hinder the rollout, please be sure to detail them in an answer so we can resolve them before we move forward.

2 Answers


Even given Starship's critiques, I'm still in favor of turning this on, as someone who will see the non-binding flags.

We also have Smokey working on top of these new flags, adding another flavor to the mix. A single non-binding flag will show up in the mod dashboard.

  • If Smokey doesn't agree, that post will linger long enough for a human to check it out. This handles the false-positive case.
  • If Smokey agrees that the post is spam, it'll quickly earn more flags. Given the referenced post, a soft flag from ML plus flags from Smokey should add up to enough flags to remove a post without a human in the loop.
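The consensus path in the second bullet is just flag counting against the deletion threshold. A minimal sketch, assuming one ML soft flag and a configurable number of Smokey autoflags (the function name and counts are illustrative, not how either system is actually implemented):

```python
# Rough sketch of the flag-accumulation consensus described above. The number
# of Smokey autoflags is configurable in practice; values here are illustrative.
DELETION_THRESHOLD = 4  # spam flags required for automatic deletion

def total_spam_flags(ml_soft_flag: bool, smokey_autoflags: int) -> int:
    """Count flags from the ML soft flag plus Smokey's autoflags."""
    return (1 if ml_soft_flag else 0) + smokey_autoflags

# ML casts one non-binding flag and Smokey adds three autoflags:
print(total_spam_flags(True, 3) >= DELETION_THRESHOLD)  # True -> post removed

# Smokey disagrees (no autoflags): the post lingers for human review.
print(total_spam_flags(True, 0) >= DELETION_THRESHOLD)  # False
```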

These are two separate systems mutually agreeing that something is suspicious. A consensus-model approach is useful here.

I also need to mention that we get a slow drumbeat of people organically throwing spam flags on content that references specific products but otherwise isn't all that spammy. Sometimes it's from a long-time user with no history of spammy behavior, so an incidental product mention isn't a red flag. Sometimes the answer is timely and the product is merely used as an example, clearly not a recommendation. There are a variety of reasons a mod will look at an organic spam flag and not agree. This system would perhaps increase the rate of such flags we deal with, but that feedback would go into improving the model (or so I hope).

  • No, the issue here is that they aren’t 2 totally different things, each with an independent chance of being wrong. Posts that look suspicious but are fine likely get both a high Smokey score and a high score from this model. Commented Jan 15 at 21:58
  • While it is true that humans sometimes make bad spam flags, if they do that often and flag a bunch, they get flag-banned. Furthermore, it’s unlikely that 4 people who don’t understand what spam is would all independently happen upon and flag the same post within a few hours (and if they did, a mod would decline the flags). Commented Jan 15 at 22:00

This is not accurate enough, and SmokeDetector could do this much better

I'm very glad you are all thinking about this; it's something I've advocated for years and I do believe most spam handling can and should be automated. That being said, SmokeDetector would be a much better candidate to use in such a system.

Based on a simple combination of minimum reason count and reason score (which I came up with in 5 minutes; it could easily be improved), 50% of spam on Server Fault could be nuked without a single false positive across any of the tens of thousands of posts from the past 9 years.
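The "minimum reason count / reason score" rule sketched above amounts to flagging bindingly only when SmokeDetector matched several detection reasons whose combined weight clears a bar. The cutoffs below are placeholders, not the answer author's actual values:

```python
# Illustrative version of a minimum-reason-count / minimum-weight rule for
# binding flags. SmokeDetector assigns each matched detection reason a weight;
# the cutoff values here are assumed, not the author's real thresholds.
MIN_REASONS = 3        # assumed: require at least this many matched reasons
MIN_TOTAL_WEIGHT = 500  # assumed: require at least this combined weight

def would_nuke(reason_weights: list[int]) -> bool:
    """Bindingly flag only when both the count and total weight bars clear."""
    return (len(reason_weights) >= MIN_REASONS
            and sum(reason_weights) >= MIN_TOTAL_WEIGHT)

print(would_nuke([200, 180, 150]))  # True: 3 reasons, total weight 530
print(would_nuke([600]))            # False: only one reason matched
```

Requiring multiple independent reasons is what keeps the false positive rate near zero: a legitimate post rarely trips several unrelated detectors at once.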

The statistics for this system on this site aren't the most insightful given the extremely small sample size, but if the system's statistics on Stack Overflow are anything to go by, it isn't nearly accurate enough. Roughly the same amount of spam would be removed without human intervention under my 100%-accuracy idea made in 5 minutes as with this system.

While there hasn't been a false positive binding flag yet, that's probably just because you haven't tested it enough. If it had a 1% false positive rate for binding flags, like on SO, it is still quite likely that no false positive would appear among the only 30 binding flags cast. Considering this system has had a 1% false positive rate on SO and likely produces false positives here too, while SmokeDetector's systems would have had no false positives in 9 years of spam, I think this is simply not as good.
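The small-sample point above is easy to quantify: even at a 1% false positive rate, zero observed false positives is the most likely outcome over only 30 binding flags, so the clean run doesn't say much.

```python
# Probability of observing zero false positives in 30 binding flags,
# assuming an underlying 1% false positive rate per flag.
fp_rate = 0.01
n_binding_flags = 30

p_zero_fps = (1 - fp_rate) ** n_binding_flags
print(f"P(no FPs in {n_binding_flags} flags) = {p_zero_fps:.1%}")  # 74.0%
```

In other words, a 1%-FP-rate system would still look spotless about three times out of four over a sample this small.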

Furthermore, a 1% false positive rate is just too high, and so is anything in that range, especially for a binding flag. Even just to cast non-binding flags, SmokeDetector requires at least 99.75% confidence. I'd be hard-pressed to find a community auto-flagging bot with a false positive rate higher than 1%. Outright deleting a false positive is much, much worse: rather than just wasting a mod's time, you've actively deleted a post that will probably never be noticed or undeleted, given a user an unjust reputation penalty, and, unless they're intimately familiar with the inner workings of the site, probably driven them off too.

The 94.7% accuracy rate of the non-binding flags is still too low. That's an unreasonably low level of accuracy for an experienced human on just about any sort of flag, let alone for a bot, and especially for spam/rude/abusive flags. I think it needs at least 98–99% accuracy to be reasonable. We could catch much more spam just by lowering Smokey's autoflag accuracy thresholds to that level than the roughly 80% this system catches.

  • I'd like to mention, in addition to the thoughts I left on your post on Meta SO, that the datasets you're comparing are not equivalent in size, time period, or composition. Activity was higher on this site during the bulk of that dataset, and by contrast we observed a shorter period during one of our least active times of the year. I, and I'm sure the mods here, would gladly take 2 non-binding FPs if over three quarters of spam is removed autonomously. Commented Jan 15 at 6:00
  • @Spevacus You are missing the point. The point is that you haven’t gathered enough data here to draw a statistically significant estimate of the accuracy. Only on SO is there anything like enough data to draw a valid conclusion, so there isn’t much choice but to use the data from SO until this is tested more here. Commented Jan 15 at 12:16
