Years ago, the dedicated Charcoal organization taught us that yes, you can teach a machine to flag spam automatically. After all, to quote ArtOfCode from “How does spam protection work on Stack Exchange?”:
> When the flagging gets tough, the tough... hand over the reins to the computers and let them do the work.
In the spirit of this, we’ve decided to take a page out of Charcoal’s playbook and teach our own little robot how to recognize and flag spam.
TL;DR: What’s happening?
We’re now using a machine learning model, trained in-house, to evaluate posts on the platform for the purpose of catching spam. At this time, we are only raising flags with this model on Super User. The results have been very promising: we’ve caught the overwhelming majority of the spam that Super User sees day-to-day. We are continuing to iterate on this initiative and hope to deploy to multiple sites in the near future.
What’s this about a robot?
Following up on the work we did to release Similarity anti-spam, we want to reduce the labor required to deal with spam. We want you all to be building, not just destroying. Nobody likes dealing with spam, and if we can accurately handle more of it autonomously, we should.
Similarity anti-spam is great at catching recurring spam campaigns, but it only acts on a modest share of the spam the platform sees. At the time of writing, since its network-wide release, the system has taken action on about 30% of all spam deleted across the network, and it brings with it some false positives that are a little noisy for moderators. We’re still tweaking these systems, and we’ll be back with an update and some data on our progress there, so stay tuned.
Ideally, we should be taking proactive action on more than just a minority of spam. Adding a new detection method that works hand-in-hand with our existing one, but is informed by a different methodology, is a great way to work towards that.
Our anti-spam system has its own brain now
We’ve built a machine learning model that we’ve trained primarily on the spam we’ve observed from Super User. The model accepts a post’s content and returns a confidence score. At a level of 80% confidence, we feel we can safely raise a non-binding spam flag, and at 90% confidence, we will raise a binding spam flag that deletes the post instantly. This behavior closely resembles our Similarity anti-spam efforts, and these two systems will work in conjunction with each other. These confidence thresholds, much like Similarity anti-spam, can be adjusted at a per-site level in the event that a particular site observes higher than normal false positives.
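To make the thresholds concrete, here’s a minimal sketch of that decision logic. The names and the example override values are hypothetical; only the 80%/90% defaults and the per-site adjustability come from the description above.

```python
# Illustrative sketch of the threshold logic described above. The class and
# function names are hypothetical; only the 80%/90% defaults and the
# per-site override come from this post.

from dataclasses import dataclass

@dataclass
class SiteThresholds:
    non_binding: float = 0.80  # confidence needed to raise a non-binding flag
    binding: float = 0.90      # confidence needed to raise a binding flag (instant deletion)

def flag_decision(confidence: float, thresholds: SiteThresholds) -> str | None:
    """Map a model confidence score to a flag action, if any."""
    if confidence >= thresholds.binding:
        return "binding"      # post is deleted immediately
    if confidence >= thresholds.non_binding:
        return "non-binding"  # flag is raised for moderators to review
    return None               # no automatic action

# A site seeing higher than normal false positives could be dialed back, e.g.:
cautious_site = SiteThresholds(non_binding=0.85, binding=0.95)
print(flag_decision(0.92, SiteThresholds()))  # -> "binding"
print(flag_decision(0.92, cautious_site))     # -> "non-binding"
```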
With the consent of site moderators, we set the ML model loose on Super User on December 10th, 2025 to see how accurate its flagging would be. The results are very impressive: between the ML launch date and today, 76% of all Super User spam has been acted on by the combined efforts of Similarity anti-spam and ML anti-spam, and 60% of all spam was deleted automatically with a binding flag. We had zero false positive binding flags during this time. We did have one false positive non-binding flag, but the post in question wasn’t far from spam anyway.
Here’s the raw data on the ML model’s efforts on Super User for the duration of this trial. Each % value is the percentage of all spam Super User received in that time:
| Total spam on SU | # Autoflagged (%) | Non-binding TP flags (%) | Binding flags (%) | Non-binding FP flags |
|---|---|---|---|---|
| 50 | 35 (70%) | 5 (10%) | 30 (60%) | 1 |
Now let’s bring Similarity anti-spam into the mix! It adds another 6% of coverage (three more posts), bringing us up to 76% at the cost of one additional false positive non-binding flag. Here’s the raw data for the combined efforts of our anti-spam systems (both ML and Similarity detection methods) on Super User since December 10th, 2025:
| Total spam on SU | # Autoflagged (%) | Non-binding TP flags (%) | Binding flags (%) | Non-binding FP flags |
|---|---|---|---|---|
| 50 | 38 (76%) | 8 (16%) | 30 (60%) | 2 |
Our model’s training on network-wide data looks promising. We’re primarily targeting the sites that receive the highest volume of spam, such as Super User, Ask Ubuntu, Stack Overflow, and Server Fault, and our hope is that we can turn on automatic flagging for these sites in 2026. We will do this on a site-by-site basis, activating only once we’re confident that our model will be a boon for a given site and we’ve adequately communicated the data to, at a minimum, that site’s moderators.
What do the anti-spam processes look like under the hood?
Currently, any post activity by a post’s author initiates a spam evaluation, in which both our ML and Similarity anti-spam systems score the post’s “spamminess”. If an evaluation indicates that the post is spam, we raise a spam flag automatically; however, outside of Super User, only the Similarity anti-spam system’s verdict currently determines automatic action. When we’re ready, we can activate the ML model’s flagging capabilities on a site-by-site basis.
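In code terms, the flow might look roughly like the sketch below. The function and names are invented for illustration; what’s real is the behavior: both systems always evaluate and their results are recorded, but only systems enabled for a site can translate a verdict into an automatic flag.

```python
# Hypothetical sketch of the evaluation flow described above. The detector
# interface and names are invented; the behavior (both systems always
# evaluate and are logged, but only enabled systems can act) mirrors the post.

from typing import Callable

Verdict = str | None  # None, "non-binding", or "binding"

def evaluate_post(
    post_body: str,
    detectors: dict[str, Callable[[str], Verdict]],
    enabled_for_action: set[str],
) -> Verdict:
    strongest: Verdict = None
    for name, detect in detectors.items():
        verdict = detect(post_body)
        print(f"[timeline] {name}: {verdict}")  # every evaluation is auditable
        if name in enabled_for_action and verdict is not None:
            # "binding" outranks "non-binding"
            if verdict == "binding" or strongest is None:
                strongest = verdict
    return strongest

# On most sites only Similarity may act today; on Super User, ML may act too:
# evaluate_post(body, {"similarity": sim, "ml": ml}, {"similarity", "ml"})
```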
These systems operate independently of each other, but we’re toying with the idea of having them weigh in for and against each other’s verdicts. If we do this, we might downgrade a binding spam flag from Similarity anti-spam to a non-binding flag when the ML model thinks a post is not spam, or upgrade a non-binding flag to a binding one when both detection methods consider a post quite suspicious.
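Purely as a speculative sketch (none of this is implemented, and the thresholds below are invented for illustration), that cross-checking could look something like:

```python
# Speculative sketch of the cross-checking idea above. Nothing here is
# implemented; the 0.5 and 0.8 cutoffs are made up for the example.

def combine(similarity_verdict: str | None, ml_confidence: float) -> str | None:
    # Downgrade: Similarity wants a binding flag, but the model disagrees.
    if similarity_verdict == "binding" and ml_confidence < 0.5:
        return "non-binding"
    # Upgrade: both detection methods find the post quite suspicious.
    if similarity_verdict == "non-binding" and ml_confidence >= 0.8:
        return "binding"
    return similarity_verdict
```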
How did the ML model’s training go?
It wasn’t easy! The developers involved in the model’s training had to surmount several hurdles when sourcing their training data. The dataset used to train the model must be as accurate as possible for the model to perform well, and many initial attempts at a network-wide dataset produced sub-par results. Slimming our data down to Super User’s deleted spam proved the most promising approach because, hilariously, Super User gets a high volume of “high-quality” spam. That is to say, the majority of spam on Super User is easily identifiable as spam, even to a robot.
Furthermore, we had to tackle another problem, one a few moderators on the network have identified: we need to ensure that “deleted spam” is adequately defined. From the beginning, we programmatically considered any deleted post with a helpful spam flag to be “deleted spam”... but that’s not the whole story, nor is it necessarily accurate. Oftentimes, moderators will cast a normal delete vote in response to spam flags, signalling that the post does deserve to be deleted but isn’t necessarily the kind of spam we want to “destroy” by invoking red-flag-deletion penalties. These posts are a poor fit to train against because we broadly grant amnesty on them. Ruling out these situations has greatly improved the model’s evaluation accuracy, while not significantly reducing the amount of spam we’d catch.
Another fun hurdle was dealing with spam that had been edited by well-meaning high-reputation users to remove the spam payload. We decided to exclude any post edited by anyone other than its author, since we can’t trust that such a post’s content is still spam. Taken together, these rules boil down to a filter along the lines of the sketch below.
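Here’s a minimal sketch of that training-data filter, assuming a simplified schema; the field names are hypothetical stand-ins for our internal data, but the three rules are the ones described above.

```python
# Rough sketch of the training-data filter described above. The Post fields
# are hypothetical; the rules (helpful spam flag, red-flag deletion rather
# than a normal delete vote, no edits by non-authors) come from this post.

from dataclasses import dataclass

@dataclass
class Post:
    deleted: bool
    had_helpful_spam_flag: bool
    deleted_by_red_flag: bool    # red-flag deletion vs. a normal mod delete vote
    edited_by_non_author: bool

def is_training_spam(post: Post) -> bool:
    """Decide whether a deleted post counts as 'deleted spam' for training."""
    return (
        post.deleted
        and post.had_helpful_spam_flag
        # Exclude posts mods removed with an ordinary delete vote (amnesty cases).
        and post.deleted_by_red_flag
        # Exclude posts whose spam payload may have been edited out by someone else.
        and not post.edited_by_non_author
    )
```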
These improvements to our programmatic definition of spam are also being applied to our Similarity anti-spam detector, as this will reduce the noisiness of false positive non-binding flags. They also permit us to grow our training dataset to include multiple sites without a significant reduction in accuracy.
What tools do moderators have to audit ML anti-spam?
This system is built to work alongside Similarity anti-spam, so all of the places you’d expect to see the results of its work will also reveal the efforts of the ML model. This means that moderators have access to spam evaluation timeline entries to see these evaluations’ results. Here is an example spam evaluation from Super User, on a post that Similarity anti-spam would not have flagged but that the ML model determined was very spammy and cast a binding flag against:
Moderators can also view these entries in their auto-flagged spam dashboard. You’ll notice a few other changes since our last announcement: we now include a filtering option by Status, as well as an option to only show post authors whose profiles aren’t deleted. If ML anti-spam flagging is enabled on your site, there will also be a “Detection type” filter, which lets you narrow results down to only ML or only Similarity anti-spam determinations:
What’s next?
We’ll continue training the best model we can for a rollout on sites where we think we can make a meaningful difference in spam prevention; the likely timeframe for that rollout is early 2026. We’ll also pay very close attention to how this system performs on Super User over the coming weeks. Given that the percentage of overall spam we’re unilaterally deleting is now much higher, we need to ensure that the flags we cast are very accurate, and tweak our system accordingly if we’re off the mark. Interrupting legitimate users’ participation is a red line we don’t want to cross.
We’re also still evaluating how our auto-flagged spam dashboard could transform into a broader spam-fighting hub, as requested in the earlier announcement. This is a rather large undertaking, and it’s broadly informed by the decisions we make about opening visibility into this dashboard to non-moderators. A large hurdle here is that the ability to see deleted posts is essential for this page to be useful, and changing how we distribute that ability needs to be done very carefully. We’d like to start that conversation as soon as we can.
On behalf of the moderation tooling team, I’d like to thank you all again for your support and feedback on our anti-spam systems. The response to this work has so far been overwhelmingly positive, which is fantastic. The issue of spam on the platform is one I have a personal stake in due to my participation with Charcoal, and it feels amazing to work directly with the moderation tooling team to tackle it.
The developers behind the machine learning model will be around on this post to answer any technical questions you may have, and we’ll be paying close attention so we can respond to any additional feedback, requests for data, or concerns. This post is going out close to our year-end break, so please be patient with us if there is a pause in responses.

