6

As we are seeing a lot of spam, why can't we blacklist questions on this platform whose description contains both a mobile number and the string "loans"?

9
  • 2
    Does this answer your question? Can a machine be taught to flag spam automatically? Commented Jul 22, 2023 at 4:57
  • 27
    We have an automated system for that; right now it is offline because of the moderator strike. Commented Jul 22, 2023 at 4:57
  • 4
    I understand the question as a request for implementing input validation. With that in place, it wouldn't be possible to post a question, answer, or comment containing spam, so there would be no need to flag and delete it afterwards.
    – U880D
    Commented Jul 22, 2023 at 5:13
  • 12
    As I've explained here, there is an automated system to stop spam (including loan spam with phone numbers). However, that system is currently on strike.
    – cocomac
    Commented Jul 22, 2023 at 5:48
  • @U880D to my understanding, some forms of that also exist; however, it's quite hard to handle this sort of thing comprehensively with regex without also hitting tons of false positives. Spammers commonly exploit "unicode symmetry" attacks to create text that is readable but does not use the sequences of characters one would naturally expect it to. Commented Jul 22, 2023 at 5:52
  • 1
    @KarlKnechtel, "comprehensively with regex": I wouldn't use regex. "hitting tons of false positives": I have implemented and used naive Bayes spam filtering in the past and haven't experienced that. But that's something the Inc. would need to implement. Furthermore, someone who has access to the whole data of the site could analyze the patterns that identify a spammer and prevent spam from being posted up front.
    – U880D
    Commented Jul 22, 2023 at 5:59
  • I don't know exactly what is implemented. I am pretty sure the site staff imagine they have a vested interest in not telling us exactly what is implemented. Commented Jul 22, 2023 at 6:04
  • 11
    Unrelated to the strike, Charcoal's metasmoke server is currently down (ISP issues; expected downtime several days still); but once it's back up, you can explore its search functionality. There is a rule for phone numbers (split into several reasons, "phone number detected in" ... title, question, answer? I don't remember the precise wording) and you can separately search for "loan". You need to be a registered user to use regex search, which would easily let you find the intersection of these. My suspicion is that you'll see a precision less than 100%
    – tripleee
    Commented Jul 22, 2023 at 7:36
  • 1
    Actually metasmoke.erwaysoftware.com/… has only TP (actual spam) hits in metasmoke. The same search on post bodies produces one false positive, but a very high detection rate also.
    – tripleee
    Commented Jul 30, 2023 at 6:31
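
As a rough illustration of the rule discussed in the comments above (a phone number and the word "loan" in the same post), here is a minimal Python sketch. It is not how Stack Exchange or SmokeDetector implement their checks; the function name `looks_like_loan_spam`, the nine-digit threshold, and both regexes are assumptions made purely for illustration.

```python
import re
import unicodedata

# Hypothetical pattern: at least 9 digits, optionally separated by spaces,
# dots, dashes, or parentheses, with an optional leading "+".
PHONE_RE = re.compile(r"\+?\d(?:[\s().-]*\d){8,}")
# Matches "loan" or "loans" as a whole word, case-insensitively.
LOAN_RE = re.compile(r"\bloans?\b", re.IGNORECASE)


def looks_like_loan_spam(text: str) -> bool:
    """Return True if text contains both a phone-like number and 'loan(s)'."""
    # NFKC folds compatibility characters (for example fullwidth letters and
    # digits) into their plain equivalents before matching.
    normalized = unicodedata.normalize("NFKC", text)
    return bool(PHONE_RE.search(normalized)) and bool(LOAN_RE.search(normalized))
```

Both concerns raised in the comments show up even in this toy version. A legitimate post such as "My loan table has id 1234567890" is flagged (a false positive), while "loans" written with a Cyrillic "о" slips through, because NFKC normalization does not map lookalike letters from other scripts onto Latin ones.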

1 Answer

39

The SE spam filter is notoriously broken and has been like this for a long time.

Usually, what it misses is caught by the Charcoal project, but hostility from SE has caused the project to go on strike together with many other moderators and curators*. This is very unfortunate, since there were plans to actually improve the reliability (as in availability, not accuracy) of Charcoal by increasing support from SE (of which there is currently next to none, except for increased API limits). Before the AI craze, it looked like we were making progress. Unfortunately, the company has decided to scare people away instead of continuing that trend.

* :

4
  • 12
    Perhaps a more diplomatic articulation would be "Stack Exchange has not put a lot of effort into their own spam filter because Charcoal was handling a lot of the effort using free volunteer resources." In retrospect, perhaps we should have gone on strike sooner (ha ha, only serious).
    – tripleee
    Commented Jul 22, 2023 at 10:05
  • 13
    @tripleee this already is a diplomatic articulation; there is no need to play down genuine harm.
    – Akixkisu
    Commented Jul 22, 2023 at 12:51
  • We have two options. The first, which I believe is more suitable, is not to let the user submit the question at all; the other is to let the user post the question and have a bot delete it if it looks suspicious. This way we have two layers, both active. I am not sure how it's implemented, I'm just sharing thoughts; the team may have already thought of this. My question is why this is still happening, as it's not good for this kind of platform.
    – asktyagi
    Commented Jul 23, 2023 at 5:27
  • 4
    If the company wanted to run Smoke Detector, they could. The efficacy of the bot will dwindle over time without the cadre of volunteers who update its detections in response to new spam campaigns and other emerging threats, though.
    – tripleee
    Commented Jul 23, 2023 at 7:33
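
The two layers suggested in the comments above (refuse a post at submission time, then let a bot remove what still gets through) could be sketched roughly as follows. Everything here is hypothetical: `Post`, `submit`, `sweep`, the phrase lists, and the threshold are invented for illustration and do not describe the actual Stack Exchange filter or SmokeDetector.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Post:
    body: str
    deleted: bool = False


# Hypothetical lists; a real system would need these updated continually.
BLOCKED_PHRASES = ("instant loan", "loan offer")
SUSPICIOUS_WORDS = ("loan", "whatsapp", "call now")


def submit(body: str, store: list[Post]) -> bool:
    """Layer 1: refuse the post outright if it contains a blocked phrase."""
    lowered = body.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return False
    store.append(Post(body))
    return True


def sweep(store: list[Post], threshold: int = 2) -> int:
    """Layer 2: mark already-accepted posts as deleted if they score high."""
    removed = 0
    for post in store:
        if post.deleted:
            continue
        lowered = post.body.lower()
        # Count how many distinct suspicious words appear in the post.
        score = sum(word in lowered for word in SUSPICIOUS_WORDS)
        if score >= threshold:
            post.deleted = True
            removed += 1
    return removed
```

The hard-coded lists are the weak point of such a design: as noted above, detections lose their value over time unless someone keeps updating them in response to new spam campaigns.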
