Catija

This post and the linked blog posts focus on a subset of concerns that, while valid, miss the mark for me. They don't show that the company's concerns and priorities are connected to mine as a member of the community, which is especially disappointing given that I'd expect a post here on MSE to be written for a different audience than the SO Blog.

As others have noted, the communication here feels very impersonal and distant. It fails to recognize or empathize with the struggles and concerns community members have voiced, let alone address them. And, honestly, using the word "observed" to refer to the community's reaction to many of your recent changes when most of them are extremely unpopular is... amusing.

The thing is, this post and those blogs are written as if y'all are presenting something novel and groundbreaking to the community. As if the community hasn't been living and breathing this, and trying to get the company to do something about it, that entire time! For the two years that AI has been a struggle, community members have been working to protect the content from AI-generated junk and to understand the implications of data models training on SE user-generated content... but the platform's issues existed long before then.

The blog posts and this question fail to recognize one of the main concerns of community members: y'all are spending money and time "securing" data that's public and open on the internet for everyone to use. Many feel this is futile, and I haven't seen any recognition of that concern from staff. In fact, the opposite is true - one of the blog posts mentions that these companies keep finding more and different ways around the restrictions y'all have created. And while I know and trust the skills the staff have, and am more willing than most to accept "we can't discuss it publicly" as an explanation, even I have a hard time believing this isn't just going to be a perpetual game of whack-a-mole that will continue absorbing all of the money and time y'all have to invest.

In particular, to respond to these efforts to "steal" CC BY-SA data, you're making it significantly more difficult for users to get the data themselves and negatively impacting tools that users have had to resort to building because the company hasn't meaningfully invested in improving the platform in a decade. You're also failing - or at best struggling - to respond to the issues community members raise about these changes, priorities, and decisions.

There's a cost/benefit equation here, and at this point I'm seeing all costs and no benefits - even though, in April 2022, Philippe responded to a question about how the company's revenue from AI companies would be used with this promise:

The money that we raise from charging these huge companies that have billions of dollars on their balance sheet will be used for projects that directly benefit the community.

Despite that, this question says nothing to reaffirm that promise or to indicate it's still forthcoming - something two of the blog posts at least manage to do, if only in indefinite, conceptual ways. Unfortunately, the way it's worded in the blogs doesn't give me a lot of confidence, either.

In the attribution post, it says:

[...] we will build confidence in and with our communities as we use partner investment in these integrations to directly invest in building the tools and systems that drive community health.

The Data protection post says:

Knowledge authors and curators get: [...] Revenue from licensing invested into the platforms and tools they use to create the knowledge sets.

I'm going to be honest - the things y'all are planning to focus on seem designed to drive volume of content, not quality. I understand that having volume is much more measurable than quality, so it's tempting to increase volume - but much of the volume that exists currently is so predictably low-quality, the system should be able to prevent or handle it automatically.

You also seem focused on attracting new contributors rather than investing in supporting the contributors who worked to create and curate the content that already exists. New users/creators/curators are also valuable to seek out, but the existing tools and barriers to participation have core issues that better onboarding won't solve - curation chores get tiresome, particularly when the tasks are so simple a computer could have - and should have - done them.

Y'all seem intent on relying on community curation through manual review queues (like Staging Ground and close/first questions/late answers) and user-built tools (Charcoal, SOBotics), rather than building tools (and even using AI creatively) to identify and/or prevent

  • questions that
    • are duplicates
    • are incomplete
    • are poorly formatted
    • are poorly tagged
    • where the code is in an image
    • are not in the correct language
    • should have been posted on another SE site
    • should have been posted on meta
    • are spam
  • answers that
    • are asking if anyone's found the answer yet
    • are comments on the question or one of the answers
    • are new questions
    • are spam
    • are outdated or version-specific
    • are duplicates
  • comments that
    • identify duplicates
    • answer the question
    • indicate the issue was solved
    • indicate that answers don't work or are otherwise problematic
    • are abusive/chatty/not needed

None of this needs to be manual any more! Unfortunately, by raising the volume while relying on manual methods to review and prevent problematic content, the community is unlikely to be able to keep quality up. The thing is, we don't have to wait for the higher volume to know these issues exist. The community has been asking for the company's help addressing them for years, and your solution has generally been to hand people more manual work, leaving users to do what they could to slog through queues or automate things themselves.

Charcoal, SOCVR, SOBotics, and other user projects have been addressing a ton of these issues for years, often with just regex - but they rely on users being willing to invest the time and effort to build the tools, spend their own money and hardware to keep them running, keep them updated, and use them. Stack Overflow Inc. hasn't done anything of this scale! The closest the company has come was an overhaul of mod tools to better identify suspicious votes - which y'all haven't even bothered to announce, despite it being well received by mods, particularly on SO!
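To make the "just regex" point concrete, here's a minimal sketch of the kind of pattern-based flagging these volunteer tools perform. The rule names and patterns below are invented for illustration - SmokeDetector's real rule set is far larger and more nuanced - but the core idea really is this simple:

```python
import re

# Hypothetical rules, loosely in the spirit of Charcoal/SOBotics-style
# pattern matching. Real rule sets cover hundreds of patterns plus
# metadata signals (account age, edit history, link density, etc.).
RULES = {
    "spam phrasing": re.compile(
        r"\b(?:buy now|customer care number|limited offer)\b", re.IGNORECASE
    ),
    "comment posted as an answer": re.compile(
        r"^\s*(?:me too|same problem here|did you ever solve this)\b", re.IGNORECASE
    ),
    "code posted as an image": re.compile(
        r"!\[[^\]]*\]\([^)]+\.(?:png|jpe?g|gif)\)", re.IGNORECASE
    ),
}

def flag_post(body: str) -> list[str]:
    """Return the names of every rule the post body trips, in rule order."""
    return [name for name, pattern in RULES.items() if pattern.search(body)]
```

For example, `flag_post("Same problem here, did you ever solve this?")` trips the "comment posted as an answer" rule, while an ordinary answer body trips nothing. A handful of rules like these, run automatically at posting time, could prevent whole categories of the problem content listed above from ever reaching a review queue.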

Please, I implore you - invest in the platform to reduce how much manual effort curation requires, rather than creating more manual labor for volunteers. Free community members up to do fun things. Reward them for the work they do, and recognize that they're people - people who are the key to this platform having the value the company now seems to see in it.

Talk about the value of the SE community the way Wikipedia's Director of Machine Learning, Chris Albon, talks about theirs - as people, not as the "knowledge store" they create.

The ubiquity of Wikipedia can make it easy to forget that behind every fact, every image, and every word in every article is a person — a person with a life, with family, friends, and pressures on their time.

People are why Wikipedia continues to persist.

For over twenty years and multiple times a second as you read this, thousands of people are spending their time building a global commons of freely available, neutral, reliable information for the world. That is why, despite the rapid changes the internet has gone through, the online encyclopedia remains relevant. [...]

One thing that will never change is our fundamental belief that Wikipedia comes from its editors. The foreseeable future remains that a large group of humans working together — debating, discussing, fact-checking each other, even when using AI to boost their efforts — will be the best way to create reliable knowledge.

[...] But as the internet is flooded with AI-generated content (some of it good, much of it bad) the value of an individual human volunteer, spending their evenings after the kids are asleep or after they get off from their job, building and improving the knowledge commons that is Wikipedia, will only become more important.

There will be something after this artificial intelligence boom, and something else after that. Still, the work of Wikipedia and its volunteers to build a reliable resource of information freely for the world will continue. Wikipedia is here to stay.

Y'all, I tried to trim it as much as I could, but so many of the paragraphs just felt so validating... and made me want to be part of their efforts. I'm not saying Wikimedia doesn't have a troubled relationship with its community of editors, but at least they can seem supportive to an outsider. Stack Overflow, on the other hand, is trying to extol the value of a human community while simultaneously making them sound like a machine!

Community remains at the center of what happens on the platform and the core engine that drives everything.

There are specific people inside the company who do actually manage to keep the community's needs in mind when making decisions. I worked with some really amazing people who did everything they could to make sure that was the case and some of them are still there fighting for it and pushing back against things that would harm the community and this platform. I appreciate their work and recognize how exhausting it is. Specifically, there was a time I was on the brink of quitting but for the intervention of one of those people.

But let's not kid ourselves. Those people are the exception, not the rule. If community were actually at the center of the platform,

  • these amazing people wouldn't have to fight, push back, and pick and choose what to stick their necks out for, or try to negotiate the "least bad" version of something the company has decided is necessary.
  • moderators and community members wouldn't have to rebuff plans over and over while trying to convince the company of the real harm those plans would do.
  • the blog posts in this series would read more like the one from Wikimedia's blog.

Yes, there are absolutely times for even community-centric companies to take a stand when it comes to what's best for a platform, even if it overrules or conflicts with the wishes of the community using it. It's totally reasonable for a company to set priorities and goals, determine how they'll solve problems, and ensure they're financially viable. But community-centric companies are transparent and solve problems with their communities because they know that the users are the ones best able to help them identify and address their core problems.

It’s essential that the Stack Exchange community and company have a mutual understanding of the opportunities and challenges that lay ahead.

I totally agree. I just doubt that the company as a whole actually understands this or that it adequately values the people who make up the community as something more than the content they create.