The marketing blog post series linked in the question starts with
If you’re weary of reading about the latest chatbot innovations and the nine ways AI will change your daily life next year, this series of posts may be for you.
Perhaps if the rest of the series and the company's policy about AI were as self-aware, I'd be a little more hopeful about this.
It carries on to say
Consequently, in the current transformation, human-centered sources of knowledge are obscured. We face a world in which the old paradigms are no longer paramount, and their places in the world are redefined.
Now, this post has my name on it. As it's Creative Commons licensed, someone who wants to reuse this content and wants to do it the right way would attribute it to my username - not just the site it was on. Oftentimes, especially in a smaller internet community like this, the source and provenance matter as much as the content.
Essentially, LLM providers act parasitically, almost voraciously, sucking up resources and content with very little consideration for the commons, the public good, or the communities they take content from, and somehow I don't feel normalising or glorifying this is for the best.
Companies and organizations lucky enough to host these engines of knowledge production are at a decision point; how do they continue to monetize when the technological landscapes have changed?
Perhaps the reason SE's weathered many storms is luck, but this site and the knowledge within exist because of a solid foundation set up years ago, and the significant experience, curiosity and knowledge held by its users.
A blind focus on monetising doesn't always end in success. SE's commercial products are almost always going to rely on the community as their driving force - whether it was Careers 1.0 or Teams. Yet very often it feels like SE's struggling to find its *AAS while the community's needs and desires get pushed aside on the promise of 'we'll look at it when we have the resources'.
If you believe in luck, buy a lottery ticket. A lot of work went into what SE is today.
Knowledge-as-a-service: The future of community business models
I'll zoom in on the 'new' problems you've cited. The beauty here is...
...none of those are our problems.
Let me pick these apart:
"Answers are not knowledge": having a pool of subject matter experts and enthusiasts rather than what's essentially a black box that strings words together is why our answers contain knowledge. It's not just that LLMs lack context; they lack imagination, intellectual curiosity, and the ability to reason and make connections.
"The LLM Brain Drain" assumes that there's any actual intelligence there.
A healthy community challenges itself and learns through reinforcement. We find 'real' problems we face; we share, not take; we have the freedom and intellectual curiosity to try to solve and resolve our own problems. We get nerd sniped.
Rather ironically, the network's strict rules, aggressive meritocracy and quality standards are why we're a good source of information, but also why people complain about us.
Our 'community brain drain' has different causes, and these are probably more critical for the ongoing survival of our communities.
- Developers lack trust in AI tools
Well, yes. So do many members of the community, and AI's a controversial topic here. There's a good reason for this: AI and the organisations that promote it often haven't proven trustworthy. The focus on AI as a substitute for the network also leaves something out: for people who don't want an AI tool - who want trusted, human-driven answers and feedback - we literally have a full suite of tools.
I'd finish my critique of this post with this:
Stack Overflow and the larger Stack Exchange community need to be direct about our needs
We have always been direct about our needs. They're often ignored, or we get promised it'll get taken care of. The day we're not direct about what we need, y'all can probably shut down the network. Personally, I'm not opposed to LLM companies paying SE. I'm opposed to LLMs being the one basket the company puts all its eggs in. I'm convinced the bubble will burst and that the GenAI industry isn't a sustainable, long-term future.
If the company, which we've worked with for over a decade, isn't listening, how can we expect LLM companies to?
Attribution as the foundation of developer trust
For all the criticism SE gets - it is rarely that our knowledge is untrustworthy. That some folks find us intimidating, and on-boarding is hard? Sure. That things get outdated? Absolutely. But our answers are posted, refined, and occasionally questioned.
It seems very strange that while developers - the core audience of this network - distrust GenAI, a handful of organisations has decided this absolutely must be the future and wishes to force-feed it to us. Force-feeding never ends well for the goose.
I'm all for getting a fair deal for the community. It's just the strange love affair with GenAI despite the pushback that gets me. There seem to be lots of attempts to 'sneak' in GenAI 'features' we don't need, or to change the social contract around that sort of thing. And yet your own surveys are telling you developers don't trust GenAI.
Ongoing community data protection
Some of the assertions made here seem... charmingly optimistic. LLM providers have been shown to ignore robots.txt files and to train on pirated material.
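For context on what 'ignoring robots.txt' means in practice: robots.txt is purely an honour system. A compliant crawler is expected to check the rules before fetching anything, as in this minimal Python sketch (the robots.txt content and the browser user-agent name are illustrative, not from any real site):

```python
from urllib.robotparser import RobotFileParser

# An illustrative robots.txt of the kind many sites now publish to
# opt crawlers used for LLM training out of their content. The
# user-agent names here are examples; each provider documents its own.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler asks before fetching each path.
print(parser.can_fetch("GPTBot", "/questions/123"))          # False
print(parser.can_fetch("ExampleBrowser", "/questions/123"))  # True
```

The catch is that `can_fetch` only tells a crawler what the site asked for; nothing technically stops a scraper from skipping the check entirely, which is exactly the behaviour being complained about.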
The blog post backs some of this up:
In the last year, Stack has seen numerous data points that suggest LLM providers have escalated their methods for procuring community content for commercial use. In the last few months, non-partners have begun posing as cloud providers to hack into community sites, which led us to take steps to block hosted commercial applications that do so without attributing community content.
On the other hand, I'm a little skeptical about the following part of the paragraph:
At the same time, these strategies will help turn potential violators into trusted customers and partners by re-directing them to mutually beneficial pathways for all parties. (This also serves as a reminder for users of all tools and services to pay close attention to terms, conditions, and policies to know what you agree to.)
The tricky part is really not treating your users like they're going to steal their own data.
Many of the choices made to 'safeguard' or add guard-rails to data access have eroded community trust, created friction for community-run tools, or made quality of life worse for users.
For example, academic institutions wishing to use data for research purposes or communities looking to guard their collective work against unexpected systemic failure should not have their legitimate activities restricted.
Practically, and rather ironically, it took the community a matter of days to build a tool that downloaded the individual site data dumps. There's supposed to be a correct way to request a full data dump, but as of the date of posting I'm still waiting for my request to be fulfilled.
I choose to believe the intent is there, but if a community member with somewhat deeper reserves of patience and a direct-ish line to staff can't get a data dump legitimately in a reasonable period of time, one would hope an academic institution requesting data could get it before their undergrads become professors. The tools for access should be built alongside the tools for restriction; as it stands, those legitimate activities pretty much are restricted.
Let's be very honest - this is not about protecting data for the community. It's about protecting potential revenue. I've not seen as much movement in dealing with the community's needs as I'd like. I'm disappointed, but I know others are furious.
"Benefits for all"
Some really neat technology has been a solution looking for a problem: 3D TVs, the 'metaverse', blockchain, and now LLMs.
And yet, here I am, on a 2D screen, on a text-based platform that runs on a traditional web platform.
The thing is, knowledge survives because people want it to. I often use SuperUser as a way to collect/store things I've learnt over the years. If it died, I'd probably grieve a little, and find somewhere else to do these things. Maybe post stuff on my blog, or some other site.
I'm going to follow on with something Jeremy quoted from the question in the comments:
"An existential threat looms over the world of human knowledge preservation"
Life finds a way. There's a certain hubris in assuming humans don't preserve knowledge without the help of large platforms; quite the contrary. Large organisations often lack the focus to preserve knowledge. The BBC destroyed many old tapes, and a lot of old content only survives because an individual taped it. Humans are hoarders. We pass down stories, teach skills, read, and write.
People will write, share knowledge, get excited about the ugly, cable-tied personal project when it first whirs to life. And if LLMs are such a threat, I'm not entirely sure how becoming part of the problem by appeasement helps with long-term community health.
I'm reminded of one of my favourite works of J. Michael Straczynski, a poster he did called Never Surrender Dreams - some excerpts of the full work feel relevant to me personally:
Children sing and dance spontaneously, tell stories without fear, reveal their thoughts without inhibition, and reach for what logic tells us should be unattainable. We do, we explore, we ask questions; we pursue our heart's desires, we dream of achieving greatness.
We're entirely capable of preserving our stories and thoughts.
And, well, this is what makes SE a great place. And yet, we're told that putting all those things into a black box is essential to preserve knowledge.
But as time passes, we learn fear, we learn to second-guess ourselves, and we learn to suspect our abilities and our desires. We are told that some people tell stories, some people dance, and some people sing, but these things are not for everyone.
And well, perhaps these things are for no-one. ChatGPT won't judge you, will it? And yet it feels like the 'story' here is that places like this, communities of people like us, aren't going to survive LLMs.