N.B.: While this post talks about MVPs (whose scope and definition we may or may not agree on), it is mostly concerned with experimental features (whether they graduate into products or not). Discussing “viability” is part of the issue, but I have also put out a list of questions that are more detailed and less controversial. Please don't get hung up on MVP and its definition; rather, try to address the "Concrete Questions".

This is inspired by this post/comment on MSO.

Over the past few years, we’ve seen a pattern: new features and experiments ship that lack core functionality, only to be quietly retracted or left to languish. That cycle undermines trust in every future experiment. I’d like to start a discussion to reach a clearer view of what a Minimum Viable Product (MVP)[1] should include, and what guardrails we need to protect existing workflows.


Why this matters

  1. Documentation’s rocky launch
    Documentation was introduced back in 2016 as a collaborative “how-to” library and amassed thousands of examples, but it failed in under a year. Key issues: no full-text search, limited moderation tools, and UI gaps that hid good content. Volunteers invested time that ultimately went to waste, and the very idea of crowd-authored tutorials was tainted.

  2. Discussions without the basics
    Discussions was built to host opinion-based, architecture, or experience-driven questions. It was also launched as an alternative to Chat, to address issues such as sub-optimal search. Yet Discussions came without search, effective tagging, moderation tools, or downvotes (which were removed at a later date). Months later, it still feels like an unfiltered chat room (ironically, with fewer features than Chat), hardly the “forum” it was branded to be.

  3. Comment experiment
    The latest example is February 2025’s comment experiment: a feature for asking “follow-up questions” that left even veteran users confused, broke moderation and review tools, and more. The UI hiccups that ensued made me, and many others, doubt whether any comment-system change could ever work.

Each of these experiments shipped without essentials, and the community, at least in part, ended up blaming the idea, while in reality the implementation was the main culprit.[2] This makes it harder to revisit promising concepts later.


Concrete questions for the community

  1. Core functionality

    • What minimal feature set must be in place before an experiment goes live?
    • Is there a general “checklist” that we can rely on in order to decide whether a product is an MVP or not?
  2. Non-destructive rollouts

    • Should experiments ever disable existing capabilities?
    • If a disruptive change is unavoidable, how should it be communicated, measured, and potentially reversed?
  3. Timing of community consultation

    • At what development stage should product teams engage experienced users or moderators?[3]
    • Which venues—Meta posts, dedicated chat rooms, beta programs—are most effective for early feedback? (For instance, some UI changes can be distributed as userscripts to be tested by community members before they are rolled out.)[4]
  4. Measuring harm vs. value

    • Beyond raw usage metrics, how do we detect when a feature is causing more problems than it solves?
    • What process should exist for swift rollback or remediation if an experiment degrades the experience?

“Small tweaks to Discussion or to comments are not going to achieve ambitious goals.”

That insight applies not just to high-level strategy but also to how we deliver new features. If we can agree on a clear, enforceable definition of “viable” and a shared rollback plan, future experiments won’t feel like blind swings.

What examples of “just right” MVPs have you seen on Stack Exchange? Which launches clearly missed the mark—and what lessons can we carry forward? How can we establish solid guardrails so that the next big idea lands ready for success?


[1] Also see this article about Minimum Viable Products/Features (MVPs/MVFs), specifically: "A minimum viable product (MVP) is often mistaken as the first general release of a product, the initial offering that is good enough to address the early market. But for most products, an MVP should be a much earlier and cruder version that acts as a learning device—a means to test a crucial assumption and make the right product decision".

[2] I am not claiming that there are no bad ideas/features; just that at least some of the experiments could have been successful had they been conducted properly.

[3] I have previously touched on this matter in my answer to What can be cut away, and why?.

[4] Credit goes to Kevin B, but I cannot find the message.

  • Are there maybe also positive examples from which we could learn? I mean experiments that were not half-baked and were concerned with a somewhat important functionality of the site? We could contrast them with the half-baked experiments. Commented May 6, 2025 at 9:12
  • I'd say the chat sidebar would count, and potentially chat updates depending on how it goes. Commented May 6, 2025 at 9:23
  • @NoDataDumpNoContribution Yes, there are. I did ask What examples of “just right” MVPs have you seen on Stack Exchange? Commented May 6, 2025 at 14:46
  • Thanks for putting this together! There’s a small area I’d like to poke at: the distinction between experiments to learn (e.g. testing a hypothesis, likely not using production-ready code) and actual MVPs (e.g. a minimal version of a feature we intend to fully launch and ideally fully develop, though, of course, follow-through can sometimes be hit or miss). I don’t think experiments and MVPs are the same thing, and they likely require different rubrics for assessment. Commented May 7, 2025 at 0:33
  • What I consider MVPs: Discussions, Staging Ground (goal to develop). What I consider experiments: Comments, 1-Rep to Vote, Answer Bot (goal to learn). Commented May 7, 2025 at 0:44
  • That said, I understand the concern that experiments to learn might eventually ‘graduate’ into MVPs before they’ve been fully validated or refined, which is all the more reason why it might be useful to evaluate these as separate but potentially related processes. Ideally, a process should allow a faster track for experiments (since these might get thrown away), and more time spent considering and refining MVPs (I prefer MLPs, personally) before they’re launched (and plans for follow-through). IMO. Commented May 7, 2025 at 0:48
  • @EmmaBee I made that distinction. Wherever I used product or feature or experiment, instead of MVP, I had that distinction in mind. That said, even an experiment should be viable to some extent; not because it might graduate some day, but because otherwise you cannot rely on the results to make decisions (This makes it harder to revisit promising concepts later). For example, the results that were shared after the comment experiment were vague, inconclusive, and, if I may, not even remotely scientific. Commented May 7, 2025 at 1:46
  • @M-- Ah, I read this as using the terms experiment and MVP more or less interchangeably. Could just be me, but it might help others if you include a sentence or two noting that these are distinct and likely need distinct consideration? I also agree with this: "experiment should be viable to some extent", and that the definition of viable might be different for experiments and MVPs. Commented May 7, 2025 at 1:57
  • Ah, yes. MVP is one of those words that has a lot of meanings and is used a bit interchangeably, and with Documentation in the mix on your list (a fully launched product), to me that is something that graduated out of true "experiment mode" (but we should also be learning post-release, so the boundaries are a bit fuzzy), so it seemed like maybe you are in the camp of "MVP = first general release of a product". But I think I see now where you are coming from. Thanks for clarifying! Commented May 7, 2025 at 2:29
  • I like the breakdown of minimum "testable", "usable" and "lovable" products (avoiding the word "viable" as it could mean each of those things). All an experiment has to do is validate or invalidate a hypothesis. A fake door or pretotype can do that even without core functionality. Commented May 7, 2025 at 9:35
  • @M-- well, I agree with others that there'll be different definitions for different things, but in the case of experiments (e.g. comments), where we're talking about the MTP, the only real requirement is that it validates a hypothesis. Plus, yes, it should be included in the opt-out switch. Commented May 7, 2025 at 15:27
  • @M-- I partly agree; we can learn from that example how to better ensure the hypothesis is validated. I use that example because it is one I worked on myself recently. More generally though, do you think there are any other criteria for an MTP other than validating a hypothesis and allowing users to opt out? Commented May 7, 2025 at 15:48
  • @M-- generally I'm in favour of earlier user feedback. Perhaps counterintuitively, that also means releasing early versions to gather user data. But yes, I think more qualitative feedback from the community at early stages would be great (as long as it's not just added process that delays releases and therefore delays the feedback we can get through experimentation). Commented May 7, 2025 at 16:37
  • @M-- when you refer to experiments that "lack core functionality" or "shipped without essentials", I am reading that as a minimum "usable" product (which may have been the confusion earlier). What core functionality or essentials would you put on a checklist for a minimum "testable" product? Commented May 7, 2025 at 16:47
  • Let us continue this discussion in chat. Commented May 7, 2025 at 16:47

5 Answers


These are good questions, but the answer will vary for each MVP (== minimum viable product). I completely agree that answers to these questions should be determined, or at least considered, for an MVP. Yet, each experiment is likely to yield different answers.

I'm going to largely focus on how we reach those answers, rather than what the answers are. To some extent, this answers #3 - community consultation, but focused on collaborating on an MVP definition. Phrased differently, "How can Stack collaborate with the community to answer these questions before starting development of an MVP, or in the early development stages?" Crucially, this aims to proactively spot areas of improvement for an MVP before problems arise.

MVPs have a scope defining what does and doesn't need to be included to release a product/feature (either to the general public or external testers, e.g., an opt-in group).

In theory, this is good - for teams, it provides milestones. For users (incl. beta testers), it communicates what to expect (and what not to).

One common complaint is "why isn't X done?". This is often a consequence of mismatched expectations, and can often be reframed as "I think X is necessary for <product> to meet my needs". Some of these X's are minor, like this - nice to have, but by no means essential. Others are more critical and, if ignored, risk dysfunctional releases.

Ultimately, these have a common root cause - lack of consensus between internal people (Stack employees) and external people (random users, Meta folks, stakeholder groups, etc.).

This lack of consensus is visible - here's a handy example in the context of Discussions:

[...] We, the community and this project [Charcoal], have previously given our feedback as to what's needed. The company has repeatedly decided to move forward without what I'd consider to be the bare minimum. They've, to a limited extent, seen that it can get bad, but haven't been willing to do what we consider to be the minimum. So be it.

Chat message by Makyen from CHQ; quote adjusted for clarity.

This is by no means the only example, and this isn't unique to Discussions. While I won't go into it on Meta, here's another example.

Addressing the lack of consensus

The key idea above is that a lack of clarity and agreement on what's necessary in any given MVP leads to conflict and issues. This is both a people issue (folks, both from Meta and from Stack, can feel disrespected or ignored) and a product issue (something not working).

An idea: public-facing MVP proposals

This builds on my response to the What's on your mind? question. A major focus of my response there was improving communication between the community and staff members (esp. ones who don't regularly use Meta). That sentiment - that genuine, kind, and constructive conversations really matter - is relevant here.

The next time a significant (in terms of scope) feature is coming, why not publish a brief MVP proposal and ask for feedback specifically on the MVP plan before releasing the feature itself?

Not whether the community likes the actual feature, but suggestions/advice on the path to implementing it. This wouldn't need to be a big MSO/MSE post; creating a chatroom (or using The Meta Room and/or Tavern on the Meta) to get some suggestions could work.

Using something like chat can be significantly more approachable - you're not going to get tons of people downvoting your proposal, but rather (hopefully) a real-ish-time conversation gathering feedback to help make it a success.

  • ...answer will vary for each MVP. Yes, honestly the outcome I hope to see coming out of this is for the product team to ask these questions every time they have an idea. Commented May 6, 2025 at 6:00
  • I'd be particularly in favor of a dedicated chatroom for each new experiment. That way, the community can provide input on the MVP as it's in development, rather than feedback coming in large waves after each announcement. Commented May 6, 2025 at 18:33

Curation

cocomac's answer is right in general, but there's one absolute must for every MVP, no matter what the experiment is trying to test: No. More. Spam.

Stack Overflow's Discussions beta had a really bad spam problem. It got so bad that A proposal to stop curating/moderating Discussions until it's "fixed" eventually reached a net score of +135. At one point, I was afraid to curate Discussions in public in case someone was looking over my shoulder. There were several factors involved in the experiment's general state, but most of them fell under the umbrella of poor curation. No API endpoint for SmokeDetector, poor moderator tooling, no reputation requirements, nothing. This led to a deluge of spam, which routinely lingered for days or weeks.

Discussions was a particularly bad case, but curation as a guiding principle needs to apply to every experiment. That doesn't mean curation as strict as what we have on main Stack Exchange sites, but there must be a way to check spam waves or low-quality content floods regardless of the specific experiment. Appropriate curation and/or moderation should be a prerequisite for any MVP. If the "minimum viable product" is highly susceptible to spam waves, it's not really viable at all.
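To make the tooling gap concrete, below is a rough, purely illustrative sketch (in Python) of the kind of polling loop that spam-fighting tools like SmokeDetector depend on: fetch recently created posts from the public Q&A API and run them through a check. The regex, site choice, and heuristic are assumptions for illustration, not how SmokeDetector actually works; the point is that Discussions exposed no equivalent endpoint at all, so even this minimal level of automation wasn't possible there.

    # Illustrative only: a minimal polling loop of the kind spam-detection
    # tooling relies on. Uses the public Q&A API as an analogue; Discussions
    # had no equivalent endpoint, which is the gap described above.
    import re
    import time
    import requests

    API_URL = "https://api.stackexchange.com/2.3/questions"
    # Naive, made-up spam hints; real tooling uses far richer rules.
    SPAM_HINTS = re.compile(r"(whatsapp|crypto\s*recovery|\+\d{10,})", re.I)

    def fetch_recent(site: str = "stackoverflow") -> list[dict]:
        """Fetch the most recently created questions for a site."""
        resp = requests.get(API_URL, params={
            "site": site,
            "order": "desc",
            "sort": "creation",
            "pagesize": 30,
            "filter": "withbody",  # include the post body in the response
        }, timeout=30)
        resp.raise_for_status()
        return resp.json().get("items", [])

    def looks_spammy(post: dict) -> bool:
        text = f"{post.get('title', '')} {post.get('body', '')}"
        return bool(SPAM_HINTS.search(text))

    if __name__ == "__main__":
        while True:
            for post in fetch_recent():
                if looks_spammy(post):
                    print("Possible spam:", post["link"])
            time.sleep(60)  # real tooling respects API quotas and backs off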

  • Is your idea of "curation" expected to be a compulsory (non-Q&A) activity for SE users? Commented May 7, 2025 at 23:51
  • @guest271314 No. Just the experiment's analogue of votes/flags/VTC/VTD. The specifics will vary based on the experiment, but there needs to be a way to filter out harmful or abusive content. Commented May 7, 2025 at 23:56
  • All data I decide to consume or produce on the Interwebs is just data to me. I do the filtering about what is true and correct based on my own intellect and evidence. I don't think a computer program can be designed or implemented that can filter content for multiple definitions of "harmful" or "abuse" in general. Those are subjective, opinionated perceptions. Commented May 7, 2025 at 23:59
  • @guest271314 This wouldn’t be a program. It would be volunteer work, just like most Stack Exchange curation. Read through the Help Center to see how curation works on SE sites. Commented May 8, 2025 at 0:49
  • "Read through the Help Center to see how curation works on SE sites." If you search the Help Center for "curation", there are zero results. The idea of "curation" on SE sites is a slang term used by some users. Commented Jun 14, 2025 at 15:14
  • @guest271314 I didn't literally mean search for the word "curation". The post flagging page is probably a good place to start. Commented Jun 14, 2025 at 20:56
  • The term "curation" is not on that page, either. The term "curation" or "curate" as you use it is third-party lingo without any definitive meaning; that is, basically just mere gossip. Should the term "curation" be flagged as propagandizing vague third-party gossip every time it's used on these boards? AFAICT no user is under any obligation to "flag" any content whatsoever. Commented Jun 14, 2025 at 21:00
  • @guest271314 Again, yes. The term “curation” is not literally on the help pages. And yes, nobody is under obligation to flag. We do it voluntarily because we don’t like seeing spam or garbage on the site. Any MVP should also make it possible to - voluntarily - limit the quantity of spam and garbage. Commented Jun 14, 2025 at 23:28
  • The idea of "curation", to me, is garbage. There is no "we". There are individual users who sign in to SE Web sites using their individual user accounts. You don't like the content, so what? It's just data. There's a lot of raw data that you might not like; that doesn't mean the data needs to be disappeared. There's a lot of legislation and case law on the public record that some might find objectionable that can't be wished away to the cornfield. Then what? Commented Jun 14, 2025 at 23:29

Slightly harsh viewpoint? Folks are chasing things that may not exist, and there's a certain degree of corporate Attention Deficit.

Part 1: A long time ago... in a Q&A network surprisingly near


The 'Core' products of SE had very clear goals, often born of community requests. It's worth looking back at the three core components of the network.

For SO proper before the trilogy

In the words of Jeff

There’s far too much great programming information trapped in forums, buried in online help, or hidden away in books that nobody buys any more. We’d like to unlock all that. Let’s create something that makes it easy to participate, and put it online in a form that is trivially easy to find.

It's surprising that the old rivalry continues, but Stack Overflow was also described as

Stackoverflow is sort of like the anti-experts-exchange (minus the nausea-inducing sleaze and quasi-legal search engine gaming) meets wikipedia meets programming reddit.

Meta was grudgingly instituted, and was sort of built initially by the community. While some aspects of that initial vision did not make it through, there was a clear idea of what Meta was.

And of course chat.

With all these - SE identified a problem their target community had, looked at the solutions they had, and polished them. In many cases, the community already had their own paths laid out - whether it was the pre-meta forums, or IRC/Basecamp for chat.

Part 2: Warlords and waffles


Many of the failed projects were attempts to write further chapters for their own sake. Now, great people worked on many of these, and with my focus generally on the rest of the network, I might get some things wrong. Feel free to correct me.

For Documentation, the foundational document that describes it seems to be this MSO post - Warlords of Documentation.

We're starting off with a somewhat abstract problem - documentation sucks. We have an examination of why, and a possible solution. That said, to an extent SO didn't take an existing solution or formalise an informal one; they were trying to build something new, and the clarity of purpose or direction wasn't there. It was also very much SO/dev focused rather than being a product that broadly stood on its own.

As an established platform, I guess the question we might want to ask, more than "what new thing do we do?", is "What do our users want and how do we delight them?".

I use SE as documentation, so the other question would be "what can we do better than anyone else?"

We still do Q&A and it feels like putting that on the back burner to chase other things was and is a mistake.

Part 3: Discussions - the chat we have at home?


Before the sudden and welcome recollection that we have a fairly robust chat system, Discussions was pushed as the replacement for Chat.

In a sense it was a way to try to mitigate the somewhat strict and arcane-seeming rules that grew organically around the main Q&A site, but it is a very strange fit for a site and network that prided itself on being low-noise and the exact opposite of forums.

Unlike the other main components of the network, and maybe a little like documentation, it didn't formalise an existing informal tool, nor was it aimed at doing something 'better'.

I also felt Docs had a clearer idea of what they wanted to build.

Part 4: The Comments redux - the return of the discussion board


To me, it feels like the 'new' comment formula is trying to turn the main site - designed to showcase questions and answers - into more of a traditional discussion board. There was a proposal some time back for renaming comments that was fairly popular and got lost in the shuffle. There's something there I feel needs extra emphasis in context:

The basic building blocks of every SE site are the QUESTIONS and the ANSWERS. They are clearly portrayed as such, so if anyone puts "something else" in that space, it's such an obvious UI gaff, folks have no problem spotting it and fixing it — we have nearly 100% compliance.

For a UI change, the 'new' comments design... elevates a comment to the same level as the question or answer. I like parts of it, but I'm not sure what it solves for an existing user (who is used to the existing UI) or a new user who is led to believe a comment is equal to the question.

Part 6: We need to know where the river meets the sea


I have often wondered, if these products succeeded, what the intended end goal would be. Sometimes it feels like the approach is to build things and hope they work, rather than to consider how they'll integrate with existing products and communities. We might be a little too old for the excitement of testing in production, especially if it's a drastic change.

In very recent memory, the chat link on the sidebar was a 'small' change that went brilliantly, and I'm looking forward to the chat updates, but these are new coats of paint on an existing, established product that's looking a little shabby. In a sense, it's also somewhat community-led. It's almost like how things were, isn't it?

Part 7: Who are we building for?


A fairly common argument for new functionality is to attract new users. To quote an old acquaintance, "It is good and virtuous". However, oftentimes it feels like the 'new users' are the sort of folks who don't want to be here - they want something different. I'm all for new blood, but folks who tell others "SO is toxic because I had bad experiences", or worse, people who believe that blindly, seem the wrong people to design for. If we're making fundamental changes in design, or new products, it should be with the community we have now and the sort of folks who contributed in the past in mind, as well as the people who'd benefit from being here. One would hope I'd not need to break out the distracted boyfriend meme again.

We need to build for the future, sure, but with significant periods of upheaval in the community, and with both the folks who work in the community and the company being in flux, we also need to build for the present and remember the past.

If you're looking for the waffles mentioned in Part 2: 🧇🧇🧇

  • IMO, your answer misses an important aspect of Stack's agenda, i.e. monetary motivations. It'd have been great if the public platform were a non-profit (assuming it'd have access to sufficient resources, or at least as much as it does today). But it is not. Hence the focus on maximizing "satisfaction" for customers, a.k.a. users. Does every decision/feature implemented in the name of that ultimate goal result in more profit and satisfied customers? No! But I have personally accepted that the company will make decisions to that end, and all I can do is advocate for less harm and occasional improvements. Commented May 6, 2025 at 17:22
  • I felt the answer was longer than I'd normally post, and honestly, at least for now, tying feature development directly to profitability might mean nothing gets done. Commented May 7, 2025 at 0:35

Frankly, I hate the whole concept of minimum viable product. I don’t go into a project thinking about the bare minimum I can do and still succeed. You start with a time period or some other constraint and think, “What is the maximum I can deliver within these constraints?” The term ‘minimum viable product’ frames prioritizing core features in a way that makes a team focus on need vs. want, which doesn’t leave a lot of room to delight your customers.

So, given that, I recommend setting a maximum development effort for an incremental step in a project and a checkpoint after each step to evaluate what was discovered and correct course.

Here’s our team's "new feature" workflow in a nutshell:

  1. Time-boxed investigation to unearth technical challenges, gather information, discover prior work that can be leveraged, etc., and then propose different options for moving forward, with different trade-offs.
  2. Present the results of the investigation to stakeholders and reach a consensus on which path forward to take, which might be more investigation, implementing something, or tabling the idea.
  3. Get the team together and break the proposal down into steps and start story pointing them. This is basic Agile-style epic refinement. Our release cycle is 90 days, so we figure out the most functionality we can deliver as a team in that time period toward the goal.
  4. Design the implementation and present it to stakeholders, clarifying requirements along the way.
  5. Implement and release. Gather feedback from users on how it works and adjust the next release based on what we discover are the highest priority needs/pain points. Often our customer will want a different feature rather than more refinement on the one we just delivered.

We never plan the entire new feature in detail. We focus on getting the core feature with no embellishment to the users and let them work with it. As soon as they start using something, their priorities and requirements change.

Our team has struggled through a similarly soured relationship between us and our primary customer. The answer is, unintuitively, loosening the reins instead of laying down more constraints and, more obviously, more frequent and transparent communication.

(I may elaborate on this more later. I'm out of time now)

  • This sounds great. I'd love to get the best that they can deliver, but given Stack's resources (and the different directions/products/initiatives being pursued), it is not realistic. I'd love to get an absolutely polished feature even for experiments, yet it won't happen. Commented May 6, 2025 at 14:43
  • @M-- I understand why you're skeptical, but you don't know what the team is actually capable of once given the space to work. You don't have any idea of their actual constraints. So many times after we present investigation results, our customer says, "Oh, that's not how I thought it worked at all." We imagine how we would do things differently, but we have no context. The best I can offer is the workflow that helped fix our team's relationship with our primary customer. Commented May 6, 2025 at 14:55

It's a complex topic. For example, I would be inclined to say that the company did not run nearly enough experiments between 2010 and 2024; others may think there were way too many. If you value the Q&A model above all else, you probably don't want any experiments at all and will set the bar very high. I want to see the knowledge-collection model adapted to a changing environment, and that's why I would set the bar lower.

If we assume that at least some level of experimentation is desired, we should be willing to accept some disturbances during experiments and some additional cleanup effort afterwards. It will not always be avoidable. Disturbances to regular operation should be kept to a minimum, though, and should be communicated in a way that makes it clear why they are needed. There should already be a plan for how to recover and clean up afterwards, and this step should then be executed.

Example: Recent Answer Bot answers are not included in the data dump (good), have unclear content licenses (bad), and seem to remain on the sites even though the experiment failed (bad). It would have been better if all impacts of this experiment had been cleaned up.

Failed experiments should clean up all their impacts as much as possible.

Example: For the 1-rep-vote experiment (which wasn't conducted but could have been), one could have recorded votes from 1-rep users for a short time only (say two weeks), analyzed their impact, and then tried to undo that impact as much as possible (vote reversal is available and is done fairly regularly). If some residual effects (badges, caps) remained, it would be okay in my eyes; for science.
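As a purely hypothetical sketch of the "analyze their impact, then undo it" step: assuming the experiment had logged each 1-rep vote as a (post, vote type) pair, the per-post impact and the reversal needed afterwards are easy to compute. The log format and field names below are invented for illustration; real data would come from internal records.

    # Hypothetical: tally the score impact of votes recorded during a short
    # 1-rep-voting window and compute the reversal needed afterwards.
    from collections import Counter

    def score_impact(vote_log: list[tuple[int, str]]) -> Counter:
        """Net score change per post caused by the experimental votes."""
        impact: Counter = Counter()
        for post_id, vote_type in vote_log:
            impact[post_id] += 1 if vote_type == "up" else -1
        return impact

    def reversal_plan(impact: Counter) -> dict[int, int]:
        """Score adjustment per post needed to undo the experiment."""
        return {post_id: -delta for post_id, delta in impact.items() if delta}

    if __name__ == "__main__":
        log = [(101, "up"), (101, "up"), (102, "down"), (103, "up")]
        impact = score_impact(log)
        print("Net impact:", dict(impact))           # {101: 2, 102: -1, 103: 1}
        print("To reverse:", reversal_plan(impact))  # {101: -2, 102: 1, 103: -1}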

Some disruption is a reasonable price to pay for a gain in knowledge.

Yes, experiments should sometimes be able to disable existing capabilities. Otherwise you would be severely limited in the scope of experiments that can be conducted, and only additional capabilities could be experimented with. We could never try simplifying anything, for example.

From past experience, I would say that the whole community, not only selected members, should be consulted during the planning phase of an experiment.

Example: Before the beta rollout of Collectives and Articles, selected members of the community were consulted. The feature wasn't very well received, wasn't very successful, and slowly died in the following years. It seems a bit as if consulting only a few members doesn't give enough meaningful input. It's better to gather as much feedback as possible.

What's striking to me, though, is the half-bakedness of it all. Collectives and Discussions received very few changes after the initial rollout. It's almost like the company didn't want the features to succeed, which looks a bit like a waste of resources.

Multiple rounds of experiments on a single subject might be necessary in order to make it a valuable feature. The company should develop more stamina in that regard and try to polish planned features more before deciding whether they are worth it. Maybe it needs a vision that goes beyond more immediate engagement. Where does the company see the platform in 6-8 years?

Example: The Staging Ground worked when introduced, but took multiple rounds (although, I just remembered, the company gave up on it in the middle of the first round). The trending sort order also took multiple rounds of back-and-forth with the community (although it seems not to be available outside of SO; why not?). The unfriendly-comments robot had at least two versions/iterations (although it's apparently not in use anymore; why not?).

All in all a mixed picture there. Following through on their plans and optimizing new features seem to be weak points.

I think that meta Q&As are already quite effective for communication but maybe there are better ways to structure discussions about experiments. Maybe dedicated "folders" for experiments, where all Q&A related to a specific experiment are kept together? Could be realized with the tag system or something else. But in general, I'm happy with the existing framework.


Now the difficult part: minimal feature sets and success metrics.

I think each good/insightful experiment (it doesn't need to be successful to be useful) should have a very, very clear problem and solution description. What is the problem, why is it so bad, and how can we improve on it? It should also include a discussion of potential downsides/risks. What are the downsides? It must include a very clear metric, or set of metrics, that can decide the success of the experiment. An A/B test is often preferable unless it is impossible to do, and a discussion of, and control for, the novelty effect should also take place.
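As an illustration of what "a very clear metric" plus an A/B test could look like in practice, here is a minimal sketch: compare a conversion-style rate (say, the share of visitors who cast a vote) between control and treatment and check whether the difference is larger than chance. All the numbers are invented, and a real analysis would also need a pre-registered decision threshold and a control for the novelty effect mentioned above.

    # Minimal A/B evaluation sketch: two-proportion z-test, no external deps.
    from math import erfc, sqrt

    def two_proportion_z_test(success_a: int, n_a: int,
                              success_b: int, n_b: int) -> tuple[float, float]:
        """Return (z statistic, two-sided p-value) for rates in groups A and B."""
        p_a, p_b = success_a / n_a, success_b / n_b
        pooled = (success_a + success_b) / (n_a + n_b)
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = erfc(abs(z) / sqrt(2))  # two-sided
        return z, p_value

    if __name__ == "__main__":
        # Invented numbers: 4.0% of control visitors vote vs 4.4% in treatment.
        z, p = two_proportion_z_test(success_a=400, n_a=10_000,
                                     success_b=440, n_b=10_000)
        print(f"z = {z:.2f}, p = {p:.3f}")  # compare p against the agreed threshold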

Unfortunately, that is often not done. I frequently wonder:

  • what the problem is supposed to be (example: "some members of the communities are looking for new ways to engage" might be too vague - what are they looking for?),
  • what the terms used mean (example: "people want to ask follow-up questions" - but what is a follow-up question, how is it different from a normal question, and why can't it be explained in the text?),
  • what the metric will be (example: "we are looking for more engagement" - is needing to click one additional time more engagement?), or
  • whether the novelty effect is controlled for (example: drawing a circle around voting triangles allegedly improves voting by a whopping 97%, but does it really?).

There is much room to improve.

I have a hard time coming up with generally applicable guidelines, but I think I can tell when an experiment might work when I see it. So including the community at all stages might be a good idea, along with being as clear as possible with problem and solution descriptions.

As for size, I guess experiments can come in all sizes (1-rep voting would be a very small change to the software), but it makes sense to go in smaller steps and check back often; of course, sometimes you have to make a larger jump if there aren't any reasonable intermediate steps available.

And we should be prepared to see experiments failing frequently. There are probably many more ways that you can do things wrong than right, but that still doesn't mean that the current state is the best possible or even close to that. Failed experiments should not result in doubts about possible system-wide change or taint ideas. One can fail, and try again, and succeed the next time.

And a final example: the dedicated thank-you feature. Supposedly 1/6 of all comments are thank-yous, and we could get rid of these somehow. The company thought a dedicated thank-you button, which otherwise does nothing, would be a good idea for that; the downside would be competition with the upvote button. After one recent experiment (which changed a couple of things at the same time, typically bad practice), the company concluded that there was a 10% reduction in thank-you comments (which isn't much) and that it will create the button.

The metric is fine, but the feature set is questionable (a button that does nothing else, yet competes with voting), and the problem is not really solved, while much better approaches exist (e.g. AI-assisted thank-you comment detection with a note to upvote instead, plus automatic scheduled comment deletion after X days; this would surely catch more than 10% of thank-you comments and would not compete with upvotes but support them, especially considering the technology is quite simple nowadays). The community advocated for this. If a far-from-optimal solution also counts as a failure, then this experiment failed very early, when possible solutions were being sought; everything afterwards couldn't correct for the initial conceptual problem. Some might say that this experiment wasn't really necessary in this form.
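To make the alternative concrete, here is a deliberately crude, purely illustrative stand-in for the "AI-assisted thank-you comment detection" idea: a keyword heuristic rather than a real classifier, showing the detect-then-nudge-then-schedule-deletion shape. The phrases, grace period, and data shapes are all invented.

    # Crude, illustrative stand-in for thank-you comment detection:
    # detect -> nudge toward upvoting -> schedule deletion after a grace period.
    import re
    from datetime import datetime, timedelta, timezone

    THANKS_PATTERN = re.compile(
        r"^\s*(many\s+)?(thanks|thank you)( so much| a lot)?[\s!.]*$",
        re.IGNORECASE,
    )

    def is_thank_you(comment_text: str) -> bool:
        """True if the comment looks like a pure 'thank you' with no other content."""
        return bool(THANKS_PATTERN.match(comment_text.strip()))

    def schedule_cleanup(comments: list[dict], grace_days: int = 14) -> list[dict]:
        """For matching comments, attach a nudge to upvote instead and a deletion date."""
        now = datetime.now(timezone.utc)
        actions = []
        for c in comments:
            if is_thank_you(c["text"]):
                actions.append({
                    "comment_id": c["id"],
                    "nudge": "If this answer helped, consider upvoting it instead.",
                    "delete_after": now + timedelta(days=grace_days),
                })
        return actions

    if __name__ == "__main__":
        sample = [
            {"id": 1, "text": "Thank you so much!"},
            {"id": 2, "text": "Thanks, but this fails on Python 3.12."},
        ]
        for action in schedule_cleanup(sample):
            print(action["comment_id"], action["nudge"])  # only comment 1 matches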

  • "The unfriendly-comments robot had at least two versions/iterations (although it's apparently not in use anymore; why not?)." FWIW, I think it's still operational. I've seen mods complain that it almost never catches anything unfriendly, but it does catch other stuff that probably needs to be removed. Commented May 7, 2025 at 11:34
  • @VLAZ Okay. Last time I asked, I heard that it's probably not working anymore. Maybe it would also need an update. Commented May 7, 2025 at 11:40
