59

Stack Exchange has recently been working on improvements to the process by which you escalate issues to our attention, and to how transparent we are about what we work on in turn. Our transparency reporting process is described in this post, but it is long overdue for critical changes. Most importantly, we are rethinking which metrics we report out.

You’ve probably already seen some of the efforts meant to address the growing backlog that have taken place this past year. These include the Community Asks Sprint initiative, Slate’s post clarifying tag usage, and our recent rekindling of a bug duty rotation for Public Q&A (ok, that last one you’ve maybe only noticed if you’re looking very closely, ‘cause there is no accompanying Meta communication). In tandem with those, we’ve been working on a new way to report how we’re faring in responding to your requests. So, without further ado...

Introducing a new metric for measuring backlog health: P80s

The new metric we’ll be using internally to measure how we’re performing and to help surface issues that need attention is the “P80 metric.” P80 expresses the idea that “80% of all work items should be handled within X weeks,” and that most items should not sit indefinitely without a resolution. That X will vary for each of the three process tags (as defined in Slate’s post clarifying tag usage). For instance, to reflect the belief that we should triage issues in a timely manner even if it then takes us some time to reach an outcome for them, [status-review] has a shorter P80 target than the other two process tags.

Given the age of some of the issues currently in our backlog, it’ll take a few quarters before we meet what we consider to be reasonable thresholds for each of these tags. This means our initial goal won’t be to meet the long-term targets; rather, that time will be spent getting the backlog under control. Our primary aim is to make sure it doesn’t keep aging and that the P80 doesn’t keep increasing, and only after that will we be in a better position to start improving our response times. In the long run, we’d like to be able to target 8 weeks (4 sprints, our first response time) for [status-review], 16 weeks (8 sprints) for [status-planned], and 40 weeks (20 sprints) for [status-deferred]. Please note that these are tentative goals, and that once we’re closer to meeting them, they might still be subject to revision depending on internal resource allocation, in the same way that the previous reporting was subject to adjustment depending on those same factors.

As an example of how this works in practice, using an 8-week target for [status-review] (a code sketch follows the list below):

  • Assume there are 100 posts with the [status-review] tag.
  • Each post has had the tag for some number of days.
  • Order all the posts by the number of days they’ve been in [status-review].
  • The 80th post on that list determines the P80 metric.
  • For example, if the 80th post has been in [status-review] for 7 weeks, we are meeting our target. But if the 80th post has been in [status-review] for 9 weeks, we are not.
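
To make the computation concrete, here’s a minimal sketch in Python of how a P80 value could be derived; the function name and the exact percentile-index convention are illustrative assumptions, not our actual internal tooling:

    from datetime import date

    def p80_weeks(entered_tag_on: list[date], today: date) -> float:
        """Age, in weeks, of the post at the 80th percentile when posts
        are ordered by how long they've carried the status tag.
        Illustrative sketch only -- not Stack Exchange's real tooling."""
        ages_in_days = sorted((today - d).days for d in entered_tag_on)
        # With 100 posts, this picks the 80th post on the ordered list.
        index = max(0, int(len(ages_in_days) * 0.8) - 1)
        return ages_in_days[index] / 7

    # Meeting an 8-week target would then look like:
    # p80_weeks(review_entry_dates, date.today()) <= 8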

This metric acknowledges that some issues will simply take longer (possibly much longer) than the target time to finish. This is a normal and natural part of doing business. However, 80% of the issues pending in any given week should be less than 8 weeks old.

Note that transitioning an issue to a state it’s previously been in doesn’t reset its time in that status. As an example, suppose a post spends 2 weeks in [status-review], then 6 weeks in [status-planned]. If the post is then moved back to [status-review], it will begin counting up from 2 weeks, not from 0 again, reflecting the total amount of time the issue has spent in [status-review]. This prevents us from accidentally deferring an issue indefinitely by moving it between statuses. We expect that folks will sometimes make small prioritization errors they later correct, and this should be accounted for seamlessly.
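
As a rough illustration of that accumulation rule, the sketch below sums a post’s total time per status across a chronological list of transitions; the data shapes are assumptions made for the example, not the real tracking system:

    from collections import defaultdict
    from datetime import date, timedelta

    def time_in_status(transitions: list[tuple[date, str]],
                       today: date) -> dict[str, timedelta]:
        """Total time spent in each status, summed across re-entries,
        so moving back into a status resumes the old count."""
        totals: dict[str, timedelta] = defaultdict(timedelta)
        bounds = transitions + [(today, "")]  # close the open interval
        for (start, status), (end, _) in zip(transitions, bounds[1:]):
            totals[status] += end - start
        return totals

    # The example above: 2 weeks in review, 6 in planned, back to review.
    history = [(date(2025, 1, 1), "status-review"),
               (date(2025, 1, 15), "status-planned"),
               (date(2025, 2, 26), "status-review")]
    # One week later, total review time is 3 weeks (2 + 1), not 1:
    # time_in_status(history, date(2025, 3, 5))["status-review"]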

Reporting P80s

External reporting will take place on Meta, in the same post where it’s taken place historically. It will still happen once a quarter, and will consist of an aggregate report. A report will look something like this (a small formatting sketch follows the table):

Status tag | Posts in tag | P80 last quarter | P80 current quarter | Goal (if applicable)
Review     | xxx          | xxx days         | xxx days            | xxx days by yyy / decreasing trend
Planned    | xxx          | xxx days         | xxx days            | xxx days by yyy / decreasing trend
Deferred   | xxx          | xxx days         | xxx days            | xxx days by yyy / decreasing trend
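
Purely as a formatting illustration, here’s how such rows could be rendered from per-tag stats; every name and number below is hypothetical, not the real reporting pipeline:

    # Hypothetical row renderer matching the table shape above.
    def report_row(tag: str, posts: int, p80_last_days: int,
                   p80_now_days: int, goal: str) -> str:
        return (f"{tag:<8} | {posts:>4} | {p80_last_days:>4} days | "
                f"{p80_now_days:>4} days | {goal}")

    print(report_row("Review", 120, 98, 74, "56 days by yyy / decreasing trend"))
    # Review   |  120 |   98 days |   74 days | 56 days by yyy / decreasing trend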

Alongside this, we’ll also include:

  • The past two years of P80 data for each tag in chart format.
  • A list of important Meta posts we’ve addressed as a part of this process.
  • Escalation guidance (which will generally match the quarterly roadmap) for what should be escalated in that quarter.

Conclusion

We hope you see this change as an improvement, and that it revitalizes trust in how we handle your escalations internally. Of course, we don’t think changing what we measure will, in and of itself, solve the accumulating backlog of reports, but we do believe that you can’t control what you can’t measure. This metric provides us with clear, actionable steps to ensure success, and those steps broadly align with what we feel this process needs in order to meet the needs of the user base.

I should be clear that these metrics are primarily internal-facing, and that we share them publicly for transparency and because we want to hold ourselves accountable to you; the internal reporting will take the same form described above. As such, while I’m open to small corrections, requests for clarification, and such, I’m much more interested in serious objections to the plan laid out in this post as it relates to the public reporting specifically.

Leaving answers instead of comments generally makes it easier to thread discussions, so please err on the side of doing that, even if the answers are short. Splitting your concerns into one topic per answer also makes following the discussion much easier.

Some of the details might change between now and a final version (which will take shape as an edit to the overall process post), but unless y’all feel I’m going completely in the wrong direction with the public reporting, this is gonna be how the process looks in the future. Lemme know your thoughts below!

9
  • 3
    I'm not a Developer (or a Project Manager) but isn't this just a recipe for your dev team to cherry-pick the top 80 easiest tasks every month, and leave the hardest ones to rot?
    – Richard
    Commented Apr 1 at 19:40
  • 5
@Richard Top 80%. While it still allows the most difficult tasks to "rot", most of the tasks should be handled comparatively quickly :) Commented Apr 1 at 20:16
  • is the sprint to calendar-time conversion rate 2 sprints = 1 year?
    – cfr
    Commented Apr 6 at 5:15
  • 1
Sprints are 2 weeks long, @cfr. The section where I list the tentative targets should make that clear, but let me know if I can clarify further.
    – JNat StaffMod
    Commented Apr 7 at 8:16
  • What about interdepartmental issues such as this?
    – A-Tech
    Commented Apr 9 at 9:46
  • 1
    What about them, @A-Tech?
    – JNat StaffMod
    Commented Apr 9 at 9:49
This reads to me as though this is a metric the dev/PM team has set out as a goal. But will, say, the legal department prioritize questions and inquiries that stem from [status-review] as well, to help meet that goal?
    – A-Tech
    Commented Apr 9 at 9:56
  • 6
Issues like the one you're referencing will generally be triaged to the Trust & Safety team, who will work with the legal department. The T&S team is one of the teams responsible for meeting these goals and targets, so I'm sure they'll do their best to respond to requests in a timely manner. Is that what you mean?
    – JNat StaffMod
    Commented Apr 9 at 10:07
It has been in review for half a year, so I'm not sure we'll agree about the timely bit, @JNat
    – Mast
    Commented Apr 20 at 13:58

5 Answers

26
+50

If the company isn't meeting its P80s and other metrics, would increasing resources towards the public platform be on the table? What levels of backlog would result in a relook at resource allocations, especially in view of bug duty rotations apparently having paused at some point?

1
  • 14
There are a few approaches to solving for not meeting response targets where I might be part of the group that makes a decision, but hiring and staffing is well beyond my pay grade. Importantly, though, if the company is not meeting the targets, the metrics will signal that that's the case. The process involves several departments and teams, so if we continuously find ourselves missing the targets, that will demand answers as to why, in order to determine the solutions.
    – JNat StaffMod
    Commented Mar 31 at 13:52
7

Will the P80 target only be looked at from an "all sites" perspective, or will there be any analysis for a per-site target? For example, some sites have (many) experiments, some of which can/do need improvements (looking at you, Discussions), and changes that go community-wide can affect some sites quite differently to others, so some meta sites have many posts in [status-review] or [status-deferred]. If, for example, you are meeting P80 overall, but you have a site in the community which is failing the P80 target miserably, with a good volume of (total) posts needing work efforts, would that be considered a failure?

Admittedly, at the time of writing only Stack Exchange Meta and Stack Overflow Meta have a "significant" volume, though Code Golf Meta does have 17 posts in a status that denotes work could occur. SEDE

5
  • 4
    There are currently no plans to break the reporting down by site. If we observe a site is heavily skewing the stats, one way or the other, we'll consider whether it makes sense to slice it by site ;)
    – JNat StaffMod
    Commented Apr 7 at 15:16
  • 1
I expect per-site metas to, in general, have fewer intermediate-status-tagged posts than main-meta or SO-meta (e.g. Puzzling has only one status-review), so per-site numbers would be subject to quite a bit of noise.
    – bobble
    Commented Apr 7 at 15:18
Yeah, Code Golf has the most posts in a status tag of any site that isn't Meta Stack Exchange/[meta.so], @bobble, with 16 [status-deferred] posts. SEDE
    – Larnu
    Commented Apr 7 at 15:32
  • 2
A little bonus context: A meaningful/useful measurement of P80 requires a decent number of posts with a given status tag. Plus, if we want to measure the response time network-wide, then incorporating the many issues only reported on MSO is required. But if we want to measure P80 on a small site with just 2-10 posts in review/deferred, well... we can't (what's the 80th %ile of 3 posts?), unless we pool them together with the broader network. That's a more suitable decision anyway, since P80 is supposed to be a "big picture" metric. Response times per site may still vary.
    – Slate StaffMod
    Commented Apr 7 at 16:21
I do mention this in the post, @Slate: "community which is failing the P80 target miserably, with a good volume of (total) posts needing work efforts". I wouldn't expect a site with 2 status posts to be in scope.
    – Larnu
    Commented Apr 7 at 16:33
5

With ongoing projects, current issues and feature requests related to these are usually tracked by answers to a main question. I'm aware question tags are automatically monitored by the ticketing system; is there a process in place for issues surfaced in answers to formal requests for feedback, and is there anything on the moderator/regular-user end we need to be aware of?

2
  • 6
    There is no formal process in place to monitor such tags in answers, no. It's my understanding that it's a practice folks have taken up across departments, but not quite a standardized one. That being the case, I believe each team/individual who uses that system will have their own way of ensuring the answers the tags get added to are accounted for somewhere. I'll inquire and see if there's a need to standardize this, especially if there are a lot of such answers that end up getting stuck with a process tag for a long time.
    – JNat StaffMod
    Commented Mar 31 at 13:41
  • Speaking generally, Meta sites (network wide) could benefit from some new/improved post notices to mark updates/resolution from CMs/other staff/mods.
    – Robotnik
    Commented Apr 4 at 4:17
4

Are you planning to (try to) backfill the P80 data for the last two years, or will the graph start with just a single data point?

1
  • 7
    The reporting will include "The past two years of P80 data for each tag in chart format." ;)
    – JNat StaffMod
    Commented Mar 31 at 14:52
0

I've got certain tickets that are not status-review, submitted outside Meta (for example via a contact form), partially 'cause I'm hoping a quieter approach would avoid drama, or because in theory it's something sensitive. One of those is likely to hit about a year in a potentially open status, and is unresolved.

Are 'non meta' tickets also under P80?

3
  • Are these tickets you've sent in via the contact us form? CM escalations? Something else?
    – JNat StaffMod
    Commented Apr 22 at 7:50
Well, things with a formal ticket after escalation via the Contact form; specifically, it's the one about the full dump (from August '24, so not quite a year, but I've not seen any real movement on this; the initial "informal" request is from July or earlier). Some community requests are complicated. There's potential for others depending. So practically, I'd consider this in terms of "generic" issues that've gotten a ticket of some form, rather than my specific, very complicated example. Other examples I'd consider would be things like mod resignations, which sometimes slip through the cracks. – Commented Apr 22 at 8:01
  • None of those cases are covered by this metric, no
    – JNat StaffMod
    Commented 2 days ago
