Shortly before the code freeze for PostgreSQL 18, Robert Haas added a feature that allows external modules to provide additional information to the EXPLAIN command.
This was a long-awaited feature for me. For an extension that influences the query planning process, providing users with notes on how the extension has affected the plan makes perfect sense. Instead of merely writing to a log file - access to which is often restricted by security policies - this information may be made available through the EXPLAIN command.
The feature introduced many entities that are not easy to figure out: an EXPLAIN option registration routine (RegisterExtensionExplainOption), an explain extension ID, per plan/node hooks, and an option handler.
The pg_overexplain
extension, introduced with this feature to demonstrate how it works, seems a little messy and impractical for me, at least in its current state. So, I decided to find out how flexible this new technique is and demonstrate the opportunities opening up to developers with a more meaningful example. I have modified the freely available pg_index_stats extension and added information about the statistics used in the query planning process.
The STAT parameter was added to the list of EXPLAIN options, accepting Boolean ON/OFF values. If it is enabled, information about the statistics used is inserted at the end of the EXPLAIN: the presence of MCV, histogram, and the number of elements in them, as well as the values of stadistinct, stanullfrac, and stawidth.
You might wonder why this is necessary. After all, doesn't the set of statistics directly stem from the list of expressions in the query? Isn't it possible to identify which statistics were utilised by examining the cost-model code for a particular type of expression?
While it is indeed possible, this approach is not always sufficient. We understand the algorithms, but we typically do not have access to the underlying data. As a result, we cannot accurately determine which specific statistics are
[...]