
We have a series of status updates for projects, and we want to report on the last update for each project in several ways. For instance:

ProjectID  DateTime          EventDescription
001        2024-12-07 11:34  New
001        2024-12-07 11:36  Submitted
002        2024-12-07 11:40  New
003        2024-12-07 12:34  New
001        2024-12-07 14:02  Approved
002        2024-12-07 14:55  Submitted
004        2024-12-07 15:02  New
004        2024-12-07 15:44  Submitted
001        2024-12-07 16:03  Completed

In our actual data, there are, of course, thousands of projects and many more status updates.

THE GOAL: We need the last (current) status of each project within a datetime range, and then a summary showing how many projects are in each status. We already use an aggregation to grab the last status per project; the part we are missing is the count per status.

For the data above, we want to get:

Status     Project Count
New        1
Submitted  2
Completed  1
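
To make the target concrete, here is a minimal Python sketch of the logic we want Elasticsearch to perform: keep only the most recent event per project, then count projects by that status. It is not a query, only an illustration of the expected result; the function name `latest_status_counts` and the tuple layout are our own assumptions for this example.

```python
from collections import Counter

# Sample rows from the table above: (ProjectID, DateTime, EventDescription)
events = [
    ("001", "2024-12-07 11:34", "New"),
    ("001", "2024-12-07 11:36", "Submitted"),
    ("002", "2024-12-07 11:40", "New"),
    ("003", "2024-12-07 12:34", "New"),
    ("001", "2024-12-07 14:02", "Approved"),
    ("002", "2024-12-07 14:55", "Submitted"),
    ("004", "2024-12-07 15:02", "New"),
    ("004", "2024-12-07 15:44", "Submitted"),
    ("001", "2024-12-07 16:03", "Completed"),
]


def latest_status_counts(rows):
    """Keep the most recent event per project, then count projects per status."""
    latest = {}  # project_id -> (timestamp, status)
    for project_id, timestamp, status in rows:
        # "YYYY-MM-DD HH:MM" strings sort chronologically as plain text.
        if project_id not in latest or timestamp > latest[project_id][0]:
            latest[project_id] = (timestamp, status)
    return Counter(status for _, status in latest.values())


counts = latest_status_counts(events)
for status, project_count in counts.items():
    print(status, project_count)
```

Note that intermediate statuses such as Approved do not appear in the result, because project 001 has since moved on to Completed.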

We are looking for a means to do this in a single query. We have several places where we need this. This is just one example, and using a transformation is not a viable option at this time.

In addition to simple counts, we next hope to aggregate these status updates into per-day bucket counts, so we can graph status across a series of days: how many projects each day are New, Submitted, and so on. But we would be thrilled just to get accurate status counts.
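
As a rough sketch of the per-day version, the Python below assumes that "status for a day" means the last update a project received on that day (projects with no update that day are not carried forward). That interpretation, the function name `daily_status_counts`, and the 2024-12-08 row are assumptions made only for this illustration.

```python
from collections import Counter, defaultdict


def daily_status_counts(rows):
    """Per day, keep each project's last update on that day, then count statuses."""
    latest = {}  # (day, project_id) -> (timestamp, status)
    for project_id, timestamp, status in rows:
        day = timestamp[:10]  # "YYYY-MM-DD" prefix of the timestamp
        key = (day, project_id)
        if key not in latest or timestamp > latest[key][0]:
            latest[key] = (timestamp, status)

    per_day = defaultdict(Counter)
    for (day, _project_id), (_timestamp, status) in latest.items():
        per_day[day][status] += 1
    return {day: dict(counter) for day, counter in sorted(per_day.items())}


rows = [
    ("001", "2024-12-07 11:34", "New"),
    ("002", "2024-12-07 11:40", "New"),
    ("002", "2024-12-07 14:55", "Submitted"),
    ("001", "2024-12-08 09:10", "Submitted"),  # hypothetical row, not in our sample data
]

daily = daily_status_counts(rows)
for day, status_counts in daily.items():
    print(day, status_counts)
```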

We believe this requires a pipeline aggregation, but have not been able to get it working.

Our working aggregation query to get the latest status for each project:

GET journaling*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": {
          "DATETIME": {
            "gte":"2024/11/01 00:00:00.000",
            "lte":"2024/11/30 23:59:59.000"
          }
        }},
        {
          "match": {
            "ACCOUNT": "12345"
          }
        }
      ]      
    }
  },
  "size": 0,
  "aggs": {
    "ProjectStatusSummary": {
      "terms": {
        "field": "PROJECTID"
      },
      "aggs": {
        "group": {
          "top_hits": {
            "size": "1",
            "_source": {
              "includes": [
                "DATETIME",
                "PROJECTID",
                "EVENTDESCRIPTION",
                "PROJECTSTART"
              ]
            },
            "sort": {
              "DATETIME": {
                "order": "desc"
              }
            }
          }
        }
      }
    }
  }
}
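
For reference, this is the shape of result we can already post-process on the client, although it is not the single-query solution we are after. The Python sketch below walks the `ProjectStatusSummary` buckets of the response and tallies the `EVENTDESCRIPTION` of each bucket's single top hit. The function name `status_counts_from_response` and the hand-written `sample_response` are assumptions for illustration, not real output from our cluster. One caveat: a terms aggregation returns only the top 10 buckets unless `size` is set, so with thousands of projects this tally would be incomplete as the query stands.

```python
from collections import Counter


def status_counts_from_response(response):
    """Tally the latest status per project from a terms + top_hits response."""
    buckets = response["aggregations"]["ProjectStatusSummary"]["buckets"]
    counts = Counter()
    for bucket in buckets:
        # "group" is the top_hits sub-aggregation; with size 1 and a
        # descending DATETIME sort, its only hit is the latest update.
        hits = bucket["group"]["hits"]["hits"]
        if hits:
            counts[hits[0]["_source"]["EVENTDESCRIPTION"]] += 1
    return counts


# Hand-written stand-in for a search response (hypothetical, trimmed to the
# fields this sketch reads).
sample_response = {
    "aggregations": {
        "ProjectStatusSummary": {
            "buckets": [
                {"key": "001", "group": {"hits": {"hits": [
                    {"_source": {"PROJECTID": "001", "EVENTDESCRIPTION": "Completed"}}
                ]}}},
                {"key": "002", "group": {"hits": {"hits": [
                    {"_source": {"PROJECTID": "002", "EVENTDESCRIPTION": "Submitted"}}
                ]}}},
                {"key": "004", "group": {"hits": {"hits": [
                    {"_source": {"PROJECTID": "004", "EVENTDESCRIPTION": "Submitted"}}
                ]}}},
            ]
        }
    }
}

summary = status_counts_from_response(sample_response)
for status, project_count in summary.items():
    print(status, project_count)
```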
  • I have learned a bit more and explored two options. First, I tried to add a sub-aggregation under the top_hits aggregation and received an error from Elasticsearch, because top_hits does not accept sub-aggregations. Second, I tried collapsing as a different means to get the latest status, which works great, but I do not yet see a way to aggregate on the output of a collapse either.
    – Scott Lynn
    Commented Dec 18, 2024 at 19:13
  • Continuing to learn. I have tried a "max" aggregation for the DATETIME field, and it does return the correct records, but I have not yet been able to get the fields for the selected record, just the document key and the actual datetime value. This max aggregation could work if I could figure out how to get the other fields for the max document.
    – Scott Lynn
    Commented Dec 19, 2024 at 13:57
  • And I learned max aggregations, just like top_hits, cannot accept sub-aggregations.
    – Scott Lynn
    Commented Dec 19, 2024 at 14:10
  • I would think wanting to work on a set of most recent records for each order, customer or whatever would be very useful, but so far not possible. If you collapse to get most recent, then I do not see a way to aggregate on the collapse results. If you use a top_hits aggregation, then you cannot sub-aggregate on the results. If you use a max aggregation, then you cannot sub-aggregate on the results or see the other fields in the max record found.
    – Scott Lynn
    Commented Dec 19, 2024 at 14:23
  • I believe using named pipelines is my best and final option to achieve what I need, but I have not figured out a named pipeline aggregation yet that fetches records. All the examples I have seen are about doing math and ending up with a numeric result.
    – Scott Lynn
    Commented Dec 19, 2024 at 15:48
