We have a series of status updates for projects, and the last update for a given project (its current status) is the one we want to report on in several ways. For instance:
ProjectID | DateTime | EventDescription |
---|---|---|
001 | 2024-12-07 11:34 | New |
001 | 2024-12-07 11:36 | Submitted |
002 | 2024-12-07 11:40 | New |
003 | 2024-12-07 12:34 | New |
001 | 2024-12-07 14:02 | Approved |
002 | 2024-12-07 14:55 | Submitted |
004 | 2024-12-07 15:02 | New |
004 | 2024-12-07 15:44 | Submitted |
001 | 2024-12-07 16:03 | Completed |
In our actual data, there are, of course, thousands of projects and many more status updates.
THE GOAL: Use an aggregation to grab the last status (the current status) of each project within a datetime range, then summarize those current statuses as a count per status.
For the data above, we want to get:
Status | Project Count |
---|---|
New | 1 |
Submitted | 2 |
Completed | 1 |
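To pin down the semantics we're after, here is a minimal Python sketch of the "latest update per project, then count by status" logic applied to the sample rows above (this is just illustration, not how we want to solve it):

```python
from collections import Counter

# Sample status updates: (project_id, datetime, event_description)
updates = [
    ("001", "2024-12-07 11:34", "New"),
    ("001", "2024-12-07 11:36", "Submitted"),
    ("002", "2024-12-07 11:40", "New"),
    ("003", "2024-12-07 12:34", "New"),
    ("001", "2024-12-07 14:02", "Approved"),
    ("002", "2024-12-07 14:55", "Submitted"),
    ("004", "2024-12-07 15:02", "New"),
    ("004", "2024-12-07 15:44", "Submitted"),
    ("001", "2024-12-07 16:03", "Completed"),
]

# Keep only the latest update per project
# (these timestamp strings sort correctly lexicographically)
latest = {}
for project_id, ts, status in updates:
    if project_id not in latest or ts > latest[project_id][0]:
        latest[project_id] = (ts, status)

# Count projects by their current status
counts = Counter(status for _, status in latest.values())
```

Running this on the sample data yields New: 1, Submitted: 2, Completed: 1, matching the table above.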
We are looking for a way to do this in a single query; we need it in several places, and this is just one example. Using a transform is not a viable option for us at this time.
In addition to simple counts, we next hope to figure out how to aggregate these status updates into per-day bucket counts to graph status across a series of days: how many projects are New, Submitted, etc. on each day. But we would be thrilled just to get the status counts accurate.
We believe this requires a pipeline aggregation, but have not been able to get it working.
Our working aggregation query to get the latest status for each project:
GET journaling*/_search
{
"query": {
"bool": {
"filter": [
{ "range": {
"DATETIME": {
"gte":"2024/11/01 00:00:00.000",
"lte":"2024/11/30 23:59:59.000"
}
}},
{
"match": {
"ACCOUNT": "12345"
}
}
]
}
},
"size": 0,
"aggs": {
"ProjectStatusSummary": {
"terms": {
"field": "PROJECTID"
},
"aggs": {
"group": {
"top_hits": {
            "size": 1,
"_source": {
"includes": [
"DATETIME",
"PROJECTID",
"EVENTDESCRIPTION",
"PROJECTSTART"
]
},
"sort": {
"DATETIME": {
"order": "desc"
}
}
}
}
}
}
}
}
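For reference, this is roughly the client-side fold over that query's response that we are trying to replace with a pure aggregation. The bucket structure follows the standard `top_hits` response shape; the response below is a hand-trimmed example, not real output:

```python
from collections import Counter

# Trimmed example response for the query above (one top_hits hit per project bucket)
response = {
    "aggregations": {
        "ProjectStatusSummary": {
            "buckets": [
                {"key": "001", "group": {"hits": {"hits": [
                    {"_source": {"EVENTDESCRIPTION": "Completed"}}]}}},
                {"key": "002", "group": {"hits": {"hits": [
                    {"_source": {"EVENTDESCRIPTION": "Submitted"}}]}}},
                {"key": "003", "group": {"hits": {"hits": [
                    {"_source": {"EVENTDESCRIPTION": "New"}}]}}},
                {"key": "004", "group": {"hits": {"hits": [
                    {"_source": {"EVENTDESCRIPTION": "Submitted"}}]}}},
            ]
        }
    }
}

# Each bucket's single top hit is the project's latest event; count by status
buckets = response["aggregations"]["ProjectStatusSummary"]["buckets"]
counts = Counter(
    b["group"]["hits"]["hits"][0]["_source"]["EVENTDESCRIPTION"] for b in buckets
)
```

Note also that with thousands of projects, the default `terms` bucket size of 10 means this client-side approach needs a much larger `size` on the terms aggregation (or a `composite` aggregation with pagination) to see every project.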