Group By and Aggregate Performance

N1QL Pushdowns optimize the performance of N1QL queries by supporting GROUP BY and Aggregate expressions.

(Introduced in Couchbase Server 5.5 - Enterprise Edition)

Overview

This features improves performance of N1QL queries with aggregations and GROUP BY execution.

After the optimizer selects an index for a query block, it attempts the two optimizations below:

  • Pagination optimization, by pushing the OFFSET and LIMIT parameters to the index scan.

  • Grouping and aggregation pushdown to the indexer (introduced in Couchbase 5.5).

Prior to Couchbase 5.5, even when a query with aggregate and/or GROUP BY is covered by an index, the query fetched all relevant data from the indexer and group the data within the query engine. With this enhancement, the query intelligently requests the indexer to perform grouping and aggregation in addition to range scan. The Indexer has been enhanced to perform grouping, COUNT(), SUM(), MIN(), MAX(), AVG(), and related operations.

This requires no changes to the user query, but a good index design to cover the query and order the keys is required. Not every query will benefit from this optimization, and not every index can accelerate every grouping and aggregation. Understanding the right patterns will help you to design your indexes and queries. Aggregate Pushdown to the global secondary index is supported on both storage engines: Standard GSI and Memory Optimized GSI (MOI).

This reduction step of performing the GROUP BY and Aggregation on the indexer reduces the amount of data transfer and disk I/O, resulting in:

  • Improved query response time

  • Improved resource utilization

  • Low latency

  • High scalability

  • Low TCO

For example, let’s compare the previous vs. current performance of using GROUP BY and examine the EXPLAIN plan of the following query that is defined in the Couchbase travel-sample index:

CREATE INDEX `def_type` ON `travel-sample`(`type`)

Consider the query:

SELECT type, COUNT(type)
FROM `travel-sample`
WHERE type IS NOT MISSING
GROUP BY type;

Before Couchbase version 5.5, this query engine fetched relevant data from the indexer and group the data within the query engine and then aggregate. This simple query takes about 250 ms.

GBAP Ex0 QP before55

Now, in Couchbase version 5.5, this query use the same def_type index, but executes in under 70 ms. In the explain below, you can see fewer steps and the lack of the grouping step after the index scan because the index scan step does the grouping and aggregation as well.

GBAP Ex0 QP after55

As the data and query complexity grows, the performance benefit (both latency and throughput) will grow as well.

For example, consider the query:

SELECT type, COUNT(type)
FROM `travel-sample`
WHERE type IS NOT MISSING
GROUP BY type;

The text explain plan shows the accelerated aggregation details. For details, see the Query Plan Fields section.

...
        "index_group_aggs": {
          "aggregates": [
            {
              "aggregate": "COUNT",
              "expr": "1",
              "id": 2,
              "keypos": -1
            }
          ],
          "depends": [
            0
          ],
          "group": [
            {
              "depends": [
                0
              ],
              "expr": "cover ((`travel-sample`.`type`))",
              "id": 0,
              "keypos": 0
            }
          ]
        },
        "index_id": "c5bbb792c69c1704",
        "index_projection": {
          "entry_keys": [
            0,
            2
          ]
        },
...

Here’s is how this query executes when the indexer executes the GROUP BY and aggregation. The query engine does not fetch any data from the data service (KV service), as shown below:

GBAP 55execution

For your reference, this is how the same query executed before 5.5.

GBAP pre55execution

Examples for Indexer GROUP BY and Aggregation

Example A: Let’s Consider a composite index to explore some scenarios:

CREATE INDEX idx_a ON `travel-sample` (geo.alt, geo.lat, geo.lon, id) WHERE type = "airport"

Let’s consider sample queries that can benefit from this optimization and the queries that cannot.

Positive Case examples of queries that use indexing grouping and aggregation:

  • SELECT COUNT(*) FROM `travel-sample` WHERE geo.alt > 10 AND type="airport";

  • SELECT COUNT(geo.alt) FROM `travel-sample` WHERE geo.alt BETWEEN 10 AND 30 AND type = "airport";

  • SELECT COUNT(geo.lat) FROM `travel-sample` WHERE geo.alt BETWEEN 10 AND 30 AND geo.lat = 40 AND type = "airport";

  • SELECT geo.alt, AVG(id), SUM(id), COUNT(geo.alt), MIN (geo.lon), MAX(ABS(geo.lon)) FROM `travel-sample` WHERE geo.alt > 100 AND type = "airport" GROUP BY geo.alt;

  • SELECT lat_count, SUM(id) FROM `travel-sample` WHERE geo.alt > 100 AND type = "airport" GROUP BY geo.alt LETTING lat_count = COUNT(geo.lat) HAVING lat_count > 1;

  • SELECT AVG(DISTINCT geo.lat) FROM `travel-sample` WHERE geo.alt > 100 AND type = "airport" GROUP BY geo.alt;

Negative Case examples:

  • SELECT COUNT(*) FROM `travel-sample` WHERE geo.lat > 20 AND type = "airport";

    • This query has no predicate on the leading key geo.alt. The index idx_a cannot be used.

  • SELECT COUNT(*) FROM `travel-sample`;

    • This query has no predicate at all.

  • SELECT COUNT(v1) FROM `travel-sample` LET v1 = ROUND(geo.lat) WHERE geo.lat > 10 AND type = "airport";

    • The aggregate depends on LET variable.

  • SELECT ARRAY_AGG(geo.alt) FROM `travel-sample` WHERE geo.alt > 10 AND type = "airport";

    • ARRAY_AGG is not supported.

Positive query examples with GROUP BY on leading index keys

Example B: Consider the following index:

CREATE INDEX idx_b ON `travel-sample`(geo.alt, geo.lat, geo.lon, id)

In the following query, the GROUP BY keys (geo.alt, geo.lat) are the leading keys of the index, so the index is naturally ordered and grouped by the order of the index key definition. Therefore, the query below is suitable for indexer to handle grouping and aggregation.

SELECT geo.alt, geo.lat, SUM(geo.lon), AVG(id), COUNT(DISTINCT geo.lon)
FROM `travel-sample`
WHERE geo.alt BETWEEN 10 AND 30
AND type = "airport"
GROUP BY geo.alt, geo.lat
HAVING SUM(geo.lon) > 1000;

Here’s the executed query plan showing that index scan handled grouping and aggregation:

GBAP ExB Plan

Positive query examples with GROUP BY on non-leading index keys

Example C: Consider the following index and query:

CREATE INDEX idx_c ON `travel-sample`(geo.alt, geo.lat, geo.lon, id)
 WHERE type = "airport"

SELECT geo.lat, id, SUM(geo.lon)
FROM `travel-sample`
WHERE geo.alt BETWEEN 10 AND 30
AND type = "airport"
GROUP BY geo.lat, id
HAVING SUM(geo.lon) > 1000;

The following is a bottom-up rendering of the execution plan for easier viewing. In this case, the indexer sends partial group aggregation, which the query merges to create the final group and aggregation. In this scenario (when the grouping is on non-leading keys), any query with aggregation and DISTINCT modifier cannot be accelerated by the indexer, such as COUNT(DISTINCT id).

GBAP ExC Plan

Positive query examples on array indexes with GROUP BY on leading index keys

Example D: Consider the following index and query:

CREATE INDEX idx_d ON `travel-sample` (geo.lat, geo.lon, DISTINCT public_likes, id) WHERE type = "hotel"

SELECT geo.lat, geo.lon, SUM(id), AVG(id)
FROM `travel-sample`
WHERE geo.lat BETWEEN 10 AND 30
 AND geo.lon > 50
 AND type = "hotel"
 AND ANY v IN public_likes SATISFIES  v = “%a%” END
GROUP BY geo.lat, geo.lon
HAVING SUM(id) > 100;

In this case, the predicates are on the leading keys up to and including the array key. Therefore, indexer can efficiently do the grouping as seen by the optimal plan below. It’s important to note the array index key is created with a DISTINCT modifier (not the ALL modifier) to get this optimization and that the SATISFIES clause in the ANY predicate must be that of equality (that is, v = “%a%”).

GBAP ExD Plan

Example D2: On the other hand, if there’s a predicate missing on geo.lon--which is prior to the array key—​while using the same idx_d index as above, then the grouping is done by the old method:

SELECT geo.lat, geo.lon, SUM(id), AVG(id)
FROM `travel-sample`
WHERE geo.lat BETWEEN 10 AND 30
 AND type = "hotel"
 AND ANY v IN public_likes SATISFIES  v = “%a%” END
GROUP BY geo.lat, geo.lon
HAVING SUM(id) > 100;
GBAP ExD2 Plan

Example E: Consider the index and query:

CREATE INDEX idx_e ON `travel-sample` (ALL public_likes, geo.lat, geo.lon, id) WHERE type = "hotel"

SELECT un, t.geo.lat, COUNT(un), AVG(t.geo.lat)
FROM `travel-sample` AS t
 UNNEST t.public_likes AS un
WHERE un > "J"
AND t.type = "hotel"
GROUP BY un, t.geo.lat;

In this case, the UNNEST operation can use the index because the leading ALL array key is the array being unwound. Note, the unwound operation repeats the parent document (travel-sample) and the t.geo.lat reference would have duplicates compared to the original travel-sample documents.

GBAP ExE Plan

Query Qualification and Pushdown

Not every GROUP BY and aggregate query can be handled by the indexer. Following are some simple rules that will help you to write the proper queries and design the required indexes to get the most of this feature.

The following are necessary in order for an indexer to execute GROUP BY and aggregates:

  • All the query predicates are able to convert into ranges and able to push to indexer.

  • The whole query must be covered by an index.

    • For a query to be covered by an index, every attribute referenced in the query should be in one index.

    • Query should not have operations such as joins, subquery, or derived table queries.

  • GROUP BY keys and Aggregate expressions must be one of the following:

    • Index keys or document key

    • An expression based on index keys or document key

  • GROUP BY and aggregate expressions must be simple.

Scenarios for Group By and Aggregation

Like any feature in a query language, there are subtle variations between each query and index that affects this optimization. We use the travel-sample dataset to illustrate both positive and negative use cases.

The following table lists the scenarios and requirements for queries to request the indexer to do the grouping and acceleration. When the requirements are unmet, the query will fetch the relevant data and then do the grouping and acceleration as usual. No application changes are necessary. The query plan generated reflects this decision.

1. GROUP BY on leading keys

One of the common cases is to have both predicates and GROUP BY on leading keys of the index. First create the index so that the query is covered by the index. You can then think about the order of the keys.

The query requires a predicate on leading keys to consider an index. The simplest predicate is IS NOT MISSING.

CREATE INDEX idx_expr ON Keyspace_ref (a, b, c);

SELECT a, b, Aggregate_Function(c)  /* MIN(c), MAX(c), COUNT(c), or SUM(c) */
FROM Keyspace_ref
WHERE a IS NOT MISSING              /* 1st index field must be in a WHERE clause */
GROUP BY a, b;

Example 1: List the cities with the landmarks with the highest latitude.

Use the MAX() aggregate to find the highest landmark latitude in each state, group the results by country and state, and then sort in reverse order by the highest latitudes per state.

CREATE INDEX idx1 ON `travel-sample`(country, state, geo.lat)
WHERE type="landmark";

SELECT country, state, MAX(ROUND(geo.lat)) AS Max_Latitude
FROM `travel-sample`
WHERE country IS NOT MISSING
AND type = "landmark"
GROUP BY country, state
ORDER BY Max_Latitude DESC;

In this query, we need to give the predicate country IS NOT MISSING (or any WHERE clause) to ensure this index is selected for the query. Without a matching predicate, the query will use the primary index.

Results:

[
  {
    "Max_Latitude": 60,
    "country": "United Kingdom",
    "state": null
  },
  {
    "Max_Latitude": 51,
    "country": "United Kingdom",
    "state": "England"
  },
  {
    "Max_Latitude": 50,
    "country": "France",
    "state": "Picardie"
  },
...

The Example 1 EXPLAIN Plan shows that GROUP BY is executed by the indexer and is detailed in the GROUP BY Query Plan table:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "covers": [
              "cover ((`travel-sample`.`country`))",
              "cover ((`travel-sample`.`state`))",
              "cover (((`travel-sample`.`geo`).`lat`))",
              "cover ((meta(`travel-sample`).`id`))",
              "cover (count(cover ((`travel-sample`.`country`))))",
              "cover (min(round(cover (((`travel-sample`.`geo`).`lat`)))))"
            ],
            "filter_covers": {
              "cover ((`travel-sample`.`type`))": "landmark"
            },
            "index": "idx1",
            "index_group_aggs": {
              "aggregates": [
                {
                  "aggregate": "MAX",
                  "depends": [
                    2
                  ],
                  "expr": "round(cover (((`travel-sample`.`geo`).`lat`)))",
                  "id": 4,
                  "keypos": -1
                }
              ],
              "depends": [
                0,
                1,
                2
              ],
              "group": [
                {
                  "depends": [
                    0
                  ],
                  "expr": "cover ((`travel-sample`.`country`))",
                  "id": 0,
                  "keypos": 0
                },
                {
                  "depends": [
                    1
                  ],
                  "expr": "cover ((`travel-sample`.`state`))",
                  "id": 1,
                  "keypos": 1
                }
              ]
            },
...

2. GROUP BY on non-leading keys

When using GROUP BY on a non-leading key:

  • The indexer will return pre-aggregated results.

  • Results can have duplicate or out-of-order groups. The N1QL indexer will do 2nd level of aggregation and compute the final result.

  • The N1QL indexer can pushdown only if the leading key has a predicate.

To use Aggregate Pushdown, use the following syntax of the index and query statements:

CREATE INDEX idx_expr ON Keyspace_ref (a, b, c);

Syntax A:

SELECT Aggregate_Function(a), b, Aggregate_Function(c)
FROM Keyspace_ref
WHERE a IS NOT MISSING
GROUP BY b;

Syntax B:

SELECT Aggregate_Function(a), Aggregate_Function(b), c
FROM Keyspace_ref
WHERE a IS NOT MISSING
GROUP BY c;

Example 2 (A): List the states with their total number of landmarks and the lowest latitude of any landmark.

Use the COUNT() operator to find the total number of landmarks and use the MIN() operator to find the lowest landmark latitude in each state, group the results by state, and then sort in order by the lowest latitudes per state.

CREATE INDEX idx2 ON `travel-sample`(country, state, ROUND(geo.lat))
WHERE type="landmark";

SELECT COUNT(country) AS Total_landmarks, state, MIN(ROUND(geo.lat)) AS Min_Latitude
FROM `travel-sample`
WHERE country IN ["France", "United States", "United Kingdom"]
AND type = "landmark"
GROUP BY state
ORDER BY Min_Latitude;

Explain Plan:

GBAP Ex2A EP
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "covers": [
              "cover ((`travel-sample`.`country`))",
              "cover ((`travel-sample`.`state`))",
              "cover (((`travel-sample`.`geo`).`lat`))",
              "cover ((meta(`travel-sample`).`id`))",
              "cover (count(cover ((`travel-sample`.`country`))))",
              "cover (min(round(cover (((`travel-sample`.`geo`).`lat`)))))"
            ],
            "filter_covers": {
              "cover ((`travel-sample`.`type`))": "landmark"
            },
            "index": "idx2",
            "index_group_aggs": {
              "aggregates": [
                {
                  "aggregate": "COUNT",
                  "depends": [
                    0
                  ],
                  "expr": "cover ((`travel-sample`.`country`))",
                  "id": 4,
                  "keypos": 0
                },
                {
                  "aggregate": "MIN",
                  "depends": [
                    2
                  ],
                  "expr": "round(cover (((`travel-sample`.`geo`).`lat`)))",
                  "id": 5,
                  "keypos": -1
                }
              ],
              "depends": [
                0,
                1,
                2
              ],
              "group": [
                {
                  "depends": [
                    1
                  ],
                  "expr": "cover ((`travel-sample`.`state`))",
                  "id": 1,
                  "keypos": 1
                }
              ],
              "partial": true
            },
...
The "partial": true line means it was pre-aggregated.

Results:

[
  {
    "Min_Latitude": 33,
    "Total_landmarks": 1900,
    "state": "California"
  },
  {
    "Min_Latitude": 41,
    "Total_landmarks": 8,
    "state": "Corse"
  },
  {
    "Min_Latitude": 43,
    "Total_landmarks": 6,
    "state": "Languedoc-Roussillon"
  },
...

Example 2 (B): List the number of landmarks by latitude and the state it’s in.

Use COUNT(country) for the total number of landmarks at each latitude. At a particular latitude, the state will be the same; but an aggregate function on it is needed, so MIN() or MAX() is used to return the original value.

SELECT COUNT(country) Num_Landmarks, MIN(state) State_Name, ROUND(geo.lat) Latitude
FROM `travel-sample`
WHERE country IS NOT MISSING
AND type = "landmark"
GROUP BY ROUND(geo.lat)
ORDER BY ROUND(geo.lat);

Results:

[
  {
    "Latitude": 33,
    "Num_Landmarks": 227,
    "State_Name": "California"
  },
  {
    "Latitude": 34,
    "Num_Landmarks": 608,
    "State_Name": "California"
  },
  {
    "Latitude": 35,
    "Num_Landmarks": 27,
    "State_Name": "California"
  },
...

3. GROUP BY keys in different CREATE INDEX order

When using GROUP BY on keys in a different order than they appear in the CREATE INDEX statement, use the following syntax:

CREATE INDEX idx_expr ON Keyspace_ref(a, b, c);

SELECT Aggregate_Function(c)
FROM Keyspace_ref
WHERE a IS NOT MISSING
GROUP BY b, a;

Example 3: Like Example 1 with the GROUP BY fields swapped, list the landmarks with the lowest longitude.

Use the MIN() operator to find the lowest landmark longitude in each city, group the results by activity and city, and then sort in reverse order by the lowest longitudes per activity.

CREATE INDEX idx3 ON `travel-sample`(activity, city, geo.lon)
WHERE type="landmark";

SELECT activity, city, MIN(ROUND(geo.lon)) AS Max_Longitude
FROM `travel-sample`
WHERE country IS NOT MISSING
AND type = "landmark"
GROUP BY activity, city
ORDER BY Min_Longitude;

Results:

[
  {
    "Min_Longitude": -124,
    "activity": "buy",
    "city": "Eureka"
  },
  {
    "Min_Longitude": -123,
    "activity": "drink",
    "city": "Glen Ellen"
  },
  {
    "Min_Longitude": -123,
    "activity": "do",
    "city": "Santa Rosa"
  },
  {
    "Min_Longitude": -123,
    "activity": "eat",
    "city": "Moss Beach"
  },
...

4. GROUP BY on expression

When grouping on an expression or operation, the indexer will return pre-aggregated results whenever the GROUP BY and leading index keys are not an exact match.

To use Aggregate Pushdown and avoid pre-aggregated results, use one of the two following syntaxes of the index and query statements:

Syntax A: Field with an expression (GROUP BY and Index keys match)

CREATE INDEX idx_expr ON Keyspace_ref(a+b, b, c);

SELECT Aggregate_Function(c)
FROM Keyspace_ref
WHERE a IS NOT MISSING
GROUP BY a+b;

Syntax B: Operation on a field (GROUP BY and Index keys match)

CREATE INDEX idx_operation ON Keyspace_ref (LOWER(a), b, c);

SELECT Aggregate_Function(c)
FROM Keyspace_ref
WHERE a IS NOT MISSING
GROUP BY LOWER(a);

For comparison, the below index and query combination will yield pre-aggregated results.

Pre-aggregated Syntax: The GROUP BY and Index keys don’t match.

CREATE INDEX idx_operation ON Keyspace_ref (a, b, c);

SELECT Aggregate_Function(c)
FROM Keyspace_ref
WHERE a IS NOT MISSING
GROUP BY UPPER(a);

Example 4 (A): A field with an expression.

Let’s say the distance of a flight feels like "nothing" when it’s direct, but feels like the true distance when there is one layover. Then we can list and group by flight distances by calculating the distance multiplied by the stops it makes.

CREATE INDEX idx4_expr
ON `travel-sample`(ROUND(distance*stops), ROUND(distance), sourceairport)
WHERE type="route";

SELECT ROUND(distance*stops) AS Distance_Feels_Like,
       MAX(ROUND(distance)) AS Distance_True,
       COUNT(sourceairport) Number_of_Airports
FROM `travel-sample`
WHERE ROUND(distance*stops) IS NOT MISSING
AND type = "route"
GROUP BY ROUND(distance*stops);

Query Plan:

GBAP Ex4A VP
...
        "index_group_aggs": {
          "aggregates": [
            {
              "aggregate": "COUNT",
              "depends": [
                2
              ],
              "expr": "cover ((`travel-sample`.`sourceairport`))",
              "id": 4,
              "keypos": 2
            },
            {
              "aggregate": "MAX",
              "depends": [
                1
              ],
              "expr": "cover (round((`travel-sample`.`distance`)))",
              "id": 5,
              "keypos": 1
            }
          ],
          "depends": [
            0,
            1,
            2
          ],
          "group": [
            {
              "depends": [
                0
              ],
              "expr": "cover (round(((`travel-sample`.`distance`) * (`travel-sample`.`stops`))))",
              "id": 0,
              "keypos": 0
            }
          ]
...

Results:

[
  {
    "Distance_Feels_Like": 1055,
    "Distance_True": 1055,
    "Number_of_Airports": 1
  },
  {
    "Distance_Feels_Like": 1806,
    "Distance_True": 1806,
    "Number_of_Airports": 2
  },
  {
    "Distance_Feels_Like": 0,
    "Distance_True": 13808,
    "Number_of_Airports": 24018
  },
...

Example 4 (B): An operation on a field.

Let’s say the distance of a flight feels like "nothing" when it’s direct, but feels like the true distance when there is one layover. Then we can list and group by the uppercase of the airport codes and listing the flight distances by calculating the distance multiplied by the stops it makes along with the total distance.

CREATE INDEX idx4_oper
ON `travel-sample`(sourceairport, ROUND(distance*stops), distance)
WHERE type="route";

SELECT UPPER(sourceairport) AS Airport_Code,
       MIN(ROUND(distance*stops)) AS Distance_Feels_Like,
       SUM(ROUND(distance)) AS Total_Distance
FROM `travel-sample`
WHERE sourceairport IS NOT MISSING
AND type = "route"
GROUP BY UPPER(sourceairport);

Results:

[
  {
    "Airport_Code": "ESU",
    "Distance_Feels_Like": 0,
    "Total_Distance": 6223
  },
  {
    "Airport_Code": "QSF",
    "Distance_Feels_Like": 0,
    "Total_Distance": 3285
  },
  {
    "Airport_Code": "LHW",
    "Distance_Feels_Like": 0,
    "Total_Distance": 13837
  },
...

5. Heterogeneous data types for GROUP BY key

When a field has a mix of data types for the GROUP BY key:

  • NULLS and MISSING are two separate groups.

Example 5:

To see a separate grouping of MISSING and NULL, we need to GROUP BY a field we know exists in one document but not in another document while both documents have another field in common. For example, create 3 such documents:

INSERT INTO `travel-sample` VALUES("01",{"type":1, "email":"abc","xx":3});

INSERT INTO `travel-sample` VALUES("01",{"type":1, "email":"abc","xx":null});

INSERT INTO `travel-sample` VALUES("02",{"type":1, "email":"abcd"});

Then run the following query:

SELECT type, xx, MIN(email) AS Min_Email
FROM `travel-sample`
WHERE type IS NOT NULL
GROUP BY type, xx;

Results:

[
  {
    "Min_Email": "abc",
    "type": 1,
    "xx": 3
  },
  {
    "Min_Email": "abc",
    "type": 1,
    "xx": null
  },
  {
    "Min_Email": "abcd",
    "type": 1              <-- is a separate result since field "xx" is MISSING
  },
  {
    "Min_Email": null,
    "type": "airline"
  },
...

6. GROUP BY META().ID Primary Index

If there is no filter, then pushdown is supported for an expression on the Document ID META().id in the GROUP BY clause.

To use Aggregate Pushdown, use the following example of the index and query statement:

CREATE PRIMARY INDEX idx_expr ON named_keyspace_ref;

SELECT COUNT(1)
FROM named_keyspace_ref
GROUP BY SUBSTR(META().id, 0, 10);
If there is a filter on the Document ID, then the primary index can be used as a secondary scan.

Example 6: List the number of countries that are in each decile of the META().id field.

CREATE PRIMARY INDEX idx6 ON `travel-sample`;

SELECT COUNT(1) AS Cnt, SUBSTR(META().id,0,9) AS Meta_Group
FROM `travel-sample`
GROUP BY SUBSTR(META().id,0,9);

Results:

[
  {
    "Meta_Group": "airport_9",
    "Number_of_Country": 121
  },
  {
    "Meta_Group": "airport_1",
    "Number_of_Country": 187
  },
  {
    "Meta_Group": "airport_3",
    "Number_of_Country": 482
  },
...

7. LIMIT with GROUP BY on leading keys

To use Aggregate Pushdown when there is a LIMIT clause and a GROUP BY clause on one or more leading keys, use the following example of the index and query statement:

CREATE INDEX idx_expr ON named_keyspace_ref (k0, k1);

SELECT k0, COUNT(k1)
FROM named_keyspace_ref
WHERE k0 IS NOT MISSING
GROUP BY k0
LIMIT n;

Example 7: LIMIT with GROUP BY on the leading key.

CREATE INDEX idx7 ON `travel-sample` (city, name)
WHERE type = "landmark";

SELECT city AS City, COUNT(DISTINCT name) AS Landmark_Count
FROM `travel-sample`
WHERE city IS NOT MISSING
AND type = "landmark"
GROUP BY city
LIMIT 4;

Explain Plan:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "covers": [
              "cover ((`travel-sample`.`city`))",
              "cover ((`travel-sample`.`name`))",
              "cover ((meta(`travel-sample`).`id`))",
              "cover (count(distinct cover ((`travel-sample`.`name`))))"
            ],
            "filter_covers": {
              "cover ((`travel-sample`.`type`))": "landmark"
            },
            "index": "idx7",
            "index_group_aggs": {
              "aggregates": [
                {
                  "aggregate": "COUNT",
                  "depends": [
                    1
                  ],
                  "distinct": true,
                  "expr": "cover ((`travel-sample`.`name`))",
                  "id": 3,
                  "keypos": 1
                }
              ],
              "depends": [
                0,
                1
              ],
              "group": [
                {
                  "depends": [
                    0
                  ],
                  "expr": "cover ((`travel-sample`.`city`))",
                  "id": 0,
                  "keypos": 0
                }
              ]
            },
            "index_id": "7852b5e2c07281f3",
            "index_projection": {
              "entry_keys": [
                0,
                3
              ]
            },
            "keyspace": "travel-sample",
            "limit": "4",
            "namespace": "default",
...
The limit is pushed to the indexer because the GROUP BY key matched with the leading index key.

Results:

[
  {
    "City": null,
    "Landmark_Count": 15
  },
  {
    "City": "Abbeville",
    "Landmark_Count": 1
  },
  {
    "City": "Abbots Langley",
    "Landmark_Count": 19
  },
  {
    "City": "Aberdeenshire",
    "Landmark_Count": 6
  }
]

8. OFFSET with GROUP BY on leading keys

To use Aggregate Pushdown when there is an OFFSET clause and a GROUP BY clause on one or more leading keys, use the following example of the index and query statement.

CREATE INDEX idx_expr ON named_keyspace_ref (k0, k1);

SELECT k0, COUNT(k1)
FROM named_keyspace_ref
WHERE k0 IS NOT MISSING
GROUP BY k0
OFFSET n;

Example 8: OFFSET with GROUP BY on a leading key.

CREATE INDEX idx8 ON `travel-sample` (city, name)
WHERE type = "landmark";

SELECT city AS City, COUNT(DISTINCT name) AS Landmark_Count
FROM `travel-sample`
WHERE city IS NOT MISSING
AND type = "landmark"
GROUP BY city
OFFSET 4;

Explain Plan:

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "covers": [
              "cover ((`travel-sample`.`city`))",
              "cover ((`travel-sample`.`name`))",
              "cover ((meta(`travel-sample`).`id`))",
              "cover (count(distinct cover ((`travel-sample`.`name`))))"
            ],
            "filter_covers": {
              "cover ((`travel-sample`.`type`))": "landmark"
            },
            "index": "idx7",
            "index_group_aggs": {
              "aggregates": [
                {
                  "aggregate": "COUNT",
                  "depends": [
                    1
                  ],
                  "distinct": true,
                  "expr": "cover ((`travel-sample`.`name`))",
                  "id": 3,
                  "keypos": 1
                }
              ],
              "depends": [
                0,
                1
              ],
              "group": [
                {
                  "depends": [
                    0
                  ],
                  "expr": "cover ((`travel-sample`.`city`))",
                  "id": 0,
                  "keypos": 0
                }
              ]
            },
            "index_id": "7852b5e2c07281f3",
            "index_projection": {
              "entry_keys": [
                0,
                3
              ]
            },
            "keyspace": "travel-sample",
            "namespace": "default",
            "offset": "4",
            "spans": [
...
The offset is pushed to the indexer because the GROUP BY key matched with the leading index key.

Results:

[
  {
    "City": "Aberdour",
    "Landmark_Count": 4
  },
  {
    "City": "Aberdulais",
    "Landmark_Count": 1
  },
  {
    "City": "Abereiddy",
    "Landmark_Count": 1
  },
  {
    "City": "Aberfeldy",
    "Landmark_Count": 2
  },
...

9. Aggregate without GROUP BY key

This is a case of aggregation over a range without groups. If the index can be used for computing the aggregate, the indexer will return a single aggregate value. To use Aggregate Pushdown, use the following syntax of index and queries:

CREATE INDEX idx_expr ON named_keyspace_ref (a, b, c);

Q1: SELECT Aggregate_Function(c)
    FROM named_keyspace_ref
    WHERE a IS NOT MISSING;

Q2: SELECT SUM(a)
    FROM named_keyspace_ref
    WHERE a IS NOT MISSING;

Q3: SELECT SUM(a), COUNT(a), MIN(a)
    FROM named_keyspace_ref
    WHERE a IS NOT MISSING;

Q4: SELECT SUM(a), COUNT(b), MIN(c)
    FROM named_keyspace_ref
    WHERE a IS NOT MISSING;

Example 9 (Q1): Multiple Aggregate without GROUP BY key.

CREATE INDEX idx9
ON `travel-sample`(ROUND(distance), stops, sourceairport)
WHERE type = "airport";

Q1: SELECT SUM(ROUND(distance)) AS Total_Distance,
           SUM(stops) AS Total_Stops,
           COUNT(sourceairport) AS Total_Airports
    FROM `travel-sample`
    WHERE distance IS NOT MISSING
    AND type = "airport";

Results:

[
  {
    "Total_Airports": 24024,
    "Total_Distance": 53538071,
    "Total_Stops": 6
  }
]

Example 9 (Q2): Aggregate without GROUP BY key.

Q2: SELECT SUM(ROUND(distance)) AS Total_Distance
    FROM `travel-sample`;

Results:

[
  {
    "Total_Distance": 53538071
  }
]

Example 9 (Q3): Multiple Aggregate without GROUP BY key.

Q3: SELECT SUM(ROUND(distance)) AS Total_Distance,
           COUNT(ROUND(distance)) AS Count_of_Distance,
           MIN(ROUND(distance)) AS Min_of_Distance
    FROM `travel-sample`
    WHERE distance IS NOT MISSING;

Results:

[
  {
    "Count_of_Distance": 24024,
    "Min_of_Distance": 3,
    "Total_Distance": 53538071
  }
]

Example 9 (Q4): Multiple Aggregate without GROUP BY key.

Q4: SELECT SUM(ROUND(distance)) AS Total_Distance,
           COUNT(stops) AS Count_of_Stops,
           MIN(sourceairport) AS Min_of_Airport
    FROM `travel-sample`
    WHERE distance IS NOT MISSING;

Results:

[
  {
    "Count_of_Stops": 24024,
    "Min_of_Airport": "AAE",
    "Total_Distance": 53538071
  }
]

10. Expression in Aggregate function

Aggregations with scalar expressions can be speeded up even if the index key does not have the matching expression on the key. To use Aggregate Pushdown, use the following syntax of the index and query statement:

CREATE INDEX idx_expr ON named_keyspace_ref (a,b,c);

SELECT Aggregate_Function1(Expression(c))
FROM named_keyspace_ref
WHERE a IS NOT MISSING
GROUP BY a,b;

Example 10: List the landmarks with the highest latitude.

Use the MAX() operator to find the highest landmark latitude in each state, group the results by country and state, and then sort in reverse order by the highest latitudes.

CREATE INDEX idx10 ON `travel-sample`(country, state, ABS(ROUND(geo.lat)))
WHERE type="landmark";

SELECT country, state, SUM(ABS(ROUND(geo.lat))) AS SumAbs_Latitude
FROM `travel-sample`
WHERE country IS NOT MISSING
AND type = "landmark"
GROUP BY country, state
ORDER BY SumAbs_Latitude DESC;

The Example 10 Explain Plan shows that Aggregates are executed by the indexer and is detailed in the Aggregate Query Plan table:

GBAP Ex10 VP
{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "covers": [
              "cover ((`travel-sample`.`country`))",
              "cover ((`travel-sample`.`state`))",
              "cover (abs(round(((`travel-sample`.`geo`).`lat`))))",
              "cover ((meta(`travel-sample`).`id`))",
              "cover (sum(cover (abs(round(((`travel-sample`.`geo`).`lat`))))))"
            ],
            "filter_covers": {
              "cover ((`travel-sample`.`type`))": "landmark"
            },
            "index": "idx10",
            "index_group_aggs": {
              "aggregates": [
                {
                  "aggregate": "SUM",
                  "depends": [
                    2
                  ],
                  "expr": "cover (abs(round(((`travel-sample`.`geo`).`lat`))))",
                  "id": 4,
                  "keypos": 2
                }
              ],
              "depends": [
                0,
                1,
                2
              ],
              "group": [
                {
                  "depends": [
                    0
                  ],
                  "expr": "cover ((`travel-sample`.`country`))",
                  "id": 0,
                  "keypos": 0
                },
                {
                  "depends": [
                    1
                  ],
                  "expr": "cover ((`travel-sample`.`state`))",
                  "id": 1,
                  "keypos": 1
                }
...

Results:

[
  {
    "SumAbs_Latitude": 117513,
    "country": "United Kingdom",
    "state": null
  },
  {
    "SumAbs_Latitude": 68503,
    "country": "United States",
    "state": "California"
  },
  {
    "SumAbs_Latitude": 10333,
    "country": "France",
    "state": "Île-de-France"
  },
...

11. SUM, COUNT, MIN, MAX, or AVG Aggregate functions

Currently, the only aggregate functions that are supported are SUM(), COUNT(), MIN(), MAX(), and AVG() with or without the DISTINCT modifier.

To use Aggregate Pushdown, use the below syntax of the index and query statement:

CREATE INDEX idx_expr ON named_keyspace_ref (a,b,c,d);

SELECT Aggregate_Function(a), Aggregate_Function(b),
       Aggregate_Function(c), Aggregate_Function(d)
FROM named_keyspace_ref
WHERE a IS NOT MISSING
GROUP BY a;

Example 11:

CREATE INDEX idx11 ON `travel-sample`(ROUND(geo.lat), geo.alt, city, ROUND(geo.lon))
       WHERE type = "airport";

SELECT MIN(ROUND(geo.lat)) AS Min_Lat,
       SUM(geo.alt) AS Sum_Alt,
       COUNT(city) AS Count_City,
       MAX(ROUND(geo.lon)) AS Max_Lon
FROM `travel-sample`
WHERE geo.lat IS NOT MISSING
AND type = "airport"
GROUP BY (ROUND(geo.lat))
ORDER BY (ROUND(geo.lat)) DESC;

Results:

[
  {
    "Count_City": 1,
    "Max_Lon": 43,
    "Min_Lat": 72,
    "Sum_Alt": 149
  },
  {
    "Count_City": 3,
    "Max_Lon": -157,
    "Min_Lat": 71,
    "Sum_Alt": 120
  },
  {
    "Count_City": 6,
    "Max_Lon": -144,
    "Min_Lat": 70,
    "Sum_Alt": 292
  },
...

12. DISTINCT aggregates

There are four cases when DISTINCT aggregates can use this feature:

  1. If the DISTINCT aggregate is on the leading GROUP BY key(s).

  2. If the DISTINCT aggregate is on the leading GROUP By key(s) + 1 (the immediate next key).

  3. If the DISTINCT aggregate is on a constant expression (GROUP BY can be on any key).

  4. If there is no GROUP BY and the DISTINCT aggregate is on the first key only or in a constant expression.

To use Aggregate Pushdown, use one of the following syntaxes of the index and query statements:

Case #1: If the DISTINCT aggregate is on the leading GROUP BY key(s).

CREATE INDEX idx_expr ON named_keyspace_ref (a, b, c);
Syntax A
SELECT SUM(DISTINCT a)
FROM named_keyspace_ref
WHERE a IS NOT MISSING
GROUP BY a;
Syntax B
SELECT COUNT(DISTINCT a), SUM(DISTINCT b)
FROM named_keyspace_ref
WHERE a IS NOT MISSING
GROUP BY a, b;

Example 12-1 (A): A DISTINCT aggregate on the leading GROUP BY key(s).

CREATE INDEX idx12_1 ON `travel-sample`(ROUND(geo.lat), ROUND(geo.lon), country)
WHERE type = "airport";

SELECT SUM(DISTINCT ROUND(geo.lat)) AS Sum_Lat
FROM `travel-sample`
WHERE geo.lat IS NOT MISSING
AND type = "airport"
GROUP BY ROUND(geo.lat);

Results:

[
  {
    "Sum_Lat": 27
  },
  {
    "Sum_Lat": 36
  },
  {
    "Sum_Lat": 71
  },
...

Example 12-1 (B): A DISTINCT aggregate on the leading GROUP BY key(s).

SELECT COUNT(DISTINCT ROUND(geo.lat)) AS Count_Lat,
       SUM(DISTINCT ROUND(geo.lon)) AS Sum_Lon
FROM `travel-sample`
WHERE geo.lat IS NOT MISSING
AND type = "airport"
GROUP BY ROUND(geo.lat), ROUND(geo.lon);

Results:

[
  {
    "Count_Lat": 1,
    "Sum_Lon": -166
  },
  {
    "Count_Lat": 1,
    "Sum_Lon": -107
  },
  {
    "Count_Lat": 1,
    "Sum_Lon": -159
  },
...

Case #2: If the DISTINCT aggregate is on the leading GROUP BY key(s) + 1 (the next key)

CREATE INDEX idx_expr ON named_keyspace_ref (a, b, c);
Syntax A
SELECT SUM(DISTINCT b)
FROM named_keyspace_ref
WHERE a IS NOT MISSING
GROUP BY a;
Syntax B
SELECT COUNT(DISTINCT c)
FROM named_keyspace_ref
WHERE a IS NOT MISSING
GROUP BY a, b;

Example 12-2 (A): A DISTINCT aggregate on the leading GROUP BY key(s) + 1 (the next key).

CREATE INDEX idx12_2 ON `travel-sample`(country, ROUND(geo.lat), ROUND(geo.lon))
WHERE type = "airport";

SELECT COUNT(DISTINCT country) AS Count_Country,
       SUM(DISTINCT ROUND(geo.lat)) AS Sum_Lat
FROM `travel-sample`
WHERE country IS NOT MISSING
AND type = "airport"
GROUP BY country;

Results:

[
  {
    "Count_Country": 1,
    "Sum_Lat": 483
  },
  {
    "Count_Country": 1,
    "Sum_Lat": 2290
  },
  {
    "Count_Country": 1,
    "Sum_Lat": 591
  }
]

Example 12-2 (B): A DISTINCT aggregate on the leading GROUP BY key(s) + 1 (the next key)

SELECT COUNT(DISTINCT country) AS Count_Country,
       SUM(DISTINCT ROUND(geo.lat)) AS Sum_Lat,
       COUNT(DISTINCT ROUND(geo.lon)) AS Count_Lon
FROM `travel-sample`
WHERE country IS NOT MISSING
AND type = "airport"
GROUP BY country, ROUND(geo.lat);

Results:

[
  {
    "Count_Country": 1,
    "Count_Lon": 16,
    "Sum_Lat": 483
  },
  {
    "Count_Country": 1,
    "Count_Lon": 103,
    "Sum_Lat": 2290
  },
  {
    "Count_Country": 1,
    "Count_Lon": 13,
    "Sum_Lat": 591
  }
]

Case #3: If the DISTINCT aggregate is on a constant expression (GROUP BY can be on any key)

CREATE INDEX idx_expr ON named_keyspace_ref (a, b, c);

SELECT a, COUNT(DISTINCT 1)
FROM named_keyspace_ref
WHERE a IS NOT MISSING
GROUP BY b;
The results will be pre-aggregated if the GROUP BY key is non-leading, as in this case and example.

Example 12-3: A DISTINCT aggregate on a constant expression (GROUP BY can be on any key)

CREATE INDEX idx12_3 ON `travel-sample`(country, geo.lat, geo.lon)
WHERE type = "airport";

SELECT MIN(country) AS Min_Country,
       COUNT(DISTINCT 1) AS Constant_Value,
       MIN(ROUND(geo.lon)) AS Min_Logitude
FROM `travel-sample`
WHERE country IS NOT MISSING
AND type = "airport"
GROUP BY geo.lat;

Results:

[
  {
    "Constant_Value": 1,
    "Min_Country": "United States",
    "Min_Longitude": -75
  },
  {
    "Constant_Value": 1,
    "Min_Country": "United States",
    "Min_Longitude": -169
  },
  {
    "Constant_Value": 1,
    "Min_Country": "United States",
    "Min_Longitude": -165
  },
...

Case #4: If the DISTINCT aggregate is on the first key only or in a constant expression, and there is no GROUP BY clause

CREATE INDEX idx_expr ON named_keyspace_ref (a, b, c);

Q1: SELECT SUM(DISTINCT a)
    FROM named_keyspace_ref;               /* ok */

Q2: SELECT COUNT(DISTINCT 1)
    FROM named_keyspace_ref;               /* ok */

Q3: SELECT SUM(DISTINCT c)
    FROM named_keyspace_ref;                /* not ok */

All other cases of DISTINCT pushdown will return an error.

Example 12-4: A DISTINCT aggregate on the first key only or in a constant expression, and there is no GROUP BY clause.

CREATE INDEX idx12_4 ON `travel-sample`(geo.alt, geo.lat, geo.lon)
WHERE type = "airport";

Q1: SELECT SUM(DISTINCT ROUND(geo.alt)) AS Sum_Alt
    FROM `travel-sample`
    WHERE geo.alt IS NOT MISSING
    AND type = "airport";

Results:

[
  {
    "Sum_Alt": 1463241
  }
]

Another query with index idx12_4:

Q2: SELECT COUNT(DISTINCT 1) AS Const_expr
    FROM `travel-sample`
    WHERE type = "airport";

Results:

[
  {
    "Const_expr": 1
  }
]

Another query with index idx12_4 but will not pushdown the aggregate to the indexer:

Q3: SELECT SUM(DISTINCT ROUND(geo.lon)) AS Sum_Lon
    FROM `travel-sample`
    WHERE geo.alt IS NOT MISSING
    AND type = "airport";

Results:

[
  {
    "Sum_Lon": -11412
  }
]

13. HAVING with an aggregate function inside

To use Aggregate Pushdown when a HAVING clause has an aggregate function inside, use the following syntax of index and query statement:

CREATE INDEX idx_expr ON named_keyspace_ref (k0, k1);

SELECT k0, COUNT(k1)
FROM named_keyspace_ref
WHERE k0 IS NOT MISSING
GROUP BY k0
HAVING Aggregate_Function(k1);

Example 13: HAVING with an aggregate function inside.

List the cities that have more than 180 landmarks.

CREATE INDEX idx13 ON `travel-sample` (city, name)
WHERE type = "landmark";

SELECT city AS City, COUNT(DISTINCT name) AS Landmark_Count
FROM `travel-sample`
WHERE city IS NOT MISSING
AND type = "landmark"
GROUP BY city
HAVING COUNT(DISTINCT name) > 180;

Results:

[
  {
    "City": "London",
    "Landmark_Count": 443
  },
  {
    "City": "Los Angeles",
    "Landmark_Count": 284
  },
  {
    "City": "San Diego",
    "Landmark_Count": 197
  },
  {
    "City": "San Francisco",
    "Landmark_Count": 797
  }
]

14. LETTING with an aggregate function inside

To use Aggregate Pushdown when a LETTING clause has an aggregate function inside, use the following syntax of the index and query statement.

CREATE INDEX idx_expr ON named_keyspace_ref (k0, k1);

SELECT k0, COUNT(k1)
FROM named_keyspace_ref
WHERE k0 IS NOT MISSING
GROUP BY k0
LETTING var_expr = Aggregate_Function(k1)
HAVING var_expr;

Example 14: LETTING with an aggregate function inside.

List cities that have more than half of all landmarks.

CREATE INDEX idx14 ON `travel-sample` (city, name)
WHERE type = "landmark";

SELECT city AS City, COUNT(DISTINCT name) AS Landmark_Count
FROM `travel-sample`
WHERE city IS NOT MISSING
AND type = "landmark"
GROUP BY city
LETTING MinimumThingsToSee = COUNT(DISTINCT name)
HAVING MinimumThingsToSee > 180;

Results:

[
  {
    "City": "London",
    "Landmark_Count": 443
  },
  {
    "City": "Los Angeles",
    "Landmark_Count": 284
  },
  {
    "City": "San Diego",
    "Landmark_Count": 197
  },
  {
    "City": "San Francisco",
    "Landmark_Count": 797
  }
]

Limitations

The following are currently not supported and not pushed to the indexer:

  • HAVING or LETTING clauses, unless there is an aggregate function inside.

  • ORDER BY clauses.

  • ARRAY_AGG() or any facility to add new Aggregate function, such as Median.

  • LIMIT pushdown with GROUP BY on non-leading keys.

  • OFFSET pushdown with GROUP BY on non-leading keys.

  • A subquery in a GROUP BY or Aggregate pushdown.

Aggregate Comparison

Item Aggregate on Non-Array Index Field Aggregate on Array Index Field DISTINCT Aggregate on Non-Array Index Field DISTINCT Aggregate on Array Index Field

Supports MIN() and MAX()

-

-

Supports SUM() and COUNT()

-

Supports AVG()

Supports ARRAY_AGG()

-

-

-

-

Appx 1 - Query Plan Fields

Consider the example:

EXPLAIN SELECT type, COUNT(type)
FROM `travel-sample`
WHERE type IS NOT MISSING
GROUP BY type;

In the query plan:

  • plan.`~children`[0.covers] shows that the index covers the query.

  • plan.`~children`[0.index_group_aggs] shows the aggregation and groupings done by the indexer.

  • index_group_aggs object has details on the aggregate, index key position, expression dependency, and group expressions handled by the indexer. This object is present in the plan only when the indexer handles the grouping and aggregation.

    Item Name Description Explain Text in This Example

    aggregates

    Array of Aggregate objects, and each object represents one aggregate function. The absence of this item means there is no Aggregate function.

    aggregates

    ... aggregate

    Aggregate operation.

    COUNT

    ... depends

    List of index key positions the GROUP BY expression depends on, starting with 0.

    0

    (because it’s the 1st item)

    ... expr

    Group expression or an aggregate expression.

    "cover ((`travel-sample`.`type`))"

    ... id

    Unique ID given internally and will be used in index_projection

    2

    ... keypos

    Key Position to use the Index expr or the query expr.

    • A value > -1 means the group key exactly matches the corresponding index keys, where 0 is the 1st index key.

    • A value of -1 means the group key does not match the index key and uses the query expression instead.

    0

    (because it matches the 1st index key)

    depends

    List of index key positions the GROUP BY expression depends on, starting with 0.

    0

    (because it’s the 1st item)

    group

    Array of GROUP BY objects, and each object represents one group key. The absence of this item means there is no GROUP BY clause.

    group

    ... depends

    Index key position of a single GROUP BY expression

    0

    (because it’s the 1st GROUP BY key)

    ... expr

    Single GROUP BY expression.

    "cover ((`travel-sample`.`type`))"

    ... id

    Unique ID given internally and will be used in index_projection

    0

    ... keypos

    Key Position to use the Index expr or the query expr.

    • A value > -1 means the group key exactly matches the corresponding index keys, where 0 is the 1st index key.

    • A value of -1 means the group key does not match the index key and uses the query expression instead.

    0

    (because it matches the 1st key in the index expression)

    {
      "plan": {
        "#operator": "Sequence",
        "~children": [
          {
            "#operator": "IndexScan3",
            "covers": [
              "cover ((`travel-sample`.`name`))",
              "cover ((meta(`travel-sample`).`id`))",
              "cover (count(1))"
            ],
            "index": "idx_name",
            "index_group_aggs": {
              "aggregates": [
                {
                  "aggregate": "COUNT",
                  "expr": "1",
                  "id": 2,
                  "keypos": -1
                }
              ],
              "depends": [
                0
              ],
              "group": [
                {
                  "depends": [
                    0
                  ],
                  "expr": "cover ((`travel-sample`.`name`))",
                  "id": 0,
                  "keypos": 0
                }
              ]
            },
            "index_id": "5dfe130db88b4ec",
            "index_projection": {
              "entry_keys": [
                0,
                2
              ]
            },
            "keyspace": "travel-sample",
            "namespace": "default",
            "spans": [
              {
                "exact": true,
                "range": [
                  {
                    "inclusion": 1,
                    "low": "null"
                  }
                ]
              }
            ],
            "using": "gsi"
          },
          {
            "#operator": "Parallel",
            "~child": {
              "#operator": "Sequence",
              "~children": [
                {
                  "#operator": "InitialProject",
                  "result_terms": [
                    {
                      "expr": "cover ((`travel-sample`.`name`))"
                    },
                    {
                      "expr": "cover (count(1))"
                    }
                  ]
                },
                {
                  "#operator": "FinalProject"
                }
              ]
            }
          }
        ]
      },
      "text": "select name, count(1)\nfrom `travel-sample` use index (idx_name)\nwhere name is not missing\ngroup by name;"
    }
When the index_group_aggs section is present, it means that the query is using Index Aggregations.

GROUP BY Query Plan

Item Name Description EXPLAIN Text in Example #1 (GROUP BY)

aggregates

Array of Aggregate objects, and each object represents one aggregate function. The absence of this item means there is no Aggregate function.

aggregates

... aggregate

Aggregate operation.

MAX

... depends

List of index key positions the GROUP BY expression depends on, starting with 0.

2

(because it’s the 3rd item)

... expr

Group expression or an aggregate expression.

round(cover (((`travel-sample`. `geo`).`lat`)))

... id

Unique ID given internally and will be used in index_projection

4

... keypos

Key Position to use the Index expr or the query expr.

  • A value > -1 means the group key exactly matches the corresponding index keys, where 0 is the 1st index key.

  • A value of -1 means the group key does not match the index key and uses the query expression instead.

-1

(because the index has the field geo.lat but the query adds the ROUND() function to geo.lat)

depends

List of index key positions the GROUP BY expression depends on, starting with 0.

0, 1, 2

group

Array of GROUP BY objects, and each object represents one group key. The absence of this item means there is no GROUP BY clause.

group

... depends

Index key position of a single GROUP BY expression, starting with 0.

0

(because it’s the 1st GROUP BY key)

... expr

Single GROUP BY expression.

`travel-sample`.`country`

... id

Unique ID given internally and will be used in index_projection.

0

... keypos

Key Position to use the Index expr or the query expr.

  • A value > -1 means the group key exactly matches the corresponding index keys, where 0 is the 1st index key.

  • A value of -1 means the group key does not match the index key and uses the query expression instead.

0

(because it matches the first key in the index expression)

The Query Plan sections of an Aggregate pushdown are slightly different than those used in a GROUP BY.

Aggregate Query Plan

Item Name Description EXPLAIN Text in Example #10 (Aggregate)

aggregates

Array of Aggregate objects, and each object represents one aggregate function. The absence of this item means there is no Aggregate function.

aggregates

... aggregate

Aggregate operation.

SUM

... depends

List of index key positions the GROUP BY expression depends on, starting with 0.

2

(because it’s the 3rd item)

... expr

Group expression or an aggregate expression.

"abs(round(cover (((`travel-sample`.`geo`).`lat`))))"

... id

Unique ID given internally and will be used in index_projection

4

... keypos

Key Position to use the Index expr or the query expr.

  • A value > -1 means the group key exactly matches the corresponding index keys, where 0 is the 1st index key.

  • A value of -1 means the group key does not match the index key and uses the query expression instead.

2

(because the query’s 3rd key exactly matches the index’s 3rd key)

depends

List of index key positions the GROUP BY expression depends on, starting with 0.

0, 1, 2

group

Array of GROUP BY objects, and each object represents one group key. The absence of this item means there is no GROUP BY clause.

group

... depends

Index key position of a single GROUP BY expression, starting with 0.

0

(because it’s the 1st GROUP BY key)

... expr

Single GROUP BY expression.

"cover ((`travel-sample`.`state`))"

... id

Unique ID given internally and will be used in index_projection.

0

... keypos

Key Position to use the Index expr or the query expr.

  • A value > -1 means the group key exactly matches the corresponding index keys, where 0 is the 1st index key.

  • A value of -1 means the group key does not match the index key and uses the query expression instead.

0

(because it matches the 1st key in the index expression)

... depends

Index key position of a single GROUP BY expression, starting with 0.

0

(because it’s the 1st GROUP BY key)

... expr

Single GROUP BY expression.

"cover ((`travel-sample`.`state`))"

... id

Unique ID given internally and will be used in index_projection.

1

... keypos

Key Position to use the Index expr or the query expr.

  • A value > -1 means the group key exactly matches the corresponding index keys, where 0 is the 1st index key.

  • A value of -1 means the group key does not match the index key and uses the query expression instead.

1

(because it matches the 2nd key in the index expression)