CREATE INDEX
The CREATE INDEX
statement allows you to create a secondary index.
Secondary indexes contain a filtered or a full set of keys in a given bucket.
Secondary indexes are optional but increase query efficiency on a bucket.
CREATE INDEX
is by default a synchronous operation.
so every CREATE INDEX
statement blocks until the operation finishes.
Index building starts by creating a task that is queued for index build.
After this phase, if you lose connectivity, the index build operation continues in the background.
You can also run index creation asynchronously using the defer_build
clause.
In the asynchronous mode, CREATE INDEX
starts a task to create the primary index and returns as soon as the task is queued for execution, and then the full index creation operation happens in the background.
Both GSI and View indexes provide a status field and mark index status pending.
With GSI indexer, index status continues to report "pending".
This status field and other index metadata can be queried using system:indexes
.
You can create multiple identical secondary indexes on a bucket for better index availability and place them on separate nodes using the "nodes" clause.
Array indexing enables you to create global indexes on array elements and optimize the execution of queries involving array elements. For details, see Array Indexing.
Index partitioning helps increase the query performance by dividing and spreading a large index of documents across multiple nodes, horizontally scaling out an index as needed. For details, see Index Partitioning.
RBAC Privileges
User executing the CREATE INDEX statement must have the Query Manage Index privilege granted on the keyspace/bucket. For more details about user roles, see Authorization.
Indexes cannot be built concurrently on a given bucket unless the "errors": [{"code": 12014, "msg": "error: Build Already In Progress. Bucket BUCKET_NAME. Index INDEX_NAME. Index state: pending"}] |
Syntax
CREATE INDEX index_name ON named_keyspace_ref ( expression1 [ , expression2 ] * ) WHERE filter_expressions [ USING GSI | VIEW ] [ WITH { "nodes": [ "node_name" ], "defer_build":true|false , "num_replica": num_replica_num } ];
- Arguments
-
- index_name
-
[Required] A unique name that identifies the index.
Valid GSI index names can contain any of the following characters: A-Z a-z 0-9 # _, and must start with a letter, [A-Z a-z].
The minimum length of an index name is 1 character and there is no maximum length set for an index name.
When querying, if the index name contains a '#' or '_' character, you must enclose the index name within backticks.
- named_keyspace_ref
-
An identifier that refers to the bucket name. Specifies the bucket as the source for which the index needs to be created.
You can add an optional namespace name to the keyspace name in this way:
namespace-name : keyspace-name
For example,
main:customer
indicates the customer keyspace in the main namespace. If the namespace name is omitted, the default namespace in the current session is used. - expression1, expression2, ... , expressionX
-
Refers to an attribute name or a scalar function or an ARRAY expression on the attribute. This constitutes an index-key for the index. For details about array expressions, see Array Indexing.
USING GSI
|VIEW
-
[Optional; Default is GSI]
The
USING
clause specifies the index type to use. Secondary indexes can be created using global secondary indexes (GSI) or views. WITH
-
- "nodes":["node_name"]
-
A single global secondary index can be placed on specific nodes that run the indexing service. The
nodes
option allows you to specify the nodes that the index is placed on.If
nodes
is not specified, then nodes running the index service are randomly selected to host the index, based on the number of replicas.Multiple nodes can be specified to distribute replicas of an index amongst multiple nodes, for example:
CREATE INDEX productName_index1 ON bucket_name(productName, ProductID) WHERE type="product" USING GSI WITH {"nodes":["node1:8091", "node2:8091", "node3:8091"]};
If specifying both
nodes
andnum_replica
, the number of nodes in the array must be one greater than the specified number of replicas otherwise the index creation will fail.The node names passed to the nodes
parameter must include the cluster administration port (by default 8091). For exampleWITH {"nodes": ["192.0.2.0:8091"]}
instead ofWITH {"nodes": ["192.0.2.0"]}
. - "defer_build":true | false
-
With defer_build set to TRUE, then the
CREATE INDEX
operation queues the task for building the index but immediately pauses the building of the index of type GSI. Index building requires an expensive scan operation. Deferring building of the index with multiple indexes can optimize the expensive scan operation. Admins can defer building multiple indexes and, using theBUILD INDEX
statement, multiple indexes to be built efficiently with one efficient scan of bucket data.With defer_build set to FALSE, then the
CREATE INDEX
operation queues the task for building the index and immediately kicks off the building of the index of type GSI. - "num_replica": num_replica_num
-
num_replica
specifies the number of replicas of the index to create.The indexer will automatically distribute these indexes amongst index nodes in the cluster for load-balancing and high availability purposes. When creating an index with replicas in this manner, the indexer will attempt to distribute the replicas based on the server groups in use in the cluster where possible.
If
num_replica_num
is not less than the number of index nodes in the cluster, then the index creation will fail.We recommend that you do not create (or drop) secondary indexes when any node with a secondary index role is down as this may result in duplicate index names.
Using the meta().id function
For details, see Indexing Meta Info.
Using indices for aggregates
If there is an index on the expression of an aggregate, that index may be used to satisfy the query.
For example, given the index "abv_idx
" created using the following statement:
CREATE INDEX abv_idx ON `beer-sample`(abv);
The query engine will use the index "abv_idx
" for the following query:
SELECT min(abv), max(abv) FROM `beer-sample`;
Examples
The following example creates a secondary index that contains beers with an abv
value greater than 5 on the node 192.0.2.1
:
CREATE INDEX over5 ON `beer-sample`(abv) WHERE abv > 5 USING GSI WITH {"nodes": ["192.0.2.1:8091"]};
The following example creates a secondary index on the beer-sample
bucket and then queries system:indexes
for status of the index:
CREATE INDEX `beer-sample-type-index` ON `beer-sample`(type) USING GSI; SELECT * FROM system:indexes WHERE name="beer-sample-type-index";
The following example creates the same secondary index by using the deferred build option and then queries system:indexes
for status of the index:
CREATE INDEX `beer-sample-type-index` ON `beer-sample`(type) USING GSI WITH {"defer_build":true}; SELECT * FROM system:indexes WHERE name="beer-sample-type-index";
Because the deferred build option was enabled, the output from the query on system:indexes
shows beer-sample-type-index
shows the index has not finished building ("state": "pending"
).
The following example uses the BUILD INDEX
statement to kick off the deferred build on the beer-sample-type-index
index and then queries system:indexes
for status of the index:
BUILD INDEX ON `beer-sample`(`beer-sample-type-index`) USING GSI; SELECT * FROM system:indexes WHERE name="beer-sample-type-index";
This time the query on system:indexes
shows that the index is built ("state": "online"
).