CREATE PRIMARY INDEX

The CREATE PRIMARY INDEX statement allows you to create a primary index. Primary indexes contain a full set of keys in a given keyspace. Primary indexes are optional and are only required for running ad hoc queries on a keyspace that is not supported by a secondary index.

CREATE PRIMARY INDEX is by default a synchronous operation. The CREATE PRIMARY INDEX statement blocks until the operation finishes. Index building starts by creating a task that is queued for index build. After this phase, if you lose connectivity, the index build operation continues in the background. You can also run index creation asynchronously by using the defer_build clause. In the asynchronous mode, CREATE PRIMARY INDEX starts a task to create the primary index and returns as soon as the task is queued for execution. The full index creation operation happens in the background.

Both GSI and view indexers provide a status field and mark index status pending. With the GSI indexer, index status continues to report pending. This status field and other index metadata can be queried by using system:indexes.

RBAC Privileges

Users executing the CREATE PRIMARY INDEX statement must have the Query Manage Index privilege granted on the keyspace. For more details about user roles, see Authorization.

Indexes cannot be built concurrently on a given keyspace unless the defer_build option in the CREATE PRIMARY INDEX statement is used in combination with the BUILD INDEX statement. The following error is reported if a second index creation operation is kicked off before the completion of the ongoing index creation.

"errors": {"code": 12014,
   "msg": "error: Build Already In Progress. Bucket BUCKET_NAME. Index INDEX_NAME. Index state: pending"}

You can create multiple primary indexes on a bucket for better index availability and place them on separate nodes by using the nodes clause.

Syntax

CREATE PRIMARY INDEX [index_name]
    ON named_keyspace_ref
    [ USING GSI | USING VIEW ]
    [ WITH {"nodes": ["node_name"], "defer_build":true|false}, "num_replica": num_replica_num } ];

Arguments

index_name

[Optional] The unique name that identifies the index. If a name is not specified, the default name of #primary is applied.

Valid GSI index names can contain any of the following characters: A-Z a-z 0-9 # _, and must start with a letter [A-Z a-z].

The minimum length of an index name is 1 character and there is no maximum length set for an index name.

When querying, if the index name contains a '#' or '_' character, you must enclose the index name within backticks.

Unnamed primary indexes are dropped by using the DROP PRIMARY INDEX statement, and named primary indexes are dropped by using the DROP INDEX statement.

named_keyspace_ref

Identifier that refers to the keyspace name. Specifies the keyspace as source for which the index needs to be created. You can add an optional namespace name to the keyspace name in this way:

namespace-name : keyspace-name

For example, main:customer indicates the customer keyspace in the main namespace. If the namespace name is omitted, the default namespace in the current session is used.

USING GSI | USING VIEW

The USING clause specifies the index type to use. Primary indexes can be created as global secondary indexes (GSI) or views (VIEW). If the USING clause is not specified, by default GSI is used as the indexer.

WITH options

The WITH clause specifies additional options for the GSI type primary indexes.

"nodes":["node name"]

A single primary index of type GSI can only be placed on a single node that runs the indexing service. The nodes option allows you to specify the node that the index is placed on. If nodes is not specified, one of the nodes running the index service is randomly selected to host the index.

Multiple nodes can be specified to distribute replicas of an index amongst multiple nodes.

If specifying both nodes and num_replica, the number of nodes in the array must be one greater than the specified number of replicas otherwise the index creation will fail.

The node names passed to the nodes parameter must include the cluster administration port (by default 8091). For example WITH {"nodes": ["192.0.2.0:8091"]} instead of WITH {"nodes": ["192.0.2.0"]}.

"defer_build":true|false

When defer_build is set to true, the CREATE PRIMARY INDEX operation queues the task for building the index but immediately pauses the building of the index of type GSI. Index building requires an expensive scan operation. Deferring building of the index with multiple indexes can optimize the expensive scan operation. You can defer building multiple indexes and, using BUILD INDEX statement, multiple indexes to be built efficiently with one efficient scan of keyspace data.

When defer_build is set to false, the CREATE PRIMARY INDEX operation queues the task for building the index and immediately kicks off the building of the index of type GSI.

"num_replica": num_replica_num

num_replica specifies the number of replicas of the index to create.

The indexer will automatically distribute these indexes amongst index nodes in the cluster for load-balancing and high availability purposes. When creating an index with replicas in this manner, the indexer will attempt to distribute the replicas based on the server groups in use in the cluster where possible.

If num_replica_num is not less than the number of index nodes in the cluster, then the index creation will fail.

We recommend that you do not create (or drop) secondary indexes when any node with a secondary index role is down as this may result in duplicate index names.

Example 1: Create a primary index on the travel-sample keyspace.

CREATE PRIMARY INDEX `travel-sample-primary-index` ON `travel-sample` USING GSI;

After the index creation statement comes back, system:indexes is queried for the status of the index.

SELECT * FROM system:indexes WHERE name="travel-sample-primary-index";

Example 2: Create the same primary index using the defer_build option coupled with the BUILD INDEX statement.

CREATE PRIMARY INDEX `travel-sample-primary-index` ON `travel-sample` USING GSI
    WITH {"defer_build":true};

SELECT * FROM system:indexes WHERE name="travel-sample-primary-index";

With the defer_build option enabled, the output from system:indexes shows the travel-sample-primary-index in the pending state ("state": "pending"). BUILD INDEX can be used to kick off the deferred build on the primary index:

BUILD INDEX ON `travel-sample`(`travel-sample-primary-index`) USING GSI;

SELECT * FROM system:indexes WHERE name="travel-sample-primary-index";

Primary scan timeout

For a primary index scan on any keyspace size, the query engine guarantees that the client is not exposed to scan timeout if the indexer throws a scan timeout after it has returned a greater than zero sized subset of primary keys. To complete the scan, the query engine performs successive scans of the primary index until all the primary keys have been returned. It is possible that the indexer throws scan timeout without returning any primary keys, and in this event the query engine returns scan timeout to the client.

For example, if the indexer cannot find a snapshot that satisfies the consistency guarantee of the query within the timeout limit, it will timeout without returning any primary keys.

For secondary index scans, the query engine does not handle scan timeout, and returns index scan timeout error to the client. You can handle scan timeout on a secondary index by increasing the indexer timeout setting (See manage:manage-settings/query-settings.adoc) or preferably by defining and using a more selective index.