CREATE PRIMARY INDEX
The CREATE PRIMARY INDEX
statement allows you to create a primary index.
Primary indexes contain a full set of keys in a given keyspace.
Primary indexes are optional and are only required for running ad hoc queries on a keyspace that is not supported by a secondary index.
CREATE PRIMARY INDEX
is by default a synchronous operation.
The CREATE PRIMARY INDEX
statement blocks until the operation finishes.
Index building starts by creating a task that is queued for index build.
After this phase, if you lose connectivity, the index build operation continues in the background.
You can also run index creation asynchronously by using the defer_build
clause.
In the asynchronous mode, CREATE PRIMARY INDEX
starts a task to create the primary index and returns as soon as the task is queued for execution.
The full index creation operation happens in the background.
Both GSI and view indexers provide a status field and mark index status pending.
With the GSI indexer, index status continues to report pending
.
This status field and other index metadata can be queried by using system:indexes
.
RBAC Privileges
Users executing the CREATE PRIMARY INDEX
statement must have the Query Manage Index privilege granted on the keyspace.
For more details about user roles, see
Authorization.
Indexes cannot be built concurrently on a given keyspace unless the "errors": {"code": 12014, "msg": "error: Build Already In Progress. Bucket BUCKET_NAME. Index INDEX_NAME. Index state: pending"} You can create multiple primary indexes on a bucket for better index availability and place them on separate nodes by using the |
Syntax
CREATE PRIMARY INDEX [index_name] ON named_keyspace_ref [ USING GSI | USING VIEW ] [ WITH {"nodes": ["node_name"], "defer_build":true|false}, "num_replica": num_replica_num } ];
- Arguments
-
- index_name
-
[Optional] The unique name that identifies the index. If a name is not specified, the default name of
#primary
is applied.Valid GSI index names can contain any of the following characters: A-Z a-z 0-9 # _, and must start with a letter [A-Z a-z].
The minimum length of an index name is 1 character and there is no maximum length set for an index name.
When querying, if the index name contains a '#' or '_' character, you must enclose the index name within backticks.
Unnamed primary indexes are dropped by using the DROP PRIMARY INDEX
statement, and named primary indexes are dropped by using theDROP INDEX
statement. - named_keyspace_ref
-
Identifier that refers to the keyspace name. Specifies the keyspace as source for which the index needs to be created. You can add an optional namespace name to the keyspace name in this way:
namespace-name : keyspace-name
For example,
main:customer
indicates thecustomer
keyspace in the main namespace. If the namespace name is omitted, the default namespace in the current session is used. - USING GSI | USING VIEW
-
The USING clause specifies the index type to use. Primary indexes can be created as global secondary indexes (
GSI
) or views (VIEW
). If the USING clause is not specified, by defaultGSI
is used as the indexer. - WITH options
-
The
WITH
clause specifies additional options for theGSI
type primary indexes.- "nodes":["node name"]
-
A single primary index of type
GSI
can only be placed on a single node that runs the indexing service. Thenodes
option allows you to specify the node that the index is placed on. Ifnodes
is not specified, one of the nodes running the index service is randomly selected to host the index.Multiple nodes can be specified to distribute replicas of an index amongst multiple nodes.
If specifying both
nodes
andnum_replica
, the number of nodes in the array must be one greater than the specified number of replicas otherwise the index creation will fail.The node names passed to the nodes
parameter must include the cluster administration port (by default 8091). For exampleWITH {"nodes": ["192.0.2.0:8091"]}
instead ofWITH {"nodes": ["192.0.2.0"]}
. - "defer_build":true|false
-
When
defer_build
is set to true, theCREATE PRIMARY INDEX
operation queues the task for building the index but immediately pauses the building of the index of typeGSI
. Index building requires an expensive scan operation. Deferring building of the index with multiple indexes can optimize the expensive scan operation. You can defer building multiple indexes and, usingBUILD INDEX
statement, multiple indexes to be built efficiently with one efficient scan of keyspace data.When
defer_build
is set tofalse
, theCREATE PRIMARY INDEX
operation queues the task for building the index and immediately kicks off the building of the index of typeGSI
. - "num_replica": num_replica_num
-
num_replica
specifies the number of replicas of the index to create.The indexer will automatically distribute these indexes amongst index nodes in the cluster for load-balancing and high availability purposes. When creating an index with replicas in this manner, the indexer will attempt to distribute the replicas based on the server groups in use in the cluster where possible.
If
num_replica_num
is not less than the number of index nodes in the cluster, then the index creation will fail.We recommend that you do not create (or drop) secondary indexes when any node with a secondary index role is down as this may result in duplicate index names.
Example 1: Create a primary index on the travel-sample keyspace.
CREATE PRIMARY INDEX `travel-sample-primary-index` ON `travel-sample` USING GSI;
After the index creation statement comes back, system:indexes
is queried for the status of the index.
SELECT * FROM system:indexes WHERE name="travel-sample-primary-index";
Example 2: Create the same primary index using the defer_build
option coupled with the BUILD INDEX
statement.
CREATE PRIMARY INDEX `travel-sample-primary-index` ON `travel-sample` USING GSI WITH {"defer_build":true}; SELECT * FROM system:indexes WHERE name="travel-sample-primary-index";
With the defer_build
option enabled, the output from system:indexes
shows the travel-sample-primary-index
in the pending state ("state": "pending"
).
BUILD INDEX
can be used to kick off the deferred build on the primary index:
BUILD INDEX ON `travel-sample`(`travel-sample-primary-index`) USING GSI; SELECT * FROM system:indexes WHERE name="travel-sample-primary-index";
Primary scan timeout
For a primary index scan on any keyspace size, the query engine guarantees that the client is not exposed to scan timeout if the indexer throws a scan timeout after it has returned a greater than zero sized subset of primary keys. To complete the scan, the query engine performs successive scans of the primary index until all the primary keys have been returned. It is possible that the indexer throws scan timeout without returning any primary keys, and in this event the query engine returns scan timeout to the client.
For example, if the indexer cannot find a snapshot that satisfies the consistency guarantee of the query within the timeout limit, it will timeout without returning any primary keys.
For secondary index scans, the query engine does not handle scan timeout, and returns index scan timeout error to the client. You can handle scan timeout on a secondary index by increasing the indexer timeout setting (See manage:manage-settings/query-settings.adoc) or preferably by defining and using a more selective index.