Monitor with REST

The REST API (http://<ip>:8091/pools/default) provides insight into the computing resources consumed by each node.

Node monitoring

Node status

As part of the JSON returned by the nodeStatuses statistics end-point, the state of each node is reported in clusterMembership. This lets you monitor, on a per-node basis, whether each node is actively participating in the cluster. The status inactiveFailed marks a critical event: the node has failed and administrator intervention is needed.
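The check described above can be sketched as a small filter over the parsed response. This is a minimal sketch, assuming the JSON has already been decoded into a dictionary that holds a "nodes" list whose entries carry "hostname" and "clusterMembership" keys; the helper name failed_nodes and the sample data are hypothetical.

```python
def failed_nodes(pool_info):
    """Return the hostnames of nodes reported as inactiveFailed.

    Assumes pool_info is the decoded JSON and contains a "nodes" list
    (an assumption about the response layout, not confirmed by the text).
    """
    return [
        node.get("hostname", "<unknown>")
        for node in pool_info.get("nodes", [])
        if node.get("clusterMembership") == "inactiveFailed"
    ]


# Hypothetical sample data for illustration only.
sample = {
    "nodes": [
        {"hostname": "10.0.0.1:8091", "clusterMembership": "active"},
        {"hostname": "10.0.0.2:8091", "clusterMembership": "inactiveFailed"},
    ]
}

print(failed_nodes(sample))
```

A non-empty result is the condition that would normally page an administrator.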

System statistics

The JSON also provides basic capacity consumption statistics for CPU (cpu_utilization), swap (swap_used), and free memory (mem_free). If any of these resources is constrained on a node, investigate that node and evaluate whether additional nodes are warranted.
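One way to act on these three values is to compare them against limits chosen for your environment. The following is a rough sketch: the function name constrained_resources and every threshold in it are hypothetical defaults, and the statistic names are taken from the text above rather than verified against a live response.

```python
def constrained_resources(system_stats,
                          cpu_max=90.0,
                          swap_max=0,
                          mem_free_min=512 * 1024 * 1024):
    """Return the names of resources that exceed the given limits.

    All thresholds are illustrative assumptions, not recommended values.
    """
    problems = []
    if system_stats.get("cpu_utilization", 0) > cpu_max:
        problems.append("cpu_utilization")
    if system_stats.get("swap_used", 0) > swap_max:
        problems.append("swap_used")
    # A missing mem_free value is treated as unconstrained.
    if system_stats.get("mem_free", mem_free_min) < mem_free_min:
        problems.append("mem_free")
    return problems


# Hypothetical readings for one node.
node_stats = {"cpu_utilization": 95.2, "swap_used": 0, "mem_free": 256 * 1024 * 1024}
print(constrained_resources(node_stats))
```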

Couchbase specific statistics

The final section provides additional insight into the resources consumed by an individual node. The disk usage of Couchbase Server on a node is described by the following:

  • couch_docs_actual_disk_size, the physical disk space used on the node

  • ep_bg_fetched, the number of background fetches (data not in the cache and pulled from disk)

Bucket monitoring

The REST API /stats end-point

http://<ip>:8091/pools/default/buckets/<bucket_name>/stats

provides insight into bucket health. Tracking the bucket statistics also gives a good indication of your overall application health.
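As a sketch of how the statistics might be read once the response is decoded, the helper below picks the most recent sample for a few named metrics. It assumes the response nests time series under "op" and then "samples", with one list of values per statistic; that layout, the helper name latest_samples, and the example values are assumptions to check against your own server's output.

```python
def latest_samples(stats, names):
    """Return the most recent value for each requested statistic.

    Assumes stats["op"]["samples"][name] is a list of samples ordered
    oldest to newest (an assumed layout). Missing or empty series are skipped.
    """
    samples = stats.get("op", {}).get("samples", {})
    latest = {}
    for name in names:
        series = samples.get(name)
        if series:
            latest[name] = series[-1]
    return latest


# Hypothetical decoded response for illustration only.
example = {
    "op": {
        "samples": {
            "ops": [120, 135, 150],
            "ep_cache_miss_rate": [0.5, 0.7, 0.4],
        }
    }
}

print(latest_samples(example, ["ops", "ep_cache_miss_rate", "ep_queue_size"]))
```

Skipping absent statistics rather than raising keeps the helper usable across buckets that expose different sets of metrics.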

Operations Per Second (ops)

This is a fundamental measure of the total number of get, set, increment, and decrement operations that occur for a given bucket. Although view operations are not factored into the metric, it provides a very quick measure of the load per application.

Cache Miss (ep_cache_miss_rate)

This metric is the ratio of requested objects that had to be fetched from disk to the total number of requested objects. For example, if ten requests entered the database and one request needed to be retrieved from disk, the miss rate would be 10%. What is the right cache miss rate for a given cluster? It depends on how much the application expects the database to hold in memory.
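The arithmetic in the example above can be written out as follows. This is only an illustration of the ratio; the function name cache_miss_rate is hypothetical and the server reports the metric directly.

```python
def cache_miss_rate(disk_fetches, total_requests):
    """Return the percentage of requests that had to be served from disk."""
    if total_requests == 0:
        return 0.0  # no traffic, so nothing was missed
    return 100.0 * disk_fetches / total_requests


# Ten requests, one of which was fetched from disk.
print(cache_miss_rate(1, 10))  # 10.0
```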

Fragmentation (couch_docs_fragmentation)

Couchbase Server stores data in an append-only format on disk. As a result, monitoring fragmentation within a cluster is important, especially if auto-compaction runs on a schedule. This metric helps you judge whether the compaction schedule runs long enough and often enough to keep your database healthy.

Working Set (ep_bg_fetched and vb_active_resident_items_ratio)

You can use ep_cache_miss_rate together with the resident items ratio and memory headroom metrics to understand whether your bucket has enough capacity to keep your dataset’s most requested objects in memory. More importantly, you can forecast the need for additional nodes to expand the cluster's memory capacity.

Disk Drain (ep_queue_size)

One of the most important metrics to monitor, regardless of what your application is doing, is the queue drain rate. Keep a careful watch on the number of changes pending in the queues, particularly the disk write queue. Additional related information can be found with the cbstats command line utility.

From the REST standpoint, we can monitor both how the queue fills (ep_diskqueue_fill) and how quickly the queue is draining (ep_diskqueue_drain) to track the trend over time.
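A simple way to follow that trend is to subtract the drain samples from the fill samples over the same interval. The sketch below assumes both series have already been collected as equally spaced lists of numbers; the function names and the sample values are hypothetical.

```python
def queue_net_growth(fill_samples, drain_samples):
    """Return the per-sample difference between fill and drain.

    Positive values mean the queue grew during that interval.
    """
    return [fill - drain for fill, drain in zip(fill_samples, drain_samples)]


def is_backing_up(fill_samples, drain_samples):
    """Report whether the queue grew overall across the sampled window."""
    net = queue_net_growth(fill_samples, drain_samples)
    return bool(net) and sum(net) > 0


# Hypothetical samples of ep_diskqueue_fill and ep_diskqueue_drain.
fill = [100, 120, 150]
drain = [100, 100, 100]

print(queue_net_growth(fill, drain))
print(is_backing_up(fill, drain))
```

A sustained positive net growth suggests the disk cannot keep up with the incoming writes, which is the situation the disk write queue metric is meant to expose.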