Monitor with REST
Accessing the REST API (http://<ip>:8091/pools/default) gives insight into the computing resources consumed per node.
Node monitoring
- Node status
  As part of the JSON returned by the nodeStatuses statistics end-point, the state of each node is reported in clusterMembership. This lets you check, node by node, that the status is active and the node is participating in the cluster. The status inactiveFailed marks a critical event: the node has failed, and administrator intervention is needed.
- System statistics
  The JSON also provides basic capacity consumption statistics for CPU (cpu_utilization), swap (swap_used), and free memory (mem_free). If any of these resources is constrained on a node in the cluster, address that node and evaluate whether additional nodes are merited.
- Couchbase specific statistics
  The final section provides additional insight into the resources Couchbase Server consumes on the individual node:
  - couch_docs_actual_disk_size: the physical disk space used on the node.
  - ep_bg_fetched: the number of background fetches (data not in the cache and pulled from disk).
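The node checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the JSON layout (a `nodes` array whose entries carry `hostname` and a `systemStats` object), the thresholds, and the function name `find_node_problems` are all assumptions made for the example, so verify them against the response your own server returns.

```python
# Hypothetical thresholds; tune them for your own hardware.
MAX_CPU_PERCENT = 90.0
MIN_MEM_FREE_BYTES = 512 * 1024 * 1024


def find_node_problems(pool_info):
    """Return (hostname, message) pairs for nodes that need attention."""
    problems = []
    for node in pool_info.get("nodes", []):
        host = node.get("hostname", "<unknown>")
        membership = node.get("clusterMembership")
        if membership != "active":
            problems.append((host, "clusterMembership is %s" % membership))
        # Assumed location of the system statistics inside each node entry.
        stats = node.get("systemStats", {})
        cpu = stats.get("cpu_utilization")
        if cpu is not None and cpu > MAX_CPU_PERCENT:
            problems.append((host, "cpu_utilization is %.1f" % cpu))
        mem_free = stats.get("mem_free")
        if mem_free is not None and mem_free < MIN_MEM_FREE_BYTES:
            problems.append((host, "mem_free is %d bytes" % mem_free))
        # Treating any swap use as a warning is an assumption of this sketch.
        swap_used = stats.get("swap_used", 0)
        if swap_used > 0:
            problems.append((host, "swap_used is %d bytes" % swap_used))
    return problems


# Hand-written sample standing in for the decoded JSON response.
sample = {
    "nodes": [
        {"hostname": "10.0.0.1:8091", "clusterMembership": "active",
         "systemStats": {"cpu_utilization": 12.5, "swap_used": 0,
                         "mem_free": 4 * 1024 ** 3}},
        {"hostname": "10.0.0.2:8091", "clusterMembership": "inactiveFailed",
         "systemStats": {"cpu_utilization": 97.0, "swap_used": 0,
                         "mem_free": 4 * 1024 ** 3}},
    ]
}

for host, message in find_node_problems(sample):
    print(host, message)
```

Keeping the check as a pure function over the decoded JSON makes it easy to run against saved responses before wiring it to a live cluster.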
Bucket monitoring
The REST API /stats end-point (http://<ip>:8091/pools/default/buckets/<bucket_name>/stats) provides insight into bucket health. Keeping tabs on the bucket statistics also gives good insight into your overall application health.
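As a rough sketch, the end-point could be queried from Python with the standard library only. The helper names (`bucket_stats_url`, `fetch_json`, `latest_samples`), the use of HTTP basic authentication, and the assumed response layout (an `op` object containing `samples`, with one list of recent values per statistic) are assumptions for illustration; compare them with what your server actually returns.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def bucket_stats_url(host, bucket, port=8091):
    """Build the /stats URL for one bucket, escaping the bucket name."""
    return "http://%s:%d/pools/default/buckets/%s/stats" % (
        host, port, quote(bucket, safe=""))


def fetch_json(url, user, password):
    """GET a URL with basic authentication and decode the JSON body."""
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = Request(url, headers={"Authorization": "Basic " + token})
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def latest_samples(stats_json, names):
    """Pick the most recent value of each named statistic that is present."""
    # Assumed layout: {"op": {"samples": {"<stat>": [v1, v2, ...]}}}
    samples = stats_json.get("op", {}).get("samples", {})
    return {name: samples[name][-1] for name in names if samples.get(name)}


# Hand-written sample standing in for a decoded response.
sample_stats = {"op": {"samples": {"ops": [900, 1100, 1200],
                                   "ep_queue_size": [10, 25, 40]}}}
print(bucket_stats_url("10.0.0.1", "default"))
print(latest_samples(sample_stats, ["ops", "ep_queue_size"]))
```

Against a live cluster, the two pieces would be combined as `latest_samples(fetch_json(bucket_stats_url(host, bucket), user, password), names)`.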
- Operations Per Second (ops)
  This is a fundamental measure of the total number of get, set, increment, and decrement operations that occur for a given bucket. Although view operations are not factored into this metric, it provides a very quick measure of the load per application.
- Cache Miss (ep_cache_miss_rate)
  This metric is the ratio of requested objects that had to be fetched from disk to the total number of objects requested. For example, if ten requests entered the database and one of them had to be retrieved from disk, the miss rate would be 10%. What is the right cache miss rate for a given cluster? It depends on how much of the dataset the application expects the database to hold in memory.
- Fragmentation (couch_docs_fragmentation)
  Couchbase Server stores data in an append-only format on disk. As a result, monitoring fragmentation within a cluster is important, especially if auto-compaction runs on a schedule. This metric helps you estimate whether compaction is running long enough and often enough to keep your database healthy.
- Working Set (ep_bg_fetched and vb_active_resident_items_ratio)
  You can use ep_cache_miss_rate in conjunction with the resident items ratio and memory headroom metrics to understand whether your bucket has enough capacity to keep your dataset’s most requested objects in memory. More importantly, you can forecast the need for additional nodes to expand the cluster's memory capacity.
- Disk Drain (ep_queue_size)
  One of the most important metrics to monitor, regardless of what your application is doing, is the queue drain rate. Keep a careful watch on the number of changes pending in the queues, particularly the disk write queue. Additional related information can be found with the cbstats command line utility.
  From the REST standpoint, you can monitor both how quickly the queue fills (ep_diskqueue_fill) and how quickly it drains (ep_diskqueue_drain) to track the trend over time.
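The miss-rate arithmetic from the Cache Miss entry and the fill/drain comparison from the Disk Drain entry can be sketched as follows. The function names and the rule of comparing fill and drain totals over one window of samples are assumptions made for this example, not part of the REST API.

```python
def cache_miss_rate(requests, disk_fetches):
    """Percentage of requests that had to be served from disk."""
    if requests == 0:
        return 0.0
    return 100.0 * disk_fetches / requests


def queue_is_backing_up(fill_samples, drain_samples):
    """True when the disk write queue filled faster than it drained.

    Both arguments are equally long lists of per-interval samples, for
    example recent values of ep_diskqueue_fill and ep_diskqueue_drain.
    """
    if not fill_samples or len(fill_samples) != len(drain_samples):
        raise ValueError("need two non-empty sample lists of equal length")
    return sum(fill_samples) > sum(drain_samples)


# Ten requests, one served from disk: the example from the text.
print(cache_miss_rate(10, 1))

# Fill outpaces drain over this window, so the queue is growing.
print(queue_is_backing_up([100, 120, 150], [100, 100, 100]))
```

Comparing totals over a window rather than single samples smooths out short bursts; a sustained `True` result across several windows is the trend worth alerting on.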