Monitor with cbstats
You can query individual nodes for statistics either through a client that uses the binary protocol or through the cbstats utility.
Software developers and system administrators who need lower-level statistics can retrieve them with the cbstats command-line utility, which provides deep insight into what occurs within a cluster.
Below are some of the key metrics available through cbstats and their meanings:
- By tracking the number of open and rejected connections through the curr_connections and rejected_conns statistics, you can determine whether this node has rejected any connection requests.
- Each time an application requests an object that is not found in the cache, Couchbase Server retrieves the object from disk. This cache miss requires a background fetch, and ep_bg_fetched counts each item fetched from disk. If you are managing to a 100% working set, a rising count could be a sign of a cluster under stress. Cache misses might not be an issue if you have a smaller working set. In either scenario, understanding what is typical in your environment is important, because a large increase provides an early warning signal.
- The number of items queued for persistence is an important area to monitor so that you understand whether your cluster has adequate I/O resources for your application. While your application is always served from the caching tier, one of the great benefits of Couchbase Server is its ability to provide data durability by persisting to disk. If this asynchronous operation becomes overloaded, it can affect application performance. As a result, especially in write-heavy systems, it is important to monitor ep_queue_size and ep_flusher_todo. You never want to reach 1 million items in the queue, and you would likely want to flag a warning at around 500,000 to 800,000 items, especially if the queue is trending upward over time.
- The vb_num_eject_replicas statistic gives the number of times Couchbase Server ejected replica values from memory. This statistic indicates that a specific bucket has reached its low watermark. Reaching this threshold is not necessarily a problem, because the cluster is simply freeing memory resources, but it might be a problem if you see this behavior consistently or if it is an increasing trend. More importantly, watching this statistic is a way to head off out-of-memory scenarios (ep_oom_errors and ep_tmp_oom_errors), which you never want to see in your production clusters.
- The Couchbase Server design avoids stale cache scenarios by performing a warmup process during node startup. Warmup is the task of reading objects from disk and pre-loading the cache. Monitoring the warmup provides visibility into how quickly a node will complete its startup process and be available to support load within the cluster. The warmup is complete when ep_warmup_value_count is equal to vb_active_curr_items. More granular information is available from ep_warmup_state. Below are the seven warmup states. A node does not complete warmup until its state is done.
  - Initial: Start warmup processes.
  - EstimateDatabaseItemCount: Estimating database item count.
  - KeyDump: Begin loading keys and metadata into RAM, but not documents.
  - CheckForAccessLog: Determine if an access log is available. This log indicates which keys have been frequently read or written.
  - LoadingAccessLog: Load information from the access log.
  - LoadingData: The server is loading data first for keys listed in the access log. If no log is available, it loads data based on keys found during the KeyDump phase.
  - Done: The server is ready to handle read and write requests.
- Since Couchbase Server is not only a NoSQL persistence engine but also a cache, you need to understand the memory consumption of the Couchbase Server ep_engine. You can monitor this with the mem_used statistic.
For example:
> cbstats localhost:11210 all
 auth_cmds: 9
 auth_errors: 0
 bucket_conns: 10
 bytes_read: 246378222
 bytes_written: 289715944
 cas_badval: 0
 cas_hits: 0
 cas_misses: 0
 cmd_flush: 0
 cmd_get: 134250
 cmd_set: 115750
 …
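The guidance in the list above can be turned into a simple automated check. The following Python sketch is illustrative only: it assumes cbstats prints one `name: value` pair per line, and the helper names (`parse_cbstats`, `to_int`, `check_node`) and the sample values are hypothetical, not part of Couchbase Server. Adding ep_queue_size and ep_flusher_todo together to estimate the persistence queue is also an assumption of this sketch. The thresholds come from the text above and should be tuned to what is typical in your environment.

```python
from typing import Dict, List

# Thresholds from the guidance above; adjust them for your own workload.
QUEUE_WARN = 500_000
QUEUE_CRIT = 1_000_000


def parse_cbstats(text: str) -> Dict[str, str]:
    """Parse 'name: value' lines into a dict (assumes one stat per line)."""
    stats: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            stats[name.strip()] = value.strip()
    return stats


def to_int(stats: Dict[str, str], name: str) -> int:
    """Return a statistic as an int, or 0 if it is missing or non-numeric."""
    try:
        return int(stats.get(name, "0"))
    except ValueError:
        return 0


def check_node(stats: Dict[str, str]) -> List[str]:
    """Return human-readable findings for one node's statistics."""
    findings: List[str] = []

    # Items waiting to be persisted (assumption: sum of both counters).
    queued = to_int(stats, "ep_queue_size") + to_int(stats, "ep_flusher_todo")
    if queued >= QUEUE_CRIT:
        findings.append(f"CRITICAL: {queued} items queued for persistence")
    elif queued >= QUEUE_WARN:
        findings.append(f"WARNING: {queued} items queued for persistence")

    # Any rejected connection is worth surfacing.
    rejected = to_int(stats, "rejected_conns")
    if rejected > 0:
        findings.append(f"WARNING: {rejected} rejected connections")

    # Out-of-memory errors should never appear in production.
    oom = to_int(stats, "ep_oom_errors") + to_int(stats, "ep_tmp_oom_errors")
    if oom > 0:
        findings.append(f"CRITICAL: {oom} out-of-memory errors")

    # Report warmup progress only when the state is present in the input.
    state = stats.get("ep_warmup_state")
    if state is not None and state.lower() != "done":
        findings.append(f"INFO: warmup still in state {state}")

    return findings


# Invented sample values, for illustration only.
sample = """\
 curr_connections: 10
 rejected_conns: 0
 ep_queue_size: 420000
 ep_flusher_todo: 130000
 ep_oom_errors: 0
 ep_tmp_oom_errors: 0
 ep_warmup_state: done
"""

for finding in check_node(parse_cbstats(sample)):
    print(finding)
```

To run this against a live node, you could capture the command output with Python's `subprocess.run(..., capture_output=True, text=True)` and pass its `stdout` to `parse_cbstats`. The exact cbstats arguments and authentication options depend on your Couchbase Server version. Because absolute numbers matter less than trends, recording these values over time is likely more useful than a single snapshot.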