Couchbase

Use the following procedure to monitor a Couchbase database.

  1. Navigate to Database monitoring in the left pane.
  2. Select Couchbase from the Select database drop-down list.
  3. Select the cluster from the Select cluster drop-down list.

The following screen appears along with the various available tabs.

Cluster Overview

The Cluster Overview tab is selected by default. The following metrics are displayed in graph format.

| Metric Name | Description |
| --- | --- |
| Cluster CPU usage | CPU utilization of the cluster |
| Cluster RAM usage | Memory utilization of the cluster |
| Disk usage | Disk usage of the cluster |
| DB up time | Availability and uptime of the cluster |
| DB down time | Any period during which the cluster is unavailable or inaccessible |
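
DB up time and DB down time are often combined into a single availability percentage. The following is a minimal sketch of that arithmetic, assuming both values are available as durations in seconds; the function name `availability_percent` is hypothetical and is not part of the product.

```python
def availability_percent(up_seconds: float, down_seconds: float) -> float:
    """Return uptime as a percentage of the total observed time."""
    total = up_seconds + down_seconds
    if total <= 0:
        raise ValueError("observed time must be positive")
    return 100.0 * up_seconds / total


# Example: a 30-day window with 43 minutes of downtime (illustrative values).
observed = 30 * 24 * 3600
down = 43 * 60
print(f"{availability_percent(observed - down, down):.3f}%")
```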

Node

Clicking the Node tab displays the following metrics.

| Metric Name | Description |
| --- | --- |
| CPU usage | CPU utilization of the node |
| RAM usage | Memory utilization of the node |
| Disk usage | Disk usage of the cluster node |
| Swap memory | Amount of swap space used |
| Node operations | Includes ops, number of hits, and cmd get |
| Current items | Number of current items at the node level |
| Node replica | Number of node replicas |
| Docs data size | |
| MCD memory | The allocated MCD memory |
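
Several of the Node metrics resemble fields that the Couchbase Server REST API returns for each entry of the `nodes` array in the `/pools/default` response (`systemStats`, `interestingStats`, and `mcdMemoryAllocated`). The following sketch shows how such an entry could be summarized. It is an assumption that the Node tab is built from these fields; the mapping, the function name `summarize_node`, and the sample values are illustrative only.

```python
from typing import Any, Dict


def summarize_node(node: Dict[str, Any]) -> Dict[str, Any]:
    """Pick Node-tab style values out of one entry of the `nodes` array."""
    system = node.get("systemStats", {})
    stats = node.get("interestingStats", {})
    return {
        "hostname": node.get("hostname"),
        "cpu_usage": system.get("cpu_utilization_rate"),
        # RAM in use is derived as total minus free.
        "ram_used": system.get("mem_total", 0) - system.get("mem_free", 0),
        "swap_used": system.get("swap_used"),
        "ops": stats.get("ops"),
        "get_hits": stats.get("get_hits"),
        "cmd_get": stats.get("cmd_get"),
        "current_items": stats.get("curr_items"),
        # Assumed mapping for "Node replica".
        "replica_items": stats.get("vb_replica_curr_items"),
        "docs_data_size": stats.get("couch_docs_data_size"),
        "mcd_memory_allocated": node.get("mcdMemoryAllocated"),
    }


# Illustrative values only, not real measurements.
sample_node = {
    "hostname": "node1.example.com:8091",
    "systemStats": {
        "cpu_utilization_rate": 12.5,
        "mem_total": 8_000_000_000,
        "mem_free": 3_000_000_000,
        "swap_total": 2_000_000_000,
        "swap_used": 100_000_000,
    },
    "interestingStats": {
        "ops": 150,
        "get_hits": 120,
        "cmd_get": 130,
        "curr_items": 10_000,
        "vb_replica_curr_items": 10_000,
        "couch_docs_data_size": 52_428_800,
    },
    "mcdMemoryAllocated": 6400,
}

summary = summarize_node(sample_node)
for name, value in summary.items():
    print(f"{name}: {value}")
```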

Queries

Clicking the Queries tab displays the following metrics.

| Metric Name | Description |
| --- | --- |
| Requests | Number of requests per node |
| Request time distribution | Mean, median, 95th percentile, and 99th percentile of request time |
| Request thread count | Number of request threads |
| GC count | Garbage collection count |
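
To make the Request time distribution values concrete, the following sketch computes the same four statistics from a list of request durations using Python's standard `statistics` module. The function name `request_time_summary` and the sample data are hypothetical, and the product may use a different percentile interpolation method.

```python
import statistics
from typing import Dict, Sequence


def request_time_summary(durations_ms: Sequence[float]) -> Dict[str, float]:
    """Return mean, median, p95, and p99 for a set of request durations."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points: index 94 is the 95th
    # percentile and index 98 is the 99th percentile.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(durations_ms),
        "median": statistics.median(durations_ms),
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Illustrative data: 100 requests taking 1 ms, 2 ms, ..., 100 ms.
result = request_time_summary(list(range(1, 101)))
print(result)
```

The mean is sensitive to a few very slow requests, which is why the higher percentiles are reported alongside it.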

Select the threshold from the Query threshold drop-down list.

The following screen is displayed with the query details.

Note: You can download the report in PDF format by clicking Download report.

Bucket

Enter the name of a bucket in the Search bucket drop-down list. The following metrics are displayed.

| Metric Name | Description |
| --- | --- |
| Bucket ops | Bucket operations |
| Disk used | Disk space used by the bucket |
| Mem used | Memory used by the bucket |
| Current items | Number of current items in the bucket |
| Database time | The expiry duration of documents within a bucket |

Metrics

| Metric Name | Description |
| --- | --- |
| Active connections | Number of client connections to the Couchbase cluster. |
| Failed connections | Number of client-initiated requests to connect to a server that failed. |
| Total transactions | Summary count of completed operations. |
| Average transaction rate | Average number of transactions processed per unit of time. It is a measure of system throughput and efficiency. |
| Events processed per second | Number of document mutation events processed per second by the Eventing Service. |
| Average execution time per event (ms) | Average time that a single Couchbase Eventing Function takes to run, from the moment a document mutation triggers it until it finishes processing. |
| Total mutations | Cumulative sum of all operations that change the data stored in a Couchbase bucket or across the entire cluster. It quantifies the total write activity over a given period. |
| Failed event count | Cumulative number of times a Couchbase Eventing Function failed to process a document mutation (event). The counter increments when a function encounters an unhandled error or throws an exception that is not caught. |
| Eventing on update failure | Tracks cases where an Eventing Function, as part of its logic, attempts an update operation on a document in Couchbase and that update fails. |
| Eventing on delete failure | Tracks cases where an Eventing Function, as part of its logic, attempts a delete operation on a document in Couchbase and that delete fails. |
| Eventing worker restart count | Cumulative number of times an Eventing worker process has terminated unexpectedly and then restarted on an Eventing node. |
| Worker thread utilization | Percentage of time that worker threads are actively processing tasks rather than idle or waiting. It is a key indicator of how efficiently the system's concurrent processing capacity is being used. |
| Eventing memory usage | Amount of RAM currently consumed by the Couchbase Eventing Service and its associated worker processes on an Eventing node. |
| Timer execution count | Cumulative number of times an Eventing Timer has fired and triggered the execution of its associated Eventing Function. |
| System log errors | Number of log entries with a severity level of "ERROR" or above (for example, FATAL, CRITICAL, ALERT, or EMERGENCY; "ERROR" is the most common highest level for operational issues). |
| Cache hit ratio | Percentage of document read requests served directly from the Couchbase Data Service's in-memory cache (the managed cache), without accessing the underlying disk. |
| Disk read bytes | Total number of bytes that the system (or a specific process or application) has requested and successfully received from physical disk devices (HDDs, SSDs, NVMe drives). |
| Disk write bytes | Total number of bytes that the system has sent and successfully committed to physical disk devices. |
| Log monitoring | The systematic collection, parsing, and analysis of events recorded in Couchbase's log files, with alerting on those events. |
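
Two kinds of arithmetic recur in the metrics above: a ratio such as Cache hit ratio, and a per-second rate derived from a cumulative counter such as Total transactions or Total mutations. The following is a minimal sketch of both, assuming you have the raw counts; the function names are hypothetical, and the exact counters the product uses are not documented here.

```python
from typing import Optional


def cache_hit_ratio(reads: int, disk_fetches: int) -> Optional[float]:
    """Percentage of reads served from memory; None if there were no reads."""
    if reads <= 0:
        return None
    return 100.0 * (reads - disk_fetches) / reads


def rate_per_second(prev_count: int, curr_count: int, interval_seconds: float) -> float:
    """Average rate between two samples of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # The counter went backwards, for example after a restart.
        # Assume it restarted from zero and count only the new value.
        delta = curr_count
    return delta / interval_seconds


# Illustrative values only.
print(cache_hit_ratio(reads=1000, disk_fetches=50))
print(rate_per_second(prev_count=1200, curr_count=1800, interval_seconds=60))
```

Treating a decreasing counter as a reset is a design choice; another reasonable option is to skip that sample entirely.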

Security

| Metric Name | Description |
| --- | --- |
| Failed logins | Counts each attempt by a user (human or application service account) to connect to and authenticate with the Couchbase cluster in which authentication fails. |
| Delete operations | Counts each successful delete operation on a document in a Couchbase bucket. The metric increments when an application or an internal Couchbase process (such as expiration) removes a document. |
| DML operations | Total number of operations that modify the data stored in a database, that is, the commands or actions used to manipulate data. |
| Auth failed count | Increments every time a client tries to establish a connection or perform an operation that requires authentication, and the authentication fails. |
| Auth success count | Increments every time a client successfully authenticates with the Couchbase cluster, meaning the provided credentials are valid and the client has been granted access. |

Service

| Metric Name | Description |
| --- | --- |
| XDCR throughput | Number of documents or total bytes of data successfully replicated from the source cluster to the target cluster per unit of time. It indicates the speed and efficiency of data synchronization. |
| XDCR latency | Time, typically in milliseconds or seconds, for a change to propagate from one cluster to another. It is a critical indicator of how fresh the data on the target cluster is relative to the source. |
| XDCR replication queue size | Size of the internal queue within the XDCR pipeline. When a document mutation occurs on the source cluster, it is added to this queue to await transmission to, and application on, the target cluster. |
| XDCR bandwidth usage | Amount of network bandwidth consumed by the XDCR process. |
| XDCR data replicated | Cumulative counter. Each time XDCR successfully applies a document mutation to the target cluster, the size of that document is added to this metric. |
| XDCR pipeline errors | Counter that increments each time a document or a batch of documents fails to replicate through the XDCR pipeline because of an error. |
| XDCR active replications | Number of configured XDCR replication streams in a "running" or "active" state. An active replication is one in which XDCR is attempting to detect changes on the source cluster and propagate them to the target cluster. |
| DCP connections | Number of entities currently connected to a specific Data Service node via DCP to receive a continuous stream of mutations. |
| DCP buffer ratio | Percentage of the allocated DCP buffer space that is currently in use. |
| DCP back off | Cumulative time (in milliseconds or seconds) that the DCP stream has been paused or backed off by flow control mechanisms over a given period. |
| DCP flow control buffer usage | Current fill level of the DCP flow control buffer. Monitoring it is important for maintaining the performance and stability of the Couchbase cluster. |
| DCP mutation rate | Number of individual document mutations that a specific Data Service node sends out over its DCP streams per second. It reflects the output rate of the DCP pipeline from that node. |
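
As an illustration of how a value such as DCP buffer ratio could be derived and acted on, the following sketch computes the percentage from used and allocated buffer sizes and maps it to a status. The function names and the 75% and 90% thresholds are assumptions chosen for the example, not product defaults.

```python
def buffer_ratio_percent(used_bytes: int, allocated_bytes: int) -> float:
    """Percentage of the allocated buffer that is currently in use."""
    if allocated_bytes <= 0:
        raise ValueError("allocated size must be positive")
    return 100.0 * used_bytes / allocated_bytes


def buffer_status(ratio_percent: float, warn: float = 75.0, critical: float = 90.0) -> str:
    """Map a buffer ratio to a status label using hypothetical thresholds."""
    if ratio_percent >= critical:
        return "critical"
    if ratio_percent >= warn:
        return "warning"
    return "ok"


# Illustrative values only: 8 MiB used of a 10 MiB buffer.
ratio = buffer_ratio_percent(8 * 1024 * 1024, 10 * 1024 * 1024)
print(ratio, buffer_status(ratio))
```

A buffer that stays near full suggests that consumers are not keeping up with the stream, which is also when DCP back off time tends to grow.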