When monitoring an enterprise service, we generally capture and monitor three categories of information about how the service is performing:
- Performance metrics
- Events (logs)
- Service availability
Cassandra exposes an extensive set of performance metrics, and a large number of them can be captured to understand how Cassandra is performing. System resource utilisation is another key set of metrics that must be captured in order to understand the performance of a database effectively.
The AxonOps agent captures both Cassandra and OS metrics and pushes them to the AxonOps server.
Cassandra event logs are, by default, written to log files. These log files contain important information that allows SREs and DevOps engineers to identify issues when they occur. The AxonOps agent captures the logs and pushes them to the AxonOps server. The logs are visible within the AxonOps dashboard, allowing quick access to them without having to log in to the individual servers.
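To illustrate the kind of information these log files carry, the following is a minimal sketch that counts log lines by severity. It is not how the AxonOps agent works. It assumes each log entry starts with its level (as in Cassandra's stock logback pattern), and the function name `count_levels` and the log path in the usage comment are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: each log entry begins with its level, e.g. "ERROR [thread] ...".
# Continuation lines (such as stack traces) do not match and are skipped.
LEVEL_RE = re.compile(r"^(TRACE|DEBUG|INFO|WARN|ERROR)\s")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log entries per severity level."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage (the path is an assumption; it depends on how Cassandra was installed):
# with open("/var/log/cassandra/system.log", encoding="utf-8") as fh:
#     print(count_levels(fh))
```

A sudden rise in the `WARN` or `ERROR` counts is the sort of signal an engineer would otherwise have to find by logging in to each server.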
Checking the current service availability alongside the dashboards gives confidence that all services are running as expected. Example service checks that give engineers this confidence are:
- System process
- Network open ports - e.g. CQL and storage ports
- Database availability - e.g. can execute a CQL query
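As an illustration of the network port check, here is a minimal sketch using only the Python standard library. It is not the AxonOps implementation. The function names `port_is_open` and `check_ports` are hypothetical, and the ports 9042 (CQL) and 7000 (storage) are the Cassandra defaults, which your cluster may override.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host: str, ports: dict[str, int]) -> dict[str, bool]:
    """Run the TCP check for each named port and report the results."""
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Assumption: default Cassandra ports on a local node.
    results = check_ports("127.0.0.1", {"cql": 9042, "storage": 7000})
    for name, ok in results.items():
        print(f"{name}: {'open' if ok else 'closed'}")
```

An open port only shows that something is listening. It does not prove the database can serve requests, which is why a separate check that executes a CQL query is listed as well.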
AxonOps implements all three types of monitoring described above. The AxonOps agent captures the information and sends it securely to the AxonOps server, where it is stored in the backend data store.
The AxonOps GUI provides a comprehensive set of metrics dashboards combined with the event log view. It also provides a separate service check status view showing the health of the cluster.
This section describes how the AxonOps GUI organises the dashboards of all three types of monitoring.