Using Nagios to Monitor Test Systems
Nagios is a well-known tool among operations teams. It is used to monitor all kinds of operational parameters, from simple machine up/down checks to detailed data collection. However, I have rarely seen this useful tool used in test environments. Here are three ways Nagios can benefit a test team.
First, simply using Nagios to monitor whether test systems are up and running can provide useful information and save a test team time. Knowing that a database server has gone down might spare the entire team the time and frustration of tracking down "bugs" that are just the result of a machine outage.
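To make the up/down idea concrete, here is a minimal sketch of a reachability probe written as a Nagios-style check. The stock Nagios plugins usually cover this case already, so treat it purely as an illustration of the plugin convention: a one-line status message plus an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function name `probe_tcp`, the status wording, and the default timeout are my own assumptions.

```python
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def probe_tcp(host, port, timeout=5.0):
    """Return (exit_code, status_line) for one TCP connection attempt.

    Hypothetical helper: a refused, timed-out, or unresolvable connection
    is reported as CRITICAL; a successful connect is reported as OK.
    """
    try:
        # create_connection raises OSError (or a subclass) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"TCP OK - {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"TCP CRITICAL - {host}:{port} unreachable ({exc})"
```

Run as a plugin, the script would print the status line and exit with the returned code, for example `code, line = probe_tcp("db-test-01", 5432)` followed by `print(line)` and `sys.exit(code)`, where the host name and port are placeholders for your own database server.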
Second, consider basic CPU and memory utilization monitoring of all test systems. This data can be collected and graphed with a variety of tools. I have had success using RRDTool and nagiosgraph. This toolset lets the team see how CPU utilization, memory utilization, and whatever else you decide to measure vary over time. This view may allow the team to spot potential performance or scaling issues long before formal performance testing begins.
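As a rough sketch of how such a measurement reaches a graphing tool: a Nagios check can append performance data after a `|` in its output line, and that is the part graphing add-ons typically read. The example below uses the 1-minute load average as a stand-in for CPU utilization (a substitution on my part, chosen because it needs only the standard library). The thresholds and the names `evaluate_load` and `main` are assumptions for illustration.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_load(load1, warn, crit):
    """Map a 1-minute load average to an exit code and a status line."""
    if load1 >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif load1 >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after the "|" is performance data: label=value;warn;crit
    perfdata = f"load1={load1:.2f};{warn:.2f};{crit:.2f}"
    return code, f"LOAD {label} - load1 is {load1:.2f} | {perfdata}"


def main(warn=4.0, crit=8.0):
    """Print the status line and return the exit code (thresholds are assumed)."""
    try:
        load1 = os.getloadavg()[0]  # available on Unix-like systems only
    except (OSError, AttributeError):
        print("LOAD UNKNOWN - load average unavailable")
        return UNKNOWN
    code, line = evaluate_load(load1, warn, crit)
    print(line)
    return code
```

To use this as a plugin, the script would end with `sys.exit(main())`. Keeping the threshold logic in `evaluate_load` makes it easy to test without touching the machine's real load.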
Finally, consider writing your own plugins for measurements unique to the system under test. Once you start doing this, you will discover all kinds of ways Nagios can help, not only with monitoring the test environment but with actually testing the application. For example, I once wrote a plugin that ran a database query to verify the number of records processed in the last 15 minutes. I set appropriate thresholds. Weeks later, I received a monitor email alerting me that no records had been processed in the last 15 minutes. Even though I was not testing that part of the system, I immediately knew we had a major issue with the latest build.
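A plugin along those lines might look something like the sketch below. It is not the plugin I wrote back then. It assumes a SQLite database with a hypothetical `records` table whose `processed_at` column holds UTC timestamps as text, and the thresholds are invented for the example. With a different database, you would swap in that database's driver and date arithmetic.

```python
import sqlite3

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_recent(conn, minutes=15):
    """Count rows processed within the last `minutes` minutes.

    Assumes a hypothetical table records(processed_at TEXT) with UTC
    timestamps in 'YYYY-MM-DD HH:MM:SS' form, as datetime('now') produces.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM records "
        "WHERE processed_at >= datetime('now', ?)",
        (f"-{minutes} minutes",),
    ).fetchone()
    return row[0]


def evaluate(count, warn, crit):
    """Lower counts are worse: alert when the count is at or below a threshold."""
    if count <= crit:
        code, label = CRITICAL, "CRITICAL"
    elif count <= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"RECORDS {label} - {count} processed | processed={count}"


def main(db_path, warn=10, crit=0):
    """Print the status line and return the exit code (thresholds are assumed)."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            count = count_recent(conn)
        finally:
            conn.close()
    except sqlite3.Error as exc:
        # A failed query says nothing about throughput, so report UNKNOWN.
        print(f"RECORDS UNKNOWN - query failed: {exc}")
        return UNKNOWN
    code, line = evaluate(count, warn, crit)
    print(line)
    return code
```

With `crit=0`, a window in which nothing was processed produces a CRITICAL result, which is the kind of alert that landed in my inbox. As a plugin, the script would finish with something like `sys.exit(main("/path/to/app.db"))`, where the path is a placeholder.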