OPNFV Barometer configuration Guide

Barometer Configuration Guide

This document provides guidelines on how to install and configure Barometer with Apex. The deployment script installs and enables a series of collectd plugins on the compute node(s), which collect and dispatch specific metrics and events from the platform.

Pre-configuration activities

Deploying the Barometer components in Apex is done through the deploy-opnfv command by selecting a scenario-file which contains the barometer: true option. These files are located on the Jump Host in the /etc/opnfv-apex/ folder. Two scenarios are pre-defined to include Barometer, and they are: os-nosdn-bar-ha.yaml and os-nosdn-bar-noha.yaml.

$ cd /etc/opnfv-apex
$ opnfv-deploy -d os-nosdn-bar-ha.yaml -n network_settings.yaml -i inventory.yaml –- debug

Hardware configuration

There’s no specific Hardware configuration required. However, the intel_rdt plugin works only on platforms with Intel CPUs.

Feature configuration

All Barometer plugins are automatically deployed on all compute nodes. There is no option to selectively install only a subset of plugins. Any custom disabling or configuration must be done directly on the compute node(s) after the deployment is completed.

Upgrading the plugins

The Barometer components are built-in in the Apex ISO image, and respectively the Apex RPMs. There is no simple way to update only the Barometer plugins in an existing deployment.

Barometer post installation procedures

This document describes briefly the methods of validating the Barometer installation.

Automated post installation activities

The Barometer test-suite in Functest is called barometercollectd and is part of the Features tier. Running these tests is done automatically by the OPNFV deployment pipeline on the supported scenarios. The testing consists of basic verifications that each plugin is functional per their default configurations. Inside the Functest container, the detailed results can be found in the /home/opnfv/functest/results/barometercollectd.log.

Barometer post configuration procedures

The functionality for each plugin (such as enabling/disabling and configuring its capabilities) is controlled as described in the User Guide through their individual .conf file located in the /etc/collectd/collectd.conf.d/ folder on the compute node(s). In order for any changes to take effect, the collectd service must be stopped and then started again.

Platform components validation

The following steps describe how to perform a simple “manual” testing of the Barometer components:

  1. Connect to any compute node and ensure that the collectd service is running. The log file collectd.log should contain no errors and should indicate that each plugin was successfully loaded. For example, from the Jump Host:

    $ opnfv-util overcloud compute0
    $ ls /etc/collectd/collectd.conf.d/
    $ systemctl status collectd
    $ vi /opt/stack/collectd.log
    

    The following plugings should be found loaded: aodh, gnocchi, hugepages, intel_rdt, mcelog, ovs_events, ovs_stats, snmp, virt

  2. On the compute node, induce an event monitored by the plugins; e.g. a corrected memory error:

    $ git clone https://git.kernel.org/pub/scm/utils/cpu/mce/mce-inject.git
    $ cd mce-inject
    $ make
    $ modprobe mce-inject
    

    Modify the test/corrected script to include the following:

    CPU 0 BANK 0
    STATUS 0xcc00008000010090
    ADDR 0x0010FFFFFFF
    

    Inject the error:

    $ ./mce-inject < test/corrected
    
  3. Connect to the controller and query the monitoring services. Make sure the overcloudrc.v3 file has been copied to the controller (from the undercloud VM or from the Jump Host) in order to be able to authenticate for OpenStack services.

    $ opnfv-util overcloud controller0
    $ su
    $ source overcloudrc.v3
    $ gnocchi metric list
    $ aodh alarm list
    

    The output for the gnocchi and aodh queries should be similar to the excerpts below:

    +--------------------------------------+---------------------+------------------------------------------------------------------------------------------------------------+-----------+-------------+
    | id                                   | archive_policy/name | name                                                                                                       | unit      | resource_id |
    +--------------------------------------+---------------------+------------------------------------------------------------------------------------------------------------+-----------+-------------+
      [...]
    | 0550d7c1-384f-4129-83bc-03321b6ba157 | high                | overcloud-novacompute-0.jf.intel.com-hugepages-mm-2048Kb@vmpage_number.free                                | Pages     | None        |
    | 0cf9f871-0473-4059-9497-1fea96e5d83a | high                | overcloud-novacompute-0.jf.intel.com-hugepages-node0-2048Kb@vmpage_number.free                             | Pages     | None        |
    | 0d56472e-99d2-4a64-8652-81b990cd177a | high                | overcloud-novacompute-0.jf.intel.com-hugepages-node1-1048576Kb@vmpage_number.used                          | Pages     | None        |
    | 0ed71a49-6913-4e57-a475-d30ca2e8c3d2 | high                | overcloud-novacompute-0.jf.intel.com-hugepages-mm-1048576Kb@vmpage_number.used                             | Pages     | None        |
    | 11c7be53-b2c1-4c0e-bad7-3152d82c6503 | high                | overcloud-novacompute-0.jf.intel.com-mcelog-                                                               | None      | None        |
    |                                      |                     | SOCKET_0_CHANNEL_any_DIMM_any@errors.uncorrected_memory_errors_in_24h                                      |           |             |
    | 120752d4-385e-4153-aed8-458598a2a0e0 | high                | overcloud-novacompute-0.jf.intel.com-cpu-24@cpu.interrupt                                                  | jiffies   | None        |
    | 1213161e-472e-4e1b-9e56-5c6ad1647c69 | high                | overcloud-novacompute-0.jf.intel.com-cpu-6@cpu.softirq                                                     | jiffies   | None        |
      [...]
    
    +--------------------------------------+-------+------------------------------------------------------------------+-------+----------+---------+
    | alarm_id                             | type  | name                                                             | state | severity | enabled |
    +--------------------------------------+-------+------------------------------------------------------------------+-------+----------+---------+
    | fbd06539-45dd-42c5-a991-5c5dbf679730 | event | gauge.memory_erros(overcloud-novacompute-0.jf.intel.com-mcelog)  | ok    | moderate | True    |
    | d73251a5-1c4e-4f16-bd3d-377dd1e8cdbe | event | gauge.mcelog_status(overcloud-novacompute-0.jf.intel.com-mcelog) | ok    | moderate | True    |
      [...]