9. Gaps Analysis & Innovation

9.1. Introduction

9.2. OPNFV Pharos Project

For this RI, the OPNFV Pharos specification has been chosen as labs with pods of servers already exist under OPNFV’s Lab as a Service (LaaS).

The Pharos Project deals with developing an OPNFV lab infrastructure that is geographically and technically diverse, which greatly assists in developing a highly robust and stable OPNFV platform. Community labs are hosted by individual companies, and there is also an OPNFV lab hosted by the Linux Foundation that has controlled access for key development and production activities. The Pharos Specification defines a “compliant” deployment and test environment. Pharos is responsible for defining lab capabilities, developing management and usage policies and processes, and providing a support plan for reliable access to project and release resources. Community labs are provided as a service by companies and are not controlled by Pharos; however, the goal is to provide easy visibility of all lab capabilities and their usage at all times.

A requirement of Pharos labs is to provide bare metal for development, deployment, and testing. This is resource intensive from a hardware and support perspective, and providing remote access can also be very challenging due to corporate IT policies. Achieving a consistent look and feel across a federated lab infrastructure continues to be an objective. Virtual environments are also useful and are provided by some labs. Jira is currently used for tracking lab operational issues as well as for Pharos project activities.

9.2.1. Pharos Specification

The Pharos Specification defines a hardware environment for deployment and testing of the OPNFV platform release.

Pharos lab infrastructure has the following objectives:

  • Provides secure, scalable, standard and HA environments for feature development

  • Supports the full Euphrates deployment lifecycle (this requires a bare-metal environment)

  • Supports functional and performance testing of the Euphrates release

  • Provides mechanisms and procedures for secure remote access to Pharos-compliant environments for the OPNFV community

  • Deploying OpenStack in a virtualized environment is possible and useful; however, it does not provide a fully featured deployment and realistic test environment for the Euphrates release of OPNFV.

A Pharos-compliant OPNFV test-bed provides:

  • One CentOS/Ubuntu jump server (Foundation Node) which can be used to perform the OpenStack RI installation, or host any additional software needed. This server may also participate in the OpenStack cluster if desired instead of acting as a dedicated services node.

  • 5 target nodes which can be used in any combination, such as:

    • 3 controller nodes + 2 compute/storage nodes

    • 1 controller node + 4 compute/storage nodes

    • 1 controller node + 1 compute node + 3 storage nodes

  • A configured network with the ability to provide the following networks:

    • Out-of-band Management: Used for access to the lights-out (iLO/IPMI/Redfish) network for the purpose of managing the bare-metal aspects of the servers, such as power control, BIOS configuration, etc. (see the sketch following this list).

    • External (DMZ): Used to provide VMs with Internet access. Directly accessible from the VPN.

    • Provisioning / In-band Management (Admin): Used to perform management operations on the hypervisor software for each node. Can also be used for bootstrapping images using PXE or other installation technologies.

    • API Access (Public): Exposes all OpenStack APIs, including the OpenStack Networking API, to tenants.

    • Tenant Transport (Private): Used for VM data communication within the cloud deployment. The IP addressing requirements of this network depend on the OpenStack Networking plug-in in use and the network configuration choices of the virtual networks made by the tenant.

    • Storage Access (Storage): Exposes SDS services to client read/write requests. This is the data path for access to the content of the storage nodes, and also doubles as the storage replication network to provide data replication between storage nodes.

    • OpenStack Management (Management): Used for internal communication between OpenStack Components.
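
As an illustration of the out-of-band management network described above, the following is a minimal sketch of power-cycling a bare-metal node through its BMC using the standard DMTF Redfish API. The BMC address, credentials, and system path are placeholders and will differ per lab; IPMI (e.g. via ipmitool) over the same network is an equally valid choice.

    # Minimal sketch: power-cycling a bare-metal node over the out-of-band
    # (lights-out) management network using the standard DMTF Redfish API.
    # The BMC address, credentials, and system path below are placeholders
    # and depend on the lab's own addressing plan and hardware vendor.
    import requests

    BMC = "https://192.0.2.10"          # BMC address on the out-of-band network (example)
    AUTH = ("admin", "password")        # lab-specific credentials
    SYSTEM = "/redfish/v1/Systems/1"    # system path varies by vendor

    def power_action(reset_type="ForceRestart"):
        """Issue a Redfish ComputerSystem.Reset action (e.g. On, ForceOff, ForceRestart)."""
        url = f"{BMC}{SYSTEM}/Actions/ComputerSystem.Reset"
        resp = requests.post(url, json={"ResetType": reset_type},
                             auth=AUTH, verify=False, timeout=30)
        resp.raise_for_status()
        return resp.status_code

    if __name__ == "__main__":
        print("Reset requested, HTTP status:", power_action("ForceRestart"))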

9.2.2. Hardware Specification

CPU:

  • Intel Xeon E5-2600v2 Series or newer

  • AArch64 (64-bit ARM architecture) compatible (ARMv8 or newer)

Firmware:

  • BIOS/EFI compatible for x86-family blades

  • EFI compatible for AArch64 blades

Local Storage:

The following describes the minimum for the Pharos specification, which is designed to provide enough capacity for a reasonably functional environment. Additional and/or faster disks are nice to have and may produce better results. A sketch for checking a node against these minimums follows the Memory subsection below.

  • Disks: 2 x 1TB HDD + 1 x 100GB SSD (or greater capacity)

  • The first HDD should be used for OS & additional software/tool installation

  • The second HDD is configured for a Ceph OSD

  • The SSD should be used as the Ceph journal

  • Virtual ISO boot capabilities or a separate PXE boot server (DHCP/TFTP or Cobbler)

Memory:

  • 32 GB RAM minimum
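
The following is a minimal sketch, assuming a Linux-based node, of how the RAM and disk minimums above could be verified before a node is added to a pod. The thresholds and the HDD/SSD heuristic (the kernel's rotational flag) are illustrative only.

    # Minimal sketch: checking a target node against the Pharos hardware minimums
    # (>= 32 GB RAM, 2 x 1TB HDD + 1 x 100GB SSD). Linux-only; the thresholds and
    # the HDD/SSD classification via the rotational flag are illustrative.
    import os

    MIN_RAM_GB = 32

    def ram_gb():
        """Total memory in GB, read from /proc/meminfo (reported in kB)."""
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    return int(line.split()[1]) / (1024 * 1024)
        return 0.0

    def block_devices():
        """List (device, HDD/SSD, size in GB) for all non-virtual block devices."""
        devices = []
        for dev in os.listdir("/sys/block"):
            if dev.startswith(("loop", "ram")):
                continue
            with open(f"/sys/block/{dev}/queue/rotational") as f:
                rotational = f.read().strip() == "1"
            with open(f"/sys/block/{dev}/size") as f:
                size_gb = int(f.read()) * 512 / 1e9  # size is in 512-byte sectors
            devices.append((dev, "HDD" if rotational else "SSD", round(size_gb)))
        return devices

    if __name__ == "__main__":
        print(f"RAM: {ram_gb():.1f} GB (minimum {MIN_RAM_GB} GB)")
        for name, kind, size in block_devices():
            print(f"{name}: {kind}, {size} GB")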

9.2.3. Network Specification

Network Hardware

  • 24- or 48-port ToR switch

  • NICs: a combination of 1GE and 10GE based on the network topology option chosen (per server, either on-board or PCIe)

  • Connectivity for each data/control network is through a separate NIC. This simplifies switch management but requires more NICs on the server as well as more switch ports

  • BMC (Baseboard Management Controller) for the lights-out management network using IPMI (Intelligent Platform Management Interface)

Network Options

  • Option I: 4x1G Control, 2x10G Data, 48 Port Switch

    • 1 x 1G for lights-out Management

    • 1 x 1G for Admin/PXE boot

    • 1 x 1G for control-plane connectivity

    • 1 x 1G for storage

    • 2 x 10G for data network (redundancy, NIC bonding, High bandwidth testing)

  • Option II: 1x1G Control, 2x10G Data, 24 Port Switch

    • Connectivity to networks is through VLANs on the Control NIC

    • Data NIC used for VNF traffic and storage traffic segmented through VLANs

  • Option III: 2x1G Control, 2x10G Data and Storage, 24 Port Switch

    • Data NIC used for VNF traffic

    • Storage NIC used for control-plane and storage traffic, segmented through VLANs (separates host traffic from VNF traffic)

    • 1 x 1G for lights-out management

    • 1 x 1G for Admin/PXE boot

    • 2 x 10G for control-plane connectivity/storage

    • 2 x 10G for data network

For this RI, Option III has been chosen.
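
As a non-authoritative illustration of Option III, the wiring plan can be captured as data so that installers and validation scripts consume a single source of truth. The NIC names and VLAN IDs below are placeholders, and bonding of the 10G pairs is elided for brevity.

    # Illustrative sketch only: recording the Option III wiring plan as data.
    # NIC names and VLAN IDs are placeholders; actual values are lab-specific,
    # and the redundant 10G links (NIC bonding) are omitted for brevity.
    OPTION_III = {
        "oob_management": {"nic": "bmc0",   "speed": "1G",  "vlan": None},  # lights-out / IPMI
        "admin_pxe":      {"nic": "eno1",   "speed": "1G",  "vlan": None},  # provisioning / PXE boot
        "control_plane":  {"nic": "ens1f0", "speed": "10G", "vlan": 100},   # OpenStack management/API
        "storage":        {"nic": "ens1f0", "speed": "10G", "vlan": 200},   # shares the control/storage NIC
        "tenant_data":    {"nic": "ens1f1", "speed": "10G", "vlan": 300},   # VNF traffic
    }

    def vlans_per_nic(plan):
        """Group the networks and VLANs carried by each physical NIC (useful for switch-port config)."""
        result = {}
        for net, cfg in plan.items():
            result.setdefault(cfg["nic"], []).append((net, cfg["vlan"]))
        return result

    if __name__ == "__main__":
        for nic, nets in vlans_per_nic(OPTION_III).items():
            print(nic, "->", nets)

Rendering such a structure into the actual host network configuration (e.g. netplan or ifcfg files) is left to whichever installer the lab uses.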

9.3. NFR Considerations

Additional environmental specifications need to be considered when performing Non-Functional Requirement (NFR) testing, which includes performance, resiliency, and scalability, amongst other test categories not addressed through functional testing. Refer to What is Non Functional Testing? for information and examples of the various types of NFR testing.

The rationale for reviewing and documenting environmental needs and specifications is that NFR-type testing introduces traffic, chaos, or instability (e.g. impulse, spike, long-duration, etc.) to the environment; if the environment is not sized properly, or does not contain sufficiently robust equipment, the test results will be non-deterministic or unreliable.

Examples of measurements and/or test scenarios that NFR test tooling needs to support, or remain stable under while executing, include:

  • Average (packet) Drop Rate

  • Average Latency

  • Execution of different frame size, packet path, or chain count

  • Testing single switch (VNF) packet path

  • Testing chained-switch (VNF) packet paths

  • Support for SR-IOV, OVS-DPDK, and VLAN configurations and/or options

Note: when it comes to NFR tooling, the goal is to provide lightweight solutions that can be packaged within a cookbook to accelerate lab validation, agnostic to the type of hardware in the environment. This will enable third-party suppliers to achieve compliance expectations for the targeted architecture.
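
For illustration, the basic rate and latency measurements listed above can be derived from raw traffic-generator counters as in the following sketch. The input field names are assumptions; real tools such as TRex or NFVBench expose their own result schemas.

    # Minimal sketch (illustrative only): deriving the average drop rate and
    # average latency from raw traffic-generator counters. The inputs are
    # assumptions; actual tools report results in their own formats.
    def average_drop_rate(tx_packets, rx_packets):
        """Fraction of transmitted packets that never arrived."""
        if tx_packets == 0:
            return 0.0
        return (tx_packets - rx_packets) / tx_packets

    def average_latency_us(latency_samples_us):
        """Mean latency over the collected samples, in microseconds."""
        if not latency_samples_us:
            return None
        return sum(latency_samples_us) / len(latency_samples_us)

    if __name__ == "__main__":
        tx, rx = 1_000_000, 998_500                 # example counters
        samples = [42.0, 45.5, 41.2, 60.3]          # example per-packet latencies (us)
        print(f"drop rate: {average_drop_rate(tx, rx):.4%}")
        print(f"avg latency: {average_latency_us(samples):.1f} us")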

9.3.1. Traffic Generators & NIC

Performance or load testing may require specific NICs to achieve the desired throughput (TPS, kbps, etc.) in order to properly validate an instance type's (e.g. Basic (B)) stability when subjected to traffic.

For example, the OPNFV project NFVBench utilizes the TRex traffic generator. While TRex offers stateful and stateless testing, achieves 200-400 Gb/sec, and captures latency/jitter measurements, achieving optimal results depends on the type of NICs utilized:

  • Recommended NICs to utilize when adopting the NFVBench project (& TRex) include Intel X710 (10G), XXV710 (25G) and XL710 (40G).

Inferior interface cards may result in unexpected or degraded performance, issues, or capabilities.

When planning NFR test scenarios, the engineer needs to document the following, which ensures the planned traffic generator and target environment are satisfactory for the type(s) of test to be performed and the measurements to be collected:

  • Desired throughput levels

  • Analysis and confirmation that the throughput can be achieved with the traffic generators chosen (see the line-rate sketch following this list)

  • Confirmation that the infrastructure (e.g. NIC, rack, switch, etc.) is engineered to support the target load/traffic levels, unless testing to break or saturation points

  • If testing to break points, then the engineering specifications of supported load levels are to be documented, with an understanding that a ceiling may be reached within the traffic generators before saturation of the infrastructure.
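
The following sketch, referenced in the list above, estimates the theoretical packet rate of an Ethernet link for a given frame size so that the desired throughput can be checked against the rated capability of the chosen traffic generator and NICs before tests are scheduled. The per-frame overhead is standard Ethernet framing; the link speed and frame sizes are examples.

    # Minimal sketch: theoretical maximum packets per second at line rate for a
    # given frame size. Per-frame overhead is 20 bytes (preamble + SFD + inter-
    # frame gap); link speeds and frame sizes below are illustrative examples.
    ETH_OVERHEAD_BYTES = 20  # 7B preamble + 1B SFD + 12B inter-frame gap

    def max_pps(link_bps, frame_bytes):
        """Theoretical maximum packets per second at line rate."""
        bits_per_frame = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
        return link_bps / bits_per_frame

    if __name__ == "__main__":
        for frame in (64, 512, 1518):
            print(f"{frame:>5} B frames on 10GE: {max_pps(10e9, frame) / 1e6:6.2f} Mpps")
        # e.g. 64 B frames on 10GE -> ~14.88 Mpps; if full line rate at small frame
        # sizes is the target, the generator NICs (e.g. X710/XL710) must sustain this.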