Introduction
============

Overview
--------

This Reference Architecture is focused on OpenStack as the Virtualised
Infrastructure Manager (VIM), chosen based on the criteria laid out in the
Cloud Infrastructure Reference Model :cite:p:`refmodel` (referred to as the
"Reference Model" or "RM" in this document).

OpenStack :cite:p:`openstack` has the advantage of being a mature and widely
accepted open-source technology, with a strong ecosystem of vendors that
support it and the OpenInfra Foundation managing the community. Most
importantly, it is widely deployed by the global operator community for both
internal infrastructure and external-facing products and services. This means
that resources with the right skill sets to support a Cloud Infrastructure
(or Network Function Virtualisation Infrastructure, NFVI
:cite:p:`etsinfvinf`) are available. Another reason to choose OpenStack is
its large, active community of vendors and operators: any code or component
changes needed to support the Common Telco Cloud Infrastructure requirements
can be managed through the existing project communities' well-established
processes for adding and validating the required features.

Vision
~~~~~~

This Reference Architecture specifies an OpenStack-based Cloud Infrastructure
for hosting NFV workloads, primarily VNFs (Virtual Network Functions). The
Reference Architecture document can be used by operators to deploy Anuket
conformant infrastructure; hereafter, "conformant" denotes that the resource
can satisfy tests conducted to verify conformance with this Reference
Architecture.

Use Cases
---------

Several NFV use cases are documented in OpenStack. For more examples and
details, refer to the OpenStack Use cases :cite:p:`openstackuc`. Examples
include:

- **Overlay networks**: The overlay functionality design includes OpenStack
  Networking in Open vSwitch :cite:p:`ovs` GRE tunnel mode. In this case, the
  layer-3 external routers are paired using VRRP, and the switches are paired
  using an implementation of MLAG, to ensure that you do not lose
  connectivity with the upstream routing infrastructure.

- **Performance tuning**: Network-level tuning for this workload is minimal.
  Quality of Service (QoS) applies a middle-ground Class Selector to these
  workloads, depending on existing policies: higher than a best-effort queue,
  but lower than an Expedited Forwarding or Assured Forwarding queue. Since
  this type of application generates larger packets with longer-lived
  connections, you can optimise bandwidth utilisation for long-duration TCP
  sessions. Normal bandwidth planning applies here: benchmark a session's
  usage and multiply it by the expected number of concurrent sessions, with
  overhead.

- **Network functions**: Software components that support the exchange of
  information (data, voice, multimedia) over a system's network. Some of
  these workloads tend to consist of a large number of small packets that are
  short-lived, such as DNS queries or SNMP traps. These messages need to
  arrive quickly and, thus, cannot tolerate packet loss. Network function
  workloads have requirements that may affect configurations, including at
  the hypervisor level. For an application that generates 10 TCP sessions per
  user, with an average bandwidth of 512 kilobytes per second per user across
  those flows, and an expected count of ten thousand (10,000) concurrent
  users, the expected bandwidth plan is approximately 4.88 gigabytes per
  second (512 KiB/s × 10,000 users ≈ 4.88 GiB/s).

  The supporting network for this type of configuration needs to have low
  latency and an evenly distributed load across the topology. These types of
  workload benefit from having services local to the consumers of the
  service. Thus, use a multi-site approach, as well as deploying many copies
  of the application, to handle load as close as possible to the consumers.
  Since these applications function independently, they do not warrant
  running overlays to interconnect tenant networks. Overlays also have the
  drawback of performing poorly with rapid flow setup and may incur too much
  overhead with large quantities of small packets; we therefore do not
  recommend them. QoS is desirable for some workloads to ensure delivery. DNS
  has a major impact on the load times of other services and needs to be
  reliable and provide rapid responses. Configure rules in upstream devices
  to apply a higher Class Selector to DNS to ensure faster delivery or a
  better spot in queuing algorithms; a sketch of the equivalent policy at the
  OpenStack layer follows this list.
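The DSCP treatment described above can also be expressed inside the cloud
itself: Neutron's QoS extension supports DSCP marking rules on virtual ports.
The following is a minimal sketch using the openstacksdk Python library; it
assumes the QoS extension is enabled and a ``clouds.yaml`` entry named
``mycloud`` exists, and the policy name, DSCP value, and port ID are purely
illustrative rather than mandated by this architecture:

.. code-block:: python

   import openstack

   # Connect using credentials from a clouds.yaml entry
   # ("mycloud" is an illustrative name).
   conn = openstack.connect(cloud="mycloud")

   # Create a QoS policy for latency-sensitive DNS traffic.
   policy = conn.network.create_qos_policy(
       name="dns-priority",
       description="Higher Class Selector for DNS traffic",
   )

   # Mark packets with DSCP 24 (Class Selector 3): above best effort,
   # but below Expedited or Assured Forwarding.
   conn.network.create_qos_dscp_marking_rule(policy, dscp_mark=24)

   # Attach the policy to the port serving DNS (ID is a placeholder).
   conn.network.update_port("DNS_PORT_ID", qos_policy_id=policy.id)

Marking at the virtual port complements, rather than replaces, the queuing
configuration in the upstream devices.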
OpenStack Reference Release
---------------------------

This Reference Architecture document conforms to the OpenStack Wallaby
:cite:p:`wallaby` release. While many features and capabilities are
conformant with multiple OpenStack releases, this document refers to
features, capabilities, and APIs that are part of the OpenStack Wallaby
release. For ease, this version of the Reference Architecture document can be
referred to as "RA-1 OSTK Wallaby".

Principles
----------

Architectural principles
~~~~~~~~~~~~~~~~~~~~~~~~

This Reference Architecture for OpenStack-based Cloud Infrastructure must
obey the following set of architectural principles:

#. **Open-source preference:** To ensure, by building on technology
   available in open-source projects, that suppliers' and operators'
   investments have a tangible pathway towards a standard and
   production-ready Cloud Infrastructure solution portfolio.
#. **Open APIs:** To enable interoperability and component substitution, and
   to minimise integration efforts, by using openly published API
   definitions (see the sketch following this list).
#. **Separation of concerns:** To promote lifecycle independence of
   different architectural layers and modules (for example, disaggregation
   of software from hardware).
#. **Automated lifecycle management:** To minimise the costs of the
   end-to-end lifecycle and maintenance downtime (with a target of zero
   downtime), and to avoid the errors and discrepancies that result from
   manual processes.
#. **Automated scalability:** To minimise costs and operational impacts
   through automated, policy-driven horizontal scaling of workloads.
#. **Automated closed-loop assurance:** To minimise operational costs and
   simplify Cloud Infrastructure platform operations by using automated
   fault resolution and performance optimisation.
#. **Cloud nativeness:** To optimise the utilisation of resources and enable
   operational efficiencies.
#. **Security compliance:** To ensure that the architecture follows industry
   best security practices and is, at all levels, compliant with the
   relevant security regulations.
#. **Resilience and Availability:** To allow high availability and
   resilience for the hosted VNFs, and to avoid single points of failure.
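As one concrete reading of the **Open APIs** principle, a conformant cloud
can be driven entirely through the openly published OpenStack APIs, with no
vendor-specific tooling. A minimal sketch using the openstacksdk Python
library, assuming a ``clouds.yaml`` entry named ``mycloud`` and credentials
permitted to read the service catalog:

.. code-block:: python

   import openstack

   # Standard configuration (clouds.yaml) plus the published APIs only.
   conn = openstack.connect(cloud="mycloud")

   # Enumerate the services the cloud advertises through Keystone;
   # listing services typically requires an administrative role.
   for service in conn.identity.services():
       print(service.type, service.name)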
OpenStack specific principles
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

OpenStack considers the following Four Opens essential for success:

- Open Source
- Open Design
- Open Development
- Open Community

This OpenStack Reference Architecture is organised around the three major
Cloud Infrastructure resource types as core services (compute, storage, and
networking) and a set of shared services (identity management, image
management, graphical user interface, orchestration engine, etc.).

Document Organisation
---------------------

Chapter 2 defines the Reference Architecture requirements and, when
appropriate, provides references to where these requirements are addressed in
this document. The intent of this document is to address all of the mandatory
("must") requirements and the most useful of the other, optional ("should")
requirements. Chapters 3 and 4 cover the Cloud Infrastructure resources and
the core OpenStack services, while the APIs are covered in Chapter 5.
Chapter 6 covers the implementation and enforcement of security capabilities
and controls. Life cycle management of the Cloud Infrastructure and VIM is
covered in Chapter 7, with emphasis on Logging, Monitoring and Analytics
(LMA), configuration management, and some other operational items. Please
note that Chapter 7 is not a replacement for the implementation,
configuration, and operational documentation that accompanies the different
OpenStack distributions. Chapter 8 addresses conformance: it provides an
automated validation mechanism to test the conformance of a deployed cloud
infrastructure to this Reference Architecture. Finally, Chapter 9 identifies
certain gaps that currently exist and plans on how to address them (for
example, resource autoscaling).

Terminology
-----------

**Abstraction:** The process of removing concrete, fine-grained or
lower-level details or attributes, or common properties, in the study of
systems in order to focus attention on topics of greater importance or on
general concepts. It can be the result of decoupling.

**Anuket:** An LFN open-source project developing open reference
infrastructure models, architectures, tools, and programs.

**Cloud Infrastructure:** A generic term covering **NFVI**, **IaaS** and
**CaaS** capabilities - essentially the infrastructure on which a
**Workload** can be executed. **NFVI**, **IaaS** and **CaaS** layers can be
built on top of each other. In the case of CaaS, some cloud infrastructure
features (e.g., hardware management or multi-tenancy) are implemented by
using an underlying **IaaS** layer.

**Cloud Infrastructure Hardware Profile:** Defines the behaviour,
capabilities, configuration, and metrics provided by the cloud
infrastructure hardware layer resources available to the workloads.

**Host Profile:** Another term for a Cloud Infrastructure Hardware Profile.

**Cloud Infrastructure Profile:** The combination of the Cloud
Infrastructure Software Profile and the Cloud Infrastructure Hardware
Profile that defines the capabilities and configuration of the Cloud
Infrastructure resources available to the workloads.

**Cloud Infrastructure Software Profile:** Defines the behaviour,
capabilities, and metrics provided by a Cloud Infrastructure Software Layer
on the resources available to the workloads.

**Cloud Native Network Function (CNF):** A cloud native application that
implements network functionality. A CNF consists of one or more
microservices. All layers of a CNF are developed using Cloud Native
Principles, including immutable infrastructure, declarative APIs, and a
"repeatable deployment process". This definition is derived from the Cloud
Native Thinking for Telecommunications Whitepaper, which also includes
further detail and examples.
**Compute Node:** An abstract definition of a server. A compute node can
refer to a set of hardware and software that supports the VMs or Containers
running on it.

**Container:** A lightweight and portable executable image that contains
software and all of its dependencies. OCI defines a **Container** as "An
environment for executing processes with configurable isolation and resource
limitations. For example, namespaces, resource limits, and mounts are all
part of the container environment." A **Container** provides
operating-system-level virtualisation by abstracting the "user space". One
big difference between **Containers** and **VMs** is that, unlike VMs, where
each **VM** is self-contained with all of the operating system components
within the **VM** package, containers "share" the host system's kernel with
other containers.

**Container Image:** Stored instance of a container that holds a set of
software needed to run an application.

**Core (physical):** An independent computer processing unit that can
independently execute CPU instructions and is integrated with other cores on
a multiprocessor (chip, integrated circuit die). Please note that the
multiprocessor chip is also referred to as a CPU, and is placed in a socket
of a computer motherboard.

**CPU Type:** A classification of CPUs by the features needed for the
execution of computer programs; for example, instruction sets, cache size,
and number of cores.

**Decoupling, Loose Coupling:** A loosely coupled system is one in which each
of its components has, or makes use of, little or no knowledge of the
implementation details of other, separate components. Loose coupling is the
opposite of tight coupling.

**Encapsulation:** The restricting of direct access to some of an object's
components.

**External Network:** External networks provide network connectivity for a
cloud infrastructure tenant to resources outside of the tenant space.

**Fluentd:** An open-source data collector providing a unified logging
layer, which allows data collection and consumption for better use and
understanding of data. **Fluentd** is a CNCF graduated project.

**Functest:** An open-source project that is part of the Anuket LFN project.
It addresses functional testing with a collection of state-of-the-art
virtual infrastructure test suites, including automatic VNF testing.

**Hardware resources:** The compute, storage, and network hardware resources
on which the cloud infrastructure platform software, virtual machines, and
containers run.

**Huge pages:** Physical memory is partitioned and accessed using the basic
page unit (the default size in Linux is 4 KB). Huge pages, typically 2 MB or
1 GB in size, allow large amounts of memory to be utilised with reduced
overhead. In an NFV environment, huge pages are critical to support large
memory pool allocation for data packet buffers. This results in fewer
Translation Lookaside Buffer (TLB) lookups, which reduces the number of
virtual-to-physical page address translations. Without huge pages enabled,
high TLB miss rates would occur, thereby degrading performance.

**Hypervisor:** Software that abstracts and isolates workloads, with their
own operating systems, from the underlying physical resources. Also known as
a virtual machine monitor (VMM).
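To make the **Huge pages** entry above concrete at the VIM level: Nova
exposes guest huge-page allocation through the ``hw:mem_page_size`` flavor
extra spec. A minimal sketch using the openstacksdk Python library follows;
the cloud entry, flavor name, and sizing are illustrative assumptions rather
than requirements of this architecture:

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud="mycloud")  # illustrative cloud entry

   # Define a flavor sized for a packet-processing workload.
   flavor = conn.compute.create_flavor(
       name="m1.hugepages", ram=8192, vcpus=4, disk=40
   )

   # Request 1 GiB huge pages for guests booted with this flavor
   # (valid values include "small", "large", "2MB", and "1GB").
   conn.compute.create_flavor_extra_specs(
       flavor, {"hw:mem_page_size": "1GB"}
   )

Instances booted with such a flavor draw their memory from the host's
pre-allocated huge-page pool, avoiding the TLB pressure described above.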
**Instance:** A virtual compute resource, in a known state such as running
or suspended, that can be used like a physical server. The term can be used
to specify a VM Instance or a Container Instance.

**Kibana:** An open-source data visualisation system.

**Kubernetes:** An open-source system for automating the deployment,
scaling, and management of containerised applications.

**Monitoring (Capability):** Monitoring capabilities are used for the
passive observation of workload-specific traffic traversing the Cloud
Infrastructure. Note that, as with all capabilities, Monitoring may be
unavailable or intentionally disabled for security reasons in a given cloud
infrastructure instance.

**Multi-tenancy:** A feature whereby physical, virtual, or service resources
are allocated in such a way that multiple tenants and their computations and
data are isolated from, and inaccessible by, each other.

**Network Function (NF):** A functional block or application that has
well-defined external interfaces and well-defined functional behaviour.
Within **NFV**, a **Network Function** is implemented in the form of a
**Virtualised NF** (VNF) or a **Cloud Native NF** (CNF).

**NFV Orchestrator (NFVO):** Manages the VNF lifecycle and **Cloud
Infrastructure** resources (supported by the **VIM**) to ensure an optimised
allocation of the necessary resources and connectivity.

**Network Function Virtualisation (NFV):** The concept of separating network
functions from the hardware they run on by using a virtual hardware
abstraction layer.

**Network Function Virtualisation Infrastructure (NFVI):** The totality of
all hardware and software components used to build the environment in which
a set of virtual applications (VAs) is deployed; also referred to as cloud
infrastructure. The NFVI can span several locations, e.g., places where data
centres or edge nodes are operated. The network providing connectivity
between these locations is regarded as part of the cloud infrastructure.
**NFVI** and **VNF** are the top-level conceptual entities in the scope of
Network Function Virtualisation. All other components are sub-entities of
these two main entities.

**Network Service (NS):** A composition of **Network Function**\ (s) and/or
**Network Service**\ (s), defined by its functional and behavioural
specification, including the service lifecycle.

**Open Network Automation Platform (ONAP):** An LFN project developing a
comprehensive platform for the orchestration, management, and automation of
network and edge computing services for network operators, cloud providers,
and enterprises.

**ONAP OpenLab:** The ONAP community lab.

**Open Platform for NFV (OPNFV):** A collaborative project under the Linux
Foundation. OPNFV is now part of the LFN Anuket project. It aims to
implement, test, and deploy tools for the conformance and performance of NFV
infrastructure.

**OPNFV Verification Program (OVP):** An open-source, community-led
compliance and verification program aiming to demonstrate the readiness and
availability of commercial NFV products and services using OPNFV and ONAP
components.

**Platform:** A cloud capabilities type in which the cloud service user can
deploy, manage, and run customer-created or customer-acquired applications
using one or more programming languages and one or more execution
environments supported by the cloud service provider. Adapted from ITU-T
Y.3500. This includes the physical infrastructure, operating systems,
virtualisation/containerisation software, and other orchestration, security,
monitoring/logging, and life-cycle management software.
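At the VIM level, the **Multi-tenancy** isolation described above is
typically realised by mapping each tenant to a Keystone project. A minimal
sketch with the openstacksdk Python library; all names and the password are
illustrative placeholders:

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud="mycloud")  # illustrative cloud entry

   # One Keystone project per tenant; resources created inside it are
   # isolated from other projects.
   project = conn.identity.create_project(
       name="tenant-a", description="Isolated tenant workspace"
   )
   user = conn.identity.create_user(
       name="tenant-a-user", password="CHANGE_ME"
   )

   # Scope the user's role to this project only; "member" is the
   # default role name in many deployments.
   role = conn.identity.find_role("member")
   conn.identity.assign_project_role_to_user(project, user, role)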
**Prometheus:** An open-source monitoring and alerting system.

**Quota:** An imposed upper limit on specific types of resources, usually
used to prevent excessive resource consumption by a given consumer (tenant,
VM, container).

**Resource pool:** A logical grouping of cloud infrastructure hardware and
software resources. A resource pool can be based on a certain resource type
(for example, compute, storage, or network) or a combination of resource
types. A **Cloud Infrastructure** resource can be part of none, one, or more
resource pools.

**Simultaneous Multithreading (SMT):** A technique for improving the overall
efficiency of superscalar CPUs with hardware multithreading. SMT permits
multiple independent threads of execution on a single core to better utilise
the resources provided by modern processor architectures.

**Shaker:** A distributed data-plane testing tool built for OpenStack.

**Software Defined Storage (SDS):** An architecture which consists of
storage software that is independent from the underlying storage hardware.
The storage access software provides data request interfaces (APIs) and the
SDS controller software provides storage access services and networking.

**Tenant:** Cloud service users sharing access to a set of physical and
virtual resources (ITU-T Y.3500). Tenants represent an independently
manageable logical pool of compute, storage, and network resources
abstracted from the physical hardware.

**Tenant Instance:** An Instance owned by, or dedicated for use by, a single
**Tenant**.

**Tenant (Internal) Networks:** Virtual networks that are internal to
**Tenant Instances**.

**User:** A natural person, or an entity acting on their behalf, associated
with a cloud service customer that uses cloud services. Examples of such
entities include devices and applications.

**Virtual CPU (vCPU):** Represents a portion of the host's computing
resources allocated to a virtualised resource, for example, to a virtual
machine or a container. One or more vCPUs can be assigned to a virtualised
resource.

**Virtualised Infrastructure Manager (VIM):** Responsible for controlling
and managing the Network Function Virtualisation Infrastructure (NFVI)
compute, storage, and network resources.

**Virtual Machine (VM):** A virtualised computation environment that behaves
like a physical computer/server. A **VM** consists of all of the components
(processor (CPU), memory, storage, interfaces/ports, etc.) of a physical
computer/server. It is created using sizing information or a Compute
Flavour.

**Virtualised Network Function (VNF):** A software implementation of a
Network Function, capable of running on the Cloud Infrastructure. **VNFs**
are built from one or more VNF Components (VNFCs) and, in most cases, the
VNFC is hosted on a single VM or Container.

**Virtual Compute resource (a.k.a. virtualisation container):** A partition
of a compute node that provides an isolated virtualised computation
environment.

**Virtual Storage resource:** Virtualised non-volatile storage allocated to
a virtualised computation environment hosting a **VNFC**.

**Virtual Networking resource:** Routes information among the network
interfaces of a virtual compute resource and the physical network
interfaces, providing the necessary connectivity.

**VMTP:** A data path performance measurement tool built specifically for
OpenStack clouds.

**Workload:** An application (for example, a **VNF** or a **CNF**) that
performs certain task(s) for the users. In the Cloud Infrastructure, these
applications run on top of compute resources such as **VMs** or
**Containers**.
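Following the **Quota** definition above, a sketch of how per-project upper
limits might be imposed through the OpenStack APIs, using the openstacksdk
Python library; the project name and the values are illustrative:

.. code-block:: python

   import openstack

   conn = openstack.connect(cloud="mycloud")  # illustrative cloud entry

   # Cap compute consumption for the project (RAM is in MiB).
   conn.set_compute_quotas("tenant-a", cores=64, ram=131072, instances=20)

   # Network-side ceilings go through the Neutron quota API.
   project = conn.identity.find_project("tenant-a")
   conn.network.update_quota(project.id, floating_ips=10, networks=5)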
Abbreviations
-------------

.. list-table::
   :widths: 20 60
   :header-rows: 1

   * - Abbreviation/Acronym
     - Definition
   * - API
     - Application Programming Interface
   * - BGP VPN
     - Border Gateway Protocol Virtual Private Network
   * - CI/CD
     - Continuous Integration/Continuous Deployment
   * - CNTT
     - Cloud iNfrastructure Telco Taskforce
   * - CPU
     - Central Processing Unit
   * - DNS
     - Domain Name System
   * - DPDK
     - Data Plane Development Kit
   * - DHCP
     - Dynamic Host Configuration Protocol
   * - ECMP
     - Equal Cost Multi-Path routing
   * - ETSI
     - European Telecommunications Standards Institute
   * - FPGA
     - Field Programmable Gate Array
   * - MB/GB/TB
     - MegaByte/GigaByte/TeraByte
   * - GPU
     - Graphics Processing Unit
   * - GRE
     - Generic Routing Encapsulation
   * - GSM
     - Global System for Mobile Communications (originally Groupe Spécial Mobile)
   * - GSMA
     - GSM Association
   * - GSLB
     - Global Service Load Balancer
   * - GUI
     - Graphical User Interface
   * - HA
     - High Availability
   * - HDD
     - Hard Disk Drive
   * - HTTP
     - HyperText Transfer Protocol
   * - HW
     - Hardware
   * - IaaC (also IaC)
     - Infrastructure as Code
   * - IaaS
     - Infrastructure as a Service
   * - ICMP
     - Internet Control Message Protocol
   * - IMS
     - IP Multimedia Subsystem
   * - IO
     - Input/Output
   * - IOPS
     - Input/Output Operations per Second
   * - IPMI
     - Intelligent Platform Management Interface
   * - KVM
     - Kernel-based Virtual Machine
   * - LCM
     - LifeCycle Management
   * - LDAP
     - Lightweight Directory Access Protocol
   * - LFN
     - Linux Foundation Networking
   * - LMA
     - Logging, Monitoring and Analytics
   * - LVM
     - Logical Volume Management
   * - MANO
     - Management ANd Orchestration
   * - MLAG
     - Multi-chassis Link Aggregation Group
   * - NAT
     - Network Address Translation
   * - NFS
     - Network File System
   * - NFV
     - Network Function Virtualisation
   * - NFVI
     - Network Function Virtualisation Infrastructure
   * - NIC
     - Network Interface Card
   * - NPU
     - Numeric Processing Unit
   * - NTP
     - Network Time Protocol
   * - NUMA
     - Non-Uniform Memory Access
   * - OAI
     - Open Air Interface
   * - OS
     - Operating System
   * - OSTK
     - OpenStack
   * - OPNFV
     - Open Platform for NFV
   * - OVS
     - Open vSwitch
   * - OWASP
     - Open Web Application Security Project
   * - PCIe
     - Peripheral Component Interconnect Express
   * - PCI-PT
     - PCIe PassThrough
   * - PXE
     - Preboot Execution Environment
   * - QoS
     - Quality of Service
   * - RA
     - Reference Architecture
   * - RA-1
     - Reference Architecture 1 (i.e., Reference Architecture for OpenStack-based Cloud Infrastructure)
   * - RBAC
     - Role-Based Access Control
   * - RBD
     - RADOS Block Device
   * - REST
     - Representational State Transfer
   * - RI
     - Reference Implementation
   * - RM
     - Reference Model
   * - SAST
     - Static Application Security Testing
   * - SDN
     - Software Defined Networking
   * - SFC
     - Service Function Chaining
   * - SG
     - Security Group
   * - SLA
     - Service Level Agreement
   * - SMP
     - Symmetric MultiProcessing
   * - SMT
     - Simultaneous MultiThreading
   * - SNAT
     - Source Network Address Translation
   * - SNMP
     - Simple Network Management Protocol
   * - SR-IOV
     - Single Root Input/Output Virtualisation
   * - SSD
     - Solid State Drive
   * - SSL
     - Secure Sockets Layer
   * - SUT
     - System Under Test
   * - TCP
     - Transmission Control Protocol
   * - TLS
     - Transport Layer Security
   * - ToR
     - Top of Rack
   * - TPM
     - Trusted Platform Module
   * - UDP
     - User Datagram Protocol
   * - VIM
     - Virtualised Infrastructure Manager
   * - VLAN
     - Virtual LAN
   * - VM
     - Virtual Machine
   * - VNF
     - Virtual Network Function
   * - VRRP
     - Virtual Router Redundancy Protocol
   * - VTEP
     - VXLAN Tunnel End Point
   * - VXLAN
     - Virtual Extensible LAN
   * - WAN
     - Wide Area Network
   * - ZTA
     - Zero Trust Architecture
Conventions
-----------

The key words "must", "must not", "required", "shall", "shall not",
"should", "should not", "recommended", "may", and "optional" in this
document are to be interpreted as described in RFC 2119 :cite:p:`rfc2119`.