2.0 Hardware HA
Hardware HA can be addressed by several legacy HA schemes. However, in NFV
scenarios, a hardware failure causes collateral damage not only to the
services but also to the virtual infrastructure running on that hardware.
A redundant architecture and automatic failover for the hardware are required
for the NFV scenario. At the same time, hardware failures must be detected and
reported from the hardware to the VIM, the VNFM and, if necessary, the
Orchestrator to achieve HA in OPNFV. A sample fault table can be found in the
Doctor project (https://wiki.opnfv.org/doctor/faults).
All the critical hardware failures should be reported to the VIM within 1s.
Other warnings for the hardware should also be reported to the VIM in a
timely manner.
General Requirements:
- Hardware Failures should be reported to the hypervisor and the VIM.
- Hardware Failures should not be directly reported to the VNF as in the traditional ATCA
architecture.
- Hardware failure detection messages should be sent to the VIM within a specified period of time,
based on the SAL as defined in Section 1.
- Alarm threshold crossings should be detected and the alarm delivered to the VIM within 1min. The
threshold that triggers such a notification should be configurable.
- Direct notification from the hardware to some specific VNF should be possible.
Such notification should be within 1s.
- Periodic updates of hardware running conditions (operational state?) to the
NFVI and VIM are required for further operations, which may include fault
prediction, failure analysis, etc. Such information should be updated every 60s.
- Transparent failover is required when storage or network hardware fails.
- Hardware should support SNMP and IPMI for centralized management, monitoring and
control.
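The reporting deadlines in the list above can be checked mechanically. The following is a minimal Python sketch, not part of any OPNFV or VIM API: the category names, the FaultReport type and the function names are hypothetical labels chosen for illustration, and only the 1s and 1min limits come from the requirements. The SAL-based deadline is left out because its value is defined in Section 1.

```python
from dataclasses import dataclass

# Reporting deadlines in seconds, taken from the requirements above.
# The category names are hypothetical labels chosen for this sketch.
REPORT_DEADLINES_S = {
    "critical_failure": 1.0,    # critical hardware failure reported to the VIM
    "direct_vnf_notice": 1.0,   # direct notification to a specific VNF
    "alarm_threshold": 60.0,    # threshold alarm delivered to the VIM
}


@dataclass
class FaultReport:
    category: str        # one of the keys in REPORT_DEADLINES_S
    detected_at: float   # time the hardware detected the fault (seconds)
    delivered_at: float  # time the report reached its receiver (seconds)


def report_latency(report: FaultReport) -> float:
    """Return the delay between detection and delivery, in seconds."""
    return report.delivered_at - report.detected_at


def meets_deadline(report: FaultReport) -> bool:
    """Check a report against the deadline for its category."""
    return report_latency(report) <= REPORT_DEADLINES_S[report.category]


if __name__ == "__main__":
    fast = FaultReport("critical_failure", detected_at=10.0, delivered_at=10.4)
    slow = FaultReport("critical_failure", detected_at=10.0, delivered_at=11.5)
    print(meets_deadline(fast), meets_deadline(slow))
```

Keeping the deadlines in one table makes it straightforward to add further categories, for example a SAL-dependent limit, without touching the check itself.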
Network plane Requirements:
- The hardware should provide a redundant architecture for the network plane.
- Failures of the network plane should be reported to the VIM within 1s.
- QoS should be used to protect against link congestion.
Power supply system:
- The power supply architecture should be redundant at the server and site level.
- Fault of the power supply system should be reported to the VIM within 1s.
- Failure of a power supply will trigger automatic failover to the redundant supply.
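The combination of fault reporting and automatic failover required for the power supply can be modeled as follows. This is only an illustrative Python sketch under stated assumptions: the RedundantSupply class, the unit names and the report callback (standing in for the report to the VIM) are all hypothetical, and real failover happens in hardware or firmware rather than in code like this.

```python
from typing import Callable, List, Optional


class RedundantSupply:
    """Hypothetical model of redundant power supply units with failover."""

    def __init__(self, units: List[str], report: Callable[[str], None]):
        if len(units) < 2:
            raise ValueError("redundancy needs at least two units")
        self.healthy: List[str] = list(units)     # units able to carry the load
        self.active: Optional[str] = units[0]     # unit currently carrying the load
        self.report = report                      # stand-in for the report to the VIM

    def fail(self, unit: str) -> None:
        """Record a unit failure, report it, and fail over if needed."""
        if unit not in self.healthy:
            return  # unknown unit or failure already recorded
        self.healthy.remove(unit)
        self.report(f"power supply {unit} failed")
        if unit == self.active:
            if self.healthy:
                # Automatic failover to a remaining healthy unit.
                self.active = self.healthy[0]
            else:
                self.active = None
                self.report("no healthy power supply left")


if __name__ == "__main__":
    events: List[str] = []
    psu = RedundantSupply(["psu-a", "psu-b"], events.append)
    psu.fail("psu-a")
    print(psu.active, events)
```

The failure is reported before the failover decision so that the report is emitted even when no redundant unit remains.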
Cooling system:
- The architecture of the cooling system should be redundant.
- Fault of the cooling system should be reported to the VIM within 1s.
- Failure of the cooling system will trigger automatic failover of the system.
Disk Array:
- The architecture of the disk array should be redundant.
- Fault of the disk array should be reported to the VIM within 1s.
- Failure of the disk array will trigger automatic failover of the system.
- Support for a protected cache after an unexpected power loss.
- Data shall be stored redundantly in the storage backend
(e.g., by means of RAID across disks).
- Upon failures of storage hardware components (e.g., disks, services, storage
nodes), repair mechanisms (re-build/re-balance of data) shall be
triggered automatically.
- Centralized storage arrays shall consist of redundant hardware.
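To illustrate how redundant storage allows data to be rebuilt after a disk failure, the following Python sketch uses single XOR parity, the idea behind RAID 5. The requirements only mention RAID in general, so the choice of parity scheme is an assumption made for this example, and the function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equally sized blocks byte by byte."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the survivors and parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]      # one block per data disk
    parity = xor_blocks(data)               # stored on the parity disk
    # Simulate the loss of the second disk and rebuild its block.
    rebuilt = rebuild_missing([data[0], data[2]], parity)
    print(rebuilt == data[1])
```

Single parity tolerates the loss of exactly one block per stripe; tolerating more simultaneous failures needs additional redundancy.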
Servers:
- Support precise timing with an accuracy better than 4.6ppm.
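To make the timing figure concrete, the sketch below converts a ppm value into worst-case clock drift. It reads 4.6ppm as a bound on the clock's frequency error, which is an assumption, since the requirement does not define the term; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def max_drift_seconds(ppm: float, interval_s: float) -> float:
    """Worst-case drift over interval_s for a frequency error of ppm."""
    return ppm * 1e-6 * interval_s


if __name__ == "__main__":
    # Drift accumulated in one second and in one day at 4.6 ppm.
    print(max_drift_seconds(4.6, 1.0))
    print(max_drift_seconds(4.6, SECONDS_PER_DAY))
```

Under this reading, a clock at the 4.6ppm limit drifts by about 4.6 microseconds per second, or roughly 0.4 seconds per day, if it is never resynchronized.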