1. High Priority Traffic Path

https://wiki.opnfv.org/display/ovsnfv/OVSFV+Requirement+-+High+Priority+Traffic+Path

1.1. Problem description

A network design may need to adequately accommodate multiple classes of traffic, each class requiring different levels of service in critical network elements.

As a concrete example, a network element managed by a service provider may be handling voice and elastic data traffic. Voice traffic requires that end-to-end latency and jitter are bounded within some numerical limit (in msec) in order to ensure sufficient quality of service (QoS) for the participants in the voice call. Elastic data traffic does not impose the same demanding requirements on the network (there is essentially no requirement on jitter). For example, when downloading a large file across the Internet, although the bandwidth requirements may be high, there is usually no requirement that the file arrives within a bounded time interval.

Depending on the scheduling algorithms running on the network element, frames belonging to the data traffic may get transmitted before frames belonging to the voice traffic, introducing unwanted latency or jitter. Therefore, in order to ensure deterministic latency and jitter characteristics end-to-end, each network element that the voice traffic traverses must ensure that the voice traffic is handled deterministically.

Hardware switches have typically been designed to ensure that certain classes of traffic can be scheduled ahead of other classes; they are also often over-provisioned, which further ensures deterministic behavior when handling high priority traffic. However, software switches (including virtual switches such as Open vSwitch) may require modification in order to achieve this deterministic behavior.

1.1.1. Use Cases

  1. Program classes of service

The End User specifies a number of classes of service. Each class of service is identified by the value of a particular field in a frame. The class of service determines the priority treatment that flows in the class will receive relative to the other classes, with a default level of treatment for the lowest priority class of service. As such, each class of service will be associated with a priority. The End User will associate classes of service and priorities with ingress ports, with the expectation that frames arriving on these ingress ports will be scheduled according to the specified priorities.

Note: Priority treatment of the classes of service must not prevent any one of the classes (even the default class) from being transferred at all. In other words, a purely strict priority treatment may starve the lower classes rather than eventually serving all of them, and this is a key consideration.

  2. Forward high priority network traffic

A remote network element sends traffic to Open vSwitch. The remote network element indicates the class of service to which this flow of traffic belongs by setting a pre-determined but arbitrary field in the frame, as specified in Use Case 1. Examples include the Differentiated Services Code Point (DSCP) in an IP packet or the Priority Code Point (PCP) in an Ethernet frame. The relative priority with which frames are processed by Open vSwitch can be guaranteed by the values populated in these fields when the values differ. If the values are the same, ordering is not deterministic.

For example: packet A is sent with a DSCP value of 0 and packet B with a DSCP value of 46, where DSCP 0 denotes a lower-priority class than DSCP 46. Packet A arrives before packet B. If Open vSwitch has been configured accordingly, packet B will be transmitted before packet A.
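
The following minimal sketch (in Python, purely illustrative) shows how a configured class-of-service mapping would determine scheduling order. The field names and values are taken from the examples in this document, and priority 0 is assumed to be the highest, as in the user interface examples later in this spec.

    # Configured classes of service: DSCP value -> priority (0 = highest here).
    COS_TABLE = {46: 0,   # e.g. Expedited Forwarding -> highest priority
                 0: 7}    # best effort -> default / lowest priority
    DEFAULT_PRIORITY = 7  # unclassified traffic gets the lowest priority

    def priority_of(frame):
        """Look up the priority for a frame based on its DSCP marking."""
        return COS_TABLE.get(frame["dscp"], DEFAULT_PRIORITY)

    # Packet A (DSCP 0) arrives before packet B (DSCP 46); after priority
    # scheduling, B is transmitted first because its class has higher priority.
    arrival_order = [{"name": "A", "dscp": 0}, {"name": "B", "dscp": 46}]
    tx_order = sorted(arrival_order, key=priority_of)
    print([f["name"] for f in tx_order])   # ['B', 'A']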

1.2. Proposed change

Two possible alternative implementations are outlined below. Further prototyping will be required to determine the favoured solution.

1.2.1. Alternatives

There are currently two alternative implementations for this feature. Further prototyping will be required to determine the proposed change.

Option 1:

Figure 1 shows how an external application could be implemented that prioritizes traffic before sending it to a virtual switch port. This option should not require modifications to OVS but will require more complicated management due to the addition of a scheduling application.

Each OVS ingress queue should have an equivalent ingress queue in the scheduler. The scheduler is responsible for ordering the frames in its own queues to ensure they respect the configured priorities.

In this model, OVS receives packets from rte_ring ports that have been provided by the scheduler; a conceptual sketch of such a scheduling loop is given after Figure 1.

Figure 1: Scheduling carried out by a secondary application. DPDK rings are
used to send frames to OVS.

                                         +----> To connect to an external
                                         |      application, these will
                                         |      need to be rte_rings
                                         |
                                         |           +--> Frames arrive in
                                         |           |    priority order to
    + Queue 0 (e.g. NIC queue)           |           |    OVS packet processing
    |                                    |           |    threads
    |  +  Queue 1 (e.g. rte_ring)        |           |
    |  |                                 |           |         +-> Packets processed
    |  |  +  Queue 2 (e.g. vhost)        |           |         |   in priority order
    |  |  |                              |           |         |
    |  |  |  +-------------------+       |           | +----------------+
    |  |  |  |                   |       |           | |       |        |
    |  |  |  | +-----------+     |   +-------------+ | | +------------+ |
    |  |  +----+           +---------+    Queue    +-----+ PMD Thread +----------+
    |  |     | |           |     |   +-------------+   | +------------+ |
    |  |     | |           |     |   +-------------+   | +------------+ |
    |  +-------+ Scheduler +---------+    Queue    +-----+ PMD Thread +----------+
    |        | |           |     |   +-------------+   | +------------+ |
    |        | |           |     |   +-------------+   | +------------+ |
    +----------+           +---------+    Queue    +-----+ PMD Thread +----------+
             | +-----------+     |   +-------------+   | +------------+ |
             |           |       |                     |                |
             | Scheduler |       |                     |                |
             | App       |       |                     |      OVS       |
             |           |       |                     |(vSwitch Cores )|
             |           |       |                     |                |
             +-------------------+                     +----------------+
                         |
                         |
                         |    Pluggable scheduler will
                         |    need to be a DPDK secondary
                         +--> process in order to interact
                              with OVS (primary process).
                              First implementation would be
                              a strict priority scheduler
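
The following is a conceptual sketch of the scheduling loop such an application could run. It is not the proposed implementation: Python deques stand in for the ingress queues and for the rte_ring towards OVS, the number of priorities and the burst size are placeholders, and priority 0 is assumed to be the highest.

    from collections import deque

    NUM_PRIORITIES = 8                    # fixed number of priorities (0 = highest)
    BURST = 32                            # frames moved per scheduling pass

    class StrictPriorityScheduler:
        def __init__(self, ovs_ring):
            # One internal queue per priority, mirroring the OVS ingress queues.
            self.queues = [deque() for _ in range(NUM_PRIORITIES)]
            self.ovs_ring = ovs_ring      # stands in for the rte_ring towards OVS

        def rx(self, frame, priority):
            """Called for each frame received from an ingress queue."""
            self.queues[priority].append(frame)

        def schedule_once(self):
            """Forward up to BURST frames, always serving the highest-priority
            (lowest-numbered) non-empty queue first: strict priority."""
            sent = 0
            for q in self.queues:
                while q and sent < BURST:
                    self.ovs_ring.append(q.popleft())
                    sent += 1
                if sent >= BURST:
                    break
            return sent

    # Example: a low-priority frame queued first still leaves after the high one.
    ring_to_ovs = deque()
    sched = StrictPriorityScheduler(ring_to_ovs)
    sched.rx("data-frame-1", priority=7)
    sched.rx("voice-frame-1", priority=0)
    sched.schedule_once()
    print(list(ring_to_ovs))              # ['voice-frame-1', 'data-frame-1']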

Option 2:

Figure 2 shows how the OVS application could be modified to prioritize packets before processing. This would require an IO core in OVS to handle prioritization of traffic coming from the rx queues in OVS; a conceptual threading sketch is given after Figure 2.

Figure 2: Scheduling carried out by threads within OVS.


                                      +--> OVS internal ring
                                      |    structures
                                      |
                                      |     +> Frames arrive in
                                      |     |  priority order to
 + Queue 0 (e.g. NIC queue)           |     |  OVS packet processing
 |                                    |     |  threads
 |  +  Queue 1 (e.g. rte_ring)        |     |
 |  |                                 |     |       +-> Packets processed
 |  |  +  Queue 2 (e.g. vhost)        |     |       |   in priority order
 |  |  |                              |     |       |
 |  |  |  +--------------------------------------------------+
 |  |  |  |                           |     |       |        |
 |  |  |  | +-----------+   +---------+---- | +-----+------+ |
 |  |  +----+           +---+    Queue    +-+-+ PMD Thread +----------+
 |  |     | |           |   +-------------+   +------------+ |
 |  |     | |           |   +-------------+   +------------+ |
 |  +-------+ Scheduler +---+    Queue    +---+ PMD Thread +----------+
 |        | |           |   +-------------+   +------------+ |
 |        | |           |   +-------------+   +------------+ |
 +----------+           +---+    Queue    +---+ PMD Thread +----------+
          | +-+---------+   +-------------+   +------------+ |
          |   |                                              |
          |   |    OVS                                       |
          |   |(Scheduler                          OVS       |
          |   |  Core(s))                    (vSwitch Cores )|
          |   |                                              |
          +--------------------------------------------------+
              |
              |
              +--> Pluggable scheduler
                   First implementation would be
                   a strict priority scheduler
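
The following is a conceptual threading sketch of this option, with Python threads and queues standing in for the OVS scheduler core, the PMD threads, and the internal ring structures. All names and numbers are illustrative and priority 0 is assumed to be the highest.

    import queue
    import threading
    import time

    internal_ring = queue.Queue()             # stands in for one OVS internal ring

    def scheduler_core(prio_queues, stop):
        """Scheduler core: always forward from the highest-priority non-empty
        ingress queue (strict priority, as in the first implementation)."""
        while not stop.is_set():
            for q in prio_queues:             # index 0 = highest priority
                try:
                    internal_ring.put(q.get_nowait())
                    break                     # restart from the highest priority
                except queue.Empty:
                    continue
            else:
                time.sleep(0)                 # nothing to forward; yield

    def pmd_thread(stop):
        """PMD thread: processes frames in the order the scheduler produced."""
        while not stop.is_set():
            try:
                print("processed", internal_ring.get(timeout=0.1))
            except queue.Empty:
                pass

    stop = threading.Event()
    ingress = [queue.Queue() for _ in range(8)]   # index 0 = highest priority
    ingress[7].put("data-frame")                  # lower priority, queued first
    ingress[0].put("voice-frame")                 # higher priority, queued second
    threading.Thread(target=scheduler_core, args=(ingress, stop), daemon=True).start()
    threading.Thread(target=pmd_thread, args=(stop,), daemon=True).start()
    time.sleep(0.5)
    stop.set()                    # prints 'voice-frame' before 'data-frame'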

It should be noted that for both solutions it should be possible to offload the scheduling to a capable NIC on ingress. An example of how this could be done for Option 1 can be seen in Figure 3; a sketch of the kind of policy such an offload would enforce is given after the figure.

Figure 3: Example of how a NIC scheduler could be used to
offload scheduling

      +   Queue 0 (e.g. NIC queue)
      |
      |   +   Queue 1 (e.g. NIC queue)
      |   |
      |   |   +  Queue 2 (e.g. vhost)
      |   |   |
      |   |   |  +-------------------+                     +----------------+
      |   |   |  |                   |                     |                |
+-----+---+-+ |  | +-----------+     |   +-------------+   | +------------+ |
|           | +----+           +---------+    Queue    +-----+ PMD Thread +----------+
|    NIC    |    | |           |     |   +-------------+   | +------------+ |
| Scheduler |    | |           |     |   +-------------+   | +------------+ |
|           +------+ Scheduler +---------+    Queue    +-----+ PMD Thread +----------+
|           |    | |           |     |   +-------------+   | +------------+ |
|           +------+           |     |   +-------------+   | +------------+ |
|           |    | |           +---------+    Queue    +-----+ PMD Thread +----------+
+-----+-----+    | +-----------+     |   +-------------+   | +------------+ |
      |          |                   |                     |                |
      |          | Scheduler         |                     |                |
      +----------+ App               |                     |      OVS       |
                 |                   |                     |(vSwitch Cores )|
   Scheduler app |                   |                     |                |
   configures    +-------------------+                     +----------------+
   NIC scheduler

Other key points:

  • How do we handle egress? We assume we will only deal with ingress scheduling.

  • How do we prioritize upcalls to the slow path? In OVS the first packets in a flow are handled by the slow path, and there is no priority scheme for this data path.

  • We are really only implementing strict priority here. Do we need to implement other scheduling algorithms?

1.2.2. OVSDB schema impact

As the control interface may be implemented via Open vSwitch, configuration may require updates to the ovsdb schema. An example of how this could be done is presented below:

"Classes_of_Service": {
  "columns": {
    "cos_type": {
      "type": "string"},
    "cos": {
      "type": {"key": "integer", "value": "integer", "min": 0, "max": "200"}},
  }
 }
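
For illustration only, a row in the proposed table might hold the following (shown here as a Python dictionary whose values mirror the user interface examples later in this document; the concrete values are not part of the proposal):

    example_row = {
        "cos_type": "dscp",          # field used to identify the class of service
        "cos": {46: 0,               # DSCP 46 -> priority 0 (highest)
                0: 7},               # DSCP 0  -> priority 7 (lowest / default)
    }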

1.2.3. User interface impact

A control interface is required which allows the user to:

  • Specify the type of field used for determining the class of service. Examples are: dscp, vlan-pcp

  • Add a value of that field to a particular priority. Examples are:

    Value = 46, Priority = 0
    Value = 0, Priority = 7
    

    It should be possible to specify up to a maximum number (n) of values for each priority. All other traffic is presumed to have the lowest priority. There will be a fixed number of priorities.

  • Remove a value from a particular priority

  • List priority of a particular class of service

This configuration will be valid for all traffic being handled by the switch.

This control interface may be implemented via Open vSwitch commands or via an external application (controlling, for example, a NIC or another piece of software).

An example of how this could be controlled via Open vSwitch commands follows:

ovs-vsctl add-cos <type> <value> <priority>

ovs-vsctl add-cos dscp 46 0

ovs-vsctl del-cos <type> <value>

ovs-vsctl del-cos dscp 46

ovs-vsctl show-cos 46

A similar interface could be used to control an external application.
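
Purely for illustration, the following sketch models the state behind such a control interface. The number of priorities and the per-priority limit ("n" above) are placeholders, priority 0 is assumed to be the highest, and unconfigured values fall back to the lowest priority.

    NUM_PRIORITIES = 8
    MAX_VALUES_PER_PRIORITY = 4            # "n" in the requirement; placeholder

    class CosTable:
        def __init__(self):
            self.field_type = None         # e.g. "dscp" or "vlan-pcp"
            self.cos = {}                  # field value -> priority

        def add_cos(self, field_type, value, priority):
            if not 0 <= priority < NUM_PRIORITIES:
                raise ValueError("priority out of range")
            in_use = sum(1 for p in self.cos.values() if p == priority)
            if in_use >= MAX_VALUES_PER_PRIORITY:
                raise ValueError("too many values for this priority")
            self.field_type = field_type
            self.cos[value] = priority

        def del_cos(self, field_type, value):
            self.cos.pop(value, None)

        def show_cos(self, value):
            # Unconfigured values fall back to the lowest priority.
            return self.cos.get(value, NUM_PRIORITIES - 1)

    table = CosTable()
    table.add_cos("dscp", 46, 0)           # mirrors: ovs-vsctl add-cos dscp 46 0
    print(table.show_cos(46))              # 0
    print(table.show_cos(20))              # 7 (default / lowest priority)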

1.2.4. Security impact

TBD

1.2.5. Other end user impact

TBD

1.2.6. Performance Impact

TBD

1.2.7. Other deployer impact

TBD

1.2.8. Developer impact

TBD

1.3. Implementation

1.3.1. Assignee(s)

Who is leading the writing of the code? Or is this a blueprint where you’re throwing it out there to see who picks it up?

If more than one person is working on the implementation, please designate the primary author and contact.

Primary assignee:
<email address>
Other contributors:
<email address>

1.3.2. Work Items

  • Carry out tests to determine current behaviour
  • Implement proposed solution alternative 1 as a proof point

1.4. Dependencies

TBD

1.5. Testing

In order to test how effectively the virtual switch handles high priority traffic, the following scheme is suggested:

+---------------------------+         Ingress Traffic Parameters
|                           |         +-------------------------------------------+
|                           |
|                           |         Packet Size: The size of the Ethernet frames
|                           |
|                           |         Tmax: RFC2544 Max. Throughput for traffic of
|                    PHY0   <-------+ "Packet Size"
|                           |
|                           |         Total Offered Rate: The offered rate of both
|                           |         traffic classes combined expressed as a % of
|                           |         Tmax
|                           |
|                           |         Ingress Rates are expressed as a percentage
|                           |         of Total Offered Rate.
|                           |
|                           |         Class A:
|             OVS           |         Ethernet PCP = 0 (Background)
|            (BR0)          |         Ingress Rate      : rate_ingress_a(n) Mfps
|                           |
|                           |         Class B:
|                           |         Ethernet PCP = 7 (Highest)
|                           |         Ingress Rate      : rate_ingress_b(n) Mfps
|                           |
|                           |         Egress Traffic Measurements
|                           |         +-------------------------------------------+
|                           |         Class A:
|                           |         Egress Throughput : rate_egress_a(n) Mfps
|                           |         Egress Latency    : max_lat_egress_a(n) ms
|                           |         Egress Jitter     : max_jit_egress_a(n) ms
|                    PHY1   +------->
|                           |         Class B:
|                           |         Egress Throughput : rate_egress_b(n) Mfps
|                           |         Egress Latency    : max_lat_egress_b(n) ms
+---------------------------+         Egress Jitter     : max_jit_egress_b(n) ms

Open vSwitch is configured to forward traffic between two ports agnostic to the traffic type. For example, using the following command:

ovs-ofctl add-flow br0 in_port=0,actions=output:1

The test will be carried out with the high-priority traffic handling functionality enabled and then disabled, in order to gauge the difference in performance between the two cases.

Two classes of traffic will be generated by a traffic generator. In the example above, the classes are differentiated using the Ethernet PCP field; however, another means of differentiating traffic could be used, depending on the prioritization scheme that is developed.

Tests should be performed for each combination of:

  • Packet Size (bytes) in (64, 512)
  • Total Offered Rate (% of Tmax) in (80, 120, 150)
  • rate_ingress_b(n) / rate_ingress_a(n) in (0.1, 0.2, 0.5)

For each set, the following metrics should be collected for each traffic class over a specified time period:

  • Egress Throughput (Mfps)
  • Maximum Egress Latency (ms)
  • Maximum Egress Jitter (ms)
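
The following sketch simply enumerates the resulting test matrix, with the values taken from the lists above and the prioritization functionality toggled on and off for each combination; it does not generate or measure any traffic.

    from itertools import product

    packet_sizes = (64, 512)                       # bytes
    total_offered_rates = (80, 120, 150)           # % of Tmax
    b_to_a_ratios = (0.1, 0.2, 0.5)                # rate_ingress_b / rate_ingress_a

    METRICS = ("Egress Throughput (Mfps)",
               "Maximum Egress Latency (ms)",
               "Maximum Egress Jitter (ms)")

    for n, (size, rate, ratio) in enumerate(
            product(packet_sizes, total_offered_rates, b_to_a_ratios), start=1):
        # Each combination is run with prioritization enabled and disabled.
        for prio_enabled in (True, False):
            print(f"test {n}: size={size}B rate={rate}%Tmax "
                  f"b/a={ratio} prioritization={'on' if prio_enabled else 'off'}")
            # ... run traffic for the specified period and collect, per traffic
            # class, the metrics listed in METRICS.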

1.6. Documentation Impact

The following documentation should be updated in OVS:

  • “man” pages
  • NEWS
  • INSTALL.DPDK.md

1.7. References

Please add any useful references here. You are not required to have any reference. Moreover, this specification should still make sense when your references are unavailable. Examples of what you could include are:

  • Links to mailing list or IRC discussions
  • Links to relevant research, if appropriate
  • Related specifications as appropriate
  • Anything else you feel it is worthwhile to refer to

1.8. History

Optional section intended to be used each time the spec is updated, to describe new designs, APIs, or database schema updates. Useful to let the reader understand what has happened over time.

Table 1.1 Revisions

Release Name    Description
Colorado        Introduced