Related terms:
- Congestion Control
- Round-Trip Time
- Timeouts
- Transmission Control Protocol
- Congestion Window
- Packet Loss Probability
- Retransmission
Introduction
Subir Varma, in Internet Congestion Control, 2015
1.3.4 Window Size Decrements, Part 2: Fast Retransmit and Fast Recovery
Section 1.3.3 discusses the rules that TCP Reno uses to increment its window in the absence of network congestion. This section discusses the rules that apply to decrementing the window size in the presence of congestion. TCP uses the design philosophy that dropped packets are the best indicator of network congestion, which falls within the implicit congestion indicator category from Section 1.2.4. When TCP Slow Start was first designed, this was the only feedback that could be expected from the simpler routers that existed at that time. In retrospect, this decision came with a few issues: The only way that the algorithm can detect congestion is by driving the system into a congestive state (i.e., near the cliff of the curve in Figure 1.2). Also, if the packet losses are because of a reason other than congestion (e.g., link errors), then this may lead to unnecessary rate decreases and hence performance loss. This problem is particularly acute for error-prone wireless links, as explained in Chapter 4.
As alluded to in Section 1.3.3.2, TCP uses two different mechanisms to detect packet loss. The first technique uses the retransmit timer RTO and is discussed in Section 1.3.2. The second technique is known as Fast Retransmit (Figure 1.7) and tries to make the system more responsive to packet losses. It works as follows:
Figure 1.7. Illustration of fast retransmit via duplicate ACKs.
- If the SN field in a packet at the receiver does not match the NSN counter, then the receiver immediately sends an ACK packet, with the ACK field set to NSN. The receiver continues to do this on subsequent packets until SN matches NSN (i.e., the missing packet is received).
- As a result, the sender begins to receive more than one ACK with the same ACK number; these are known as duplicate ACKs. The sender does not immediately conclude that the duplicate ACKs are caused by a missing packet, because in some cases they may be caused by packets getting reordered before reception. However, if the number of duplicate ACKs reaches three, the sender concludes that the packet has indeed been dropped and retransmits it (the packet whose SN equals the duplicate ACK value) without waiting for the retransmission timer to expire. This action is known as Fast Retransmit.
- Just before the retransmission, the sender sets the ssthresh value to half the current window size W, and after the retransmission it sets W ← ssthresh + 3·MSS. Each time another duplicate ACK arrives, the sender increases W by one packet size and, if the new value of W allows, transmits a new packet. This action is known as Fast Recovery.
The justification for this is the following: When the receiver finally receives the missing packet, the next ACK acknowledges all the other packets that it has received in the window, which on average can be half the transmit window. When the sender receives this ACK, it is allowed to transmit up to half a window of back-to-back packets (because W is now at half its previous value), which can potentially overwhelm the buffers in the bottleneck node. By doing Fast Recovery, however, the sender keeps sending additional packets while it waits for the ACK for the missing packet to arrive, which prevents a sudden onslaught of new packets into the network. When the ACK for the missing packet arrives, W is set back to the ssthresh value computed in the previous step. This behavior can be seen in Figure 1.8; note that the window size shows a spike of 3 MSS on error detection and falls back to W/2 after the missing packet is received.
Figure 1.8. Illustration of Fast Recovery.
The Fast Retransmit mechanism is able to efficiently recover from packet losses as long as no more than one packet is lost in the window. If more than one packet is lost, then usually the retransmit timer for the second or later expires, which triggers the more drastic step of resetting W back to one packet.
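The sender-side rules described above can be condensed into a short sketch. The following is a minimal, illustrative Python model of Reno-style Fast Retransmit and Fast Recovery, assuming the window is counted in MSS units; the `retransmit` and `send_new` callbacks are hypothetical placeholders, and real implementations (RFC 5681) track considerably more state.

```python
# Minimal sketch of Reno-style Fast Retransmit / Fast Recovery, with the
# window counted in MSS units. retransmit() and send_new() are
# hypothetical callbacks; real stacks (RFC 5681) track far more state.

MSS = 1  # one "unit" per packet for simplicity

class RenoSender:
    def __init__(self, cwnd=10, ssthresh=64):
        self.cwnd = cwnd            # congestion window W
        self.ssthresh = ssthresh    # slow-start threshold
        self.dup_acks = 0           # duplicate-ACK counter
        self.last_ack = -1          # highest cumulative ACK seen
        self.in_recovery = False

    def on_ack(self, ack_no, retransmit, send_new):
        if ack_no == self.last_ack:                  # a duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == 3 and not self.in_recovery:
                # Fast Retransmit: resend the packet whose SN equals the
                # duplicate ACK value, without waiting for the RTO.
                self.ssthresh = max(self.cwnd // 2, 2 * MSS)
                retransmit(ack_no)
                self.cwnd = self.ssthresh + 3 * MSS  # W <- ssthresh + 3*MSS
                self.in_recovery = True
            elif self.in_recovery:
                # Each further duplicate ACK means one more packet has left
                # the network: inflate W and send new data (in-flight
                # accounting is omitted in this sketch).
                self.cwnd += MSS
                send_new()
        else:                                        # new cumulative ACK
            if self.in_recovery:
                self.cwnd = self.ssthresh            # deflate back to W/2
                self.in_recovery = False
            self.dup_acks = 0
            self.last_ack = ack_no
```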
URL: https://www.sciencedirect.com/science/article/pii/B9780128035832000013
Packet Queueing and Scheduling
Deep Medhi, Karthik Ramasamy, in Network Routing (Second Edition), 2018
17.2.3 Fast Retransmit and Fast Recovery
The algorithms described so far, AIMD and slow start, are considered the main TCP congestion control mechanisms. In this section, we consider relatively newer features.
In early implementations of TCP, the sender retransmitted an unacknowledged packet only after the expiration of its timeout. It was observed that the coarse-grained implementation of TCP timers led to long periods of time during which the connection went dead. Thus, a new mechanism called fast retransmit was introduced that allows the retransmission of a lost packet even if the timeout for that packet has not expired. The fast retransmit mechanism does not replace the timeout mechanism of TCP, but it works in parallel to improve performance.
The idea behind fast retransmit is straightforward. Every time a packet arrives out of order because the previous packet was lost or delayed, the receiver sends an acknowledgment identical to the one it sent last time; each subsequent transmission of the same acknowledgment is called a duplicate ACK. When the sender detects a duplicate ACK, it knows that the receiver must have received a packet out of order, implying that an earlier packet was lost or delayed. To detect lost packets reliably, the sender waits until it sees some number of duplicate ACKs before retransmitting the missing packet. In practice, the sender waits until it has seen three duplicate ACKs, then retransmits the packet without waiting for its timer to expire.
Finally, there is one more improvement to note. Using the fast retransmit mechanism, the sender detects the possible loss of a transmitted packet, implying congestion; therefore, it must reduce its congestion window accordingly after retransmitting the lost packet. However, when the fast retransmit algorithm is used and duplicate ACKs are received by the sender, packets are evidently still flowing to the receiver. How can this be safely deduced? The generation of duplicate ACKs at the receiver is triggered only by a packet arrival. This indicates that serious network congestion may not exist and that the loss could be a transient condition. Thus, the sender, instead of reducing its transmission rate sharply using slow start, decreases the congestion window by half and performs only the additive increase part; the sketch below contrasts the two reactions.
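A minimal sketch of the two loss reactions, assuming the window is measured in segments; the halving rule follows conventional Reno behavior, and the state container is a hypothetical stand-in for a real connection control block.

```python
# Sketch of the two loss reactions, with cwnd in segments. CwndState is
# a hypothetical stand-in for a connection's control block.
from dataclasses import dataclass

@dataclass
class CwndState:
    cwnd: int = 10       # congestion window, segments
    ssthresh: int = 64   # slow-start threshold

def on_timeout(state: CwndState) -> None:
    # RTO expiry suggests serious congestion: fall back to slow start.
    state.ssthresh = max(state.cwnd // 2, 2)
    state.cwnd = 1

def on_triple_dup_ack(state: CwndState) -> None:
    # Duplicate ACKs show packets are still reaching the receiver, so the
    # congestion is likely mild: halve the window and continue with
    # additive increase (congestion avoidance) rather than slow start.
    state.ssthresh = max(state.cwnd // 2, 2)
    state.cwnd = state.ssthresh
```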
URL: https://www.sciencedirect.com/science/article/pii/B978012800737200020X
Historic Evolution of QoS Solutions
XiPeng Xiao, in Technical, Commercial and Regulatory Challenges of QoS, 2008
Transport-Layer and Application-Layer Solutions
Intelligence at the transport and application layers is very useful for providing good service quality for applications. When QoS is narrowly scoped as using Diffserv and other traffic management schemes to provide some better-than-Best-Effort services, such intelligence may not be considered relevant to QoS. But with the QoS definition in this book, transport- and application-layer intelligence is relevant to QoS: it comprises the end-point part of a QoS solution.
Below we provide a few examples of how end-point intelligence can help applications improve performance.
Example 1: TCP congestion control
Packet loss from network congestion can cause some received packets to become useless and create the need to retransmit even more packets. In some circumstances, this can form a vicious cycle and cause congestion collapse, where application throughput approaches zero while the network remains fully loaded. TCP uses a number of mechanisms to achieve high performance and avoid congestion collapse. These mechanisms include four intertwined algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery [RFC2581]. Simply put, with TCP, when a packet is received, the receiver sends an acknowledgement. If the sender gets acknowledgements back in a timely manner, it knows that the network and the receiver are in good condition, so it can increase its sending rate. In contrast, a lack of acknowledgements may indicate a problem, and the sender will reduce its sending rate. This simple mechanism helps TCP hunt for the optimal sending rate at any given moment and is very useful for applications running over TCP.
Example 2: HTTP 1.1 persistent connection and pipelining
Today's web browsing is built upon a protocol called Hypertext Transfer Protocol (HTTP). The current version of HTTP, Version 1.1, made numerous improvements over previous versions [HTTPdif]. Here we single out persistent connection and pipelining because their effect is easier to understand.
HTTP uses TCP as its transport protocol. The original HTTP design used a new TCP connection for each request. Therefore each request incurs the cost of setting up a new TCP connection (at least one round-trip time across the network, plus several overhead packets). Since most web interactions are short (the median response message size is about 4 Kbytes [Mogul]), the TCP connections seldom get past the “slow-start” region [Jacobson] and therefore fail to maximize their use of the available bandwidth.
Web pages frequently have many embedded images. Before Version 1.1, each image was retrieved via a separate HTTP request. The use of a new TCP connection for each image serialized the display of the entire page. This made the loading of web pages with many embedded objects very slow. This contributed to Bob Metcalfe's prediction in December 1995 that the Internet would collapse in 1996.
To resolve these problems, HTTP 1.1 introduced persistent connections and pipelining. A persistent connection means that a single TCP connection can be used for multiple HTTP requests, eliminating the need to set up many TCP connections. Pipelining means that a client can send an arbitrary number of requests over a TCP connection before receiving any of the responses, eliminating the serialization of requests and enabling faster response for web browsing.
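A rough illustration of both features at the socket level is sketched below. The host and paths are placeholders; persistent connections are universally supported, but many modern servers ignore or reject pipelined requests, so treat this as a demonstration only.

```python
# Rough sketch: one persistent HTTP/1.1 connection carrying several
# pipelined requests. example.com and the paths are placeholders, and
# many servers ignore or reject pipelining, so this is illustrative only.
import socket

HOST = "example.com"  # hypothetical server
requests = [
    f"GET {path} HTTP/1.1\r\nHost: {HOST}\r\nConnection: keep-alive\r\n\r\n"
    for path in ("/", "/image1.png", "/image2.png")
]

with socket.create_connection((HOST, 80)) as sock:
    # Pipelining: all requests are sent before any response is read.
    sock.sendall("".join(requests).encode("ascii"))
    sock.settimeout(2.0)
    chunks = []
    try:
        while data := sock.recv(4096):   # responses arrive in order,
            chunks.append(data)          # on the same TCP connection
    except socket.timeout:
        pass
    print(b"".join(chunks)[:200])        # peek at the start of the replies
```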
Example 3: End-point jitter buffer
By maintaining a jitter buffer, an end-point application can significantly reduce end-to-end jitter (in other words, delay variation), thus providing better performance for applications that are sensitive to jitter, such as voice and video.
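As a rough sketch of the idea, the toy jitter buffer below holds arriving packets and releases them on a fixed playout clock; the 100 ms playout delay and 20 ms packet spacing are illustrative assumptions, not recommendations.

```python
# Toy jitter buffer: packets arrive with variable delay but are released
# on a fixed playout clock, so the application sees a smooth stream.
import heapq, time

class JitterBuffer:
    def __init__(self, playout_delay=0.100, frame_interval=0.020):
        self.delay = playout_delay      # initial buffering absorbs jitter
        self.interval = frame_interval  # nominal packet spacing (e.g., 20 ms audio)
        self.base = None                # playout clock origin
        self.heap = []                  # (seq, payload), ordered by seq

    def push(self, seq, payload):
        if self.base is None:           # start the playout clock on first packet
            self.base = time.monotonic() + self.delay
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self):
        # Packet seq is due at base + seq*interval; hold it until then, so
        # arrival-time variation smaller than `delay` is invisible to the app.
        if self.heap:
            seq, payload = self.heap[0]
            if time.monotonic() >= self.base + seq * self.interval:
                heapq.heappop(self.heap)
                return seq, payload
        return None
```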
Example 4: Firefox browser
For web pages with many pictures, the Firefox browser [Mozilla] loads at least twice as fast as Internet Explorer. So with exactly the same network QoS, the end user's perception of service quality is much better.
Although the focus of this book is on network QoS, it is important to point out that end-point intelligence can be very useful to enhance user experience. To some extent, this can be an alternative or supplement to network QoS, because end users will not care whether the good performance comes from network QoS or end-point intelligence. The philosophy that a network should be simple and intelligence should be at the end points is generally referred to as the “End to End Principle,” originally expressed in [Saltzer]. In reality, end-point intelligence is widely used. Virtually every popular application in the Internet employs some sort of intelligence for enhancing performance.
URL: https://www.sciencedirect.com/science/article/pii/B9780123736932000033
Mobile Network and Transport Layer
Vijay K. Garg, in Wireless Communications & Networking, 2007
14.5 Transmission Control Protocol
TCP [31, 32] is a connection-oriented transport-layer protocol designed to operate on top of the datagram network-layer protocol IP; the two are widely known under the collective name TCP/IP. TCP provides a reliable end-to-end byte-stream transport. Fragmentation and reassembly of packets in the network are handled by IP, not by TCP.
TCP uses a selective repeat protocol (SRP) with positive acknowledgments and time-outs. Each byte sent is numbered and must be acknowledged. A number of bytes can be sent in the same packet, and the acknowledgment (ACK) indicates the sequence number of the next byte expected by the receiver: an ACK carrying sequence number m acknowledges everything up to and including sequence number m − 1. If a packet is lost, the receiver sends a duplicate ACK for each subsequent correctly received packet.
The TCP header is at least 20 bytes and includes 16 error detection bits (a checksum) covering the data and the header. The checksum is calculated as the 1's complement of the 1's-complement sum of the 16-bit groups that make up the data and the header. The amount of data that can be sent before being acknowledged is the window size (Wmax), which can be adjusted by either the sender or the receiver to control flow based on the available buffers and the congestion. Initial sequence numbers are negotiated by means of a three-way handshake at the outset of the connection, and connections are released by means of a three-way handshake.
The TCP transmitter (Tx) uses an adaptive window-based transmit strategy: Tx does not allow more than Wmax unacknowledged packets to be outstanding at any given time. With the congestion window lower limit at time t equal to X(t), packets up to X(t) − 1 have been transmitted and acknowledged, so Tx can send starting from X(t); X(t) has a nondecreasing sample path. With the congestion window width at time t equal to W(t), this is the amount of data Tx is allowed to send starting with X(t). W(t) can increase or decrease (because of window adaptation) but never exceeds Wmax. Transitions in X(t) and W(t) are triggered by receipt of ACKs: upon receiving an ACK, Tx increases X(t) by an amount equal to the amount of data acknowledged. Changes in W(t), however, depend on the version of TCP and the congestion control process.
Tx starts a timer each time a new packet is sent. If the timer reaches the retransmission time-out (RTO) value before the packet is acknowledged, a time-out occurs and retransmission is initiated. The RTO value is derived from a round-trip-time estimation procedure and is set only in multiples of the timer granularity.
The window adaptation procedure is as follows:
1. Slow start phase. At the beginning of the TCP connection, the sender enters the slow start phase, in which the window size is increased by 1 maximum segment size (MSS) for every ACK received; thus, the TCP sender window grows exponentially with each round-trip time.
2. Congestion avoidance phase. When the window size reaches Wth, the TCP sender enters the congestion avoidance phase. Here TCP uses its sliding-window-based flow control mechanism to grow the transmission window linearly, by roughly one segment per round-trip time, as ACKs indicate that in-order packets were received successfully (see the sketch after this list).
3. Upon time-out. When packet loss occurs at a congested link because of buffer overflow at an intermediate router, either the sender receives duplicate ACKs or the sender's RTO timer expires. These events activate TCP's fast retransmit and recovery, by which the sender reduces the congestion window to half and then increases it linearly, as in the congestion avoidance phase, resulting in a lower transmission rate that relieves the link congestion.
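The sketch referenced in the list above: a tiny simulation of the first two phases, tracking the congestion window per round trip. The starting window and threshold are arbitrary illustrative choices.

```python
# Toy per-RTT trace of slow start followed by congestion avoidance.
# cwnd is in MSS units; ssthresh (Wth) = 16 is an arbitrary choice.
cwnd, ssthresh = 1.0, 16.0
for rtt in range(12):
    print(f"RTT {rtt:2d}: cwnd = {cwnd:5.1f} MSS")
    if cwnd < ssthresh:
        cwnd *= 2      # slow start: +1 MSS per ACK => doubles each RTT
    else:
        cwnd += 1      # congestion avoidance: ~+1 MSS per RTT
```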
Assuming long running connections and large enough window sizes, the upper bound on throughput, R, of a TCP connection is given by:
(14.1) R ≤ 1.22 · MSS / (RTT · √p)
where:
MSS = maximum segment size
RTT = average end-to-end round trip time of the TCP connection
p = packet loss probability for the path.
Equation 14.1 neglects retransmissions due to errors. If the error rate is more than 1%, these retransmissions have to be considered. This leads to the following formula:
(14.2) R ≈ MSS / [RTT · √(2p/3) + RTO · min(1, 3·√(3p/8)) · p · (1 + 32p²)]
where:
RTO = retransmission time-out ∼ 5RTT
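To make the two bounds concrete, the sketch below evaluates them numerically. The formula bodies follow the standard square-root and PFTK-style models from which (14.1) and (14.2) are reconstructed above; all parameter values are arbitrary examples.

```python
# Numeric illustration of the throughput bounds above; forms follow the
# standard square-root and PFTK-style models. Values are arbitrary.
from math import sqrt

def r_simple(mss, rtt, p):
    # Eq. (14.1): inverse-sqrt(p) bound, no timeout term
    return 1.22 * mss / (rtt * sqrt(p))

def r_with_timeouts(mss, rtt, p, rto=None):
    rto = 5 * rtt if rto is None else rto  # RTO ~ 5 RTT, as above
    denom = (rtt * sqrt(2 * p / 3)
             + rto * min(1.0, 3 * sqrt(3 * p / 8)) * p * (1 + 32 * p * p))
    return mss / denom                     # Eq. (14.2)

MSS_BITS, RTT = 1460 * 8, 0.1              # bits and seconds
for p in (0.001, 0.01, 0.02):
    print(f"p={p}: (14.1) {r_simple(MSS_BITS, RTT, p)/1e6:.2f} Mbps, "
          f"(14.2) {r_with_timeouts(MSS_BITS, RTT, p)/1e6:.2f} Mbps")
```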
For a given object, the latency is defined as the time from when the client initiates a TCP connection until the time at which the client receives the requested object in its entirety. Using the following assumptions, we provide expressions for latency with a static and dynamic congestion window.
- The network is not congested.
- The amount of data that can be transmitted is dependent on the sender's congestion window size.
- Packets are neither lost nor corrupted.
- All protocol header overheads are negligible and ignored.
- The object to be transferred consists of an integer number of segments of size MSS.
- The only packets with non-negligible transmission times are packets that carry maximum-size TCP segments. Request messages, acknowledgments, and TCP connection establishment segments are small and have negligible transmission times.
- The initial threshold in the TCP congestion-control mechanism is a large value that is never attained by the congestion window.
Static Congestion Window
(14.3) L = 2·RTT + O/R + (K − 1) · {S/R + RTT − W·S/R}*
where:
L = latency of the connection
{x}* = max(x, 0)
K = ⌈O/(WS)⌉, i.e., O/(WS) rounded up to the nearest integer
W = congestion window size
O = Size of the object to be transmitted
R = Transmission rate of the link from the server to the client
Maximum Segment Size (MSS) = S
RTT = round-trip time
Dynamic Congestion Window
(14.4) L = 2·RTT + O/R + P·(RTT + S/R) − (2^P − 1)·S/R
where:
P = min{Q, K − 1}
in which
Q = ⌊log₂(1 + RTT/(S/R))⌋ + 1
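A worked example of the two latency expressions is sketched below, under the assumption that the reconstructed forms of (14.3) and (14.4) above are the intended ones; in the dynamic case, K is taken as the number of windows needed to cover the object, per the standard analysis, since the excerpt does not redefine it. All parameter values are arbitrary.

```python
# Worked example of (14.3) and (14.4) as reconstructed above.
from math import ceil, floor, log2

O   = 100_000 * 8   # object size, bits
R   = 10e6          # link rate, bits/s
S   = 1460 * 8      # segment size (MSS), bits
RTT = 0.1           # seconds
W   = 4             # static window, segments

# Static window, Eq. (14.3)
K = ceil(O / (W * S))
L_static = 2*RTT + O/R + (K - 1) * max(S/R + RTT - W*S/R, 0)

# Dynamic window, Eq. (14.4)
Q = floor(log2(1 + RTT / (S / R))) + 1   # stalls for an unbounded object
K_dyn = ceil(log2(O / S + 1))            # windows needed to cover the object
P = min(Q, K_dyn - 1)
L_dynamic = 2*RTT + O/R + P*(RTT + S/R) - (2**P - 1) * S/R

print(f"static: {L_static:.3f} s, dynamic: {L_dynamic:.3f} s")
```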
ISO defined five classes (0 to 4) of connection-oriented transport services (ISO 8073). We briefly describe class 4, which transmits packets with error recovery and in the correct order. This protocol is known as Transport Protocol Class 4 (TP4) and is designed for unreliable networks. The basic steps in the TP4 connection are given below:
- Connection establishment: performed by means of a three-way handshake to agree on connection parameters, such as a credit value that specifies how many packets can be sent initially until the next credit arrives, the connection number, the transport source and destination access points, and a maximum time-out before ACK.
- Data transfer: data packets are numbered sequentially, which allows resequencing. ACKs may cover blocks of packets. There is a provision for expedited data transport, in which data packets are sent and acknowledged one at a time; expedited packets jump to the head of the queues. Flow is controlled by windows or by credits.
- Clear connection: connections are released by an expedited packet indicating connection termination. The data packets corresponding to that connection are then flushed from the buffers.
In practice, TCP has been tuned for a traditional network consisting of wired links and stationary hosts. TCP assumes that congestion in the network is the primary cause of packet losses and unusual delays. TCP performs well over wired networks by adapting to end-to-end delays and congestion losses: it reacts to packet losses by reducing its transmission (congestion) window size before retransmitting packets and initiating congestion control or avoidance mechanisms. These measures reduce the load on the intermediate links, thereby controlling congestion in the network. While slow start is one of the most useful mechanisms in wireline networks, it significantly reduces the efficiency of TCP when used with mobile receivers or senders.
URL: https://www.sciencedirect.com/science/article/pii/B978012373580550048X
TCP incast solutions in data center networks: A classification and survey
Melchizedek Alipio, ... Salman Khalid, in Journal of Network and Computer Applications, 2019
2.5 SACK
SACK (Khan, 2005) uses the slow-start and fast retransmit stages of TCP Reno. SACK is an improvement over New Reno in which segments are not acknowledged cumulatively but selectively. On entering fast recovery, the sender initializes a variable that estimates how much data is outstanding and sets the congestion window to half its current size. It decreases this variable by one for every received ACK and increments it by one for every segment retransmission. When the variable becomes smaller than the congestion window, the sender transmits segments that the receiver has not yet received; if there are no outstanding segments left to repair, a new packet is sent. Thus, more than one lost segment can be retransmitted in one RTT. The problem with SACK is that receivers do not currently provide the selective acknowledgements that are a necessary requirement for SACK to function properly.
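A minimal sketch of the accounting described above, assuming the outstanding-data estimate (often called "pipe" in SACK implementations) is counted in segments; the class and method names are hypothetical.

```python
# Minimal sketch of SACK-style recovery accounting: a "pipe" counter
# estimating segments in flight gates what may be (re)transmitted.
# Class and method names are hypothetical, not from a real stack.

class SackRecovery:
    def __init__(self, cwnd, outstanding):
        self.cwnd = cwnd // 2        # halve the window on entering recovery
        self.pipe = outstanding      # estimate of segments still in flight

    def on_ack(self):
        self.pipe -= 1               # each ACK: one segment left the network

    def next_to_send(self, holes):
        # While the estimate is below the window, fill reported holes
        # first; with no holes left, send new data instead.
        if self.pipe >= self.cwnd:
            return None              # no room in the network yet
        self.pipe += 1
        return holes.pop(0) if holes else "new data"
```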
Table 1 summarizes each of the abovementioned TCP variants and provides their limitations.
Table 1. Summary of TCP Tahoe, Reno, New Reno, SACK, and Vegas.
TCP has been successfully adapted to several new environments, such as long fat networks and wireless and cellular networks. However, the unique workloads of the data center environment break some of the basic assumptions on which TCP was originally based. For example, round-trip times (RTTs) in wide area networks (WANs) are multiple orders of magnitude greater than the average RTT in data centers. Similarly, because of the peculiar characteristics of DCNs, such as low-cost commodity switches with small buffers, high-bandwidth low-latency links, and data-intensive applications (Web search and email), new shortcomings of TCP are observed, one of which is termed the incast problem. In general, TCP incast arises in high-bandwidth, low-latency networks connected by switches with minimal buffer sizes: clients issue barrier-synchronized requests in parallel, and servers respond simultaneously, each with a fragment of the data block per request (Zhang and Ansari, 2011). Research has been done on designing efficient TCP schemes to avoid the typical TCP incast problem in DCNs.
URL: https://www.sciencedirect.com/science/article/pii/S1084804519302553
Network and server resource management strategies for data centre infrastructures: A survey
Fung Po Tso, ... Dimitrios P. Pezaros, in Computer Networks, 2016
4.7 Open research issues
Most TE approaches and schemes discussed in this section share a common overall objective: to provide a predictable, high-bandwidth network under highly variable traffic demands while also meeting other criteria such as energy consumption minimisation. The common underlying control loop includes monitoring, detecting, and adapting promptly to problematic link load, providing a model that reacts to adverse conditions such as congestion.
The transient load imbalance induced by load-agnostic flow admission can significantly affect other flows using a heavily-utilised link that is common to both routes. Flows contending for the bandwidth of the shared link are more likely to create congestion, which in turn causes packet drops for flows sharing the same bottleneck link. In most TCP implementations, packet loss triggers retransmission when the retransmission timer expires or when fast-retransmit conditions are met. The resulting latency can be a primary contributor to degraded network performance, since the retransmission timeout can be a factor of 100 or more larger than the round-trip time in a DC network environment.
Traffic flows are usually shuffled over shortest paths between communicating hosts. In some cases, however, selecting a non-shortest path can be advantageous for avoiding congestion or routing around a faulty path or node [67]. The proposals surveyed in this section use only multiple equal-cost paths in the DC environment. In comparison, Baatdaat [90] and MPEFT [67] opportunistically include non-shortest paths for packet forwarding. However, finding flow routes in a general network without exceeding the capacity of any link is the multi-commodity flow problem, which is NP-complete for an integer number of flows. Hence, in practice the routing algorithm might consider only non-shortest paths that are no more than n hops longer than the shortest path, since this does not significantly increase computational complexity [90].
The performance of current DC networks can be significantly improved if traffic flows can be adequately managed to avoid congestion on bottleneck links. This can be achieved by employing more elegant TE to offload traffic from congested links onto spare ones and alleviate the need for topological upgrades. Measurement-based traffic engineering techniques such as Baatdaat [90] and MPEFT [67] can play an essential role in responding to immediate load fluctuations. In contrast to reactive traffic engineering such as MicroTE [10], Baatdaat employs a measure-avoid-adapt proactive control scheme based on network programmability. Baatdaat uses direct measurement of link utilisation through dedicated switch-local hardware modules and constructs a network-wide view of temporal bandwidth utilisation at short timescales through centralised SDN controllers. Subsequently, it schedules flows over both shortest and non-shortest paths to further exploit path redundancy and spare network capacity in the DC. It is shown in [90] that direct measurement of link utilisation can help make better flow-scheduling decisions, resulting in considerable improvement in maximum link utilisation. We reproduced in Fig. 4 the experimental results under different settings in Baatdaat. It can be seen that different measurement (and control) intervals can result in distinctly different performance: Baatdaat's performance gain over ECMP varies with the measurement timescale, and finer granularity yields better improvement. Even though the improvement is not uniform, in some regions it can reach 20% over ECMP, while the practical measurement overhead is very low, especially if a dedicated management topology is used.
Fig. 4. Granularity of temporal link load measurement (extracted from [90]) has a large impact on performance.
URL: https://www.sciencedirect.com/science/article/pii/S1389128616302298
The past, present, and future of transport-layer multipath
Sana Habib, ... Arjuna Sathiaseelan, in Journal of Network and Computer Applications, 2016
4.3 Congestion control algorithms for multiple paths
Multipath congestion control is a broad-ranging field; let us first spend some time to delineate our direction of research. Multipath congestion control algorithms can be explored along many aspects like TCP-friendliness, stability, Pareto optimality, resource pooling, among others (Xu et al., 2016a; Li et al., 2016). In keeping with the logical sequence of our paper (i.e., discussion on fair resource allocation in Section 2.3), we aspire to examine congestion control algorithms in terms of TCP-friendliness.
Note that congestion control algorithms have already been analyzed with regard to TCP fairness in Xu et al. (2016a) and Li et al. (2016). We intend to enhance the existing work in three ways. (1) We categorize the congestion control mechanisms of various multipath transport-layer protocols as uncoupled, uncoupled with shared bottleneck detection, or coupled. (2) Going deeper into the uncoupled and coupled methodologies, we further compare and contrast the TCP-friendliness of different congestion control algorithms. (3) We study the strengths and weaknesses of congestion control algorithms along four aspects: flappiness, the capture problem, resource pooling, and efficient network utilization. The discussion in the subsequent paragraphs follows these three points.
Congestion control algorithms for multiple paths can be broadly classified as uncoupled (or decoupled; Xu et al., 2016a) and coupled. Let us start our discussion with uncoupled congestion control. The simplest way to extend the congestion control mechanism for use with multiple paths is to run traditional TCP congestion control (e.g., TCP Reno, TCP New Reno (Henderson et al., 2012), or TCP Westwood) on each sub-path. This design methodology was used in the early versions of SCTP and MPTCP (Xu et al., 2016a). However, this scheme runs the risk of being unfair to other Internet traffic in the case of shared bottlenecks as well as distinct bottlenecks.
In order to ensure TCP fairness, researchers proposed the idea of identifying and detecting shared bottlenecks and suppressing them. This phenomenon is referred to as shared bottleneck detection (SBD). Using uncoupled congestion control with SBD addresses the problem of TCP fairness to some extent but has two serious limitations. (1) The problem of unfairness in the case of disjoint paths still remains. (2) The time required to identify and suppress shared bottlenecks can sometimes significantly degrade performance. In the pursuit of solving these shortcomings, researchers developed coupled congestion control algorithms.
Coupled congestion control algorithms, as the name suggests, couple the congestion windows across different paths in some ratio. A wide range of coupled congestion control algorithms (e.g., fully coupled, semi-coupled, BALIA) have been proposed in the literature with the aim of achieving better TCP fairness. Although coupled refers to a broad spectrum of algorithms, we categorize multipath transport-layer protocols in terms of uncoupled, uncoupled with SBD, and coupled for simplicity. We refer the reader to Table 2 for the results of our study.
Moving on to the second piece of our contribution, we first need to determine a group of algorithms for comparison. This task needs some foundation, which we build next. Congestion control algorithms can be designed using different methodologies, e.g., the multipath additive increase, multiplicative decrease (AIMD) method; the multipath multiplicative increase, multiplicative decrease (MIMD) scheme (Kokku et al., 2007); and utility optimization and control systems (Han et al., 2006; Radunovic et al., 2008; Lai et al., 2012; Peng et al., 2013), to name a few. We examine only multipath AIMD, as it provides the necessary groundwork for settling on a set of algorithms for comparison.
The multipath AIMD algorithm additively increases the window size on a successful transmission and multiplicatively decreases it when a transmission failure occurs. This approach consists of four algorithms: slow start, congestion avoidance, fast retransmit, and fast recovery. We focus further only on the congestion avoidance phase. Let us denote the increase and decrease in window size by Ir(ws) and Dr(ws), respectively. The aggregate price of a route, qr, is directly proportional to the probability of a transmission failure, since a high qr indicates very little probability of successful transmission. Taking qr as the probability of transmission failure, the probability of a successful transmission is 1 − qr. The change in the size of the congestion window on a route r in every RTT (Δwr) can thus be represented as:
(5) Δwr = (1 − qr) · Ir(ws) − qr · Dr(ws)
This model (5) has been adapted from Peng et al. (2016). The multipath congestion control algorithms increase the window size of a source s on a route r using Ir(ws) if the transmission is successful; otherwise, they reduce it using the specific decrease function Dr(ws). Finally, we complete our preliminary work for the comparative analysis of TCP fairness by presenting the three design objectives of MPTCP (Khalili et al., 2013b):
- (1) A multipath flow should at least perform as well as a single-path flow would on the best available path (i.e., performance).
- (2) A multipath flow should be fair to other Internet traffic (i.e., TCP fairness).
- (3) A multipath flow should balance congestion in the network by moving traffic from more congested paths to less congested ones (i.e., resource pooling).
This framework of multipath AIMD and the design goals narrows our study to a handful of schemes: namely uncoupled Reno, evenly weighted TCP (EW-TCP), fully coupled, semi-coupled, linked increase adaptation (LIA), dynamic window coupling (DWC), opportunistic linked increase adaptation (OLIA), adapted OLIA (AOLIA), BALIA, EW-MPTCP, and Zhao et al. (2015), which will be inspected in terms of TCP fairness performance.
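Before diving in, model (5) can be made concrete with a toy per-RTT simulation. The Reno-like choices Ir(w) = 1/w and Dr(w) = w/2 below are illustrative assumptions, not the functions of any particular algorithm in Table 5.

```python
# Toy per-RTT walk of model (5): with loss probability q_r on route r,
# the window grows by Ir on success and shrinks by Dr on failure.
import random

def step(w, q_r):
    if random.random() > q_r:       # successful transmission
        return w + 1.0 / w          # Ir(w) = 1/w  (additive increase)
    return max(w - w / 2.0, 1.0)    # Dr(w) = w/2  (multiplicative decrease)

w = 10.0
for _ in range(1000):
    w = step(w, q_r=0.01)
print(f"window after 1000 RTTs: {w:.1f}")
```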
Let us dive into the chapter and verse of the aforesaid congestion control algorithms by arranging their window-increase/window-decrease functions in Table 5. The parameters used in the window-increase/window-decrease functions are explained in Table 4. Note that window-increase/window-decrease function is specific to each protocol and is the key parameter for controlling the aggressiveness of traffic on a sub-path.
Table 4. Notations used for multipath rate control algorithms.
Parameter | Specification |
---|---|
Ir(ws) | Window increase function for source s on route r |
Dr(ws) | Window decrease function for source s on route r |
wi | Size of congestion window maintained at path i |
BWi | Bandwidth obtained on path i |
τr | RTT on route r |
xr | Transmission rate on route r |
wr | Congestion window size on route r |
ws | Total window size for source s |
a | Used to control aggressiveness on various paths |
β | Used to control aggressiveness |
n | Number of paths |
Nm | It is the number of flows belonging to bottleneck sharing group m |
Nc | It is the total number of MPTCP connections |
γr | Uses both set of flows that are using best paths in terms of throughput and set of flows that have largest congestion window as parameters. This parameter balances load in the network by: |
(1) Increasing traffic on best paths, i.e., γ is positive | |
(2) Decreasing traffic on congested paths, i.e., γ is negative | |
(3) Do nothing, i.e., γ is zero | |
Only uses set of flows that are using best paths as parameter | |
λr | It is equal to |
br | It is equal to , where Br is the congestion balancing factor |
We begin our discussion with uncoupled Reno. Uncoupled Reno, because it maintains a separate congestion window for each path, uses bandwidth aggressively and can deprive other Internet traffic of its fair share in the case of shared as well as distinct bottlenecks.
EW-TCP came to improve this dire situation by evenly splitting a flow among all the available paths. The algorithm is thus more TCP-friendly than uncoupled Reno and also provides performance benefits, as it does not require an SBD mechanism. Specifically, a weighting factor in the window-increase function of the algorithm controls aggressiveness, as indicated in Table 5. However, evenly splitting a link's bandwidth among multiple subflows may not conform to design goal 1 (i.e., a multipath flow should at least perform as well as a single-path flow); in such cases, weighted splitting can be performed. Honda et al. (2009) essentially based their work on EW-TCP but used different weights on each path, adapting the weights to achieve fairness. However, their work could not handle heterogeneous RTTs, and adaptive weighted TCP (AWTCP) (Hu et al., 2011) was proposed to overcome this RTT limitation of EW-TCP.
Table 5. Congestion control algorithms for multipath transport.
Algorithms | Year | Ir(ws) | Dr(ws) | Strengths and weakness | |||
---|---|---|---|---|---|---|---|
Flappiness | Capture problem | Resource pooling | Efficient network utilization | ||||
Uncoupled Control | 2000s | × | × | × | × | ||
EW-TCP (Honda et al., 2009) | 2009 | × | × | × | |||
Fully Coupled (Kelly and Voice, 2005; Han et al., 2006; Wischik et al., 2011) | 2011 | ||||||
Semi-coupled (Wischik et al., 2011) | 2011 | × | × | ||||
LIA (RFC 6356) (Raiciu et al., 2011b) | 2011 | × | × | ||||
DWC (Hassayoun et al., 2011; Singh and Xiang, 2013a) | 2011, 2013 | × | × | ||||
OLIA (Khalili et al., 2012, 2013a,b) | 2012, 2013 | × | × | ||||
AOLIA (Singh et al., 2013b) | 2013 | × | × | ||||
BALIA (Peng et al., 2013, 2016) | 2013, 2016 | × | × | ||||
EW-MPTCP (Ming et al., 2014) | 2014 | × | × | ||||
Zhao et al. (2015) | 2015 | × | × |
Next, the fully coupled algorithm (Kelly and Voice, 2005; Han et al., 2004) emerged on the scene with the ambition of improving TCP fairness by using a total window term, i.e., the total window size ws across all paths, as shown in Table 5. This total window term couples the congestion window across all paths and provides better fairness than EW-TCP. However, the use of the total window term makes fully coupled overambitious in decreasing its window size (since one bad path may result in decreasing traffic on good paths too). As a result, the algorithm may not conform to design goal 1 (performance) in some cases.
Realizing the performance loophole in fully coupled, the semi-coupled algorithm (Wischik et al., 2011) came forth with a window-increase function that uses an additional aggressiveness factor to better meet design goals 1 (performance) and 2 (TCP fairness). Due to this aggressiveness factor, the semi-coupled algorithm lies somewhere between EW-TCP and fully coupled in fairness performance.
Modified semi-coupled, also known as LIA (Raiciu et al., 2011b), alters the window-increase function of semi-coupled to ensure that the total bandwidth obtained by multipath TCP flows does not starve regular TCP flows in the case of shared bottlenecks. This tailoring of the window-increase function makes LIA more TCP-friendly than EW-TCP and semi-coupled. It is worth mentioning a very interesting enhancement of LIA, EW-MPTCP (Ming et al., 2014), which targets incast collision in datacenters. EW-MPTCP weights the size of the congestion window in inverse proportion to the number of available paths and has the same TCP-friendliness potential as LIA.
LIA, as mentioned before, assigns bandwidth to subflows assuming a shared bottleneck. But what if the subflows traverse distinct bottlenecks? In that case, LIA will unnecessarily reduce its window size and consequently achieve less than the actually available bandwidth. Furthermore, Khalili et al. (2013b) found that LIA is not Pareto optimal. Investigation into the reasons behind this non-Pareto optimality revealed that LIA inherently employs a trade-off between resource pooling and responsiveness, i.e., for good responsiveness LIA departs from Pareto optimality. This problem can also make LIA more aggressive towards other Internet traffic.
DWC (Hassayoun et al., 2011; Singh and Xiang, 2013a) addressed the first aforestated problem by coupling only those subflows that have a common bottleneck and keeping a separate congestion window for subflows belonging to distinct bottlenecks. Maintaining separate congestion windows for subflows belonging to shared and distinct bottlenecks makes DWC less TCP fair than LIA. OLIA (Khalili et al., 2013a) came out to solve the non-Pareto optimality of LIA. Its window-increase function, presented in Table 5, consists of two parts. The first part, based on the work of Kelly and Voice (2005), ensures Pareto optimality and improved TCP fairness while the second part guarantees good responsiveness. On this account, OLIA is more TCP-friendly than LIA.
Furthermore, Singh et al. (2013b) argued, by comparing the first term in the window-increase functions of LIA and OLIA, that the overall throughput of LIA is usually better than that of OLIA. In an effort to combine the best of both worlds (i.e., achieve a TCP fairness performance somewhere between LIA and OLIA), Singh et al. proposed adapted OLIA (AOLIA), a LIA-based extension of OLIA. Its window-increase function consists of two parts: the first part is a combination of OLIA and LIA, while the second part is based on the regular TCP window-increase function.
Peng et al. (2016) developed BALIA on a theoretical framework. During their research, the authors observed that there exists an inevitable trade-off between TCP-friendliness and responsiveness (i.e., the most TCP-friendly algorithm is least responsive and vice versa). Due to this fact, BALIA is also referred to as the trade-off algorithm and has been found to strike the most balanced bargain between TCP-friendliness and responsiveness. Finally, Zhao et al. (2015) moved the theoretical setting of Peng et al. (2013) forward by proposing an algorithm that achieves better TCP fairness than BALIA and uses a congestion balancing factor br in its window-increase function to better meet design goal 3 (resource pooling).
Until now, we have performed a comparative analysis of various congestion control algorithms in terms of TCP fairness. Responsiveness is a topic closely related to TCP-friendliness: it is a measure of how fast an algorithm can respond to changes in network topology (by adjusting the window size on each path accordingly) and converge to equilibrium. The changes in network topology can be in the form of an increased congestion level, loss rate, path failure, etc. We summarize our discussion on TCP fairness and exploit the observation made by Peng et al. (2016) to compare the responsiveness of various congestion control algorithms in Fig. 5. We do not go into an extensive discussion on responsiveness, as it is beyond the scope of our paper. An inquisitive reader can refer to Xu et al. (2016a), Peng et al. (2016), and Singh et al. (2013b) for a more comprehensive study on this topic.
Fig. 5. Comparison between congestion control algorithms with respect to TCP-friendliness and responsiveness.
Lastly, we go on to examine the strengths and weaknesses of congestion control algorithms along four attributes, which are flappiness, capture problem, resource pooling, and efficient network utilization. Flappiness and capture problem mark the deficiencies of an algorithm while resource pooling and efficient network utilization represent the benefits provided by an algorithm. In particular, flappiness measures stability while the other three attributes are related to MPTCP design objectives 1 (performance) and 3 (resource pooling).
We start our discussion with flappiness. Flappiness is a design imperfection that arises from the flipping of subflows between optimal paths subject to small random fluctuations in the network. Uncoupled Reno maintains a separate window for each path, so the question of flipping subflows never arises. EW-TCP, because it splits traffic evenly among all available paths, is also safe from this defect. However, the fully coupled algorithm, which makes traffic-splitting decisions according to path characteristics, is flappy. This is because fully coupled treats temporary variations in the network (e.g., small changes in bandwidth or congestion window size) as permanent changes and keeps shifting subflows between good paths, which makes the algorithm unstable.
Wischik et al. (2010), while trying to solve flappiness, proposed the principle of equipoise. Equipoise defines a trade-off between resource pooling and traffic balancing by stating that congestion control algorithms should balance traffic among the best paths in such a way that it is robust to transient network fluctuations and responsive to persistent changes. This principle influenced all the subsequently proposed algorithms in the literature. As a result, later proposed algorithms do not suffer from flappiness.
The capture problem is a defect that causes subflows to be "trapped" or "captured" on a less desirable path and can significantly degrade the performance of an algorithm. Uncoupled Reno and EW-TCP do not suffer from this problem, for the same reasons as mentioned under flappiness. The fully coupled algorithm, however, experiences the capture problem because it transmits data on the best paths and keeps very little traffic on the worst path. Thus, if the characteristics of the worst path improve while those of the good paths worsen, the algorithm cannot use the newly available better path due to insufficient probing traffic on it. The subflow in such a case is said to be trapped or captured on the less desirable paths and cannot utilize the available better paths. The principle of equipoise caters for this problem too, and because of that, later algorithms proposed in the literature do not suffer from this handicap.
Resource pooling, in the context of coupled congestion control algorithms, is a major strength that allows an algorithm to pool congestion windows across multiple paths. This results in more bandwidth, better reliability, and more robustness than each individual path provides. Since uncoupled Reno does not unite the congestion windows on different paths, it cannot reap the benefits of resource pooling, whereas all the aforenamed coupled algorithms implement resource pooling.
Efficient network utilization works on the simple idea of transmitting more traffic on good paths and less on bad paths. Uncoupled Reno, because it maintains a window per path, cannot make very effective data distribution decisions, and EW-TCP allocates data without taking heterogeneous path characteristics into account. The available bandwidth can be utilized more efficiently if more traffic is sent on paths with better characteristics (i.e., low drop probability, small delay, small congestion window), as this reduces congestion in the network and provides better load balancing. The coupled algorithms (other than EW-TCP) have therefore been designed with efficient network utilization in mind, i.e., they all assign traffic according to path characteristics. Table 5 provides a quick rundown of the above discussion by summarizing the strengths and weaknesses of these heterogeneous congestion control algorithms.
This completes our discussion on the seven design questions that were motivated in Section 2.5. The gist of our study is reiterated in Table 6 for a quick recap. Other than these seven questions, we also looked into the HOL blocking problem, unnecessary fast retransmit, and multipath transport protocols' support for multimedia traffic and handovers. The issue of how many paths to use and path selection was also investigated. In addition to that, we compared TCP-friendliness and responsiveness of different congestion control algorithms, and also inquired into the advantages and disadvantages provided by them. Let us now move towards the last chapter of our paper, i.e., open research issues and challenges in the field of transport-layer multipathing.
Table 6. Comparison between single-path and multipath transport layer.
Key mechanisms | Single-path transport layer | Multipath transport layer |
---|---|---|
Connection setup | 3-Way handshake | 3-Way or 4-way handshake |
Flow control | Per connection basis | Per connection or per path basis |
Sequence space | Single | Single or double or triple |
ACK | Cumulative ACK, SACK, and Delayed ACK | Cumulative ACK, SACK, delayed ACK, and duplicated and delayed ACK |
Flow scheduling | N/A | WRR, OMS, FPS, and EDPF |
NUM framework | ||
Congestion control | Uncoupled | Uncoupled or coupled |
URL: https://www.sciencedirect.com/science/article/pii/S1084804516302028
FAQs
What is meant by the concept Fast Retransmit?
Fast retransmit is a modification to the congestion avoidance algorithm. As in Jacobson's fast retransmit algorithm, when the sender receives the 3rd duplicate ACK, it assumes that the packet is lost and retransmits that packet without waiting for the retransmission timer to expire.
What is Fast Retransmit fast recovery?
Fast Retransmit and Fast Recovery have been designed to speed up the recovery of the connection without compromising its congestion avoidance characteristics.
What is the main difference between fast recovery and Fast Retransmit?
With Fast Retransmit alone, the congestion window is dropped down to 1 each time network congestion is detected, so it takes considerable time to return to high link utilization. Fast Recovery alleviates this problem by removing the slow-start phase.
What causes TCP fast retransmission?
TCP fast retransmission occurs when the sender retransmits a packet before the expiration of the acknowledgement timer. The receiver sees packets whose sequence numbers are bigger than those of the acknowledged packets and responds with duplicate ACKs; the sender should fast retransmit upon receipt of 3 duplicate ACKs.
What are the disadvantages of fast retransmit?
Disadvantages of fast retransmit/fast recovery: 1. If the handover from one foreign agent to another takes a longer time, the correspondent host will have already started retransmission. 2. The approach focuses on loss due to handover: packet loss due to problems on the wireless link is not considered.
What is the role of duplicate acknowledgements?
Duplicate acknowledgement is the basis for the fast retransmit mechanism. After receiving a packet, an acknowledgement is sent for the last in-order byte of data received.