QoS

Introduction
Nowadays, peer-to-peer programs are extremely popular. Their use easily saturates an Internet connection to the point where other users on the same local network experience unbearable latencies. The problem is especially pronounced on connections with a much smaller upload bandwidth than download, such as ADSL, because of the way TCP works: each time a packet is received, the receiver must acknowledge its reception by sending an acknowledgement (ACK) packet back to the server, and the server will not send any more data until it receives this acknowledgement. In other words, the faster you can reply with an ACK packet confirming that you received the previous packet intact, the sooner the server will send you the next one, which simply means faster download speeds. BitTorrent and other peer-to-peer clients usually saturate a connection's upload bandwidth to the point where the modem's outgoing queue fills up with peer-to-peer payload packets. This immediately introduces latency for other Internet users, because their ACK packets get caught in a very large queue instead of being sent out immediately.

The solution is traffic shaping. This article only provides an overview of a QoS solution; the actual application of this solution is described in QoS-Applied. However, it is vital that you understand how QoS works before trying to implement it in your setup, as it can become very confusing very easily.

Common setup
As a rule of thumb, QoS should be used whenever a single Internet connection is shared by more than one person, which today describes most, if not all, homes. The following is an example of a private network (a home) containing an ADSL modem (8 Mbit/s down, 1 Mbit/s up) and a router (Linux box) acting as the local gateway while performing NAT for the local clients:


 * The ADSL modem has the IPs 192.168.1.1 (private) and 123.123.123.123 (public), and does not perform NAT (as this is the router's job)
 * The gateway has the IPs: 192.168.2.1 (eth0) and 192.168.1.2 (eth1)
 * The local clients have the IPs: 192.168.2.2+ and all use 192.168.2.1 as their default gateway.

On the Linux box, eth0 represents the local network (i.e. connected through a switch), while eth1 represents the modem's one and only DHCP client -- the Linux box itself (connected directly with a patch cable).

This is the route packets will follow when a local client sends to an IP address not part of the local network:


 * The packet leaves the local client (192.168.2.2+)
 * The packet arrives at the default gateway through eth0 (192.168.2.1)
 * The default gateway forwards the packet from eth0 to eth1
 * The packet leaves the default gateway through eth1 (192.168.1.2)
 * The packet arrives at the modem (192.168.1.1) and enters its output queue at position #23
 * Delay: the earlier 22 packets have default priority because they arrived beforehand, so the packet at position #23 must wait for these to be sent first
 * ... 22 packets are sent out to the Internet ... (time spent sending: 1.2 seconds)
 * The example packet is now first in the modem's output queue and is sent out to the Internet from 123.123.123.123 (latency: 1.2 seconds + time spent sending example packet)

It is important to understand here the difference between latency and bandwidth. Latency measures the time a packet needs to get from point A to point B; bandwidth measures the amount of data that gets from A to B in a given time. So, if I were to drive a dictionary over to my friend on the other side of town, my bandwidth would be good, but the latency would be bad (the time spent driving, to be exact). However, if I were to phone my friend and start reading the dictionary to him, the latency would be lower, but the bandwidth would be substantially less than in the first example. Also note that bandwidth and latency are not directly connected: if it took me the same time to read the dictionary over the phone as to drive it over, the bandwidth in both cases would be equal, but the latencies would still differ enormously.

Back to the DSL-modem example. As defined above, the upload speed of the DSL modem is only 1 Mbit/s (roughly 128 kbytes/s). This means the modem will only send that much data per second (bandwidth). If the gateway sends more data than that, the packets are placed in an output queue to wait their turn to be sent, creating a backlog, and the DSL modem's output queue fills up. If a packet takes five seconds to get from the bottom of the queue to the top, we have a latency of five seconds. That is bad for interactive sessions.

Since we have no control over how the DSL modem works, we need to move the speed-limiting queue from the DSL modem to the Linux box. By lowering the output speed of eth1 on the Linux box to slightly below the upload speed of the DSL modem, packets will queue up in the Linux box before being shipped off to the DSL modem. The modem's own output queue then stays empty, so it should (hopefully) transmit packets to the Internet immediately.

Once the queue has been moved to the Linux box, we have control over it, and the ability to shape outbound traffic.
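As a rough sketch of this idea (the interface name and rates are taken from the example setup above, and 900kbit is just a plausible "slightly below 1 Mbit/s" value; the real commands are covered in QoS-Applied), the outgoing rate of eth1 could be capped with HTB like this:

```shell
# Cap eth1's outgoing rate slightly below the modem's upload speed, so the
# queue builds up here (where we control it) instead of inside the modem.
tc qdisc add dev eth1 root handle 1: htb default 20
tc class add dev eth1 parent 1:  classid 1:1  htb rate 900kbit ceil 900kbit
tc class add dev eth1 parent 1:1 classid 1:20 htb rate 900kbit ceil 900kbit
```

With only this in place nothing is prioritized yet; the queue has merely been relocated so that later sections can shape it.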

Requirements
You don't need much: a computer running Linux with two network interfaces should suffice. I have been running this on a 200 MHz machine, and I believe even smaller machines can handle it.

Shaping strategy
When a packet comes in, iptables can match it in several ways. Here are some examples of criteria you can shape by:


 * Port
 * Packet size
 * Traffic type
 * User/Client

Let's say you want to give BitTorrent a lower priority. You know it uses ports 6881 through 6889. iptables can easily track this, but as soon as a user finds out that these ports have a lower priority, he will configure his BitTorrent client to use another port. There is no reliable way to know what ports P2P programs are running on.
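Easy to evade as it is, the port match itself is simple. A sketch (the interface and the mark value 4 are arbitrary choices for this example):

```shell
# Mark outgoing TCP packets on the default BitTorrent port range.
# The mark can later be picked up by a tc filter to classify the traffic.
iptables -t mangle -A POSTROUTING -o eth1 -p tcp --dport 6881:6889 \
    -j MARK --set-mark 4
```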

Shaping by packet size does have some advantages: you can give smaller packets higher priority than larger packets. Since sending lots of data is best accomplished with large packets, this is exactly what P2P programs do. But then again, a client could lower the MTU (Maximum Transmission Unit; the maximum packet size) on his outgoing interface, thereby sending only small packets.
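A size-based rule might look like this (again, interface and mark value are illustrative; 1000:1500 assumes the usual Ethernet MTU of 1500):

```shell
# Mark large outgoing packets (1000-1500 bytes) as bulk traffic.
iptables -t mangle -A POSTROUTING -o eth1 -m length --length 1000:1500 \
    -j MARK --set-mark 4
```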

What you really want is to be able to recognize a packet by its contents. You need iptables to look into every packet and analyze its contents to figure out whether it is from a P2P program or not. As of now, you can only do this with the following two projects:


 * IPP2P - Identifies P2P traffic
 * L7-filter - Identifies your packets (not only BitTorrent but also HTTP, Jabber, FTP and other protocols)

Shaping by user can be interesting for small home networks. It allows you to give every user a fair share of the available bandwidth, which they can then use for whatever they like, removing the possibility of one user taking all the bandwidth away from the others.

The various approaches can also be combined using a classful scheduler like HTB. It allows you to first shape by user and then prioritize by traffic type for every user.
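Such a combined hierarchy might be sketched as follows (all rates, class IDs and the two-user split are made up for illustration): the link total is divided between two users, and each user's share is subdivided by traffic type.

```shell
tc qdisc add dev eth1 root handle 1: htb default 30
tc class add dev eth1 parent 1:  classid 1:1  htb rate 900kbit
# One class per user, each guaranteed half of the link but able to borrow all of it.
tc class add dev eth1 parent 1:1  classid 1:10 htb rate 450kbit ceil 900kbit
tc class add dev eth1 parent 1:1  classid 1:20 htb rate 450kbit ceil 900kbit
# Within the first user's class, prefer interactive traffic over bulk.
tc class add dev eth1 parent 1:10 classid 1:11 htb rate 300kbit ceil 900kbit prio 0
tc class add dev eth1 parent 1:10 classid 1:12 htb rate 150kbit ceil 900kbit prio 1
```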

Prioritizing
You might want to prioritize your traffic into the following classes:


 * 1) Interactive
 * 2) Misc
 * 3) Browsing
 * 4) P2P

Interactive
Interactive is for small packets that require very low latencies, for instance ICMP, DNS, Voice over IP or SSH. This also includes TCP acknowledgement (ACK) packets.
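A commonly seen rule for catching bare ACK packets (similar rules appear in scripts like wondershaper; the mark value 1 is an arbitrary choice here) combines a TCP-flags match with a length match:

```shell
# Mark small TCP packets that carry only the ACK flag (no FIN/SYN/RST),
# i.e. pure acknowledgements, for the interactive class.
iptables -t mangle -A POSTROUTING -o eth1 -p tcp \
    --tcp-flags FIN,SYN,RST,ACK ACK -m length --length :64 \
    -j MARK --set-mark 1
```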

Misc
Misc is for packets that fit nowhere else.

Browsing
Browsing is for packets that should have lower latencies than P2P, but shouldn't really take priority over SSH. This could for instance be HTTP, IMAP, IRC, POP3 or SMTP.

P2P
P2P is for P2P programs and other programs that try to upload lots of data. These get the lowest priority possible.

Note that, for example, giving P2P a lower priority than browsing does not mean that P2P will get less bandwidth. It means that the system will prefer to transmit browsing packets over P2P packets. Only if you are saturating your upload stream will P2P take a hit in available bandwidth.

You might want to think about how you use your network and define the priorities accordingly, or change the order, but this example is a good starting point.

Understanding HTB
What is it that HTB really does? HTB is a system that divides bandwidth into separate queues. It is a class-based scheduler, where a class can be, for example, a user or a protocol group. It is important to remember that HTB is made to guarantee bandwidth, not to guarantee interactivity. HTB doesn't count packets, it counts bytes! This is why it requires some cleverness to get interactivity out of it. Here is a short explanation of some of the inner workings.

Qdiscs
Qdisc is short for queueing discipline, meaning a specific strategy used to manage a queue. The queue at the post office and the queue at an emergency room are both queues in the sense that both are lines of "items", but the strategy (or qdisc) used to manage them is very different.

Classes
The HTB qdisc organizes packets into classes, using filters; filtering can be done using marks. Each class is a queue in its own right, and therefore uses yet another qdisc (e.g. SFQ). You can think of classes as doors through which the bandwidth passes: you must classify each kind of traffic into the door that applies the right limits to it.
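For example, packets carrying an iptables mark can be steered into a class with an fw filter, and that class can be given an SFQ qdisc of its own (the handles, mark value and class ID below are illustrative):

```shell
# Send packets carrying firewall mark 4 into class 1:12 ...
tc filter add dev eth1 parent 1: protocol ip prio 1 handle 4 fw flowid 1:12
# ... and let that class queue its own packets with SFQ.
tc qdisc add dev eth1 parent 1:12 handle 12: sfq perturb 10
```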

Rates
A rate is the amount of bandwidth a qdisc is guaranteed. For example, in an ideal world, an upload rate of 128 kbit/s would mean that the ISP will always give us at least that amount of bandwidth, or more if available.

Ceil
Ceil (Bandwidth Ceiling) is the maximum amount of bandwidth a qdisc can have. Continuing from the previous example, in the real world, an "upload rate" of 128kbit/s really means that the ISP has set a limit on the maximum bandwidth that we can use, even if more is available.
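In tc terms, rate and ceil are simply two parameters of an HTB class; a sketch with made-up numbers:

```shell
# Guarantee this class 64 kbit/s, but let it borrow up to 128 kbit/s
# from its parent when spare bandwidth is available.
tc class add dev eth1 parent 1:1 classid 1:10 htb rate 64kbit ceil 128kbit
```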

Bursts
There are two types of bursts: burst and cburst. burst is the number of bytes by which a qdisc can exceed its rate, while cburst is the same, but for exceeding the ceil.

The lartc.org documentation explains this only briefly: it says burst is the number of bytes that can be sent, in excess of the configured rate, after a class hits its ceil.

Burst is the number of bytes a class is allowed to send at the ceil rate. It is important to remember that ceil is a rate (bytes per second) while burst is an amount (bytes). You can look at burst as the size of a bucket, ceil as the maximum speed at which you can take tokens from the bucket, and rate as the speed at which the bucket refills. So if we have a burst of 5000, a rate of 1000 and a ceil of 2500, we can sustain a connection at 1000 bytes/sec; if at some point a sudden "burst" of data is available, it is possible to send at 2500 bytes/sec for (5000/2500 =) 2 seconds. After that the bucket is empty and starts filling up again at 1000 bytes/sec.
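The bucket arithmetic can be checked directly with the numbers from the text (this simplified sketch ignores, as the text does, the refill that happens while the burst is being sent):

```shell
burst=5000   # bucket size in bytes
rate=1000    # refill speed in bytes/sec
ceil=2500    # maximum speed at which tokens leave the bucket, in bytes/sec

# Duration of a full-speed burst before the bucket runs empty:
echo "burst lasts $(( burst / ceil )) seconds"    # prints "burst lasts 2 seconds"

# Time for an empty bucket to refill completely at 'rate':
echo "refill takes $(( burst / rate )) seconds"   # prints "refill takes 5 seconds"
```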

Quantum
Quantum describes how bandwidth is divided between qdiscs. It works like this:


 * All quantums for all qdiscs are added together and the sum is remembered.
 * Each qdisc gets a share of the bandwidth proportional to quantum / sum.

This is used when you have two qdiscs with the same rate and ceil, but want to give them different priority.

When using SFQ, the amount of data is measured in bytes, not in packets; therefore the quantum value equals the number of bytes that a particular qdisc is allowed to send in one cycle.

It is important to remember that for the finest precision you should select as small a quantum as possible, while still keeping it larger than the MTU. When classes want to borrow bandwidth, each is given its quantum of bytes before the next competing class is served. Too large a quantum can create long response times.

If we have two qdiscs, A and B, with quantums of two and one packets respectively, the ratio is 2:1: for every two packets of qdisc A, one packet of qdisc B is allowed, so qdisc A gets 66% of the total bandwidth (2+1=3 and 2/3=0.66...). But remember that quantum is measured in bytes, not packets: with an MTU of 1500 bytes, the one-packet quantum of B would be 1500 and the two-packet quantum of A would be 3000.

r2q
The r2q value (10 by default) can be specified when you create an HTB root. r2q stands for "rate to quantum" and is the conversion factor used to calculate a queue's quantum from its specified rate. You can always override this by explicitly specifying a quantum for a class.

The quantum is thus calculated as quantum = rate (in bytes) / r2q. The quantum values must be bigger than the MTU of your setup, which is 1500 in most cases (the MTU of Ethernet). They must also be smaller than 60000, a value hard-coded to prevent poor prioritizing. If your quantum values are wrong, you will get error messages in your kern.log like HTB: quantum of class 10101 is big. Consider r2q change. In that case, calculate a better r2q value from your rates (set it so that all quantums end up > 1500 but < 60000). If the quantum values are wrong, bandwidth will not be divided properly.
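As a numeric sketch (the 240 kbit/s class rate is a made-up example; note that tc rates are given in bits, but the quantum is computed from bytes):

```shell
r2q=10                                   # HTB default
rate_kbit=240                            # example class rate in kbit/s
rate_bytes=$(( rate_kbit * 1000 / 8 ))   # 30000 bytes/sec
quantum=$(( rate_bytes / r2q ))          # 30000 / 10 = 3000 bytes

# Sanity check: the result must land between the MTU and the hard-coded cap.
[ "$quantum" -gt 1500 ] && [ "$quantum" -lt 60000 ] && echo "quantum $quantum is fine"
```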

Prio
If a class has a higher ceil than rate, it is allowed to borrow bandwidth from other classes. By default, borrowing is weighted by rate, i.e. a class with twice the rate of another class can also borrow twice as much bandwidth. This behavior can be overridden with the prio parameter: the smaller the number, the higher the priority. Classes with higher priority borrow first; classes with lower priority can borrow only if there is still bandwidth left. A class that cannot borrow bandwidth will not be able to exceed its rate. The prio parameter is often misunderstood, as it does not actually affect the order in which packets are sent out.

Understanding HFSC
Compared to HTB, HFSC has the major advantage that it allows not only a proportional distribution of bandwidth but also control and allocation of latencies. This lets you use a link better and more efficiently in situations where bandwidth-intensive data services and interactive services share it, which is why you should rather use HFSC than HTB. It is particularly interesting for Voice over IP and other real-time connections. Visit (1) for further information (in German) or (2) for the English translation.

Understanding SFQ
SFQ stands for Stochastic Fairness Queueing. It queues packets belonging to different connections and tries to allow every connection to send the same number of packets, so that, for example, several concurrent FTP uploads all run at the same speed instead of one choking the others. SFQ treats all connections equally and, as such, does a great job on its own, balancing "everything" without any additional configuration. The downside of SFQ is that it is a queue: it delays packets and introduces lag. Attaching several SFQ qdiscs to HTB classes makes this problem worse, as every instance of SFQ keeps its own queue, increasing the total number of packets that will be delayed. SFQ is especially suitable for peer-to-peer connections that jam the upstream. You should not use it for interactive connections such as SSH, IPTV or Voice over IP, because it works with a queue: packets are not sent immediately but are processed in turn.

limit
The default queue size of SFQ is 128 packets. With the limit parameter (which is only available in newer kernels and tc versions) you can set a custom queue size. Smaller queues make the stochastic fairness less accurate, but improve latency at the same time.
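Attaching SFQ with a custom queue size to an HTB leaf class might look like this (the handles and class ID are illustrative, and the limit parameter assumes a reasonably recent kernel and tc):

```shell
# Queue class 1:30's packets with SFQ, shrink the queue from the default
# 128 packets to 32, and rehash the flows every 10 seconds.
tc qdisc add dev eth1 parent 1:30 handle 30: sfq perturb 10 limit 32
```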

Multiple interfaces
Thinking of traffic shaping multiple interfaces as one under kernel 2.6? Or do you want to shape ingress traffic? IMQ could be the answer, but it might not be stable. Or is it? There is next to no online documentation. Please refer to these two links:


 * IMQ ported to 2.6 discussion
 * IMQ usage example