Updated: Nov 15th, 2018
First Posted: Apr 13th, 2012

Bandwidth Management Tutorial

Note: These screenshots are v4 of the GUI and are no longer current.

The "Goods" on Bandwidth Management

Bandwidth management is one of the most widely misunderstood subjects in modern networking. The reason is simple: the term itself doesn't convey a specific meaning. Terms like "shaping" further confuse the issue. What does "shaping" mean? What does "bandwidth management" mean? It's just a lot of shady terms that sound good but don't really mean anything. In fact, the terminology used is purposefully vague, perhaps so that you can't complain about the performance of whatever it is you're using. What are you going to say? "The product isn't shaping the way we'd hoped, so we'd like our money back." What are the magnitudes of shaping? How many shapers does it take to screw in a lightbulb? Nobody knows.

Bandwidth management is a lot like economics, because the complexities of how it works are beyond simple logic. For example, a simple view of economics might be that if the government raises taxes, they will have more money, right? Well, not really, because if they raise taxes people have less money to spend, so they buy less, which causes people who sell things to have lower incomes, so they hire fewer people and more people are out of work. So effectively, everyone along the line is making less money and paying less in taxes overall, even though the tax rate is higher, so the government gets less money. There is a dynamic present in economics that makes what seems to be quite simple a lot more complicated as you gain more of an understanding of how things REALLY work.

With bandwidth management, for example, you might think that giving HTTP high priority will make everyone's browsing experience better. But if you do so, it's likely that several other protocols that you've never even heard of will stop working altogether, because there is so much HTTP traffic that nothing else will be able to get through. There's a domino effect that you didn't anticipate. You've actually made things worse with what seemed to be a very logical setting, because you didn't quite understand the bigger picture.

What is "Bandwidth"?

The first problem is that the term "bandwidth" is deceiving, because there are no "bands" and it doesn't have any width. Internet "bandwidth" is not a spectrum; traffic streams are one bit at a time. Bandwidth on the internet can only be conceptualized over time, and the amount of time that you talk about can greatly change the user experience. For example, if you say that a user gets 256Kb/second, does that mean every second? Can he get 512Kb/s in one second and none the next? That still averages 256Kb/s. Or a megabit one second and then no bandwidth for the next 3 seconds? What about slices of a second? As a provider, or someone involved with the management of bandwidth, you have to understand what's really happening. When you talk about "allocating" bandwidth, you have to realize that it's not like setting aside a lane on a highway, because there is only one lane on the internet. All of the bits travel up the same pipe. So you are really only allocating a time slot, which can have a variety of properties, and things will work quite differently depending on the amount of time you use as your baseline.
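To make the averaging-window point concrete, here is a minimal Python sketch. The traffic pattern and the function name `rates_over_windows` are invented for illustration, and a megabit is taken as 1024Kb so the numbers line up with the 256Kb/s figure above.

```python
def rates_over_windows(kbits_per_second, window_seconds):
    """Average rate (Kb/s) for each consecutive window of the given length."""
    rates = []
    for start in range(0, len(kbits_per_second), window_seconds):
        chunk = kbits_per_second[start:start + window_seconds]
        rates.append(sum(chunk) / len(chunk))
    return rates

# Hypothetical user: one megabit (taken as 1024Kb) in one second,
# then nothing for the next three seconds, repeated twice.
traffic = [1024, 0, 0, 0, 1024, 0, 0, 0]

per_second = rates_over_windows(traffic, 1)        # what the user feels second by second
per_four_seconds = rates_over_windows(traffic, 4)  # what a 4-second average reports

print(per_second)
print(per_four_seconds)
```

Measured over 4-second windows this user is a steady "256Kb/s" customer; measured per second, the same traffic is a full megabit followed by three seconds of nothing.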

Why "Application-Aware" Bandwidth Management is Backward

What's truly ironic is that virtually everyone is doing bandwidth management wrong. Our competitors (you've heard of them: Packeteer, Allot, etc.) try to tell you to do it the way that best suits their products. The problem is that their products are based on an incorrect premise: that the way to manage your network is to micro-manage content, and that identifying every protocol, deciding which ones are "good" and which are "bad," and streamlining the "good" ones will make your network better. The truth is that "application aware" bandwidth management is more about marketing than it is about managing a large network. It's a way to get you to keep buying upgrades, because if there's one thing you can count on, it's that there will always be more applications next year.

There are many problems with what our competitors tell you to do, and not all of them are technical. Firstly, not everyone has the background to understand how every protocol ever created works, and how they affect the overall network. And as mentioned earlier, there are dynamics present that will present moving targets as you try to micro-manage each protocol or service. And by the time you learn, it's likely wasted time, as new versions with different behaviors are released that have to be dealt with. Not to mention the cost of buying all of the upgrades to handle the new protocols.

There are also philosophical problems with the way our competitors tell you to manage your network if you are an ISP. An ISP is a reseller of bandwidth. You offer "internet service", not just "services that you think are worthy". If someone wants to use his bandwidth allocation to play games or download music, who are you to tell them that they can't? They can't do what they want to do because you can't figure out a better way to fairly allocate your resources? That's not acceptable to most end users.

From a technical standpoint, what do you do when the traffic is encrypted or compressed or both? All of a sudden your entire strategy has been defeated. All of your work has been wasted. You can no longer "see" what's going on. You only have big blobs traveling back and forth. Many bandwidth-hog protocols like "p2p" are moving toward encryption and tunneling.

The most important reason that our competitors are wrong is that their way is a short term solution. If you stop abusers from doing one thing, they'll just find something else. You can't manage abusers long term with the protocol method without constant adjusting to match new conditions. All you do is fix things for a short time, until the dynamic changes.

So What's the RIGHT way to manage a network?

Think of your network as a society and your bandwidth management device as a police force. The obvious way to stop crime is to get all of the criminals, right? No, that's wrong. There are too many potential criminals. If it's easy to get away with a crime you will have more and more criminals. And the more criminals you have, the more difficult it is to get them all. It's like a video game where you shoot a bad guy and 2 more appear. They keep coming. You'll never get them all, because there are more of them than there are of you. You'll be constantly fighting a losing battle.

The way to stop crime is to create a deterrent. I just heard that car thefts are down 20% in the USA. Is that because there are fewer criminals? No, it's because today's automobiles are more difficult to steal. If you make your bandwidth more difficult to steal, then you won't have to worry about catching people stealing it. The question is, how do you create a deterrent? Using the crime analogy, you do it by creating an environment that lets the good guys go about their business normally, and that makes it difficult or impossible for the bad guys to do bad things. You can do it by putting a policeman on every corner, and by putting cameras in banks and at ATMs. You put controls in place that make doing bad things hard to do, and that make it easier to identify those that do them.

The concept can be applied to bandwidth management as well. You don't create a fast, stable network by chasing down the "bad guys". You do it by making it difficult to do bad things. You do it by creating overall policies that thwart the effectiveness of doing things that cause problems. And you don't have to "know" what your clients are doing in order to enforce such policies, because it doesn't matter WHAT they are doing. It only matters how much bandwidth they are using compared to what they are paying for.

Preparing to use the ET/BWMGR

Clearing your Brain

The first task is to realize that micro-managing hundreds or thousands of protocols is not only needlessly complex, but also something that can be harmful if improperly implemented. The interactions on a large-scale network can't be fit into a static model as they change constantly, with endless variables and combinations. The best approach is to generalize, because the more specific the approach, the more often the model will be wrong. Brain-clearing is especially necessary when coming from a background using another bandwidth management product. A product that uses terminology like "defining queues or pipes," or "setting up partitions," is asking you to conform to a methodology of operation using terms that are only applicable within the context of that particular product. Using these concepts on another product won't be of help, so it's best to start fresh.

Defining a Strategy

In our experience, most users of bandwidth management devices don't have a formal strategy. Most just want to stop abusers from using all of their bandwidth, which is certainly useful and in many cases will (for a short time) make the network run more smoothly. The problem is that the abuser today may not be the abuser tomorrow, which means that you are never done. There will be someone else tomorrow. The protocol method is the worst choice. If you "limit" one protocol, they'll find something else. The abusers are clever. You don't want to engage them in a daily battle. You'll spend your whole life chasing down these people. If you want to control abusers, limit their IP or MAC address. Once you control ALL of their traffic, they are under control no matter what they do. It's as simple as that.

Supposing that abusers aren't your target, if you use our competitors' method and sculpt your network by allocating bandwidth and priorities to scores of protocols, you haven't really defined a strategy either. You've created a framework based on a static model, which unfortunately only applies if your network happens to match the model all of the time, which is highly unlikely. You may have determined that your network is 75% HTTP traffic, but the dynamic changes constantly. Not day to day, but minute to minute. So any time that the requirement is above 75%, or if the requirement for some other important protocol increases, your model doesn't work, and everyone on your network will suffer the consequences.

No matter what you use your network for, the real goal is almost always the same: to make your network run smoothly without too many restrictions. Consider the case when you have "more than enough" bandwidth to do anything you need to do without any problems. Things work great. You don't have to examine your usage constantly to see what "the problem" is. You don't have to "catch" anyone doing something that you didn't anticipate. You don't have to run into the office in the middle of the night because your network is so slow that your customers can't even get their mail. This IS the primary goal of bandwidth management. If you have no congestion, you have no problems.

The Problem with Priorities

Back when cave men walked the earth, someone thought of something called "priority queuing", which was the first "method" used for bandwidth management. It made sense at the time, although it's not clear why so many "products" still use it. I suspect that the reason is that it's easy to implement. Any "pretty good programmer" can do it, and it can have a beneficial effect on a small network. What prioritization does is take a bunch of traffic, decide that some traffic is more important than some other traffic, and then send the more important traffic first. But if you think about it, the "method" is flawed. There is one basic problem with "prioritization". In order to prioritize, you have to have a delay queue, and then reorder your traffic to shift the delay from one protocol to another. This may solve some problems, but it doesn't reduce your bottleneck. And then traffic you haven't thought of is being delayed even more, so things may work worse until you figure out what you've missed. It's like taking a pile of garbage and making several smaller piles so you can get rid of the ugliest stuff first. It may look nicer, but you still have garbage, as long as you're making it faster than it can be taken away. You don't want to transfer the delays from one type of traffic to another; you want to get rid of the delays or at least reduce them. Good bandwidth management is about preventing the delays altogether.
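The "shifting the delay" argument can be illustrated with a toy simulation. This is only a sketch under invented assumptions: a link that sends one packet per tick, while one "high" and one "low" priority packet arrive every tick. The function name `simulate` and the traffic pattern are hypothetical.

```python
from collections import deque

def simulate(ticks, prioritize):
    """Each tick one 'high' and one 'low' packet arrive; the link sends one packet."""
    queue = deque()
    sent = {"high": 0, "low": 0}
    for _ in range(ticks):
        queue.append("high")
        queue.append("low")
        if prioritize and "high" in queue:
            queue.remove("high")        # strict priority: oldest high packet goes first
            sent["high"] += 1
        else:
            sent[queue.popleft()] += 1  # plain first-in, first-out
    return sent, len(queue)

fifo_sent, fifo_backlog = simulate(100, prioritize=False)
prio_sent, prio_backlog = simulate(100, prioritize=True)

print("FIFO:    ", fifo_sent, "backlog", fifo_backlog)
print("Priority:", prio_sent, "backlog", prio_backlog)
```

In this toy model the backlog is the same size either way. Prioritizing only decides whose packets sit in it, and the "low" traffic stops getting through at all.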

Internet Traffic is Largely Self-Throttling

People ask us all of the time how you can control traffic for an entire network in both directions from a single access point. They understand that you can send traffic out at a slower rate than it comes in, but how can the device control incoming traffic? The answer is simple. Just about all protocols have built-in mechanisms to control flows, because otherwise things wouldn't work at different speeds. Most traffic is largely "self-throttling", because if you slow things down in one direction it slows down overall, adjusting to the environment. With TCP traffic, you can additionally "shrink" the window, which causes less traffic to be sent by the server. By doing this you lessen the amount of traffic that is in your network at any given time, which reduces the latency of ALL protocols. Even other protocols, UDP-based protocols for example, have similar characteristics. UDP traffic is often a predecessor to TCP traffic or some transfer, so by slowing it down you will slow what happens after it. Large chunks of data are usually not sent without some sort of acknowledgement. A server doesn't know the access speed of the client, so it can't just send data at wire speed to the client. Protocols are generally "well-behaved". You have to trust the fact that for the vast majority of cases, bandwidth can be controlled from a single point between the client and the server.
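Why shrinking the window slows the sender can be seen from a standard rule of thumb: a TCP session cannot have more than one window of unacknowledged data outstanding per round trip. The sketch below only illustrates that bound. The 100 ms round-trip time and both window sizes are assumptions chosen for the example, and it says nothing about how any particular product rewrites the window.

```python
def max_tcp_rate_kbps(window_bytes, rtt_seconds):
    """Upper bound on one TCP session's rate: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1000

rtt = 0.1  # assumed 100 ms round-trip time between client and server

full_window = max_tcp_rate_kbps(65535, rtt)   # a 64KB advertised window
shrunk_window = max_tcp_rate_kbps(8192, rtt)  # the same session with an 8KB window

print(f"64KB window: up to {full_window:.0f} Kb/s")
print(f" 8KB window: up to {shrunk_window:.0f} Kb/s")
```

With the smaller window the server simply has less data it is allowed to send before it must wait for an acknowledgement, so the incoming rate drops without the device ever having to discard the traffic.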

Getting Rid of Congestion

Simply put, congestion is when you have more traffic than you have bandwidth. The "big picture" goal of any good bandwidth management strategy is to change the way traffic flows through your network in such a way that congestion is eliminated. Think of the situation when you have more bandwidth than you need, like a 100Mb/s pipe connected to a PC that can only pull down 10Mb/s. Traffic flows freely through your router; it gets sent out as soon as it arrives. You don't need bandwidth management, because there is no chance of exceeding the capacity of the upstream connection. Much of today's congestion is due to larger TCP windows being used in client systems. As the number of active sessions increases, congestion increases, more so with higher window settings. So the most important function that your bandwidth management device must perform is TCP window shaping. Without window shaping you CANNOT reduce the amount of traffic in your router's queues, so the best you can do is shift delays from one user to another. Most products on the market do not properly window shape to reduce congestion.
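A rough way to see why sessions and window size multiply is to compute a worst-case figure: if every active session had a full window outstanding at once, how long would the link need just to carry it? The sketch below is a back-of-the-envelope bound, not a model of real TCP behavior, and the session count, window sizes and link speed are hypothetical.

```python
def worst_case_backlog_seconds(sessions, window_bytes, link_kbps):
    """Seconds the link needs to carry one full window from every session."""
    outstanding_bits = sessions * window_bytes * 8
    return outstanding_bits / (link_kbps * 1000)

# Hypothetical network: 200 active sessions on a 6Mb/s link.
large_windows = worst_case_backlog_seconds(200, 65535, 6000)
small_windows = worst_case_backlog_seconds(200, 8192, 6000)

print(f"64KB windows: {large_windows:.1f} s of traffic outstanding")
print(f" 8KB windows: {small_windows:.1f} s of traffic outstanding")
```

Real sessions rarely all fill their windows at the same moment, but the bound shows the direction of the effect: smaller windows mean less traffic that can pile up in the router's queues.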

Controlling Users

If you think about it, what you really want to do is control your users, not your protocols. Each "customer" is paying for something, and as long as no one uses more than they're supposed to, there are no problems. Imagine a highway where you can force all of the cars to go a certain speed. Per-user management has many advantages. It allows you to properly engineer your network to determine how much bandwidth you need for your customer base, and how much bandwidth you can give to each one. It allows you to tier your offerings, so you can charge more for more bandwidth or higher quality service. In fact, it's the exact way that large networks (such as telephone systems) are engineered. You have X number of users using the network at any given time, and each is allowed to use Y bandwidth. Now, a telephone network is easier to engineer because requirements are fixed, but you need a lot more bandwidth, because even if no one is talking they are using bandwidth. On a data network, there are normally lots of users doing nothing or very little (reading a web page, composing mail, making a sandwich, etc). That's where bursting comes in. Personally I don't like the term "bursting", because it's not accurate, but to some extent we're stuck with existing terminology. For the context of this discussion, "bursting" means "temporarily allowing higher speed transfers". Bursting techniques let users temporarily get more bandwidth when conditions permit it, which allows you to dynamically utilize "excess" bandwidth in a controlled manner.

How to implement an effective strategy

The hardest part about implementing an effective bandwidth management strategy is finding a product capable of implementing it. At minimum you'll need a product that has the following features:
  • Can handle your traffic levels with a policy for every one of your customers/users
  • Implements window shaping in order to reduce the overall amount of traffic that needs to be managed
  • Has flexible bursting controls so that end user performance can be maximized and bandwidth can be dynamically shared
It doesn't seem like much, but the vast majority of products on the market cannot do what is needed. Luckily for you, our products can. The only brain-work (as opposed to grunt-work) is to formulate the bandwidth policies. For the simplest case, you can define a default policy for everyone on your network. The elements of a bandwidth profile are as follows:
  • Bandwidth Limits (directional or combined) - The maximum amount of bandwidth a user is allowed (but not guaranteed 100% of the time)
  • Burst Limit - The maximum amount of bandwidth a user can get no matter what the conditions (again, not guaranteed)
  • Minimum Bandwidth - Pre-allocated Bandwidth (guaranteed)
  • Max Burst Duration - The maximum amount of time that continuous bursting can occur
  • Burst Trigger - The condition which determines whether an end user can burst or not
  • Packets Per Second - The maximum packets/second that a user can get at any time
Before trying to implement your policy on your device, write it down in words, using words that are quantifiable. Using terms like "when bandwidth is available" can lead to trouble if there's no definition of what that means. Typically you will not use the Minimum Bandwidth setting, because in order to guarantee bandwidth, you must permanently set aside bandwidth that cannot be shared, even when it's not needed. That's a more advanced concept that will be explained in one of our more complex case studies.
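One way to force yourself to be quantifiable is to write the profile as a data structure first. The Python sketch below is purely illustrative: the field names, the units, and the `inbound_ceiling_kbps` function are hypothetical, and the burst logic is one plausible reading of the Burst Trigger and Max Burst Duration elements listed above, not the device's actual algorithm.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BandwidthProfile:
    """Field names mirror the list above; units (Kb/s, seconds) are assumptions."""
    bandwidth_in_kbps: int
    bandwidth_out_kbps: int
    burst_limit_kbps: int
    max_burst_seconds: int
    burst_trigger_kbps: int           # bursting allowed while total usage is below this
    min_bandwidth_kbps: int = 0       # typically left unused, as noted above
    pps_limit: Optional[int] = None   # None means no packet-rate limit

def inbound_ceiling_kbps(profile, network_usage_kbps, seconds_bursting):
    """Inbound rate ceiling for a user right now, under the assumed trigger semantics."""
    can_burst = (network_usage_kbps < profile.burst_trigger_kbps
                 and seconds_bursting < profile.max_burst_seconds)
    return profile.burst_limit_kbps if can_burst else profile.bandwidth_in_kbps

# Illustrative values only.
example = BandwidthProfile(bandwidth_in_kbps=128, bandwidth_out_kbps=128,
                           burst_limit_kbps=512, max_burst_seconds=30,
                           burst_trigger_kbps=5000)

print(inbound_ceiling_kbps(example, network_usage_kbps=4000, seconds_bursting=10))
print(inbound_ceiling_kbps(example, network_usage_kbps=5500, seconds_bursting=10))
print(inbound_ceiling_kbps(example, network_usage_kbps=4000, seconds_bursting=30))
```

Writing it this way makes "when bandwidth is available" impossible to leave vague: the trigger has to be a number.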

Defining a Bandwidth Profile

The first thing that needs to be said is that there is no one formula for engineering a general purpose data network. Every network is going to have a different mix of casual users and constant users (or abusers) who have active downloads all day long. So there's no single setup or formula that will work on every network - observation and fine tuning are a necessary component of implementing any bandwidth management strategy. When defining a default bandwidth profile, there are some limits. For example, on a T1 circuit, it would be a bad idea to give 200 customers a 512Kb/s limit each at all times. A mathematical approach might be to multiply your available bandwidth by 4 and divide by the number of users online at a given time. An example might be a network with 6Mb/s and 200 users on-line. That's 24Mb/s / 200, which is 120Kb/s. Using this example, let's formulate a default profile in plain language:

"Each user gets 120Kb/s and can burst to 512Kb/s for up to 60 seconds whenever overall network usage is below 5.9Mb/s"

What this means is that your default bandwidth profile will allow each user to always get 120Kb/s and will allow 512Kb/s downloads for up to 60 seconds when conditions permit. So your profile would be:
  • Bandwidth-in = 120K
  • Bandwidth-out = 120K
  • Bandwidth-burst = 512K
  • Burst Max = 60
  • Burst Trigger = TriggerRule
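The arithmetic behind the 120Kb/s figure is small enough to sketch in Python. The function name and the `oversubscription` parameter are made up for this example; the factor of 4, the 6Mb/s link and the 200 users come from the text.

```python
def per_user_limit_kbps(link_kbps, users_online, oversubscription=4):
    """Per-user limit: available bandwidth times a sharing factor, split across users."""
    return link_kbps * oversubscription / users_online

limit = per_user_limit_kbps(6000, 200)                          # 6Mb/s link, 200 users on-line
no_sharing = per_user_limit_kbps(6000, 200, oversubscription=1)

print(limit)       # per-user limit with the factor of 4
print(no_sharing)  # what each user would get if nothing were shared
```

The factor of 4 is a starting point, not a constant. Observation of your own mix of casual and constant users should drive it up or down.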
Now that was pretty easy. Now you might ask why there is no minimum setting. The reason is that you are "assuming" that each user is not going to use 120K all of the time. That's why you can multiply your available bandwidth by 4. If you "guarantee" the bandwidth, you'd have to only allow 1/4th of the bandwidth per user. And that's not enough. It's simply not necessary to guarantee bandwidth if you engineer your network properly. More than likely, each user will be able to get their 120K whenever they want it.

Another element of a bandwidth profile is a packets per second setting. You can use PPS settings to minimize the unfair use of resources. After all, even though you are giving everyone 120Kb/s, you don't actually want them to use that 100% of the time. If they did, you'd only have 1/4th the bandwidth that you need. Typical internet use is not continuous, except for large downloads, streaming video, and such activities. 10 PPS allows for a 120Kb/s download (10 full-size 1500-byte packets * 8 bits). But suppose you have 2 customers on the same service, one of which is a small drug store, and another which is a travel business with 50 users on an internal network. The drug store hardly ever uses their bandwidth, and the travel business has all of their agents online, all day long. Setting the profile to 25 or so PPS will allow a single user to get their full allotment, while keeping usage down if there are a lot of active connections. It's also a great way to minimize the impact of P2P protocols, most of which send out lots and lots of small packets as they discover servers and share directory contents.
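The relationship between a PPS setting and the bit rate it permits can be sketched as follows. The 1500-byte packet size is the full-size packet implied by the 10 PPS figure above; the 100-byte "chatty" packet size and the function name are assumptions for illustration.

```python
FULL_PACKET_BYTES = 1500  # full-size packet, as implied by the 10 PPS = 120Kb/s figure

def pps_rate_ceiling_kbps(pps, packet_bytes=FULL_PACKET_BYTES):
    """Highest bit rate (Kb/s) a packets-per-second limit lets through."""
    return pps * packet_bytes * 8 / 1000

bulk_at_10 = pps_rate_ceiling_kbps(10)                     # large download, 10 PPS
bulk_at_25 = pps_rate_ceiling_kbps(25)                     # large download, 25 PPS
chatty_at_25 = pps_rate_ceiling_kbps(25, packet_bytes=100) # assumed 100-byte packets

print(bulk_at_10, bulk_at_25, chatty_at_25)
```

A PPS limit barely touches a single bulk download made of full-size packets, but it bites hard on traffic made of many small packets, which is why it works against the discovery chatter described above.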

Secondary Topics - AutoShaping

Ok, so there's that word again: "shaping". What does that mean? Well, to be truthful, it's a bit difficult to explain what the ET/BWMGR AutoShaping feature does. But one thing I can tell you is that it's REALLY handy! Put simply, it sets a threshold at which every session on your network is automatically slowed until an equilibrium is reached. Flows are reduced, sessions are paced, all in a very economical way that doesn't require much processing power, and it's done in a very fair manner. For a network that is only occasionally congested and fairly well-behaved, AutoShaping may be all that's needed to clean things up during the peak times. It's also a way to manage bursting if you don't want to shut off bursting when your network is running at capacity. For example, the normal mechanism will shut down bursting when your network hits a threshold. If your network is constantly at capacity, this will result in bursting turning on and off as all of the sessions slow down and speed up. AutoShaping can be used to pace the bursting when required, which allows the clients to run faster than they would if bursting were shut down. We'll use AutoShaping in our MiniWISP example to illustrate.

Note on PPS rules

If you run a bridged network with lots of units on the same "network" then you may have a LOT more ARP packets flying around than you realize. For example, on a cable network you may have 30pps streaming in on a regular basis. You'll need to either compensate for these in your PPS rules, or better yet not count ARPs as traffic by placing a static rule at the beginning of your ruleset which matches ARPs.

Case Study - MiniWisp Wireless ISP

Network info:
  • 12Mb/s of internet bandwidth
  • 800 customers
  • 500 IP addresses assigned by DHCP
Goals:
  1. Set up a 4 tiered structure to allow for low-usage residential, broadband residential and 2 business offerings
  2. Only allow "known" MAC addresses to access the internet through their wireless facilities
  3. Monitor the network to control users running parasitic, abusive applications
  4. Business Accounts can host a web site and get a static IP, whereas port 80 is blocked on residential
Implementation:
  • Set Burst Threshold at 11.5Mb/s
  • Set AutoShaping Threshold at 10Mb/s
  • Set default policy to deny access.
  • Create a policy (rule) for each MAC in database (script generated initially to default profile)
Policy Definitions:

Default (Basic Residential)

  • Bandwidth In = 128Kb/s
  • Bandwidth Out = 56Kb/s
  • Burst = 512Kb/s
  • Burst Max = 30 seconds
  • PPS In = 30
  • PPS Out = 30
Note: Allows for 1.35MB file download in burst period. (Note that the customers can burst to 512K, but can only do sustained downloads of 360K due to the 30pps limit, 30 * 1500 * 8). Limited outbound bandwidth. PPS rules limit overall activity (so that you can price a household with 6 people with laptops on a wireless router the same as one guy with a desktop).
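The 1.35MB figure in the note can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the function name is hypothetical and it assumes full-size 1500-byte packets, as the note does.

```python
def burst_download_megabytes(burst_kbps, burst_seconds, pps_limit=None, packet_bytes=1500):
    """Data (MB) that fits in one burst period, honoring an optional PPS limit."""
    rate_kbps = burst_kbps
    if pps_limit is not None:
        # The PPS limit caps the sustained rate at pps * packet size * 8 bits.
        rate_kbps = min(rate_kbps, pps_limit * packet_bytes * 8 / 1000)
    return rate_kbps * burst_seconds / 8 / 1000

with_pps_limit = burst_download_megabytes(512, 30, pps_limit=30)
without_pps_limit = burst_download_megabytes(512, 30)

print(with_pps_limit)     # the 30pps limit holds the burst to 360Kb/s
print(without_pps_limit)  # the full 512Kb/s burst rate for 30 seconds
```

Running the same calculation for each tier before you publish it is a quick check that the Burst, Burst Max and PPS settings actually deliver the experience you intend.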

Broadband Residential

  • Bandwidth In = 256Kb/s
  • Bandwidth Out = 56Kb/s
  • Burst = 750Kb/s
  • Burst Max = 240 seconds
  • PPS In = 80
  • PPS Out = 80
Note: Allows for download of 25MB file in burst period.

Business Standard

  • Bandwidth In = 512Kb/s
  • Bandwidth Out = 256Kb/s
  • Burst = 1Mb/s
  • Burst Max = 240 seconds
  • PPS In = 80
  • PPS Out = 80
Notes: Allows for greater bandwidth out and higher burst. Port 80 services blocked.

Business Premium with Hosting

  • Bandwidth In = 512Kb/s
  • Bandwidth Out = 512Kb/s
  • Burst = 1Mb/s
  • Burst Max = 240 seconds
  • PPS In = 80
  • PPS Out = No Limit
Notes: Allows for higher outbound usage, port 80 services allowed.

Screen Shots of the Set up

The Profiles:

The Policies / Rules:

Some Notes on the Setup:

Note that the trigger must be a global rule, and that it could also be used to graph / monitor the interface usage (minus ARPs) by enabling stats on the rule. The port 80 handling rules must also be global. A packet will only be dropped if the FIRST rule that matches is a discard. All port 80 traffic will match rule 900. But for the dogood-hospital, 850 will match first, so the packets will not be dropped. You could also put this logic into the firewall, particularly if you only have a few cases where it's allowed. However, firewall rules are scanned sequentially, so if you have a lot of rules with IP or MAC addresses it's more efficient to put them in the normal ruleset.

A non-wireless ISP (or one that has authentication and doesn't have to worry about unknown clients attaching to their network) can use IP addresses more efficiently rather than MACs. These are also easier to set up using our Do Range feature, which allows you to add any number of rules with sequential IP addresses.

Regarding triggers, you could have different triggers for different types of service. Perhaps you'd prefer to allow your premium customers to burst all the time for their Burst Max setting no matter what the traffic. You could create a trigger that is higher than your available bandwidth, such as 20Mb/s in this case. Since the trigger could never be reached, bursting would be enabled all of the time, so they'd always be able to burst for their max allowed burst period. Of course this would affect the engineering of your network.

Also, don't forget to allow your own hosts and servers if you are using the internal network. Be careful not to put rules here that will allow users to get more bandwidth than they are supposed to. However, typically you don't count accessing your servers towards their bandwidth allocations.
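The first-match behavior described for rules 850 and 900 can be sketched in Python. Only the rule numbers, the port 80 discard, and the dogood-hospital exception come from the text; the packet fields, the rule representation and the `first_match` function are hypothetical and are not the product's rule syntax.

```python
def first_match(rules, packet):
    """Return (rule number, action) for the lowest-numbered rule that matches."""
    for number, matches, action in sorted(rules, key=lambda rule: rule[0]):
        if matches(packet):
            return number, action
    return None, "no match"

# Hypothetical packet fields: 'host' is the customer, 'dst_port' the service port.
rules = [
    (900, lambda p: p.get("dst_port") == 80, "discard"),
    (850, lambda p: p.get("dst_port") == 80 and p.get("host") == "dogood-hospital", "pass"),
]

hospital_web = first_match(rules, {"host": "dogood-hospital", "dst_port": 80})
residential_web = first_match(rules, {"host": "some-residence", "dst_port": 80})
residential_mail = first_match(rules, {"host": "some-residence", "dst_port": 25})

print(hospital_web, residential_web, residential_mail)
```

Because only the first matching rule decides, the exception has to carry a lower rule number than the general discard it overrides.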

Case Study - CityWide Internet

Network info:
  • 45Mb/s of internet bandwidth
  • 1600 customers
  • 1024 IP addresses assigned by DHCP
Goals:
  1. Set up a 4 tiered structure to allow for low-usage residential, broadband residential and 2 business offerings
  2. Monitor the network to control users running parasitic, abusive applications
  3. Business Accounts can host a web site and get a static IP, whereas port 80 is blocked on residential
Implementation:
  • Set Burst Threshold at 43Mb/s
  • Set AutoShaping Threshold at 38Mb/s
  • Use AutoMgr to monitor long term usage of basic residential clients
  • Assign Static IPs to all premium service customers (this can be done with MAC mapping in your DHCP config. You could also create DHCP pools of MAC addresses in particular service tiers and then create generic static rules for those IPs).
Policy Definitions:

Default (Basic Residential)

  • Bandwidth In = 128Kb/s
  • Bandwidth Out = 56Kb/s
  • Burst = 512Kb/s
  • Burst Max = 30 seconds
  • PPS In = 30
  • PPS Out = 30
Note: Allows for a file download of roughly 2MB in the burst period at the full 512Kb/s burst rate; where the 30pps limit is the constraint (30 * 1500 * 8 = 360Kb/s sustained), the figure is closer to 1.35MB. Limited outbound bandwidth. PPS rules limit overall activity (so that you can price a household with 6 people with laptops on a wireless router the same as one guy with a desktop).

Broadband Residential

  • Bandwidth In = 256Kb/s
  • Bandwidth Out = 56Kb/s
  • Burst = 750Kb/s
  • Burst Max = 240 seconds
  • PPS In = 80
  • PPS Out = 80
Note: Allows for download of 25MB file in burst period. The profile allows longer burst periods, simulating continuous broadband service without allowing for continuous use of resources.

Business Standard

  • Bandwidth In = 512Kb/s
  • Bandwidth Out = 256Kb/s
  • Burst = 1Mb/s
  • Burst Max = 240 seconds
  • PPS In = 80
  • PPS Out = 80
Notes: Allows for greater bandwidth out and higher burst. Port 80 services blocked.

Business Premium with Hosting

  • Bandwidth In = 512Kb/s
  • Bandwidth Out = 512Kb/s
  • Burst = 1Mb/s
  • Burst Max = 240 seconds
  • PPS In = 80
  • PPS Out = No Limit
Notes: Allows for higher outbound usage, port 80 services allowed.

Screen Shots of the Set up

The Profiles:

The Policies / Rules

