Issue No. #13 | 31 February 2002 | ISSN: 1532-1886 |

We
encourage you to forward this newsletter to others. Please also CC to **addme@servicelevel.net**
so that they get a free subscription too.

ISP Bandwidth Billing - How To Make More Or Pay Less by Steve Cerruti and Carl Wright

Billing for Network Traffic

Service providers
measure IP network usage for billing in different ways. Residential ISPs
typically offer unlimited or flat rate usage and hourly usage. In addition
to unlimited usage, wireless ISPs typically charge per Megabyte transferred.
ISPs providing service in hosting data centers may also offer per MB billing,
but more often you will find them billing based on a statistical method
called "95^{th} percentile". The charging between ISPs
is also based on 95^{th} percentile billing.

95^{th}
percentile billing is an outgrowth of network monitoring systems that use
95^{th} percentile measurements for capacity planning. A statistical
measurement of network usage was ideal for this task. However, there is
no standard defined for 95^{th} percentile computations and many
ISPs are implementing billing based on this mechanism without an understanding
of the errors they introduce. This lack of a consistent, accurately implemented
billing mechanism makes it impossible to compare service providers based
on price or to estimate future bills based upon projected business.

95^{th}
percentile measurement allows the ISP to bill the customer for the maximum
bandwidth used during the billing period while forgiving a small amount
of bandwidth spiking.

Figure
1. An example of the amount charged for using a 95^{th} percentile
method

Analogy to Physical Content Delivery

Explaining percentile billing in terms of cumulatively billed usage products, like water, can help describe how the mechanism operates.

For example,
assume the water company provides 95^{th} percentile billing for
water usage on a 2-hour average gallons per minute scale. The water meter
no longer keeps track of your total water usage for the month, but instead
records the water usage for each 2-hour period during the month.

Twin brothers Jim and Tim live next door to each other in identical houses with identical yards. Both have automatic sprinklers that water the yard at 7 AM for 2 hours each morning. They have rather large yards and this constitutes the bulk of their water usage. However, on weekdays, Jim showers at 7 AM and Tim showers at 7 PM. At the end of the month Jim and Tim have used exactly the same amount of water. However, Jim's water bill is higher because he is billed for the water used each weekday to water the lawn and shower in the same 2-hour period, while Tim is billed only for watering the lawn.

Figure 2. Showing how Tim pays less for the same water because he lowers his peak usage by spreading out his usage.

Jim will pay 28% more than Tim because he showers in the morning. Because percentile billing is focused on your peak usage, Tim gets water for showers for free. Buyers of bandwidth can get free bandwidth when they "shower at night". This is harder for most buyers, because most of their usage comes from an uncontrolled population of Internet users to whom they want to provide a quality response.

As a buyer of IP bandwidth, I'm first concerned with delivering an experience that gets my revenue, then I'm concerned about controlling costs. Since I don't want to force users to wait for me to respond, I have to buy the throughput to satisfy their "busy hour" needs. I can reduce my cost by funneling every other kind of usage (e.g. backups of web content, downloads of access logs) into my less busy times of the day.

Definition of Percentile

The p^{th} percentile
of a set of measurements is the value for which at most p% of the measurements
are less than that value and at most (100-p)% of measurements are greater
than that value.

Some special
cases of percentiles exist. The median is equivalent to the 50^{th}
percentile. Quartiles occur at the 25^{th} and 75^{th} percentile.
Deciles occur every 10^{th} percentile, thus the ninth decile is
the 90^{th} percentile.

Two separate computations of percentile exist in the real world, **discrete
percentile** and **continuous percentile**. Discrete percentile differs
from continuous percentile in that the discrete percentile value must be
a member of the data set. Use discrete percentile only in the case of a
discrete distribution. It is important to note that the median of a discrete
distribution may not be defined; therefore, the 50^{th} discrete
percentile may not be the median if you don't have an odd number of measurements.

(Note: you don't want to use discrete percentiles in billing applications.)

Continuous
percentile treats the measurements as a statistical population and determines
the value that would be the discrete percentile by interpolating a value
when it isn't present. For example, if you are doing the 50^{th} percentile
and you have an odd number of measurements, the continuous and discrete
values are the same. If you have an even number of measurements, you interpolate
a value between the actual measurements that are just above and just below
the "perfect" center of your measurements.

Computing Continuous Percentile

Step 1. Sort the measurements

Samples must be ordered. For example, 1, 3, 7, 21, 25, 26 and 72 are my example measurements.

Row Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 |

Value | 1 | 3 | 7 | 21 | 25 | 26 | 72 |

Step 2. Compute Row Numbers of the percentile value

Using percentile
value P and number of rows N

RN = 1+((N-1)*P)

FRN = floor(RN)

CRN = ceiling(RN)

For my example above, let the percentile be the 90^{th}, so P = 0.90 (use the fraction 0.90, not 90). The number of rows N is the number of data samples, which is 7.

The value for RN in our example is 1+((7-1)*0.90). This evaluates to 6.4. The FRN (floor row number) is 6. The CRN (ceiling row number) is 7.

Step 3. Determine the Result

if (CRN = FRN = RN) then

result = (value of row RN)

else

result = (value of row FRN) + (RN - FRN) * ((value of row CRN) - (value of row FRN))

Since the row numbers don't all match, we have to interpolate a value between two measurements. For our example, we interpolate between row 6 (value 26) and row 7 (value 72).

The equation is 26 + ((6.4 - 6.0) * (72 - 26)). It evaluates to 44.4. This is your continuous percentile for this sample.
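The three steps above can be sketched in a few lines of Python. This is a minimal illustration of the row-number formula from this article, not a billing-grade implementation; the function name `continuous_percentile` and its input checks are assumptions made for the example.

```python
import math


def continuous_percentile(samples, p):
    """Continuous percentile using RN = 1 + (N - 1) * P, with p between 0 and 1."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a fraction between 0 and 1 inclusive")

    ordered = sorted(samples)                 # Step 1: sort the measurements
    rn = 1 + (len(ordered) - 1) * p           # Step 2: row number (rows are 1-based)
    frn, crn = math.floor(rn), math.ceil(rn)

    low = ordered[frn - 1]                    # Python lists are 0-based
    if frn == crn:
        return low                            # RN landed exactly on a row
    high = ordered[crn - 1]
    return low + (rn - frn) * (high - low)    # Step 3: interpolate between rows


result = continuous_percentile([1, 3, 7, 21, 25, 26, 72], 0.90)
print(round(result, 2))
```

For the seven example measurements this reproduces the interpolation between row 6 (value 26) and row 7 (value 72), giving a value of about 44.4.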

ISP Errors

Improper Computation of RN

A simple mistake is computing the row number as RN=N*P. The range of input for percentile is 0 to 1 inclusive, and RN must be 1 for input 0 and N for input 1. With RN=N*P, an input of 0 gives row 0, which does not exist.
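A quick way to see the difference is to evaluate both formulas at the ends of the input range. The sketch below uses two hypothetical helper names, `row_number_naive` and `row_number_correct`, purely for illustration.

```python
def row_number_naive(n, p):
    """The mistaken formula: RN = N * P."""
    return n * p


def row_number_correct(n, p):
    """The formula used in this article: RN = 1 + (N - 1) * P."""
    return 1 + (n - 1) * p


n = 7  # number of measurements in the running example
for p in (0.0, 0.90, 1.0):
    print(p, row_number_naive(n, p), row_number_correct(n, p))
```

At P = 0 the naive formula points at row 0, which is not a valid row, while the corrected formula points at row 1. At P = 0.90 the two formulas already disagree, so the interpolated value differs too.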

Use of Discrete Percentile

Bandwidth measurement data is a continuous distribution; using a discrete percentile value is inappropriate. Don't do it.

Rounding RN

This is not relevant if a continuous percentile is used. When a discrete percentile is computed, RN must be rounded up: the larger value, the one with the higher row number, must be selected to ensure that at least p% of the values in the set are equal to or below the result.
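For completeness, here is a small sketch of the rounding rule for a discrete percentile. It assumes the same RN formula as the continuous case and simply rounds the row number up; the function name `discrete_percentile` is an assumption for this example, and the earlier advice against discrete percentiles for billing still applies.

```python
import math


def discrete_percentile(samples, p):
    """Discrete percentile: always returns a member of the data set."""
    ordered = sorted(samples)
    rn = 1 + (len(ordered) - 1) * p      # same row-number formula as before
    return ordered[math.ceil(rn) - 1]    # round RN up, never down


print(discrete_percentile([1, 3, 7, 21, 25, 26, 72], 0.90))
```

With the example measurements RN is 6.4. Rounding down would pick 26, but only 6 of 7 values (about 86%) are at or below 26, so the rule picks row 7 instead.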

Using Percentage Rather Than Percentile

Using 95% of the maximum average bandwidth instead of the 95^{th} percentile gives grossly different results, and the size of the difference depends on the customer's overall traffic pattern.

Sample Set

The majority of ISPs use SNMP to capture two counters for each network port. These counters represent the aggregate number of bytes sent or received. The mediation system should compute the average megabytes per second value from the difference between each consecutive poll, taking into account the actual time elapsed between polls.
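The conversion from counter readings to rates might look like the sketch below. It is illustrative only: the function name `rates_from_polls`, the `counter_bits` parameter, and the choice of 1,000,000 bytes per megabyte are assumptions, and the modulo step only tolerates a single counter wrap between two polls.

```python
def rates_from_polls(polls, counter_bits=64):
    """Turn (timestamp_seconds, byte_counter) polls into megabytes per second.

    Each rate divides the counter difference by the time that actually
    elapsed between the two polls, not by the nominal polling interval.
    """
    modulus = 2 ** counter_bits
    rates = []
    for (t0, c0), (t1, c1) in zip(polls, polls[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("poll timestamps must increase")
        delta = (c1 - c0) % modulus                # tolerates one counter wrap
        rates.append(delta / elapsed / 1_000_000)  # assumes 1 MB = 1,000,000 bytes
    return rates


# The second poll arrives 20 seconds late (320 s instead of 300 s).
polls = [(0, 0), (300, 300_000_000), (620, 620_000_000)]
print(rates_from_polls(polls))
```

Both intervals carry traffic at the same rate, and dividing by the actual elapsed time reports them as equal. Dividing the second interval by the nominal 300 seconds would overstate it.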

Averaging Effects

The sample set is composed of the average bandwidth for each sample period during the billing period. If the sample period is the same as, or greater than, the duration of the traffic spikes, averaging distorts the measurements.

In an idealized
case, the customer has high bandwidth utilization for 5 minute periods alternating
with 5 minute periods of no bandwidth usage. Their ISP polls their network
port using a 5 minute sample period. In the case where the ISP's sample
period is synchronous with the customer's high activity period, the ISP
collects alternating high and low samples. However, if the ISP's sampling
is 150 seconds out of phase with the customer's high activity period then
the ISP collects uniform samples with half the bandwidth used during the
high activity periods. The 95^{th} percentile value of the synchronous
sample is 100% more than the 95^{th} percentile value of the out
of phase sample.
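The idealized case can be simulated directly. The sketch below builds two hours of per-second traffic that alternates between 5 minutes at a made-up rate of 10 units and 5 minutes of silence, then averages it over 5 minute windows that start either in phase or 150 seconds late. The helper names and the rate are assumptions for illustration; `continuous_percentile` restates the row-number formula from this article.

```python
import math


def continuous_percentile(samples, p):
    # Row-number formula from this article: RN = 1 + (N - 1) * P, rows are 1-based.
    ordered = sorted(samples)
    rn = 1 + (len(ordered) - 1) * p
    frn, crn = math.floor(rn), math.ceil(rn)
    return ordered[frn - 1] + (rn - frn) * (ordered[crn - 1] - ordered[frn - 1])


def sample_averages(per_second, period, offset):
    """Average the per-second traffic over consecutive windows of `period` seconds."""
    starts = range(offset, len(per_second) - period + 1, period)
    return [sum(per_second[s:s + period]) / period for s in starts]


HIGH = 10.0                                      # hypothetical rate while busy
traffic = ([HIGH] * 300 + [0.0] * 300) * 12      # two hours, one value per second

in_phase = sample_averages(traffic, 300, 0)      # windows line up with the bursts
out_of_phase = sample_averages(traffic, 300, 150)

print(continuous_percentile(in_phase, 0.95))
print(continuous_percentile(out_of_phase, 0.95))
```

The in-phase samples alternate between 10 and 0, while every out-of-phase sample is 5, so the first 95^{th} percentile is twice the second even though the traffic is identical.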

Since our final mechanism for determining the customer's billed traffic depends on a single value, any averaging effect that affects a single value can have an effect on the total billed amount.

Example - Averaging Effects

In our original example, Tim and Jim were watering their yards at 7 AM. After learning Tim was paying less, Jim moved his shower to 7 PM.

Since the water company cannot read everyone's meter simultaneously, they have decided to read Tim's meter on the even hours and Jim's meter on the odd hours.

Tim and Jim use the same amount of water each month, but when the water bills arrive, Tim's usage is half of Jim's usage. This is because sampling divides Tim's peak usage into two sample periods while Jim's usage is contained in a single sample period.

Figure 3. Jim matches Tim's usage pattern. Then, the measurement pattern averages Tim's usage lower.

It seems unlikely that this could happen to an Internet user, but consider the following real example. A network based business that monitors remote devices on a regular basis tells its customers that it will connect to their devices every hour and gather information from them. Its programmers are given the job of gathering information from all of the devices shown in a database on an hourly basis. They create a program that will read the database, connect to the devices, and save the gathered information for additional processing. They schedule the program to run every hour at the beginning of the hour. This means that all the network traffic occurs during the first five minutes of each hour. In this example, they will move 200 megabytes in that five minute period. They will have one measurement of 200 megabytes and eleven measurements of 0 megabytes per hour.

Imagine instead that they break the job into quarters and check 1/4 of the devices every 15 minutes. This changes their measurements to 4 measurements of 50 megabytes and 8 measurements of 0 megabytes per hour. If they split the work evenly over the hour, they will get 12 measurements of about 16.7 megabytes.

This is how their usage charts.

Figure 4. Example of the impact of scheduling work over time on network charges

If
they change the programming to spread the work out over the hour, they will
have 1/12th the billed network usage compared to doing their work in one batch
at the beginning of the hour. (Please note: If they can squeeze all their
usage into less than 5% of all the sample periods, then their usage would
be eliminated in the calculation of the 95^{th} percentile. Their
vendor would probably have a minimum fee that they charge them.)
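The three schedules can be compared with a short sketch. It assumes one day of 5 minute samples measured in megabytes per sample, as in the example, and restates the row-number formula from this article; the variable names are made up for illustration.

```python
import math


def continuous_percentile(samples, p):
    # Row-number formula from this article: RN = 1 + (N - 1) * P, rows are 1-based.
    ordered = sorted(samples)
    rn = 1 + (len(ordered) - 1) * p
    frn, crn = math.floor(rn), math.ceil(rn)
    return ordered[frn - 1] + (rn - frn) * (ordered[crn - 1] - ordered[frn - 1])


HOURS = 24
batch = ([200.0] + [0.0] * 11) * HOURS         # everything at the top of the hour
quarters = ([50.0, 0.0, 0.0] * 4) * HOURS      # a quarter of the work every 15 minutes
spread = [200.0 / 12] * 12 * HOURS             # work spread evenly over the hour
once_a_day = [200.0] + [0.0] * (12 * HOURS - 1)  # one burst in 288 samples

for name, samples in [("batch", batch), ("quarters", quarters),
                      ("spread", spread), ("once a day", once_a_day)]:
    print(name, round(continuous_percentile(samples, 0.95), 2))
```

The hourly batch is billed at 200, the quarter-hour schedule at 50, and the even spread at about 16.7, which is the 1/12th ratio described above. The once-a-day burst occupies fewer than 5% of the samples, so its 95^{th} percentile is 0.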

Jitter - The Deviation From The Ideal Timing Of An Event

Since most of the mechanisms for capturing traffic measurements depend on non-real-time systems, the periods for each measurement are rarely uniform. Because percentile billing operates on a set of values with equal weight, a wide time variance between samples may skew the results.

This is especially noticeable when traffic volumes delay sample measurements. In that case, high bandwidth samples represent longer periods than low bandwidth samples. However, the percentile function treats them equally, causing an increase of the final bandwidth result.

Turning each sample period into multiple equal 1 second samples could mitigate these jitter effects.
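One way to picture the 1 second approach is to repeat each sample's rate once for every second it covers and then take the percentile over the expanded list. The sketch below assumes whole-second durations and invented sample values; `expand_to_seconds` is a hypothetical helper name, and `continuous_percentile` restates the row-number formula from this article.

```python
import math


def continuous_percentile(samples, p):
    # Row-number formula from this article: RN = 1 + (N - 1) * P, rows are 1-based.
    ordered = sorted(samples)
    rn = 1 + (len(ordered) - 1) * p
    frn, crn = math.floor(rn), math.ceil(rn)
    return ordered[frn - 1] + (rn - frn) * (ordered[crn - 1] - ordered[frn - 1])


def expand_to_seconds(samples):
    """samples: list of (rate, whole_seconds) pairs -> one rate value per second."""
    per_second = []
    for rate, seconds in samples:
        per_second.extend([rate] * seconds)
    return per_second


# Invented data: 19 short quiet samples and one long busy sample.
samples = [(1.0, 284)] * 19 + [(10.0, 600)]

equal_weight = continuous_percentile([rate for rate, _ in samples], 0.95)
time_weighted = continuous_percentile(expand_to_seconds(samples), 0.95)
print(round(equal_weight, 2), time_weighted)
```

In this made-up data set the busy sample is 1 of 20 samples but covers about 10% of the elapsed time, so the two calculations give noticeably different answers. Expanding to seconds costs memory, so a production system would more likely weight samples by duration without materializing the list.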

Example - Sample Period Effects

Tim and Jim align their watering schedule with the water company's sampling schedule. They intend to have their peak usage divided equally into two separate sampling periods.

The water company consistently samples Jim's first sample period twenty minutes late making it a 140 minute sample. They correct for it by shortening a sampling period later in the day. The water company samples Tim's water usage correctly.

Tim's bill is ½ the average gallons per minute usage for watering his lawn. The water company bills Jim the usage recorded for the first period; however, they divide the sample by the ideal sample period of 120 minutes, so Jim's bill shows 80/120 or 2/3 of the average gallons per minute for watering the lawn. This is 33% higher than Tim's usage measurement.

Example - Jitter Effects

Jim complains to the water company about his bill and they respond by correctly converting each sample period into gallons per minute, dividing by the actual sample period rather than the ideal sample period. The gallons per minute average for the first sample period correctly reflects usage and Jim's bill reflects a usage of 80/140 of the average gallons per minute for watering the lawn, still more than 14% above Tim's billed usage.

Figure 5. The impact of a distorted sample period timing and distorted sample length (jitter)
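The arithmetic behind the two examples is easy to check. The sketch below normalizes the sprinkler flow to 1 unit per minute, which is an assumption made only to keep the numbers readable.

```python
WATERING_RATE = 1.0   # sprinkler flow, normalized to 1 unit per minute

# Tim: 60 minutes of watering fall inside a correctly timed 120 minute sample.
tim = 60 * WATERING_RATE / 120

# Jim: the late reading puts 80 minutes of watering inside a 140 minute sample.
jim_ideal_divisor = 80 * WATERING_RATE / 120    # divided by the ideal period
jim_actual_divisor = 80 * WATERING_RATE / 140   # divided by the actual period

print(round(jim_ideal_divisor / tim - 1, 3))    # excess over Tim, ideal divisor
print(round(jim_actual_divisor / tim - 1, 3))   # excess over Tim, actual divisor
```

Dividing by the ideal period leaves Jim about 33% above Tim. Dividing by the actual period shrinks the gap to about 14%, but does not remove it, because the late reading still moves more of the watering into one sample.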

More Errors Made When Doing Percentile Billing

Ignoring Collection Period Length

Ignoring the actual duration of the collection period distorts the measurement. Each sample represents the average megabytes per second for the collection period. The bandwidth provider must therefore divide the total bytes transferred during the sample period by the actual length of the sample period and not by the idealized length of the sample period.

If the duration of the collection period is longer than the ideal duration, this error increases the average megabytes per second measurement and probably increases the customer billing.

Lost Samples

If the bandwidth provider loses a sample due to outage or packet loss, then the only way to account for the traffic is to average it over a larger sample period. This accentuates the averaging effects described above, possibly depriving the bandwidth provider of revenue.

**Number of Samples Needed by a Bandwidth Provider**

If you are doing 90th percentile billing, you need to gather the usage measurements for each customer. For monthly billing with 5 minute measurement periods, you have a worst case scenario of having to process 8928 transactions per customer (31 days * 24 hours/day * 12 measurements/hour).

If you've got 1,000 customers, you are going to look at 8.9 million transactions per month.

**You Can Throw Away Data**

One of the attractions of 95th percentile billing is that you can throw away data after you've stored the first 38 hours of data each month. What I mean is that you only have to store and bill with the top 5.1% of the measurements made of a customer's usage. When you get new measurements, you can compare them to the measurements you've already stored for the month and only keep the ones that are the highest measurements for the month. For each customer you only have to store the highest 447 measurements and the number of measurements made during the month. You'll also want to store the date/time of the first and last measurements of the month so that you can pro-rate partial months of service.
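A bounded min-heap is one way to keep only the highest measurements as they arrive. The sketch below is an illustration under stated assumptions, not a description of any provider's system: the class name `TopSamples` and helper `rows_to_keep` are invented, and the number of rows kept is derived from the RN formula in this article. By that formula a month of 8,928 samples needs the top 448 rows, one more than the figure quoted above, because interpolation also reads the row just below the percentile row.

```python
import heapq
import math


def rows_to_keep(max_samples, p):
    """Highest rows needed to interpolate at RN = 1 + (N - 1) * P."""
    frn = math.floor(1 + (max_samples - 1) * p)
    return max_samples - frn + 1


class TopSamples:
    """Keeps only the highest samples plus a count of all samples seen."""

    def __init__(self, max_samples, p=0.95):
        self.p = p
        self.keep = rows_to_keep(max_samples, p)
        self.heap = []       # min-heap: heap[0] is the smallest retained sample
        self.count = 0

    def add(self, value):
        self.count += 1
        if len(self.heap) < self.keep:
            heapq.heappush(self.heap, value)
        elif value > self.heap[0]:
            heapq.heapreplace(self.heap, value)   # drop the smallest, keep the new one

    def percentile(self):
        kept = sorted(self.heap)
        offset = self.count - len(kept)           # number of discarded lower rows
        rn = 1 + (self.count - 1) * self.p
        frn, crn = math.floor(rn), math.ceil(rn)
        if frn - offset < 1:
            raise ValueError("more samples arrived than max_samples allowed for")
        low, high = kept[frn - offset - 1], kept[crn - offset - 1]
        return low + (rn - frn) * (high - low)


tracker = TopSamples(max_samples=31 * 24 * 12)
for i in range(31 * 24 * 12):
    tracker.add((i * 7919) % 10007)               # made-up measurements
print(tracker.keep, len(tracker.heap))
```

The tracker gives the same answer as sorting the full month of data, while holding a few hundred values per customer instead of several thousand.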

**How Fragile This Process Is**

The amount
charged depends on at most two measurements during the entire month. Sometimes
it will depend on a single measurement. If the number of samples works out
just right, there will be one measurement to determine the 95^{th}
percentile amount. Most of the time the value is an interpolation between
just two measurements. The amounts of the measurements above these 1-2 measurements
only have to be greater. It doesn't matter how much greater. The value of
all the other measurements lower than these 1-2 measurements are completely
irrelevant.

This means that it's important how you measure and calculate this information. If you make mistakes with these measurements, your costs or your revenues hang in the balance. The funny thing is that you can make mistakes on 99% of the measurements and still get the same results, so long as you don't screw up the one or two measurements that determine your percentile charge.

**What You Really Charge For With 95th Percentile Billing**

When you do percentile billing, you charge for the maximum demand on the network. Yes, it's obvious to you by now, but it means that you aren't charging for the results received. Your charging is disconnected from the actions taken by the customer and their motivation for moving data. You are charging for your network costs without any relationship to the customer's business model. Think of it as "cost plus" charging.

**Summary**

Percentile billing provides opportunities and obstacles for the buyers of services charged on this basis. If you can spread out your usage, you can lower your average cost for data transferred. If you make the mistake of adding non-urgent data transfers during your busiest hours, you'll pump up your costs without getting any more total work done.

For the seller of network services, percentile billing works for and against them. Most providers are selling and buying bandwidth on this basis. Since most errors with percentile billing increase the charge, they must struggle with ensuring that their vendors are properly calculating the amount that they are charged while taking whatever action they can to reduce peak bandwidth demand to reduce their costs. When they charge others, they need to ensure that they are charging correctly to avoid mis-billing and all the problems that arise from that.

I think that they would all be better off charging for the amount of bandwidth consumed with discounts for usage during lower demand periods. I'd also charge a premium for higher levels of guaranteed service (QoS).

Tell Me What You Want To Hear About

The subjects that I cover in Rating Matters are driven by my personal interests in rating and billing. These are limited by the breadth of my personal experience. Please let me know about items you want to hear about or you'd like explored further. Send me your requests at .

©Copyright
2002 Service Level LLC

Rating Matters is a trademark of Service Level LLC

This
newsletter was corrected on 28 February 2003. A sharp-eyed reader found
a calculation error in the example of calculating the 95^{th}
percentile. We also simplified the equation to calculate the percentile. |