
Linux Advanced Routing and Traffic Control HOWTO

2006-12-02 10:42

routing policy
Routing for multiple uplinks/providers
GRE and other tunnels
IP in IP tunneling
IPSEC: secure IP over the Internet
Multicast routing
Queueing Disciplines for Bandwidth Management
pfifo_fast
It has three queues (bands); packets are assigned to them based on the TOS field.
ifconfig eth0 txqueuelen 10

Token Bucket Filter
The Token Bucket Filter (TBF) is a simple qdisc that only passes packets arriving at a rate which is not exceeding some administratively set rate, but with the possibility to allow short bursts in excess of this rate.

# tc qdisc add dev ppp0 root tbf rate 220kbit latency 50ms burst 1540
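
To get a feel for how the three TBF parameters above relate, here is a rough sketch. It assumes tc's convention that 1 kbit is 1000 bits, and the approximation that the queue limit is about rate x latency + burst. The function name tbf_limit_bytes is made up for illustration, and the result is an estimate, not necessarily the exact value tc installs.

```shell
#!/bin/sh
# Hypothetical helper: estimate the byte limit implied by rate/latency/burst.
# Assumption: limit ~= rate * latency + burst, with 1 kbit = 1000 bits.
tbf_limit_bytes() {
    rate_kbit=$1     # e.g. 220
    latency_ms=$2    # e.g. 50
    burst_bytes=$3   # e.g. 1540
    rate_bytes_per_sec=$(( rate_kbit * 1000 / 8 ))
    queued_bytes=$(( rate_bytes_per_sec * latency_ms / 1000 ))
    echo $(( queued_bytes + burst_bytes ))
}

# Values from the ppp0 command above.
tbf_limit_bytes 220 50 1540
```

With the values from the command above this prints 2915, i.e. roughly two full-size packets may wait in the queue.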

Stochastic Fairness Queueing
It is important to note that SFQ is only useful if your actual outgoing interface is really full.

If your link is truly full and you want to make sure that no single session can dominate your outgoing bandwidth, use Stochastic Fairness Queueing.

# tc qdisc add dev ppp0 root sfq perturb 10
# tc -s -d qdisc ls
qdisc sfq 800c: dev ppp0 quantum 1514b limit 128p flows 128/1024 perturb 10sec
Sent 4812 bytes 62 pkts (dropped 0, overlimits 0)

Summarizing, these are the simple queues that actually manage traffic by reordering, slowing or dropping packets.

classful qdisc

It is important to know that the filters are called from within a qdisc, and not the other way around!

Classes need to have the same major number as their parent. This major number must be unique within an egress or ingress setup. The minor number must be unique within a qdisc and its classes.
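
The numbering rule can be checked mechanically. The helpers below are a minimal sketch and not part of tc; the names handle_major, handle_minor and same_major are made up. They compare the handles as plain strings, so they assume both handles are written the same way (for example 1: and not 01:).

```shell
#!/bin/sh
# Hypothetical helpers, not part of tc: split "major:minor" handles.
handle_major() { echo "${1%%:*}"; }
handle_minor() {
    minor=${1#*:}
    echo "${minor:-0}"      # "1:" is shorthand for "1:0"
}

# Succeeds when a class ID carries the same major number as its parent.
same_major() {
    [ "$(handle_major "$1")" = "$(handle_major "$2")" ]
}

same_major 1:1 1:12 && echo "1:12 may hang under 1:1"
same_major 1:12 12:2 || echo "12:2 belongs to qdisc 12:, not to 1:"
```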

                     1:   root qdisc
                      |
                     1:1    child class
                   /  |  \
                  /   |   \
                 /    |    \
                 /    |    \
              1:10  1:11  1:12   child classes
               |      |     |
               |     11:    |    leaf class
               |            |
               10:         12:   qdisc
              /   \       /   \
           10:1  10:2   12:1  12:2   leaf classes
But don't let this tree fool you! You should *not* imagine the kernel to be at the apex of the tree and the network below, that is just not the case. Packets get enqueued and dequeued at the root qdisc, which is the only thing the kernel talks to.

A packet might get classified in a chain like this:

1: -> 1:1 -> 1:12 -> 12: -> 12:2

The packet now resides in a queue in a qdisc attached to class 12:2. In this example, a filter was attached to each 'node' in the tree, each choosing a branch to take next. This can make sense. However, this is also possible:

1: -> 12:2

In this case, a filter attached to the root decided to send the packet directly to 12:2.

In short, nested classes ONLY talk to their parent qdiscs, never to an interface. Only the root qdisc gets dequeued by the kernel!
The upshot of this is that classes never get dequeued faster than their parents allow. And this is exactly what we want: this way we can have SFQ in an inner class, which doesn't do any shaping, only scheduling, and have an outer qdisc that does the shaping.

PRIO qdisc

The PRIO qdisc doesn't actually shape, it only subdivides traffic based on how you configured your filters. You can consider the PRIO qdisc a kind of pfifo_fast on steroids, whereby each band is a separate class instead of a simple FIFO.

When a packet is enqueued to the PRIO qdisc, a class is chosen based on the filter commands you gave. By default, three classes are created. These classes by default contain pure FIFO qdiscs with no internal structure, but you can replace these by any qdisc you have available.

Whenever a packet needs to be dequeued, class :1 is tried first. Higher classes are only used if none of the lower bands gave up a packet.

We will create this tree:

          1:   root qdisc
         / | \
       /   |   \
       /   |   \
     1:1  1:2  1:3    classes
      |    |    |
     10:  20:  30:    qdiscs
     sfq  tbf  sfq
band  0    1    2

Bulk traffic will go to 30:, interactive traffic to 20: or 10:.

Command lines:

# tc qdisc add dev eth0 root handle 1: prio
## This *instantly* creates classes 1:1, 1:2, 1:3

# tc qdisc add dev eth0 parent 1:1 handle 10: sfq
# tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
# tc qdisc add dev eth0 parent 1:3 handle 30: sfq

CBQ qdisc
I don't fully understand how this works yet; it is too complicated and I need to read it again.
Besides being classful, CBQ is also a shaper and it is in that aspect that it really doesn't work very well.

               1:           root qdisc
               |
              1:1           child class
             /   \
            /     \
          1:3     1:4       leaf classes
           |       |
          30:     40:       qdiscs
         (sfq)   (sfq)

# tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit \
  avpkt 1000 cell 8
# tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit \
  rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 \
  avpkt 1000 bounded
# tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit \
  rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 \
  avpkt 1000
# tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit \
  rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 \
  avpkt 1000

# tc qdisc add dev eth0 parent 1:3 handle 30: sfq
# tc qdisc add dev eth0 parent 1:4 handle 40: sfq

# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
  sport 80 0xffff flowid 1:3
# tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \
  sport 25 0xffff flowid 1:4

Hierarchical Token Bucket(HTB)

HTB works just like CBQ but does not resort to idle time calculations to shape. Instead, it is a classful Token Bucket Filter, hence the name. It has only a few parameters, which are well documented on its author's site.

# tc qdisc add dev eth0 root handle 1: htb default 30

# tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k

# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k
# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k
# tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k

The author then recommends SFQ for beneath these classes:

# tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
# tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
# tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10

Add the filters which direct traffic to the right classes:

# U32="tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32"
# $U32 match ip dport 80 0xffff flowid 1:10
# $U32 match ip sport 25 0xffff flowid 1:20

Intermediate queueing device (IMQ)
Limitations of qdiscs:
1. Only egress shaping is possible (an ingress qdisc exists, but its possibilities are very limited compared to classful qdiscs).
2. A qdisc can only see traffic of one interface, so global limitations can't be placed.

IMQ is there to help solve those two limitations. In short:

tc qdisc add dev imq0 root handle 1: htb default 20

tc class add dev imq0 parent 1: classid 1:1 htb rate 2mbit burst 15k

tc class add dev imq0 parent 1:1 classid 1:10 htb rate 1mbit
tc class add dev imq0 parent 1:1 classid 1:20 htb rate 1mbit

tc qdisc add dev imq0 parent 1:10 handle 10: pfifo
tc qdisc add dev imq0 parent 1:20 handle 20: sfq

tc filter add dev imq0 parent 1:0 protocol ip prio 1 u32 match \
  ip dst 10.0.0.230/32 flowid 1:10

iptables -t mangle -A PREROUTING -i eth0 -j IMQ --todev 0

ip link set imq0 up

Load sharing over multiple interfaces

# tc qdisc add dev eth1 root teql0
# tc qdisc add dev eth2 root teql0
# ip link set dev teql0 up

Netfilter & iproute - marking packets
# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25 \
  -j MARK --set-mark 1

# echo 201 mail.out >> /etc/iproute2/rt_tables
# ip rule add fwmark 1 table mail.out
# ip rule ls
0:	from all lookup local
32764:	from all fwmark        1 lookup mail.out
32766:	from all lookup main
32767:	from all lookup default

/sbin/ip route add default via 195.96.98.253 dev ppp0 table mail.out
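
Running the echo line above a second time appends a duplicate entry to rt_tables. A minimal sketch of a guard follows; the function name add_rt_table is made up, and the demo works on a scratch file rather than the real /etc/iproute2/rt_tables.

```shell
#!/bin/sh
# Hypothetical helper: register a routing table name only if it is missing.
# $1 = table file (normally /etc/iproute2/rt_tables), $2 = id, $3 = name
add_rt_table() {
    file=$1 id=$2 name=$3
    # Look for a non-comment line whose second field is the table name.
    if awk -v n="$name" '$1 !~ /^#/ && $2 == n { found = 1 } END { exit !found }' "$file"; then
        return 0                      # already registered, nothing to do
    fi
    printf '%s %s\n' "$id" "$name" >> "$file"
}

# Demonstrate on a scratch file rather than the real one.
tables=$(mktemp)
add_rt_table "$tables" 201 mail.out
add_rt_table "$tables" 201 mail.out   # second call is a no-op
cat "$tables"
```

On a real system the first argument would be /etc/iproute2/rt_tables and the call would need root privileges.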

It is important to realise that we can only shape data that we transmit.

Traffic is either incoming (ingress) or outgoing (egress).

tc diagram
                Userspace programs
                     ^
                     |
     +---------------+-----------------------------------------+
     |               Y                                         |
     |    -------> IP Stack                                    |
     |   |              |                                      |
     |   |              Y                                      |
     |   |              Y                                      |
     |   ^              |                                      |
     |   |  / ----------> Forwarding ->                        |
     |   ^ /                           |                       |
     |   |/                            Y                       |
     |   |                             |                       |
     |   ^                             Y          /-qdisc1-\   |
     |   |                            Egress     /--qdisc2--\  |
  --->->Ingress                       Classifier ---qdisc3---- | ->
     |   Qdisc                                   \__qdisc4__/  |
     |                                            \-qdiscN_/   |
     |                                                         |
     +---------------------------------------------------------+

The u32 classifier

The U32 filter is the most advanced filter available in the current implementation. It is entirely based on hash tables, which makes it robust when there are many filter rules.

# tc filter add dev eth0 protocol ip parent 1:0 pref 10 u32 \
  match u32 00100000 00ff0000 at 0 flowid 1:10
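
The selector in the filter above reads the 32-bit word at offset 0 of the IP header, applies the mask 00ff0000 (which isolates the TOS byte) and compares the result with 00100000. A minimal sketch of that comparison in shell arithmetic follows; the function name u32_match and the sample header words are made up for illustration.

```shell
#!/bin/sh
# Sketch of the u32 comparison: (word & mask) == value.
# All three arguments are hexadecimal without the 0x prefix, as in tc.
u32_match() {
    word=$1 value=$2 mask=$3
    [ $(( 0x$word & 0x$mask )) -eq $(( 0x$value )) ]
}

# Hypothetical first header word: version/IHL 0x45, TOS 0x10, length 0x0054.
u32_match 45100054 00100000 00ff0000 && echo "TOS 0x10: match"
# Same packet with TOS 0x00 does not match.
u32_match 45000054 00100000 00ff0000 || echo "TOS 0x00: no match"
```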

The route classifier
# ip route add 192.168.10.0/24 via 192.168.10.1 dev eth1 realm 10
# tc filter add dev eth1 parent 1:0 protocol ip prio 100 \
  route to 10 classid 1:10

# ip route add 192.168.2.0/24 dev eth2 realm 2
# tc filter add dev eth1 parent 1:0 protocol ip prio 100 \
  route from 2 classid 1:2

Policing filters

tc filter add dev $DEV parent ffff: \
  protocol ip prio 20 \
  u32 match ip protocol 1 0xff \
  police rate 2kbit buffer 10k drop \
  flowid :1

tc filter add dev $DEV parent ffff: \
  protocol ip prio 20 \
  u32 match tos 0 0 \
  police mtu 84 drop \
  flowid :1

tc filter add dev $DEV parent ffff: \
  protocol ip prio 20 \
  u32 match ip protocol 1 0xff \
  police mtu 1 drop \
  flowid :1

Hashing filters for very fast massive filtering

Configuration is pretty complicated, but well worth it once you have a very large number of rules. First we make a filter root, then we create a table with 256 entries:

# tc filter add dev eth1 parent 1:0 prio 5 protocol ip u32
# tc filter add dev eth1 parent 1:0 prio 5 handle 2: protocol ip u32 divisor 256

# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 2:7b: \
  match ip src 1.2.0.123 flowid 1:1
# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 2:7b: \
  match ip src 1.2.1.123 flowid 1:2
# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 2:7b: \
  match ip src 1.2.3.123 flowid 1:3
# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 2:7b: \
  match ip src 1.2.4.123 flowid 1:2

# tc filter add dev eth1 protocol ip parent 1:0 prio 5 u32 ht 800:: \
  match ip src 1.2.0.0/16 \
  hashkey mask 0x000000ff at 12 \
  link 2
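
All four rules above sit in bucket 2:7b: because the hash key (mask 0x000000ff at offset 12, the source address) keeps only the last octet, and 123 is 7b in hex. A small sketch of that mapping follows, assuming a 256-entry table and exactly this mask; the function name bucket_for_ip is made up.

```shell
#!/bin/sh
# Hypothetical helper: which bucket of the 256-entry table an IPv4 source lands in.
bucket_for_ip() {
    last_octet=${1##*.}          # mask 0x000000ff keeps only the final octet
    printf '%x\n' "$last_octet"  # u32 bucket numbers are written in hex
}

for ip in 1.2.0.123 1.2.1.123 1.2.3.123 1.2.4.123; do
    echo "$ip -> ht 2:$(bucket_for_ip "$ip"):"
done
```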

bfifo/pfifo

DSMARK: seems to be used for QoS control (not yet clear to me)
                           skb->ihp->tos
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >
     |                                                       |     ^
     | -- If you declare set_tc_index, we set DS             |     |  <-----May change
     |    value into skb->tc_index variable                  |     |O       DS field
     |                                                      A|     |R
   +-|-+      +------+    +---+-+    Internal   +-+     +---N|-----|----+
   | | |      | tc   |--->|   | |-->  . . .  -->| |     |   D|     |    |
   | | |----->|index |--->|   | |     Qdisc     | |---->|    v     |    |
   | | |      |filter|--->| | | +---------------+ |   ---->(mask,value) |
-->| O |      +------+    +-|-+--------------^----+  /  |  (.  ,  .)    |
   | | |          ^         |                |       |  |  (.  ,  .)    |
   | | +----------|---------|----------------|-------|--+  (.  ,  .)    |
   | | sch_dsmark |         |                |       |                  |
   +-|------------|---------|----------------|-------|------------------+
     |            |         | <- tc_index -> |       |
     |            |(read)   |    may change  |       |  <--------------Index to the
     |            |         |                |       |                    (mask,value)
     v            |         v                v       |                    pairs table
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ->
                         skb->tc_index

Ingress qdisc

Random Early Detection (RED)

