Understanding Linux Network Internals, Chapter 8: Device Registration and Initialization
Device Registration and Initialization
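Device Registration
A network device is registered in the following situations:
- Loading a NIC device driver
When a NIC device driver is initialized, every NIC controlled by that driver is registered.
- Inserting a hot-pluggable network device
As earlier chapters explained, loading a PCI device driver causes the pci_driver->probe function to run. The driver supplies probe, and probe is responsible for registering the device.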
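Device Unregistration
The following situations trigger the unregistration of a device:
- Unloading a NIC device driver
This applies only to drivers loaded as modules, not to drivers built into the kernel.
- Removing a hot-pluggable device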
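Allocating the net_device Structure
Ethernet drivers allocate a struct net_device with alloc_etherdev_mqs, which calls alloc_netdev_mqs to perform the actual allocation.
The first parameter of alloc_netdev_mqs, sizeof_priv, is the size of the private data area appended to the structure. The driver can keep its own parameters and state in this area.
The second parameter is the device name. alloc_etherdev_mqs passes the template eth%d, which is the naming rule for Ethernet interfaces.
The setup parameter is a callback that initializes some of the net_device fields.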
/**
 * alloc_netdev_mqs - allocate network device
 * @sizeof_priv: size of private data to allocate space for
 * @name: device name format string
 * @name_assign_type: origin of device name
 * @setup: callback to initialize device
 * @txqs: the number of TX subqueues to allocate
 * @rxqs: the number of RX subqueues to allocate
 *
 * Allocates a struct net_device with private data area for driver use
 * and performs basic initialization. Also allocates subqueue structs
 * for each queue on the device.
 */
struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
		unsigned char name_assign_type,
		void (*setup)(struct net_device *),
		unsigned int txqs, unsigned int rxqs)
alloc_netdev_mqs is normally called through a wrapper function. Ethernet devices, for example, use alloc_etherdev_mqs to obtain their net_device.
struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs,
				      unsigned int rxqs)
{
	return alloc_netdev_mqs(sizeof_priv, "eth%d", NET_NAME_UNKNOWN,
				ether_setup, txqs, rxqs);
}
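To make the layout of the private data area concrete, here is a minimal user-space sketch. It is not kernel code: sketch_netdev, sketch_alloc_netdev, sketch_netdev_priv and sketch_ether_setup are hypothetical stand-ins, and the 32-byte alignment is an assumed value chosen for illustration. It shows a single allocation that holds the device structure followed by the driver's private block, plus an accessor that finds that block by offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed alignment of the private area (illustrative value only). */
#define SKETCH_ALIGN ((size_t)32)
#define SKETCH_ALIGN_UP(x) (((x) + SKETCH_ALIGN - 1) & ~(SKETCH_ALIGN - 1))

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct sketch_netdev {
    char name[16];
    unsigned int num_tx_queues;
    unsigned int num_rx_queues;
    unsigned int mtu;
};

/* One zeroed allocation: device structure first, private data after it. */
static struct sketch_netdev *
sketch_alloc_netdev(size_t sizeof_priv, const char *name,
                    void (*setup)(struct sketch_netdev *),
                    unsigned int txqs, unsigned int rxqs)
{
    size_t total = SKETCH_ALIGN_UP(sizeof(struct sketch_netdev)) + sizeof_priv;
    struct sketch_netdev *dev;

    assert(name != NULL);
    dev = calloc(1, total);
    if (!dev)
        return NULL;
    /* Keep the template as-is; "%d" is only resolved at registration. */
    strncpy(dev->name, name, sizeof(dev->name) - 1);
    dev->num_tx_queues = txqs;
    dev->num_rx_queues = rxqs;
    if (setup)
        setup(dev);   /* type-specific defaults, like ether_setup */
    return dev;
}

/* The private area starts right after the aligned device structure. */
static void *sketch_netdev_priv(struct sketch_netdev *dev)
{
    return (char *)dev + SKETCH_ALIGN_UP(sizeof(struct sketch_netdev));
}

/* Stand-in for ether_setup: fills in one Ethernet-style default. */
static void sketch_ether_setup(struct sketch_netdev *dev)
{
    dev->mtu = 1500;
}
```

A caller would request, say, sizeof(struct my_priv) bytes of private space and then reach it with sketch_netdev_priv(dev). Because both parts live in one allocation, a single free() releases them together.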
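NIC Registration and Unregistration Framework
Registering a device takes two key steps:
- Allocate the net_device structure with alloc_etherdev, which also initializes the parameters common to all Ethernet devices.
- Call register_netdev to register the device.
Unregistering a device takes two key steps:
- Call unregister_netdev to unregister the device.
- Call free_netdev to release the net_device that was allocated.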
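Device Initialization
When an Ethernet device allocates its net_device, ether_setup initializes some of the fields.
header_ops holds the functions that operate on L2 (link-layer) headers.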
/**
 * ether_setup - setup Ethernet network device
 * @dev: network device
 *
 * Fill in the fields of the device structure with Ethernet-generic values.
 */
void ether_setup(struct net_device *dev)
{
	dev->header_ops		= &eth_header_ops;
	dev->type		= ARPHRD_ETHER;
	dev->hard_header_len	= ETH_HLEN;
	dev->min_header_len	= ETH_HLEN;
	dev->mtu		= ETH_DATA_LEN;
	dev->min_mtu		= ETH_MIN_MTU;
	dev->max_mtu		= ETH_DATA_LEN;
	dev->addr_len		= ETH_ALEN;
	dev->tx_queue_len	= DEFAULT_TX_QUEUE_LEN;
	dev->flags		= IFF_BROADCAST|IFF_MULTICAST;
	dev->priv_flags		|= IFF_TX_SKB_SHARING;

	eth_broadcast_addr(dev->broadcast);
}
The driver itself initializes the netdev_ops and ethtool_ops fields.
netdev_ops holds the functions used to manage the NIC.
ethtool_ops holds the optional device operations.
netdev->netdev_ops = &e100_netdev_ops;
netdev->ethtool_ops = &e100_ethtool_ops;
Many of the function pointers in these structures do not have to be set. An unset pointer is NULL, so callers must check it before use.
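Organization of net_device Structures
Each net_device is inserted into one global list and two hash tables.
dev_list links the kernel's net_device structures into a list.
name_hlist places them in a hash table keyed by interface name.
index_hlist places them in a hash table keyed by the interface's ifindex.
These different structures let the kernel look up a net_device in whichever way it needs.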
struct hlist_node	name_hlist;
struct hlist_node	index_hlist;
struct list_head	dev_list;
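The three-way organization can be sketched in user space as follows. This is an illustration under simplifying assumptions, not the kernel's implementation: all sketch_* names are hypothetical, the hash functions and bucket count are arbitrary, plain pointers replace list_head and hlist_node, there is no locking or RCU, and the global list inserts at the head rather than the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_HASH_SIZE 8   /* illustrative bucket count */

struct sketch_dev {
    char name[16];
    int ifindex;
    struct sketch_dev *next_in_list;   /* plays the role of dev_list */
    struct sketch_dev *next_by_name;   /* plays the role of name_hlist */
    struct sketch_dev *next_by_index;  /* plays the role of index_hlist */
};

struct sketch_net {
    struct sketch_dev *dev_list;
    struct sketch_dev *name_hash[SKETCH_HASH_SIZE];
    struct sketch_dev *index_hash[SKETCH_HASH_SIZE];
};

static unsigned int sketch_name_bucket(const char *name)
{
    unsigned int h = 0;

    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % SKETCH_HASH_SIZE;
}

static unsigned int sketch_index_bucket(int ifindex)
{
    return (unsigned int)ifindex % SKETCH_HASH_SIZE;
}

/* Link one device into the list and both hash tables. */
static void sketch_list_netdevice(struct sketch_net *net, struct sketch_dev *dev)
{
    unsigned int nb = sketch_name_bucket(dev->name);
    unsigned int ib = sketch_index_bucket(dev->ifindex);

    dev->next_in_list = net->dev_list;
    net->dev_list = dev;
    dev->next_by_name = net->name_hash[nb];
    net->name_hash[nb] = dev;
    dev->next_by_index = net->index_hash[ib];
    net->index_hash[ib] = dev;
}

/* Lookup by name: walk only the bucket the name hashes to. */
static struct sketch_dev *sketch_get_by_name(struct sketch_net *net, const char *name)
{
    struct sketch_dev *d = net->name_hash[sketch_name_bucket(name)];

    while (d && strcmp(d->name, name) != 0)
        d = d->next_by_name;
    return d;
}

/* Lookup by ifindex: walk only the bucket the index hashes to. */
static struct sketch_dev *sketch_get_by_index(struct sketch_net *net, int ifindex)
{
    struct sketch_dev *d = net->index_hash[sketch_index_bucket(ifindex)];

    while (d && d->ifindex != ifindex)
        d = d->next_by_index;
    return d;
}
```

The list serves code that must visit every device, while the two hash tables keep lookups by name or by ifindex from scanning the whole list.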
Device State
The net_device fields related to device state are:
unsigned long	state;	/* @state: Generic network queuing layer state, see netdev_state_t */
unsigned int	flags;	/* @flags: Interface flags (a la BSD) */
enum {
	NETREG_UNINITIALIZED = 0,
	NETREG_REGISTERED,	/* completed register_netdevice */
	NETREG_UNREGISTERING,	/* called unregister_netdevice */
	NETREG_UNREGISTERED,	/* completed unregister todo */
	NETREG_RELEASED,	/* called free_netdev */
	NETREG_DUMMY,		/* dummy device for NAPI poll */
} reg_state:8;			/* Register/unregister state machine */
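Queuing Discipline State
Every network device is assigned a queuing discipline, which traffic control uses to implement QoS. The state field of net_device is one of the fields used by traffic control.
state can carry the following flags:
- __LINK_STATE_START
The device is up. This can be checked with netif_running.
- __LINK_STATE_PRESENT
The device is present. A hot-pluggable device can be removed temporarily. The flag is also cleared when the system suspends and set again when it resumes.
- __LINK_STATE_NOCARRIER
The NIC has no carrier, so the link is down.
- __LINK_STATE_LINKWATCH_PENDING
A link state change for this device is queued and waiting to be processed by linkwatch.
- __LINK_STATE_DORMANT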
/* These flag bits are private to the generic network queueing
 * layer; they may not be explicitly referenced by any other
 * code.
 */
enum netdev_state_t {
	__LINK_STATE_START,
	__LINK_STATE_PRESENT,
	__LINK_STATE_NOCARRIER,
	__LINK_STATE_LINKWATCH_PENDING,
	__LINK_STATE_DORMANT,
};
Registration State
The registration state of a network device is stored in the reg_state field.
enum {
	NETREG_UNINITIALIZED = 0,
	NETREG_REGISTERED,	/* completed register_netdevice */
	NETREG_UNREGISTERING,	/* called unregister_netdevice */
	NETREG_UNREGISTERED,	/* completed unregister todo */
	NETREG_RELEASED,	/* called free_netdev */
	NETREG_DUMMY,		/* dummy device for NAPI poll */
} reg_state:8;
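Read together with their comments, the enum values describe a one-way life cycle. The sketch below models only that normal path in user space. It is an illustration, not kernel code: the sketch_* names are hypothetical, and error paths and the dummy-device state are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the reg_state values (dummy state omitted). */
enum sketch_reg_state {
    SKETCH_UNINITIALIZED = 0,
    SKETCH_REGISTERED,     /* registration completed */
    SKETCH_UNREGISTERING,  /* unregistration started */
    SKETCH_UNREGISTERED,   /* deferred unregister work completed */
    SKETCH_RELEASED,       /* structure handed back for freeing */
};

/*
 * Advance *state to next only if that is the single legal successor on
 * the normal path. Returns true when the transition was applied.
 */
static bool sketch_reg_transition(enum sketch_reg_state *state,
                                  enum sketch_reg_state next)
{
    bool ok = false;

    switch (*state) {
    case SKETCH_UNINITIALIZED:
        ok = (next == SKETCH_REGISTERED);
        break;
    case SKETCH_REGISTERED:
        ok = (next == SKETCH_UNREGISTERING);
        break;
    case SKETCH_UNREGISTERING:
        ok = (next == SKETCH_UNREGISTERED);
        break;
    case SKETCH_UNREGISTERED:
        ok = (next == SKETCH_RELEASED);
        break;
    case SKETCH_RELEASED:
        break;  /* terminal state: nothing may follow */
    }
    if (ok)
        *state = next;
    return ok;
}
```

Attempting, for example, to move an uninitialized device straight to SKETCH_UNREGISTERING is rejected, which is the kind of mismatch the kernel guards against with its BUG_ON and WARN_ON checks on reg_state.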
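Registering and Unregistering a Device
Network device drivers register and unregister devices with the kernel through register_netdev and unregister_netdev.
Device Registration
register_netdev calls register_netdevice to do the real work, which includes the following steps.
register_netdevice uses dev_get_valid_name to finalize the interface name. alloc_etherdev_mqs set the name to "eth%d" when the net_device was allocated, and dev_get_valid_name replaces %d with the interface number.
If the dev->netdev_ops->ndo_init callback is set, it is called.
A registration event is sent on the notification chain.
The interface is registered with sysfs.
The registration state of the net_device is set to NETREG_REGISTERED.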
ret = dev_get_valid_name(net, dev, dev->name);
if (ret < 0)
	goto out;

/* Init, if this function is available */
if (dev->netdev_ops->ndo_init) {
	ret = dev->netdev_ops->ndo_init(dev);
	if (ret) {
		if (ret > 0)
			ret = -EIO;
		goto out;
	}
}
...
ret = call_netdevice_notifiers(NETDEV_POST_INIT, dev);
ret = notifier_to_errno(ret);
if (ret)
	goto err_uninit;

ret = netdev_register_kobject(dev);
if (ret) {
	dev->reg_state = NETREG_UNREGISTERED;
	goto err_uninit;
}
dev->reg_state = NETREG_REGISTERED;
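The name resolution step can be illustrated with a small user-space sketch. It assumes that the lowest unused number is chosen, and it only handles templates of the form "prefix%d". sketch_alloc_name and its parameters are hypothetical and do not reflect the signature of dev_get_valid_name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_NAME_LEN  16
#define SKETCH_MAX_UNITS 1024   /* arbitrary upper bound for the sketch */

/*
 * Expand a "prefix%d" template to the lowest-numbered name that does not
 * appear in taken[]. Writes the result to out and returns the chosen
 * number, or -1 if the template is unsupported or no number is free.
 */
static int sketch_alloc_name(const char *tmpl, const char *const taken[],
                             int ntaken, char out[SKETCH_NAME_LEN])
{
    const char *p = strstr(tmpl, "%d");
    int prefix_len;

    if (!p || p[2] != '\0')
        return -1;               /* only "prefix%d" is handled here */
    prefix_len = (int)(p - tmpl);

    for (int i = 0; i < SKETCH_MAX_UNITS; i++) {
        int clash = 0;

        snprintf(out, SKETCH_NAME_LEN, "%.*s%d", prefix_len, tmpl, i);
        for (int j = 0; j < ntaken; j++) {
            if (strcmp(taken[j], out) == 0) {
                clash = 1;
                break;
            }
        }
        if (!clash)
            return i;            /* first free number wins */
    }
    return -1;
}
```

With eth0, eth1 and eth3 already present, the template eth%d resolves to eth2 in this sketch, since gaps are filled before higher numbers are used.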
list_netdevice inserts the net_device into the global list and the two hash tables.
list_netdevice(dev);

static void list_netdevice(struct net_device *dev)
{
	struct net *net = dev_net(dev);

	ASSERT_RTNL();

	write_lock_bh(&dev_base_lock);
	list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
	hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name));
	hlist_add_head_rcu(&dev->index_hlist,
			   dev_index_hash(net, dev->ifindex));
	write_unlock_bh(&dev_base_lock);

	dev_base_seq_inc(net);
}
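dev_init_scheduler initializes the device's queuing discipline, which is what implements QoS. The queuing discipline defines how egress packets enter and leave the egress queue, and how many packets may wait in the queue before the kernel starts dropping them.
netdev_run_todo
register_netdevice does part of the registration work and leaves the rest to netdev_run_todo.
Changes to net_device structures are protected by the RTNL (Routing Netlink) lock, rtnl_mutex. register_netdev therefore takes the lock with rtnl_lock_killable before calling register_netdevice and releases it afterwards.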
int register_netdev(struct net_device *dev)
{
	int err;

	if (rtnl_lock_killable())
		return -EINTR;
	err = register_netdevice(dev);
	rtnl_unlock();
	return err;
}
rtnl_unlock calls netdev_run_todo. Why does netdev_run_todo have to run at the moment the lock is released?
void rtnl_unlock(void)
{
	/* This fellow will unlock it for us. */
	netdev_run_todo();
}
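The code and comments of netdev_run_todo show that this design solves the following problems:
- Deleting sysfs objects triggers hotplug events. Doing so outside the lock avoids a deadlock with linkwatch through keventd.
- Because the function runs without holding the RTNL lock, it can safely sleep while waiting for the net_device reference count to drop to zero. It must not return until every pending unregister event has been completed.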
/* The sequence is:
 *
 *	rtnl_lock();
 *	...
 *	register_netdevice(x1);
 *	register_netdevice(x2);
 *	...
 *	unregister_netdevice(y1);
 *	unregister_netdevice(y2);
 *	...
 *	rtnl_unlock();
 *	free_netdev(y1);
 *	free_netdev(y2);
 *
 * We are invoked by rtnl_unlock().
 * This allows us to deal with problems:
 * 1) We can delete sysfs objects which invoke hotplug
 *    without deadlocking with linkwatch via keventd.
 * 2) Since we run with the RTNL semaphore not held, we can sleep
 *    safely in order to wait for the netdev refcnt to drop to zero.
 *
 * We must not return until all unregister events added during
 * the interval the lock was held have been completed.
 */
void netdev_run_todo(void)
{
	struct list_head list;

	/* Snapshot list, allow later requests */
	list_replace_init(&net_todo_list, &list);

	__rtnl_unlock();

	/* Wait for rcu callbacks to finish before next phase */
	if (!list_empty(&list))
		rcu_barrier();

	while (!list_empty(&list)) {
		struct net_device *dev
			= list_first_entry(&list, struct net_device, todo_list);
		list_del(&dev->todo_list);

		if (unlikely(dev->reg_state != NETREG_UNREGISTERING)) {
			pr_err("network todo '%s' but state %d\n",
			       dev->name, dev->reg_state);
			dump_stack();
			continue;
		}

		dev->reg_state = NETREG_UNREGISTERED;

		netdev_wait_allrefs(dev);

		/* paranoia */
		BUG_ON(netdev_refcnt_read(dev));
		BUG_ON(!list_empty(&dev->ptype_all));
		BUG_ON(!list_empty(&dev->ptype_specific));
		WARN_ON(rcu_access_pointer(dev->ip_ptr));
		WARN_ON(rcu_access_pointer(dev->ip6_ptr));
#if IS_ENABLED(CONFIG_DECNET)
		WARN_ON(dev->dn_ptr);
#endif
		if (dev->priv_destructor)
			dev->priv_destructor(dev);
		if (dev->needs_free_netdev)
			free_netdev(dev);

		/* Report a network device has been unregistered */
		rtnl_lock();
		dev_net(dev)->dev_unreg_count--;
		__rtnl_unlock();
		wake_up(&netdev_unregistering_wq);

		/* Free network device */
		kobject_put(&dev->dev.kobj);
	}
}
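Device Registration Status Notification
Registration, unregistration, down, and up events for network devices are delivered through two channels:
- netdev_chain
- The Netlink RTMGRP_LINK multicast group
netdev_chain
Every stage of device registration and unregistration is reported on this notification chain.
Kernel components subscribe to and unsubscribe from the chain with register_netdevice_notifier and unregister_netdevice_notifier.
Events are sent on the chain with call_netdevice_notifiers. The supported events are: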
/* netdevice notifier chain. Please remember to update netdev_cmd_to_name()
 * and the rtnetlink notification exclusion list in rtnetlink_event() when
 * adding new types.
 */
enum netdev_cmd {
	NETDEV_UP	= 1,	/* For now you can't veto a device up/down */
	NETDEV_DOWN,
	NETDEV_REBOOT,		/* Tell a protocol stack a network interface
				   detected a hardware crash and restarted
				   - we can use this eg to kick tcp sessions
				   once done */
	NETDEV_CHANGE,		/* Notify device state change */
	NETDEV_REGISTER,
	NETDEV_UNREGISTER,
	NETDEV_CHANGEMTU,	/* notify after mtu change happened */
	NETDEV_CHANGEADDR,
	NETDEV_GOING_DOWN,
	NETDEV_CHANGENAME,
	NETDEV_FEAT_CHANGE,
	NETDEV_BONDING_FAILOVER,
	NETDEV_PRE_UP,
	NETDEV_PRE_TYPE_CHANGE,
	NETDEV_POST_TYPE_CHANGE,
	NETDEV_POST_INIT,
	NETDEV_RELEASE,
	NETDEV_NOTIFY_PEERS,
	NETDEV_JOIN,
	NETDEV_CHANGEUPPER,
	NETDEV_RESEND_IGMP,
	NETDEV_PRECHANGEMTU,	/* notify before mtu change happened */
	NETDEV_CHANGEINFODATA,
	NETDEV_BONDING_INFO,
	NETDEV_PRECHANGEUPPER,
	NETDEV_CHANGELOWERSTATE,
	NETDEV_UDP_TUNNEL_PUSH_INFO,
	NETDEV_UDP_TUNNEL_DROP_INFO,
	NETDEV_CHANGE_TX_QUEUE_LEN,
	NETDEV_CVLAN_FILTER_PUSH_INFO,
	NETDEV_CVLAN_FILTER_DROP_INFO,
	NETDEV_SVLAN_FILTER_PUSH_INFO,
	NETDEV_SVLAN_FILTER_DROP_INFO,
};
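When another subsystem subscribes with register_netdevice_notifier, the function replays the events for the network devices that already exist in the kernel to the new subscriber.
This way a newly registered subsystem also learns the current state of the system's network devices.
The kernel components that register with netdev_chain include:
- Routing
- Firewall
- Protocol code
- Virtual devices
- RTnetlink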
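The replay behaviour can be modelled with a small user-space sketch. The names (sketch_chain, sketch_register_notifier, and so on) are hypothetical, fixed-size arrays stand in for the kernel's lists, and the choice to replay a register event followed by an up event for running devices is an assumption about what "replay" covers.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_event { SKETCH_EV_REGISTER = 1, SKETCH_EV_UP, SKETCH_EV_UNREGISTER };

struct sketch_dev {
    const char *name;
    int is_up;
};

typedef void (*sketch_notifier_fn)(enum sketch_event ev,
                                   const struct sketch_dev *dev, void *ctx);

#define SKETCH_MAX 8

struct sketch_chain {
    sketch_notifier_fn fn[SKETCH_MAX];   /* subscribed callbacks */
    void *ctx[SKETCH_MAX];               /* one context per callback */
    int nfn;
    const struct sketch_dev *devs[SKETCH_MAX];  /* known devices */
    int ndevs;
};

/* Deliver one event to every current subscriber. */
static void sketch_call_notifiers(struct sketch_chain *c, enum sketch_event ev,
                                  const struct sketch_dev *dev)
{
    for (int i = 0; i < c->nfn; i++)
        c->fn[i](ev, dev, c->ctx[i]);
}

/* Add a device and announce it to the existing subscribers. */
static int sketch_register_device(struct sketch_chain *c,
                                  const struct sketch_dev *dev)
{
    if (c->ndevs == SKETCH_MAX)
        return -1;
    c->devs[c->ndevs++] = dev;
    sketch_call_notifiers(c, SKETCH_EV_REGISTER, dev);
    return 0;
}

/* Subscribe, then replay the existing devices to the new subscriber only. */
static int sketch_register_notifier(struct sketch_chain *c,
                                    sketch_notifier_fn fn, void *ctx)
{
    if (c->nfn == SKETCH_MAX)
        return -1;
    c->fn[c->nfn] = fn;
    c->ctx[c->nfn] = ctx;
    c->nfn++;

    for (int i = 0; i < c->ndevs; i++) {
        fn(SKETCH_EV_REGISTER, c->devs[i], ctx);
        if (c->devs[i]->is_up)
            fn(SKETCH_EV_UP, c->devs[i], ctx);
    }
    return 0;
}

/* Example subscriber: counts events by type into an int[4] passed as ctx. */
static void sketch_count_events(enum sketch_event ev,
                                const struct sketch_dev *dev, void *ctx)
{
    (void)dev;
    ((int *)ctx)[ev]++;
}
```

A subscriber that joins late therefore ends up with the same picture of the registered devices as one that was present from the start, without needing to scan the device list itself.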
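Device Unregistration
To unregister a device, the kernel must:
- Close the device with dev_close.
- Release all of its resources (I/O ports, IRQs).
- Remove the net_device from the global list and the two hash tables.
- Free the net_device structure once every reference to it has been released.
- Remove the files that were added under /proc and sysfs.
The unregister_netdev Function
Like register_netdev, unregister_netdev first takes the RTNL lock by calling rtnl_lock.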
void unregister_netdev(struct net_device *dev)
{
	rtnl_lock();
	unregister_netdevice(dev);
	rtnl_unlock();
}
EXPORT_SYMBOL(unregister_netdev);
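unregister_netdev calls unregister_netdevice, which ends up in unregister_netdevice_queue to remove the device from the kernel. The remaining work is queued with net_set_todo and completed when rtnl_unlock is called.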
/**
 * unregister_netdevice_queue - remove device from the kernel
 * @dev: device
 * @head: list
 *
 * This function shuts down a device interface and removes it
 * from the kernel tables.
 * If head not NULL, device is queued to be unregistered later.
 *
 * Callers must hold the rtnl semaphore. You may want
 * unregister_netdev() instead of this.
 */
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head)
{
	ASSERT_RTNL();

	if (head) {
		list_move_tail(&dev->unreg_list, head);
	} else {
		rollback_registered(dev);
		/* Finish processing unregister after unlock */
		net_set_todo(dev);
	}
}
rollback_registered does the actual unregistration work. It hands the device to rollback_registered_many, shown here:
static void rollback_registered_many(struct list_head *head)
{
	struct net_device *dev, *tmp;
	LIST_HEAD(close_head);

	BUG_ON(dev_boot_phase);
	ASSERT_RTNL();

	list_for_each_entry_safe(dev, tmp, head, unreg_list) {
		/* Some devices call without registering
		 * for initialization unwind. Remove those
		 * devices and proceed with the remaining.
		 */
		if (dev->reg_state == NETREG_UNINITIALIZED) {
			pr_debug("unregister_netdevice: device %s/%p never was registered\n",
				 dev->name, dev);

			WARN_ON(1);
			list_del(&dev->unreg_list);
			continue;
		}
		dev->dismantle = true;
		BUG_ON(dev->reg_state != NETREG_REGISTERED);
	}

	/* If device is running, close it first. */
	list_for_each_entry(dev, head, unreg_list)
		list_add_tail(&dev->close_list, &close_head);
	dev_close_many(&close_head, true);

	list_for_each_entry(dev, head, unreg_list) {
		/* And unlink it from device chain. */
		unlist_netdevice(dev);

		dev->reg_state = NETREG_UNREGISTERING;
	}
	flush_all_backlogs();

	synchronize_net();

	list_for_each_entry(dev, head, unreg_list) {
		struct sk_buff *skb = NULL;

		/* Shutdown queueing discipline. */
		dev_shutdown(dev);

		dev_xdp_uninstall(dev);

		/* Notify protocols, that we are about to destroy
		 * this device. They should clean all the things.
		 */
		call_netdevice_notifiers(NETDEV_UNREGISTER, dev);

		if (!dev->rtnl_link_ops ||
		    dev->rtnl_link_state == RTNL_LINK_INITIALIZED)
			skb = rtmsg_ifinfo_build_skb(RTM_DELLINK, dev, ~0U, 0,
						     GFP_KERNEL, NULL, 0);

		/*
		 *	Flush the unicast and multicast chains
		 */
		dev_uc_flush(dev);
		dev_mc_flush(dev);

		if (dev->netdev_ops->ndo_uninit)
			dev->netdev_ops->ndo_uninit(dev);

		if (skb)
			rtmsg_ifinfo_send(skb, dev, GFP_KERNEL);

		/* Notifier chain MUST detach us all upper devices. */
		WARN_ON(netdev_has_any_upper_dev(dev));
		WARN_ON(netdev_has_any_lower_dev(dev));

		/* Remove entries from kobject tree */
		netdev_unregister_kobject(dev);
#ifdef CONFIG_XPS
		/* Remove XPS queueing entries */
		netif_reset_xps_queues_gt(dev, 0);
#endif
	}

	synchronize_net();

	list_for_each_entry(dev, head, unreg_list)
		dev_put(dev);
}
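Reference Counting
A net_device is freed only after every reference to it has been released.
If the reference count is still nonzero after unregister_netdev has run, the net_device structure cannot be freed yet, and the kernel has to wait until the rest of the kernel has dropped its references. An unregistered device must no longer be used, however, so the kernel notifies every reference holder to release its reference. This notification is also done by sending an unregister event on netdev_chain.
As the previous section explained, rtnl_unlock calls netdev_run_todo, and netdev_run_todo calls netdev_wait_allrefs, which keeps waiting until the reference count of the net_device reaches 0.
netdev_wait_allrefs
netdev_wait_allrefs consists of a loop that ends when netdev_refcnt drops to 0. The loop sleeps 250 ms between checks.
About once per second, the loop sends NETDEV_UNREGISTER on the netdev_chain notification chain again.
Every 10 seconds it prints a warning message.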
/**
 * netdev_wait_allrefs - wait until all references are gone.
 * @dev: target net_device
 *
 * This is called when unregistering network devices.
 *
 * Any protocol or device that holds a reference should register
 * for netdevice notification, and cleanup and put back the
 * reference if they receive an UNREGISTER event.
 * We can get stuck here if buggy protocols don't correctly
 * call dev_put.
 */
static void netdev_wait_allrefs(struct net_device *dev)
{
	unsigned long rebroadcast_time, warning_time;
	int refcnt;

	linkwatch_forget_dev(dev);

	rebroadcast_time = warning_time = jiffies;
	refcnt = netdev_refcnt_read(dev);

	while (refcnt != 0) {
		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
			rtnl_lock();

			/* Rebroadcast unregister notification */
			call_netdevice_notifiers(NETDEV_UNREGISTER, dev);

			__rtnl_unlock();
			rcu_barrier();
			rtnl_lock();

			if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
				     &dev->state)) {
				/* We must not have linkwatch events
				 * pending on unregister. If this
				 * happens, we simply run the queue
				 * unscheduled, resulting in a noop
				 * for this device.
				 */
				linkwatch_run_queue();
			}

			__rtnl_unlock();

			rebroadcast_time = jiffies;
		}

		msleep(250);

		refcnt = netdev_refcnt_read(dev);

		if (refcnt && time_after(jiffies, warning_time + 10 * HZ)) {
			pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
				 dev->name, refcnt);
			warning_time = jiffies;
		}
	}
}
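The timing of this loop is easier to see with a simulated clock. The following user-space sketch replaces jiffies with a millisecond counter and assumes the last reference is dropped at a known time. sketch_wait_allrefs and sketch_wait_stats are hypothetical, and the sketch ignores the time that the rebroadcast itself takes.

```c
#include <assert.h>

struct sketch_wait_stats {
    int polls;         /* number of simulated 250 ms sleeps */
    int rebroadcasts;  /* repeated unregister notifications */
    int warnings;      /* "waiting for ... to become free" messages */
};

/*
 * Simulate the wait loop. refs_released_at_ms is the simulated time at
 * which the last reference is dropped (0 means no reference is held).
 */
static struct sketch_wait_stats sketch_wait_allrefs(long refs_released_at_ms)
{
    struct sketch_wait_stats st = { 0, 0, 0 };
    long now = 0, rebroadcast_time = 0, warning_time = 0;
    int refcnt = (now >= refs_released_at_ms) ? 0 : 1;

    while (refcnt != 0) {
        /* More than one second since the last broadcast: send again. */
        if (now > rebroadcast_time + 1000) {
            st.rebroadcasts++;
            rebroadcast_time = now;
        }

        now += 250;   /* stands in for msleep(250) */
        st.polls++;

        refcnt = (now >= refs_released_at_ms) ? 0 : 1;

        /* Still referenced more than ten seconds after the last warning. */
        if (refcnt && now > warning_time + 10000) {
            st.warnings++;
            warning_time = now;
        }
    }
    return st;
}
```

In this model the effective rebroadcast period is 1.25 seconds rather than exactly one second, because the "more than one second" test is only evaluated once per 250 ms poll.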
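Enabling a Device
A device is usable as soon as it is registered, but it cannot transmit or receive packets until it is explicitly enabled, for example by a user-space application. Enabling a device is the job of dev_open.
Enabling a device involves the following tasks:
- Call the relevant callback that the driver registered in dev->netdev_ops.
- Set the __LINK_STATE_START flag in dev->state.
- Set the IFF_UP flag in dev->flags.
- Call dev_activate to initialize the egress queuing discipline used by traffic control, then start the watchdog timer. If no traffic control is configured, a default FIFO queue is assigned.
- Send NETDEV_UP on the netdev_chain notification chain.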
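Disabling a Device
A network device is disabled by dev_close, which roughly performs the following tasks:
- Notify netdev_chain: NETDEV_GOING_DOWN before the device is shut down, and NETDEV_DOWN once it is down.
- Call dev_deactivate_many to shut down the egress queuing discipline, so that the device can no longer be used to transmit data, and stop the watchdog timer.
- Clear the __LINK_STATE_START flag in dev->state.
- Clear the IFF_UP flag in dev->flags.
- If dev->netdev_ops->ndo_stop is defined, call it.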
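The flag handling on the enable and disable paths can be sketched in user space as follows. Everything named sketch_* is hypothetical, the queuing discipline and watchdog are left out, and the choice to set the start flag before calling the driver's open callback, rolling it back on failure, is an assumption about the ordering rather than something the lists above specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_IFF_UP       0x1u   /* models IFF_UP in dev->flags */
#define SKETCH_STATE_START  0x1ul  /* models __LINK_STATE_START in dev->state */

struct sketch_dev;

struct sketch_ops {
    int (*open)(struct sketch_dev *dev);  /* models ndo_open, may be NULL */
    int (*stop)(struct sketch_dev *dev);  /* models ndo_stop, may be NULL */
};

struct sketch_dev {
    unsigned long state;
    unsigned int flags;
    const struct sketch_ops *ops;
    int up_events;    /* NETDEV_UP-style notifications sent */
    int down_events;  /* NETDEV_DOWN-style notifications sent */
};

/* Counterpart of netif_running: is the start flag set? */
static bool sketch_running(const struct sketch_dev *dev)
{
    return (dev->state & SKETCH_STATE_START) != 0;
}

static int sketch_dev_open(struct sketch_dev *dev)
{
    int ret = 0;

    if (dev->flags & SKETCH_IFF_UP)
        return 0;                          /* already up, nothing to do */
    dev->state |= SKETCH_STATE_START;
    if (dev->ops && dev->ops->open)        /* callback is optional */
        ret = dev->ops->open(dev);
    if (ret) {
        dev->state &= ~SKETCH_STATE_START; /* roll back on driver failure */
        return ret;
    }
    dev->flags |= SKETCH_IFF_UP;
    dev->up_events++;
    return 0;
}

static void sketch_dev_close(struct sketch_dev *dev)
{
    if (!(dev->flags & SKETCH_IFF_UP))
        return;                            /* already down */
    dev->state &= ~SKETCH_STATE_START;
    if (dev->ops && dev->ops->stop)
        dev->ops->stop(dev);
    dev->flags &= ~SKETCH_IFF_UP;
    dev->down_events++;
}

/* Example driver callback that always fails, to exercise the rollback. */
static int sketch_failing_open(struct sketch_dev *dev)
{
    (void)dev;
    return -5;
}
```

Note that both callbacks are checked for NULL before use, which matches the earlier remark that unset function pointers in netdev_ops must be tested by the caller.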
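Updating the Device Queuing Discipline State
Interaction with Power Management
The suspend and resume functions of the pci_driver structure are initialized depending on whether the kernel supports power management. When the system enters the suspended state, the suspend function provided by the device driver runs so that the driver can take action. Power management does not affect netdevice->reg_state, but it does update the netdevice->state field.
Suspending a Device
When a device is suspended, the suspend function handles the event. Its actions include:
- Clear the __LINK_STATE_PRESENT flag in dev->state.
- If the device is up, stop its egress queues so that no more packets are handed to it for transmission.
netif_device_detach takes care of both: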
/**
 * netif_device_detach - mark device as removed
 * @dev: network device
 *
 * Mark device as removed from system and therefore no longer available.
 */
void netif_device_detach(struct net_device *dev)
{
	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
	    netif_running(dev)) {
		netif_tx_stop_all_queues(dev);
	}
}
Resuming a Device
The resume function brings the device back into operation. The work is done by netif_device_attach:
void netif_device_attach(struct net_device *dev)
{
	if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
	    netif_running(dev)) {
		netif_tx_wake_all_queues(dev);
		__netdev_watchdog_up(dev);
	}
}
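The interplay of the present and start flags during suspend and resume can be modelled in user space. This is a sketch under stated assumptions: the sketch_* names are hypothetical, a single boolean stands in for all transmit queues, the watchdog is omitted, and the bit helpers are not atomic as the kernel's test_and_clear_bit and test_and_set_bit are.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_START   (1ul << 0)  /* models __LINK_STATE_START */
#define SKETCH_PRESENT (1ul << 1)  /* models __LINK_STATE_PRESENT */

struct sketch_dev {
    unsigned long state;
    bool tx_stopped;   /* stands in for the state of all TX queues */
};

/* Non-atomic stand-in: clear the bit, report whether it was set. */
static bool sketch_test_and_clear(unsigned long *word, unsigned long bit)
{
    bool was_set = (*word & bit) != 0;

    *word &= ~bit;
    return was_set;
}

/* Non-atomic stand-in: set the bit, report whether it was already set. */
static bool sketch_test_and_set(unsigned long *word, unsigned long bit)
{
    bool was_set = (*word & bit) != 0;

    *word |= bit;
    return was_set;
}

/* Suspend path: stop transmission only if present and running. */
static void sketch_device_detach(struct sketch_dev *dev)
{
    if (sketch_test_and_clear(&dev->state, SKETCH_PRESENT) &&
        (dev->state & SKETCH_START))
        dev->tx_stopped = true;
}

/* Resume path: restart transmission only if it was absent and is running. */
static void sketch_device_attach(struct sketch_dev *dev)
{
    if (!sketch_test_and_set(&dev->state, SKETCH_PRESENT) &&
        (dev->state & SKETCH_START))
        dev->tx_stopped = false;
}
```

Because each helper reports the previous value of the bit, calling detach or attach twice in a row is harmless: the second call sees that nothing changed and leaves the queues alone. The start flag is never touched, so the device is still administratively up after resume.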
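Link State Change Detection
A NIC device driver detects whether a carrier signal is present either because the NIC notifies it or by reading a NIC register. The driver then informs the kernel with netif_carrier_on and netif_carrier_off.
The link state changes in these situations:
- A cable is plugged into or unplugged from the NIC.
- The state of the device at the other end of the cable changes.
When the driver finds that the carrier has disappeared, it calls netif_carrier_off. The function sets the __LINK_STATE_NOCARRIER flag and calls linkwatch_fire_event to process the change.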
void netif_carrier_off(struct net_device *dev)
{
	if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
		if (dev->reg_state == NETREG_UNINITIALIZED)
			return;
		atomic_inc(&dev->carrier_down_count);
		linkwatch_fire_event(dev);
	}
}
When the driver detects that the link has a carrier, it calls netif_carrier_on, which clears the __LINK_STATE_NOCARRIER flag and calls linkwatch_fire_event.
void netif_carrier_on(struct net_device *dev)
{
	if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
		if (dev->reg_state == NETREG_UNINITIALIZED)
			return;
		atomic_inc(&dev->carrier_up_count);
		linkwatch_fire_event(dev);
		if (netif_running(dev))
			__netdev_watchdog_up(dev);
	}
}
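linkwatch_fire_event checks whether the __LINK_STATE_LINKWATCH_PENDING flag is set in net_device->state. If it is not set, the function sets it and calls linkwatch_add_event, which simply appends dev->link_watch_list to the tail of the lweventlist list.
The devices on lweventlist are those whose carrier state has changed. Even if the state changes several times, a device appears on the list only once, because the list links the net_device structure itself.
Once a net_device has been added to lweventlist, or linkwatch_urgent_event returns true, the event is handed to the kernel's shared workqueue (keventd) to be scheduled.
To keep linkwatch_event from running too often, its execution is limited to once per second.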
static void linkwatch_add_event(struct net_device *dev)
{
	unsigned long flags;

	spin_lock_irqsave(&lweventlist_lock, flags);
	if (list_empty(&dev->link_watch_list)) {
		list_add_tail(&dev->link_watch_list, &lweventlist);
		dev_hold(dev);
	}
	spin_unlock_irqrestore(&lweventlist_lock, flags);
}

void linkwatch_fire_event(struct net_device *dev)
{
	bool urgent = linkwatch_urgent_event(dev);

	if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
		linkwatch_add_event(dev);
	} else if (!urgent)
		return;

	linkwatch_schedule_work(urgent);
}
linkwatch_schedule_work schedules the delayed work item linkwatch_work on the workqueue, so that the handler it was declared with, linkwatch_event, gets called.
static DECLARE_DELAYED_WORK(linkwatch_work, linkwatch_event);

static void linkwatch_schedule_work(int urgent)
{
	unsigned long delay = linkwatch_nextevent - jiffies;

	if (test_bit(LW_URGENT, &linkwatch_flags))
		return;

	/* Minimise down-time: drop delay for up event. */
	if (urgent) {
		if (test_and_set_bit(LW_URGENT, &linkwatch_flags))
			return;
		delay = 0;
	}

	/* If we wrap around we'll delay it by at most HZ. */
	if (delay > HZ)
		delay = 0;

	/*
	 * If urgent, schedule immediate execution; otherwise, don't
	 * override the existing timer.
	 */
	if (test_bit(LW_URGENT, &linkwatch_flags))
		mod_delayed_work(system_wq, &linkwatch_work, 0);
	else
		schedule_delayed_work(&linkwatch_work, delay);
}
linkwatch_event calls __linkwatch_run_queue.
__linkwatch_run_queue calls linkwatch_do_dev for every device queued on lweventlist.
linkwatch_do_dev clears the __LINK_STATE_LINKWATCH_PENDING flag and, through netdev_state_change, sends a notification on netdev_chain.
static void linkwatch_do_dev(struct net_device *dev)
{
	/*
	 * Make sure the above read is complete since it can be
	 * rewritten as soon as we clear the bit below.
	 */
	smp_mb__before_atomic();

	/* We are about to handle this device,
	 * so new events can be accepted
	 */
	clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);

	rfc2863_policy(dev);
	if (dev->flags & IFF_UP) {
		if (netif_carrier_ok(dev))
			dev_activate(dev);
		else
			dev_deactivate(dev);

		netdev_state_change(dev);
	}
	dev_put(dev);
}

static void __linkwatch_run_queue(int urgent_only)
{
	struct net_device *dev;
	LIST_HEAD(wrk);

	/*
	 * Limit the number of linkwatch events to one
	 * per second so that a runaway driver does not
	 * cause a storm of messages on the netlink
	 * socket. This limit does not apply to up events
	 * while the device qdisc is down.
	 */
	if (!urgent_only)
		linkwatch_nextevent = jiffies + HZ;
	/* Limit wrap-around effect on delay. */
	else if (time_after(linkwatch_nextevent, jiffies + HZ))
		linkwatch_nextevent = jiffies;

	clear_bit(LW_URGENT, &linkwatch_flags);

	spin_lock_irq(&lweventlist_lock);
	list_splice_init(&lweventlist, &wrk);

	while (!list_empty(&wrk)) {
		dev = list_first_entry(&wrk, struct net_device, link_watch_list);
		list_del_init(&dev->link_watch_list);

		if (urgent_only && !linkwatch_urgent_event(dev)) {
			list_add_tail(&dev->link_watch_list, &lweventlist);
			continue;
		}
		spin_unlock_irq(&lweventlist_lock);
		linkwatch_do_dev(dev);
		spin_lock_irq(&lweventlist_lock);
	}

	if (!list_empty(&lweventlist))
		linkwatch_schedule_work(0);
	spin_unlock_irq(&lweventlist_lock);
}

static void linkwatch_event(struct work_struct *dummy)
{
	rtnl_lock();
	__linkwatch_run_queue(time_after(linkwatch_nextevent, jiffies));
	rtnl_unlock();
}
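The coalescing effect of the pending flag can be shown with a reduced user-space model. The sketch_* names are hypothetical, a fixed array replaces lweventlist, and there is no locking, no urgency handling and no rate limiting. It illustrates the idea rather than reproducing linkwatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_QUEUE_MAX 8

struct sketch_dev {
    bool pending;       /* models __LINK_STATE_LINKWATCH_PENDING */
    bool carrier_ok;    /* current carrier state */
    int refcnt;         /* references held by the event queue */
    int change_events;  /* state-change notifications delivered */
};

struct sketch_linkwatch {
    struct sketch_dev *queue[SKETCH_QUEUE_MAX];
    size_t len;
};

/* Queue the device once; later events are absorbed while it is pending. */
static void sketch_fire_event(struct sketch_linkwatch *lw, struct sketch_dev *dev)
{
    if (dev->pending)
        return;
    assert(lw->len < SKETCH_QUEUE_MAX);
    dev->pending = true;
    dev->refcnt++;                  /* keep the device alive while queued */
    lw->queue[lw->len++] = dev;
}

/* Driver-side helper: record a carrier change and report it. */
static void sketch_carrier_set(struct sketch_linkwatch *lw,
                               struct sketch_dev *dev, bool ok)
{
    if (dev->carrier_ok == ok)
        return;                     /* no change, nothing to report */
    dev->carrier_ok = ok;
    sketch_fire_event(lw, dev);
}

/* Deferred worker: one notification per queued device, then release it. */
static void sketch_run_queue(struct sketch_linkwatch *lw)
{
    size_t n = lw->len;

    lw->len = 0;
    for (size_t i = 0; i < n; i++) {
        struct sketch_dev *dev = lw->queue[i];

        dev->pending = false;       /* new events may be queued again */
        dev->change_events++;       /* subscribers see the latest state */
        dev->refcnt--;
    }
}
```

If the carrier flaps several times before the worker runs, subscribers receive a single notification and observe only the final carrier state, which is the behaviour that protects against a storm of messages from a misbehaving driver.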
Virtual Devices
Typical uses of virtual devices:
- Bonding interfaces
- VLAN interfaces