
hotplug --- netlink

2010-03-30 15:14

hotplug and firmware loading with sysfs.
========================================

The 2.6.x Linux kernels export a device tree through sysfs, which is a
synthetic filesystem generally mounted at "/sys".  Among other things,
this filesystem tells userspace what hardware is available, so userspace tools
(such as udev or mdev) can dynamically populate a "/dev" directory with device
nodes representing the currently available hardware.

Notification when hardware is inserted or removed is provided by the
hotplug mechanism.  Linux provides two hotplug interfaces: /sbin/hotplug and
netlink.

The combination of sysfs and hotplug obsoleted the older "devfs", which was
removed from the 2.6.16 kernel.

Device nodes:
=============

Sysfs exports major and minor numbers for device nodes with which to populate
/dev via mknod(2).  These major and minor numbers are found in files named
"dev", which contain two colon-separated ASCII decimal numbers followed by
exactly one newline.  For example:

$ cat /sys/class/mem/zero/dev
1:5

Note that the name of the directory containing a dev entry is usually the
traditional name for the device node.  (The above entry is for "/dev/zero".)

Entries for block devices are found at the following locations:

/sys/block/*/dev
/sys/block/*/*/dev

Entries for char devices are found at the following locations:

/sys/bus/*/devices/*/dev
/sys/class/*/*/dev

A very simple bash script to populate /dev from /sys (without addressing
ownership or permissions of the resulting /dev nodes, and with truly horrible
performance) might look like:

#!/bin/bash

# Populate block devices

for i in /sys/block/*/dev /sys/block/*/*/dev
do
  if [ -f "$i" ]
  then
    MAJOR=$(sed 's/:.*//' < "$i")
    MINOR=$(sed 's/.*://' < "$i")
    # Strip only the trailing "/dev", then keep the last path component.
    DEVNAME=$(echo "$i" | sed -e 's@/dev$@@' -e 's@.*/@@')
    mknod "/dev/$DEVNAME" b "$MAJOR" "$MINOR"
  fi
done

# Populate char devices

for i in /sys/bus/*/devices/*/dev /sys/class/*/*/dev
do
  if [ -f "$i" ]
  then
    MAJOR=$(sed 's/:.*//' < "$i")
    MINOR=$(sed 's/.*://' < "$i")
    # The "$" anchor matters here: without it, sed would eat the "/dev"
    # in "/devices" and every node would end up named "dev".
    DEVNAME=$(echo "$i" | sed -e 's@/dev$@@' -e 's@.*/@@')
    mknod "/dev/$DEVNAME" c "$MAJOR" "$MINOR"
  fi
done
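
Most of the cost in a script like this comes from forking sed three times per
device.  As a rough sketch only (not a replacement for udev or mdev), the same
parsing can be done with shell parameter expansion.  The function name
"list_nodes" and its sysfs-root argument are invented for this example, and it
prints the mknod commands instead of running them so it can be tried without
root:

```shell
# list_nodes SYSROOT: print one mknod command per "dev" file under SYSROOT
# (normally /sys).  Hypothetical helper: it only prints, it creates nothing.
list_nodes()
{
  local sys i type dev name
  sys=$1

  for i in "$sys"/block/*/dev "$sys"/block/*/*/dev \
           "$sys"/bus/*/devices/*/dev "$sys"/class/*/*/dev
  do
    # An unmatched glob is left as a literal pattern; skip it.
    [ -f "$i" ] || continue

    # Entries under SYSROOT/block are block devices, the rest are char.
    case $i in
      "$sys"/block/*) type=b ;;
      *)              type=c ;;
    esac

    read -r dev < "$i"          # for example "1:5"
    name=${i%/dev}              # strip the trailing "/dev"
    name=${name##*/}            # keep the last path component
    echo "mknod /dev/$name $type ${dev%%:*} ${dev##*:}"
  done
}
```

Calling "list_nodes /sys" lists what the script would create; piping that
output into a root shell would actually create the nodes.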

Hotplug:
========

The hotplug mechanism asynchronously notifies userspace when hardware is
inserted, removed, or undergoes a similar significant state change.  Linux
provides two interfaces to hotplug; the kernel can spawn a usermode helper
process, or it can send a message to an existing daemon listening to a netlink
socket.

-- Usermode helper

The usermode helper hotplug mechanism spawns a new process to handle each
hotplug event.  Each such helper process belongs to the root user (UID 0) and
is a child of the init task (PID 1).  The kernel spawns one process per hotplug
event, supplying environment variables to each new process describing that
particular hotplug event.  By default the kernel spawns instances of
"/sbin/hotplug", but this default can be changed by writing a new path into
"/proc/sys/kernel/hotplug" (assuming /proc is mounted).

A simple bash script to record variables from hotplug events might look like:

#!/bin/bash

env >> /filename

It's possible to disable the usermode helper hotplug mechanism (by writing an
empty string into /proc/sys/kernel/hotplug), but there's little reason to
do this unless you want to disable an existing hotplug mechanism.  (From a
performance perspective, a usermode helper won't be spawned if /sbin/hotplug
doesn't exist, and negative dentries will record the fact it doesn't exist
after the first lookup attempt.)

-- Netlink

A daemon listening to the netlink socket receives a packet of data for each
hotplug event, containing the same information a usermode helper would receive
in environment variables.

The netlink packet contains a set of NUL-terminated text strings.
The first string in the netlink packet combines the $ACTION and $DEVPATH
values, separated by an @ (at sign).  Each string after the first contains a
KEYWORD=VALUE pair defining a hotplug event variable.
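
To make the layout concrete, here is a small sketch that decodes one such
packet in shell.  Shell variables cannot hold NUL bytes, so the packet is read
from stdin and the NULs are translated to newlines first.  "print_uevent" is a
name invented for this example, and the sample packet shown after it is made
up rather than captured from a real system:

```shell
# print_uevent: read one raw hotplug netlink packet on stdin and print its
# header (ACTION@DEVPATH) followed by each KEYWORD=VALUE variable.
print_uevent()
{
  local line first
  first=1
  tr '\0' '\n' | while IFS= read -r line
  do
    if [ "$first" = 1 ]
    then
      # First string: "$ACTION@$DEVPATH"
      echo "action:  ${line%%@*}"
      echo "devpath: ${line#*@}"
      first=0
    else
      # Remaining strings: KEYWORD=VALUE
      echo "var:     ${line%%=*} = ${line#*=}"
    fi
  done
}
```

It can be tried with something like:

printf 'add@/class/mem/zero\0ACTION=add\0SUBSYSTEM=mem\0' | print_uevent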

Here's a C program to print hotplug netlink events to stdout:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <linux/types.h>
#include <linux/netlink.h>

void die(char *s)
{
  write(2,s,strlen(s));
  exit(1);
}

int main(int argc, char *argv[])
{
  struct sockaddr_nl nls;
  struct pollfd pfd;
  char buf[512];

  // Open hotplug event netlink socket

  memset(&nls,0,sizeof(struct sockaddr_nl));
  nls.nl_family = AF_NETLINK;
  nls.nl_pid = getpid();
  nls.nl_groups = -1;

  pfd.events = POLLIN;
  pfd.fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
  if (pfd.fd==-1)
    die("Not root\n");

  // Listen to netlink socket

  if (bind(pfd.fd, (void *)&nls, sizeof(struct sockaddr_nl)))
    die("Bind failed\n");
  while (-1!=poll(&pfd, 1, -1)) {
    // Leave room for a terminating NUL in case the packet fills the buffer.
    int i, len = recv(pfd.fd, buf, sizeof(buf)-1, MSG_DONTWAIT);
    if (len == -1) die("recv\n");
    buf[len] = 0;

    // Print each NUL-terminated string in the packet on its own line.
    i = 0;
    while (i<len) {
      printf("%s\n", buf+i);
      i += strlen(buf+i)+1;
    }
  }
  die("poll\n");

  // Dear gcc: shut up.
  return 0;
}

Hotplug event variables:
========================

Every hotplug event should provide at least the following variables:

ACTION
The current hotplug action: "add" to add the device, "remove" to remove it.
The 2.6.22 kernel can also generate "change", "online", "offline", and
"move" actions.

DEVPATH
Path under /sys at which this device's sysfs directory can be found.

SUBSYSTEM
If this is "block", it's a block device.  Any other subsystem is
either a char device or does not have an associated device node.

The following variables are also provided for some devices:

MAJOR and MINOR
If these are present, a device node can be created in /dev for this device.
Some devices (such as network cards) don't generate a /dev node.

[QUESTION: Any reliable way to get the default name?]

DRIVER
If present, a suggested driver (module) for handling this device.  No
relation to whether or not a driver is currently handling the device.

INTERFACE and IFINDEX
When SUBSYSTEM=net, these variables indicate the name of the interface
and a unique integer for the interface.  (Note that "INTERFACE=eth0" could
be paired with "IFINDEX=2" because eth0 isn't guaranteed to come before lo
and the count doesn't start at 0.)

FIRMWARE
The system is requesting firmware for the device.  See "Firmware loading"
below.
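
Putting these variables together, a usermode helper (or a netlink daemon that
has decoded a packet) can decide what to do with /dev.  The sketch below is
only illustrative: "handle_event" is a hypothetical name, it assumes the last
component of DEVPATH is an acceptable node name (a convention rather than a
guarantee, as the question above notes), and it echoes the commands instead of
running them:

```shell
# handle_event: look at the hotplug variables in the current environment and
# print the /dev command they suggest.  Illustrative only, prints and exits.
handle_event()
{
  local name type

  # Assumption: the node is named after the last component of DEVPATH.
  name=${DEVPATH##*/}
  [ -n "$name" ] || return 0

  case $ACTION in
    add)
      # No MAJOR/MINOR means no device node (network cards, for example).
      [ -n "$MAJOR" ] && [ -n "$MINOR" ] || return 0
      if [ "$SUBSYSTEM" = block ]; then type=b; else type=c; fi
      echo "mknod /dev/$name $type $MAJOR $MINOR"
      ;;
    remove)
      # Harmless if the node never existed.
      echo "rm -f /dev/$name"
      ;;
  esac
}
```

Other actions ("change", "move" and so on) fall through the case statement
and are ignored here; a real handler would likely need to treat "move" as a
rename.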

Injecting events into hotplug via "uevent":
===========================================

Events can be injected into the hotplug mechanism through sysfs via the
"uevent" files.  Each directory in sysfs containing a "dev" file should also
contain a "uevent" file.  Write the name of the event (such as "add" or
"remove") to the appropriate uevent file, and the kernel will deliver such
an event for that device via the hotplug mechanism.

Note that in kernel versions 2.6.24 and newer, "uevent" is readable.  Reading
from uevent provides the set of "extra" variables associated with this event.
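
On kernels where uevent is readable, its contents are KEYWORD=VALUE lines, one
per line.  A hedged sketch of pulling a single value out of such a file
follows; "uevent_get" is a name made up here, and which keywords appear
depends on the kernel version and the device:

```shell
# uevent_get FILE KEY: print the value of KEY from a sysfs "uevent" file.
# Returns 1 if the key is not present.
uevent_get()
{
  local line
  while IFS= read -r line
  do
    case $line in
      "$2"=*) echo "${line#*=}"; return 0 ;;
    esac
  done < "$1"
  return 1
}
```

For instance, "uevent_get /sys/class/mem/zero/uevent MAJOR" prints the major
number if that file lists one.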

A note about race conditions (or "why bother with netlink?"):
=============================================================

Some simple systems (such as embedded systems) scan sysfs once at boot time
to populate /dev, and ignore any hotplug events.  Scanning again to probe for
new devices is a workable option (as long as mknod failing because the
device already exists isn't considered an error condition).

Systems that actually support hotplug should start to handle hotplug events
_before_ scanning sysfs for existing devices, to ensure that any devices
added during the scan reliably have a /dev entry created for them.

Devices removed while scanning /sys may still result in leftover /dev nodes
after the scan.  The race is that the scanning process may read the "dev"
entry for a device from sysfs, be interrupted by a hotplug event which attempts
to remove that device, and then the scanning process resumes and creates the
device node for the already-removed device.  In theory this is no more of a
security concern than having a statically allocated /dev (the device node
will return -ENODEV to programs that try to use it), but it's untidy.

In theory, transient devices (which are created and removed again almost
instantly, which can be caused by poorly written drivers that fail their device
probe) could have similar "leftover" /dev entries from the /sbin/hotplug
mechanism.  (If two processes are spawned simultaneously, which one completes
first is not guaranteed.)  This is not common, but theoretically possible.

These sorts of races are why the netlink mechanism was created.  To avoid
such potential races when using netlink, instead of reading each "dev" entry,
fake "add" events by writing to each device's "uevent" file in sysfs.  This
filters the sequencing through the kernel, which will not deliver an "add"
event packet to the netlink process for a device that has been removed.
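
A minimal sketch of that "fake add" pass, assuming the netlink listener is
already running before it starts.  "trigger_adds" and its sysfs-root argument
are hypothetical; passing a root other than /sys lets it be tried on a scratch
directory:

```shell
# trigger_adds SYSROOT: write "add" to the uevent file that sits next to
# every "dev" file under SYSROOT (normally /sys).
trigger_adds()
{
  local sys i dir
  sys=$1

  for i in "$sys"/block/*/dev "$sys"/block/*/*/dev \
           "$sys"/bus/*/devices/*/dev "$sys"/class/*/*/dev
  do
    dir=${i%/dev}
    [ -f "$dir/uevent" ] || continue
    # The device may vanish between the glob and the write; ignore failures.
    { echo add > "$dir/uevent"; } 2>/dev/null
  done
  return 0
}
```

Note that this never reads a "dev" file itself; the major and minor numbers
arrive in the resulting events, which the kernel sequences against any real
"remove" events.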

Note also that on very large mainframe systems, /sbin/hotplug can potentially
fork bomb the system during system bringup.

Firmware loading
================

If the hotplug variable FIRMWARE is set, the kernel is requesting firmware
for a device (identified by $DEVPATH).  To provide the firmware to the kernel,
do the following:

echo 1 > /sys/$DEVPATH/loading
cat /path/to/$FIRMWARE > /sys/$DEVPATH/data
echo 0 > /sys/$DEVPATH/loading

Note that "echo -1 > /sys/$DEVPATH/loading" will cancel the firmware load
and return an error to the kernel, and /sys/class/firmware/timeout contains a
timeout (in seconds) for firmware loads.
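
Wrapped up as a function, those steps might look like the sketch below.  The
name "load_firmware" and its three-argument form are assumptions made for this
example, as is the choice to cancel the load (rather than wait for the
timeout) when the firmware file is missing or the copy fails:

```shell
# load_firmware SYSROOT DEVPATH FILE: feed FILE to the kernel through the
# "loading" and "data" files in SYSROOT/DEVPATH.  Returns 1 after cancelling.
load_firmware()
{
  local dir
  dir=$1$2

  if [ ! -r "$3" ]
  then
    # No such firmware: cancel so the kernel does not wait for the timeout.
    echo -1 > "$dir/loading"
    return 1
  fi

  if echo 1 > "$dir/loading" && cat "$3" > "$dir/data"
  then
    echo 0 > "$dir/loading"
  else
    echo -1 > "$dir/loading"
    return 1
  fi
}
```

A hotplug helper could call it as
load_firmware /sys "$DEVPATH" "/path/to/$FIRMWARE"
whenever $FIRMWARE is set, with /path/to standing in for wherever the firmware
files are actually kept.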

See Documentation/firmware_class/ for more information.

Loading firmware for statically linked devices
==============================================

An advantage of the usermode helper hotplug mechanism is that if initramfs
contains an executable /sbin/hotplug, it can be called even before the kernel
runs init.  This allows /sbin/hotplug to supply firmware (out of initramfs) to
statically linked device drivers.  (The netlink mechanism requires a daemon to
listen to a socket, and such a daemon cannot be spawned before init runs.)

For licensing reasons, binary-only firmware should not be linked into the
kernel image, but instead placed in an externally supplied initramfs which
can be passed to the Linux kernel through the old initrd mechanism.
See Documentation/filesystems/ramfs-rootfs-initramfs.txt for details.

stable_api_nonsense:
====================

Note: Sysfs exports a lot of kernel internal state, and the maintainers of
sysfs do not believe that exposing information to userspace for use by
userspace programs constitutes an "API" that must be "stable".  The sysfs
infrastructure is maintained by the author of
Documentation/stable_api_nonsense.txt, who seems to believe it applies to
userspace as well.  Therefore, at best only a subset of the information in
sysfs can be considered stable from version to version.

The information documented here should remain stable.  Some other parts of
sysfs are documented under Documentation/ABI, although that directory comes
with a warning that anything documented there can go away after two years.
Any other information exported by sysfs should be considered debugging info
at best, and probably shouldn't have been exported at all since it's not a
"stable API" intended for use by actual programs.