
Linux Tales: I Am the Block Layer (5) Good Things Come in Small Packages? (Part 2)

2008-01-31 09:11
The second one, register_disk, has quite a pedigree. It comes all the way from fs/partitions/check.c:

473 /* Not exported, helper to add_disk(). */
474 void register_disk(struct gendisk *disk)
475 {
476     struct block_device *bdev;
477     char *s;
478     int i;
479     struct hd_struct *p;
480     int err;
481
482     strlcpy(disk->kobj.name,disk->disk_name,KOBJ_NAME_LEN);
483     /* ewww... some of these buggers have / in name... */
484     s = strchr(disk->kobj.name, '/');
485     if (s)
486         *s = '!';
487     if ((err = kobject_add(&disk->kobj)))
488         return;
489     err = disk_sysfs_symlinks(disk);
490     if (err) {
491         kobject_del(&disk->kobj);
492         return;
493     }
494     disk_sysfs_add_subdirs(disk);
495
496     /* No minors to use for partitions */
497     if (disk->minors == 1)
498         goto exit;
499
500     /* No such device (e.g., media were just removed) */
501     if (!get_capacity(disk))
502         goto exit;
503
504     bdev = bdget_disk(disk, 0);
505     if (!bdev)
506         goto exit;
507
508     /* scan partition table, but suppress uevents */
509     bdev->bd_invalidated = 1;
510     disk->part_uevent_suppress = 1;
511     err = blkdev_get(bdev, FMODE_READ, 0);
512     disk->part_uevent_suppress = 0;
513     if (err < 0)
514         goto exit;
515     blkdev_put(bdev);
516
517 exit:
518     /* announce disk after possible partitions are already created */
519     kobject_uevent(&disk->kobj, KOBJ_ADD);
520
521     /* announce possible partitions */
522     for (i = 1; i < disk->minors; i++) {
523         p = disk->part[i-1];
524         if (!p || !p->nr_sects)
525             continue;
526         kobject_uevent(&p->kobj, KOBJ_ADD);
527     }
528 }

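If you don't understand the unified device model of Linux 2.6, you will probably have a hard time following this code. Fortunately we covered kobjects in <<I Am Sysfs>>, so here we won't dig into the kobject functions or the functions provided by sysfs. We'll just touch on them and move on.
First, what kobject_add on line 487 does is quite intuitive: it creates a subdirectory in sysfs for this disk. For example, the sdf among the directories below was created for my USB flash drive. If I commented out the line that calls kobject_add, I guarantee you would not see this sdf directory.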

[root@lfg2 ~]# ls /sys/block/
md0 ram1 ram11 ram13 ram15 ram3 ram5 ram7 ram9 sdb sdd sdf ram0 ram10 ram12 ram14 ram2 ram4 ram6 ram8 sda sdc sde sdg

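At this point a reader nicknamed "塞翁失身" raised two questions:
First, why does this kobject_add call produce a subdirectory named "sdf" rather than something else? Remember what we did in sd_probe? We carefully computed disk_name there, and disk_name is a member of struct gendisk. On line 482 we copy disk_name into kobj.name, which is why the kobject added by kobject_add carries the name we computed back then as disk_name.
Second, why is the subdirectory created under /sys/block and not somewhere else? Remember how we allocated struct gendisk in alloc_disk_node? The call kobj_set_kset_s(disk,block_subsys) there made the kobject of disk belong to the kobject of block_subsys. That is why, when we add this kobject now, its directory naturally appears under /sys/block.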
Moving on: disk_sysfs_symlinks also comes from fs/partitions/check.c. The function isn't short, but it is fairly easy to understand.

429 static int disk_sysfs_symlinks(struct gendisk *disk)
430 {
431     struct device *target = get_device(disk->driverfs_dev);
432     int err;
433     char *disk_name = NULL;
434
435     if (target) {
436         disk_name = make_block_name(disk);
437         if (!disk_name) {
438             err = -ENOMEM;
439             goto err_out;
440         }
441
442         err = sysfs_create_link(&disk->kobj, &target->kobj, "device");
443         if (err)
444             goto err_out_disk_name;
445
446         err = sysfs_create_link(&target->kobj, &disk->kobj, disk_name);
447         if (err)
448             goto err_out_dev_link;
449     }
450
451     err = sysfs_create_link(&disk->kobj, &block_subsys.kobj,
452                 "subsystem");
453     if (err)
454         goto err_out_disk_name_lnk;
455
456     kfree(disk_name);
457
458     return 0;
459
460 err_out_disk_name_lnk:
461     if (target) {
462         sysfs_remove_link(&target->kobj, disk_name);
463 err_out_dev_link:
464         sysfs_remove_link(&disk->kobj, "device");
465 err_out_disk_name:
466         kfree(disk_name);
467 err_out:
468         put_device(target);
469     }
470     return err;
471 }

Let's interpret this function through its actual effects. First, here is what a working USB flash drive has under /sys/block/sdf:

[root@localhost ~]# ls /sys/block/sdf/
capability dev device holders queue range removable size slaves stat subsystem uevent

The sysfs_create_link call on line 442 creates the symbolic link named device here. Let's see where it points:

[root@localhost ~]# ls -l /sys/block/sdf/device
lrwxrwxrwx 1 root root 0 Dec 13 07:09 /sys/block/sdf/device -> ../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/host24/target24:0:0/24:0:0:0

The sysfs_create_link call on line 446 then creates a link in the opposite direction, pointing back from the other side:

[root@localhost ~]# ls /sys/devices/pci0000\:00/0000\:00\:1d.7/usb4/4-4/4-4\:1.0/host24/target24\:0\:0/24\:0\:0\:0/
block:sdf driver ioerr_cnt model rescan scsi_generic:sg7 timeout bus generic iorequest_cnt power rev scsi_level type delete iocounterbits max_sectors queue_depth scsi_device:24:0:0:0 state uevent device_blocked iodone_cnt modalias queue_type scsi_disk:24:0:0:0 subsystem vendor

Clearly, it is this block:sdf.

[root@localhost ~]# ls -l /sys/devices/pci0000\:00/0000\:00\:1d.7/usb4/4-4/4-4\:1.0/host24/target24\:0\:0/24\:0\:0\:0/block\:sdf
lrwxrwxrwx 1 root root 0 Dec 13 21:16 /sys/devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/host24/target24:0:0/24:0:0:0/block:sdf -> ../../../../../../../../../block/sdf

So each side holds a piece of the other: a file over there links to us, and a file over here links to them.
Then line 451 calls sysfs_create_link once more. This time, obviously, it creates the symbolic link /sys/block/sdf/subsystem.

[root@localhost ~]# ls -l /sys/block/sdf/subsystem
lrwxrwxrwx 1 root root 0 Dec 13 07:09 /sys/block/sdf/subsystem -> ../../block

With these three links in place, disk_sysfs_symlinks has fulfilled its mission. The next function is disk_sysfs_add_subdirs, also from fs/partitions/check.c:

static inline void disk_sysfs_add_subdirs(struct gendisk *disk)
{
    struct kobject *k;

    k = kobject_get(&disk->kobj);
    disk->holder_dir = kobject_add_dir(k, "holders");
    disk->slave_dir = kobject_add_dir(k, "slaves");
    kobject_put(k);
}

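The intent of this function is so obvious that even the scalpers hawking concert tickets outside Hongkou Football Stadium could read it. It simply creates two subdirectories, holders and slaves.
Line 504, bdget_disk, is an inline function. <<Thinking in C++>> tells us that inline functions are best defined in header files, so this one comes from include/linux/genhd.h: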

static inline struct block_device *bdget_disk(struct gendisk *disk, int index)
{
    return bdget(MKDEV(disk->major, disk->first_minor) + index);
}

Yet another feint: the real work is done by bdget, from fs/block_dev.c:

554 struct block_device *bdget(dev_t dev)
555 {
556     struct block_device *bdev;
557     struct inode *inode;
558
559     inode = iget5_locked(bd_mnt->mnt_sb, hash(dev),
560             bdev_test, bdev_set, &dev);
561
562     if (!inode)
563         return NULL;
564
565     bdev = &BDEV_I(inode)->bdev;
566
567     if (inode->i_state & I_NEW) {
568         bdev->bd_contains = NULL;
569         bdev->bd_inode = inode;
570         bdev->bd_block_size = (1 << inode->i_blkbits);
571         bdev->bd_part_count = 0;
572         bdev->bd_invalidated = 0;
573         inode->i_mode = S_IFBLK;
574         inode->i_rdev = dev;
575         inode->i_bdev = bdev;
576         inode->i_data.a_ops = &def_blk_aops;
577         mapping_set_gfp_mask(&inode->i_data, GFP_USER);
578         inode->i_data.backing_dev_info = &default_backing_dev_info;
579         spin_lock(&bdev_lock);
580         list_add(&bdev->bd_list, &all_bdevs);
581         spin_unlock(&bdev_lock);
582         unlock_new_inode(inode);
583     }
584     return bdev;
585 }

Misfortunes never come singly: two monster structures jump out at once, struct block_device and struct inode.
include/linux/fs.h defines the following structure:

struct block_device {
    dev_t bd_dev; /* not a kdev_t - it's a search key */
    struct inode * bd_inode; /* will die */
    int bd_openers;
    struct mutex bd_mutex; /* open/close mutex */
    struct semaphore bd_mount_sem;
    struct list_head bd_inodes;
    void * bd_holder;
    int bd_holders;
#ifdef CONFIG_SYSFS
    struct list_head bd_holder_list;
#endif
    struct block_device * bd_contains;
    unsigned bd_block_size;
    struct hd_struct * bd_part;
    /* number of times partitions within this device have been opened. */
    unsigned bd_part_count;
    int bd_invalidated;
    struct gendisk * bd_disk;
    struct list_head bd_list;
    struct backing_dev_info *bd_inode_backing_dev_info;
    /*
     * Private data. You must have bd_claim'ed the block_device
     * to use this. NOTE: bd_claim allows an owner to claim
     * the same device multiple times, the owner must take special
     * care to not mess up bd_private for that case.
     */
    unsigned long bd_private;
};

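Clearly, every block device in Linux is represented by one such structure, which is why it is called the block device descriptor. We won't cover inode in detail, but one amusing structure here is struct bdev_inode: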

struct bdev_inode {
    struct block_device bdev;
    struct inode vfs_inode;
};

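Combine two monster structures and you get a third monster structure.
A buddy going by the handle "避孕套一直用雕牌" asked me why I bring up bdev_inode at all, since it never seems to appear. My answer is to look at the essence, not the surface: many things in this world are not what they seem. If you don't believe me, look at BDEV_I, an inline function from fs/block_dev.c: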

static inline struct bdev_inode *BDEV_I(struct inode *inode)
{
    return container_of(inode, struct bdev_inode, vfs_inode);
}

Clearly, it obtains the bdev_inode that contains a given inode. So &BDEV_I(inode)->bdev on line 565 is the struct block_device member bdev of the bdev_inode in which this inode is embedded.
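But structures are not like buses: they won't show up in front of you if you just wait. You have to allocate them. That is what iget5_locked does. It comes from fs/inode.c, and we clearly won't dig into it. Suffice it to say that once it runs, we have both an inode and a block_device. Moreover, a freshly allocated inode has the I_NEW flag set in its i_state member, so the if block at line 567 in bdget() is executed. That block initializes the inode pointed to by inode and the block_device pointed to by bdev, and bdev is exactly what the function finally returns. Let me stress that bdev officially enters our story at this very moment.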
Back in register_disk(), let's continue. The next heavyweight function is blkdev_get, from fs/block_dev.c:

static int __blkdev_get(struct block_device *bdev, mode_t mode, unsigned flags,
            int for_part)
{
    /*
     * This crockload is due to bad choice of ->open() type.
     * It will go away.
     * For now, block device ->open() routine must _not_
     * examine anything in 'inode' argument except ->i_rdev.
     */
    struct file fake_file = {};
    struct dentry fake_dentry = {};
    fake_file.f_mode = mode;
    fake_file.f_flags = flags;
    fake_file.f_path.dentry = &fake_dentry;
    fake_dentry.d_inode = bdev->bd_inode;

    return do_open(bdev, &fake_file, for_part);
}

int blkdev_get(struct block_device *bdev, mode_t mode, unsigned flags)
{
    return __blkdev_get(bdev, mode, flags, 0);
}

blkdev_get just calls __blkdev_get, so we've pasted both functions together.
Clearly, what we really need to look at is do_open, from the same file:

1103 /*
1104  * bd_mutex locking:
1105  *
1106  *  mutex_lock(part->bd_mutex)
1107  *    mutex_lock_nested(whole->bd_mutex, 1)
1108  */
1109
1110 static int do_open(struct block_device *bdev, struct file *file, int for_part)
1111 {
1112     struct module *owner = NULL;
1113     struct gendisk *disk;
1114     int ret = -ENXIO;
1115     int part;
1116
1117     file->f_mapping = bdev->bd_inode->i_mapping;
1118     lock_kernel();
1119     disk = get_gendisk(bdev->bd_dev, &part);
1120     if (!disk) {
1121         unlock_kernel();
1122         bdput(bdev);
1123         return ret;
1124     }
1125     owner = disk->fops->owner;
1126
1127     mutex_lock_nested(&bdev->bd_mutex, for_part);
1128     if (!bdev->bd_openers) {
1129         bdev->bd_disk = disk;
1130         bdev->bd_contains = bdev;
1131         if (!part) {
1132             struct backing_dev_info *bdi;
1133             if (disk->fops->open) {
1134                 ret = disk->fops->open(bdev->bd_inode, file);
1135                 if (ret)
1136                     goto out_first;
1137             }
1138             if (!bdev->bd_openers) {
1139                 bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
1140                 bdi = blk_get_backing_dev_info(bdev);
1141                 if (bdi == NULL)
1142                     bdi = &default_backing_dev_info;
1143                 bdev->bd_inode->i_data.backing_dev_info = bdi;
1144             }
1145             if (bdev->bd_invalidated)
1146                 rescan_partitions(disk, bdev);
1147         } else {
1148             struct hd_struct *p;
1149             struct block_device *whole;
1150             whole = bdget_disk(disk, 0);
1151             ret = -ENOMEM;
1152             if (!whole)
1153                 goto out_first;
1154             BUG_ON(for_part);
1155             ret = __blkdev_get(whole, file->f_mode, file->f_flags, 1);
1156             if (ret)
1157                 goto out_first;
1158             bdev->bd_contains = whole;
1159             p = disk->part[part - 1];
1160             bdev->bd_inode->i_data.backing_dev_info =
1161                     whole->bd_inode->i_data.backing_dev_info;
1162             if (!(disk->flags & GENHD_FL_UP) || !p || !p->nr_sects) {
1163                 ret = -ENXIO;
1164                 goto out_first;
1165             }
1166             kobject_get(&p->kobj);
1167             bdev->bd_part = p;
1168             bd_set_size(bdev, (loff_t) p->nr_sects << 9);
1169         }
1170     } else {
1171         put_disk(disk);
1172         module_put(owner);
1173         if (bdev->bd_contains == bdev) {
1174             if (bdev->bd_disk->fops->open) {
1175                 ret = bdev->bd_disk->fops->open(bdev->bd_inode, file);
1176                 if (ret)
1177                     goto out;
1178             }
1179             if (bdev->bd_invalidated)
1180                 rescan_partitions(bdev->bd_disk, bdev);
1181         }
1182     }
1183     bdev->bd_openers++;
1184     if (for_part)
1185         bdev->bd_part_count++;
1186     mutex_unlock(&bdev->bd_mutex);
1187     unlock_kernel();
1188     return 0;
1189
1190 out_first:
1191     bdev->bd_disk = NULL;
1192     bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;
1193     if (bdev != bdev->bd_contains)
1194         __blkdev_put(bdev->bd_contains, 1);
1195     bdev->bd_contains = NULL;
1196     put_disk(disk);
1197     module_put(owner);
1198 out:
1199     mutex_unlock(&bdev->bd_mutex);
1200     unlock_kernel();
1201     if (ret)
1202         bdput(bdev);
1203     return ret;
1204 }

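Good grief. With kernel functions there is no "most twisted", only "even more twisted".
Initially bd_openers is 0, so the if statement on line 1128 is taken. bd_openers being 0 means this block device has not been opened yet.
At this point partitions haven't entered the picture: we only have the notion of sda, not sda1, sda2, sda3, and so on. Moreover, bdev was obtained through bdget_disk(disk, 0), so it stands for the whole disk. The part returned by get_gendisk must therefore be 0, and the if statement on line 1131 is taken as well. disk->fops->open is obviously sd_open, because in sd_probe we set gd->fops to &sd_fops.
At this moment, though, sd_open doesn't do any really serious work. At most it checks whether the device can be opened: if it can, it returns 0; if not, it reports the error right away.
The next few functions mostly perform assignments, so let's skip them for now and come back when we need them.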
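rescan_partitions() on line 1146, however, is clearly something we need to look at. We set bd_invalidated to 1 before calling blkdev_get, so this function is guaranteed to run this time. From this moment on, partition information bursts into our lives. The function comes from fs/partitions/check.c: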

int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
{
    struct parsed_partitions *state;
    int p, res;

    if (bdev->bd_part_count)
        return -EBUSY;
    res = invalidate_partition(disk, 0);
    if (res)
        return res;
    bdev->bd_invalidated = 0;
    for (p = 1; p < disk->minors; p++)
        delete_partition(disk, p);
    if (disk->fops->revalidate_disk)
        disk->fops->revalidate_disk(disk);
    if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))
        return 0;
    if (IS_ERR(state))  /* I/O error reading the partition table */
        return -EIO;
    for (p = 1; p < state->limit; p++) {
        sector_t size = state->parts[p].size;
        sector_t from = state->parts[p].from;
        if (!size)
            continue;
        if (from + size > get_capacity(disk)) {
            printk(" %s: p%d exceeds device capacity\n",
                disk->disk_name, p);
        }
        add_partition(disk, p, from, size, state->parts[p].flags);
#ifdef CONFIG_BLK_DEV_MD
        if (state->parts[p].flags & ADDPART_FLAG_RAID)
            md_autodetect_dev(bdev->bd_dev+p);
#endif
    }
    kfree(state);
    return 0;
}

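Actually, even without reading a single line we know what this function does: as we said, once it has run, we have all the partition information. A partition is represented by struct hd_struct, and struct gendisk refers to its partitions through its member part, a double pointer (struct hd_struct **), i.e., an array of pointers to struct hd_struct. Initially none of these pointers points to anything, because no struct hd_struct has been allocated yet. So even without showing you the code of delete_partition below, you should know that at this moment it does nothing: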

void delete_partition(struct gendisk *disk, int part)
{
    struct hd_struct *p = disk->part[part-1];
    if (!p)
        return;
    if (!p->nr_sects)
        return;
    disk->part[part-1] = NULL;
    p->start_sect = 0;
    p->nr_sects = 0;
    p->ios[0] = p->ios[1] = 0;
    p->sectors[0] = p->sectors[1] = 0;
    sysfs_remove_link(&p->kobj, "subsystem");
    kobject_unregister(p->holder_dir);
    kobject_uevent(&p->kobj, KOBJ_REMOVE);
    kobject_del(&p->kobj);
    kobject_put(&p->kobj);
}

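The revalidate_disk pointer points to sd_revalidate_disk, which we covered at length when discussing sd. sd_probe already ran it before calling add_disk; here it simply runs once more.
Next comes get_capacity(). There is no simpler function than this one. It comes from include/linux/genhd.h: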

static inline sector_t get_capacity(struct gendisk *disk)
{
    return disk->capacity;
}

check_partition is a bit more involved. It comes from fs/partitions/check.c:

156 static struct parsed_partitions *
157 check_partition(struct gendisk *hd, struct block_device *bdev)
158 {
159     struct parsed_partitions *state;
160     int i, res, err;
161
162     state = kmalloc(sizeof(struct parsed_partitions), GFP_KERNEL);
163     if (!state)
164         return NULL;
165
166     disk_name(hd, 0, state->name);
167     printk(KERN_INFO " %s:", state->name);
168     if (isdigit(state->name[strlen(state->name)-1]))
169         sprintf(state->name, "p");
170
171     state->limit = hd->minors;
172     i = res = err = 0;
173     while (!res && check_part[i]) {
174         memset(&state->parts, 0, sizeof(state->parts));
175         res = check_part[i++](state, bdev);
176         if (res < 0) {
177             /* We have hit an I/O error which we don't report now.
178              * But record it, and let the others do their job.
179              */
180             err = res;
181             res = 0;
182         }
183
184     }
185     if (res > 0)
186         return state;
187     if (err)
188     /* The partition is unrecognized. So report I/O errors if there were any */
189         res = err;
190     if (!res)
191         printk(" unknown partition table\n");
192     else if (warn_no_part)
193         printk(" unable to read partition table\n");
194     kfree(state);
195     return ERR_PTR(res);
196 }

First, struct parsed_partitions is defined in the header fs/partitions/check.h:

enum { MAX_PART = 256 };

struct parsed_partitions {
    char name[BDEVNAME_SIZE];
    struct {
        sector_t from;
        sector_t size;
        int flags;
    } parts[MAX_PART];
    int next;
    int limit;
};

This structure is what we use to record partition information.
And who is this check_part on line 173? We find it in fs/partitions/check.c:

int warn_no_part = 1; /*This is ugly: should make genhd removable media aware*/

static int (*check_part[])(struct parsed_partitions *, struct block_device *) = {
    /*
     * Probe partition formats with tables at disk address 0
     * that also have an ADFS boot block at 0xdc0.
     */
#ifdef CONFIG_ACORN_PARTITION_ICS
    adfspart_check_ICS,
#endif
#ifdef CONFIG_ACORN_PARTITION_POWERTEC
    adfspart_check_POWERTEC,
#endif
#ifdef CONFIG_ACORN_PARTITION_EESOX
    adfspart_check_EESOX,
#endif

    /*
     * Now move on to formats that only have partition info at
     * disk address 0xdc0. Since these may also have stale
     * PC/BIOS partition tables, they need to come before
     * the msdos entry.
     */
#ifdef CONFIG_ACORN_PARTITION_CUMANA
    adfspart_check_CUMANA,
#endif
#ifdef CONFIG_ACORN_PARTITION_ADFS
    adfspart_check_ADFS,
#endif

#ifdef CONFIG_EFI_PARTITION
    efi_partition, /* this must come before msdos */
#endif
#ifdef CONFIG_SGI_PARTITION
    sgi_partition,
#endif
#ifdef CONFIG_LDM_PARTITION
    ldm_partition, /* this must come before msdos */
#endif
#ifdef CONFIG_MSDOS_PARTITION
    msdos_partition,
#endif
#ifdef CONFIG_OSF_PARTITION
    osf_partition,
#endif
#ifdef CONFIG_SUN_PARTITION
    sun_partition,
#endif
#ifdef CONFIG_AMIGA_PARTITION
    amiga_partition,
#endif
#ifdef CONFIG_ATARI_PARTITION
    atari_partition,
#endif
#ifdef CONFIG_MAC_PARTITION
    mac_partition,
#endif
#ifdef CONFIG_ULTRIX_PARTITION
    ultrix_partition,
#endif
#ifdef CONFIG_IBM_PARTITION
    ibm_partition,
#endif
#ifdef CONFIG_KARMA_PARTITION
    karma_partition,
#endif
#ifdef CONFIG_SYSV68_PARTITION
    sysv68_partition,
#endif
    NULL
};

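Wow, that's a whole bunch of functions defined in one go. If I had to read every one of them, how the hell would I survive? Good thing I was once an outstanding Communist Youth League member at Fudan University; otherwise I'd have been scared to death.
Still, things aren't that bad. We needn't exaggerate the way certain media outlets do, declaring every year's flood or drought a once-in-a-century event until we start wondering how many centuries we have actually lived through. The situation is actually easy to handle. Unless you specialize in partition table formats, you don't need to read a single one of these functions. If you do study partition table formats, you'd better read the files under fs/partitions carefully. Every format is there, so just pick the ones you need.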

localhost:/usr/src/linux-2.6.22.1 # ls fs/partitions/
Kconfig acorn.h atari.c check.h ibm.c karma.h mac.c msdos.h sgi.c sun.h ultrix.c Makefile amiga.c atari.h efi.c ibm.h ldm.c mac.h osf.c sgi.h sysv68.c ultrix.h acorn.c amiga.h check.c efi.h karma.c ldm.h msdos.c osf.h sun.c sysv68.h

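Basically, what I want to say is that all those functions serve a single purpose: finding the partition information. That information always ends up recorded in the struct parsed_partitions that state points to. We'll use it next, and the meanings of fields like size and from are self-explanatory.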
Then we arrive at add_partition, again from fs/partitions/check.c:

371 void add_partition(struct gendisk *disk, int part, sector_t start, sector_t len, int flags)
372 {
373     struct hd_struct *p;
374
375     p = kmalloc(sizeof(*p), GFP_KERNEL);
376     if (!p)
377         return;
378
379     memset(p, 0, sizeof(*p));
380     p->start_sect = start;
381     p->nr_sects = len;
382     p->partno = part;
383     p->policy = disk->policy;
384
385     if (isdigit(disk->kobj.name[strlen(disk->kobj.name)-1]))
386         snprintf(p->kobj.name,KOBJ_NAME_LEN,"%sp%d",disk->kobj.name,part);
387     else
388         snprintf(p->kobj.name,KOBJ_NAME_LEN,"%s%d",disk->kobj.name,part);
389     p->kobj.parent = &disk->kobj;
390     p->kobj.ktype = &ktype_part;
391     kobject_init(&p->kobj);
392     kobject_add(&p->kobj);
393     if (!disk->part_uevent_suppress)
394         kobject_uevent(&p->kobj, KOBJ_ADD);
395     sysfs_create_link(&p->kobj, &block_subsys.kobj, "subsystem");
396     if (flags & ADDPART_FLAG_WHOLEDISK) {
397         static struct attribute addpartattr = {
398             .name = "whole_disk",
399             .mode = S_IRUSR | S_IRGRP | S_IROTH,
400             .owner = THIS_MODULE,
401         };
402
403         sysfs_create_file(&p->kobj, &addpartattr);
404     }
405     partition_sysfs_add_subdir(p);
406     disk->part[part-1] = p;
407 }

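With the experience we've gained, these kobject- and sysfs-related functions are much easier to read now.
p->kobj.parent = &disk->kobj on line 389 ensures that what we create next lands under the directory we created earlier, i.e., sda1, sda2, ... appear under the sda directory.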

[root@localhost tedkdb]# ls /sys/block/sda/
capability device queue removable sda10 sda12 sda14 sda2 sda5 sda7 sda9 slaves subsystem dev holders range sda1 sda11 sda13 sda15 sda3 sda6 sda8 size stat uevent

The effect of sysfs_create_link on line 395 is just as obvious:

[root@localhost tedkdb]# ls -l /sys/block/sda/sda1/subsystem
lrwxrwxrwx 1 root root 0 Dec 13 03:15 /sys/block/sda/sda1/subsystem -> ../../../block

partition_sysfs_add_subdir doesn't need much explanation either. It comes from fs/partitions/check.c:

static inline void partition_sysfs_add_subdir(struct hd_struct *p)
{
    struct kobject *k;

    k = kobject_get(&p->kobj);
    p->holder_dir = kobject_add_dir(k, "holders");
    kobject_put(k);
}

It adds the holders subdirectory:

[root@localhost tedkdb]# ls /sys/block/sda/sda1/
dev holders size start stat subsystem uevent

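Finally, let's remember one thing this function did: it assigned the members of p and, at the end, pointed disk->part[part-1] to p. In other words, from now on the pointer array disk->part has real contents and is no longer empty.
At this point rescan_partitions() is done, and we return to do_open(). Line 1183 increments the reference count bd_openers by 1, and if for_part is nonzero it also increments the corresponding count, bd_part_count. Then do_open ends in style, and like dominoes, __blkdev_get and blkdev_get return one after the other. blkdev_put does roughly the opposite of blkdev_get, so we won't look at it. Just note that it decrements the counts that were incremented here.
Finally, the last function register_disk() calls is kobject_uevent(). It notifies the user-space process udevd that an event has occurred. If your distribution has configured udev correctly (see the files under /etc/udev/), the effect is that the corresponding device files appear under /dev. For example: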

[root@localhost tedkdb]# ls /dev/sda*
/dev/sda /dev/sda10 /dev/sda12 /dev/sda14 /dev/sda2 /dev/sda5 /dev/sda7 /dev/sda9 /dev/sda1 /dev/sda11 /dev/sda13 /dev/sda15 /dev/sda3 /dev/sda6 /dev/sda8

As for why, you can read up on udev. It is a user-space program, so we won't say more about it here.