
Those Linux Things: I Am the Block Layer (10): The Past and Present Lives of a SCSI Command (Part 4)

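2008-01-31 09:14
Of course, the while loop may also end because of the two checks at line 1453. The first is that there is no req left. The other depends on the return value of scsi_dev_queue_ready(): if it returns 0, the break is executed as well, and the loop ends.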

1270 /*
1271 * scsi_dev_queue_ready: if we can send requests to sdev, return 1 else
1272 * return 0.
1273 *
1274 * Called with the queue_lock held.
1275 */
1276 static inline int scsi_dev_queue_ready(struct request_queue *q,
1277 struct scsi_device *sdev)
1278 {
1279 if (sdev->device_busy >= sdev->queue_depth)
1280 return 0;
1281 if (sdev->device_busy == 0 && sdev->device_blocked) {
1282 /*
1283 * unblock after device_blocked iterates to zero
1284 */
1285 if (--sdev->device_blocked == 0) {
1286 SCSI_LOG_MLQUEUE(3,
1287 sdev_printk(KERN_INFO, sdev,
1288 "unblocking device at zero depth\n"));
1289 } else {
1290 blk_plug_device(q);
1291 return 0;
1292 }
1293 }
1294 if (sdev->device_blocked)
1295 return 0;
1296
1297 return 1;
1298 }
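
Stripped of logging and queue plugging, the decision this function makes can be modelled in user space. What follows is only a sketch under assumptions: struct model_sdev, model_queue_ready() and model_dispatch() are hypothetical names invented for illustration, and the only kernel behaviour they copy is the order of the checks in the listing above plus the device_busy increment that the text places at line 1469.

```c
#include <assert.h>

/* Hypothetical stand-in for the three struct scsi_device fields used here. */
struct model_sdev {
    unsigned int device_busy;    /* commands currently owned by the driver */
    unsigned int queue_depth;    /* maximum number of commands in flight */
    unsigned int device_blocked; /* countdown; 0 means "not blocked" */
};

/* Same decision order as scsi_dev_queue_ready(), minus logging/plugging. */
static int model_queue_ready(struct model_sdev *sdev)
{
    if (sdev->device_busy >= sdev->queue_depth)
        return 0;                       /* queue is full */
    if (sdev->device_busy == 0 && sdev->device_blocked) {
        if (--sdev->device_blocked != 0)
            return 0;                   /* the kernel also plugs the queue here */
    }
    if (sdev->device_blocked)
        return 0;
    return 1;
}

/* Try to dispatch `wanted` commands; return how many were accepted. */
static unsigned int model_dispatch(struct model_sdev *sdev, unsigned int wanted)
{
    unsigned int sent = 0;

    while (sent < wanted && model_queue_ready(sdev)) {
        sdev->device_busy++;            /* mirrors the increment at line 1469 */
        sent++;
    }
    return sent;
}
```

With a queue_depth of 3, a caller asking model_dispatch() for five commands gets only three accepted, and the gate stays closed until a completion lowers device_busy again.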

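The first thing checked here is device_busy. It is a counter rather than a simple flag: a non-zero value means commands are in flight, that is, they have already been handed down to the low-level driver. That is why we increment device_busy before calling scsi_dispatch_cmd, at line 1469.
The other field is device_blocked. It tells the world that this device cannot accept new commands for now, most likely because it is busy processing one. Normally its value is 0, unless scsi_queue_insert() has been called. A friendly tip: the SCSI device exports this field through sysfs, so we can read it from there. Below are the values for two SCSI devices; both are 0, which is its usual state.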

[root@localhost ~]# ls /sys/bus/scsi/devices/
0:0:8:0/ 0:2:0:0/ 1:0:0:0/ 2:0:0:0/
[root@localhost ~]# ls /sys/bus/scsi/devices/2\:0\:0\:0/
block:sdb/ iocounterbits modalias rev subsystem/ bus/ iodone_cnt model scsi_device:2:0:0:0/ timeout delete ioerr_cnt queue_depth scsi_disk:2:0:0:0/ type device_blocked iorequest_cnt queue_type scsi_level uevent driver/ max_sectors rescan state vendor
[root@localhost ~]# cat /sys/bus/scsi/devices/2\:0\:0\:0/device_blocked
0
[root@localhost ~]# cat /sys/bus/scsi/devices/0\:0\:8\:0/device_blocked
0

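So under normal conditions scsi_dev_queue_ready() returns 1, just as its comment says. But "normal" here means executing a single command. If we execute several commands, that is, submit several requests, device_busy is incremented at line 1469 again and again until it may reach queue_depth. Then scsi_dev_queue_ready() returns 0 and scsi_request_fn() may finish. After that __generic_unplug_device returns, then blk_execute_rq_nowait() returns, and we are back in blk_execute_rq(), which calls wait_for_completion() and goes to sleep, waiting. By the rules of the game there must be a complete() somewhere to wake it up. So where is it? The answer is blk_end_sync_rq.
The netizen "Rather Lose My Virtue Than Lose Sleep" is very curious how I know this. It is a long story. Remember the scsi_done we talked about back in usb-storage? When a command has finished executing, scsi_done gets called. scsi_done comes from drivers/scsi/scsi.c, and it is obviously our way in. Finding this function is like the Kuomintang finding Fu Zhigao, like Wang Jiazhi finding Mr. Yi: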

608 /**
609 * scsi_done - Enqueue the finished SCSI command into the done queue.
610 * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
611 * ownership back to SCSI Core -- i.e. the LLDD has finished with it.
612 *
613 * This function is the mid-level's (SCSI Core) interrupt routine, which
614 * regains ownership of the SCSI command (de facto) from a LLDD, and enqueues
615 * the command to the done queue for further processing.
616 *
617 * This is the producer of the done queue who enqueues at the tail.
618 *
619 * This function is interrupt context safe.
620 */
621 static void scsi_done(struct scsi_cmnd *cmd)
622 {
623 /*
624 * We don't have to worry about this one timing out any more.
625 * If we are unable to remove the timer, then the command
626 * has already timed out. In which case, we have no choice but to
627 * let the timeout function run, as we have no idea where in fact
628 * that function could really be. It might be on another processor,
629 * etc, etc.
630 */
631 if (!scsi_delete_timer(cmd))
632 return;
633 __scsi_done(cmd);
634 }

Lurking behind it is __scsi_done, from the same file:

636 /* Private entry to scsi_done() to complete a command when the timer
637 * isn't running --- used by scsi_times_out */
638 void __scsi_done(struct scsi_cmnd *cmd)
639 {
640 struct request *rq = cmd->request;
641
642 /*
643 * Set the serial numbers back to zero
644 */
645 cmd->serial_number = 0;
646
647 atomic_inc(&cmd->device->iodone_cnt);
648 if (cmd->result)
649 atomic_inc(&cmd->device->ioerr_cnt);
650
651 BUG_ON(!rq);
652
653 /*
654 * The uptodate/nbytes values don't matter, as we allow partial
655 * completes and thus will check this in the softirq callback
656 */
657 rq->completion_data = cmd;
658 blk_complete_request(rq);
659 }

We do not care about anything else here, only the blk_complete_request() at the end.

3588 /**
3589 * blk_complete_request - end I/O on a request
3590 * @req: the request being processed
3591 *
3592 * Description:
3593 * Ends all I/O on a request. It does not handle partial completions,
3594 * unless the driver actually implements this in its completion callback
3595 * through requeueing. Theh actual completion happens out-of-order,
3596 * through a softirq handler. The user must have registered a completion
3597 * callback through blk_queue_softirq_done().
3598 **/
3599
3600 void blk_complete_request(struct request *req)
3601 {
3602 struct list_head *cpu_list;
3603 unsigned long flags;
3604
3605 BUG_ON(!req->q->softirq_done_fn);
3606
3607 local_irq_save(flags);
3608
3609 cpu_list = &__get_cpu_var(blk_cpu_done);
3610 list_add_tail(&req->donelist, cpu_list);
3611 raise_softirq_irqoff(BLOCK_SOFTIRQ);
3612
3613 local_irq_restore(flags);
3614 }

We ignore the rest and look only at raise_softirq_irqoff(). A long, long time ago there was a function called blk_dev_init(). It is where our story began. In that function we once saw this line:

3720 open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);

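As we said back then, this line initializes a softirq, BLOCK_SOFTIRQ, and binds the softirq function blk_done_softirq to it. We also said that to trigger this softirq you only need to call raise_softirq_irqoff(). And that is exactly what we have just done, which means blk_done_softirq will be called.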

3542 /*
3543 * splice the completion data to a local structure and hand off to
3544 * process_completion_queue() to complete the requests
3545 */
3546 static void blk_done_softirq(struct softirq_action *h)
3547 {
3548 struct list_head *cpu_list, local_list;
3549
3550 local_irq_disable();
3551 cpu_list = &__get_cpu_var(blk_cpu_done);
3552 list_replace_init(cpu_list, &local_list);
3553 local_irq_enable();
3554
3555 while (!list_empty(&local_list)) {
3556 struct request *rq = list_entry(local_list.next, struct request, donelist);
3557
3558 list_del_init(&rq->donelist);
3559 rq->q->softirq_done_fn(rq);
3560 }
3561 }
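
Taken together, blk_complete_request() and blk_done_softirq() form a small producer/consumer pair: the first appends a finished request to a per-CPU list and raises the softirq, the second detaches the whole list and calls the queue's softirq_done_fn on each entry in order. Here is a single-CPU, user-space sketch of that hand-off. Every name in it (model_request, model_queue, done_head, record_done and so on) is hypothetical, the kernel's list_head is replaced by a plain singly linked list, and interrupt masking is left out altogether.

```c
#include <assert.h>
#include <stddef.h>

struct model_queue;

struct model_request {
    struct model_request *next;   /* plays the role of rq->donelist */
    struct model_queue *q;
    int id;
};

struct model_queue {
    void (*softirq_done_fn)(struct model_request *rq);
};

/* Models the blk_cpu_done list of one CPU. */
static struct model_request *done_head, *done_tail;
static int softirq_pending;

/* Producer: append to the tail, then "raise" the softirq. */
static void model_complete_request(struct model_request *rq)
{
    rq->next = NULL;
    if (done_tail)
        done_tail->next = rq;
    else
        done_head = rq;
    done_tail = rq;
    softirq_pending = 1;
}

/* Consumer: detach the whole list first, then run the callbacks. */
static void model_done_softirq(void)
{
    struct model_request *local = done_head;

    done_head = done_tail = NULL;
    softirq_pending = 0;
    while (local) {
        struct model_request *rq = local;

        local = rq->next;
        rq->next = NULL;
        rq->q->softirq_done_fn(rq);
    }
}

/* Example callback that only records the completion order. */
static int completed_ids[8];
static int completed_count;

static void record_done(struct model_request *rq)
{
    if (completed_count < 8)
        completed_ids[completed_count++] = rq->id;
}
```

Detaching the list before walking it is the point of list_replace_init() in the real handler: new completions can be queued while the callbacks run without disturbing the batch being processed.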

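And what is this softirq_done_fn? Do not say you do not know; we have covered it before. But it does not matter if you forgot. A person's greatest burden is too good a memory, and the forgetful are more easily happy. In scsi_alloc_queue we called blk_queue_softirq_done to assign scsi_softirq_done to q->softirq_done_fn, so what gets called here is scsi_softirq_done.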

1376 static void scsi_softirq_done(struct request *rq)
1377 {
1378 struct scsi_cmnd *cmd = rq->completion_data;
1379 unsigned long wait_for = (cmd->allowed + 1) * cmd->timeout_per_command;
1380 int disposition;
1381
1382 INIT_LIST_HEAD(&cmd->eh_entry);
1383
1384 disposition = scsi_decide_disposition(cmd);
1385 if (disposition != SUCCESS &&
1386 time_before(cmd->jiffies_at_alloc + wait_for, jiffies)) {
1387 sdev_printk(KERN_ERR, cmd->device,
1388 "timing out command, waited %lus\n",
1389 wait_for/HZ);
1390 disposition = SUCCESS;
1391 }
1392
1393 scsi_log_completion(cmd, disposition);
1394
1395 switch (disposition) {
1396 case SUCCESS:
1397 scsi_finish_command(cmd);
1398 break;
1399 case NEEDS_RETRY:
1400 scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY);
1401 break;
1402 case ADD_TO_MLQUEUE:
1403 scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
1404 break;
1405 default:
1406 if (!scsi_eh_scmd_add(cmd, 0))
1407 scsi_finish_command(cmd);
1408 }
1409 }
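
One detail in this listing is easy to miss: lines 1385 to 1391 override any non-SUCCESS disposition once the command has been alive longer than (allowed + 1) * timeout_per_command, so a command cannot be retried forever. Below is a user-space sketch of just that rule. The enum and function names are hypothetical, and model_time_before() is my own wraparound-safe comparison written in the spirit of the kernel's time_before(), not a copy of it.

```c
#include <assert.h>

/* Hypothetical stand-ins for the disposition values used in the listing. */
enum model_disposition {
    MODEL_SUCCESS,
    MODEL_NEEDS_RETRY,
    MODEL_ADD_TO_MLQUEUE,
    MODEL_FAILED
};

/* Wraparound-safe "a is earlier than b" for free-running tick counters. */
static int model_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Apply the override from lines 1385-1391: if the command did not come
 * back as SUCCESS and its total time budget has run out, stop retrying
 * and let it be finished. SUCCESS here means "finish the command"; the
 * command's own result is still reported to the caller.
 */
static enum model_disposition
model_final_disposition(enum model_disposition d, unsigned long allocated_at,
                        unsigned int allowed,
                        unsigned long timeout_per_command, unsigned long now)
{
    unsigned long wait_for = (allowed + 1) * timeout_per_command;

    if (d != MODEL_SUCCESS &&
        model_time_before(allocated_at + wait_for, now))
        return MODEL_SUCCESS;
    return d;
}
```

With allowed = 5 and a per-command timeout of 30 ticks, the budget is 180 ticks from allocation; a retry requested before that deadline is honoured, one requested after it is not.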

I hardly need to tell you that scsi_softirq_done calls scsi_finish_command, which comes from drivers/scsi/scsi.c:

661 /*
662 * Function: scsi_finish_command
663 *
664 * Purpose: Pass command off to upper layer for finishing of I/O
665 * request, waking processes that are waiting on results,
666 * etc.
667 */
668 void scsi_finish_command(struct scsi_cmnd *cmd)
669 {
670 struct scsi_device *sdev = cmd->device;
671 struct Scsi_Host *shost = sdev->host;
672
673 scsi_device_unbusy(sdev);
674
675 /*
676 * Clear the flags which say that the device/host is no longer
677 * capable of accepting new commands. These are set in scsi_queue.c
678 * for both the queue full condition on a device, and for a
679 * host full condition on the host.
680 *
681 * XXX(hch): What about locking?
682 */
683 shost->host_blocked = 0;
684 sdev->device_blocked = 0;
685
686 /*
687 * If we have valid sense information, then some kind of recovery
688 * must have taken place. Make a note of this.
689 */
690 if (SCSI_SENSE_VALID(cmd))
691 cmd->result |= (DRIVER_SENSE << 24);
692
693 SCSI_LOG_MLCOMPLETE(4, sdev_printk(KERN_INFO, sdev,
694 "Notifying upper driver of completion "
695 "(result %x)\n", cmd->result));
696
697 cmd->done(cmd);
698 }

In other words, cmd->done gets called, and so the real worker behind the scenes, scsi_blk_pc_done, gets called. That is because back in scsi_setup_blk_pc_cmnd() there was this line:

1135 cmd->done = scsi_blk_pc_done;

And scsi_blk_pc_done comes from drivers/scsi/scsi_lib.c:

1078 static void scsi_blk_pc_done(struct scsi_cmnd *cmd)
1079 {
1080 BUG_ON(!blk_pc_request(cmd->request));
1081 /*
1082 * This will complete the whole command with uptodate=1 so
1083 * as far as the block layer is concerned the command completed
1084 * successfully. Since this is a REQ_BLOCK_PC command the
1085 * caller should check the request's errors value
1086 */
1087 scsi_io_completion(cmd, cmd->request_bufflen);
1088 }

scsi_io_completion() also comes from drivers/scsi/scsi_lib.c:

789 /*
790 * Function: scsi_io_completion()
791 *
792 * Purpose: Completion processing for block device I/O requests.
793 *
794 * Arguments: cmd - command that is finished.
795 *
796 * Lock status: Assumed that no lock is held upon entry.
797 *
798 * Returns: Nothing
799 *
800 * Notes: This function is matched in terms of capabilities to
801 * the function that created the scatter-gather list.
802 * In other words, if there are no bounce buffers
803 * (the normal case for most drivers), we don't need
804 * the logic to deal with cleaning up afterwards.
805 *
806 * We must do one of several things here:
807 *
808 * a) Call scsi_end_request. This will finish off the
809 * specified number of sectors. If we are done, the
810 * command block will be released, and the queue
811 * function will be goosed. If we are not done, then
812 * scsi_end_request will directly goose the queue.
813 *
814 * b) We can just use scsi_requeue_command() here. This would
815 * be used if we just wanted to retry, for example.
816 */
817 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
818 {
819 int result = cmd->result;
820 int this_count = cmd->request_bufflen;
821 request_queue_t *q = cmd->device->request_queue;
822 struct request *req = cmd->request;
823 int clear_errors = 1;
824 struct scsi_sense_hdr sshdr;
825 int sense_valid = 0;
826 int sense_deferred = 0;
827
828 scsi_release_buffers(cmd);
829
830 if (result) {
831 sense_valid = scsi_command_normalize_sense(cmd, &sshdr);
832 if (sense_valid)
833 sense_deferred = scsi_sense_is_deferred(&sshdr);
834 }
835
836 if (blk_pc_request(req)) { /* SG_IO ioctl from block level */
837 req->errors = result;
838 if (result) {
839 clear_errors = 0;
840 if (sense_valid && req->sense) {
841 /*
842 * SG_IO wants current and deferred errors
843 */
844 int len = 8 + cmd->sense_buffer[7];
845
846 if (len > SCSI_SENSE_BUFFERSIZE)
847 len = SCSI_SENSE_BUFFERSIZE;
848 memcpy(req->sense, cmd->sense_buffer, len);
849 req->sense_len = len;
850 }
851 }
852 req->data_len = cmd->resid;
853 }
854
855 /*
856 * Next deal with any sectors which we were able to correctly
857 * handle.
858 */
859 SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "
860 "%d bytes done.\n",
861 req->nr_sectors, good_bytes));
862 SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));
863
864 if (clear_errors)
865 req->errors = 0;
866
867 /* A number of bytes were successfully read. If there
868 * are leftovers and there is some kind of error
869 * (result != 0), retry the rest.
870 */
871 if (scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)
872 return;
873
874 /* good_bytes = 0, or (inclusive) there were leftovers and
875 * result = 0, so scsi_end_request couldn't retry.
876 */
877 if (sense_valid && !sense_deferred) {
878 switch (sshdr.sense_key) {
879 case UNIT_ATTENTION:
880 if (cmd->device->removable) {
881 /* Detected disc change. Set a bit
882 * and quietly refuse further access.
883 */
884 cmd->device->changed = 1;
885 scsi_end_request(cmd, 0, this_count, 1);
886 return;
887 } else {
888 /* Must have been a power glitch, or a
889 * bus reset. Could not have been a
890 * media change, so we just retry the
891 * request and see what happens.
892 */
893 scsi_requeue_command(q, cmd);
894 return;
895 }
896 break;
897 case ILLEGAL_REQUEST:
898 /* If we had an ILLEGAL REQUEST returned, then
899 * we may have performed an unsupported
900 * command. The only thing this should be
901 * would be a ten byte read where only a six
902 * byte read was supported. Also, on a system
903 * where READ CAPACITY failed, we may have
904 * read past the end of the disk.
905 */
906 if ((cmd->device->use_10_for_rw &&
907 sshdr.asc == 0x20 && sshdr.ascq == 0x00) &&
908 (cmd->cmnd[0] == READ_10 ||
909 cmd->cmnd[0] == WRITE_10)) {
910 cmd->device->use_10_for_rw = 0;
911 /* This will cause a retry with a
912 * 6-byte command.
913 */
914 scsi_requeue_command(q, cmd);
915 return;
916 } else {
917 scsi_end_request(cmd, 0, this_count, 1);
918 return;
919 }
920 break;
921 case NOT_READY:
922 /* If the device is in the process of becoming
923 * ready, or has a temporary blockage, retry.
924 */
925 if (sshdr.asc == 0x04) {
926 switch (sshdr.ascq) {
927 case 0x01: /* becoming ready */
928 case 0x04: /* format in progress */
929 case 0x05: /* rebuild in progress */
930 case 0x06: /* recalculation in progress */
931 case 0x07: /* operation in progress */
932 case 0x08: /* Long write in progress */
933 case 0x09: /* self test in progress */
934 scsi_requeue_command(q, cmd);
935 return;
936 default:
937 break;
938 }
939 }
940 if (!(req->cmd_flags & REQ_QUIET)) {
941 scmd_printk(KERN_INFO, cmd,
942 "Device not ready: ");
943 scsi_print_sense_hdr("", &sshdr);
944 }
945 scsi_end_request(cmd, 0, this_count, 1);
946 return;
947 case VOLUME_OVERFLOW:
948 if (!(req->cmd_flags & REQ_QUIET)) {
949 scmd_printk(KERN_INFO, cmd,
950 "Volume overflow, CDB: ");
951 __scsi_print_command(cmd->cmnd);
952 scsi_print_sense("", cmd);
953 }
954 /* See SSC3rXX or current. */
955 scsi_end_request(cmd, 0, this_count, 1);
956 return;
957 default:
958 break;
959 }
960 }
961 if (host_byte(result) == DID_RESET) {
962 /* Third party bus reset or reset for error recovery
963 * reasons. Just retry the request and see what
964 * happens.
965 */
966 scsi_requeue_command(q, cmd);
967 return;
968 }
969 if (result) {
970 if (!(req->cmd_flags & REQ_QUIET)) {
971 scsi_print_result(cmd);
972 if (driver_byte(result) & DRIVER_SENSE)
973 scsi_print_sense("", cmd);
974 }
975 }
976 scsi_end_request(cmd, 0, this_count, !result);
977 }

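Yet another outrageous function. But I do not want to say any more about it. Jump straight to the last line, scsi_end_request() (the same function is also called at line 871 for the bytes that completed successfully). It comes from drivers/scsi/scsi_lib.c: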

632 /*
633 * Function: scsi_end_request()
634 *
635 * Purpose: Post-processing of completed commands (usually invoked at end
636 * of upper level post-processing and scsi_io_completion).
637 *
638 * Arguments: cmd - command that is complete.
639 * uptodate - 1 if I/O indicates success, <= 0 for I/O error.
640 * bytes - number of bytes of completed I/O
641 * requeue - indicates whether we should requeue leftovers.
642 *
643 * Lock status: Assumed that lock is not held upon entry.
644 *
645 * Returns: cmd if requeue required, NULL otherwise.
646 *
647 * Notes: This is called for block device requests in order to
648 * mark some number of sectors as complete.
649 *
650 * We are guaranteeing that the request queue will be goosed
651 * at some point during this call.
652 * Notes: If cmd was requeued, upon return it will be a stale pointer.
653 */
654 static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,
655 int bytes, int requeue)
656 {
657 request_queue_t *q = cmd->device->request_queue;
658 struct request *req = cmd->request;
659 unsigned long flags;
660
661 /*
662 * If there are blocks left over at the end, set up the command
663 * to queue the remainder of them.
664 */
665 if (end_that_request_chunk(req, uptodate, bytes)) {
666 int leftover = (req->hard_nr_sectors << 9);
667
668 if (blk_pc_request(req))
669 leftover = req->data_len;
670
671 /* kill remainder if no retrys */
672 if (!uptodate && blk_noretry_request(req))
673 end_that_request_chunk(req, 0, leftover);
674 else {
675 if (requeue) {
676 /*
677 * Bleah. Leftovers again. Stick the
678 * leftovers in the front of the
679 * queue, and goose the queue again.
680 */
681 scsi_requeue_command(q, cmd);
682 cmd = NULL;
683 }
684 return cmd;
685 }
686 }
687
688 add_disk_randomness(req->rq_disk);
689
690 spin_lock_irqsave(q->queue_lock, flags);
691 if (blk_rq_tagged(req))
692 blk_queue_end_tag(q, req);
693 end_that_request_last(req, uptodate);
694 spin_unlock_irqrestore(q->queue_lock, flags);
695
696 /*
697 * This will goose the queue request function at the end, so we don't
698 * need to worry about launching another command.
699 */
700 scsi_next_command(cmd);
701 return NULL;
702 }

What we care about most is end_that_request_last at line 693.

3618 /*
3619 * queue lock must be held
3620 */
3621 void end_that_request_last(struct request *req, int uptodate)
3622 {
3623 struct gendisk *disk = req->rq_disk;
3624 int error;
3625
3626 /*
3627 * extend uptodate bool to allow < 0 value to be direct io error
3628 */
3629 error = 0;
3630 if (end_io_error(uptodate))
3631 error = !uptodate ? -EIO : uptodate;
3632
3633 if (unlikely(laptop_mode) && blk_fs_request(req))
3634 laptop_io_completion();
3635
3636 /*
3637 * Account IO completion. bar_rq isn't accounted as a normal
3638 * IO on queueing nor completion. Accounting the containing
3639 * request is enough.
3640 */
3641 if (disk && blk_fs_request(req) && req != &req->q->bar_rq) {
3642 unsigned long duration = jiffies - req->start_time;
3643 const int rw = rq_data_dir(req);
3644
3645 __disk_stat_inc(disk, ios[rw]);
3646 __disk_stat_add(disk, ticks[rw], duration);
3647 disk_round_stats(disk);
3648 disk->in_flight--;
3649 }
3650 if (req->end_io)
3651 req->end_io(req, error);
3652 else
3653 __blk_put_request(req->q, req);
3654 }
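
Before moving on, the error computation at lines 3629 to 3631 deserves a quick check, since its result is what the end_io callback receives. The sketch below assumes, from my recollection of the headers of that era and not from anything shown in this article, that end_io_error(uptodate) simply tests whether uptodate is less than or equal to zero. Both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: end_io_error() is true when uptodate is zero or negative. */
static int model_end_io_error(int uptodate)
{
    return uptodate <= 0;
}

/*
 * Mirrors lines 3629-3631: uptodate > 0 means success, 0 means a generic
 * I/O error, and a negative value is passed through as the error code.
 */
static int model_error_from_uptodate(int uptodate)
{
    int error = 0;

    if (model_end_io_error(uptodate))
        error = !uptodate ? -EIO : uptodate;
    return error;
}
```

So a caller that passes uptodate = 1 hands 0 to end_io, uptodate = 0 becomes -EIO, and a specific negative errno survives unchanged.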

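Right, the end_io at line 3651 is the most important piece of code. You may have long forgotten that we have met end_io before, but never mind, I am here. In blk_execute_rq_nowait() there was this line: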

2596 rq->end_io = done;

And done is the fifth parameter of that function. When we called it, in blk_execute_rq, we wrote:

2636 blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);

In other words, rq->end_io was set to blk_end_sync_rq.

2786 /**
2787 * blk_end_sync_rq - executes a completion event on a request
2788 * @rq: request to complete
2789 * @error: end io status of the request
2790 */
2791 void blk_end_sync_rq(struct request *rq, int error)
2792 {
2793 struct completion *waiting = rq->end_io_data;
2794
2795 rq->end_io_data = NULL;
2796 __blk_put_request(rq->q, rq);
2797
2798 /*
2799 * complete last, if this is a stack request the process (and thus
2800 * the rq pointer) could be invalid right after this complete()
2801 */
2802 complete(waiting);
2803 }

At last we have found our dear, adorable, beloved, deeply loved, most loved complete(). But how can we be sure that this waiting is that wait? Compare them. Back in blk_execute_rq we had:

2635 rq->end_io_data = &wait;

And right here we have:

2793 struct completion *waiting = rq->end_io_data;

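So we know we have not picked the wrong target. After all, we know full well that you may kiss the wrong person, but you may not lose your temper at the wrong person, and still less write code against the wrong one.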
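
The wiring traced above, where the caller stores a pointer to its completion in rq->end_io_data and a callback in rq->end_io, and the completion path calls back through them, can be reduced to a few lines. This is a single-threaded sketch with hypothetical names throughout (model_completion, model_rq, model_execute_rq and friends). Nothing actually sleeps here: where the kernel caller would block in wait_for_completion() until the softirq path fires, the sketch invokes the completion path inline and then inspects a counter.

```c
#include <assert.h>
#include <stddef.h>

struct model_completion {
    int done;                     /* how many times complete() was called */
};

struct model_rq {
    void (*end_io)(struct model_rq *rq, int error);
    void *end_io_data;
};

static void model_complete(struct model_completion *c)
{
    c->done++;
}

/* Plays the role of blk_end_sync_rq(): wake whoever is waiting. */
static void model_end_sync_rq(struct model_rq *rq, int error)
{
    struct model_completion *waiting = rq->end_io_data;

    (void)error;                  /* unused, as in the listing above */
    rq->end_io_data = NULL;
    model_complete(waiting);      /* complete last */
}

/* Plays the role of the tail of end_that_request_last(). */
static void model_end_request_last(struct model_rq *rq, int error)
{
    if (rq->end_io)
        rq->end_io(rq, error);
}

/* Plays the role of blk_execute_rq(); returns 1 if the waiter was woken. */
static int model_execute_rq(struct model_rq *rq)
{
    struct model_completion wait = { 0 };

    rq->end_io_data = &wait;          /* cf. line 2635 */
    rq->end_io = model_end_sync_rq;   /* cf. lines 2596 and 2636 */

    /* In the kernel this arrives later, from softirq context. */
    model_end_request_last(rq, 0);

    return wait.done;
}
```

Because wait lives on the caller's stack, the callback must finish touching the request before it signals the completion, which is the same ordering the comment at lines 2798 to 2800 insists on.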
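At this point blk_execute_rq is woken up and returns promptly. Close behind come the return of scsi_execute and the return of scsi_execute_req. In this moment a SCSI command has finally gone from nothing to something and back to nothing. It went through the metamorphosis from SCSI command to request, and through the trial from request back to SCSI command. In the end it fulfilled its mission. For it, life is an illusion, and parting or death is the only ending.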