Kernel detection of D-state (TASK_UNINTERRUPTIBLE) tasks that are not scheduled for 120s
2017-11-20 17:22
When the CONFIG_DETECT_HUNG_TASK option is enabled, the kernel monitors tasks in the D state, that is, TASK_UNINTERRUPTIBLE. If such a task has not been scheduled within 120s (the default timeout), the kernel prints the pid, ppid, stack and related information of the D-state task.

The source code is in kernel/hung_task.c:

    static int __init hung_task_init(void)
    {
        atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
        watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");

        return 0;
    }
    subsys_initcall(hung_task_init);

hung_task_init calls kthread_run directly to run watchdog, which is an infinite loop. The thread is named khungtaskd.

    /*
     * kthread which checks for tasks stuck in D state
     */
    static int watchdog(void *dummy)
    {
        unsigned long hung_last_checked = jiffies;

        set_user_nice(current, 0);

        for ( ; ; ) {
            // The check interval can be set through sysctl_hung_task_timeout_secs
            unsigned long timeout = sysctl_hung_task_timeout_secs;
            long t = hung_timeout_jiffies(hung_last_checked, timeout);

            if (t <= 0) {
                if (!atomic_xchg(&reset_hung_task, 0))
                    // Call check_hung_uninterruptible_tasks to run the check
                    check_hung_uninterruptible_tasks(timeout);
                hung_last_checked = jiffies;
                continue;
            }
            // Reaching this point means the interval has not elapsed yet:
            // sleep for the remaining t jiffies, then loop again
            schedule_timeout_interruptible(t);
        }

        return 0;
    }

Next, look at check_hung_uninterruptible_tasks:

    static void check_hung_uninterruptible_tasks(unsigned long timeout)
    {
        int max_count = sysctl_hung_task_check_count;
        int batch_count = HUNG_TASK_BATCHING;
        struct task_struct *g, *t;

        /*
         * If the system crashed already then all bets are off,
         * do not report extra hung tasks:
         */
        if (test_taint(TAINT_DIE) || did_panic)
            return;

        hung_task_show_lock = false;
        rcu_read_lock();
        // Walk every thread in the system
        for_each_process_thread(g, t) {
            if (!max_count--)
                goto unlock;
            if (!--batch_count) {
                batch_count = HUNG_TASK_BATCHING;
                if (!rcu_lock_break(g, t))
                    goto unlock;
            }
            // Only threads in the D state (TASK_UNINTERRUPTIBLE) are checked.
            // check_hung_task decides whether such a thread has gone
            // unscheduled for the configured time.
            /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
            if (t->state == TASK_UNINTERRUPTIBLE)
                check_hung_task(t, timeout);
        }
    unlock:
        rcu_read_unlock();
        if (hung_task_show_lock)
            debug_show_all_locks();
    }

Next, look at check_hung_task:

    static void check_hung_task(struct task_struct *t, unsigned long timeout)
    {
        unsigned long switch_count = t->nvcsw + t->nivcsw;

        /*
         * Ensure the task is not frozen.
         * Also, skip vfork and any other user process that freezer should skip.
         */
        if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
            return;

        /*
         * When a freshly created task is scheduled once, changes its state to
         * TASK_UNINTERRUPTIBLE without having ever been switched out once, it
         * mustn't be checked.
         */
        if (unlikely(!switch_count))
            return;

        // This comparison is the core of the check. If the task was scheduled
        // since the previous check (within the last 120s), its switch count
        // has changed, the condition holds, and the function returns here.
        if (switch_count != t->last_switch_count) {
            t->last_switch_count = switch_count;
            return;
        }

        trace_sched_process_hang(t);

        // Getting here means the task looks hung. If both
        // sysctl_hung_task_warnings and sysctl_hung_task_panic are 0,
        // no warning is logged.
        if (!sysctl_hung_task_warnings && !sysctl_hung_task_panic)
            return;

        /*
         * Ok, the task did not get scheduled for more than 2 minutes,
         * complain:
         */
        // If sysctl_hung_task_warnings is non-zero, print the warning below
        // and call sched_show_task to show the task's pid, ppid, stack, etc.
        if (sysctl_hung_task_warnings) {
            if (sysctl_hung_task_warnings > 0)
                sysctl_hung_task_warnings--;
            pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
                t->comm, t->pid, timeout);
            pr_err("      %s %s %.*s\n",
                print_tainted(), init_utsname()->release,
                (int)strcspn(init_utsname()->version, " "),
                init_utsname()->version);
            pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
                " disables this message.\n");
            sched_show_task(t);
            hung_task_show_lock = true;
        }

        touch_nmi_watchdog();

        // If sysctl_hung_task_panic is set, dump the backtrace of every CPU
        // through trigger_all_cpu_backtrace, then panic the system
        if (sysctl_hung_task_panic) {
            // If hung_task_show_lock was set above, show all locks
            // currently held in the system
            if (hung_task_show_lock)
                debug_show_all_locks();
            trigger_all_cpu_backtrace();
            panic("hung_task: blocked tasks");
        }
    }
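The switch-count comparison at the heart of check_hung_task can be tried outside the kernel. What follows is a minimal userspace sketch, not kernel code: struct fake_task and looks_hung are hypothetical names introduced only for illustration, and only the switch-count logic is modeled. The freezer flags, the sysctl checks and the reporting are all left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few task_struct fields the check reads. */
struct fake_task {
    unsigned long nvcsw;             /* voluntary context switches */
    unsigned long nivcsw;            /* involuntary context switches */
    unsigned long last_switch_count; /* total recorded by the previous check */
};

/*
 * Mirrors the decision made in check_hung_task(): returns true only when
 * the task has been switched out at least once and its total switch count
 * is the same as the one recorded by the previous check.
 */
static bool looks_hung(struct fake_task *t)
{
    unsigned long switch_count;

    assert(t != NULL);
    switch_count = t->nvcsw + t->nivcsw;

    /* Never switched out yet: a freshly created task, skip it. */
    if (switch_count == 0)
        return false;

    /* The count moved since the last check: record it and move on. */
    if (switch_count != t->last_switch_count) {
        t->last_switch_count = switch_count;
        return false;
    }

    /* Same count as last time: the task was not scheduled in between. */
    return true;
}
```

In this sketch a task is flagged only on the second consecutive call that sees the same count, because the first call merely records it. Calling looks_hung twice on a task with nvcsw = 3 and nivcsw = 2 returns false and then true; incrementing either counter between calls makes it return false again.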