
Kernel Watchdog on Qualcomm-Platform Android Systems

2017-08-05 17:25

1 Watchdog Concepts

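A watchdog is mainly used in embedded systems. When the system hits a severe fault that it cannot recover from (a kernel deadlock, an infinite loop, a runaway CPU, and so on), the watchdog restarts the system automatically, without human intervention.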

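In a traditional Linux kernel, the watchdog basically works as follows: once the watchdog is started (that is, once the /dev/watchdog device has been opened), if nothing is written to /dev/watchdog within a configured interval, the hardware watchdog circuit or the software timer restarts the system.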
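As a rough illustration of the traditional interface described above, the sketch below pets a watchdog through a file descriptor. It relies only on the documented Linux watchdog behavior that any write to /dev/watchdog counts as a keepalive, and that drivers supporting "magic close" stop the timer when the character 'V' is written just before the file is closed. The helper names wd_pet, wd_release and wd_run are made up for this example, and the Qualcomm driver analyzed later does not expose this node at all.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: one keepalive. Any write to the device resets the timer. */
int wd_pet(int fd)
{
    assert(fd >= 0);
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Hypothetical helper: "magic close", write 'V' and then close the device. */
int wd_release(int fd)
{
    int ok = write(fd, "V", 1) == 1;

    return (close(fd) == 0 && ok) ? 0 : -1;
}

/* Hypothetical helper: open the device, pet it `count` times, release it. */
int wd_run(const char *path, int count, unsigned int interval_s)
{
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -1;
    for (int i = 0; i < count; i++) {
        if (wd_pet(fd) != 0) {
            close(fd);  /* no 'V' written: the watchdog stays armed */
            return -1;
        }
        sleep(interval_s);
    }
    return wd_release(fd);
}
```

On a system that does provide the node, something like wd_run("/dev/watchdog", 6, 5) would keep the system alive for about 30 seconds, provided the configured timeout is longer than 5 seconds.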

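Depending on how it is implemented, a watchdog is either a hardware watchdog or a software watchdog. A hardware watchdog needs supporting circuitry, and the /dev/watchdog device node corresponds to a real physical device. A software watchdog is implemented with a kernel timer, and /dev/watchdog does not correspond to a real physical device.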

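A hardware watchdog is more reliable than a software watchdog. The main advantage of a software watchdog is its low cost, so it is widely used in ordinary consumer products whose reliability requirements are modest. The advantage of a hardware watchdog is its reliability, so it is widely used in industrial products with strict reliability requirements.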

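On Qualcomm-platform Android systems, however, the watchdog is implemented differently. We analyze it below; for now it is enough to know that it does not provide /dev/watchdog.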

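Of course, triggering the watchdog and restarting the system after an unrecoverable fault is only a remedy. It works, but it is crude and makes for a poor user experience. The best fix is to keep the problem from happening at all, so we need to study and analyze the watchdog in order to prevent such faults as far as possible.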

Note that Android has another watchdog, also implemented in software, which checks whether the services in SystemServer are running normally. Do not confuse the two.

Unless stated otherwise, every later mention of the watchdog in this article refers to the kernel watchdog on Qualcomm-platform Android systems.

2 Watchdog Implementation

2.0 Watchdog definition in the Device Tree

    wdog: qcom,wdt@17817000 {
        compatible = "qcom,msm-watchdog";
        reg = <0x17817000 0x1000>; // no documentation found for these registers
        reg-names = "wdt-base";
        interrupts = <0 3 0>, <0 4 0>; // bark and bite interrupts; the current implementation bites as soon as it barks, so only the bark interrupt is used
        qcom,bark-time = <11000>; // in ms: if the dog is not petted for more than 11 s, it barks and bites, and the system restarts
        qcom,pet-time = <10000>; // in ms: pet the dog every 10 s
        qcom,ipi-ping; // when petting, ping the other CPUs to make sure every CPU is healthy
        qcom,wakeup-enable; // the watchdog can wake the system; without this, the watchdog must be disabled on suspend and re-enabled on resume
        qcom,scandump-size = <0x40000>; // ramdump related
    };
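The two timing properties only make sense together: the pet period has to be shorter than the bark timeout, and the difference is the slack the pet thread has before the dog barks. The sketch below checks that relationship for the values in the node above. The struct and function names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the two timing properties, both in milliseconds. */
struct wdog_cfg {
    unsigned int pet_time_ms;   /* qcom,pet-time  */
    unsigned int bark_time_ms;  /* qcom,bark-time */
};

/* The pet period must be non-zero and strictly shorter than the bark timeout. */
bool wdog_cfg_valid(const struct wdog_cfg *cfg)
{
    assert(cfg);
    return cfg->pet_time_ms > 0 && cfg->pet_time_ms < cfg->bark_time_ms;
}

/* Slack left for the pet thread, or 0 if the configuration is invalid. */
unsigned int wdog_cfg_margin_ms(const struct wdog_cfg *cfg)
{
    return wdog_cfg_valid(cfg) ? cfg->bark_time_ms - cfg->pet_time_ms : 0;
}
```

With pet-time = 10000 and bark-time = 11000 the margin is 1000 ms, so the pet thread can be delayed by at most about one second before the bark fires.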

2.1 Core data structure: struct msm_watchdog_data

The watchdog is implemented in the source file drivers/soc/qcom/watchdog_v2.c.
    struct msm_watchdog_data {
        unsigned int __iomem phys_base; // corresponds to reg in the device tree
        size_t size;
        void __iomem *base; // phys_base mapped into the virtual address space
        void __iomem *wdog_absent_base;
        struct device *dev; // points to the watchdog device
        unsigned int pet_time; // corresponds to qcom,pet-time in the device tree
        unsigned int bark_time; // corresponds to qcom,bark-time in the device tree
        unsigned int bark_irq; // bark interrupt
        unsigned int bite_irq; // bite interrupt
        bool do_ipi_ping; // corresponds to qcom,ipi-ping in the device tree
        bool wakeup_irq_enable; // corresponds to qcom,wakeup-enable in the device tree
        unsigned long long last_pet; // time of the last pet
        unsigned min_slack_ticks;
        unsigned long long min_slack_ns;
        void *scm_regsave;
        cpumask_t alive_mask;
        struct mutex disable_lock;
        bool irq_ppi;
        struct msm_watchdog_data __percpu **wdog_cpu_dd; // only used when irq_ppi is true
        struct notifier_block panic_blk; // registered on the panic_notifier_list notifier chain; called back when the kernel panics
        bool enabled; // whether the watchdog is enabled
        bool user_pet_enabled; // whether the watchdog is exposed to user space; we do not define qcom,userspace-watchdog, so it is not exposed and we ignore it here
        struct task_struct *watchdog_task; // the watchdog kernel thread, named msm_watchdog
        struct timer_list pet_timer; // pet timer
        wait_queue_head_t pet_complete; // kernel wait queue used for petting
        bool timer_expired; // whether the pet timer has expired; set to true when the timer fires, set back to false after the pet wait queue has been woken
        bool user_pet_complete;
        unsigned int scandump_size;
    };

2.2 Watchdog initialization

The functions below are slightly abridged.
    static int msm_watchdog_probe(struct platform_device *pdev)
    {
        int ret;
        struct msm_watchdog_data *wdog_dd;

        if (!pdev->dev.of_node || !enable)
            return -ENODEV;
        wdog_dd = kzalloc(sizeof(struct msm_watchdog_data), GFP_KERNEL); // allocate the struct msm_watchdog_data
        if (!wdog_dd)
            return -EIO;
        ret = msm_wdog_dt_to_pdata(pdev, wdog_dd); // parse the device tree and fill in the matching struct msm_watchdog_data members
        if (ret)
            goto err;
        wdog_data = wdog_dd; // save the allocated struct msm_watchdog_data in a global variable
        wdog_dd->dev = &pdev->dev;
        platform_set_drvdata(pdev, wdog_dd);
        cpumask_clear(&wdog_dd->alive_mask);
        // create the kernel thread named msm_watchdog; its entry point is watchdog_kthread and wdog_dd is its argument
        // kthread_create only creates the thread, it does not start it
        wdog_dd->watchdog_task = kthread_create(watchdog_kthread, wdog_dd,
                "msm_watchdog");
        if (IS_ERR(wdog_dd->watchdog_task)) {
            ret = PTR_ERR(wdog_dd->watchdog_task);
            goto err;
        }
        init_watchdog_data(wdog_dd); // finish filling in struct msm_watchdog_data and complete the initialization
        return 0;
    err:
        kzfree(wdog_dd);
        return ret;
    }

    static void init_watchdog_data(struct msm_watchdog_data *wdog_dd)
    {
        unsigned long delay_time;
        uint32_t val;
        u64 timeout;
        int ret;

        {
            ret = devm_request_irq(wdog_dd->dev, wdog_dd->bark_irq,
                    wdog_bark_handler, IRQF_TRIGGER_RISING,
                    "apps_wdog_bark", wdog_dd); // request the bark interrupt
            if (ret) {
                dev_err(wdog_dd->dev, "failed to request bark irq\n");
                return;
            }
        }
        delay_time = msecs_to_jiffies(wdog_dd->pet_time); // pet delay
        wdog_dd->min_slack_ticks = UINT_MAX;
        wdog_dd->min_slack_ns = ULLONG_MAX;
        configure_bark_dump(wdog_dd);
        timeout = (wdog_dd->bark_time * WDT_HZ)/1000; // 11 s * WDT_HZ
        __raw_writel(timeout, wdog_dd->base + WDT0_BARK_TIME); // set the bark time: 11 s
        __raw_writel(timeout + 3*WDT_HZ, wdog_dd->base + WDT0_BITE_TIME); // set the bite time: 14 s
        wdog_dd->panic_blk.notifier_call = panic_wdog_handler; // watchdog callback for a kernel panic
        atomic_notifier_chain_register(&panic_notifier_list,
                &wdog_dd->panic_blk); // register the callback; the panic_notifier_list chain is called from panic()
        mutex_init(&wdog_dd->disable_lock);
        init_waitqueue_head(&wdog_dd->pet_complete); // initialize the pet wait queue
        wdog_dd->timer_expired = false;
        wdog_dd->user_pet_complete = true;
        wdog_dd->user_pet_enabled = false;
        wake_up_process(wdog_dd->watchdog_task); // wake the msm_watchdog kernel thread
        // initialize the pet timer; deferrable means the timer is not very time-sensitive,
        // so the kernel may batch nearby timers and run them together to reduce wakeups
        init_timer_deferrable(&wdog_dd->pet_timer);
        wdog_dd->pet_timer.data = (unsigned long)wdog_dd; // argument of the timer function
        wdog_dd->pet_timer.function = pet_task_wakeup; // timer function
        wdog_dd->pet_timer.expires = jiffies + delay_time; // expiry time of the pet timer
        add_timer(&wdog_dd->pet_timer); // register the pet timer
        val = BIT(EN);
        if (wdog_dd->wakeup_irq_enable) // let the watchdog wake the CPU
            val |= BIT(UNMASKED_INT_EN);
        __raw_writel(val, wdog_dd->base + WDT0_EN);
        __raw_writel(1, wdog_dd->base + WDT0_RST);
        wdog_dd->last_pet = sched_clock(); // initialize the last-pet time; updated on every pet
        wdog_dd->enabled = true; // mark the watchdog as enabled
        init_watchdog_sysfs(wdog_dd); // create the sysfs nodes
        dev_info(wdog_dd->dev, "MSM Watchdog Initialized\n");
        return;
    }
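To make the register arithmetic in init_watchdog_data concrete, the sketch below reproduces the conversion from milliseconds to watchdog ticks in user space. WDT_HZ is not shown in the listing above; the value 32765 used here is an assumption, so check the definition in your own kernel tree. The function names are hypothetical. Unlike the listing, the sketch widens the product to 64 bits before dividing, which is only a precaution for very large timeouts.

```c
#include <assert.h>
#include <stdint.h>

#define WDT_HZ 32765u /* assumption: not shown in the listing above */

/* Bark timeout in watchdog ticks: (bark_time_ms * WDT_HZ) / 1000. */
uint32_t wdt_bark_ticks(uint32_t bark_time_ms)
{
    assert(bark_time_ms > 0);
    /* widen to 64 bits so the product cannot overflow */
    return (uint32_t)(((uint64_t)bark_time_ms * WDT_HZ) / 1000u);
}

/* Bite timeout: three seconds' worth of ticks after the bark. */
uint32_t wdt_bite_ticks(uint32_t bark_time_ms)
{
    return wdt_bark_ticks(bark_time_ms) + 3u * WDT_HZ;
}
```

Under that assumption, bark-time = 11000 gives 360415 ticks for the bark register and 458710 ticks for the bite register, which correspond to 11 s and 14 s.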

2.3 Watchdog workflow

Each time the pet timer expires, the timer function pet_task_wakeup runs. It sets timer_expired to true and wakes the pet_complete wait queue.
    static void pet_task_wakeup(unsigned long data)
    {
        struct msm_watchdog_data *wdog_dd =
            (struct msm_watchdog_data *)data;

        wdog_dd->timer_expired = true;
        wake_up(&wdog_dd->pet_complete);
    }

The msm_watchdog thread
    static __ref int watchdog_kthread(void *arg)
    {
        struct msm_watchdog_data *wdog_dd =
            (struct msm_watchdog_data *)arg;
        unsigned long delay_time = 0;
        struct sched_param param = {.sched_priority = MAX_RT_PRIO-1};

        // make msm_watchdog a real-time thread with the first-in, first-out scheduling policy
        sched_setscheduler(current, SCHED_FIFO, &param);
        while (!kthread_should_stop()) { // check whether the thread should stop
            // If timer_expired is true, continue; if it is false, sleep on the pet_complete
            // wait queue. After wake_up is called elsewhere, timer_expired is checked again
            // and the thread either goes back to sleep or continues.
            while (wait_event_interruptible(
                wdog_dd->pet_complete,
                wdog_dd->timer_expired) != 0)
                ;
            if (wdog_dd->do_ipi_ping)
                ping_other_cpus(wdog_dd); // ping the other CPUs to make sure they are all alive
            // user_pet_complete is true because user-space petting is disabled, so this never blocks
            while (wait_event_interruptible(
                wdog_dd->pet_complete,
                wdog_dd->user_pet_complete) != 0)
                ;
            wdog_dd->timer_expired = false; // set timer_expired back to false
            wdog_dd->user_pet_complete = !wdog_dd->user_pet_enabled;
            if (enable) {
                delay_time = msecs_to_jiffies(wdog_dd->pet_time);
                pet_watchdog(wdog_dd); // pet the dog
            }
            /* Check again before scheduling.
             * Could have been changed on other cpu */
            mod_timer(&wdog_dd->pet_timer, jiffies + delay_time); // set the new expiry time and re-arm the timer
        }
        return 0;
    }
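The interplay between the pet timer, the msm_watchdog thread and the hardware counter can be pictured with a small user-space model. This is only a toy simulation under simplifying assumptions: time advances in 1 ms steps, the hardware counter is modelled as the time since the last pet, and a "stall" means the pet thread stops being scheduled from a given moment on. The function name wdog_simulate is made up for the example.

```c
#include <assert.h>

/*
 * Toy model of the pet loop. Returns the time in ms at which the watchdog
 * barks, or -1 if it never barks within run_ms. The pet thread stops
 * running at stall_ms; pass a negative stall_ms for "never stalls".
 */
long wdog_simulate(long pet_ms, long bark_ms, long stall_ms, long run_ms)
{
    long last_pet = 0;      /* the counter was reset at t = 0 during init */
    long next_pet = pet_ms; /* the pet timer was armed during init */

    assert(pet_ms > 0 && bark_ms > 0);
    for (long now = 1; now <= run_ms; now++) {
        int stalled = stall_ms >= 0 && now >= stall_ms;

        if (!stalled && now >= next_pet) {
            last_pet = now;          /* pet_watchdog(): counter reset */
            next_pet = now + pet_ms; /* mod_timer(): timer re-armed */
        }
        if (now - last_pet >= bark_ms)
            return now;              /* bark interrupt would fire here */
    }
    return -1;
}
```

With the device tree values (10000 ms pet, 11000 ms bark) a healthy system never barks. If the thread stops running at 25 s, the last pet was at 20 s and the model reports a bark at 31 s.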

Petting the dog
    static void pet_watchdog(struct msm_watchdog_data *wdog_dd)
    {
        int slack, i, count, prev_count = 0;
        unsigned long long time_ns;
        unsigned long long slack_ns;
        unsigned long long bark_time_ns = wdog_dd->bark_time * 1000000ULL;

        // read the watchdog status register until two consecutive reads agree
        for (i = 0; i < 2; i++) {
            count = (__raw_readl(wdog_dd->base + WDT0_STS) >> 1) & 0xFFFFF;
            if (count != prev_count) {
                prev_count = count;
                i = 0;
            }
        }
        slack = ((wdog_dd->bark_time * WDT_HZ) / 1000) - count;
        if (slack < wdog_dd->min_slack_ticks)
            wdog_dd->min_slack_ticks = slack;
        __raw_writel(1, wdog_dd->base + WDT0_RST); // reset the watchdog, that is, pet the dog
        time_ns = sched_clock();
        slack_ns = (wdog_dd->last_pet + bark_time_ns) - time_ns;
        if (slack_ns < wdog_dd->min_slack_ns)
            wdog_dd->min_slack_ns = slack_ns;
        wdog_dd->last_pet = time_ns;
    }
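The loop at the top of pet_watchdog keeps reading the status register until two consecutive reads return the same count, which guards against sampling the counter while it is changing. The sketch below shows the same idea in user space. The register read is replaced by a hypothetical callback, and fake_read_sts is a test double that replays a fixed sequence; neither exists in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __raw_readl(base + WDT0_STS). */
typedef uint32_t (*read_sts_fn)(void *ctx);

/* Extract the counter field the same way the listing does: (sts >> 1) & 0xFFFFF. */
uint32_t wdt_sts_to_count(uint32_t sts)
{
    return (sts >> 1) & 0xFFFFFu;
}

/* Keep reading until two consecutive reads return the same count. */
uint32_t wdt_read_stable_count(read_sts_fn read_sts, void *ctx)
{
    uint32_t prev;

    assert(read_sts);
    prev = wdt_sts_to_count(read_sts(ctx));
    for (;;) {
        uint32_t cur = wdt_sts_to_count(read_sts(ctx));

        if (cur == prev)
            return cur;
        prev = cur;
    }
}

/* Test double: replays a fixed sequence of raw status values. */
struct fake_sts {
    const uint32_t *vals;
    unsigned int n, pos;
};

uint32_t fake_read_sts(void *ctx)
{
    struct fake_sts *f = ctx;
    uint32_t v = f->vals[f->pos];

    if (f->pos + 1 < f->n)
        f->pos++;   /* stay on the last value once the sequence ends */
    return v;
}
```

Feeding the fake reader the raw values 20, 22, 22 (counts 10, 11, 11) makes wdt_read_stable_count return 11 after three reads.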

If the dog is not petted in time, the bark interrupt fires and its interrupt handler runs.
    static irqreturn_t wdog_bark_handler(int irq, void *dev_id)
    {
        struct msm_watchdog_data *wdog_dd = (struct msm_watchdog_data *)dev_id;
        unsigned long nanosec_rem;
        unsigned long long t = sched_clock();

        nanosec_rem = do_div(t, 1000000000);
        printk(KERN_INFO "Watchdog bark! Now = %lu.%06lu\n", (unsigned long) t,
            nanosec_rem / 1000); // print the time of the bark
        nanosec_rem = do_div(wdog_dd->last_pet, 1000000000);
        printk(KERN_INFO "Watchdog last pet at %lu.%06lu\n", (unsigned long)
            wdog_dd->last_pet, nanosec_rem / 1000); // print the time of the last pet
        if (wdog_dd->do_ipi_ping)
            dump_cpu_alive_mask(wdog_dd);
        msm_trigger_wdog_bite(); // trigger the bite
        panic("Failed to cause a watchdog bite! - Falling back to kernel panic!");
        return IRQ_HANDLED;
    }
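The two printk calls in the handler split a nanosecond timestamp into whole seconds and microseconds. The kernel's do_div(n, base) divides n in place and returns the remainder, so after the second call wdog_dd->last_pet holds whole seconds rather than nanoseconds; that is harmless here because the system is about to reset. A user-space sketch of the same split, which leaves its input untouched, might look like this (the struct and function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical result type: the two numbers printed as "%lu.%06lu". */
struct ts_parts {
    unsigned long sec;
    unsigned long usec;
};

/* Split a sched_clock()-style nanosecond value into seconds and microseconds. */
struct ts_parts split_ns(uint64_t ns)
{
    struct ts_parts p;

    p.sec = (unsigned long)(ns / NSEC_PER_SEC);
    p.usec = (unsigned long)((ns % NSEC_PER_SEC) / 1000u);
    return p;
}
```

For example, 12345678901 ns splits into 12 s and 345678 us, which the handler's format string would print as 12.345678.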

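When the dog barks, the system is already in an abnormal state and cannot go through the normal shutdown path, so the driver writes the watchdog registers and lets the hardware restart the phone.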
    void msm_trigger_wdog_bite(void)
    {
        if (!wdog_data)
            return;
        pr_info("Causing a watchdog bite!");
        __raw_writel(1, wdog_data->base + WDT0_BITE_TIME); // bite after one clock tick; handled by hardware
        mb();
        __raw_writel(1, wdog_data->base + WDT0_RST); // reset the watchdog counter
        mb();
        /* Delay to make sure bite occurs */
        mdelay(10000); // wait for the bite to restart the phone
        // the restart failed: print some watchdog registers
        pr_err("Wdog - STS: 0x%x, CTL: 0x%x, BARK TIME: 0x%x, BITE TIME: 0x%x",
            __raw_readl(wdog_data->base + WDT0_STS),
            __raw_readl(wdog_data->base + WDT0_EN),
            __raw_readl(wdog_data->base + WDT0_BARK_TIME),
            __raw_readl(wdog_data->base + WDT0_BITE_TIME));
    }

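panic_wdog_handler is not really part of the watchdog flow. After a kernel panic, if the normal shutdown or restart goes wrong, it relies on the watchdog to get the phone restarted.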
    static int panic_wdog_handler(struct notifier_block *this,
                    unsigned long event, void *ptr)
    {
        struct msm_watchdog_data *wdog_dd = container_of(this,
                    struct msm_watchdog_data, panic_blk);

        if (panic_timeout == 0) { // panic_timeout is 5 on our system, so the else branch is taken
            __raw_writel(0, wdog_dd->base + WDT0_EN);
            mb();
        } else { // make the watchdog time out after 15 s and restart the system
            __raw_writel(WDT_HZ * (panic_timeout + 10),
                    wdog_dd->base + WDT0_BARK_TIME);
            __raw_writel(WDT_HZ * (panic_timeout + 10),
                    wdog_dd->base + WDT0_BITE_TIME);
            __raw_writel(1, wdog_dd->base + WDT0_RST);
        }
        return NOTIFY_DONE;
    }
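The handler receives only a pointer to the embedded notifier_block and uses container_of to get back to the enclosing msm_watchdog_data. The sketch below shows that pattern in user space with miniature, hypothetical types. The callback recomputes the timeout the way the else branch does; WDT_HZ = 32765 is again an assumption, and panic_timeout_s = 5 is the value the author reports for their system.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define WDT_HZ 32765u                       /* assumption, see lead-in */
static const unsigned int panic_timeout_s = 5;

/* Hypothetical miniature of struct notifier_block. */
struct notifier {
    int (*call)(struct notifier *nb);
};

/* Hypothetical miniature of struct msm_watchdog_data. */
struct wdog {
    unsigned int bark_ticks;
    struct notifier panic_blk;  /* embedded, like panic_blk in the driver */
};

/* Recover the enclosing struct from the embedded member, then reprogram it. */
int wdog_panic_cb(struct notifier *nb)
{
    struct wdog *w = container_of(nb, struct wdog, panic_blk);

    assert(nb);
    w->bark_ticks = WDT_HZ * (panic_timeout_s + 10u);
    return 0;
}
```

Under these assumptions the callback sets bark_ticks to 32765 * 15 = 491475, which corresponds to the 15 second timeout mentioned in the listing.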

2.4 Watchdog workflow diagram
