Machine Learning: The EM Algorithm, Its Derivation (the Three-Coin Model), and a Python Implementation
Introduction to the EM Algorithm
The EM algorithm is one of the most widely used iterative algorithms; it consists of two steps, the E step and the M step.
This post mainly follows Dr. Hang Li's Statistical Learning Methods, and on that basis fills in the derivation of the three-coin model from the principles of the EM algorithm.
1. Principles of the EM Algorithm
1.1 Starting from an Example
The Three-Coin Model
Suppose there are three coins, labeled A, B, and C, whose probabilities of landing heads are $\pi$, $p$, and $q$ respectively. Conduct the following coin-tossing experiment:
1. First toss coin A and use its result to select coin B or coin C: heads selects B, tails selects C;
2. Then toss the selected coin, recording heads as 1 and tails as 0;
3. Independently repeat the trial $n$ times (here, $n = 10$), yielding the observations:
1, 1, 0, 1, 0, 0, 1, 0, 1, 1
Suppose we can observe only the final outcome of each trial, not the tossing process. How do we estimate the probability of heads for each of the three coins, i.e., the parameters of the three-coin model?
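Before estimating anything, it helps to see the generative process in code. The sketch below simulates the experiment; the parameter values passed in are purely illustrative (the true ones are exactly what we will be estimating):

```python
import numpy as np

def simulate_three_coins(pi, p, q, n, seed=0):
    """Simulate the experiment: toss A, then toss B or C according to A's result."""
    rng = np.random.default_rng(seed)
    z = rng.random(n) < pi           # hidden coin-A results: True means coin B is used
    heads_prob = np.where(z, p, q)   # heads probability of the coin actually tossed
    y = (rng.random(n) < heads_prob).astype(int)
    return y, z

# e.g. observations generated from illustrative parameters (0.5, 0.6, 0.4):
y, z = simulate_three_coins(pi=0.5, p=0.6, q=0.4, n=10)
```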
For a single observation, the three-coin model can be written as:
$$
\begin{aligned}
p(y_j|\theta) &= \sum_z p(y_j, z|\theta) \\
&= \sum_z p(z|\theta)\, p(y_j|z,\theta) \\
&= \pi p^{y_j}(1-p)^{1-y_j} + (1-\pi) q^{y_j}(1-q)^{1-y_j}
\end{aligned}
$$
where $y_j$ is the $j$-th observation (1 or 0), the random variable $z$ is the latent variable representing the unobserved result of tossing coin A, and $\theta = (\pi, p, q)$ denotes the model parameters.
For convenience, write the observed data as $y = (y_1, y_2, \dots, y_n)^T$ and the latent data as $z = (z_1, z_2, \dots, z_n)^T$. The likelihood of the observed data is:
$$
p(y|\theta) = \sum_z p(z|\theta)\, p(y|z,\theta)
$$
that is:
$$
p(y|\theta) = \prod_{j=1}^n \left[\pi p^{y_j}(1-p)^{1-y_j} + (1-\pi) q^{y_j}(1-q)^{1-y_j}\right] \tag{1}
$$
This problem has no closed-form analytic solution and can only be solved iteratively. Below we first derive the EM algorithm in general, and then return to this problem.
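Although (1) has no closed-form maximizer, it is easy to evaluate numerically, which will also let us verify later that EM increases it. A minimal sketch:

```python
import numpy as np

def log_likelihood(y, pi, p, q):
    """Log of equation (1): sum over observations of the mixture log-probability."""
    y = np.asarray(y)
    mix = pi * p**y * (1 - p)**(1 - y) + (1 - pi) * q**y * (1 - q)**(1 - y)
    return np.log(mix).sum()

y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(log_likelihood(y, 0.5, 0.6, 0.4))   # compare candidate parameter settings
```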
1.2 Derivation of the EM Algorithm
1.2.1 Jensen's Inequality
First, what is a convex function? Roughly speaking, a function whose second derivative is positive is convex; an upward-opening quadratic is a typical example.
Let $f(x)$ be a convex function, take points $\{x_i\}$ in its domain and corresponding weights $\{\lambda_i\}$ satisfying $\lambda_i > 0$ and $\sum_i \lambda_i = 1$. Then:
$$
f\left(\sum_i \lambda_i x_i\right) \le \sum_i \lambda_i f(x_i)
$$
For $x > 0$, the second derivative of $-\log(x)$ is $\frac{1}{x^2} > 0$, so Jensen's inequality applies to $-\log(x)$. If Jensen's inequality is instead applied to the concave function $\log(x)$, the direction of the inequality is reversed.
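The inequality is easy to check numerically for $-\log$ with arbitrary illustrative points and weights:

```python
import numpy as np

x = np.array([0.5, 2.0, 4.0])        # arbitrary positive points
lam = np.array([0.2, 0.3, 0.5])      # weights: positive, summing to 1

lhs = -np.log(lam @ x)               # f(sum_i lambda_i x_i) with f(x) = -log(x)
rhs = lam @ (-np.log(x))             # sum_i lambda_i f(x_i)
print(lhs <= rhs)                    # True, since -log is convex
```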
To solve a problem of the form (1), we maximize the log-likelihood:
$$
L(\theta) = \log p(y|\theta) = \log\left(\sum_z p(y|z,\theta)\, p(z|\theta)\right)
$$
We approach this iteratively. Suppose the estimate of $\theta$ after the $i$-th iteration is $\theta^{(i)}$. Since we want to maximize the log-likelihood, we want $L(\theta) > L(\theta^{(i)})$ at each step, gradually approaching a maximum; that is, we want to make the difference as large as possible:
$$
L(\theta) - L(\theta^{(i)}) = \log\left(\sum_z p(y|z,\theta)\, p(z|\theta)\right) - \log p(y|\theta^{(i)})
$$
Applying Jensen's inequality yields a lower bound:
$$
\begin{aligned}
L(\theta) - L(\theta^{(i)}) &= \log\left(\sum_z p(y|z,\theta)\, p(z|\theta)\right) - \log p(y|\theta^{(i)}) \\
&= \log\left(\sum_z p(z|y,\theta^{(i)})\, \frac{p(y|z,\theta)\, p(z|\theta)}{p(z|y,\theta^{(i)})}\right) - \log p(y|\theta^{(i)}) \\
&\ge \sum_z p(z|y,\theta^{(i)}) \log \frac{p(y|z,\theta)\, p(z|\theta)}{p(z|y,\theta^{(i)})} - \log p(y|\theta^{(i)}) \\
&= \sum_z p(z|y,\theta^{(i)}) \log \frac{p(y|z,\theta)\, p(z|\theta)}{p(z|y,\theta^{(i)})} - \sum_z p(z|y,\theta^{(i)}) \log p(y|\theta^{(i)}) \\
&= \sum_z p(z|y,\theta^{(i)}) \log \frac{p(y|z,\theta)\, p(z|\theta)}{p(z|y,\theta^{(i)})\, p(y|\theta^{(i)})}
\end{aligned}
$$
Let $J(\theta)$ denote this lower bound on $L(\theta) - L(\theta^{(i)})$; since raising the lower bound is guaranteed to raise $L(\theta)$, we seek:
$$
\theta^{(i+1)} = \arg\max_\theta J(\theta)
$$
where
$$
J(\theta) = \left\{\sum_z p(z|y,\theta^{(i)}) \log\left[p(y|z,\theta)\, p(z|\theta)\right]\right\} - \left\{\sum_z p(z|y,\theta^{(i)}) \log\left[p(z|y,\theta^{(i)})\, p(y|\theta^{(i)})\right]\right\}
$$
Since the second term contains no $\theta$:
$$
\theta^{(i+1)} = \arg\max_\theta \sum_z p(z|y,\theta^{(i)}) \log\left[p(y|z,\theta)\, p(z|\theta)\right]
$$
Because:
$$
p(y|z,\theta)\, p(z|\theta) = p(y,z|\theta)
$$
we define:
$$
Q(\theta, \theta^{(i)}) = \sum_z p(z|y,\theta^{(i)}) \log p(y,z|\theta) \tag{2}
$$
so that:
$$
\theta^{(i+1)} = \arg\max_\theta Q(\theta, \theta^{(i)}) \tag{3}
$$
Summary of the EM algorithm:
E step (compute the latent distribution $p(z|y,\theta^{(i)})$): given the observed data $y$ and the current parameter estimate $\theta^{(i)}$, compute the conditional distribution of the latent variable $z$;
M step: treating that distribution as known, find the $\theta$ that maximizes $Q(\theta, \theta^{(i)})$.
The E and M steps are repeated until convergence, as in the sketch below.
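In code, the whole algorithm is a short loop. The sketch below is a generic skeleton in which e_step and m_step are placeholders to be supplied per model, and the tolerance-based stopping rule is just one common convergence test:

```python
def em(y, theta0, e_step, m_step, n_iter=100, tol=1e-8):
    """Generic EM loop: alternate E and M steps until the parameters stabilize.

    e_step(y, theta) returns the posterior over the latent variables;
    m_step(y, post) returns the theta that maximizes Q(theta, theta_i).
    """
    theta = theta0
    for _ in range(n_iter):
        post = e_step(y, theta)         # E step: compute p(z | y, theta_i)
        theta_new = m_step(y, post)     # M step: maximize Q(theta, theta_i)
        if max(abs(a - b) for a, b in zip(theta_new, theta)) < tol:
            theta = theta_new
            break
        theta = theta_new
    return theta
```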
1.3 The Three-Coin Model, Continued
1.3.1 The Latent Variable and the Complete-Data Log-Likelihood
We already know:
$$
p(y|\theta) = \prod_{j=1}^n \left[\pi p^{y_j}(1-p)^{1-y_j} + (1-\pi) q^{y_j}(1-q)^{1-y_j}\right]
$$
Let $\mu_j \in \{0,1\}$ ($j = 1, 2, \dots, n$) indicate whether $y_j$ was produced by coin B ($\mu_j = 1$) or by coin C ($\mu_j = 0$); the vector $\mu$ is the latent variable of the model.
The complete-data likelihood can then be written as:
$$
p(y,\mu|\theta) = \prod_{j=1}^n \left[\pi p^{y_j}(1-p)^{1-y_j}\right]^{\mu_j} \left[(1-\pi) q^{y_j}(1-q)^{1-y_j}\right]^{1-\mu_j}
$$
and the corresponding log-likelihood is:
$$
\log p(y,\mu|\theta) = \sum_{j=1}^n \left\{\mu_j\left[\log\pi + y_j\log p + (1-y_j)\log(1-p)\right] + (1-\mu_j)\left[\log(1-\pi) + y_j\log q + (1-y_j)\log(1-q)\right]\right\}
$$
1.3.2 E Step: Determining the $Q$ Function
Since EM is iterative, let the parameter estimate at the $i$-th iteration be $\theta^{(i)} = (\pi^{(i)}, p^{(i)}, q^{(i)})$. Because the latent variable $\mu_j$ indicates whether the observation came from coin B, its conditional expectation at iteration $i+1$, i.e. the posterior probability that $y_j$ came from B, is:
$$
\mu_j^{(i+1)} = \frac{\pi^{(i)} (p^{(i)})^{y_j} (1-p^{(i)})^{1-y_j}}{\pi^{(i)} (p^{(i)})^{y_j} (1-p^{(i)})^{1-y_j} + (1-\pi^{(i)}) (q^{(i)})^{y_j} (1-q^{(i)})^{1-y_j}}
$$
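As a quick sanity check: if the current estimates are $\pi^{(i)} = p^{(i)} = q^{(i)} = 0.5$, then for any $y_j$ the numerator is $0.5 \times 0.5 = 0.25$ and the denominator is $0.25 + 0.25 = 0.5$, so $\mu_j^{(i+1)} = 0.5$ for every $j$. A fully symmetric initialization carries no information about which coin produced each observation; this is one reason the implementation in section 2 initializes the parameters randomly.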
Now compute $Q$:
$$
Q(\theta, \theta^{(i)}) = \sum_z p(z|y,\theta^{(i)}) \log p(y,z|\theta) = E_z\left[\log p(y,z|\theta) \mid y, \theta^{(i)}\right]
$$
Substituting $\mu_j^{(i+1)}$ gives:
$$
Q(\theta, \theta^{(i)}) = \sum_{j=1}^n \left\{\mu_j^{(i+1)}\left[\log\pi + y_j\log p + (1-y_j)\log(1-p)\right] + (1-\mu_j^{(i+1)})\left[\log(1-\pi) + y_j\log q + (1-y_j)\log(1-q)\right]\right\}
$$
1.3.3 M Step
With the $Q$ function in hand, we maximize over the parameters:
$$
\theta^{(i+1)} = \arg\max_\theta Q(\theta, \theta^{(i)})
$$
1. Solving for $\pi$:
$$
\frac{\partial Q(\theta, \theta^{(i)})}{\partial \pi} = \sum_{j=1}^n \left[\mu_j^{(i+1)} \frac{1}{\pi} - (1-\mu_j^{(i+1)}) \frac{1}{1-\pi}\right]
$$
To find the extremum, set the right-hand side to zero:
$$
\sum_{j=1}^n \left[\mu_j^{(i+1)} \frac{1}{\pi} - (1-\mu_j^{(i+1)}) \frac{1}{1-\pi}\right] = 0
$$
Multiplying both sides by $\pi(1-\pi)$:
$$
\sum_{j=1}^n \left[\mu_j^{(i+1)}(1-\pi) - (1-\mu_j^{(i+1)})\pi\right] = 0
$$
$$
\sum_{j=1}^n \left(\mu_j^{(i+1)} - \pi\right) = 0
$$
$$
\sum_{j=1}^n \mu_j^{(i+1)} - n\pi = 0
$$
Therefore:
$$
\pi^{(i+1)} = \frac{1}{n} \sum_{j=1}^n \mu_j^{(i+1)}
$$
2. Next, solving for $p$:
$$
\frac{\partial Q(\theta, \theta^{(i)})}{\partial p} = \sum_{j=1}^n \mu_j^{(i+1)} \left[y_j \frac{1}{p} - (1-y_j) \frac{1}{1-p}\right]
$$
To find the extremum, set the right-hand side to zero:
$$
\sum_{j=1}^n \mu_j^{(i+1)} \left[y_j \frac{1}{p} - (1-y_j) \frac{1}{1-p}\right] = 0
$$
Multiplying both sides by $p(1-p)$:
$$
\sum_{j=1}^n \mu_j^{(i+1)} \left[y_j(1-p) - (1-y_j)p\right] = 0
$$
$$
\sum_{j=1}^n \left[\mu_j^{(i+1)} y_j - \mu_j^{(i+1)} p\right] = 0
$$
Therefore:
$$
p^{(i+1)} = \frac{\sum_{j=1}^n \mu_j^{(i+1)} y_j}{\sum_{j=1}^n \mu_j^{(i+1)}}
$$
3. Finally, the same method yields $q$:
$$
q^{(i+1)} = \frac{\sum_{j=1}^n (1-\mu_j^{(i+1)}) y_j}{\sum_{j=1}^n (1-\mu_j^{(i+1)})}
$$
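The three update formulas translate directly into a few lines of NumPy. The sketch below is a vectorized equivalent of one iteration of the loop-based implementation given in section 2 (em_iteration is a name chosen here for illustration):

```python
import numpy as np

def em_iteration(y, pi, p, q):
    """One EM iteration for the three-coin model, vectorized over observations."""
    y = np.asarray(y)
    # E step: posterior probability that each y_j was produced by coin B
    b = pi * p**y * (1 - p)**(1 - y)
    c = (1 - pi) * q**y * (1 - q)**(1 - y)
    mu = b / (b + c)
    # M step: the closed-form updates derived above
    pi_new = mu.mean()
    p_new = (mu * y).sum() / mu.sum()
    q_new = ((1 - mu) * y).sum() / (1 - mu).sum()
    return pi_new, p_new, q_new
```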
1.3.4 Parameter Space
1. Model parameters:
$\pi$: the probability that coin A lands heads; a single float in this model.
$p$: the probability that coin B lands heads; a single float.
$q$: the probability that coin C lands heads; a single float.
2. Latent variable:
$\mu$: which coin (B or C) each observation ultimately came from; a one-dimensional vector $\mu = (\mu_1, \mu_2, \dots, \mu_n)$, where after the E step $\mu_j$ holds the probability that the $j$-th toss used coin B.
1.4 Convergence of the EM Algorithm
To prove that the EM algorithm converges, it suffices to show that $p(y|\theta^{(i)})$ is monotonically non-decreasing (since the likelihood is bounded above, a non-decreasing sequence of likelihood values must converge):
$$
p(y|\theta^{(i+1)}) \ge p(y|\theta^{(i)})
$$
Proof:
Since:
$$
p(y|\theta) = \frac{p(y,z|\theta)}{p(z|y,\theta)}
$$
taking logs gives $\log p(y|\theta) = \log p(y,z|\theta) - \log p(z|y,\theta)$. Taking the expectation of both sides with respect to $p(z|y,\theta^{(i)})$ (the left-hand side does not depend on $z$) and recalling the definition of $Q$ in (2):
$$
\log p(y|\theta) = Q(\theta, \theta^{(i)}) - \sum_z p(z|y,\theta^{(i)}) \log p(z|y,\theta)
$$
Evaluating this at $\theta = \theta^{(i+1)}$ and $\theta = \theta^{(i)}$ and subtracting:
$$
\begin{aligned}
\log p(y|\theta^{(i+1)}) - \log p(y|\theta^{(i)}) &= \left[Q(\theta^{(i+1)}, \theta^{(i)}) - Q(\theta^{(i)}, \theta^{(i)})\right] \\
&\quad - \left[\sum_z p(z|y,\theta^{(i)}) \log p(z|y,\theta^{(i+1)}) - \sum_z p(z|y,\theta^{(i)}) \log p(z|y,\theta^{(i)})\right]
\end{aligned}
$$
The first bracket satisfies $Q(\theta^{(i+1)}, \theta^{(i)}) - Q(\theta^{(i)}, \theta^{(i)}) \ge 0$, because $\theta^{(i+1)}$ is chosen precisely to maximize $Q(\theta, \theta^{(i)})$. For the second bracket:
$$
\begin{aligned}
&\sum_z p(z|y,\theta^{(i)}) \log p(z|y,\theta^{(i+1)}) - \sum_z p(z|y,\theta^{(i)}) \log p(z|y,\theta^{(i)}) \\
&= \sum_z p(z|y,\theta^{(i)}) \log \frac{p(z|y,\theta^{(i+1)})}{p(z|y,\theta^{(i)})} \\
&\le \log \sum_z p(z|y,\theta^{(i)}) \frac{p(z|y,\theta^{(i+1)})}{p(z|y,\theta^{(i)})} \\
&= \log \sum_z p(z|y,\theta^{(i+1)}) = \log 1 = 0
\end{aligned}
$$
That is, the second bracket is at most 0, and therefore $\log p(y|\theta^{(i+1)}) - \log p(y|\theta^{(i)}) \ge 0$.
This completes the proof.
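The monotonicity result can also be checked empirically by tracking the log-likelihood of equation (1) across iterations; a sketch reusing the em_iteration helper from section 1.3:

```python
import numpy as np

def likelihood_trace(y, pi, p, q, n_iter=20):
    """Record the observed-data log-likelihood (equation (1)) after each EM iteration."""
    y = np.asarray(y)
    trace = []
    for _ in range(n_iter):
        pi, p, q = em_iteration(y, pi, p, q)   # sketch defined in section 1.3
        mix = pi * p**y * (1 - p)**(1 - y) + (1 - pi) * q**y * (1 - q)**(1 - y)
        trace.append(np.log(mix).sum())
    assert all(b >= a - 1e-12 for a, b in zip(trace, trace[1:]))  # non-decreasing
    return trace
```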
2. Python Implementation of the Three-Coin Model
2.1 Model Implementation
```python
import numpy as np

np.random.seed(0)


class ThreeCoinsMode(object):
    def __init__(self, n_epoch=5):
        """
        Solve the three-coin model with the EM algorithm.
        :param n_epoch: number of iterations
        """
        self.n_epoch = n_epoch
        self.params = {'pi': None, 'p': None, 'q': None, 'mu': None}

    def __init_params(self, n):
        """
        Initialize the parameters randomly.
        :param n: number of observations
        """
        self.params = {'pi': np.random.rand(1),
                       'p': np.random.rand(1),
                       'q': np.random.rand(1),
                       'mu': np.random.rand(n)}
        # self.params = {'pi': [0.5],
        #                'p': [0.5],
        #                'q': [0.5],
        #                'mu': np.random.rand(n)}

    def E_step(self, y, n):
        """
        E step: update the latent variable mu.
        :param y: observations
        :param n: number of observations
        """
        pi = self.params['pi'][0]
        p = self.params['p'][0]
        q = self.params['q'][0]
        for i in range(n):
            self.params['mu'][i] = (pi * pow(p, y[i]) * pow(1 - p, 1 - y[i])) / (
                pi * pow(p, y[i]) * pow(1 - p, 1 - y[i])
                + (1 - pi) * pow(q, y[i]) * pow(1 - q, 1 - y[i]))

    def M_step(self, y, n):
        """
        M step: update the model parameters.
        :param y: observations
        :param n: number of observations
        """
        mu = self.params['mu']
        self.params['pi'][0] = sum(mu) / n
        self.params['p'][0] = sum([mu[i] * y[i] for i in range(n)]) / sum(mu)
        self.params['q'][0] = sum([(1 - mu[i]) * y[i] for i in range(n)]) / sum(
            [1 - mu_i for mu_i in mu])

    def fit(self, y):
        """
        Entry point: run EM for n_epoch iterations.
        :param y: observations
        """
        n = len(y)
        self.__init_params(n)
        print(0, self.params['pi'], self.params['p'], self.params['q'])
        for i in range(self.n_epoch):
            self.E_step(y, n)
            self.M_step(y, n)
            print(i + 1, self.params['pi'], self.params['p'], self.params['q'])
```
2.2 Model Test Results
```python
def run_three_coins_model():
    y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
    tcm = ThreeCoinsMode()
    tcm.fit(y)


run_three_coins_model()
```
Running it produces:
```
0 [0.5488135] [0.71518937] [0.60276338]
1 [0.54076424] [0.65541668] [0.53474516]
2 [0.54076424] [0.65541668] [0.53474516]
3 [0.54076424] [0.65541668] [0.53474516]
4 [0.54076424] [0.65541668] [0.53474516]
5 [0.54076424] [0.65541668] [0.53474516]
```
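Note that EM is only guaranteed to converge to a stationary point of the likelihood, not the global maximum, so the result depends on the initial values. For this data the symmetric start $\pi = p = q = 0.5$ (the commented-out initialization in the code) makes every $\mu_j = 0.5$ after the first E step, so $\pi$ stays at $0.5$ while $p$ and $q$ both collapse to the sample mean $0.6$. A quick way to probe this sensitivity is to rerun the fit under a few illustrative seeds:

```python
for seed in range(3):    # illustrative seeds
    np.random.seed(seed)
    tcm = ThreeCoinsMode()
    tcm.fit([1, 1, 0, 1, 0, 0, 1, 0, 1, 1])
```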
References:
Hang Li, Statistical Learning Methods (《统计学习方法》).