
Andrew Ng Machine Learning course notes -- Generative Learning Algorithms


Logistic regression: in fact, if you look at a data set with logistic regression, I've initialized the parameters randomly, so the hypothesis that logistic regression outputs at iteration zero is the straight line shown at the bottom right. After one iteration of gradient ascent, the straight line changes a bit; after two iterations, three, four, and so on, until logistic regression converges and has found the straight line that, more or less, separates the positive and negative classes.
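As a rough illustration of that training loop, here is a minimal sketch (not the lecture's code) of logistic regression fit by batch gradient ascent on the log-likelihood; all names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, iterations=100):
    """X: (m, n) design matrix, y: (m,) labels in {0, 1}."""
    m, n = X.shape
    theta = np.random.randn(n) * 0.01      # random initialization, as in the lecture
    for _ in range(iterations):
        h = sigmoid(X @ theta)             # current hypothesis h_theta(x)
        gradient = X.T @ (y - h)           # gradient of the log-likelihood
        theta += lr * gradient / m         # one gradient ascent step
    return theta
```

On two-dimensional data the fitted theta defines the decision boundary theta^T x = 0, which is the straight line that shifts a little on each iteration until convergence.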

Discriminative learning algorithms: learn p(y|x) directly, or even learn a hypothesis that outputs the values 0 or 1 directly.

Generative learning algorithms: model p(x|y), the probability of the features given the class label, and, as a technical detail, also model p(y). A generative model builds a probabilistic model of what the features look like, conditioned on the class label.

Gaussian discriminant analysis: I'm going to assume that your input features x are in R^n and are continuous values. The core assumption is that p(x|y) is Gaussian.
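Written out, the usual formulation of this model (with a covariance matrix Sigma shared by both classes) is:

```latex
y \sim \mathrm{Bernoulli}(\phi), \qquad
x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma), \qquad
x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma)
```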

So, given the training set, we use the Gaussian discriminant analysis model and fit the parameters of the model by maximum likelihood estimation as usual. This just means: look at your training set, find all the examples for which y = 0, and take the average of the values of x over those examples. So take all your negative training examples, average their values of x, and that's mu_0. Then, when you are given a new value of x, you predict the value of y that maximizes p(y|x), computed by Bayes rule. You can repeat this exercise for a bunch of points: compute p(y = 1 | x) for a bunch of points, and if you connect up these points, you find that the curve you get takes the form of a sigmoid function. But it turns out the key difference is that Gaussian discriminant analysis will end up choosing a different position and steepness of the sigmoid than logistic regression would. If you assume x given y is Gaussian, that implies the posterior distribution p(y = 1 | x) is a logistic function of x, but the implication in the opposite direction does not hold. Similarly, if x given y = 1 is Poisson with parameter lambda_1 and x given y = 0 is Poisson with parameter lambda_0, it turns out this also implies that p(y|x) is logistic.
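The following is a small sketch, under the GDA assumptions above, of the maximum likelihood fit and of prediction by Bayes rule; the function names are illustrative, not from the course:

```python
import numpy as np

def fit_gda(X, y):
    """X: (m, n) features, y: (m,) labels in {0, 1}."""
    phi = y.mean()                              # p(y = 1)
    mu0 = X[y == 0].mean(axis=0)                # average x over the negative examples
    mu1 = X[y == 1].mean(axis=0)                # average x over the positive examples
    centered = X - np.where(y[:, None] == 1, mu1, mu0)
    Sigma = centered.T @ centered / len(y)      # shared covariance matrix
    return phi, mu0, mu1, Sigma

def gaussian_density(x, mu, Sigma):
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

def predict_proba(x, phi, mu0, mu1, Sigma):
    """p(y = 1 | x) by Bayes rule: p(x|y=1) p(y=1) / p(x)."""
    p1 = gaussian_density(x, mu1, Sigma) * phi
    p0 = gaussian_density(x, mu0, Sigma) * (1 - phi)
    return p1 / (p0 + p1)
```

Plotting predict_proba along a line through the data is exactly the "connect up these points" exercise described above: it traces out a sigmoid, just with a position and steepness chosen by the Gaussian fit rather than by logistic regression.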

 

Gaussian: a random variable Z is distributed as a multivariate Gaussian, written Z ~ N(mu, Sigma) with the script N for normal, with parameters mean mu and covariance matrix Sigma. Its density is p(z) = 1 / ((2*pi)^(n/2) |Sigma|^(1/2)) * exp(-(1/2) (z - mu)^T Sigma^(-1) (z - mu)). This is the formula for the density as a generalization of the one-dimensional Gaussian and its familiar bell-shaped curve: Z is a high-dimensional vector-valued random variable, the vector mu is the mean of the Gaussian, and the matrix Sigma is the covariance matrix.
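For concreteness, here is a tiny example (an assumed setup, not from the lecture) that evaluates this density with scipy:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])                      # mean vector
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])                 # off-diagonal terms tilt the bell shape
density = multivariate_normal(mean=mu, cov=Sigma)
print(density.pdf(np.array([0.5, -0.5])))      # evaluate p(z) at one point
```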

Joint likelihood: back when we were fitting logistic regression models or generalized linear models, we were always modeling p(y^(i) | x^(i)) parameterized by theta, and that was the conditional likelihood, in which we model p(y^(i) | x^(i)); whereas for generative learning algorithms, we are going to look at the joint likelihood, which is p(x^(i), y^(i)).
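In symbols, the two objectives being compared are, roughly:

```latex
% conditional (discriminative) log-likelihood
\ell(\theta) = \sum_{i=1}^{m} \log p\big(y^{(i)} \mid x^{(i)}; \theta\big)

% joint (generative) log-likelihood
\ell(\phi, \mu_0, \mu_1, \Sigma)
  = \sum_{i=1}^{m} \log p\big(x^{(i)}, y^{(i)}\big)
  = \sum_{i=1}^{m} \log p\big(x^{(i)} \mid y^{(i)}\big)\, p\big(y^{(i)}\big)
```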

Generative learning algorithms versus discriminative learning algorithms: generative learning algorithms require less data, and even if the data is not exactly Gaussian they usually do surprisingly well, that is, even when these modeling assumptions are not met. The other side of this tradeoff is that, by making stronger assumptions about the data, Gaussian discriminant analysis can often fit an okay model even when there is less training data.

Naïve Bayes: we are going to make a very strong assumption on p(x|y); in particular, that the x_i's are conditionally independent given y.
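A minimal sketch of what that factorization looks like for binary features (for example, word presence in an email); names are illustrative, not from the lecture:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """X: (m, n) binary feature matrix, y: (m,) labels in {0, 1}."""
    phi_y = y.mean()                            # p(y = 1)
    phi_j_given_1 = X[y == 1].mean(axis=0)      # p(x_j = 1 | y = 1), one per feature
    phi_j_given_0 = X[y == 0].mean(axis=0)      # p(x_j = 1 | y = 0)
    return phi_y, phi_j_given_0, phi_j_given_1

def predict_log_odds(x, phi_y, phi_j_given_0, phi_j_given_1):
    """log p(y=1|x) - log p(y=0|x), using the factorized p(x|y)."""
    def log_px_given(phi_j):
        # conditional independence: p(x|y) is a product of per-feature Bernoulli terms
        return np.sum(x * np.log(phi_j) + (1 - x) * np.log(1 - phi_j))
    return (log_px_given(phi_j_given_1) + np.log(phi_y)
            - log_px_given(phi_j_given_0) - np.log(1 - phi_y))
```

Note that with raw counts a feature never seen in one class gives an estimated probability of exactly zero, which blows up the log terms; that is what the next section fixes.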

The Laplace smoothing: take each of these terms, the number of ones and the number of zeros, and add one to each of them; instead of estimating the probability from the raw counts, we add one to all of these counts, so the estimated probability of an outcome we have never observed is no longer exactly zero. It turns out that, under a certain set of Bayesian assumptions about the prior and posterior, Laplace smoothing actually gives the optimal estimate.
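Applied to the Naïve Bayes estimates sketched above, the smoothed parameters add 1 to each count and 2 to each total (since a binary feature has two possible outcomes); again, names are illustrative:

```python
import numpy as np

def fit_naive_bayes_smoothed(X, y):
    """X: (m, n) binary feature matrix, y: (m,) labels in {0, 1}."""
    m1 = (y == 1).sum()
    m0 = (y == 0).sum()
    phi_y = y.mean()
    # (count + 1) / (total + 2): no estimate is ever exactly 0 or 1
    phi_j_given_1 = (X[y == 1].sum(axis=0) + 1) / (m1 + 2)
    phi_j_given_0 = (X[y == 0].sum(axis=0) + 1) / (m0 + 2)
    return phi_y, phi_j_given_0, phi_j_given_1
```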
