
CalTech machine learning, video 9 notes (The Linear Model II)

8:39 2014-09-27 

start CalTech machine learning, 

video 9, the Linear Model II

8:40 2014-09-27

Bias-Variance decomposition of the out-of-sample error
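For reference, this is the decomposition being recapped, in its standard form from the previous lecture (written out here as a reminder, not copied from the slide):

\[
\mathbb{E}_{\mathcal{D}}\left[E_{\text{out}}\left(g^{(\mathcal{D})}\right)\right] = \text{bias} + \text{var},
\qquad
\text{bias} = \mathbb{E}_{x}\left[\left(\bar{g}(x) - f(x)\right)^{2}\right],
\quad
\text{var} = \mathbb{E}_{x}\left[\mathbb{E}_{\mathcal{D}}\left[\left(g^{(\mathcal{D})}(x) - \bar{g}(x)\right)^{2}\right]\right]
\]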

8:41 2014-09-27

* linear classification

* linear regression

* logistic regression

8:54 2014-09-27

the tradeoff between approximation & generalization

8:55 2014-09-27

the generalization ability of linear classification

8:55 2014-09-27

nonlinear transformation

8:57 2014-09-27

feature space

8:57 2014-09-27

linear surface => quadratic surface
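A minimal sketch of what the feature space looks like here, assuming the usual quadratic transform (the function name and the example weights are my own, not from the lecture): a hypothesis that is linear in z is a quadratic surface back in x.

import numpy as np

def quadratic_transform(x):
    """Map x = (x1, x2) into the quadratic feature space
    z = (1, x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2])

# A hypothesis linear in z, sign(w~ . z), is quadratic in x.
# These weights describe the circle x1^2 + x2^2 = 0.6 (illustrative only).
w_tilde = np.array([-0.6, 0.0, 0.0, 1.0, 0.0, 1.0])
x = np.array([0.5, 0.5])
print(np.sign(w_tilde @ quadratic_transform(x)))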

8:59 2014-09-27

almost separable: this guy is erroneously classified.

9:06 2014-09-27

the lesson learned from this is: looking at the data before choosing the model can be hazardous to your Eout health; not your physical health, but your generalization health.

9:15 2014-09-27

if you look at the data, we say that you have already done learning

9:17 2014-09-27

VC dimension of the hypotheses set

9:17 2014-09-27

this is the manifestation of the biggest trap that practitioners fall into.

9:18 2014-09-27

when you go into machine learning and learn from the data, choosing the model is very tricky

9:19 2014-09-27

it's very tempting: let me just look at the data and pick something suitable

9:20 2014-09-27

it's not against the law, you can do it,

but just charge accordingly.

9:20 2014-09-27

if you look at the data before choosing your model, you have already forfeited the warranty that is given by the VC inequality.

9:22 2014-09-27

this is a manifestation of basically snooping: you snoop into the data in a way that is not allowed.

9:22 2014-09-27

data snooping

9:22 2014-09-27

when you do this, bad things happen.

9:23 2014-09-27

validation, model selection

9:24 2014-09-27

it would be a legitimate way of selecting a model; it's a model selection that does not contaminate the data.

9:25 2014-09-27

it can no longer be trusted to reflect the real performance, because you have already used it in learning

9:26 2014-09-27

the linear model is an economy car; the nonlinear model gives you a truck

9:28 2014-09-27

logistic regression

9:28 2014-09-27

the model: what is the hypothesis set?

9:28 2014-09-27

soft threshold
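The soft threshold is the logistic function applied to the linear signal s = w^T x:

\[
\theta(s) = \frac{e^{s}}{1 + e^{s}}, \qquad h(x) = \theta\!\left(w^{\mathsf{T}} x\right) \in (0, 1)
\]

so the output can be interpreted as a probability rather than a hard ±1 decision.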

9:36 2014-09-27

there is a probability sitting there generating the examples.

9:37 2014-09-27

credit score, risk score

9:41 2014-09-27

this is supervised learning, I have to give you tags.

9:44 2014-09-27

error measure based on likelihood

9:51 2014-09-27

the data is generated by this target function

9:52 2014-09-27

if that probability is very small, then your 

assumption must be poor.

9:52 2014-09-27

and if that probability is high, then your assumption

has more plausibility.

9:52 2014-09-27

so I can use this in a comparative way to say that this hypothesis is more plausible than that one

9:53 2014-09-27

what is the probability of generating this data

if your assumption is true?

// result => causal ???

9:54 2014-09-27

what is the most probable hypothesis given the data?

what is the probability of the data given the hypothesis?

9:57 2014-09-27

prior

9:57 2014-09-27

if I choose a hypothesis under which having the data is very plausible, it looks like this hypothesis is very likely, hence the name likelihood

9:59 2014-09-27

what is the likelihood of this whole data set?

10:06 2014-09-27

maximizing the likelihood => minimizing the error measure

10:08 2014-09-27

we're maximizing the likelihood of this hypothesis

under this data set.

10:12 2014-09-27

cross-entropy error
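Putting the likelihood pieces together (the standard derivation for this model, reconstructed here rather than copied from the slides): assuming the labels are generated with P(y | x) = θ(y wᵀx), the likelihood of the whole data set is the product over the examples, and maximizing it is equivalent to minimizing the in-sample cross-entropy error:

\[
\prod_{n=1}^{N} P\!\left(y_{n} \mid x_{n}\right) = \prod_{n=1}^{N} \theta\!\left(y_{n} w^{\mathsf{T}} x_{n}\right)
\quad\Longrightarrow\quad
E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \ln\!\left(1 + e^{-y_{n} w^{\mathsf{T}} x_{n}}\right)
\]

(Taking the log of the likelihood, negating, and averaging over N turns the product into this sum.)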

10:19 2014-09-27

learning algorithm

10:19 2014-09-27

How to minimize Ein?

10:20 2014-09-27

linear regression => logistic regression

10:20 2014-09-27

iterative solution, closed-form solution

10:21 2014-09-27

iterative method: gradient descent

10:22 2014-09-27

convex optimization

10:24 2014-09-27

you're sitting on the surface, then you close your eyes, and all you do is feel around you, and then decide that this is a more promising direction than that; that's all you do in one step.

10:28 2014-09-27

when you get to the new point, repeat, repeat, ...

10:29 2014-09-27

until you get to the minimum.

10:29 2014-09-27

that's all the iterative method you're going to use.

10:29 2014-09-27

fixed-step size

10:30 2014-09-27

Iterative method: gradient descent

A general method for nonlinear optimization:

start at w(0); take a step along the steepest slope;

fixed step size.
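A minimal sketch of that fixed-step descent loop (the names grad_E, eta, and the stopping rule are my own placeholders, not part of the lecture):

import numpy as np

def gradient_descent(grad_E, w0, eta=0.1, max_iters=1000, tol=1e-6):
    """Start at w0; repeatedly take a fixed-size step along the
    direction of steepest descent, i.e. the negative gradient."""
    w = np.array(w0, dtype=float)
    for _ in range(max_iters):
        g = grad_E(w)
        if np.linalg.norm(g) < tol:   # one possible termination criterion
            break
        w = w - eta * g               # fixed learning rate eta
    return w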

10:30 2014-09-27

in this situation, you're going to derive: what is the unit step direction v̂?

10:34 2014-09-27

gradient descent

10:34 2014-09-27

how do I choose the direction in order to 

make this as negative as possible?
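The derivation asked for, in its standard form: for a small step w(1) = w(0) + η v̂ with a unit vector v̂, the first-order change in E_in is

\[
\Delta E_{\text{in}} \approx \eta\, \nabla E_{\text{in}}\!\left(w(0)\right)^{\mathsf{T}} \hat{v} \;\ge\; -\,\eta \left\lVert \nabla E_{\text{in}}\!\left(w(0)\right) \right\rVert,
\qquad
\hat{v} = -\,\frac{\nabla E_{\text{in}}\!\left(w(0)\right)}{\left\lVert \nabla E_{\text{in}}\!\left(w(0)\right) \right\rVert}
\]

with equality (the most negative change) exactly when v̂ points opposite the gradient.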

10:38 2014-09-27

Fixed-size step?

10:44 2014-09-27

logistic regression algorithm

// using gradient descent
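A sketch of the full algorithm, plugging the gradient of the cross-entropy error into a plain descent loop (my own implementation of what the lecture describes; X is assumed to be the N x d data matrix with a leading column of ones and y the vector of ±1 labels):

import numpy as np

def logistic_regression(X, y, eta=0.1, max_iters=1000):
    """Minimize E_in(w) = (1/N) * sum ln(1 + exp(-y_n w.x_n)) by gradient descent."""
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(max_iters):
        s = y * (X @ w)
        # gradient of E_in: -(1/N) * sum_n y_n x_n / (1 + exp(y_n w.x_n))
        grad = -(y / (1.0 + np.exp(s))) @ X / N
        w = w - eta * grad            # fixed learning rate step
    return w

def predict_proba(w, x):
    """theta(w.x): estimated probability that the label is +1."""
    return 1.0 / (1.0 + np.exp(-(w @ x)))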

10:50 2014-09-27

summary of linear model:

* perceptron // linear classification

* linear regression

* logistic regression

10:52 2014-09-27

Apply to credit analysis:

* perceptron           => Approve or Deny         => binary classification error   (PLA, Pocket)

* linear regression    => Amount of Credit        => squared error                 (Pseudo-inverse)

* logistic regression  => Probability of Default  => cross-entropy error           (Gradient descent)
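For reference, the closed-form solution behind the "Pseudo-inverse" entry in the linear regression row (the standard formula from the earlier linear regression lecture):

\[
w = X^{\dagger} y, \qquad X^{\dagger} = \left(X^{\mathsf{T}} X\right)^{-1} X^{\mathsf{T}}
\]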

10:53 2014-09-27

I will stop here, and then we'll start after a short break.

10:57 2014-09-27

let's start the Q & A

10:57 2014-09-27

there is the question of "learning rate"

10:58 2014-09-27

there are other questions about "initialization"

10:58 2014-09-27

so let's set up a target error; if I don't get to the target error, I won't stop.

11:00 2014-09-27

local minimum, global minimum

11:01 2014-09-27

termination is tricky, a combination of criteria 

is the best way.

11:05 2014-09-27

in many situations you just do gradient descent in a simple way and get a very good result.

11:08 2014-09-27

you're applying the algorithm faithfully, and ...

11:09 2014-09-27

from a practical point of view, you start from different initialization points, and each of them will go to its own local minimum.
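A hedged sketch of that practical recipe (the restart count, step size, and iteration budget are my own illustrative choices):

import numpy as np

def descent_with_restarts(E, grad_E, dim, n_restarts=10, eta=0.1, n_steps=1000, rng=None):
    """Run plain gradient descent from several random starting points and
    keep the weight vector with the lowest error found."""
    rng = np.random.default_rng() if rng is None else rng
    best_w, best_E = None, np.inf
    for _ in range(n_restarts):
        w = rng.normal(size=dim)          # a different initialization each run
        for _ in range(n_steps):
            w = w - eta * grad_E(w)       # each run settles into its own local minimum
        if E(w) < best_E:
            best_w, best_E = w, E(w)
    return best_w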

11:10 2014-09-27

ordinarily this will give you a good local minimum, but finding the global minimum is NP-hard.

11:12 2014-09-27

for entropy, you get a function based on the probability.

11:15 2014-09-27

because you will be charged for that.

11:36 2014-09-27

because it uses CPU cycles but does not improve things much

11:36 2014-09-27

Neural Networks & hidden layers