Softmax on Digits Data with TensorFlow
2017-06-05 14:48
In this tutorial, we will basically follow the official tutorial but change some parts to make it easier to understand. The content about logistic regression borrows from Arindam Banerjee's Machine Learning course at the University of Minnesota, which follows Ethem Alpaydin's book Introduction to Machine Learning.
The corresponding executable Python code for this tutorial can be found here.
TensorFlow provides a distinctive framework for machine learning programming: we first build a structure (a computation graph) for the algorithm, and then feed in the data and run a session. In the graph, operations are the vertices and data flows along the edges.
Softmax
In short, softmax is multi-class logistic discrimination. For a linear discrimination, the discriminant function is defined as:

$$g(x) = w^T x + w_0$$
This implicitly assumes a linear relation from the features to the probability of each class. In comparison, logistic regression assumes that the log of the ratio between the two class posteriors has a linear form:

$$\log \frac{P(1|x)}{P(0|x)} = w^T x$$
Since $P(1|x) + P(0|x) = 1$, a direct calculation gives:
$$P(1|x) = \frac{\exp(w^T x)}{1 + \exp(w^T x)} = \sigma(w^T x)$$

$$P(0|x) = \frac{1}{1 + \exp(w^T x)} = 1 - \sigma(w^T x)$$
Here $\sigma$ is the so-called logistic (sigmoid) function.
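The two posterior formulas above can be checked numerically with a small sketch (the weight vector and input below are toy values chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    # the logistic function sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

# toy weight vector and input
w = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])

p1 = sigmoid(w @ x)   # P(1|x)
p0 = 1.0 - p1         # P(0|x)
```

By construction the two posteriors sum to one, as the derivation requires.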
In the training phase, the following “cross-entropy” cost function is to be minimized,
$$E(w, w_0 \mid \mathcal{X}) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log(1 - y^t) \right]$$
Here $r$ is the label and $y$ is the value predicted by the model. One may ask why other cost functions are not used; we leave this as a topic for future posts.
Now we come to the case where there are more than two classes. For each class, we formulate the posterior probability as

$$y_i = \hat{P}(C_i \mid x) = \frac{\exp(w_i^T x)}{\sum_{j=1}^{K} \exp(w_j^T x)}$$
Now the cost function to be minimized becomes:
$$E(\{w_i, w_{0i}\} \mid \mathcal{X}) = -\sum_t \sum_i r_i^t \log y_i^t$$
The above formula is the so-called softmax.
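To make the formula concrete, here is a small numpy sketch of the softmax posterior (the weight matrix and input are toy values made up for illustration):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; this does not change
    # the result, since softmax is invariant to shifting all logits
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

# toy example: K = 3 classes, 2 features; rows of W are the w_i
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
x = np.array([2.0, 1.0])

y = softmax(W @ x)   # posterior P(C_i | x) for each class
```

The outputs form a proper probability distribution: each entry is positive and they sum to one.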
Digits Dataset
The dataset used here is a bit different from that in the official tutorial. It can be loaded as follows:

```python
# Load the data and import the numpy library
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
```
The 64 features are the 8×8 pixels of each handwritten digit image.
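The code below slices y with two indices, which requires the integer labels to be one-hot encoded first. The post does not show that step, so the following preprocessing sketch is an assumption; the variable names X, y, and total_num match those used in the partitioning code:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
X = digits.data                  # pixel features, 64 per image
y = np.eye(10)[digits.target]    # one-hot labels, shape (n_samples, 10)
total_num = X.shape[0]
```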
TensorFlow
In this section we will go through the code, focusing mainly on the modified parts. In the following, the data are partitioned into training data and test data, where 1/10 is kept as test data. Since we do not need to select among multiple candidate models, no validation set is needed here.
```python
# Divide into train data and test data
train_num = int(total_num * 0.9)
X_train = X[0:train_num, :]
y_train = y[0:train_num, :]
X_test = X[train_num:total_num, :]
y_test = y[train_num:total_num, :]
```
In the official tutorial, the function mnist.train.next_batch(100) is used to randomly fetch 100 data points for training. Since we are using a different dataset, the following auxiliary function is used to shuffle and fetch the data.
```python
import numpy as np

# shuffle_data shuffles the data randomly and
# returns n (X, y) pairs
def shuffle_data(X, y, n):
    X = np.array(X)
    y = np.array(y)
    rows, columns = X.shape
    if rows < n:
        print("ERROR: There are not enough rows in X.")
    rndInd = np.random.permutation(rows)
    return X[rndInd[0:n], :], y[rndInd[0:n], :]
```
In the program, the input data and labels are declared as placeholders, which, as the name indicates, reserve a place for the data that will later come in for training and prediction. The parameters W and b are declared as variables, which can be modified in the training phase and used in the prediction phase. This philosophy extends to other machine learning algorithms: we build the pipes and wait for the data. Note that b in the code corresponds to $w_0$ in the mathematical derivations.
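The post describes these declarations without showing them, so here is a minimal sketch consistent with the 64-feature, 10-class digits data. The shapes and the tf.compat.v1 aliasing are my assumptions (the aliasing lets the graph-building style run under TensorFlow 2 as well; the original 2017 code would call tf.placeholder and tf.Variable directly):

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()   # use the build-graph-then-run style described above

x = tf1.placeholder(tf.float32, [None, 64])    # input pixels
y_ = tf1.placeholder(tf.float32, [None, 10])   # one-hot labels
W = tf1.Variable(tf.zeros([64, 10]))           # weights, updated during training
b = tf1.Variable(tf.zeros([10]))               # bias, i.e. w_0 in the derivations
y = tf.matmul(x, W) + b                        # the linear model w^T x + w_0
```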
The structure for the training is completed with the following statements:
```python
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
```
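For intuition, what softmax_cross_entropy_with_logits computes can be sketched in plain numpy (the logits and labels below are toy values; the real op uses a more numerically careful formulation internally):

```python
import numpy as np

def softmax(z):
    # row-wise softmax, stabilized by subtracting the row max
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# per-example cross-entropy, then the mean (the tf.reduce_mean step)
per_example = -(labels * np.log(softmax(logits))).sum(axis=1)
cross_entropy = per_example.mean()
```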
Note that at this point the program has not been executed yet; we have only built the frame. A session is needed to run the code.
```python
# Create session
sess = tf.InteractiveSession()
```
We also need to initialize all the variables:
```python
# Initialize variables
tf.global_variables_initializer().run()
```
The real training process is also completed in a few lines of code:
```python
# Train model
for _ in range(1000):
    batch_xs, batch_ys = shuffle_data(X_train, y_train, 100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
```
Note why the train_step must be repeated thousands of times: each call to sess.run(train_step) performs only a single gradient-descent update. TensorFlow encapsulates the update step, not the whole optimization loop, so we drive the iteration ourselves.
Finally, the algorithm’s performance can be tested as follows:
```python
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print(sess.run(accuracy, feed_dict={x: X_test, y_: y_test}))
```
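What this accuracy graph computes can be illustrated with a tiny numpy example (the predictions and one-hot labels below are made up for illustration):

```python
import numpy as np

y_pred = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.3, 0.7]])   # model outputs for 3 examples
y_true = np.array([[0, 1],
                   [1, 0],
                   [1, 0]])       # one-hot labels

# compare the predicted class (argmax of outputs) to the true class,
# then average the 0/1 correctness values
correct = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)
accuracy = correct.astype(np.float32).mean()
```

Here the first two examples are classified correctly and the third is not, so the accuracy is 2/3.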
From the above discussion we can already get a feel for the convenience TensorFlow brings. We will discuss more topics in future posts.
For any unclear parts of this tutorial, please refer to the official tutorial.