您的位置:首页 > Web前端

如何标准化特征向量HOW TO NORMALISE FEATURE VECTORS

2016-04-12 13:22 225 查看


HOW TO NORMALISE FEATURE VECTORS

I was trying to create a sample file for training a neural network and ran into a common problem: the feature values are all over the place. In this example I’m working with demographical real-world values for countries. For example, a feature for GDP per person
in a country ranges from 551.27 to 88286.0, whereas estimates for corruption range between -1.56 to 2.42. This can be very confusing for machine learning algorithms, as they can end up treating bigger values as more important signals.

To handle this issue, we want to scale all the feature values into roughly the same range. We can do this by taking each feature value, subtracting its mean (thereby shifting the mean to 0), and dividing by the standard deviation (normalising the distribution).
This is a piece of code I’ve implemented a number of times for various projects, so it’s time to write a nice reusable script. Hopefully it can be helpful for others as well. I chose to do this in python, as it’s easies to run compared to C++ and Java (doesn’t
need to be compiled), but has better support for real-valued numbers compared to bash scripting.

Each line in the input file is assumed to be a feature vector, with values separated by whitespace. The first element is an integer class label that will be left untouched. This is followed by a number of floating point feature values which will be normalised.
For example:

We’re assuming dense vectors, meaning that each line has an equal number of features.

To execute it, simply use

The complete script that will normalise feature vectors is here:

from: http://www.marekrei.com/blog/normalise-feature-vectors/
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: