
sklearn.preprocessing.LabelBinarizer

2017-07-13 16:17
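Label binarization: sklearn.preprocessing.LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False) converts multi-class labels into binary labels (one column per class, one-vs-all). The result is either a binary array or a sparse matrix.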
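A minimal sketch of the default behaviour (neg_label=0, pos_label=1), assuming a reasonably recent scikit-learn. The string labels are made up for illustration; each label becomes a row with 1 in the column of its class and 0 everywhere else:

```python
from sklearn.preprocessing import LabelBinarizer

# Default settings: neg_label=0, pos_label=1, sparse_output=False
lb = LabelBinarizer()
Y = lb.fit_transform(["cat", "dog", "bird", "dog"])

print(lb.classes_)  # sorted unique labels: ['bird' 'cat' 'dog']
print(Y)
# [[0 1 0]
#  [0 0 1]
#  [1 0 0]
#  [0 0 1]]
```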

Parameters:

neg_label: the value used to encode negative labels in the output (default 0)

pos_label: the value used to encode positive labels in the output (default 1)

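sparse_output: if True, the result is returned as a sparse matrix in Compressed Sparse Row (CSR) format; otherwise a dense array is returned (default False)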

classes_ (attribute): array of the class labels seen during fit, in sorted order
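The following sketch, assuming scikit-learn and SciPy are installed, shows sparse_output=True with the default neg_label/pos_label and the sorted classes_ attribute, using the training labels [1, 2, 6, 4, 2]:

```python
import scipy.sparse as sp
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer(sparse_output=True)  # neg_label=0, pos_label=1 by default
Y = lb.fit_transform([1, 2, 6, 4, 2])

print(lb.classes_)               # [1 2 4 6]
print(sp.issparse(Y), Y.format)  # True csr
print(Y.toarray())               # dense view, one column per entry of classes_
```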

① With neg_label=2 and pos_label=4 (with these values only a dense array can be returned), see what neg_label and pos_label mean:

In [14]: from sklearn import preprocessing
...: lb = preprocessing.LabelBinarizer(neg_label=2,pos_label=4)
...: lb.fit([1, 2, 6, 4, 2])
...:
Out[14]: LabelBinarizer(neg_label=2, pos_label=4, sparse_output=False)

The classes_ attribute gives the class labels of the training data:
In [15]: lb.classes_
Out[15]: array([1, 2, 4, 6])

The transform method converts labels into a binary array:
In [16]: lb.transform([1, 2, 6, 4, 2])
Out[16]:
array([[4, 2, 2, 2],
[2, 4, 2, 2],
[2, 2, 2, 4],
[2, 2, 4, 2],
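[2, 4, 2, 2]])

In the result above, 1 is converted to [4, 2, 2, 2]. Because there are 4 distinct labels, each row has 4 columns, and the columns follow the order of classes_ ([1, 2, 4, 6]): the column that matches the label holds pos_label, and every other column holds neg_label. The test set further below checks this logic again.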
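The rule just described can be reproduced by hand with NumPy. binarize_by_hand is a hypothetical helper written only for this illustration (it is not part of scikit-learn); the sketch simply checks that it agrees with LabelBinarizer on this data, including for labels not seen during fit:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer


def binarize_by_hand(y, classes, neg_label=0, pos_label=1):
    """Hypothetical helper: one column per entry of `classes` (same order);
    the column matching the label gets pos_label, all others get neg_label."""
    classes = np.asarray(classes)
    out = np.full((len(y), len(classes)), neg_label)
    for row, label in enumerate(y):
        # If the label matches no class, the row stays all neg_label.
        out[row, classes == label] = pos_label
    return out


y = [1, 2, 6, 4, 2]
lb = LabelBinarizer(neg_label=2, pos_label=4).fit(y)

manual = binarize_by_hand(y, lb.classes_, neg_label=2, pos_label=4)
print(manual)
print(np.array_equal(manual, lb.transform(y)))  # True

# Unseen labels 3 and 5 become rows of neg_label in both versions.
print(binarize_by_hand([1, 6, 3, 5], lb.classes_, neg_label=2, pos_label=4))
```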
In [17]: lb.transform([1, 6,3,5])
Out[17]:
array([[4, 2, 2, 2],
[2, 2, 2, 4],
[2, 2, 2, 2],
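[2, 2, 2, 2]])

Labels that did not appear during fit (3 and 5) become rows made up entirely of neg_label.

When neg_label is non-zero, sparse_output=True cannot be used (sparse output requires neg_label=0 and a non-zero pos_label), so only a dense array can be returned: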
In [18]: from sklearn import preprocessing
...: lb = preprocessing.LabelBinarizer(neg_label=1,pos_label=2,sparse_output=True)
...: lb.fit([1, 2, 6, 4, 2])
...: lb.transform([1, 6])
...:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-84f620fa6b11> in <module>()
1 from sklearn import preprocessing
----> 2 lb = preprocessing.LabelBinarizer(neg_label=1,pos_label=2,sparse_output=True)
3 lb.fit([1, 2, 6, 4, 2])
4 lb.transform([1, 6])

d:\softwore\python\lib\site-packages\sklearn\preprocessing\label.py in __init__(self, neg_label, pos_label, sparse_output)
275 "zero pos_label and zero neg_label, got "
276 "pos_label={0} and neg_label={1}"
--> 277 "".format(pos_label, neg_label))
278
279 self.neg_label = neg_label

ValueError: Sparse binarization is only supported with non zero pos_label and zero neg_label, got pos_label=2 and neg_label=1
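One way to avoid this error is to request sparse output only when the labels allow it. make_binarizer is a hypothetical helper, not a scikit-learn API. Depending on the scikit-learn version, the check runs in the constructor (as in the traceback above) or, in newer releases, in fit(), so the try block in this sketch wraps both:

```python
from sklearn.preprocessing import LabelBinarizer

y_train = [1, 2, 6, 4, 2]

# Reproduce the error: neg_label=1 is non-zero, so sparse output is rejected.
raised = False
try:
    LabelBinarizer(neg_label=1, pos_label=2, sparse_output=True).fit(y_train)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)


def make_binarizer(neg_label, pos_label, want_sparse=True):
    """Hypothetical helper: ask for sparse output only when allowed
    (pos_label != 0 and neg_label == 0); otherwise fall back to dense."""
    sparse_ok = want_sparse and pos_label != 0 and neg_label == 0
    return LabelBinarizer(neg_label=neg_label, pos_label=pos_label,
                          sparse_output=sparse_ok)


dense_lb = make_binarizer(neg_label=1, pos_label=2).fit(y_train)   # dense fallback
sparse_lb = make_binarizer(neg_label=0, pos_label=2).fit(y_train)  # CSR allowed
print(dense_lb.transform([1, 6]))
print(sparse_lb.transform([1, 6]).toarray())
```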

In [19]: from sklearn import preprocessing
...: lb = preprocessing.LabelBinarizer(neg_label=0,pos_label=2,sparse_output=True)
...: lb.fit([1, 2, 6, 4, 2])
...: lb.transform([1, 6])
...:
Out[19]:
<2x4 sparse matrix of type '<class 'numpy.int32'>'
with 2 stored elements in Compressed Sparse Row format>

In [20]: lb.transform([1, 6]).toarray()
Out[20]:
array([[2, 0, 0, 0],
[0, 0, 0, 2]])

When the training labels are binary (two classes), a single-column array is returned:
In [21]: from sklearn import preprocessing
...: lb = preprocessing.LabelBinarizer(neg_label=2,pos_label=4)
...: lb.fit_transform(['yes', 'no', 'no', 'yes'])
...:
Out[21]:
array([[4],
[2],
[2],
[4]])

For multilabel classification (fitting on a 2-D 0/1 indicator matrix):
In [22]: from sklearn import preprocessing
...: import numpy as np
...: lb = preprocessing.LabelBinarizer(neg_label=2,pos_label=4)
...: lb.fit(np.array([[0, 1, 0, 1, 0],[1,1,0,0,1]]))
...: lb.classes_
...:
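Out[22]: array([0, 1, 2, 3, 4])

Here classes_ holds the column indices of the 5-column indicator matrix, not the values it contains (which are only 0 and 1).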

In [23]: lb.transform([1, 6])
Out[23]:
array([[2, 4, 2, 2, 2],
[2, 2, 2, 2, 2]])

In [24]: lb.y_type_
Out[24]: 'multilabel-indicator'
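LabelBinarizer also provides inverse_transform, which maps encoded output back to the original labels. The sketch below, assuming a recent scikit-learn, round-trips the binary and multilabel examples above. When no threshold is passed, values are compared against the midpoint of neg_label and pos_label (3 here):

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Binary case: one output column; values above the threshold map to classes_[1].
lb_bin = LabelBinarizer(neg_label=2, pos_label=4)
Y_bin = lb_bin.fit_transform(["yes", "no", "no", "yes"])
print(lb_bin.y_type_, lb_bin.classes_)  # binary ['no' 'yes']
print(lb_bin.inverse_transform(Y_bin))  # ['yes' 'no' 'no' 'yes']

# Multilabel case: classes_ are the column indices of the indicator matrix.
indicator = np.array([[0, 1, 0, 1, 0], [1, 1, 0, 0, 1]])
lb_ml = LabelBinarizer(neg_label=2, pos_label=4)
Y_ml = lb_ml.fit_transform(indicator)
print(Y_ml)                             # 0 -> neg_label (2), 1 -> pos_label (4)
print(lb_ml.inverse_transform(Y_ml))    # back to the 0/1 indicator matrix
```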