
Machine Learning----XGBoost Study Notes

2017-07-13 19:56
1. Using XGBoost for feature combination

1) XGBModel.apply(self, X, ntree_limit=0)

Returns the predicted leaf index of every tree for each sample.

X: the features matrix (here, the training-set features).

ntree_limit: limit on the number of trees used at prediction time; defaults to 0 (use all trees).

def apply(self, X, ntree_limit=0):
    """Return the predicted leaf every tree for each sample.

    Parameters
    ----------
    X : array_like, shape=[n_samples, n_features]
        Input features matrix.

    ntree_limit : int
        Limit number of trees in the prediction; defaults to 0 (use all trees).

    Returns
    -------
    X_leaves : array_like, shape=[n_samples, n_trees]
        For each datapoint x in X and for each tree, return the index of the
        leaf x ends up in. Leaves are numbered within
        ``[0; 2**(self.max_depth+1))``, possibly with gaps in the numbering.
    """
    test_dmatrix = DMatrix(X, missing=self.missing)
    return self.get_booster().predict(test_dmatrix,
                                      pred_leaf=True,
                                      ntree_limit=ntree_limit)
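
As a quick usage sketch (my own example, not from the original post; the toy data and parameter values are assumptions), training a small model and calling apply() gives one leaf index per tree for each sample:

import numpy as np
from xgboost import XGBClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 5)                       # 100 samples, 5 features
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # toy binary target

model = XGBClassifier(n_estimators=10, max_depth=3)
model.fit(X, y)

leaves = model.apply(X)    # shape = (n_samples, n_trees) = (100, 10)
print(leaves.shape)
print(leaves[0])           # leaf index of sample 0 in each of the 10 trees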


Difference between GBDT and GBDT+LR

My understanding is as follows:

GBDT: each new tree fits the residual between the true values and the sum of the previous trees' predictions. For example, the third tree fits (y - ŷ₁) - ŷ₂, where ŷ₁ and ŷ₂ are the outputs of the first two trees; see the sketch below.
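
To make the residual fitting concrete, here is a minimal sketch (my own illustration, not from the original post) that boosts two regression trees by hand; the dataset and tree depths are arbitrary assumptions:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = 3 * X[:, 0] + np.sin(5 * X[:, 1])   # toy regression target

tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
r1 = y - tree1.predict(X)               # residual after tree 1: y - ŷ₁

tree2 = DecisionTreeRegressor(max_depth=2).fit(X, r1)
r2 = r1 - tree2.predict(X)              # residual after tree 2: (y - ŷ₁) - ŷ₂

# Tree 3 would be fit on r2, and so on; the final prediction is the sum
# of all trees' outputs. The mean squared residual shrinks each round:
print(np.mean(r1 ** 2), np.mean(r2 ** 2))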

GBDT+LR: take the leaf that each tree of the GBDT assigns to a sample, then recombine these per-tree leaf indicators linearly with a logistic regression, which automatically learns a weight for each leaf.
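
Below is a minimal end-to-end sketch of this combination (my own illustration; the toy dataset and model sizes are assumptions): the leaf indices returned by apply() are one-hot encoded so that the logistic regression learns one weight per leaf.

import numpy as np
from xgboost import XGBClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.rand(1000, 5)
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # toy binary target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbdt = XGBClassifier(n_estimators=30, max_depth=3)
gbdt.fit(X_train, y_train)

# Each sample becomes a vector of leaf indices, one entry per tree ...
leaves_train = gbdt.apply(X_train)
leaves_test = gbdt.apply(X_test)

# ... which the one-hot encoder turns into sparse binary features,
# so the LR learns a separate weight for every leaf of every tree.
enc = OneHotEncoder(handle_unknown="ignore").fit(leaves_train)
lr = LogisticRegression(max_iter=1000)
lr.fit(enc.transform(leaves_train), y_train)
print(lr.score(enc.transform(leaves_test), y_test))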

I am still a beginner; these notes are for my own later review and are revised as I keep learning. If anything here is incorrect, please point it out.