scikit-learn: topics I have actually used in real projects (a summary)
2015-07-27 08:34
0. Common to all projects:
/article/1323252.html (dataset format and predictors)
/article/1476350.html (loading your own raw data)
(loading an entire corpus, suited to text classification problems)
/article/1323246.html (loading the built-in public datasets)
(loading many of the common public datasets; 5. Dataset loading utilities)
/article/1323254.html (Choosing the right estimator: which estimator suits your problem?)
(a single chart that suggests which estimator to start with for your problem, which cuts down on trial and error)
/article/1323248.html (training a classifier, predicting new data, evaluating the classifier)
/article/1476348.html (using a Pipeline to chain vectorizer => transformer => classifier, plus grid search for parameter tuning)
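The vectorizer => transformer => classifier chain and the grid search from the last entry can be sketched roughly as follows. This is a minimal illustration and not the code from the linked article: the toy corpus, the step names ("vect", "tfidf", "clf"), the choice of MultinomialNB, and the parameter grid are all assumptions made for the example, and it assumes a recent scikit-learn where GridSearchCV lives in sklearn.model_selection.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy corpus invented for illustration: 1 = sports, 0 = cooking.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the goalkeeper",
    "fans cheered after the final match",
    "simmer the sauce with garlic and basil",
    "bake the bread in a hot oven",
    "chop the onions and fry them in butter",
    "season the soup with salt and pepper",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# One object that goes from raw text to a prediction.
text_clf = Pipeline([
    ("vect", CountVectorizer()),    # raw text -> term counts
    ("tfidf", TfidfTransformer()),  # term counts -> tf-idf weights
    ("clf", MultinomialNB()),       # tf-idf weights -> class label
])

# Parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": [True, False],
    "clf__alpha": [0.1, 1.0],
}

# cv=2 only because the toy corpus is tiny.
search = GridSearchCV(text_clf, param_grid, cv=2)
search.fit(docs, labels)

predicted = search.predict(["the striker scored a goal",
                            "simmer the sauce with garlic"])
print(search.best_params_)
print(predicted)
```

Because the whole chain is a single estimator, the grid search refits the vectorizer inside each cross-validation fold, so vocabulary and idf statistics are never computed on held-out documents.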
1. Used for text classification:
/article/1323249.html (extracting features from text files: tf, idf)
(CountVectorizer, TfidfTransformer)
/article/1323247.html (what CountVectorizer does when it extracts tf)
(a close look at the processing steps CountVectorizer performs, as a guide for writing custom preprocessing)
/article/1476346.html (LSA (latent semantic analysis) via TruncatedSVD)
(LSA and LDA analysis)
(not scikit-learn) /article/1476344.html ("textanalytics" course summary (1): two kinds of word relations, paradigmatic vs. syntagmatic)
(not scikit-learn) /article/1476343.html ("textanalytics" course summary (1): two kinds of word relations, paradigmatic vs. syntagmatic (continued))
(word-level relations: paradigmatic (words of the same kind that can substitute for each other; mined with tf-idf-based similarity) vs. syntagmatic (words that co-occur; mined with mutual information))
(not scikit-learn) /article/1476351.html (feature selection methods: TF-IDF, CHI, and IG)
(the pitfalls of using TF-IDF for feature selection, and how chi-square and information gain are applied to feature selection)
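As a rough illustration of the "LSA via TruncatedSVD" entry above, the sketch below builds a tf-idf matrix and projects it onto two latent dimensions. The four documents and the choice of n_components=2 are assumptions made for the example; TfidfVectorizer is used here as a shortcut for CountVectorizer followed by TfidfTransformer.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents invented for illustration.
docs = [
    "the cat sat on the mat",
    "a cat and a dog played on the mat",
    "stock prices fell on the market today",
    "the market rallied as stock prices rose",
]

# Sparse tf-idf matrix of shape (n_docs, n_terms).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# TruncatedSVD works directly on sparse input (it does not center the data),
# which is why it is the usual choice for LSA on term-document matrices.
svd = TruncatedSVD(n_components=2, random_state=0)
X_lsa = svd.fit_transform(X)  # dense array of shape (n_docs, 2)

print(X.shape, "->", X_lsa.shape)
```

Each row of X_lsa is a document in the reduced "topic" space, and each row of svd.components_ gives the term weights of one latent dimension.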
2. Used for data preprocessing (4. Dataset transformations):
/article/1323244.html (4.1. Pipeline and FeatureUnion: combining estimators)
(combining features with a predictor; combining features with other features)
/article/1323243.html (4.2. Feature extraction; note that this is feature extraction, not feature selection)
(loading features from dicts, feature hashing, text feature extraction, image feature extraction)
/article/1323242.html (4.2.3. Text feature extraction)
(text feature extraction)
/article/1323241.html (4.3. Preprocessing data: standardization, normalization, binarization, encoding, missing values)
(Standardization, or mean removal and variance scaling (subtract the mean, then divide by the standard deviation); Normalization; Feature binarization; Encoding categorical features; Imputation of missing values)
/article/1323240.html (4.4. Unsupervised dimensionality reduction)
(PCA, Random projections, Feature agglomeration)
/article/1323236.html (4.8. Transforming the prediction target (y))
(Label binarization, Label encoding (transform non-numerical labels to numerical labels))
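Two of the transformations listed in this section, standardization of features and encoding of the target, can be sketched as below. The small arrays are made up for the example; the classes (StandardScaler, LabelEncoder, LabelBinarizer) are the ones in sklearn.preprocessing.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder, StandardScaler

# Toy feature matrix invented for illustration; columns have different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Standardization: per column, subtract the mean and divide by the
# standard deviation, giving zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Label encoding: map non-numerical labels to integers 0..n_classes-1.
# Classes are sorted, so here bird -> 0, cat -> 1, dog -> 2.
y = ["cat", "dog", "cat", "bird"]
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

# Label binarization: one indicator column per class.
binarizer = LabelBinarizer()
y_onehot = binarizer.fit_transform(y)

print(X_scaled)
print(list(encoder.classes_), list(y_encoded))
print(y_onehot)
```

In a real project the scaler should be fitted on the training split only and then applied to the test split with transform, which is what placing it inside a Pipeline does automatically.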
3. Other important topics:
/article/1323232.html (3.1. Cross-validation: evaluating estimator performance)
(cross-validation)
/article/1323231.html (3.2. Grid Search: Searching for estimator parameters)
(searching for the best combination of parameters)
/article/1323230.html (3.3. Model evaluation: quantifying the quality of predictions)
(evaluating model quality: the score function, confusion matrix, classification report, etc.)
/article/1323229.html (3.4. Model persistence)
(saving a trained model to disk: joblib.dump & joblib.load, pickle.dump & pickle.load)
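The cross-validation, evaluation, and persistence entries above fit together roughly as in the following sketch. The iris dataset, the LogisticRegression classifier, and the split sizes are assumptions chosen only to keep the example small; it also assumes a recent scikit-learn where these helpers live in sklearn.model_selection.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = LogisticRegression(max_iter=1000)

# Cross-validation: five accuracy scores estimated on the training split.
scores = cross_val_score(clf, X_train, y_train, cv=5)

# Final fit and evaluation on the held-out split.
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)      # rows: true class, columns: predicted
report = classification_report(y_test, y_pred)

# Persistence: serialize the fitted model and load it back.
blob = pickle.dumps(clf)
restored = pickle.loads(blob)

print(scores.mean())
print(cm)
print(report)
```

joblib.dump(clf, path) and joblib.load(path) follow the same pattern for files on disk and are usually more efficient for models that hold large NumPy arrays. Either way, only load model files from a trusted source, and load them with the same scikit-learn version that wrote them.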
4. Commonly used supervised and unsupervised models:
/article/1476347.html (matrix factorization problems)
/article/1323226.html (scikit-learn (an introduction to the models used relatively often in engineering work): 1.4. Support Vector Machines)
SVM (SVC, SVR)
/article/1323225.html (scikit-learn (an introduction to the models used relatively often in engineering work): 1.11. Ensemble methods)
Bagging meta-estimator, Forests of randomized trees, AdaBoost, Gradient Tree Boosting (Gradient Boosted Regression Trees (GBRT))
/article/1323224.html (scikit-learn (an introduction to the models used relatively often in engineering work): 1.12. Multiclass and multilabel algorithms)
Multiclass classification, Multilabel classification, Multioutput-multiclass classification and multi-task classification
/article/1323223.html (scikit-learn (an introduction to the models used relatively often in engineering work): 1.13. Feature selection)
Univariate feature selection, recursive feature elimination, L1-based / tree-based feature selection (these are also used quite often), Feature selection as part of a pipeline
/article/1323222.html (scikit-learn (an introduction to the models used relatively often in engineering work): 1.14. Semi-Supervised)
/article/1323220.html (scikit-learn (an introduction to the models used relatively often in engineering work): 2.3. Clustering (can also be used for unsupervised dimensionality reduction of features))
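For the feature selection entry in this section (1.13), a minimal sketch of univariate selection, both standalone and as a pipeline step, might look like this. The iris dataset, the chi-square scoring function, k=2, and the linear SVC are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 150 samples, 4 non-negative features

# Univariate selection: score each feature against y with the chi-square
# statistic and keep the 2 highest-scoring ones.
selector = SelectKBest(chi2, k=2)
X_new = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)  # column indices that survived

# Feature selection as part of a pipeline: the selector is refitted
# whenever the pipeline is fitted, e.g. inside each cross-validation fold.
clf = Pipeline([
    ("select", SelectKBest(chi2, k=2)),
    ("svc", SVC(kernel="linear")),
])
clf.fit(X, y)

print(X.shape, "->", X_new.shape, "kept columns:", kept)
```

chi2 requires non-negative feature values, which holds for counts and tf-idf weights; for other data a different score function such as f_classif is the safer default.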