Computing TF-IDF, Part 2
2011-08-29 22:51
TF (Term Frequency) formula: TF i,j = Freq i,j / max Freq j
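In this formula, Freq i,j is the number of times term i occurs in document dj, and max Freq j is the largest number of times any single term occurs in dj (not the total number of words in dj).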
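As a hand-worked illustration, suppose dj consists of the six tokens a b a c a b. This tiny document is made up for the example, and max Freq j is taken to be the largest single-term count, as the comments in the code describe:

```latex
\begin{aligned}
\mathrm{Freq}_{a,j} &= 3, \quad \mathrm{Freq}_{b,j} = 2, \quad \mathrm{Freq}_{c,j} = 1, \quad \max \mathrm{Freq}_j = 3 \\
\mathrm{TF}_{a,j} &= 3/3 = 1, \quad \mathrm{TF}_{b,j} = 2/3 \approx 0.667, \quad \mathrm{TF}_{c,j} = 1/3 \approx 0.333
\end{aligned}
```

The most frequent term always gets a TF of 1, and every other term is scaled relative to it.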
import java.util.ArrayList;
import java.util.HashMap;

class TFs
{
    // One term -> TF map per document, in the same order as the input documents.
    ArrayList<HashMap<String,Double>> TFsList = new ArrayList<HashMap<String,Double>>();
    // Each inner list holds the tokens of one document.
    ArrayList<ArrayList<String>> TFsMainFileList = new ArrayList<ArrayList<String>>();

    public TFs(ArrayList<ArrayList<String>> tf)
    {
        TFsMainFileList = tf;
    }

    public ArrayList<HashMap<String,Double>> PrintTFs()
    {
        for(int i=0; i<TFsMainFileList.size(); i++)
        {
            // TermTF holds the raw count of each term in document i.
            HashMap<String,Double> TermTF = new HashMap<String,Double>();
            // saveTF holds the normalized TF of each term in document i.
            HashMap<String,Double> saveTF = new HashMap<String,Double>();
            ArrayList<String> TFsSubFileList = TFsMainFileList.get(i);
            // max Freq j: the largest number of times any term occurs in document j.
            int TermMaxFreq = 0;
            for(int j=0; j<TFsSubFileList.size(); j++)
            {
                String term = TFsSubFileList.get(j);
                double value = TermTF.containsKey(term) ? TermTF.get(term) + 1 : 1.0;
                TermTF.put(term, value);
                // Update the maximum on every occurrence, including the first,
                // so it is never 0 (and never divides by zero) for a non-empty document.
                if(value > TermMaxFreq)
                {
                    TermMaxFreq = (int)(value);
                }
            }
            for(int v=0; v<TFsSubFileList.size(); v++)
            {
                String term = TFsSubFileList.get(v);
                if(!saveTF.containsKey(term))
                {
                    // Freq i,j: the number of times term i occurs in document j.
                    double TermFreq = TermTF.get(term);
                    double tfs = TermFreq / (double)TermMaxFreq;
                    saveTF.put(term, tfs);
                }
            }
            TFsList.add(saveTF);
        }
        return TFsList;
    }
}
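For checking results by hand, the same max-normalized TF can be sketched for a single document in a few lines. This is only an illustrative sketch, not the class from this post: the names TfSketch and termFrequencies are hypothetical, and it assumes the document has already been split into tokens.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TfSketch {
    // Hypothetical helper (not part of the original post):
    // returns term -> TF for one document, where
    // TF = count of the term / largest count of any term in the document.
    static Map<String, Double> termFrequencies(List<String> tokens) {
        // Count how often each term occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        Map<String, Double> tf = new HashMap<>();
        if (counts.isEmpty()) {
            return tf; // empty document: nothing to normalize
        }
        // max Freq j: the largest single-term count in this document.
        int maxFreq = Collections.max(counts.values());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            tf.put(e.getKey(), e.getValue() / (double) maxFreq);
        }
        return tf;
    }

    public static void main(String[] args) {
        // "tf" occurs twice, the other terms once, so max Freq j is 2.
        System.out.println(termFrequencies(Arrays.asList("tf", "idf", "tf", "weight")));
        // Every term occurs exactly once, so max Freq j is 1.
        System.out.println(termFrequencies(Arrays.asList("x", "y", "z")));
    }
}
```

For the first document the term "tf" gets 2/2 = 1.0, while "idf" and "weight" each get 1/2 = 0.5. In the second document every term occurs once, so max Freq j is 1 and every TF is 1.0. That case is worth testing because a maximum left at 0 would turn the division into a divide-by-zero.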