Datasets for Link-based Classification
2015-07-14 16:18
Datasets
Document Classification Datasets:
CiteSeer: The CiteSeer dataset consists of 3312 scientific publications, each classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words. The README file in the dataset provides more details. The tarball containing the dataset can be downloaded from the project page linked at the end of this post.
Cora: The Cora dataset consists of 2708 scientific publications, each classified into one of seven classes. The citation network consists of 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1433 unique words. The README file in the dataset provides more details. The tarball containing the dataset can be downloaded from the project page linked at the end of this post.
WebKB: The WebKB dataset consists of 877 web pages, each classified into one of five classes. The hyperlink network consists of 1608 links. Each web page in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 1703 unique words. The README file in the dataset provides more details. The tarball containing the dataset can be downloaded from the project page linked at the end of this post.
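The three datasets above share the same shape: one 0/1 word vector and one class label per document, plus a list of links between documents. The following is a minimal sketch of how such data could be read into a feature matrix and an edge list. It assumes a tab-separated layout with one row per document (`<id> <0/1 flags...> <label>`) and one row per link (`<id> <id>`); the README in each tarball is the authoritative description of the actual files. The function names `parse_content` and `parse_links` and the sample rows are hypothetical and made up for illustration.

```python
from typing import Iterable, List, Sequence, Tuple


def parse_content(lines: Iterable[str]) -> Tuple[List[str], List[List[int]], List[str]]:
    """Parse rows of the assumed form '<id>\\t<0/1 word flags...>\\t<class label>'."""
    ids, features, labels = [], [], []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        ids.append(parts[0])
        features.append([int(v) for v in parts[1:-1]])  # binary word vector
        labels.append(parts[-1])
    return ids, features, labels


def parse_links(lines: Iterable[str], known_ids: Sequence[str]) -> List[Tuple[int, int]]:
    """Parse rows of the assumed form '<id> <id>' into pairs of row indices.

    Links that mention an id without a feature row are dropped.
    """
    index = {doc_id: i for i, doc_id in enumerate(known_ids)}
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        a, b = parts
        if a in index and b in index:
            edges.append((index[a], index[b]))
    return edges


# Made-up rows with a 4-word dictionary (not taken from the real datasets).
content_rows = [
    "p1\t0\t1\t0\t1\tTheory",
    "p2\t1\t0\t0\t0\tAgents",
    "p3\t0\t0\t1\t1\tTheory",
]
link_rows = ["p1\tp2", "p3\tp1", "p9\tp1"]  # p9 has no content row

ids, X, y = parse_content(content_rows)
edges = parse_links(link_rows, ids)
print(len(ids), len(X[0]), edges)
```

For real files, the same functions could be fed an open file handle instead of the in-memory lists, since both only iterate over lines.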
Social Network Datasets:
Terrorists: This dataset contains information about terrorists and their relationships. Unlike the previous datasets, it was designed for experiments that classify the relationships among terrorists rather than the individuals themselves. The dataset contains 851 relationships, each described by a 0/1-valued vector of attributes where each entry indicates the absence/presence of a feature. There are a total of 1224 distinct features. Each relationship can be assigned one or more of four possible labels, which makes this dataset suitable for multi-label classification tasks. The README file provides more details. The tarball containing the dataset can be downloaded from the project page linked at the end of this post.
Terrorist Attacks: This dataset consists of 1293 terrorist attacks, each assigned one of 6 labels indicating the type of the attack. Each attack is described by a 0/1-valued vector of attributes whose entries indicate the absence/presence of a feature. There are a total of 106 distinct features. The files in the dataset can be used to create two distinct graphs. The README file in the dataset provides more details. The tarball containing the dataset can be downloaded from the project page linked at the end of this post.
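Because each relationship in the Terrorists dataset can carry more than one of its four labels, a common way to prepare such targets is a binary indicator matrix with one column per label. The sketch below illustrates that encoding only; the label names `label_a` to `label_d`, the function `to_indicator`, and the sample label sets are placeholders, not the names or data used in the dataset.

```python
from typing import List, Sequence, Set


def to_indicator(label_sets: Sequence[Set[str]], classes: Sequence[str]) -> List[List[int]]:
    """Turn one set of labels per item into a 0/1 matrix with one column per class."""
    column = {name: j for j, name in enumerate(classes)}
    matrix = []
    for labels in label_sets:
        row = [0] * len(classes)
        for name in labels:
            row[column[name]] = 1  # mark every label the item carries
        matrix.append(row)
    return matrix


# Placeholder label names and made-up label sets for three relationships.
classes = ["label_a", "label_b", "label_c", "label_d"]
relationships = [
    {"label_a"},
    {"label_b", "label_d"},
    {"label_a", "label_c", "label_d"},
]

Y = to_indicator(relationships, classes)
for row in Y:
    print(row)
```

Each column of `Y` can then be treated as a separate binary classification target, which is one simple way to approach a multi-label task.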
More: http://www.cs.umd.edu/~sen/lbc-proj/LBC.html