
Playing with SocioPatterns large-scale data: dynamic data

2019-04-12 11:17

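The SocioPatterns website provides a series of datasets collected with the SocioPatterns sensing platform. I spend a lot of time with students and enjoy analyzing their data, so today I'll dig into the dynamic contact data of high school students.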

High school dynamic contact networks

  1. Data description: These datasets contain the temporal network of contacts between students in a high school in Marseilles, France.
  • The first dataset gives the contacts of the students of 3 classes during 4 days in Dec. 2011.
  • The second corresponds to the contacts of the students of 5 classes during 7 days (from a Monday to the Tuesday of the following week) in Nov. 2012.
  • Each Contact list file contains a tab-separated list representing the active contacts during 20-second intervals of the data collection.
  • Each line has the form “t i j Ci Cj”, where i and j are the anonymous IDs of the persons in contact, Ci and Cj are their classes, and the interval during which this contact was active is [t – 20 s, t]. If multiple contacts are active in a given interval, you will see multiple lines starting with the same value of t. Time is measured in seconds.
  • Each metadata file contains a tab-separated list in which each line of the form “i Ci Gi” gives class Ci and gender Gi of the person having ID i.
  2. Sharing and use of the data must follow the usage guidelines and cite the following article:
  • J. Fournet, A. Barrat, Contact patterns among high school students, PLoS ONE 9(9):e107878 (2014).
  Please also acknowledge the SocioPatterns collaboration and provide a link to:
  • http://www.sociopatterns.org

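So when using the data, there are four things to do: follow the guidelines, cite the article, acknowledge the SocioPatterns collaboration, and provide the link.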
  3. Next, download the data and inspect the raw files, taking the 2011 dataset as an example:

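  • What is in metadata_2011.txt? 126 rows and 3 columns of tab-separated data, in the format:
    ID	class	gender
    The first column is a number, the ID i of the person. The second column is a string giving that person's class; teachers are marked as teacher. The third column is simple: F for female, M for male.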
  • And thiers_2011.csv? 28561 rows of tab-separated data in the “t i j Ci Cj” format described above, so I won't repeat it here.
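A minimal sketch of reading both files with the Python standard library, assuming they have been downloaded and decompressed and follow the layouts described above (“i Ci Gi” for metadata, “t i j Ci Cj” for contacts). The helpers `parse_metadata`, `parse_contacts`, `window` and `gender_counts` are hypothetical names for illustration, not part of any SocioPatterns tool. They accept any iterable of lines, so an open file object works directly.

```python
from collections import Counter, namedtuple
from typing import Dict, Iterable, List, Tuple

# One active contact: persons i and j (classes ci, cj) were in contact during [t - 20, t].
Contact = namedtuple("Contact", ["t", "i", "j", "ci", "cj"])

INTERVAL = 20  # seconds; each contact line covers the 20-second window ending at t


def parse_metadata(lines: Iterable[str]) -> Dict[int, Tuple[str, str]]:
    """Map person ID -> (class, gender) from lines of the form 'i Ci Gi'."""
    people = {}
    for line in lines:
        fields = line.split()  # splits on tabs (and tolerates stray spaces)
        if not fields:
            continue  # skip blank lines
        pid, cls, gender = fields
        people[int(pid)] = (cls, gender)
    return people


def parse_contacts(lines: Iterable[str]) -> List[Contact]:
    """Parse lines of the form 't i j Ci Cj' into Contact records."""
    contacts = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        t, i, j, ci, cj = fields
        contacts.append(Contact(int(t), int(i), int(j), ci, cj))
    return contacts


def window(c: Contact) -> Tuple[int, int]:
    """Return the active interval [t - 20, t] of a contact, in seconds."""
    return (c.t - INTERVAL, c.t)


def gender_counts(people: Dict[int, Tuple[str, str]]) -> Counter:
    """Count people per gender label (e.g. 'F', 'M')."""
    return Counter(gender for _, gender in people.values())
```

For example, `with open("metadata_2011.txt", encoding="utf-8") as f: people = parse_metadata(f)` should give one entry per metadata row; the contact-list file can be passed to `parse_contacts` the same way.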

How can these data be used?

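  1. Just leaving a placeholder here for now. The answer is obvious: design algorithms. First decide what analysis you want to do, then design the corresponding algorithm.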
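As one possible starting point (an illustrative choice, not an analysis from this post), the sketch below does two common things with the 20-second records: it sums the total contact time of each pair, collapsing the temporal network into a weighted static one, and it merges records of the same pair whose windows are back to back (timestamps exactly 20 s apart) into contact events. It assumes the records are `(t, i, j)` tuples and treats contacts as undirected; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

INTERVAL = 20  # seconds covered by each record: [t - 20, t]

Pair = Tuple[int, int]


def pair_key(i: int, j: int) -> Pair:
    """Contacts are undirected, so store each pair as (smaller ID, larger ID)."""
    return (i, j) if i < j else (j, i)


def cumulative_contact_time(records: Iterable[Tuple[int, int, int]]) -> Dict[Pair, int]:
    """Total seconds each pair spent in contact: 20 s per record."""
    total: Dict[Pair, int] = defaultdict(int)
    for _, i, j in records:
        total[pair_key(i, j)] += INTERVAL
    return dict(total)


def contact_events(records: Iterable[Tuple[int, int, int]]) -> Dict[Pair, List[Tuple[int, int]]]:
    """Merge back-to-back 20 s windows of the same pair into (start, end) events."""
    times: Dict[Pair, List[int]] = defaultdict(list)
    for t, i, j in records:
        times[pair_key(i, j)].append(t)

    events: Dict[Pair, List[Tuple[int, int]]] = {}
    for pair, ts in times.items():
        ts = sorted(set(ts))  # drop duplicate timestamps, order in time
        runs = []
        start = prev = ts[0]
        for t in ts[1:]:
            if t - prev == INTERVAL:
                prev = t  # this window directly continues the current event
            else:
                runs.append((start - INTERVAL, prev))  # close event [first start, last t]
                start = prev = t
        runs.append((start - INTERVAL, prev))
        events[pair] = runs
    return events
```

Merging only when timestamps differ by exactly 20 s is the strictest rule; some analyses tolerate short gaps, which would only change the `t - prev == INTERVAL` test.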