
EachMovie data set description

2015-11-02 20:42 (369 views)
From: (http://www.research.digital.com/SRC/eachmovie/)


EachMovie collaborative filtering data set

Contents

Introduction

Terms of usage

Schema

Obtaining the data set

Introduction

The DEC Systems Research Center ran the EachMovie recommendation service for 18 months to experiment with a collaborative filtering algorithm. During that time, some 72916 users entered a total of 2811983 numeric ratings for 1628 different movies (films and videos). We are making this preference data set available, with all user identification removed, so that other collaborative filtering researchers can use it to test their algorithms.

If you are interested in the design of our system, you can read the Each to Each Programmer's Reference Manual written by Paul McJones and John DeTreville.

Terms of usage

Copyright © Digital Equipment Corporation 1997.

The preference data set was compiled by Digital Equipment Corporation using our collaborative filtering technology. Digital is making the data set available for use
under the terms that apply to this Digital web site (see Legal) including the following terms:

1. All information is provided "AS IS". Digital makes no warranties or representations with respect to the completeness or accuracy of the information or otherwise.
DIGITAL DISCLAIMS ALL WARRANTIES WITH REGARD TO THE INFORMATION, INCLUDING ANY IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. In no event shall Digital be liable for damages, and in particular Digital shall not be liable for special, indirect, consequential, or incidental damages, or damages
for lost profits, loss of revenue, or loss of use, arising out of or related to the information or the use or dissemination thereof, whether such damages arise in contract, negligence, tort, under statute, in equity, at law or otherwise.

3. The user may use the information only for research purposes which are non-commercial and non-revenue bearing. Any published research results or other publications
resulting from use of the information shall credit Digital Equipment Corporation as the provider of the data. The user agrees to provide Digital with a copy of any such publication using any of the contact names provided at this web site. The user may make
copies of the data set as needed for internal use only for the preceding purposes. All such copies shall duplicate Digital's copyright notice and this notice.

Schema

The data set is available as eachmoviedata.tar.gz (zipped tab-separated-value text files, 17632000 bytes compressed). There are three tables, one per file:

Person (person.txt) provides optional, unaudited demographic data supplied by each person:

ID: Number -- primary key

Age: Number

Gender: Text -- one of "M", "F"

Zip_Code: Text
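
Because the demographic fields are optional, a reader of person.txt has to cope with blanks. The following is a minimal sketch, assuming that a missing value appears as an empty field and that Age is stored as a whole number; neither detail is stated above. The names Person and parse_person are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Person:
    id: int
    age: Optional[int]
    gender: Optional[str]
    zip_code: Optional[str]  # kept as text so leading zeros survive


def parse_person(fields: Sequence[str]) -> Person:
    """Convert one row of person.txt (four text fields) into a Person.

    Assumption: a value the person did not supply is an empty field.
    """
    person_id, age, gender, zip_code = (field.strip() for field in fields)
    return Person(
        id=int(person_id),
        age=int(age) if age else None,
        # The schema allows only "M" or "F"; anything else is treated as missing.
        gender=gender if gender in ("M", "F") else None,
        zip_code=zip_code or None,
    )
```

Keeping Zip_Code as text follows the schema, which declares it as Text rather than Number.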

Movie (movie.txt) provides descriptive information about each movie:

ID: Number -- primary key

Name: Text

PR_URL: Text -- URL of studio PR site

IMDb_URL: Text -- URL of Internet Movie Database entry

Theater_Status: Text -- either "old" or "current"

Theater_Release: Date/Time

Video_Status: Text -- either "old" or "current"

Video_Release: Date/Time

Action, Animation, Art_Foreign, Classic, Comedy, Drama, Family, Horror, Romance, Thriller: Yes/No

IMDb URLs are provided by courtesy of Internet Movie Database.

The theater and video status and release dates were (approximately) correct in the San Francisco bay area as of September 15, 1997, when EachMovie was terminated.

Vote (vote.txt) is the actual rating data:

Person_ID: Number

Movie_ID: Number

Score: Number -- 0 <= Score <= 1

Weight: Number -- 0 < Weight <= 1

Modified: Date/Time
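
The ranges given for Score and Weight can be checked while reading vote.txt. This is a sketch only: it assumes the five fields appear in the order listed above, and it keeps Modified as raw text because the date format is not specified here. Vote and parse_vote are hypothetical names.

```python
from typing import NamedTuple, Sequence


class Vote(NamedTuple):
    person_id: int
    movie_id: int
    score: float   # 0 <= score <= 1
    weight: float  # 0 < weight <= 1
    modified: str  # left unparsed; the date format is not documented here


def parse_vote(fields: Sequence[str]) -> Vote:
    """Convert one row of vote.txt into a Vote, enforcing the schema ranges."""
    person_id, movie_id, score, weight, modified = fields
    vote = Vote(int(person_id), int(movie_id), float(score), float(weight), modified)
    if not 0.0 <= vote.score <= 1.0:
        raise ValueError(f"Score out of range: {vote.score}")
    if not 0.0 < vote.weight <= 1.0:
        raise ValueError(f"Weight out of range: {vote.weight}")
    return vote
```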

Score is the rating provided by this person for this movie. The zero-to-five star rating used externally on EachMovie is mapped linearly to the interval [0,1]. Here's a histogram of the Score values:

Score    Count
0        347191
0.2      150495
0.4      339718
0.6      701236
0.8      761676
1.0      511667
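
The linear mapping means a rating of n stars is stored as n/5. The sketch below converts in both directions and totals the histogram; the counts sum to 2811983, the number of ratings given in the Introduction. The function and variable names are illustrative only.

```python
# Histogram copied from the table above: Score -> Count.
SCORE_COUNTS = {
    0.0: 347191,
    0.2: 150495,
    0.4: 339718,
    0.6: 701236,
    0.8: 761676,
    1.0: 511667,
}


def stars_to_score(stars: int) -> float:
    """Map a zero-to-five star rating linearly onto [0, 1]."""
    if not 0 <= stars <= 5:
        raise ValueError(f"stars must be between 0 and 5, got {stars}")
    return stars / 5


def score_to_stars(score: float) -> int:
    """Invert the mapping; round() absorbs floating-point noise."""
    return round(score * 5)


total_votes = sum(SCORE_COUNTS.values())
mean_stars = sum(score_to_stars(s) * n for s, n in SCORE_COUNTS.items()) / total_votes
```

Note that the Score 0 row counts both zero-star ratings and "sounds awful" declarations, so mean_stars computed this way treats the two alike.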

Weight is only relevant in the case of a Score of zero, in which case it distinguishes whether the person rated a movie as zero stars (weight = 1) or "sounds awful" (weight < 1). (Most "sounds awful" weights are 0.2, but for historical reasons about 10% are 0.5.) The idea behind "sounds awful" was to let a user indicate he never planned to see a movie (hence we would omit it from future lists of predictions). Our collaborative filtering algorithm treated such a declaration as less authoritative than a regular rating of zero stars.
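
Anyone reusing the data may want to separate the two kinds of zero. The rule described above can be sketched as follows; the labels returned and the function name are hypothetical, and how to treat "sounds awful" votes afterwards (drop them, down-weight them, or keep them) is left to the researcher.

```python
def vote_kind(score: float, weight: float) -> str:
    """Classify a vote using the Score and Weight rule described above.

    Weight only matters when Score is zero: weight = 1 is a real
    zero-star rating, weight < 1 is a "sounds awful" declaration.
    """
    if score == 0 and weight < 1:
        return "sounds awful"
    return "rating"


# Example: keep only votes that are actual star ratings.
sample = [(0.0, 1.0), (0.0, 0.2), (0.0, 0.5), (0.8, 1.0)]
ratings_only = [(s, w) for s, w in sample if vote_kind(s, w) == "rating"]
```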

Given our site design, there is no way to know whether the person had seen the movie in a theater or on video.
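
The three tables can be read directly from the compressed archive without unpacking it to disk. The sketch below makes several assumptions that are not stated in this description: that member names inside the archive end in person.txt, movie.txt and vote.txt (possibly under a directory), that the text decodes as Latin-1, and that fields contain no quoting. It returns every row as a list of strings and does not try to detect a header row. read_table is a hypothetical helper name.

```python
import csv
import io
import tarfile
from typing import List


def read_table(archive_path: str, member_suffix: str) -> List[List[str]]:
    """Return the rows of the first archive member whose name ends with member_suffix."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                raw = archive.extractfile(member)
                # Assumption: Latin-1 text; it never fails to decode.
                text = io.TextIOWrapper(raw, encoding="latin-1", newline="")
                reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
                # Materialize the rows before the archive is closed.
                return list(reader)
    raise FileNotFoundError(f"no member ending in {member_suffix!r}")
```

For example, read_table("eachmoviedata.tar.gz", "vote.txt") would return one list of five strings per vote, provided the assumptions above hold for the real archive.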

Obtaining the data set

If you have read the terms above, and agree to them, contact

Steve Glassman

<steveg@pa.dec.com>

1 650 853-2166

Compaq Systems Research Center

130 Lytton Avenue

Palo Alto, CA 94301

by telephone or email. He will give you a password for downloading the data. You may also send copies of your publications involving this data (see term 3 above) to Steve.

Legal

Digital

Developed by Digital Equipment Corporation.

Copyright © Digital Equipment Corporation, 1997.

The DIGITAL logo is a trademark of Digital Equipment Corporation.

All other trademarks are the property of their respective owners.

kumpf last updated Jul 30, 1999

Reposted from: http://www.douban.com/note/502794377/