Python for Data Analysis: MovieLens 1M Data Analysis
2017-10-14 09:43
Data setup
In [26]:
```python
import os
import pandas as pd

encoding = 'latin1'

upath = os.path.expanduser('ch02/movielens/users.dat')
rpath = os.path.expanduser('ch02/movielens/ratings.dat')
mpath = os.path.expanduser('ch02/movielens/movies.dat')

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
mnames = ['movie_id', 'title', 'genres']

# '::' is a multi-character separator, which only the Python parsing engine supports
users = pd.read_csv(upath, sep='::', header=None, names=unames,
                    engine='python', encoding=encoding)
ratings = pd.read_csv(rpath, sep='::', header=None, names=rnames,
                      engine='python', encoding=encoding)
movies = pd.read_csv(mpath, sep='::', header=None, names=mnames,
                     engine='python', encoding=encoding)
```
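Because `::` is a two-character separator, `read_csv` needs the Python parsing engine. The sketch below runs the same kind of call on a small in-memory sample instead of `users.dat`; the sample string (two rows copied from the users table below) and the `dtype={'zip': str}` argument are assumptions added for illustration, not part of the original code. Forcing `zip` to text is one way to keep leading zeros such as `02460`, which an integer column would drop.

```python
import io
import pandas as pd

# Hypothetical sample in the users.dat layout: UserID::Gender::Age::Occupation::Zip
sample = "1::F::1::10::48067\n4::M::45::7::02460\n"

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_csv(
    io.StringIO(sample),
    sep='::',            # two-character separator
    engine='python',     # the C engine cannot handle multi-character separators
    header=None,         # the file has no header row
    names=unames,
    dtype={'zip': str},  # assumption: keep zip codes as text so leading zeros survive
)
print(users)
```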
In [6]:
users[:5]
Out[6]:
| | user_id | gender | age | occupation | zip |
---|---|---|---|---|---|
0 | 1 | F | 1 | 10 | 48067 |
1 | 2 | M | 56 | 16 | 70072 |
2 | 3 | M | 25 | 15 | 55117 |
3 | 4 | M | 45 | 7 | 02460 |
4 | 5 | M | 25 | 20 | 55455 |
ratings[:5]
Out[7]:
| | user_id | movie_id | rating | timestamp |
---|---|---|---|---|
0 | 1 | 1193 | 5 | 978300760 |
1 | 1 | 661 | 3 | 978302109 |
2 | 1 | 914 | 3 | 978301968 |
3 | 1 | 3408 | 4 | 978300275 |
4 | 1 | 2355 | 5 | 978824291 |
movies[:5]
Out[8]:
| | movie_id | title | genres |
---|---|---|---|
0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
4 | 5 | Father of the Bride Part II (1995) | Comedy |
ratings
Out[9]:
| | user_id | movie_id | rating | timestamp |
---|---|---|---|---|
0 | 1 | 1193 | 5 | 978300760 |
1 | 1 | 661 | 3 | 978302109 |
2 | 1 | 914 | 3 | 978301968 |
3 | 1 | 3408 | 4 | 978300275 |
4 | 1 | 2355 | 5 | 978824291 |
5 | 1 | 1197 | 3 | 978302268 |
6 | 1 | 1287 | 5 | 978302039 |
7 | 1 | 2804 | 5 | 978300719 |
8 | 1 | 594 | 4 | 978302268 |
9 | 1 | 919 | 4 | 978301368 |
10 | 1 | 595 | 5 | 978824268 |
11 | 1 | 938 | 4 | 978301752 |
12 | 1 | 2398 | 4 | 978302281 |
13 | 1 | 2918 | 4 | 978302124 |
14 | 1 | 1035 | 5 | 978301753 |
15 | 1 | 2791 | 4 | 978302188 |
16 | 1 | 2687 | 3 | 978824268 |
17 | 1 | 2018 | 4 | 978301777 |
18 | 1 | 3105 | 5 | 978301713 |
19 | 1 | 2797 | 4 | 978302039 |
20 | 1 | 2321 | 3 | 978302205 |
21 | 1 | 720 | 3 | 978300760 |
22 | 1 | 1270 | 5 | 978300055 |
23 | 1 | 527 | 5 | 978824195 |
24 | 1 | 2340 | 3 | 978300103 |
25 | 1 | 48 | 5 | 978824351 |
26 | 1 | 1097 | 4 | 978301953 |
27 | 1 | 1721 | 4 | 978300055 |
28 | 1 | 1545 | 4 | 978824139 |
29 | 1 | 745 | 3 | 978824268 |
... | ... | ... | ... | ... |
1000179 | 6040 | 2762 | 4 | 956704584 |
1000180 | 6040 | 1036 | 3 | 956715455 |
1000181 | 6040 | 508 | 4 | 956704972 |
1000182 | 6040 | 1041 | 4 | 957717678 |
1000183 | 6040 | 3735 | 4 | 960971654 |
1000184 | 6040 | 2791 | 4 | 956715569 |
1000185 | 6040 | 2794 | 1 | 956716438 |
1000186 | 6040 | 527 | 5 | 956704219 |
1000187 | 6040 | 2003 | 1 | 956716294 |
1000188 | 6040 | 535 | 4 | 964828734 |
1000189 | 6040 | 2010 | 5 | 957716795 |
1000190 | 6040 | 2011 | 4 | 956716113 |
1000191 | 6040 | 3751 | 4 | 964828782 |
1000192 | 6040 | 2019 | 5 | 956703977 |
1000193 | 6040 | 541 | 4 | 956715288 |
1000194 | 6040 | 1077 | 5 | 964828799 |
1000195 | 6040 | 1079 | 2 | 956715648 |
1000196 | 6040 | 549 | 4 | 956704746 |
1000197 | 6040 | 2020 | 3 | 956715288 |
1000198 | 6040 | 2021 | 3 | 956716374 |
1000199 | 6040 | 2022 | 5 | 956716207 |
1000200 | 6040 | 2028 | 5 | 956704519 |
1000201 | 6040 | 1080 | 4 | 957717322 |
1000202 | 6040 | 1089 | 4 | 956704996 |
1000203 | 6040 | 1090 | 3 | 956715518 |
1000204 | 6040 | 1091 | 1 | 956716541 |
1000205 | 6040 | 1094 | 5 | 956704887 |
1000206 | 6040 | 562 | 5 | 956704746 |
1000207 | 6040 | 1096 | 4 | 956715648 |
1000208 | 6040 | 1097 | 4 | 956715569 |
Merging the data
In [10]:
```python
# Join ratings with users (on user_id), then the result with movies (on movie_id)
data = pd.merge(pd.merge(ratings, users), movies)
data
```
Out[10]:
| | user_id | movie_id | rating | timestamp | gender | age | occupation | zip | title | genres |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1193 | 5 | 978300760 | F | 1 | 10 | 48067 | One Flew Over the Cuckoo's Nest (1975) | Drama |
1 | 2 | 1193 | 5 | 978298413 | M | 56 | 16 | 70072 | One Flew Over the Cuckoo's Nest (1975) | Drama |
2 | 12 | 1193 | 4 | 978220179 | M | 25 | 12 | 32793 | One Flew Over the Cuckoo's Nest (1975) | Drama |
3 | 15 | 1193 | 4 | 978199279 | M | 25 | 7 | 22903 | One Flew Over the Cuckoo's Nest (1975) | Drama |
4 | 17 | 1193 | 5 | 978158471 | M | 50 | 1 | 95350 | One Flew Over the Cuckoo's Nest (1975) | Drama |
5 | 18 | 1193 | 4 | 978156168 | F | 18 | 3 | 95825 | One Flew Over the Cuckoo's Nest (1975) | Drama |
6 | 19 | 1193 | 5 | 982730936 | M | 1 | 10 | 48073 | One Flew Over the Cuckoo's Nest (1975) | Drama |
7 | 24 | 1193 | 5 | 978136709 | F | 25 | 7 | 10023 | One Flew Over the Cuckoo's Nest (1975) | Drama |
8 | 28 | 1193 | 3 | 978125194 | F | 25 | 1 | 14607 | One Flew Over the Cuckoo's Nest (1975) | Drama |
9 | 33 | 1193 | 5 | 978557765 | M | 45 | 3 | 55421 | One Flew Over the Cuckoo's Nest (1975) | Drama |
10 | 39 | 1193 | 5 | 978043535 | M | 18 | 4 | 61820 | One Flew Over the Cuckoo's Nest (1975) | Drama |
11 | 42 | 1193 | 3 | 978038981 | M | 25 | 8 | 24502 | One Flew Over the Cuckoo's Nest (1975) | Drama |
12 | 44 | 1193 | 4 | 978018995 | M | 45 | 17 | 98052 | One Flew Over the Cuckoo's Nest (1975) | Drama |
13 | 47 | 1193 | 4 | 977978345 | M | 18 | 4 | 94305 | One Flew Over the Cuckoo's Nest (1975) | Drama |
14 | 48 | 1193 | 4 | 977975061 | M | 25 | 4 | 92107 | One Flew Over the Cuckoo's Nest (1975) | Drama |
15 | 49 | 1193 | 4 | 978813972 | M | 18 | 12 | 77084 | One Flew Over the Cuckoo's Nest (1975) | Drama |
16 | 53 | 1193 | 5 | 977946400 | M | 25 | 0 | 96931 | One Flew Over the Cuckoo's Nest (1975) | Drama |
17 | 54 | 1193 | 5 | 977944039 | M | 50 | 1 | 56723 | One Flew Over the Cuckoo's Nest (1975) | Drama |
18 | 58 | 1193 | 5 | 977933866 | M | 25 | 2 | 30303 | One Flew Over the Cuckoo's Nest (1975) | Drama |
19 | 59 | 1193 | 4 | 977934292 | F | 50 | 1 | 55413 | One Flew Over the Cuckoo's Nest (1975) | Drama |
20 | 62 | 1193 | 4 | 977968584 | F | 35 | 3 | 98105 | One Flew Over the Cuckoo's Nest (1975) | Drama |
21 | 80 | 1193 | 4 | 977786172 | M | 56 | 1 | 49327 | One Flew Over the Cuckoo's Nest (1975) | Drama |
22 | 81 | 1193 | 5 | 977785864 | F | 25 | 0 | 60640 | One Flew Over the Cuckoo's Nest (1975) | Drama |
23 | 88 | 1193 | 5 | 977694161 | F | 45 | 1 | 02476 | One Flew Over the Cuckoo's Nest (1975) | Drama |
24 | 89 | 1193 | 5 | 977683596 | F | 56 | 9 | 85749 | One Flew Over the Cuckoo's Nest (1975) | Drama |
25 | 95 | 1193 | 5 | 977626632 | M | 45 | 0 | 98201 | One Flew Over the Cuckoo's Nest (1975) | Drama |
26 | 96 | 1193 | 3 | 977621789 | F | 25 | 16 | 78028 | One Flew Over the Cuckoo's Nest (1975) | Drama |
27 | 99 | 1193 | 2 | 982791053 | F | 1 | 10 | 19390 | One Flew Over the Cuckoo's Nest (1975) | Drama |
28 | 102 | 1193 | 5 | 1040737607 | M | 35 | 19 | 20871 | One Flew Over the Cuckoo's Nest (1975) | Drama |
29 | 104 | 1193 | 2 | 977546620 | M | 25 | 12 | 00926 | One Flew Over the Cuckoo's Nest (1975) | Drama |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1000179 | 4933 | 3084 | 3 | 962757020 | M | 25 | 15 | 94040 | Home Page (1999) | Documentary |
1000180 | 4802 | 2218 | 2 | 1014866656 | M | 56 | 1 | 40601 | Juno and Paycock (1930) | Drama |
1000181 | 4812 | 2308 | 2 | 962932391 | M | 18 | 14 | 25301 | Detroit 9000 (1973) | Action\|Crime |
1000182 | 4874 | 624 | 4 | 962781918 | F | 25 | 4 | 70808 | Condition Red (1995) | Action\|Drama\|Thriller |
1000183 | 5059 | 1434 | 4 | 962484364 | M | 45 | 16 | 22652 | Stranger, The (1994) | Action |
1000184 | 5947 | 1434 | 4 | 957190428 | F | 45 | 16 | 97215 | Stranger, The (1994) | Action |
1000185 | 5077 | 1868 | 3 | 962417299 | M | 25 | 2 | 20037 | Truce, The (1996) | Drama\|War |
1000186 | 5944 | 1868 | 1 | 957197520 | F | 18 | 10 | 27606 | Truce, The (1996) | Drama\|War |
1000187 | 5105 | 404 | 3 | 962337582 | M | 50 | 7 | 18977 | Brother Minister: The Assassination of Malcolm... | Documentary |
1000188 | 5185 | 404 | 4 | 963402617 | F | 35 | 4 | 44485 | Brother Minister: The Assassination of Malcolm... | Documentary |
1000189 | 5532 | 404 | 5 | 959619841 | M | 25 | 17 | 27408 | Brother Minister: The Assassination of Malcolm... | Documentary |
1000190 | 5543 | 404 | 3 | 960127592 | M | 25 | 17 | 97401 | Brother Minister: The Assassination of Malcolm... | Documentary |
1000191 | 5220 | 2543 | 3 | 961546137 | M | 25 | 7 | 91436 | Six Ways to Sunday (1997) | Comedy |
1000192 | 5754 | 2543 | 4 | 958272316 | F | 18 | 1 | 60640 | Six Ways to Sunday (1997) | Comedy |
1000193 | 5227 | 591 | 3 | 961475931 | M | 18 | 10 | 64050 | Tough and Deadly (1995) | Action\|Drama\|Thriller |
1000194 | 5795 | 591 | 1 | 958145253 | M | 25 | 1 | 92688 | Tough and Deadly (1995) | Action\|Drama\|Thriller |
1000195 | 5313 | 3656 | 5 | 960920392 | M | 56 | 0 | 55406 | Lured (1947) | Crime |
1000196 | 5328 | 2438 | 4 | 960838075 | F | 25 | 4 | 91740 | Outside Ozona (1998) | Drama\|Thriller |
1000197 | 5334 | 3323 | 3 | 960796159 | F | 56 | 13 | 46140 | Chain of Fools (2000) | Comedy\|Crime |
1000198 | 5334 | 127 | 1 | 960795494 | F | 56 | 13 | 46140 | Silence of the Palace, The (Saimt el Qusur) (1... | Drama |
1000199 | 5334 | 3382 | 5 | 960796159 | F | 56 | 13 | 46140 | Song of Freedom (1936) | Drama |
1000200 | 5420 | 1843 | 3 | 960156505 | F | 1 | 19 | 14850 | Slappy and the Stinkers (1998) | Children's\|Comedy |
1000201 | 5433 | 286 | 3 | 960240881 | F | 35 | 17 | 45014 | Nemesis 2: Nebula (1995) | Action\|Sci-Fi\|Thriller |
1000202 | 5494 | 3530 | 4 | 959816296 | F | 35 | 17 | 94306 | Smoking/No Smoking (1993) | Comedy |
1000203 | 5556 | 2198 | 3 | 959445515 | M | 45 | 6 | 92103 | Modulations (1998) | Documentary |
1000204 | 5949 | 2198 | 5 | 958846401 | M | 18 | 17 | 47901 | Modulations (1998) | Documentary |
1000205 | 5675 | 2703 | 3 | 976029116 | M | 35 | 14 | 30030 | Broken Vessels (1998) | Drama |
1000206 | 5780 | 2845 | 1 | 958153068 | M | 18 | 17 | 92886 | White Boys (1999) | Drama |
1000207 | 5851 | 3607 | 5 | 957756608 | F | 18 | 20 | 55410 | One Little Indian (1973) | Comedy\|Drama\|Western |
1000208 | 5938 | 2909 | 4 | 957273353 | M | 25 | 1 | 35401 | Five Wives, Three Secretaries and Me (1998) | Documentary |
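Calling `pd.merge` without `on=` joins on every column name the two frames share, and the default is an inner join. A minimal sketch on made-up toy frames (all values and titles are hypothetical) shows both points: the first merge matches on `user_id`, the second on `movie_id`, and a movie nobody rated disappears from the result.

```python
import pandas as pd

# Hypothetical toy frames using the same column names as ratings, users and movies
ratings = pd.DataFrame({'user_id': [1, 1, 2], 'movie_id': [10, 20, 10], 'rating': [5, 3, 4]})
users = pd.DataFrame({'user_id': [1, 2], 'gender': ['F', 'M']})
movies = pd.DataFrame({'movie_id': [10, 20, 30], 'title': ['Movie A', 'Movie B', 'Movie C']})

# No on= argument: each merge uses the shared column (user_id, then movie_id).
# The default how='inner' drops 'Movie C', which has no ratings.
data = pd.merge(pd.merge(ratings, users), movies)
print(data)
```

The row order of a merge result can differ between pandas versions, so code that depends on order should sort explicitly.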
In [11]:
data.iloc[0]
Out[11]:
```
user_id     1
movie_id    1193
rating      5
timestamp   978300760
gender      F
age         1
occupation  10
zip         48067
title       One Flew Over the Cuckoo's Nest (1975)
genres      Drama
Name: 0, dtype: object
```
Computing mean ratings per movie
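In [34]:
```python
# Mean rating for each title, split by gender (one row per title, columns F and M).
# The files were already decoded with encoding='latin1' in read_csv, so no
# sys.setdefaultencoding workaround is needed (that function does not exist in Python 3).
mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
```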
In [38]:
mean_ratings[:5]
Out[38]:
| title | F | M |
---|---|---|
$1,000,000 Duck (1971) | 3.375000 | 2.761905 |
'Night Mother (1986) | 3.388889 | 3.352941 |
'Til There Was You (1997) | 2.675676 | 2.733333 |
'burbs, The (1989) | 2.793478 | 2.962085 |
...And Justice for All (1979) | 3.828571 | 3.689024 |
ratings_by_title = data.groupby('title').size()  # number of ratings per title
In [40]:
ratings_by_title[:5]
Out[40]:
```
title
$1,000,000 Duck (1971)            37
'Night Mother (1986)              70
'Til There Was You (1997)         52
'burbs, The (1989)               303
...And Justice for All (1979)    199
dtype: int64
```
In [41]:
active_titles = ratings_by_title.index[ratings_by_title >= 250]  # titles with at least 250 ratings
In [42]:
active_titles[:10]
Out[42]:
Index([u''burbs, The (1989)', u'10 Things I Hate About You (1999)', u'101 Dalmatians (1961)', u'101 Dalmatians (1996)', u'12 Angry Men (1957)', u'13th Warrior, The (1999)', u'2 Days in the Valley (1996)', u'20,000 Leagues Under the Sea (1954)', u'2001: A Space Odyssey (1968)', u'2010 (1984)'], dtype='object', name=u'title')
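The count-then-filter step can be reproduced on a hypothetical toy frame. `groupby('title').size()` counts rows per title, and indexing the counts' `index` with a boolean mask keeps only the titles that pass the threshold (2 here instead of 250, so the toy data has survivors):

```python
import pandas as pd

# Hypothetical data: one row per rating, only the title column matters here
data = pd.DataFrame({'title': ['A', 'A', 'A', 'B', 'C', 'C']})

# Number of rows (ratings) in each title group
ratings_by_title = data.groupby('title').size()

# Boolean mask on the counts selects the qualifying index labels
active_titles = ratings_by_title.index[ratings_by_title >= 2]
print(list(active_titles))  # ['A', 'C']
```

`data['title'].value_counts()` gives the same counts, but ordered by frequency instead of by title.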
In [43]:
mean_ratings = mean_ratings.loc[active_titles]
mean_ratings
Out[43]:
| title | F | M |
---|---|---|
'burbs, The (1989) | 2.793478 | 2.962085 |
10 Things I Hate About You (1999) | 3.646552 | 3.311966 |
101 Dalmatians (1961) | 3.791444 | 3.500000 |
101 Dalmatians (1996) | 3.240000 | 2.911215 |
12 Angry Men (1957) | 4.184397 | 4.328421 |
13th Warrior, The (1999) | 3.112000 | 3.168000 |
2 Days in the Valley (1996) | 3.488889 | 3.244813 |
20,000 Leagues Under the Sea (1954) | 3.670103 | 3.709205 |
2001: A Space Odyssey (1968) | 3.825581 | 4.129738 |
2010 (1984) | 3.446809 | 3.413712 |
28 Days (2000) | 3.209424 | 2.977707 |
39 Steps, The (1935) | 3.965517 | 4.107692 |
54 (1998) | 2.701754 | 2.782178 |
7th Voyage of Sinbad, The (1958) | 3.409091 | 3.658879 |
8MM (1999) | 2.906250 | 2.850962 |
About Last Night... (1986) | 3.188679 | 3.140909 |
Absent Minded Professor, The (1961) | 3.469388 | 3.446809 |
Absolute Power (1997) | 3.469136 | 3.327759 |
Abyss, The (1989) | 3.659236 | 3.689507 |
Ace Ventura: Pet Detective (1994) | 3.000000 | 3.197917 |
Ace Ventura: When Nature Calls (1995) | 2.269663 | 2.543333 |
Addams Family Values (1993) | 3.000000 | 2.878531 |
Addams Family, The (1991) | 3.186170 | 3.163498 |
Adventures in Babysitting (1987) | 3.455782 | 3.208122 |
Adventures of Buckaroo Bonzai Across the 8th Dimension, The (1984) | 3.308511 | 3.402321 |
Adventures of Priscilla, Queen of the Desert, The (1994) | 3.989071 | 3.688811 |
Adventures of Robin Hood, The (1938) | 4.166667 | 3.918367 |
African Queen, The (1951) | 4.324232 | 4.223822 |
Age of Innocence, The (1993) | 3.827068 | 3.339506 |
Agnes of God (1985) | 3.534884 | 3.244898 |
... | ... | ... |
White Men Can't Jump (1992) | 3.028777 | 3.231061 |
Who Framed Roger Rabbit? (1988) | 3.569378 | 3.713251 |
Who's Afraid of Virginia Woolf? (1966) | 4.029703 | 4.096939 |
Whole Nine Yards, The (2000) | 3.296552 | 3.404814 |
Wild Bunch, The (1969) | 3.636364 | 4.128099 |
Wild Things (1998) | 3.392000 | 3.459082 |
Wild Wild West (1999) | 2.275449 | 2.131973 |
William Shakespeare's Romeo and Juliet (1996) | 3.532609 | 3.318644 |
Willow (1988) | 3.658683 | 3.453543 |
Willy Wonka and the Chocolate Factory (1971) | 4.063953 | 3.789474 |
Witness (1985) | 4.115854 | 3.941504 |
Wizard of Oz, The (1939) | 4.355030 | 4.203138 |
Wolf (1994) | 3.074074 | 2.899083 |
Women on the Verge of a Nervous Breakdown (1988) | 3.934307 | 3.865741 |
Wonder Boys (2000) | 4.043796 | 3.913649 |
Working Girl (1988) | 3.606742 | 3.312500 |
World Is Not Enough, The (1999) | 3.337500 | 3.388889 |
Wrong Trousers, The (1993) | 4.588235 | 4.478261 |
Wyatt Earp (1994) | 3.147059 | 3.283898 |
X-Files: Fight the Future, The (1998) | 3.489474 | 3.493797 |
X-Men (2000) | 3.682310 | 3.851702 |
Year of Living Dangerously (1982) | 3.951220 | 3.869403 |
Yellow Submarine (1968) | 3.714286 | 3.689286 |
You've Got Mail (1998) | 3.542424 | 3.275591 |
Young Frankenstein (1974) | 4.289963 | 4.239177 |
Young Guns (1988) | 3.371795 | 3.425620 |
Young Guns II (1990) | 2.934783 | 2.904025 |
Young Sherlock Holmes (1985) | 3.514706 | 3.363344 |
Zero Effect (1998) | 3.864407 | 3.723140 |
eXistenZ (1999) | 3.098592 | 3.289086 |
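`.loc` with a list-like of labels selects those rows in the given order. In pandas 1.0 and later it raises `KeyError` if any requested label is missing; `reindex` is the call to use when missing labels should become NaN rows instead. In the post this is not an issue, because `mean_ratings` and `active_titles` are both derived from `data`. A sketch with made-up numbers and hypothetical titles:

```python
import pandas as pd

# Hypothetical mean ratings indexed by title
mean_ratings = pd.DataFrame(
    {'F': [4.0, 2.0, 3.5], 'M': [4.0, 4.0, 3.0]},
    index=pd.Index(['A', 'B', 'C'], name='title'),
)
active_titles = pd.Index(['A', 'C'], name='title')

# Label-based row selection; every label must exist in the index
subset = mean_ratings.loc[active_titles]

# reindex tolerates unknown labels and fills their rows with NaN
padded = mean_ratings.reindex(['A', 'Z'])
print(subset)
```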
In [44]:
mean_ratings = mean_ratings.rename(index={'Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)': 'Seven Samurai (Shichinin no samurai) (1954)'})
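`rename(index=...)` takes a mapping from old row labels to new ones and returns a renamed copy. Keys that match no label are ignored silently by default, so a typo in the long title would leave the table unchanged; passing `errors='raise'` makes a missing key raise `KeyError` instead. A sketch reusing the post's title with made-up ratings:

```python
import pandas as pd

old = 'Seven Samurai (The Magnificent Seven) (Shichinin no samurai) (1954)'
new = 'Seven Samurai (Shichinin no samurai) (1954)'

# Hypothetical one-row table; the ratings are made up
mean_ratings = pd.DataFrame({'F': [4.5], 'M': [4.6]}, index=pd.Index([old], name='title'))

# The second key is not in the index and is silently ignored
renamed = mean_ratings.rename(index={old: new, 'Not In The Index (2000)': 'Ignored'})
print(renamed.index.tolist())
```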
In [45]:
```python
# Movies with the highest mean rating among female viewers
top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)
top_female_ratings[:10]
```
Out[45]:
| title | F | M |
---|---|---|
Close Shave, A (1995) | 4.644444 | 4.473795 |
Wrong Trousers, The (1993) | 4.588235 | 4.478261 |
Sunset Blvd. (a.k.a. Sunset Boulevard) (1950) | 4.572650 | 4.464589 |
Wallace & Gromit: The Best of Aardman Animation (1996) | 4.563107 | 4.385075 |
Schindler's List (1993) | 4.562602 | 4.491415 |
Shawshank Redemption, The (1994) | 4.539075 | 4.560625 |
Grand Day Out, A (1992) | 4.537879 | 4.293255 |
To Kill a Mockingbird (1962) | 4.536667 | 4.372611 |
Creature Comforts (1990) | 4.513889 | 4.272277 |
Usual Suspects, The (1995) | 4.513317 | 4.518248 |
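Older pandas sorted by a column with `sort_index(by=...)`; current pandas no longer accepts `by` there, and `sort_values(by=..., ascending=False)` is the replacement. A sketch on hypothetical toy data, which also shows `nlargest` as a shortcut for the sort-then-slice pattern:

```python
import pandas as pd

# Hypothetical mean ratings indexed by title
mean_ratings = pd.DataFrame(
    {'F': [4.2, 4.6, 3.9], 'M': [4.5, 4.4, 4.1]},
    index=pd.Index(['A', 'B', 'C'], name='title'),
)

# Descending sort on the female mean rating
top_female = mean_ratings.sort_values(by='F', ascending=False)

# nlargest returns the top-n rows by a column directly
top2 = mean_ratings.nlargest(2, 'F')
print(top_female.index.tolist())  # ['B', 'A', 'C']
```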