Python data cleaning tools, methods and process: a summary (2. File reading and writing for data cleaning: reading CSV, Excel and MySQL data)
2020-03-05 06:09
3 Reading and writing files
3.1 Reading and writing CSV files
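- pandas has more than ten built-in functions for reading different data sources; CSV and Excel are the most common.
- Read a CSV file with read_csv; the result is a DataFrame.
- When reading CSV files, use English (ASCII) file names where possible.
- There are many parameters you can tune, but the defaults are often enough.
- Pay attention to the encoding when reading CSV files. Common encodings are utf-8, gbk, gb2312 and gb18030.
- Use the to_csv method to save a DataFrame quickly.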
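When a file's encoding is unknown, one option is to try the encodings listed above in order. The sketch below assumes the file contents are already in memory as bytes. `read_csv_any_encoding` is a hypothetical helper, not a pandas function.

utf-8 is tried first because it is strict: GBK-encoded Chinese text almost never happens to be valid UTF-8. The reverse is less safe, because decoding UTF-8 bytes as gbk can sometimes succeed and silently produce garbled text. gb18030 is a superset of gbk, which itself extends gb2312, so it comes last as the broadest fallback.

```python
import io

import pandas as pd


def read_csv_any_encoding(raw, encodings=("utf-8", "gbk", "gb18030")):
    """Hypothetical helper: return (DataFrame, encoding) for CSV bytes `raw`,
    trying each candidate encoding in turn."""
    last_error = None
    for enc in encodings:
        try:
            # A fresh buffer per attempt, so a failed attempt does not
            # leave the stream partly consumed
            return pd.read_csv(io.BytesIO(raw), encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err
    raise last_error


# Usage: simulate a CSV file that was exported with GBK encoding
gbk_bytes = "姓名,金额\n张三,12\n".encode("gbk")
df, used = read_csv_any_encoding(gbk_bytes)
print(used)  # gbk
print(df.columns.tolist())
```

To read a file on disk, pass it the bytes from `open(path, "rb").read()`.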
```python
import numpy as np
import pandas as pd
import os  # used to change the working directory

os.getcwd()  # current working directory
# 'D:\\code\\jupyter\\course'
os.chdir(r'D:\code\jupyter\course\代码和数据')  # switch to the folder holding the data files ("code and data")

# utf-8 is the default encoding. read_csv uses the first row as the header (column index),
# and the row index starts at 0 by default.
baby = pd.read_csv('sam_tianchi_mum_baby.csv', encoding='utf-8')
baby.head(5)
#     user_id  birthday  gender
# 0      2757  20130311       1
# 1    415971  20121111       0
# 2   1372572  20120130       1
# 3  10339332  20110910       0
# 4  10642245  20130213       0

order = pd.read_csv('meal_order_info.csv', encoding='gbk')  # utf-8 raises an error here, so try gbk
order.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 945 entries, 0 to 944
# Data columns (total 21 columns):
# info_id               945 non-null int64
# emp_id                945 non-null int64
# number_consumers      945 non-null int64
# mode                  0 non-null float64
# dining_table_id       945 non-null int64
# dining_table_name     945 non-null int64
# expenditure           945 non-null int64
# dishes_count          945 non-null int64
# accounts_payable      945 non-null int64
# use_start_time        945 non-null object
# check_closed          0 non-null float64
# lock_time             936 non-null object
# cashier_id            0 non-null float64
# pc_id                 0 non-null float64
# order_number          0 non-null float64
# org_id                945 non-null int64
# print_doc_bill_num    0 non-null float64
# lock_table_info       0 non-null float64
# order_status          945 non-null int64
# phone                 945 non-null int64
# name                  945 non-null object
# dtypes: float64(7), int64(11), object(3)
# memory usage: 155.1+ KB

# Read the numeric columns info_id and emp_id as strings instead
order = pd.read_csv('meal_order_info.csv', encoding='gbk',
                    dtype={'info_id': str, 'emp_id': str})
order.info()
# info_id               945 non-null object
# emp_id                945 non-null object
# ... (the other 19 columns are unchanged)
# dtypes: float64(7), int64(9), object(5)
# memory usage: 155.1+ KB

order.head(10)
#   info_id emp_id  number_consumers  mode  dining_table_id  ...  order_status        phone name
# 0     417   1442                 4   NaN             1501  ...             1  18688880641  苗宇怡
# 1     301   1095                 3   NaN             1430  ...             1  18688880174   赵颖
# 2     413   1147                 6   NaN             1488  ...             1  18688880276  徐毅凡
# ...
# 9     655   1268                 8   NaN             1492  ...             1  18688880466  沈丹丹
# 10 rows × 21 columns

baby = pd.read_csv('baby_trade_history.csv', nrows=100)  # read only the first 100 rows
baby
# By default only part of the rows is displayed (here rows 0-29 and 70-99):
#        user_id   auction_id     cat_id      cat1                                           property  buy_mount       day
# 0    786295544  41098319944   50014866  50022520  21458:86755362;13023209:3593274;10984217:21985...          2  20140919
# 1    532110457  17916191097   50011993        28  21458:11399317;1628862:3251296;21475:137325;16...          1  20131011
# 2    249013725  21896936223   50012461  50014815  21458:30992;1628665:92012;1628665:3233938;1628...          1  20131011
# ..         ...          ...        ...       ...                                                ...        ...       ...
# 99    59135448  20494104463   50012375  50022520  21458:7902780;13023209:43969797;2397831:165611...          1  20140929
# 100 rows × 7 columns

pd.set_option('display.max_columns', 20)  # display at most 20 columns
pd.set_option('display.max_rows', 100)    # display at most 100 rows
baby  # all 100 rows (0-99) are now displayed

# Save as utf-8 (also the default, so encoding can be omitted); read the file back with utf-8.
# index=False keeps the row index out of the file.
baby.to_csv('al.csv', encoding='utf-8', index=False)
```
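The `dtype` argument above matters most for ID-like columns. The small in-memory sketch below uses made-up column names and values. It shows that reading an ID as a number drops its leading zeros. It also shows that `to_csv(index=False)` followed by `read_csv` with `dtype=str` round-trips the data without adding an extra index column.

```python
import io

import pandas as pd

# Illustrative CSV text; the emp_id values have leading zeros
csv_text = "emp_id,amount\n007,165\n123,321\n"

# Default parsing infers emp_id as int64, so "007" becomes 7
as_numbers = pd.read_csv(io.StringIO(csv_text))
print(as_numbers["emp_id"].tolist())  # [7, 123]

# dtype={"emp_id": str} keeps the column as text, leading zeros intact
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"emp_id": str})
print(as_text["emp_id"].tolist())  # ['007', '123']

# With no path, to_csv returns the CSV as a string; index=False omits the row index
saved = as_text.to_csv(index=False)
back = pd.read_csv(io.StringIO(saved), dtype={"emp_id": str})
print(back.equals(as_text))  # True
```

Without `index=False`, reading the file back would produce an extra `Unnamed: 0` column holding the old row index.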
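3.2 Reading and writing Excel files
- Read an Excel file with read_excel; the result is a DataFrame.
- The parameters are largely the same as for read_csv, but you also have to choose the worksheet (sheet_name).
- There are many parameters you can tune, but the defaults are often enough.
- Unlike CSV, encoding is rarely a concern: .xlsx files store text as Unicode, and read_excel in recent pandas versions has no encoding parameter at all.
- Use to_excel to save quickly in xlsx format.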
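Besides a sheet name or position, `sheet_name` also accepts `None`. In that case `read_excel` returns a dict of DataFrames keyed by sheet name. The sketch below writes two sheets with `pd.ExcelWriter` and reads them all back. It works in memory through `io.BytesIO` and assumes the `openpyxl` package is installed, since pandas uses it for .xlsx files. The sheet names and data are illustrative.

```python
import io

import pandas as pd

frames = {
    "one": pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}),
    "two": pd.DataFrame({"id": [3], "name": ["c"]}),
}

# Write several DataFrames into one workbook, one worksheet each
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    for sheet, frame in frames.items():
        frame.to_excel(writer, sheet_name=sheet, index=False)
data = buffer.getvalue()  # the finished .xlsx file as bytes

# sheet_name=None reads every worksheet into a dict {sheet name: DataFrame}
sheets = pd.read_excel(io.BytesIO(data), sheet_name=None)
print(list(sheets))  # ['one', 'two']

# sheet_name=0 selects the first worksheet by position
first = pd.read_excel(io.BytesIO(data), sheet_name=0)
```

An `ExcelWriter` is needed here because calling `to_excel` repeatedly on the same file path would overwrite the workbook each time instead of adding sheets.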
```python
df1 = pd.read_excel('meal_order_detail.xlsx', sheet_name='meal_order_detail1')  # select the worksheet by name
df1.head(10)
#    detail_id  order_id  dishes_id  logicprn_name  parent_class_name           dishes_name  itemis_add  counts  amounts  cost     place_order_time  discount_amt  discount_reason  kick_back  add_inprice  add_info  bar_code      picture_file  emp_id
# 0       2956       417     610062            NaN                NaN                  蒜蓉生蚝           0       1       49   NaN  2016-08-01 11:05:36           NaN              NaN        NaN            0       NaN       NaN  caipu/104001.jpg    1442
# 1       2958       417     609957            NaN                NaN  蒙古烤羊腿\r\n\r\n\r\n           0       1       48   NaN  2016-08-01 11:07:07           NaN              NaN        NaN            0       NaN       NaN  caipu/202003.jpg    1442
# ...
# 9       1908       301     610030            NaN                NaN              番茄有机花菜           0       1       32   NaN  2016-08-01 11:23:56           NaN              NaN        NaN            0       NaN       NaN  caipu/304004.jpg    1095

df1 = pd.read_excel('meal_order_detail.xlsx', sheet_name=0)  # or select the worksheet by position (0 = first sheet)

import os
os.getcwd()  # to_excel below writes into the current working directory
# 'D:\\code\\jupyter\\course

df1.to_excel('asdf.xlsx', index=False, sheet_name='one')  # sheet_name sets the worksheet name
```
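3.3 Working with a MySQL database
- Create the connection with sqlalchemy (create_engine).
- You need the database connection details, such as the host IP address, port, user name, password and database name.
- Read data with pandas' read_sql function; the result is a DataFrame.
- Save data with the DataFrame's to_sql method.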
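The read_sql / to_sql round trip can be tried without a MySQL server. The sketch below swaps in Python's built-in `sqlite3` module with an in-memory database, because pandas accepts a `sqlite3` connection directly. With MySQL you would pass the SQLAlchemy engine instead.

The sketch also passes the filter value through `params` rather than pasting it into the SQL text. The `?` placeholder is sqlite3's style; other drivers use different placeholders, for example `:name` with `sqlalchemy.text`. The table and column names are illustrative.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

rooms = pd.DataFrame({
    "roomnum": [101, 102, 103],
    "capacity": [15, 5, 12],
})
# to_sql creates the table; if_exists="replace" drops an existing table of that name first
rooms.to_sql("meetingroom", con=conn, index=False, if_exists="replace")

# params keeps the value out of the SQL string; sqlite3 uses ? placeholders
big = pd.read_sql(
    "select * from meetingroom where capacity >= ? order by roomnum",
    con=conn,
    params=(12,),
)
print(big["roomnum"].tolist())  # [101, 103]
conn.close()
```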
```python
import pandas as pd
import pymysql  # driver behind the mysql+pymysql URL; SQLAlchemy loads it itself, so this import only checks it is installed
from sqlalchemy import create_engine

# URL format: mysql+pymysql://user:password@host:port/database
conn = create_engine('mysql+pymysql://root:123456@localhost:3306/meeting')
sql = 'select * from employee'
df1 = pd.read_sql(sql, con=conn)
df1.head()
#    employeeid employeename username        phone         email  status  departmentid  password  role
# 0           8       王晓华   wangxh  13671075406   wang@qq.com       1             1         1     1
# 1           9       林耀坤     li56  13671075406   yang@qq.com       1             2         1     2
# 2          10       熊杰文  xiongjw    134555555  xiong@qq.com       1             3         1     2
# 3          11         王敏  wangmin   1324554321  wangm@qq.com       0             4         1     2
# 4          12       林耀坤    linyk   1547896765    kun@qq.com       1             7         1     2

def query(table):
    """Return every row of `table` in the meeting database as a DataFrame."""
    host = 'localhost'
    user = 'root'
    password = '123456'
    database = 'meeting'
    port = 3306
    conn = create_engine('mysql+pymysql://{}:{}@{}:{}/{}'.format(user, password, host, port, database))
    # The table name is concatenated into the SQL text, so only pass trusted names
    sql = 'select * from ' + table
    result = pd.read_sql(sql, con=conn)
    return result

df2 = query('meetingroom')
df2
#    roomid  roomnum   roomname  capacity  status  description
# 0       5      101  第一会议室        15       0    公共会议室
# 1       6      102  第二会议室         5       0  管理部门会议室
# 2       7      103  第三会议室        12       0  市场部专用会议室
# 3       8      401  第四会议室        15       0    公共会议室
# 4       9      201  第五会议室        15       0    最大会议室
# 5      10      601  第六会议室        12       0  需要提前三天预定

import os
os.chdir(r'D:\code\jupyter\course\代码和数据')  # raw string so the backslashes are kept literally
df = pd.read_csv('baby_trade_history.csv')
try:
    # Write df to the table testdf; if_exists='replace' drops and recreates the table if it exists
    df.to_sql('testdf', con=conn, index=False, if_exists='replace')
except Exception as err:
    print('error:', err)
```
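The `query()` helper above has two weaknesses. It builds its SQL by string concatenation, so a crafted table name could inject extra SQL. It also creates a new engine on every call. One way to harden it is sketched below with `sqlite3`, so it runs without a MySQL server: reuse a connection supplied by the caller, and accept only names that exist in the database's catalog.

The `sqlite_master` lookup is SQLite-specific. For MySQL through SQLAlchemy, `sqlalchemy.inspect(engine).get_table_names()` returns the equivalent list. The function name `query_table` is hypothetical.

```python
import sqlite3

import pandas as pd


def query_table(table, conn):
    """Hypothetical safer variant of query(): reads only tables that exist."""
    # Look up the real table names (SQLite-specific catalog table)
    names = pd.read_sql("select name from sqlite_master where type = 'table'", con=conn)
    if table not in set(names["name"]):
        raise ValueError(f"unknown table: {table!r}")
    # Safe to concatenate now: the name is one of the database's own tables
    return pd.read_sql("select * from " + table, con=conn)


# Usage with an illustrative table
conn = sqlite3.connect(":memory:")
pd.DataFrame({"roomid": [5, 6], "roomnum": [101, 102]}).to_sql(
    "meetingroom", con=conn, index=False
)
print(query_table("meetingroom", conn))

try:
    query_table("meetingroom; drop table meetingroom", conn)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

Passing the connection in also means one connection can serve many calls instead of opening a new one per query.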
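Other articles in this data cleaning series (Python data cleaning tools, methods and process: a summary):
- 1. Common data cleaning tools: numpy, pandas
- 2. File reading and writing for data cleaning: reading CSV, Excel and MySQL data
- 3. DataFrame operations: filtering, adding and deleting, finding and modifying, reshaping data, and hierarchical indexing
- 4. Data transformation: date handling, higher-order functions, string processing
- 5. Data statistics: grouped operations, aggregation functions, GroupBy objects and apply, pivot tables and crosstabs
- 6. Data preprocessing (1): duplicate values, missing values
- 7. Data preprocessing (2): outliers, discretization
- 8. Summary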