
Python Crawler: Scraping imooc Course Information with requests and re

2018-01-26 01:44

Analysis

Use the requests and re modules to scrape course information from the "Free Courses / Database" category on imooc (慕课网).
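The core of this approach is matching each course card with a non-greedy regular expression and then pulling individual fields out of the matched fragment. The following is a minimal offline sketch of that idea. It reuses the patterns from the full implementation below, but `SAMPLE_HTML`, `parse_cards`, and the `img.example.com` host are hypothetical, and the sample markup is only shaped to fit those patterns; the real page structure may differ.

```python
import re

# Hypothetical markup, written to fit the patterns below (assumption:
# the live page nests its tags in roughly this order).
SAMPLE_HTML = '''
<div class="course-card-container">
  <a href="/learn/1" class="course-card">
    <div class="course-card-top">
      <img class="course-banner" src="//img.example.com/demo-240-135.jpg">
    </div>
    <div class="course-card-content">
      <h3 class="course-card-name"> Demo course </h3>
    </div>
  </a>
</div>
'''

# Same card pattern as in the full crawler: non-greedy, spans newlines via re.DOTALL.
CARD_PATTERN = r'<div class="course-card-container">.*?</div>.*?</div>.*?</a>.*?</div>'


def parse_cards(html, base_url='https://www.imooc.com'):
    """Return one dict per course card found in html."""
    courses = []
    for card in re.findall(CARD_PATTERN, html, re.DOTALL):
        title = re.search(r'<h3 class="course-card-name">(.*?)</h3>', card, re.DOTALL)
        image = re.search(r'<img.*?src="(.*?)"', card, re.DOTALL)
        link = re.search(r'href="(/learn/\d+)"', card)
        if not (title and image and link):
            continue  # skip cards that do not fit the expected shape
        courses.append({
            'title': title.group(1).strip(),
            'image': 'https:' + image.group(1),  # src is protocol-relative
            'link': base_url + link.group(1),
        })
    return courses


courses = parse_cards(SAMPLE_HTML)
print(courses)
```

Checking each `re.search` result before calling `.group(1)` avoids an `AttributeError` when a card is missing one of the fields.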

Implementation

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import os
import re

import requests

num = 0  # running count of courses scraped across all pages


def crawl(url, first_page=True):
    global num
    base_url = 'https://www.imooc.com'

    req_headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
    }

    path = '../../data'
    if not os.path.exists(path):
        os.makedirs(path)
    file_path = os.path.join(path, '慕课网课程信息.txt')

    resp = requests.get(url=url, headers=req_headers)
    if resp.status_code != requests.codes.ok:
        return
    html = resp.text

    # Extract the HTML of every course card container
    pattern_course_container = r'<div class="course-card-container">.*?</div>.*?</div>.*?</a>.*?</div>'
    source_course_container_list = re.findall(pattern_course_container, html, re.DOTALL)

    # Truncate the file on the first page only; later pages append, so the
    # recursive calls do not overwrite earlier results
    with open(file_path, 'w' if first_page else 'a', encoding='utf-8') as f:
        for container in source_course_container_list:
            # Extract the course title
            title_course = re.search(r'<h3 class="course-card-name">(.*?)</h3>', container, re.DOTALL).group(1).strip()
            # Extract the course image link
            image_link = 'https:' + re.search(r'<img.*?src="(.*?)"', container, re.DOTALL).group(1)
            # Extract the link to the course page
            link_play_page = base_url + re.search(r'href="(/learn/\d+)"', container).group(1)

            # Save the scraped fields to the text file
            f.write('{}\n{}\n{}\n\n'.format(title_course, image_link, link_play_page))

            num += 1
            print('{} {}'.format(num, title_course))

    # Extract the link to the next page
    pattern_next_page = r'<a href="(/course/list\?c=data&page=\d+)">下一页</a>'
    link_next_page = re.search(pattern_next_page, html)
    # If there is a next page, crawl it recursively (the file is already closed here)
    if link_next_page:
        crawl(base_url + link_next_page.group(1), first_page=False)


if __name__ == '__main__':
    start_url = 'https://www.imooc.com/course/list?c=data&page=1'
    crawl(start_url)
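The crawler above follows the "下一页" (next page) link by calling itself. The same traversal can also be written as a loop, which keeps the call stack flat and makes it easy to cap the number of pages. This is only a sketch of that alternative, not the author's method: `iter_pages`, `fetch`, `max_pages`, and `fake_site` are hypothetical names, and `fetch` is assumed to be any callable that returns a page's HTML or `None` on failure. A dictionary stands in for the network here so the example runs offline.

```python
import re

# Same next-page pattern as in the crawler above.
NEXT_PAGE_PATTERN = r'<a href="(/course/list\?c=data&page=\d+)">下一页</a>'


def iter_pages(start_url, fetch, base_url='https://www.imooc.com', max_pages=50):
    """Yield the HTML of each page, following next-page links in a loop."""
    url = start_url
    for _ in range(max_pages):  # hard cap as a safety net against link cycles
        html = fetch(url)
        if html is None:
            break
        yield html
        match = re.search(NEXT_PAGE_PATTERN, html)
        if not match:
            break  # no next-page link: this was the last page
        url = base_url + match.group(1)


# Hypothetical two-page site used instead of real HTTP requests.
fake_site = {
    'https://www.imooc.com/course/list?c=data&page=1':
        'page one <a href="/course/list?c=data&page=2">下一页</a>',
    'https://www.imooc.com/course/list?c=data&page=2':
        'page two',
}

pages = list(iter_pages('https://www.imooc.com/course/list?c=data&page=1', fake_site.get))
print(len(pages))
```

With requests, `fetch` could be a small wrapper around `requests.get(url, headers=..., timeout=10)` that returns `resp.text` when the status code is OK and `None` otherwise.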


Output

1 MongoDB复制集—复制集监控

2 MongoDB复制集—复制集安全

3 MySQL5.7复制功能实战



35 与MySQL的零距离接触

36 数据库设计那些事

MongoDB复制集—复制集监控 https://img3.mukewang.com/56a08eed0001fdf319201080-240-135.jpg https://www.imooc.com/learn/595

MongoDB复制集—复制集安全 https://img.mukewang.com/56d7e6a100016a9b06000338-240-135.jpg https://www.imooc.com/learn/614

MySQL5.7复制功能实战 https://img2.mukewang.com/5707653b0001e67f06000338-240-135.jpg https://www.imooc.com/learn/589

MongoDB复制集—容灾核心选举 https://img2.mukewang.com/56d640a30001fc0d06000338-240-135.jpg https://www.imooc.com/learn/594

MongoDB在线讲座系列之MongoDB DBA的日常巡检及执行计划分析 https://img2.mukewang.com/57466a7b0001a49806000338-240-135.jpg https://www.imooc.com/learn/575

MongoDB复制集—复制集的同步机制 https://img4.mukewang.com/569762af00017b3806000338-240-135.jpg https://www.imooc.com/learn/582

2015 Oracle技术嘉年华 https://img1.mukewang.com/5679ffdd0001cb5806000338-240-135.jpg https://www.imooc.com/learn/572

MongoDB复制集—复制集的相关特性 https://img.mukewang.com/5680dbb8000173d406000338-240-135.jpg https://www.imooc.com/learn/578

MongoDB Day 2015 深圳 https://img4.mukewang.com/56779555000160d106000338-240-135.jpg https://www.imooc.com/learn/562

MongoDB 在线讲座系列之MongoDB数据库备份策略/Ops Manager https://img1.mukewang.com/57466aad0001186f06000338-240-135.jpg https://www.imooc.com/learn/552

MongoDB复制集—快速搭建复制集 https://img3.mukewang.com/56261c250001612e06000338-240-135.jpg https://www.imooc.com/learn/528

MongoDB复制集技术内幕:工作原理及新版本改进方向 https://img3.mukewang.com/57466ac60001a0af06000338-240-135.jpg https://www.imooc.com/learn/534

MySQL5.7版本新特性 https://img3.mukewang.com/572afe280001c13406000338-240-135.jpg https://www.imooc.com/learn/533

MongoDB复制集—认识复制集 https://img4.mukewang.com/562052b70001e9c106000338-240-135.jpg https://www.imooc.com/learn/490

MongoDB 在线讲座之如何测试、调整及监控MongoDB性能 https://img.mukewang.com/561ccbad0001391f06000338-240-135.jpg https://www.imooc.com/learn/521

MongoDB集群之分片技术应用 https://img2.mukewang.com/55f8d5080001293c06000338-240-135.jpg https://www.imooc.com/learn/501

MySQL开发技巧(三) https://img2.mukewang.com/570766c80001f1ee06000338-240-135.jpg https://www.imooc.com/learn/449

Oracle触发器 https://img2.mukewang.com/5704cd5f0001207a06000338-240-135.jpg https://www.imooc.com/learn/414

Oracle高级查询 https://img1.mukewang.com/5704cda10001b17506000338-240-135.jpg https://www.imooc.com/learn/437

SQL Server基础--T-SQL语句 https://img4.mukewang.com/5704ace70001dd1806000338-240-135.jpg https://www.imooc.com/learn/435

MySQL开发技巧(二) https://img3.mukewang.com/557fff240001fbfb06000338-240-135.jpg https://www.imooc.com/learn/427

Oracle数据库开发利器之函数 https://img4.mukewang.com/5704cdc100019ee206000338-240-135.jpg https://www.imooc.com/learn/423

MySQL开发技巧(一) https://img.mukewang.com/555e9cdc0001ee9606000338-240-135.jpg https://www.imooc.com/learn/398

Oracle存储过程和自定义函数 https://img3.mukewang.com/5704cd7d0001d01206000338-240-135.jpg https://www.imooc.com/learn/370

Oracle 12c 在OEL6上的安装 https://img2.mukewang.com/5704ccf30001775a06000338-240-135.jpg https://www.imooc.com/learn/369
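The crawler's `f.write` call emits each course as three lines (title, image URL, course URL) followed by a blank line. Assuming the saved file keeps that layout, it can be read back into structured records as sketched below. `parse_records` and the dictionary keys are hypothetical names, and the sample text reuses two entries from the listing above.

```python
def parse_records(text):
    """Split saved crawler output into dicts; blocks without exactly 3 lines are skipped."""
    records = []
    for block in text.strip().split('\n\n'):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) != 3:
            continue  # malformed or empty block
        title, image, link = lines
        records.append({'title': title, 'image': image, 'link': link})
    return records


# Two entries taken from the output above, in the three-lines-per-record layout.
sample = (
    'MySQL5.7复制功能实战\n'
    'https://img2.mukewang.com/5707653b0001e67f06000338-240-135.jpg\n'
    'https://www.imooc.com/learn/589\n'
    '\n'
    'Oracle触发器\n'
    'https://img2.mukewang.com/5704cd5f0001207a06000338-240-135.jpg\n'
    'https://www.imooc.com/learn/414\n'
    '\n'
)

records = parse_records(sample)
for record in records:
    print(record['title'], record['link'])
```

To load the real file, read it with `open(file_path, encoding='utf-8')` and pass the contents to `parse_records`.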