Python: Following Links in HTML Using BeautifulSoup
2016-07-22 20:01
The program will use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link, repeat the process a number of times, and report the last name you find.
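Extracting the href= values in document order is the core step. As a dependency-free sketch, the snippet below swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` (Python 3). The names `HrefCollector` and `extract_hrefs` are hypothetical. Anchors without an href are kept as `None` so that positions line up with `soup.findAll('a')` in the script further down.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of (name, value)
        if tag == "a":
            # Keep None for anchors without href so positions match findAll('a')
            self.hrefs.append(dict(attrs).get("href"))


def extract_hrefs(html):
    """Return the href values of all anchor tags, in the order they appear."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


sample = """
<ul>
  <li><a href="known_by_Ann.html">Ann</a></li>
  <li><a href="known_by_Bob.html">Bob</a></li>
  <li><a name="no-href">no link</a></li>
</ul>
"""
print(extract_hrefs(sample))  # ['known_by_Ann.html', 'known_by_Bob.html', None]
```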
Find the link at position 18 (the first name is 1). Follow that link. Repeat this process 7 times. The answer is the last name that you retrieve.
Hint: The first character of the name of the last page that you will load is: M
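The follow-and-repeat logic can be separated from networking. Below is a minimal Python 3 sketch that assumes a hypothetical `get_links(url)` callable returning a page's href values in order (in the real script that role is played by urlopen plus BeautifulSoup); an in-memory dict with made-up names stands in for the web site. It also rejects `position < 1`, because `tags[position - 1]` with position 0 would silently pick the last link through Python's negative indexing.

```python
def follow_links(start_url, count, position, get_links):
    """Follow the link at 1-based `position` `count` times; return the URLs found.

    `get_links` is a hypothetical callable mapping a URL to the list of hrefs
    on that page, in document order.
    """
    if position < 1:
        # tags[position - 1] with position == 0 would silently return the last link
        raise ValueError("position is 1-based and must be >= 1")
    url = start_url
    found = []
    for _ in range(count):
        links = get_links(url)
        if position > len(links):
            raise IndexError(f"{url} has only {len(links)} links")
        url = links[position - 1]  # convert the 1-based position to a 0-based index
        found.append(url)
    return found


# Toy link graph standing in for the real pages (made-up names)
pages = {
    "start": ["a", "b", "c"],
    "b": ["x", "y", "z"],
    "y": ["p", "q", "r"],
}
print(follow_links("start", 3, 2, pages.__getitem__))  # ['b', 'y', 'q']
```

The last element of the returned list plays the same role as the URL the original script prints after its loop.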
HTML URL: http://python-data.dr-chuck.net/known_by_Cleo.html
Python source code:
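```python
# Python 2 (raw_input, xrange, urllib.urlopen and the print statement)
import urllib
from bs4 import BeautifulSoup

url = raw_input('Enter - ')
count = int(raw_input('Enter count:'))
position = int(raw_input('Enter position:'))

# Repeat `count` times: load the page, then jump to the link at `position`
for _ in xrange(count):
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup.findAll('a')
    # position is 1-based (the first name is 1); list indices are 0-based
    url = tags[position - 1].get('href', None)

# URL of the last link found; its file name contains the answer
print url
```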
Sample run:
```
Enter - http://python-data.dr-chuck.net/known_by_Cleo.html
Enter count:7
Enter position:18
http://python-data.dr-chuck.net/known_by_Mirrin.html
```
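The script prints the final URL, while the assignment asks for a name. Assuming the page URLs keep the `known_by_<Name>.html` pattern seen above, a small Python 3 helper (the name `name_from_url` is hypothetical) can pull the name out of that URL.

```python
import re
from urllib.parse import urlparse


def name_from_url(url):
    """Extract <Name> from a URL whose path ends in known_by_<Name>.html.

    Assumes the naming pattern seen in this exercise; returns None otherwise.
    """
    path = urlparse(url).path
    match = re.search(r"known_by_([^/]+)\.html$", path)
    return match.group(1) if match else None


print(name_from_url("http://python-data.dr-chuck.net/known_by_Mirrin.html"))  # Mirrin
```

For the run above this yields Mirrin, which agrees with the hint that the name starts with M.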