
Amber is an implementation of the Smalltalk language that runs on top of the JavaScript runtime.

2012-04-23 00:05
Amber is an implementation of the Smalltalk-80 language. It is designed to make client-side development faster and easier. It allows developers to write client-heavy web applications in Smalltalk.

Amber includes an integrated development environment with a class browser, workspace, transcript, object inspector and debugger.

Amber is written in itself, including the parser and compiler, and compiles into efficient JavaScript, mapping one-to-one with the JS equivalent.

Try it right now!

You can join the Google Group or the #amber-lang IRC channel on freenode.


webscraping - Python library for web scraping - Google Project Hosting

Overview

The webscraping library aims to make web scraping easier.

All code is pure Python and has been run across multiple Linux servers, Windows machines, as well as Google App Engine.
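
To follow the examples below, the relevant submodules can be imported in one line; a minimal sketch, assuming the package is installed (e.g. from PyPI, where it is published as webscraping):

>>> # the four submodules used in the examples below
>>> from webscraping import common, download, pdict, xpath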

Examples

common

>>> from webscraping import common
>>> common.remove_tags('hello <b>world</b>!')
'hello world!'

>>> common.extract_domain('http://www.google.com.au/tos.html')
'google.com.au'

>>> common.unescape('&lt;hello &amp; world&gt;')
'<hello & world>'

>>> common.extract_emails('hello richard AT sitescraper DOT net world')
['richard@sitescraper.net']

>>> import urllib2
>>> cj = common.firefox_cookie()
>>> opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
>>> html = opener.open(url).read() # use current Firefox cookies to access url

download

>>> from webscraping import download
>>> D = download.Download()

>>> # crawl given domain
>>> domain = ...
>>> for url in D.crawl(domain):
>>>     html = D.cache[url]
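
For one-off fetches, the same object can be used directly; a minimal sketch, assuming Download.get fetches a URL and caches the result as described in the project documentation (the URL is a placeholder):

>>> # fetch a single page; the response is assumed to be cached for later calls
>>> html = D.get('http://example.com')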

pdict

>>> from webscraping import pdict
>>> cache = pdict.PersistentDict(CACHE_FILE)
>>> cache['a'] = range(5) # pickle stored in sqlite database
>>> 'a' in cache
True
>>> cache['a']
[0, 1, 2, 3, 4]
(see a further example here: http://blog.sitescraper.net/2010/07/caching-crawled-webpages.html)
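
A typical use is caching downloaded pages between runs; a minimal sketch using only the dict-style access shown above (the cache filename and url are placeholders):

>>> import urllib2
>>> cache = pdict.PersistentDict('cache.db')
>>> if url in cache:
...     html = cache[url] # reuse the stored copy
... else:
...     html = urllib2.urlopen(url).read()
...     cache[url] = html # persist for the next run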

xpath

>>> from webscraping import xpath
>>> import urllib2
>>> html = urllib2.urlopen(url).read()
>>> xpath.parse(html, '/html/body/ul[2]/li[@class="info"]/div[1]')
['div content']
>>> xpath.parse(html, '/html/body/ul[2]/li[@class="info"]/a/@href')
['url1', 'url2', 'url3']
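
The same function works on any HTML string; a minimal sketch on inline markup, with the expected output assuming behaviour matches the examples above:

>>> doc = '<html><body><ul><li class="info"><a href="/a">A</a><a href="/b">B</a></li></ul></body></html>'
>>> xpath.parse(doc, '/html/body/ul/li[@class="info"]/a/@href')
['/a', '/b']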