您的位置:首页 > 其它

几个名词(robots.txt/POST/Phrase chunking)

2008-02-27 13:25 106 查看
导读:最近在走流程的时候遇到一些名词,之前并没有接触过,现在将一部分收集起来以便以后查阅。

1、 robots.txt
robots.txt 是一个纯文本文件,通过在这个文件中声明该网站中不想被robots访问的部分,这样,该网站的部分或全部内容就可以不被搜索引擎收录了,或者指定搜索引擎只收录指定的内容。当一个搜索机器人访问一个站点时,它会首先检查该站点根目录下是否存在robots.txt,如果找到,搜索机器人就会按照该文件中的内容来确定访问的范围,如果该文件不存在,那么搜索机器人就沿着链接抓取。

It works likes this: a robot wants to vists a Web site URL, say http://www.example.com/welcome.html. Before it does so, it firsts checks for http://www.example.com/robots.txt, and finds:

User-agent: *

Disallow: /


The "User-agent: *" means this section applies to all robots.
The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.

the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

So don't try to use /robots.txt to hide information.

材料来源:http://www.robotstxt.org/robotstxt.html

2、Part-of-speech tagging
Part-of-speech tagging (POS tagging or POST), also called grammatical tagging, is the process of marking up the words in a text as corresponding to a particular part of speech, based on both its definition, as well as its context—i.e., relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags.

材料来源:en.wikipedia.org/wiki/Part-of-speech_tagging

3、Phrase chunking
Phrase chunking is a natural language process that separates and segments sentences into its subconstituents, i.e. noun, verb and prepositional phrases.

材料来源:en.wikipedia.org/wiki/Phrase_chunking
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: