Installing Nutch 1.7 in MyEclipse on Windows 7
2013-12-21 16:17
I. Download the packages
Download both packages: apache-nutch-1.7-bin.zip and apache-nutch-1.7-src.zip.
II. Setting up the basic environment
1. Unzip apache-nutch-1.7-src.zip into your Eclipse workspace, e.g. D:\Workspaces\MyEclipse 8.5\test\apache-nutch-1.7.
2. Unzip apache-nutch-1.7-bin.zip and copy its lib folder into D:\Workspaces\MyEclipse 8.5\test\apache-nutch-1.7 (the src package is missing some of the jars in lib).
3. Import Nutch into Eclipse: from the menu bar choose "File" → "New" → "Other…" → "Java Project from Existing Ant Buildfile".
Click Next. Under "Ant buildfile", select Nutch's build.xml; set "Project name" to match the nutch folder that precedes build.xml in the buildfile path, and check "Link to the buildfile in the file system".
After you click "Finish", Eclipse complains that jars are missing under Nutch's build folder. Copy Nutch's lib folder into the build folder and run the import again.
The Nutch files are now in Eclipse. Select conf, right-click, and choose "Build Path" → "Use as Source Folder".
At this point the project is flagged with an error marker ("x"). Right-click the project, open "Properties", change the project encoding to UTF-8, and click "Apply". Nutch is now imported into Eclipse.
III. Configuring Nutch in Eclipse
1. Find Crawl.java, right-click, and choose "Run As" → "Java Application". The console prints: Usage: Crawl <urlDir> -solr <solrURL> [-dir d] [-threads n] [-depth i] [-topN N].
Create a folder named "urls" under the nutch folder, and inside it a text file url.txt listing the site addresses you want to crawl.
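For example, the crawl in this walkthrough used a single seed (see the console log in the appendix), so url.txt contains just one line:

http://www.163.com/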
Now right-click Crawl.java again, choose "Run As" → "Run Configurations…", enter "Crawl urls -dir data1221 -threads 2 -depth 3 -topN 5" under "Program arguments" and "-Xms512m -Xmx800m" under "VM arguments", then click "Apply".
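For reference, these arguments correspond to the usage string printed earlier:

urls: the seed directory created above
-dir data1221: output folder where the crawl data is written
-threads 2: number of fetcher threads
-depth 3: number of crawl rounds out from the seed URLs
-topN 5: maximum number of pages fetched in each round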
Click "Run". It fails with the following error:
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-\mapred\staging\1623868107\.staging to 0700
	at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
	at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
	at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
	at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
	at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
	at org.apache.nutch.crawl.Injector.inject(Injector.java:281)
	at org.apache.nutch.crawl.Crawl.run(Crawl.java:132)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
This is caused by the Hadoop jar. We need to patch hadoop-core-1.2.0.jar under Nutch's lib folder by editing Hadoop's FileUtil.java and commenting out the body of the checkReturnValue method, as sketched below.
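The original screenshot is not reproduced here; this is a minimal sketch of the patched method, assuming the stock Hadoop 1.2.0 signature. Skipping the permission check is only a workaround for local crawls on Windows, not something to ship to a real cluster.

// org/apache/hadoop/fs/FileUtil.java (hadoop-core-1.2.0)
private static void checkReturnValue(boolean rv, File p,
                                     FsPermission permission
                                     ) throws IOException {
  // Body commented out so the permission check never throws on Windows:
  // if (!rv) {
  //   throw new IOException("Failed to set permissions of path: " + p +
  //                         " to " + String.format("%04o", permission.toShort()));
  // }
}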
Then repackage the Hadoop jar. Since I don't know the proper packaging procedure, I simply opened hadoop-core-1.2.0.jar with an archive tool, deleted the FileUtil class file, and dropped in the modified class file.
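For reference, the JDK's jar tool can apply the same update from a command prompt, assuming the recompiled FileUtil.class sits under org\apache\hadoop\fs relative to the current directory:

jar uf hadoop-core-1.2.0.jar org/apache/hadoop/fs/FileUtil.class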
Now rebuild the project: select build.xml, right-click, and choose "Run As" → "Ant Build…".
Check the "jar", "job", and "runtime [default]" targets, click "Apply", then "Run". The build takes a while to finish; the first build also downloads some dependencies.
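Equivalently, assuming Ant is installed and on the PATH, the same targets (the names shown in the Eclipse dialog) can be run from a command prompt inside the nutch folder:

ant jar job runtime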
After the build, copy the "plugins" folder, "apache-nutch-1.7.jar", and "apache-nutch-1.7.job" from Nutch's build folder back into the nutch folder, then refresh the Nutch project in Eclipse.
Run the Crawl class again. It fails with:
Fetcher: No agents listed in 'http.agent.name' property.
Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
	at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1397)
	at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1282)
	at org.apache.nutch.crawl.Crawl.run(Crawl.java:141)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
The message says no agents are listed in the 'http.agent.name' property. Open nutch-default.xml under Nutch's conf folder and search for 'http.agent.name'; its value is empty:
<property>
  <name>http.agent.name</name>
  <value></value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.
  NOTE: You should also check other related properties:
    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version
  and set their values appropriately.
  </description>
</property>
Copy this property block into nutch-site.xml and fill in a value.
Note: as a rule we do not touch the defaults in nutch-default.xml; instead we add properties to nutch-site.xml, whose settings override those in nutch-default.xml.
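A minimal nutch-site.xml might look like the sketch below; "MyNutchTest" is just a placeholder value, so pick a single word that identifies your own crawler:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- placeholder value; use a single word identifying your crawler -->
    <value>MyNutchTest</value>
  </property>
</configuration>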
Run the Crawl class again. This time the error is:
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/D:/Workspaces/MyEclipse 8.5/test/apache-nutch-1.7/data1221/segments/20131221154928/parse_data
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1081)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1073)
	at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
	at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:180)
	at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:151)
	at org.apache.nutch.crawl.Crawl.run(Crawl.java:148)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
This error means the expected parse_data file was not found: the earlier failed run had already written partial data under segments, and that leftover state breaks the next run.
Indeed, a data1221 folder was generated under nutch. Delete "data1221" and rerun the Crawl class; it now runs normally.
A fresh "data1221" folder appears under nutch, containing the "crawldb", "linkdb", and "segments" folders.
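Judging by the segment names in the console log appended below, the layout is roughly:

data1221/
    crawldb/
    linkdb/
    segments/
        20131221160513/
        20131221160520/
        20131221160532/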
One thing worth adding: we never changed Nutch's crawl-filter configuration in this walkthrough, because "automaton-urlfilter.txt" under Nutch's conf folder accepts every URL by default:
# accept anything else
+.*
That completes the Nutch crawl walkthrough.
Appendix: the output from the Eclipse console:
solrUrl is not set, indexing will be skipped...
crawl started in: data1221
rootUrlDir = urls
threads = 2
depth = 3
solrUrl=null
topN = 5
Injector: starting at 2013-12-21 16:05:08
Injector: crawlDb: data1221/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 0
Injector: total number of urls injected after normalization and filtering: 1
Injector: Merging injected urls into crawl db.
Injector: finished at 2013-12-21 16:05:11, elapsed: 00:00:02
Generator: starting at 2013-12-21 16:05:11
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 5
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data1221/segments/20131221160513
Generator: finished at 2013-12-21 16:05:14, elapsed: 00:00:03
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2013-12-21 16:05:14
Fetcher: segment: data1221/segments/20131221160513
Using queue mode : byHost
Fetcher: threads: 2
Fetcher: time-out divisor: 2
QueueFeeder finished: total 1 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
fetching http://www.163.com/ (queue crawl delay=5000ms)
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2013-12-21 16:05:16, elapsed: 00:00:02
ParseSegment: starting at 2013-12-21 16:05:16
ParseSegment: segment: data1221/segments/20131221160513
Parsed (8ms):http://www.163.com/
ParseSegment: finished at 2013-12-21 16:05:17, elapsed: 00:00:01
CrawlDb update: starting at 2013-12-21 16:05:17
CrawlDb update: db: data1221/crawldb
CrawlDb update: segments: [data1221/segments/20131221160513]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2013-12-21 16:05:18, elapsed: 00:00:01
Generator: starting at 2013-12-21 16:05:18
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 5
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data1221/segments/20131221160520
Generator: finished at 2013-12-21 16:05:21, elapsed: 00:00:03
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2013-12-21 16:05:21
Fetcher: segment: data1221/segments/20131221160520
Using queue mode : byHost
Fetcher: threads: 2
Fetcher: time-out divisor: 2
QueueFeeder finished: total 2 records + hit by time limit :0
Using queue mode : byHost
fetching http://m.163.com/ (queue crawl delay=5000ms)
Using queue mode : byHost
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=1
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613126729
  now = 1387613122697
  0. http://m.163.com/newsapp/
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=1
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613126729
  now = 1387613123697
  0. http://m.163.com/newsapp/
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=1
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613126729
  now = 1387613124697
  0. http://m.163.com/newsapp/
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=1
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613126729
  now = 1387613125698
  0. http://m.163.com/newsapp/
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=1
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613126729
  now = 1387613126698
  0. http://m.163.com/newsapp/
fetching http://m.163.com/newsapp/ (queue crawl delay=5000ms)
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2013-12-21 16:05:28, elapsed: 00:00:07
ParseSegment: starting at 2013-12-21 16:05:28
ParseSegment: segment: data1221/segments/20131221160520
Parsed (1ms):http://m.163.com/newsapp/
ParseSegment: finished at 2013-12-21 16:05:29, elapsed: 00:00:01
CrawlDb update: starting at 2013-12-21 16:05:29
CrawlDb update: db: data1221/crawldb
CrawlDb update: segments: [data1221/segments/20131221160520]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2013-12-21 16:05:30, elapsed: 00:00:01
Generator: starting at 2013-12-21 16:05:30
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 5
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: data1221/segments/20131221160532
Generator: finished at 2013-12-21 16:05:33, elapsed: 00:00:03
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2013-12-21 16:05:33
Fetcher: segment: data1221/segments/20131221160532
Using queue mode : byHost
Fetcher: threads: 2
Fetcher: time-out divisor: 2
QueueFeeder finished: total 5 records + hit by time limit :0
Using queue mode : byHost
Using queue mode : byHost
fetching http://digi.163.com/13/0719/10/9450M2MJ00162659.html (queue crawl delay=5000ms)
Fetcher: throughput threshold: -1
Fetcher: throughput threshold retries: 5
fetching http://help.3g.163.com/13/1216/17/9G81M68M0096400O.html (queue crawl delay=5000ms)
fetching http://m.163.com/newsapp/download.html (queue crawl delay=5000ms)
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=2
* queue: http://help.3g.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139083
  now = 1387613135003
  0. http://help.3g.163.com/13/1127/15/9EMS17RN0096400O.html
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139117
  now = 1387613135003
  0. http://m.163.com/newsapp/zhinan.html
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=2
* queue: http://help.3g.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139083
  now = 1387613136012
  0. http://help.3g.163.com/13/1127/15/9EMS17RN0096400O.html
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139117
  now = 1387613136012
  0. http://m.163.com/newsapp/zhinan.html
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=2
* queue: http://help.3g.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139083
  now = 1387613137013
  0. http://help.3g.163.com/13/1127/15/9EMS17RN0096400O.html
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139117
  now = 1387613137013
  0. http://m.163.com/newsapp/zhinan.html
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=2
* queue: http://help.3g.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139083
  now = 1387613138014
  0. http://help.3g.163.com/13/1127/15/9EMS17RN0096400O.html
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139117
  now = 1387613138014
  0. http://m.163.com/newsapp/zhinan.html
-activeThreads=2, spinWaiting=2, fetchQueues.totalSize=2
* queue: http://help.3g.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139083
  now = 1387613139015
  0. http://help.3g.163.com/13/1127/15/9EMS17RN0096400O.html
* queue: http://m.163.com
  maxThreads = 1
  inProgress = 0
  crawlDelay = 5000
  minCrawlDelay = 0
  nextFetchTime = 1387613139117
  now = 1387613139015
  0. http://m.163.com/newsapp/zhinan.html
fetching http://help.3g.163.com/13/1127/15/9EMS17RN0096400O.html (queue crawl delay=5000ms)
fetching http://m.163.com/newsapp/zhinan.html (queue crawl delay=5000ms)
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2013-12-21 16:05:40, elapsed: 00:00:07
ParseSegment: starting at 2013-12-21 16:05:40
ParseSegment: segment: data1221/segments/20131221160532
Parsed (0ms):http://digi.163.com/13/0719/10/9450M2MJ00162659.html
Parsed (0ms):http://help.3g.163.com/13/1127/15/9EMS17RN0096400O.html
Parsed (0ms):http://help.3g.163.com/13/1216/17/9G81M68M0096400O.html
Parsed (0ms):http://m.163.com/newsapp/download.html
Parsed (0ms):http://m.163.com/newsapp/zhinan.html
ParseSegment: finished at 2013-12-21 16:05:42, elapsed: 00:00:01
CrawlDb update: starting at 2013-12-21 16:05:42
CrawlDb update: db: data1221/crawldb
CrawlDb update: segments: [data1221/segments/20131221160532]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2013-12-21 16:05:44, elapsed: 00:00:02
LinkDb: starting at 2013-12-21 16:05:44
LinkDb: linkdb: data1221/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: internal links will be ignored.
LinkDb: adding segment: file:/D:/Workspaces/MyEclipse 8.5/test/apache-nutch-1.7/data1221/segments/20131221160513
LinkDb: adding segment: file:/D:/Workspaces/MyEclipse 8.5/test/apache-nutch-1.7/data1221/segments/20131221160520
LinkDb: adding segment: file:/D:/Workspaces/MyEclipse 8.5/test/apache-nutch-1.7/data1221/segments/20131221160532
LinkDb: finished at 2013-12-21 16:05:45, elapsed: 00:00:01
crawl finished: data1221