
Big Data Crawler Basics (4): Installing, Configuring and Using Maven (Part 2) -- A Simple Java Crawler

2016-06-10 16:26
eclipse maven

Environment:

Windows 10 Pro x64, JDK 1.8, Eclipse Mars

1. Install and configure the Maven plugin
Window -> Preferences -> Maven -> Installations -> Add

For details, see the first reference link below.

2. Create a new Maven project

File -> New -> Project -> Maven Project -> maven-archetype-quickstart (selected by default)

groupId:com.mvntest

artifactId:crawler

Click Finish.

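3. Create the crawler program TTT.java

Right-click src/main/java -> New -> Class, enter TTT as the name (in package com.mvntest.crawler, which the code below declares), press Enter, and paste the crawler code into it.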

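4. Add the httpclient dependency

4.1 Look up the httpclient 4.5.2 dependency

Look up the Maven pom.xml snippet for httpclient 4.5.2 on the following site. HttpClient 4.x is published under the groupId org.apache.httpcomponents; the old commons-httpclient artifact stops at version 3.1.
http://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient/4.5.2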

<!-- http://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>

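4.2 Add the dependency to pom.xml in Eclipse

In the project explorer on the left, double-click pom.xml to open the form editor. Switch to the Dependencies tab at the bottom, click Add, fill in the groupId, artifactId and version from above, and click OK.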

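4.3 Maven install

Right-click pom.xml -> Run As -> Maven install.

If this fails with an error saying that a JRE is being used instead of a JDK (typically "No compiler is provided in this environment. Perhaps you are running on a JRE rather than a JDK?"), the project's JRE reference is wrong and must be pointed at the JRE inside the JDK.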

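Right-click the project -> Properties -> Java Build Path -> Libraries

Select JRE System Library -> Edit

In the dialog, choose Alternate JRE, click Installed JREs, click Search, select the JRE directory inside the JDK, then click Apply and OK.

Run Maven install again.

The build now succeeds.

[Alternatively, change the JRE path once under Window -> Preferences -> Java -> Installed JREs, which fixes the problem for all projects.]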
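To confirm which Java installation the project actually runs on, a small diagnostic class can be placed next to TTT and run as a Java Application. This is a sketch, not part of the original walkthrough: the class name JdkCheck is hypothetical, and it relies on the standard javax.tools.ToolProvider API, whose getSystemJavaCompiler() returns null when the running installation is a plain JRE without a compiler. It reports on the JVM that launched it, which for Run As -> Java Application is normally the project's JRE System Library; Maven builds started from Eclipse may use a JRE set in their own launch configuration.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class JdkCheck {

    // True when the running installation ships a system Java compiler,
    // which is the case for a JDK (or the JRE inside a JDK) but not a standalone JRE.
    public static boolean hasSystemCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        // java.home is the installation this JVM was started from
        System.out.println("java.home    = " + System.getProperty("java.home"));
        System.out.println("java.version = " + System.getProperty("java.version"));
        if (hasSystemCompiler()) {
            System.out.println("Compiler available: this looks like a JDK.");
        } else {
            System.out.println("No compiler: this looks like a plain JRE; point Eclipse at the JDK.");
        }
    }
}
```

If the last line reports a plain JRE, the JRE System Library steps above have not taken effect for this project yet.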

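4.4 Run the project

If Maven clean is not available in the Run As menu, right-click the Maven project -> Run As -> Maven build..., enter clean package under Goals, and click OK.

Download the dependencies:

Right-click the project -> Run As -> Maven install

Run:

Right-click the project -> Run As -> Java Application -> TTT

The crawled page is printed to the console. Done.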
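To check whether the httpclient jars actually landed on the project classpath after Maven install, a hypothetical helper class (not from the original post) can look up the classes TTT depends on by name with Class.forName. Because it refers to them only as strings, it still compiles and runs even when the org.apache.http imports in TTT show up as unresolved. The three names are classes TTT uses; HttpEntity comes from the httpcore jar that httpclient 4.5.2 pulls in transitively.

```java
public class ClasspathCheck {

    // Classes used by TTT.java; they should be on the classpath once Maven
    // has resolved the httpclient 4.5.2 dependency.
    public static final String[] REQUIRED = {
        "org.apache.http.client.methods.HttpGet",
        "org.apache.http.client.ClientProtocolException",
        "org.apache.http.HttpEntity"
    };

    // Returns true if the named class can be loaded from the current classpath.
    // initialize=false means the class's static initializers are not run.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : REQUIRED) {
            System.out.println((isOnClasspath(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```

Any MISSING line suggests rerunning Maven install (or Maven -> Update Project) before running TTT again.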

References:
http://jingyan.baidu.com/article/295430f136e8e00c7e0050b9.html
http://www.iteye.com/topic/1123225
http://www.blogjava.net/fancydeepin/archive/2012/06/12/380605.html
http://mvnrepository.com/artifact/commons-httpclient/commons-httpclient/3.1
http://bbs.csdn.net/topics/390172911
TTT.java

package com.mvntest.crawler;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class TTT
{
    /**
     * Fetches one page and prints its HTML to the console.
     * @param args unused
     * @throws IOException
     * @throws ClientProtocolException
     */
    public static void main(String[] args) throws ClientProtocolException, IOException
    {
        // Create an HttpClient instance (DefaultHttpClient is deprecated since HttpClient 4.3)
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {
            // Create a GET request
            HttpGet httpget = new HttpGet("http://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient/4.5.2");
            CloseableHttpResponse response = httpclient.execute(httpget);
            try {
                HttpEntity entity = response.getEntity();
                if (entity != null) {
                    String str = convertStreamToString(entity.getContent());
                    System.out.println("Do something");
                    System.out.println(str);
                }
            } finally {
                // Releases the underlying connection
                response.close();
            }
        } finally {
            httpclient.close();
        }
    }

    // Reads the stream line by line into a String, then closes it.
    // Assumes the page is UTF-8 encoded.
    public static String convertStreamToString(InputStream is) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, Charset.forName("UTF-8")));
        StringBuilder sb = new StringBuilder();
        String line = null;
        try {
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                reader.close(); // also closes the underlying stream
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
    }
}
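The stream-reading step can be tried offline by feeding an in-memory stream in place of entity.getContent(). The sketch below is a self-contained variant of convertStreamToString rather than a copy of it: the class name StreamToStringDemo is hypothetical, the charset is fixed to UTF-8 (an assumption about the crawled page's encoding), and IOException is propagated instead of printed. Like the original helper, it re-appends '\n' after every line, so CRLF line endings are normalized and every line, including the last, ends with '\n'.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class StreamToStringDemo {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Reads all lines from the stream, appending '\n' after each one,
    // and closes the stream when done.
    public static String convertStreamToString(InputStream is) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, UTF8));
        try {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } finally {
            reader.close(); // also closes the underlying stream
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(): a tiny "page" with CRLF line endings
        byte[] page = "<html>\r\n<title>test</title>\r\n</html>".getBytes(UTF8);
        String text = convertStreamToString(new ByteArrayInputStream(page));
        System.out.print(text);
    }
}
```

Running main prints the three lines of the sample page, each terminated by a plain '\n'.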

  
