
Accessing PostgreSQL from Spark via JDBC

2015-12-09 09:56
1. First, make sure a usable JDBC driver is available
[hadoop@db1 bin]$ locate jdbc|grep postgres
/mnt/hd01/www/html/deltasql/clients/java/dbredactor/lib/postgresql-8.2-507.jdbc4.jar
/usr/lib/ruby/gems/1.8/gems/railties-3.2.13/lib/rails/generators/rails/app/templates/config/databases/jdbcpostgresql.yml
/usr/src/postgis-2.0.0/java/jdbc/src/org/postgresql
/usr/src/postgis-2.0.0/java/jdbc/src/org/postgresql/driverconfig.properties
/usr/src/postgis-2.0.0/java/jdbc/stubs/org/postgresql
/usr/src/postgis-2.0.0/java/jdbc/stubs/org/postgresql/Connection.java
/usr/src/postgis-2.0.0/java/jdbc/stubs/org/postgresql/PGConnection.java
/usr/src/postgis-2.1.0/java/jdbc/src/org/postgresql
/usr/src/postgis-2.1.0/java/jdbc/src/org/postgresql/driverconfig.properties
/usr/src/postgis-2.1.0/java/jdbc/stubs/org/postgresql
/usr/src/postgis-2.1.0/java/jdbc/stubs/org/postgresql/Connection.java
/usr/src/postgis-2.1.0/java/jdbc/stubs/org/postgresql/PGConnection.java
None of these is suitable, so download a driver from the official site: https://jdbc.postgresql.org/download/postgresql-9.4-1205.jdbc4.jar
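Before wiring the jar into Spark, you can sanity-check that it actually contains the driver; a minimal sketch (org.postgresql.Driver is the standard entry-point class of the PostgreSQL JDBC driver, and the `-cp` invocation below is illustrative):

```scala
// Run from a plain Scala REPL started with the jar on the classpath,
// e.g.: scala -cp postgresql-9.4-1205.jdbc4.jar
// If the jar is present and intact this resolves the driver class;
// otherwise it throws ClassNotFoundException.
val driver = Class.forName("org.postgresql.Driver")
println(driver.getName)
```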


2. Put the downloaded jar file under $SPARK_HOME/lib

3. Start the Spark shell

[hadoop@db1 bin]$ SPARK_CLASSPATH=$SPARK_HOME/lib/postgresql-9.4-1205.jdbc4.jar $SPARK_HOME/bin/spark-shell
...
Please instead use:
- ./spark-submit with --driver-class-path to augment the driver classpath
- spark.executor.extraClassPath to augment the executor classpath

15/11/04 17:53:05 WARN SparkConf: Setting 'spark.executor.extraClassPath' to '/usr/src/data-integration/lib/postgresql-9.3-1102-jdbc4.jar' as a work-around.
15/11/04 17:53:05 WARN SparkConf: Setting 'spark.driver.extraClassPath' to '/usr/src/data-integration/lib/postgresql-9.3-1102-jdbc4.jar' as a work-around.
15/11/04 17:53:06 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
15/11/04 17:53:07 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
15/11/04 17:53:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/04 17:53:09 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/04 17:53:25 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/04 17:53:25 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/11/04 17:53:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/04 17:53:29 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/04 17:53:29 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.


4. Create a DataFrame
scala> val df = sqlContext.load("jdbc", Map("url" -> "jdbc:postgresql://localhost:5434/cd03?user=cd03&password=cd03", "dbtable" -> "test_trans"))
warning: there were 1 deprecation warning(s); re-run with -deprecation for details
df: org.apache.spark.sql.DataFrame = [trans_date: string, trans_prd: int, trans_cust: int]
The standard form is:
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:dbserver",
      "dbtable" -> "schema.tablename")).load()
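Beyond url and dbtable, the jdbc source also accepts credentials and partitioning options. A hedged sketch against the same session (the bound values below are made up for illustration; trans_prd is reused from the table above):

```scala
// Sketch for the spark-shell session above (Spark 1.x sqlContext).
// partitionColumn/lowerBound/upperBound/numPartitions split the table
// scan into parallel JDBC reads; all four must be given together.
val partitionedDF = sqlContext.read.format("jdbc").options(Map(
  "url"             -> "jdbc:postgresql://localhost:5434/cd03",
  "user"            -> "cd03",      // credentials can live in the
  "password"        -> "cd03",      // options map instead of the URL
  "dbtable"         -> "test_trans",
  "partitionColumn" -> "trans_prd", // must be a numeric column
  "lowerBound"      -> "0",
  "upperBound"      -> "20000",
  "numPartitions"   -> "4")).load()
```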


5. Inspect the schema
scala> df.printSchema()
root
|-- trans_date: string (nullable = true)
|-- trans_prd: integer (nullable = true)
|-- trans_cust: integer (nullable = true)
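Since trans_date came across as a string, it may be worth casting it to a real date column before doing any date-based work; a small sketch against the df above (trans_dt is a name chosen here):

```scala
// df is the DataFrame loaded above; cast the string column to a date.
// Note: cast("date") expects yyyy-MM-dd strings, so single-digit
// months/days like 2015-5-20 may need normalizing first, depending
// on the Spark version.
val withDate = df.withColumn("trans_dt", df("trans_date").cast("date"))
withDate.printSchema()
```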


6. A simple computation
scala> df.filter(df("trans_cust")>9999999).select("trans_date","trans_prd").show
+----------+---------+
|trans_date|trans_prd|
+----------+---------+
| 2015-5-20|     2007|
| 2015-7-24|     5638|
| 2015-5-19|     8182|
| 2015-2-24|    11391|
| 2015-8-13|    17341|
| 2015-2-22|    10996|
| 2015-1-17|    15284|
|  2015-1-8|    16090|
| 2015-1-25|    13528|
| 2015-1-17|     9498|
| 2015-9-25|     7235|
| 2015-8-19|     4084|
| 2015-4-24|    16637|
| 2015-5-27|    13829|
| 2015-0-13|    13956|
| 2015-3-19|    11974|
| 2015-10-5|     1185|
| 2015-3-28|     9412|
| 2015-6-13|    15203|
| 2015-2-14|    10087|
+----------+---------+
only showing top 20 rows
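The same filter can also be run as plain SQL by registering the DataFrame as a temporary table (registerTempTable is the Spark 1.x API; the temp-table name below is arbitrary):

```scala
// Register df (loaded above) so it can be queried with sqlContext.sql.
df.registerTempTable("test_trans")
val big = sqlContext.sql(
  """SELECT trans_date, trans_prd
     FROM test_trans
     WHERE trans_cust > 9999999""")
big.show()
```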