
Hive Development

2015-08-31 11:21

Building Hive from source

Clone the source

git clone https://git-wip-us.apache.org/repos/asf/hive.git or
git clone git@github.com:wankunde/hive.git

# ignore chmod changes
git config core.filemode false


Build

git branch -va
// check out the branch you are interested in
git checkout -b branch-0.14 origin/branch-0.14
// or git checkout --track origin/branch-0.14

// compile and dist
mvn clean install -DskipTests -Phadoop-2 -Pdist

//  generate protobuf code
cd ql
mvn clean install -DskipTests -Phadoop-2,protobuf

// generate Thrift code
mvn clean install -Phadoop-2,thriftif -DskipTests -Dthrift.home=/usr/local
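
If the -Pdist build succeeds, the packaged distribution usually ends up under packaging/target; a quick check (a sketch, the exact archive name varies by branch and version):

# the binary distribution produced by -Pdist normally lands under packaging/target
ls packaging/target/apache-hive-*-bin.tar.gz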


Tips

An alternative way to activate a profile defined in pom.xml is to add the following configuration to $MAVEN_HOME/conf/settings.xml. You can verify which profiles are active with the mvn help:active-profiles command (a quick check is shown after the snippet below).

<profiles>
<profile>
<id>hadoop-local-dev</id>
<properties>
<maven.test.classpath></maven.test.classpath>
<test.dfs.mkdir>D:\hadoop_tmp\test_dfs_mkdir</test.dfs.mkdir>
<test.output.overwrite>true</test.output.overwrite>
<thrift>D:\hadoop_tmp\thrift</thrift>
<thrift.home>${thrift}\home</thrift.home>
<thrift.gen.dir>${thrift}\gen_dir</thrift.gen.dir>
<thrift.args></thrift.args>
</properties>
</profile>
</profiles>

<activeProfiles>
<activeProfile>hadoop-2</activeProfile>
<activeProfile>hadoop-local-dev</activeProfile>
</activeProfiles>
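
With the settings above in place, a quick check that the profiles actually took effect (the grep filter is just an illustration):

# list the profiles Maven considers active for this build
mvn help:active-profiles
# or narrow the output to the hadoop profiles
mvn help:active-profiles | grep hadoop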


Configure the system environment variable HADOOP_HOME=D:\installs\hadoop-2.5.0-cdh5.2.0.
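
A minimal sketch of setting the variable (the Windows path comes from above; the Linux path is only an illustration):

# Windows (cmd, persists for new shells):
#   setx HADOOP_HOME "D:\installs\hadoop-2.5.0-cdh5.2.0"
# Linux/macOS (bash, current shell only; adjust the path to your install):
export HADOOP_HOME=/opt/hadoop-2.5.0-cdh5.2.0
export PATH=$HADOOP_HOME/bin:$PATH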

By default, before compiling, Maven has to download many dependency packages and may hit timeout exceptions. I use Nexus and added a <timeout>120000</timeout> setting to the <server> configuration item (not tested).
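
Another option for the download-timeout problem (not part of the original setup, just a sketch) is to prefetch all dependencies once and then build offline:

# resolve and download all dependencies up front
mvn dependency:go-offline -Phadoop-2
# subsequent builds can then run in offline mode
mvn clean install -DskipTests -Phadoop-2 -o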

See also: the Hive HiveDeveloperFAQ wiki.

If you want to compile and test Hive in Eclipse, continue with the following steps.

mvn eclipse:eclipse

Import the project

Configure > Convert to Maven Project

Some Maven plugins may not work well (you may need to set a local proxy host and port in Eclipse); connect to the m2e marketplace and install the m2e connectors you need (including antlr and build-helper).

You can access the m2e marketplace from the preferences: Preferences > Maven > Discovery > Open Catalog. Installing the WTP integration solved most plugin issues for me.

Set up the development environment

Install single-node Hadoop

Install Hive

hive-site.xml

<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>

<property>
<name>hive.metastore.uris</name>
<value>thrift://x.x.x.x:9083</value>
</property>

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://x.x.x.x:3306/hive?createDatabaseIfNotExist=true</value>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>username</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
</property>

<property>
<name>mapred.child.java.opts</name>
<value>-Xmx2g</value>
</property>
</configuration>
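
If the metastore schema has not been created in MySQL yet, Hive ships a schematool that can initialize it against the connection settings above (a sketch; available in Hive 0.12 and later):

# create the metastore schema in the MySQL database configured in hive-site.xml
bin/schematool -dbType mysql -initSchema
# later, verify which schema version is installed
bin/schematool -dbType mysql -info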


Start Hive metastore

#!/bin/sh
nohup bin/hive --service metastore >> logs/metastore.log 2>&1 &
echo $! > logs/hive-metastore.pid
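
A few quick checks after starting it (the port and paths follow the configuration and script above):

# watch the metastore come up
tail -f logs/metastore.log
# the thrift port from hive.metastore.uris should be listening
netstat -an | grep 9083
# stop the metastore later using the recorded pid
kill $(cat logs/hive-metastore.pid)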


Error solutions:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)


Problem: Hive throws the above exception when an old version of MySQL is used as the Hive metastore.

Solution: set latin1 as the character set for the metastore database:

mysql> alter database hive character set latin1;
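
The same change from the shell, plus a check of the current character set (a sketch; substitute your own credentials):

# show the character set currently used by the hive database
mysql -u username -p -e "SHOW CREATE DATABASE hive;"
# apply the fix non-interactively
mysql -u username -p -e "ALTER DATABASE hive CHARACTER SET latin1;"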


Hive testing and development

Change the Hive log level

bin/hive -hiveconf hive.root.logger=DEBUG,console


Or change the log4j properties:

cp conf/hive-log4j.properties.template conf/hive-log4j.properties
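
After copying the template, raise the level in the copied file; a sketch using sed (the hive.root.logger property and the DRFA appender come from the template):

# switch the root logger in the copied properties file to DEBUG
sed -i 's/^hive.root.logger=.*/hive.root.logger=DEBUG,DRFA/' conf/hive-log4j.properties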


Connecting a Java debugger to Hive

Example: Java remote debugging

Run the remote Java program using a script:

JVM_OPTS="-server -XX:+UseParNewGC -XX:+HeapDumpOnOutOfMemoryError"
DEBUG="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=2345"
JVM_OPTS="$JVM_OPTS $DEBUG"

export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${LOG_DIR}/gc-`date +'%Y%m%d%H%M'` -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=512M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/bh/logs/hbase/hbase.heapdump  -XX:+PrintAdaptiveSizePolicy -XX:+PrintFlagsFinal -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+UnlockExperimentalVMOptions -XX:InitiatingHeapOccupancyPercent=65 -XX:G1HeapRegionSize=32m -XX:G1RSetRegionEntries=16384 -XX:NewSize=1g -XX:MaxNewSize=1g -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=15 -XX:+UseNUMA -Xmx16384M -Xms16384M -Xprof"

$JAVA_HOME/bin/java $JVM_OPTS -cp tools-1.0.jar com.wankun.tools.hdfs.Test2


// For JDK 1.5.x or higher
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2345
// For JDK 1.4.x
-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=2345
// For JDK 1.3.x
-Xnoagent -Djava.compiler=NONE -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=2345
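
Besides Eclipse, the JDK's command-line debugger can attach to the same port (a sketch; 2345 matches the address used above):

# attach jdb to the remote JVM through the socket connector
jdb -connect com.sun.jdi.SocketAttach:hostname=localhost,port=2345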


Add and run a new remote debug configuration in Eclipse

Start Hive in debug mode

Server:
hive --help --debug


IDEA:
Run | Edit Configurations | Defaults | Remote | + | configure the remote host and port | run the selected remote debug configuration


Practice:
hive --debug:childSuspend=y -hiveconf hive.root.logger=DEBUG,console


Run Hive without a Hadoop cluster

export HIVE_OPTS='--hiveconf mapred.job.tracker=local --hiveconf fs.default.name=file:///tmp \
--hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse \
--hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true'
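
With HIVE_OPTS exported as above, a quick local smoke test (the table name is just an illustration); everything runs against the local filesystem and an embedded Derby metastore:

bin/hive -e 'CREATE TABLE smoke_test (id INT); SHOW TABLES; DROP TABLE smoke_test;'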


Hive unit tests

Two kinds of unit tests

Normal unit tests

mvn test -Dtest=ClassName#methodName -Phadoop-2


For example:

mvn test -Dtest=TestAbc -Phadoop-2

where TestAbc is the test class. A package pattern also works:

mvn test -Dtest='org.apache.hadoop.hive.ql.*' -Phadoop-2

Help link: Maven Surefire Plugin

Query files

There are many test query scripts (I have not yet managed to run these successfully):

$ ls ql/src/test/queries/
clientcompare  clientnegative  clientpositive  negative  positive

// run the unit tests, using ql as an example
cd ql
mvn test -Dtest=TestCliDriver -Dqfile=groupby1.q -Phadoop-2

// Take src/test/queries/clientpositive/groupby1.q for example.

mvn test -Dmodule=ql -Phadoop-2 -Dtest=TestCliDriver -Dqfile=groupby1.q -Dtest.output.overwrite=true
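
Several query files can be run in one go, and the developer FAQ also documents a regex form (a sketch, using the same TestCliDriver as above):

# run several .q files at once
mvn test -Dtest=TestCliDriver -Dqfile=groupby1.q,groupby2.q -Phadoop-2
# or select query files by regular expression
mvn test -Dtest=TestCliDriver -Dqfile_regex=groupby.* -Phadoop-2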


Help links:

https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-Unittestsanddebugging
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoIimportintoEclipse?
https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java
                                            