Hive Developing
2015-08-31 11:21
Building hive from source
clone source
git clone https://git-wip-us.apache.org/repos/asf/hive.git
# or
git clone git@github.com:wankunde/hive.git
# ignore file-mode (chmod) changes
git config core.filemode false
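The core.filemode setting can be sanity-checked in a throwaway repository (the repo name here is arbitrary); with filemode set to false, git ignores changes that only touch the permission bits:

```shell
# Create a scratch repo and disable file-mode tracking in it
git init -q filemode-demo
git -C filemode-demo config core.filemode false
# Print the effective value back
git -C filemode-demo config core.filemode
```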
build
git branch -va
# check out the branch you are interested in
git checkout -b branch-0.14 origin/branch-0.14
# or
git checkout --track origin/branch-0.14
# compile and dist
mvn clean install -DskipTests -Phadoop-2 -Pdist
# generate protobuf code
cd ql
mvn clean install -DskipTests -Phadoop-2,protobuf
# generate Thrift code
mvn clean install -Phadoop-2,thriftif -DskipTests -Dthrift.home=/usr/local
Tips
An alternative way to activate a profile in pom.xml is to add the following configuration to $MAVEN_HOME/conf/settings.xml. You can check which profiles are active with the mvn help:active-profiles command.
<profiles>
  <profile>
    <id>hadoop-local-dev</id>
    <properties>
      <maven.test.classpath></maven.test.classpath>
      <test.dfs.mkdir>D:\hadoop_tmp\test_dfs_mkdir</test.dfs.mkdir>
      <test.output.overwrite>true</test.output.overwrite>
      <thrift>D:\hadoop_tmp\thrift</thrift>
      <thrift.home>${thrift}\home</thrift.home>
      <thrift.gen.dir>${thrift}\gen_dir</thrift.gen.dir>
      <thrift.args></thrift.args>
    </properties>
  </profile>
</profiles>
<activeProfiles>
  <activeProfile>hadoop-2</activeProfile>
  <activeProfile>hadoop-local-dev</activeProfile>
</activeProfiles>
Configure the system environment variable:
HADOOP_HOME=D:\installs\hadoop-2.5.0-cdh5.2.0
Before compiling, Maven downloads many dependency packages and may run into timeout exceptions. I use Nexus and add a
<timeout>120000</timeout> setting to the
<server> configuration item. (Not tested.)
See also the Hive HiveDeveloperFAQ wiki.
If you want to compile and test Hive in Eclipse, continue with the following steps.
mvn eclipse:eclipse
Import the project.
Configure -> Convert to Maven project.
Some Maven plugins may not work well (you may need to set a local proxy host and port in Eclipse). Connect to the m2e marketplace and install the m2e connectors you need (including antlr and build-helper).
You can access the m2e marketplace from the preferences: Preferences > Maven > Discovery > Open Catalog. Installing the WTP integration solved most plugin issues for me.
Setup the development environment
Install single node hadoop
Install hive
hive-site.xml:

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://x.x.x.x:9083</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://x.x.x.x:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>username</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2g</value>
  </property>
</configuration>
Start Hive metastore
#!/bin/sh
nohup bin/hive --service metastore >> logs/metastore.log 2>&1 &
echo $! > logs/hive-metastore.pid
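The start script above records the metastore pid, which a matching stop script can reuse; a minimal sketch (the script layout and messages are my own, assuming the metastore was started with the script above):

```shell
#!/bin/sh
# Stop the metastore process recorded by the start script's pid file
PID_FILE=logs/hive-metastore.pid
if [ -f "$PID_FILE" ]; then
  kill "$(cat "$PID_FILE")" && rm -f "$PID_FILE"
  echo "metastore stopped"
else
  echo "no pid file at $PID_FILE"
fi
```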
Error solutions:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
Problem: Hive throws the above exception when an old version of MySQL is used as the metastore database.
Solution: set latin1 as the charset for the metastore database:
mysql> alter database hive character set latin1;
Hive testing and development
Change hive log level
bin/hive -hiveconf hive.root.logger=DEBUG,console
Or change log4j properties.
cp conf/hive-log4j.properties.template conf/hive-log4j.properties
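To persist the DEBUG level, the root logger line in the copied file can be edited in place. A minimal sketch follows; it creates a tiny stand-in for the template so the commands run anywhere (on a real install, operate on the files under conf/ instead, and note the DRFA appender name here is an assumption):

```shell
# Stand-in template (a real install already has conf/hive-log4j.properties.template)
printf 'hive.root.logger=WARN,DRFA\n' > hive-log4j.properties.template
# Copy the template, then flip the root logger to DEBUG in place (GNU sed)
cp hive-log4j.properties.template hive-log4j.properties
sed -i 's/^hive.root.logger=.*/hive.root.logger=DEBUG,DRFA/' hive-log4j.properties
# Show the resulting setting
grep '^hive.root.logger' hive-log4j.properties
```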
Connecting a Java Debugger to hive
Example: Java remote debugging
Run the remote Java program using a script:

JVM_OPTS="-server -XX:+UseParNewGC -XX:+HeapDumpOnOutOfMemoryError"
DEBUG="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=2345"
JVM_OPTS="$JVM_OPTS $DEBUG"
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${LOG_DIR}/gc-`date +'%Y%m%d%H%M'` -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=512M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/bh/logs/hbase/hbase.heapdump -XX:+PrintAdaptiveSizePolicy -XX:+PrintFlagsFinal -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+UnlockExperimentalVMOptions -XX:InitiatingHeapOccupancyPercent=65 -XX:G1HeapRegionSize=32m -XX:G1RSetRegionEntries=16384 -XX:NewSize=1g -XX:MaxNewSize=1g -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=15 -XX:+UseNUMA -Xmx16384M -Xms16384M -Xprof"
$JAVA_HOME/bin/java $JVM_OPTS -cp tools-1.0.jar com.wankun.tools.hdfs.Test2
// For JDK 1.5.x or higher
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=2345
// For JDK 1.4.x
-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=2345
// For JDK 1.3.x
-Xnoagent -Djava.compiler=NONE -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=2345
Add and run a new remote debug configuration in Eclipse.
Start Hive in debug mode
Server: hive --help --debug
IDEA: Run | Edit Configurations | Defaults | Remote | + | configure the remote host and port, then run the selected remote debug configuration.
Practice:
hive --debug:childSuspend=y -hiveconf hive.root.logger=DEBUG,console
Run hive without a hadoop cluster
export HIVE_OPTS='--hiveconf mapred.job.tracker=local --hiveconf fs.default.name=file:///tmp --hiveconf hive.metastore.warehouse.dir=file:///tmp/warehouse --hiveconf javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/tmp/metastore_db;create=true'
Hive unit tests
Two kinds of unit tests
Normal unit tests:

mvn test -Dtest=ClassName#methodName -Phadoop-2

For example:

mvn test -Dtest=TestAbc -Phadoop-2

where TestAbc is the test case class. A package pattern also works:

mvn test -Dtest='org.apache.hadoop.hive.ql.*' -Phadoop-2
Help Links : Maven Surefire Plugin
Query files
There are many test query scripts. (I did not get this working.)
$ ls ql/src/test/queries/
clientcompare  clientnegative  clientpositive  negative  positive

# run a query-file test, using ql as an example
# (takes src/test/queries/clientpositive/groupby1.q as input)
cd ql
mvn test -Dtest=TestCliDriver -Dqfile=groupby1.q -Phadoop-2
# regenerate the expected output for the query file
mvn test -Dmodule=ql -Phadoop-2 -Dtest=TestCliDriver -Dqfile=groupby1.q -Dtest.output.overwrite=true
Help links
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-Unittestsanddebugging
https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-DebuggingHiveCode
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
https://cwiki.apache.org/confluence/display/Hive/HowToContribute
https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-HowdoIimportintoEclipse?
https://github.com/apache/hive/blob/trunk/itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java