Getting Started with Hive

2018-02-02 16:04

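I. Hive Administration

1. Ways to start Hive

CLI (command-line interface)

- Start with: hive --service cli

Web interface (HWI)

- Port: 9999

- Start with: hive --service hwi

- Open in a browser: http://<IP address>:9999/hwi/

- Note: HWI was removed in Hive 2.2.0, so this mode applies to older releases only

Hive remote service

# Programs that connect to Hive over JDBC or ODBC must use the remote service
- Port: 10000
- Start with: hive --service hiveserver
- Note: HiveServer1 was removed in Hive 1.0.0; on later releases start it with: hive --service hiveserver2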

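2. Common Hive commands

Clear the screen: Ctrl + L or !clear;
List the tables in the data warehouse: show tables;
List the functions built into the data warehouse: show functions;
Show a table's structure: desc <table name>;
List files on HDFS:
- dfs -ls <directory>;

- dfs -lsr <directory>; # recursive listing

Run an operating system command: !<command>;
Run a SQL file: source <directory>/***.sql;
Hive silent mode (started from the shell): hive -S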
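
The commands above can be tried together in one Hive CLI session. This is only a sketch: the table name `student` is taken from the next section, and the paths `/user/hive/warehouse` and `/tmp/queries.sql` are placeholders that may not exist on your cluster.

```sql
-- List tables and functions known to Hive
show tables;
show functions;

-- Show the columns of one table (assumes a table named student exists)
desc student;

-- List an HDFS directory from inside the CLI; no "hdfs" prefix is needed here
dfs -ls /user/hive/warehouse;

-- Run an operating system command
!pwd;

-- Run the statements stored in a file (placeholder path)
source /tmp/queries.sql;
```

Each command entered in the CLI ends with a semicolon, including `dfs` and `!` commands.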

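II. Hive Data Types

1. Difference between varchar and char:

varchar(20) # variable length, at most 20 characters

char(20) # fixed length of 20; shorter values are padded with spaces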
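
A minimal sketch of the two types side by side. The table `type_demo` and its columns are hypothetical; varchar needs Hive 0.12 or later, char needs Hive 0.13 or later, and the `insert ... values` form needs Hive 0.14 or later.

```sql
-- Hypothetical table with one fixed-length and one variable-length column
create table type_demo
(code char(20),
name varchar(20));

-- Store the same short string in both columns
insert into table type_demo values ('abc', 'abc');

-- Read both columns back
select code, name from type_demo;
```

A string longer than the declared length is silently truncated on assignment, so the length limit is worth choosing with care.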

2. Array: an array type made up of elements that all share the same data type

create table student
(sid int,
sname string,
grade array<float>);

-- Sample row: {1,Tom,[80,90,75]}
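
As a sketch of how the array column might be queried (the column aliases are illustrative, not from the original): elements are addressed with a zero-based index, `size()` returns the element count, and `explode()` turns each element into its own row.

```sql
-- First grade and number of grades per student
select sid, sname, grade[0] as first_grade, size(grade) as grade_count
from student;

-- One output row per array element
select sid, g
from student lateral view explode(grade) t as g;
```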

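3. Map: a collection type that holds key->value pairs; elements are accessed by key

create table student1
(sid int,
sname string,
grade map<string,float>);

-- Sample row: {1,Tom,<'College Chinese',85>}

Complex types can be nested, for example an array of maps:

create table student3
(sid int,
sname string,
grades array<map<string,float>>);

-- Sample row: {1,'Tom',[<'College Chinese',85>,<'College English',79>]}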
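
A short sketch of reading map values, assuming the maps contain the course name 'College Chinese' as a key, as in the sample rows. The alias `chinese` is illustrative.

```sql
-- Look up one value by key; the result is NULL when the key is absent
select sid, sname, grade['College Chinese'] as chinese
from student1;

-- List all keys and all values of the map
select sid, map_keys(grade), map_values(grade)
from student1;

-- Nested type: first map in the array, then a key inside that map
select sid, grades[0]['College Chinese']
from student3;
```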

4. Struct: a structure type whose fields may have different data types. Individual fields are accessed with dot notation

create table student4
(sid int,
info struct<name:string,age:int,sex:string>);

-- Sample row: {1,{'Tom',10,'male'}}
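
Dot notation on the struct column could look like the following sketch; the age threshold is arbitrary and only there to show a struct field used in a filter.

```sql
-- Select individual struct fields and filter on one of them
select sid, info.name, info.age
from student4
where info.age >= 10;
```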

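III. Hive Data Model

1. Internal table (Table)

- Conceptually similar to a table in a database

- Each table has a corresponding directory in Hive that stores its data

- All table data (external tables excluded) is kept in that directory

- Dropping the table deletes both the metadata and the data

-- Store the table at an explicit HDFS path; fields are separated by ','
-- (row format must come before location in the create table statement)
create table t3
(tid int,tname string,age int)
row format delimited fields terminated by ','
location '/';

-- Create a new table from a query result
create table t4
row format delimited
fields terminated by ','
as
select * from t1;

-- Add a new column
alter table t1 add columns(english int);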
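
To see the "metadata and data are deleted together" behavior, a sketch along these lines may help. The table `t_demo` and the local file `/tmp/t_demo.csv` are hypothetical and exist only for this illustration.

```sql
-- Create a managed (internal) table in the default warehouse location
create table t_demo
(tid int, tname string, age int)
row format delimited fields terminated by ',';

-- Copy a local comma-separated file into the table's directory
load data local inpath '/tmp/t_demo.csv' into table t_demo;

-- The output includes the table's Location and its Table Type (MANAGED_TABLE)
describe formatted t_demo;

-- Removes the metadata and the table's directory with its files
drop table t_demo;
```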

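2. Partitioned table (Partition)

- A partition is roughly analogous to a dense index on the partition column in a database

- In Hive, each partition of a table corresponds to a directory under the table's directory, and all of that partition's data is stored there

-- Create a table partitioned by the gender column
create table partition_table
(sid int,sname string)
partitioned by (gender string)
row format delimited fields terminated by ',';

-- Insert data into a specific partition
insert into table partition_table
partition(gender='F')
select sid,sname from sample_data where gender='F';

-- View the HQL execution plan (read it from right to left, bottom to top)
explain
select * from partition_table where gender='M';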
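
The one-directory-per-partition layout can be inspected with a sketch like this. It assumes `sample_data` has a `gender` column (as the insert statement implies) and that the warehouse sits at the default `/user/hive/warehouse`; adjust the path if your installation differs.

```sql
-- List the partitions Hive knows about, e.g. gender=F
show partitions partition_table;

-- Allow Hive to derive partition values from the data itself
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Dynamic partition insert: the last selected column feeds the partition column
insert into table partition_table
partition(gender)
select sid, sname, gender from sample_data;

-- Each partition appears as a subdirectory such as gender=F (assumed default path)
dfs -ls /user/hive/warehouse/partition_table;
```

A query that filters on `gender` only needs to read the matching subdirectory, which is the main reason to partition a table.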

3. External table (External Table)

- Points to data that already exists in HDFS; it can have partitions

- Its metadata is organized the same way as an internal table's, but the actual data is stored quite differently

- An external table takes a single step: creating the table and loading the data happen together. The data is not moved into the data warehouse directory; the table only links to the external data. Dropping an external table removes only that link, and the data files stay in place

# Copy a local file to HDFS
hdfs dfs -put student01.txt /input

-- Create the external table
create external table ext_table
(sid int,sname string,age int)
row format delimited fields terminated by ','
location '/input';
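
A brief sketch of checking that dropping an external table leaves the files alone, using the `ext_table` and `/input` names from above:

```sql
-- The rows come straight from the files already in /input
select * from ext_table;

-- The output includes Table Type (EXTERNAL_TABLE) and the Location
describe formatted ext_table;

-- Removes only the metadata
drop table ext_table;

-- The uploaded file should still be listed here
dfs -ls /input;
```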

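4. Bucketed table (Bucket Table)
- A bucketed table hashes the value of the bucketing column and stores rows in different files according to the hash

create table bkt_table
(sid int,sname string,age int)
clustered by(sname) into 5 buckets;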
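
Filling and sampling the bucketed table might look like the sketch below. It assumes `ext_table` from the previous subsection still exists as the data source; any table with the same three columns would do.

```sql
-- Hive 1.x only: make inserts honor the bucket definition.
-- Omit this line on Hive 2.0 and later, where bucketing is always enforced
-- and the property no longer exists.
set hive.enforce.bucketing=true;

-- Rows are distributed over 5 files according to the hash of sname
insert overwrite table bkt_table
select sid, sname, age from ext_table;

-- Read only the first of the 5 buckets
select * from bkt_table tablesample(bucket 1 out of 5 on sname);
```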

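5. View (View)
- A view is a virtual table, a logical concept; it can span multiple tables

- A view is built on existing tables; the tables it relies on are called base tables
- Views can simplify complex queries

-- Query employee information: employee number, name, monthly salary, annual salary, department name
create view empinfo
as
select e.empno,e.ename,e.sal,e.sal*12 annlsal,d.dname
from emp e,dept d
where e.deptno=d.deptno;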
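
Once defined, the view is queried like a table. In this sketch the salary threshold is arbitrary and chosen only for illustration.

```sql
-- The join between emp and dept is hidden behind the view
select ename, annlsal, dname
from empinfo
where annlsal > 30000;

-- The output includes Table Type (VIRTUAL_VIEW) and the stored query text
describe formatted empinfo;

-- Dropping the view does not touch the base tables
drop view if exists empinfo;
```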
