Hive 5: Hive Data Types, DDL (Data Definition Language), and DML (Data Manipulation Language)
2017-08-04 09:14
Official documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
Hive Data Types
-- Complex (extended) data types
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)

array_type
  : ARRAY < data_type >

map_type
  : MAP < primitive_type, data_type >

struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>

union_type
  : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)

-- Primitive data types
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)
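The grammar above is recursive: ARRAY, MAP, STRUCT, and UNIONTYPE can contain any data_type, while a MAP key must be a primitive_type. The following is a minimal Python sketch of a recursive-descent checker for such type strings, written only to illustrate the grammar; it is not Hive's own parser, the names `TypeParser` and `parse_type` are hypothetical, and it does not handle STRUCT field COMMENTs.

```python
import re

# Primitive type names from the grammar above (matched case-insensitively).
PRIMITIVES = {
    "TINYINT", "SMALLINT", "INT", "BIGINT", "BOOLEAN", "FLOAT", "DOUBLE",
    "STRING", "BINARY", "TIMESTAMP", "DECIMAL", "DATE", "VARCHAR", "CHAR",
}
# Identifiers, integers, and the punctuation the grammar uses.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\d+|[<>(),:])")


class TypeParser:
    """Hypothetical recursive-descent parser, one branch per grammar rule."""

    def __init__(self, text):
        text = text.strip()
        self.tokens, pos = [], 0
        while pos < len(text):
            match = TOKEN.match(text, pos)
            if not match:
                raise ValueError(f"unexpected character {text[pos]!r}")
            self.tokens.append(match.group(1))
            pos = match.end()
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of type string")
        self.i += 1
        return token

    def expect(self, token):
        if self.take() != token:
            raise ValueError(f"expected {token!r}")

    def data_type(self):
        word = self.take().upper()
        if word == "ARRAY":                      # ARRAY < data_type >
            self.expect("<")
            element = self.data_type()
            self.expect(">")
            return ("ARRAY", element)
        if word == "MAP":                        # MAP < primitive_type, data_type >
            self.expect("<")
            key = self.data_type()
            if key[0] not in PRIMITIVES:
                raise ValueError("MAP keys must be a primitive type")
            self.expect(",")
            value = self.data_type()
            self.expect(">")
            return ("MAP", key, value)
        if word in ("STRUCT", "UNIONTYPE"):      # comma-separated members
            self.expect("<")
            members = []
            while True:
                if word == "STRUCT":             # col_name : data_type
                    name = self.take()
                    self.expect(":")
                    members.append((name, self.data_type()))
                else:
                    members.append(self.data_type())
                if self.peek() != ",":
                    break
                self.take()                      # consume the ','
            self.expect(">")
            return (word, members)
        if word in PRIMITIVES:
            # DECIMAL(p, s), VARCHAR(n) and CHAR(n) take integer arguments.
            if word in ("DECIMAL", "VARCHAR", "CHAR") and self.peek() == "(":
                self.take()
                args = [int(self.take())]
                if self.peek() == ",":
                    self.take()
                    args.append(int(self.take()))
                self.expect(")")
                return (word, *args)
            return (word,)
        raise ValueError(f"unknown type {word!r}")


def parse_type(text):
    """Parse a complete type string and reject any trailing tokens."""
    parser = TypeParser(text)
    result = parser.data_type()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return result
```

For example, `parse_type("map<string,array<int>>")` yields `("MAP", ("STRING",), ("ARRAY", ("INT",)))`, while `map<array<int>,string>` is rejected because its key is not primitive.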
Hive DDL
Full Hive DDL syntax (CREATE TABLE)
Hive DDL uses a SQL-like syntax, so most standard SQL syntax also works in Hive.
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name    -- (Note: TEMPORARY available in Hive 0.14.0 and later)
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...)                  -- (Note: Available in Hive 0.10.0 and later)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]
  [
   [ROW FORMAT row_format]
   [STORED AS file_format]
     | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]   -- (Note: Available in Hive 0.6.0 and later)
  [AS select_statement];   -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
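In the PARTITIONED BY clause, each partition column becomes one level of subdirectory named `column=value` under the table's location, which is why a filter on a partition column lets Hive skip whole directories. Below is a minimal Python sketch of that layout. It assumes the default warehouse directory `/user/hive/warehouse` and simple partition values (real Hive also escapes characters such as `/` and `=`, which this sketch does not); `partition_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def partition_path(table_location, partition_spec):
    """Return the directory that holds one partition of a table.

    partition_spec is a list of (column, value) pairs in PARTITIONED BY
    order; each pair adds one 'column=value' directory level. Values are
    used verbatim (no escaping), unlike real Hive.
    """
    path = PurePosixPath(table_location)
    for column, value in partition_spec:
        path = path / f"{column}={value}"
    return str(path)


# Assumed table location: the default warehouse dir plus the table name.
print(partition_path("/user/hive/warehouse/page_view",
                     [("dt", "2008-06-08"), ("country", "US")]))
# prints /user/hive/warehouse/page_view/dt=2008-06-08/country=US
```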
Creating a table in Hive
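-- CREATE TABLE ... LIKE syntax: copies the definition of an existing table or view (no data is copied)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

-- Example: a person table with ARRAY and MAP columns
create table person(
  id int,
  name string,
  age int,
  likes array<string>,
  address map<string,string>
)
row format delimited
-- delimiter between columns in the loaded data
fields terminated by ','
-- delimiter between collection items: ARRAY elements and MAP entries
collection items terminated by '-'
-- delimiter between a MAP key and its value
map keys terminated by ':'
-- delimiter between rows
lines terminated by '\n';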
Once the table has been created, you can view its structure with desc (for example, desc person;).
Loading data into Hive
# Three rows: columns are separated by ',', ARRAY elements and MAP entries by '-', a MAP key and its value by ':', and rows by newlines
1,tom,28,game-music-book,stu:henan-home:henan-work:beijing
2,jack,21,money-meinv,stu:wuhan-home:wuhan
3,lusi,18,shopping-music,stu:shanghai-home:beijing
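To see how the three delimiters interact, here is a minimal Python sketch (not Hive code; `parse_person_line` is a hypothetical name) that splits one line the way the person table's ROW FORMAT DELIMITED clause describes. Note that '-' separates both ARRAY elements and MAP entries, while ':' only separates a MAP key from its value. Hive itself is more lenient than this sketch: for example, a value that cannot be read as INT becomes NULL instead of raising an error.

```python
def parse_person_line(line):
    """Split one line of the data file following the person table's
    ROW FORMAT DELIMITED clause.

    Unlike Hive, this sketch raises on malformed input instead of
    returning NULLs, and it assumes exactly five fields per line.
    """
    field_sep, item_sep, key_sep = ",", "-", ":"   # from the CREATE TABLE
    id_, name, age, likes, address = line.rstrip("\n").split(field_sep)
    return {
        "id": int(id_),
        "name": name,
        "age": int(age),
        # ARRAY<STRING>: elements separated by the collection-items delimiter
        "likes": likes.split(item_sep),
        # MAP<STRING,STRING>: entries separated by '-', key and value by ':'
        "address": dict(entry.split(key_sep, 1) for entry in address.split(item_sep)),
    }


row = parse_person_line("1,tom,28,game-music-book,stu:henan-home:henan-work:beijing")
print(row["likes"])    # ['game', 'music', 'book']
print(row["address"])  # {'stu': 'henan', 'home': 'henan', 'work': 'beijing'}
```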
Load the data:
hive> load data local inpath '/opt/data.txt' overwrite into table person;
Querying data
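-- Query all rows
select * from person;
-- Filter on a regular column
select * from person where name='tom';
-- Filter on an ARRAY element (indexes start at 0, so likes[1] is the second element)
select * from person where likes[1]='music';
-- Filter on a MAP value by key
select * from person where address['stu']='shanghai';
-- Aggregate functions work as well
select avg(age) from person;
-- ... most standard SQL syntax, including many functions, can be used in Hive, because HiveQL is SQL-like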
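The ARRAY and MAP lookups above follow two rules worth remembering: array indexes start at 0, and an index past the end of the array or a missing map key yields NULL rather than an error. The Python sketch below models those rules on the three sample rows (hard-coded so the block runs on its own); `array_at` and `map_get` are hypothetical helpers, and None stands in for NULL.

```python
# The three sample rows, as the person table would read them.
PEOPLE = [
    {"id": 1, "name": "tom", "age": 28, "likes": ["game", "music", "book"],
     "address": {"stu": "henan", "home": "henan", "work": "beijing"}},
    {"id": 2, "name": "jack", "age": 21, "likes": ["money", "meinv"],
     "address": {"stu": "wuhan", "home": "wuhan"}},
    {"id": 3, "name": "lusi", "age": 18, "likes": ["shopping", "music"],
     "address": {"stu": "shanghai", "home": "beijing"}},
]


def array_at(values, index):
    """Model of Hive's arr[index]: 0-based, NULL (None) when out of range."""
    return values[index] if 0 <= index < len(values) else None


def map_get(mapping, key):
    """Model of Hive's m[key]: NULL (None) when the key is absent."""
    return mapping.get(key)


# select name from person where likes[1]='music';
likes_music = [p["name"] for p in PEOPLE if array_at(p["likes"], 1) == "music"]
# select name from person where address['stu']='shanghai';
stu_shanghai = [p["name"] for p in PEOPLE if map_get(p["address"], "stu") == "shanghai"]
# select avg(age) from person;
avg_age = sum(p["age"] for p in PEOPLE) / len(PEOPLE)

print(likes_music, stu_shanghai, round(avg_age, 2))  # ['tom', 'lusi'] ['lusi'] 22.33
```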
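However, row-level INSERT, UPDATE, and DELETE operations are not recommended in Hive. Hive is designed to extract data from a data warehouse in bulk, so it is a poor fit for row-by-row operations.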
Emptying a table
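-- Empty the table with TRUNCATE (managed tables only; TRUNCATE is not allowed on external tables)
TRUNCATE TABLE person;
-- Empty the table by overwriting it with a query that returns no rows
insert overwrite table person select * from person where 1=2;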
Dropping a table
drop table person;
Hive DML(Data Manipulation Language)
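DML operates on the data in Hive tables (inserting, deleting, and updating it). Because of how Hadoop works, modifying or deleting individual rows performs very poorly, so row-level operations are generally not supported. (Hive 0.14.0 and later do support UPDATE and DELETE on tables that support ACID transactions, but these operations remain expensive.)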
The most common ways to insert data in bulk are described below.
1. Loading data from a file
Syntax: LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
Example:
load data local inpath '/opt/data.txt' overwrite into table table1;  -- if the file is already in HDFS, omit LOCAL
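LOAD DATA does not parse or transform the file; it is a pure copy or move into the table's directory. With LOCAL, the file is copied from the local filesystem; without it, the HDFS file is moved, so it disappears from its original path. OVERWRITE first removes the table's existing files. The Python sketch below models that behavior with the local filesystem standing in for HDFS. `load_data` is a hypothetical helper, and it ignores file-name collisions, which Hive handles on its own.

```python
import shutil
from pathlib import Path


def load_data(src, table_dir, local=False, overwrite=False):
    """Model of LOAD DATA [LOCAL] INPATH 'src' [OVERWRITE] INTO TABLE t.

    table_dir stands in for the table's directory. Returns the path of
    the loaded file inside table_dir.
    """
    src, table_dir = Path(src), Path(table_dir)
    table_dir.mkdir(parents=True, exist_ok=True)
    if overwrite:
        # OVERWRITE: delete whatever the table directory already holds.
        for existing in table_dir.iterdir():
            if existing.is_dir():
                shutil.rmtree(existing)
            else:
                existing.unlink()
    target = table_dir / src.name
    if local:
        shutil.copy2(src, target)            # LOCAL: copy, the source stays
    else:
        shutil.move(str(src), str(target))   # HDFS path: move, the source is gone
    return target
```

Loading without OVERWRITE simply adds another file to the table directory, which is why repeated loads of different files accumulate rows.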
2. Inserting data from other tables
Syntax:
Standard syntax:
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

Hive extension (dynamic partition inserts):
INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;
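The multi-insert form (FROM ... INSERT ... INSERT ...) reads the source once and feeds every row to each INSERT clause; INSERT INTO appends to the target, while INSERT OVERWRITE replaces its contents. A minimal Python model of that behavior follows. Tables are plain dicts of row lists, `multi_insert` is a hypothetical helper, and each clause's SELECT ... WHERE is represented by a function that returns an output row or None. The usage below corresponds roughly to `FROM person INSERT OVERWRITE TABLE adults SELECT name WHERE age >= 21 INSERT INTO TABLE names SELECT name;`, where adults and names are hypothetical tables.

```python
def multi_insert(source_rows, tables, clauses):
    """Model of FROM src INSERT ... [INSERT ...]: one scan, many targets.

    tables:  dict of table name -> list of rows (updated in place)
    clauses: list of (mode, table_name, select) where mode is "INTO" or
             "OVERWRITE" and select(row) returns an output row or None
             (None plays the role of a WHERE clause filtering the row out)
    This sketch assumes each clause targets a different table.
    """
    names = [name for _, name, _ in clauses]
    if len(set(names)) != len(names):
        raise ValueError("this sketch expects distinct target tables")
    for mode, _, _ in clauses:
        if mode not in ("INTO", "OVERWRITE"):
            raise ValueError(f"unknown mode {mode!r}")
    staged = {name: [] for name in names}
    for row in source_rows:                  # the source is scanned only once
        for _, name, select in clauses:
            out = select(row)
            if out is not None:
                staged[name].append(out)
    for mode, name, _ in clauses:            # apply results after the scan
        if mode == "OVERWRITE":
            tables[name] = staged[name]
        else:
            tables[name] = tables.get(name, []) + staged[name]
    return tables


people = [("tom", 28), ("jack", 21), ("lusi", 18)]
tables = {"adults": [("old_row",)], "names": [("existing",)]}
multi_insert(people, tables, [
    ("OVERWRITE", "adults", lambda r: (r[0],) if r[1] >= 21 else None),
    ("INTO", "names", lambda r: (r[0],)),
])
print(tables["adults"])  # [('tom',), ('jack',)]
print(tables["names"])   # [('existing',), ('tom',), ('jack',), ('lusi',)]
```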
Example:
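-- dt is a static partition; country is a dynamic partition whose value
-- comes from the last column of the SELECT list (pvs.cnt)
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
  SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip, pvs.cnt;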
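In a dynamic partition insert, static partition values (dt='2008-06-08') are fixed, while each dynamic partition column (country) takes its value from the trailing columns of the SELECT list, in PARTITION-clause order. Rows whose dynamic value is NULL or an empty string go to a special default partition, named __HIVE_DEFAULT_PARTITION__ unless hive.exec.default.partition.name is changed. Because dt is static here, the statement is also allowed when hive.exec.dynamic.partition.mode is strict, which requires at least one static partition column. The Python sketch below models only the routing step; `route_rows` and the sample rows are hypothetical.

```python
from collections import defaultdict

DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def route_rows(rows, static_spec, dynamic_cols):
    """Group SELECT output rows by target partition.

    static_spec:  list of (column, value) pairs fixed in the PARTITION clause
    dynamic_cols: dynamic partition column names, in PARTITION-clause order;
                  their values are the last len(dynamic_cols) columns of a row
    Returns {partition spec tuple: [data columns of each row]}.
    """
    n = len(dynamic_cols)
    partitions = defaultdict(list)
    for row in rows:
        data = row[:len(row) - n]             # columns stored in the table
        dynamic_values = row[len(row) - n:]   # trailing partition values
        spec = tuple(static_spec) + tuple(
            (col, DEFAULT_PARTITION if val in (None, "") else str(val))
            for col, val in zip(dynamic_cols, dynamic_values)
        )
        partitions[spec].append(tuple(data))
    return dict(partitions)


# Hypothetical SELECT output: (view_time, userid, country)
rows = [(1, "u1", "US"), (2, "u2", None), (3, "u3", "US")]
for spec, data in route_rows(rows, [("dt", "2008-06-08")], ["country"]).items():
    print(spec, data)
```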