
A Quick Walkthrough of Hive DML

2018-01-10 12:41

1. Loading data

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

Without LOCAL, the data is imported from HDFS; with LOCAL, it is imported from the local Linux filesystem.
OVERWRITE replaces the existing data; without it, the data is appended.
Data can be loaded into a table, or into a partition of a table. When loading into a partition of a partitioned table, values must be specified for all partition columns.
This operation actually moves the corresponding HDFS file into the directory of the emp table.
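As a concrete illustration of the partition rule above, a minimal sketch (the table name emp_part, the partition column dt, and the file path are made up for this example):

```sql
-- Hypothetical partitioned table; emp_part, dt, and /tmp/emp are assumptions.
create table emp_part (id int, name string)
partitioned by (dt string)
row format delimited fields terminated by "\t";

-- Every partition column (here only dt) must be given a value in the PARTITION clause.
load data local inpath '/tmp/emp' overwrite into table emp_part partition (dt='2018-01-10');
```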
Three ways to import data:

Way 1: create the table over an existing HDFS location, then put the file there:
    create table emp (id int, name string) row format delimited fields terminated by "\t" location hdfs_path;
    hdfs dfs -put /tmp/emp /user/hive/warehouse/test1.db/test;
Way 2: create the table, then load the data with LOAD DATA:
    create table emp (id int, name string) row format delimited fields terminated by "\t";
    load data inpath '/user/hive/warehouse/test1.db/test' overwrite into table emp;
Way 3: create the table, put the file into its directory, then refresh the metadata:
    create table emp (id int, name string) row format delimited fields terminated by "\t";
    hdfs dfs -put /tmp/emp /user/hive/warehouse/test1.db/test;
    msck repair table emp;  -- forces a refresh of the table's metadata

2. Inserting data into Hive tables from queries

Standard usage:

insert into table emp2 select * from emp;
-- The structure of emp2 must match the number and types of the columns produced by the SELECT, or an error is raised.
-- If the count and types match but the column order is swapped, no error is raised, yet the data is wrong at the business level.
-- For example, ename and job may end up inserted as job and ename, or as two job values.
-- Note that this operation appends data.
Extended usage: 1. Multi-insert

insert overwrite table emp3 select name, age from emp;

2. Inserting into dynamic partitions
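The original leaves this item without an example; a minimal sketch, assuming a hypothetical partitioned table emp_part with partition column dept (the table and column names are made up):

```sql
-- Allow dynamic partitioning; nonstrict mode lets every partition column be dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column dept is not given a fixed value; Hive derives it
-- from the last column of the SELECT. emp_part and dept are assumptions.
insert overwrite table emp_part partition (dept)
select id, name, dept from emp;
```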

3. Writing data into the filesystem from queries

1. Standard usage example:

use test1;
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test_emp'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select * from emp;
-- Queries the test1.emp table and writes the file 000000_0 under the OS directory /tmp/test_emp.
-- The directory and file are created automatically; the user running Hive needs the corresponding OS permissions.

INSERT OVERWRITE DIRECTORY '/user/hive/warehouse/insert_test_emp'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select * from emp;
-- Queries the test1.emp table and writes the file 000000_0 under the HDFS directory /user/hive/warehouse/insert_test_emp.
-- The directory and file are created automatically; the user running Hive needs the corresponding HDFS permissions.

insert into table emp2 select * from emp;
-- Same caveats as above; this operation appends data.

2. Extended usage

from (select * from emp) tmp
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hivetmp1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select empno, ename
INSERT OVERWRITE DIRECTORY '/user/hive/warehouse/hivetmp2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select ename;
-- Inserts data from a single FROM clause into multiple directories.
-- When the data comes from a FROM subquery, that subquery must be given an alias.

insert overwrite local directory linux_path row format delimited fields terminated by "\t"
select name, age from emp;

4. SELECT

Hive's DML SELECT works the same as standard SQL, so only a few notes follow.
GROUP BY aggregation can produce data skew, and UNION ALL is often used when dealing with data-skew problems.
The CASE statement is the same as CASE in standard SQL:

select * from emp where id=1 union all select * from emp2;  -- combines the results of the two queries
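As a sketch of how UNION ALL is commonly applied to skew (the table t, its column key, and the hot value are all made up for illustration): aggregate the skewed key separately, then combine the partial results.

```sql
-- Hypothetical skewed table t with an overly frequent key 'hot'.
-- One branch aggregates the non-skewed keys, the other handles only the
-- hot key, so the heavy key no longer lands on a single reducer with the rest.
select key, count(*) as cnt from t where key <> 'hot' group by key
union all
select key, count(*) as cnt from t where key = 'hot' group by key;
```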
select ename, salary,
       case when salary > 1    and salary <= 1000 then 'LOWER'
            when salary > 1000 and salary <= 2000 then 'MIDDLE'
            when salary > 2000 and salary <= 4000 then 'HIGH'
            else 'HIGHEST'
       end
from emp;
-- The standard CASE statement format.

select * from emp where sal between 1000 and 2000;  -- inclusive on both ends

select id, name, avg(sal) from emp group by id, name;
-- A column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function.
select id, name, avg(sal) from emp group by id, name having avg(sal) > 1000 and avg(sal) < 2000;
-- Filtering on grouped results can only be done with HAVING.
5. EXPORT and IMPORT

Simple export and import:
export table department to 'hdfs_exports_location/department';
import from 'hdfs_exports_location/department';
Rename table on import:
export table department to 'hdfs_exports_location/department';
import table imported_dept from 'hdfs_exports_location/department';
Export partition and import:
export table employee partition (emp_country="in", emp_state="ka") to 'hdfs_exports_location/employee';
import from 'hdfs_exports_location/employee';
Export table and import partition:
export table employee to 'hdfs_exports_location/employee';
import table employee partition (emp_country="us", emp_state="tn") from 'hdfs_exports_location/employee';
Specify the import location:
export table department to 'hdfs_exports_location/department';
import table department from 'hdfs_exports_location/department'
location 'import_target_location/department';
Import as an external table:
export table department to 'hdfs_exports_location/department';
import external table department from 'hdfs_exports_location/department';
Tags: hive SQL DML database