
A Quick Walkthrough of Hive DML

2018-01-10 12:41

1. Loading data

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

Without LOCAL, the data is imported from HDFS; with LOCAL, it is imported from the local Linux filesystem.
OVERWRITE replaces the existing data; without it, the data is appended.
Data can be loaded into a table, or into a partition of a table. When loading into a partition of a partitioned table, values must be specified for all partition columns.
This operation actually moves the corresponding HDFS file into the directory of the emp table.
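As a concrete illustration of the partition rule above, a minimal sketch (the table name emp_part, the partition column dt, and the file path are made up for this example):

```sql
-- Hypothetical partitioned table; emp_part, dt, and /tmp/emp are assumptions.
create table emp_part (id int, name string)
partitioned by (dt string)
row format delimited fields terminated by "\t";

-- Every partition column (here only dt) must be given a value in the PARTITION clause.
load data local inpath '/tmp/emp' overwrite into table emp_part partition (dt='2018-01-10');
```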
Three ways to import data:

Way 1: create the table over an existing HDFS location, then put the file there:
    create table emp (id int, name string) row format delimited fields terminated by "\t" location hdfs_path;
    hdfs dfs -put /tmp/emp /user/hive/warehouse/test1.db/test;
Way 2: create the table, then load the data with LOAD DATA:
    create table emp (id int, name string) row format delimited fields terminated by "\t";
    load data inpath '/user/hive/warehouse/test1.db/test' overwrite into table emp;
Way 3: create the table, put the file into its directory, then refresh the metadata:
    create table emp (id int, name string) row format delimited fields terminated by "\t";
    hdfs dfs -put /tmp/emp /user/hive/warehouse/test1.db/test;
    msck repair table emp;  -- forces a refresh of the table's metadata

2. Inserting data into Hive tables from queries

Standard usage:

insert into table emp2 select * from emp;
-- The structure of emp2 must match the number and types of the columns produced by the SELECT, or an error is raised.
-- If the count and types match but the column order is swapped, no error is raised, yet the data is wrong at the business level.
-- For example, ename and job may end up inserted as job and ename, or as two job values.
-- Note that this operation appends data.
Extended usage: 1. Multi-insert

insert overwrite table emp3 select name, age from emp;

2. Inserting into dynamic partitions
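The original leaves this item without an example; a minimal sketch, assuming a hypothetical partitioned table emp_part with partition column dept (the table and column names are made up):

```sql
-- Allow dynamic partitioning; nonstrict mode lets every partition column be dynamic.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column dept is not given a fixed value; Hive derives it
-- from the last column of the SELECT. emp_part and dept are assumptions.
insert overwrite table emp_part partition (dept)
select id, name, dept from emp;
```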

3. Writing data into the filesystem from queries

1. Standard usage example:

use test1;
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test_emp'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select * from emp;
-- Queries the test1.emp table and writes the file 000000_0 under the OS directory /tmp/test_emp.
-- The directory and file are created automatically; the user running Hive needs the corresponding OS permissions.

INSERT OVERWRITE DIRECTORY '/user/hive/warehouse/insert_test_emp'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select * from emp;
-- Queries the test1.emp table and writes the file 000000_0 under the HDFS directory /user/hive/warehouse/insert_test_emp.
-- The directory and file are created automatically; the user running Hive needs the corresponding HDFS permissions.

insert into table emp2 select * from emp;
-- Same caveats as above; this operation appends data.

2. Extended usage

from (select * from emp) tmp
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hivetmp1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select empno, ename
INSERT OVERWRITE DIRECTORY '/user/hive/warehouse/hivetmp2'
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
select ename;
-- Inserts data from a single FROM clause into multiple directories.
-- When the data comes from a FROM subquery, that subquery must be given an alias.

insert overwrite local directory linux_path row format delimited fields terminated by "\t"
select name, age from emp;

4. SELECT

Hive's DML SELECT works the same as standard SQL, so only a few notes follow.
GROUP BY aggregation can produce data skew, and UNION ALL is often used when dealing with data-skew problems.
The CASE statement is the same as CASE in standard SQL:

select * from emp where id=1 union all select * from emp2;  -- combines the results of the two queries
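As a sketch of how UNION ALL is commonly applied to skew (the table t, its column key, and the hot value are all made up for illustration): aggregate the skewed key separately, then combine the partial results.

```sql
-- Hypothetical skewed table t with an overly frequent key 'hot'.
-- One branch aggregates the non-skewed keys, the other handles only the
-- hot key, so the heavy key no longer lands on a single reducer with the rest.
select key, count(*) as cnt from t where key <> 'hot' group by key
union all
select key, count(*) as cnt from t where key = 'hot' group by key;
```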
select ename, salary,
       case when salary > 1    and salary <= 1000 then 'LOWER'
            when salary > 1000 and salary <= 2000 then 'MIDDLE'
            when salary > 2000 and salary <= 4000 then 'HIGH'
            else 'HIGHEST'
       end
from emp;
-- The standard CASE statement format.

select * from emp where sal between 1000 and 2000;  -- inclusive on both ends

select id, name, avg(sal) from emp group by id, name;
-- A column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function.
select id, name, avg(sal) from emp group by id, name having avg(sal) > 1000 and avg(sal) < 2000;
-- Filtering on grouped results can only be done with HAVING.
5. EXPORT and IMPORT

Simple export and import:
export table department to 'hdfs_exports_location/department';
import from 'hdfs_exports_location/department';
Rename table on import:
export table department to 'hdfs_exports_location/department';
import table imported_dept from 'hdfs_exports_location/department';
Export partition and import:
export table employee partition (emp_country="in", emp_state="ka") to 'hdfs_exports_location/employee';
import from 'hdfs_exports_location/employee';
Export table and import partition:
export table employee to 'hdfs_exports_location/employee';
import table employee partition (emp_country="us", emp_state="tn") from 'hdfs_exports_location/employee';
Specify the import location:
export table department to 'hdfs_exports_location/department';
import table department from 'hdfs_exports_location/department'
location 'import_target_location/department';
Import as an external table:
export table department to 'hdfs_exports_location/department';
import external table department from 'hdfs_exports_location/department';
Tags: hive SQL DML database