您的位置：首页 > 其它

hive的数据定义相关语法知识

2016-05-03 11:29 435 查看

Data Definition

1.创建数据库：

CREATE DATABASE IF NOT EXISTS foo

[LOCATION]'/m/j/jkl' [COMMENT]'haha'

[WITH DBPROPERTIES] ('creator' = 'Mark',

'date' = '2012-01-02');

**************

--(分布式默认LOCATION hdfs:///user/hive/warehouse/foo.db

等价于

hdfs://hostname[:port]/user/hive/warehouse/foo.db

Note the .db extension)

--(COMMET命令用于在使用DESCRIBE DATABASE命令时，能够

显示注释信息)

--(WITH DBPROPERTIES 的键值对信息在使用命令

DESCRIBE DATABASE EXTENDED foo；时可以显示出来)

--（如果不加IF NOT EXISTS子句，会抛出异常，当foo存在时）

2.show命令；

SHOW DATABASES LIKE ‘h.*’;

**************

（正则中的'.*'代表h后的任意个字符）

3.显示当前working DATABASE:

SET hive.cli.print.current.db=true;(反之 false)

示例：hive>……… hive(foo)>……

4.删除DATABASE:

DROP DATABASE IF EXISTS foo CASCADE;

--(IF EXISTS:避免数据库不存在而抛异常)

--(CASCADE 如果此数据库有表时须加此关键字强制删除）

--(RESTRICT如果此数据库有表时拒绝删除)

5.Alter Database:

ALTER DATABASE foo SET DBPROPERTIES(‘editby’=’haah’);

--(如果设置了此键值对属性，则没有方法去删除或unset一个DBPROPERTIES)

6.创建表：

CREATE TABLE IF NOT EXISTS [mydb].employees (

name STRING [COMMENT]'Employee name',

salary FLOAT [COMMENT]'Employee salary',

subordinates ARRAY<STRING> COMMENT 'Names of subordinates',

deductions MAP<STRING, FLOAT>

[COMMENT]'Keys are deductions names, values are percentages',

address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>

[COMMENT] 'Home address')

[COMMENT]'Description of the table'

[TBLPROPERTIES] ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...)

[LOCATION] '/user/hive/warehouse/mydb.db/employees';

--（由此可以看出COMMENT可以跟在DATABASE、TABALE、及其属性后面，作为注释来用）

--( IF NOT EXISTS如果已存在表的schema不同与所要创建的表，hive将忽略这差异，也就是说hive只认表名！)

--（TBLPROPERTIES 可以用 SHOW TBLPROPERTIES tblname查看表的键值对属性；

--(可以用 desc tblname（注意：与描述database的差异）来显示表的各个属性及每个属性描述)

默认有last_modified_by last-usrname和last_modified_time 精确到纪元秒）

7.default database在 /user/hive/warehouse没有自己的目录，所创建的表，直接在此目录下以自己的表名创建目录。

8.拷贝表的schema：

CREATE TABLE IF NOT EXISTS mydb.employees2

LIKE mydb.employees;

--（只拷贝模式，不拷贝数据）

--（可以加入LOCATION子句，而其他属性则取决于原表）

9.SHOW TABLES:

--(如果显示的不是当前database中的表，则用SHOW TABLES IN dbname)

--(SHOW TABLES ‘foo.*’; show子句支持正则查询，注意 .*)

--(不支持以上两种查询同时使用！！)

10. DESCRIBE EXTENDED

DESCRIBE EXTENDED mydb.employees;

deductions map<string,float> Keys are deductions names, values are percentages

address struct<street:string,city:string,state:string,zip:int> Home address

Detailed Table Information Table(tableName:employees, dbName:mydb, owner:me,

...

location:hdfs://master-server/user/hive/warehouse/mydb.db/employeesparameters:{creator=me, created_at='2012-01-02 10:00:00',

last_modified_user=me, last_modified_time=1337544510,

comment:Description of the table, ...}, ...)

--(用FORMATTED 可以提供更可读而且更冗长的信息)

--(也可以精确到具体的列 : DESCRIBE mydb.employees.salary;

salary float Employee salary --(此时用EXTENDED 将不会输出更多信息))

--（last_modified_by和 last_modified_time 表属性是自动创建的）

--（此命令也可以查看表的属性：【内/外表】 tableType:MANAGED_TABLE或 EXTERNAL_TABLE)）

10-2. 描述带有分区的extended 表；

DESCRIBE EXTENDED tablename [ partition (year=2012, month=1, day=2) ];

...

location:s3n://ourbucket/logs/2011/01/02,;

11. 创建外部表：

CREATE EXTERNAL TABLE IF NOT EXISTS stocks (

price_adj_close FLOAT) comment ‘desc table’

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

LOCATION '/data/stocks';

--(comment 选项必须紧跟括号之后，否则出错)

12.复制外部表的模式：

CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3

LIKE mydb.employees

LOCATION '/path/to/data';

--（如果你忽略 EXTERNAL，并且这源表是 external, 这新表将是external. 如果你忽略EXTERNAL关键字，并且这源表是managed, 这新表将是managed. However, 如果你写了 EXTERNAL关键字，并且这源表是managed, 这新表将是external.）

13.分区内部表:

CREATE TABLE employees (……列..)

PARTITIONED BY (country STRING, state STRING);

--（注意：‘列’里定义过的属性列，在partitioned by里面就不用定义了，即partitioned by 相当于定义了新属性列）

目录结构: ../employees/country=CA/state=AB

.../employees/country=CA/state=BC

.../employees/country=US/state=AL

.../employees/country=US/state=AK

--(目录结构由分区的层数决定)

--(分区项使用时相当于普通表的列:

SELECT * FROM employees

WHERE country = 'US' AND state = 'IL';)

set hive.mapred.mode=strict/nonstrict; (strict禁止查询语句不带where字句,否则会出现FAILED: Error in semantic analysis: No partition predicate（谓语） found for Alias "wxptbl_part" Table "wxptbl_part"的错误提示。, nonstrict 可以不带)

14.显示分区：

hive> SHOW PARTITIONS tablename [PARTITION(….分区键值对…)] ;

country=US/state=AL

country=US/state=AK

DESCRIBE EXTENDED tablename; (此命令也显示分区keys) ，形如：

partitionKeys:[FieldSchema(name:country, type:string, comment:null),

FieldSchema(name:state, type:string, comment:null)],…

中括号中的partition语句是为了更详细的查找某个分区表分区中的分区（只显示最外层有几个分区时可以不写）

15.导入数据：

LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'

[ overwrite ] INTO TABLE employees

PARTITION (country = 'US', state = 'CA');

16. Primitive data types

Type Size Literal syntax examples

TINYINT 1byte
signed integer. 20

SMALLINT 2 byte
signed integer. 20

INT 4 byte signed integer. 20

BIGINT 8 byte
signed integer. 20

BOOLEAN Boolean true or false. TRUE

FLOAT Single precision floating point.
3.14159

DOUBLE Double precision floating point.
3.14159

STRING Sequence of characters.
The character

set can be specified. Single or double

quotes can be used.

'Now is the time', "for all

good men"

TIMESTAMP(v0.8.0+) Integer, float, or string.
1327882394(Unix epoch seconds),1327882394.123456789(Unix epoch seconds plus nanoseconds), and

'2012-02-03

12:34:56.123456789'(JDBCcompliant java.sql.Timestamp

format)

BINARY(v0.8.0+) Array of bytes.
See discussion below

17. Collection data types

Type Description
Literal syntax examples

STRUCT Analogous to a C structor an “object.” Fields can be accessed using the “dot” notation. For example, if a column nameis of

type STRUCT {first STRING; last STRING}, then

the first name field can be referenced using name.first.

struct('John', 'Doe')

MAP A collection of key-value tuples, where the fields are accessed using array notation (e.g., ['key']). For example, if a column

nameis of type MAPwith key→value pairs

'first'→'John'and 'last'→'Doe', then the last

name can be referenced using name['last'].

map('first', 'John','last', 'Doe')

ARRAY Ordered sequences of the sametype that are indexable using zero-based integers. For example, if a column nameis of type

ARRAYof strings with the value ['John', 'Doe'], then the second element can be referenced using name[1].

array('John', 'Doe')

18. Hive’s default record and field delimiters

Delimiter Description

\n For text files, each line is a record, so the line feed character separates records.

^A(“control” A) Separates all fields (columns). Written using the octal code \001when explicitly

specified in CREATE TABLEstatements.

^B Separate the elements in an ARRAY or STRUCT, or the key-value pairs in a MAP. Written using the octal code \002when explicitly specified in CREATE TABLE statements.

^C Separate the key from the corresponding value in MAP key-value pairs. Written using the octal code \003when explicitly specified in CREATE TABLEstatements

19. Customizing Table Storage Formats：

CREATE TABLE employees (

subordinates ARRAY<STRING>,

deductions MAP<STRING, FLOAT>,

address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>

)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\001'

COLLECTION ITEMS TERMINATED BY '\002'

MAP KEYS TERMINATED BY '\003'

LINES TERMINATED BY '\n'

STORED AS TEXTFILE;

20. DROP TABLE IF EXISTS tablename;

--(如果不加IF EXISTS字句，当表不在时将返回error)

--（For managed tables, the table metadata and data are deleted.）

--（For external tables, the metadata is deleted but the data is not.）

21.alter table ：（Adding, Modifying, and Dropping a Table Partition）

给表重命名：

ALTER TABLE log_messages RENAME TO logmsgs;

给表添加分区：

ALTER TABLE log_messages ADD IF NOT EXISTS

PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01'

PARTITION (year = 2011, month = 1, day = 2) LOCATION '/logs/2011/01/02'...;

--(location后面的目录名可以随意写,与partition后面的无关)

改变分区位置：

ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2) SET LOCATION 's3n://ourbucket/logs/2011/01/02';

--( This command does not move the data from the old location, nor does it delete the old data.)

删除表的分区：

ALTER TABLE log_messages DROP IF EXISTS PARTITION(year = 2011, month = 12, day = 2);

--（The IF EXISTS clause is optional, as usual. For managed tables, the data for the partition is deleted, along with the metadata, even if the partition was created using ALTER TABLE … ADD PARTITION. For external tables, the
data is not deleted.）

22. Changing Columns：

--（You can rename a column, change its position, type, or comment:）

ALTER TABLE log_messages CHANGE

COLUMN hms hours_minutes_seconds INT

COMMENT 'The hours, minutes, and seconds part of the timestamp' AFTER severity;

--(你必须指定老名字，新名字以及类型，即使名字和类型不改变。这关键字COLUMN，COMMENT子句是可选的，如果你不移动列，这ARTER other_column 子句是不必要的。在这个例子中，我们移动列在severity列之后。如果你想要移列到列之前，用FIRST 代替 [ AFTER other_column ] 整个子句。)

23. Adding Columns：

--（You can add new columns to the end of the existing columns, before any partition columns.）

ALTER TABLE log_messages ADD COLUMNS (

app_name STRING COMMENT 'Application name',

session_id LONG COMMENT 'The current session id');

24. Deleting or Replacing Columns:

下面的例子移除了所有存在的列并用新指定的列替换它们:

ALTER TABLE log_messages REPLACE COLUMNS (

hours_mins_secs INT COMMENT 'hour, minute, seconds from timestamp',

severity STRING COMMENT 'The message severity'

message STRING COMMENT 'The rest of the message');

--（这声明高效的重命名了 hms列并且移除了 server和process_id列从原来的模式定义.只改变元数据信息。

这 REPLACE声明只可以被用于表-此表用一个本地的 SerDe模块: DynamicSerDeor MetadataTypedColumnsetSerDe. 重新调用这 SerDe 决定记录怎用被反序列化为列and记录的列怎样被序列化写进存储）

25. Alter Table Properties:

--(You can add additional table properties or modify existing properties, but not remove them:)

ALTER TABLE log_messages SET TBLPROPERTIES (

'notes' = 'The process id is no longer captured; this column is always NULL');

26. Alter Storage Properties:

ALTER TABLE log_messages PARTITION(year = 2012, month = 1, day = 1) SET FILEFORMAT SEQUENCEFILE;

--(The following example demonstrates how to add new SERDEPROPERTIES for the current SerDe:)

ALTER TABLE table_using_JSON_storage

SET SERDEPROPERTIES (

'prop3' = 'value3',

'prop4' = 'value4');

You can alter the storage properties :

ALTER TABLE stocks

CLUSTERED BY (exchange, symbol)

SORTED BY (symbol)

INTO 48 BUCKETS;

--(The SORTED BY clause is optional, but the CLUSTER BY and INTO … BUCKETS are required.)

27. Miscellaneous Alter Table Statements:

Hive CLI:

hive -e 'ALTER TABLE log_messages TOUCH PARTITION(year = 2012, month = 1, day = 1);'

--(This statement won’t create the table or partition if it doesn’t already exist. Use the appropriate creation commands in that case.)

ALTER TABLE log_messages ARCHIVE

PARTITION(year = 2012, month = 1, day = 1);

--( The ALTER TABLE … ARCHIVE PARTITION statement captures the partition files into a Hadoop archive (HAR) file. This only reduces the number of files in the filesystem, reducing the load on the NameNode, but doesn’t provide any space savings (e.g., through

compression):)

--(To reverse the operation, substitute UNARCHIVE for ARCHIVE. This feature is only available for individual partitions of partitioned tables.)

various protections are available. The following statements prevent the partition from being dropped and queried:

ALTER TABLE log_messages PARTITION(year = 2012, month = 1, day = 1) ENABLE NO_DROP;

ALTER TABLE log_messages PARTITION(year = 2012, month = 1, day = 1) ENABLE OFFLINE;

--(To reverse either operation, replace ENABLE with DISABLE. These operations also can’t be used with nonpartitioned tables.)

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航