
Yanyun YDB v1.0.5-beta released (supports Hive and Spark queries)  2015-12-28 13:13



This release adds the following features:

Read YDB data with Hive for analysis.

Read YDB data with Spark for analysis.

A programming interface for exporting data to other systems.

A MapReduce InputFormat interface.


YDB download:

You must agree to the license agreement before using this software.

License agreement download

Current version: v1.0.5-beta

Get Yanyun YDB

From 360 Cloud Drive: http://yunpan.cn/cuHD72ifTWtz2 (access code: 5928)

Reading YDB data with Hive for analysis

By connecting YDB data to Hive, you can use Hive or Spark to extend YDB's capabilities with complex queries such as multi-table joins, medians, and nested SQL (a sketch of such queries follows the query examples below).

Add the dependent jar

add jar /data/xxx.xxx.xx/ydb-x.x.x-pg.jar ;

Map YDB tables to Hive tables

Two points to note:
1. Map only the fields you need; avoid mapping unused fields.
2. Use the WHERE filter in the mapping to limit the number of mapped records as much as possible.
Following these two points reduces the amount of data YDB passes to Hive and lowers YDB's own disk I/O, which improves efficiency.
 
 

Mapping example 1:

CREATE EXTERNAL TABLE ydbhive_example (
    tradetime string, tradenum string, tradeid string, nickname string, cardnum string
)
STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler'
TBLPROPERTIES (
    "ydb.handler.hostport"="101.200.130.48:8080",
    "ydb.handler.sql.key"="ydb.sql.ydbhive_example",
    "ydb.handler.sql"="select tradetime,tradenum,tradeid,nickname,cardnum from ydb_example_trade where ydbpartion='20151011' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100' limit 0,10"
);
 
Mapping example 2:

CREATE EXTERNAL TABLE ydbhive_example_bigdata (
    phonenum string, ydb_sex string, ydb_province string, ydb_grade string, ydb_age string
)
STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler'
TBLPROPERTIES (
    "ydb.handler.hostport"="101.200.130.48:8080",
    "ydb.handler.sql.key"="ydb.sql.ydbhive_example_bigdata",
    "ydb.handler.sql"="select phonenum,ydb_sex,ydb_province,ydb_grade,ydb_age from ydb_example_ads where ydbpartion='20151111' and ydb_grade='博士' and ydb_sex='女' and ydb_province='北京' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:1000000' and ydbkv='max.return.docset.size:100000000' limit 0,10"
);
 
Mapping example 3:

To save I/O, some queries can be pre-aggregated on the YDB side, which reduces the load on Hive and Spark.

CREATE EXTERNAL TABLE ydbhive_example_groupby (
    province string, bank string, amt double, cnt double
)
STORED BY 'cn.net.ycloud.ydb.handle.YdbStorageHandler'
TBLPROPERTIES (
    "ydb.handler.hostport"="101.200.130.48:8080",
    "ydb.handler.sql.key"="ydb.sql.ydbhive_example_groupby",
    "ydb.handler.sql"="select province,bank,sum(amt),count(*) from ydb_example_trade where ydbpartion='20151011' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:100' group by province,bank limit 0,10"
);
 

Query examples:

Example 1:

select * from ydbhive_example limit 10;
select count(*) from ydbhive_example limit 10;
select tradeid,count(*) from ydbhive_example group by tradeid limit 10;
 
Example 2:

select ydb_sex,ydb_province,ydb_grade,ydb_age,count(*) as cnt from ydbhive_example_bigdata group by ydb_sex,ydb_province,ydb_grade,ydb_age order by cnt desc limit 100;

select count(*) from ydbhive_example_bigdata limit 10;
 
Example 3:
Query the YDB aggregation results further on the Hive side:

select * from ydbhive_example_groupby limit 10;

select province,bank,sum(amt),sum(cnt) as cnt from ydbhive_example_groupby group by province,bank order by cnt desc limit 100;
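
The mapped tables also support the more complex queries mentioned at the start of this section (multi-table joins, medians, nested SQL) that YDB alone cannot express. The following is a minimal sketch, not taken from the release notes: the median uses Hive's built-in percentile_approx over a nested subquery, and the join assumes a hypothetical second mapped table ydbhive_example_users with a cardnum column and a viplevel column.

-- approximate median of amt per province, computed on the Hive side over a nested subquery
select t.province, percentile_approx(t.amt, 0.5) as median_amt
from (select province, bank, amt from ydbhive_example_groupby) t
group by t.province limit 100;

-- multi-table join; ydbhive_example_users and viplevel are hypothetical, for illustration only
select a.tradeid, a.tradetime, b.viplevel
from ydbhive_example a join ydbhive_example_users b on (a.cardnum = b.cardnum)
limit 100;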

 

Changing a table's mapping dynamically during queries

The name of the property to set is the value of ydb.handler.sql.key that was configured when the table was created.

Example 1

set ydb.sql.ydbhive_example_bigdata="select phonenum,ydb_sex,ydb_province,ydb_grade,ydb_age from ydb_example_ads where ydbpartion='20151111' and ydb_province='辽宁' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:1000000' and ydbkv='max.return.docset.size:100000000' limit 0,10";
select count(*) from ydbhive_example_bigdata limit 10;
 
 
Example 2

set ydb.sql.ydbhive_example_groupby="select province,bank,sum(amt),count(*) from ydb_example_trade where ydbpartion='20151011' and province='辽宁省' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:1000000' and ydbkv='max.return.docset.size:100000000' group by province,bank limit 0,10";
select province,bank,sum(amt),sum(cnt) as cnt from ydbhive_example_groupby group by province,bank order by cnt desc limit 100;

 

Columns that are not needed can be replaced with the higoempty_ex{N}_s placeholders to save I/O.

set ydb.sql.ydbhive_example_bigdata="select higoempty_ex1_s,ydb_sex,higoempty_ex2_s,higoempty_ex3_s,higoempty_ex4_s from ydb_example_ads where ydbpartion='20151111' and ydb_province='北京' and ydbkv='export.joinchar:%01' and ydbkv='export.max.return.docset.size:1000000' and ydbkv='max.return.docset.size:100000000' limit 0,10";
select ydb_sex,count(*) as cnt from ydbhive_example_bigdata group by ydb_sex order by cnt desc limit 100;

select * from ydbhive_example_bigdata limit 10;
 

Reading YDB data with Spark for analysis

Working with YDB from Spark is almost identical to working with it from Hive.

 

However, since Spark does not support the add jar command, remember to configure SPARK_CLASSPATH.
 
For example:
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/data/ycloud/ycloud/ydb/lib/ydb-1.0.5-pg.jar
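
With the jar on SPARK_CLASSPATH, the mapped tables created in the Hive section can usually be queried from Spark SQL (for example via the spark-sql shell or a HiveContext) with the same statements. A minimal sketch, assuming the table ydbhive_example from Mapping Example 1 already exists in the shared metastore:

-- run in the spark-sql shell (Spark built with Hive support)
select tradeid, count(*) as cnt from ydbhive_example group by tradeid order by cnt desc limit 10;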
 
 
 

Interface for exporting to other systems

A programming example:

 

                  
// imports needed by this snippet (the snippet body itself belongs inside a method such as main);
// HiveYdbTableInputFormat, YdbInputSplit and YdbRecordReader come from the ydb-x.x.x-pg.jar
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map.Entry;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;

String master = "101.200.130.48:8080";
String exportSql = "select tradetime,tradenum,tradeid,nickname,cardnum from ydb_example_trade where ydbpartion='20151011' and ydbkv='export.joinchar:%09' and ydbkv='export.max.return.docset.size:30' limit 0,10";

// args[1] is expected to carry the export SQL (e.g. the exportSql string above)
HiveYdbTableInputFormat format = new HiveYdbTableInputFormat();
YdbInputSplit[] list = format.getSplits(master, args[1], "");
System.out.println(Arrays.toString(list));

// this step only randomizes the order of the splits
HashMap<Integer, YdbInputSplit> randomMap = new HashMap<Integer, YdbInputSplit>();
for (YdbInputSplit split : list) {
    randomMap.put((int) (Math.random() * 1000000), split);
}

// the splits could also be handed to multiple threads to export concurrently
for (Entry<Integer, YdbInputSplit> e : randomMap.entrySet()) {
    YdbInputSplit split = e.getValue();
    System.out.println("#######################");
    System.out.println(split.toString());

    YdbRecordReader reader = new YdbRecordReader(split);
    LongWritable key = new LongWritable();
    BytesWritable wr = new BytesWritable();
    while (reader.next(key, wr)) {
        System.out.println(reader.getProgress() + "\t" + reader.getPos() + "\t" + reader.getTotal()
                + "\t" + new String(wr.getBytes(), 0, wr.getLength(), "utf-8"));
    }
    reader.close();
}
 
 
Note: exportSql may only be a plain SELECT statement, not an aggregation or other statistical SQL.

 

MapReduce InputFormat interface

Usage example:

 

String master = "101.200.130.48:8080";
String exportSql = "select tradetime,tradenum,tradeid,nickname,cardnum from ydb_example_trade where ydbpartion='20151011' and ydbkv='export.joinchar:%09' and ydbkv='export.max.return.docset.size:30' limit 0,10";

// configure a MapReduce job to read from YDB
job.setInputFormatClass(HiveYdbTableInputFormat.class);
HiveYdbTableInputFormat.setYdb(job.getConfiguration(), master, exportSql);
 
Note: exportSql may only be a plain SELECT statement, not an aggregation or other statistical SQL.