Importing data into Elasticsearch
2015-11-18 15:07
ElasticSearch stores each piece of data in a document.
That is what I need: every record in my JSON file should end up as one document.
The way to do this is the bulk API.
First, transform the raw data file data.json into a new file, new_data.json.
Then run this to import the data into Elasticsearch:
curl -s -XPOST 'localhost:9200/_bulk' --data-binary @new_data.json
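The same request can also be prepared from Python using only the standard library. This is a minimal sketch, not part of the original workflow. The function names build_bulk_request and send_bulk are made up for illustration, the URL simply mirrors the curl command above, and the Content-Type header is an assumption (newer Elasticsearch releases expect it, older ones may not). A bulk body has to end with a newline, so the sketch appends one when it is missing.

```python
import urllib.request


def build_bulk_request(path, url="http://localhost:9200/_bulk"):
    """Read a bulk file and wrap it in a POST request (nothing is sent yet)."""
    with open(path, "rb") as f:
        body = f.read()
    # The bulk API expects the last line to be terminated by a newline.
    if not body.endswith(b"\n"):
        body += b"\n"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumption: newer Elasticsearch versions insist on this header.
        headers={"Content-Type": "application/x-ndjson"},
    )


def send_bulk(path, url="http://localhost:9200/_bulk"):
    """Send the bulk file and return the raw response text."""
    with urllib.request.urlopen(build_bulk_request(path, url)) as resp:
        return resp.read().decode("utf-8")
```

With a running Elasticsearch instance, send_bulk("new_data.json") would play the role of the curl call.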
For example, suppose I have a raw JSON data file like the following, with one record per line.
The file data.json:
{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}
I need to import this data into Elasticsearch, so I have to rewrite the file and name the index and type for each record.
The result is a new file, new_data.json:
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_1","key2":"valueB_row_1","key3":"valueC_row_1"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_2","key2":"valueB_row_2","key3":"valueC_row_2"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"valueA_row_3","key2":"valueB_row_3","key3":"valueC_row_3"}
In new_data.json, every data line is preceded by a line of metadata that names its index and type.
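The transformation from data.json to new_data.json can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw file holds one JSON object per line, and the helper names to_bulk_lines and convert_file are invented here, not taken from any library.

```python
import json


def to_bulk_lines(raw_lines, index, doc_type):
    """Yield an action line followed by the source line for each record."""
    action = json.dumps(
        {"index": {"_index": index, "_type": doc_type}},
        separators=(",", ":"),
    )
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        json.loads(line)  # fail early if a record is not valid JSON
        yield action
        yield line


def convert_file(src, dst, index, doc_type):
    """Write the bulk version of src to dst, one JSON object per line."""
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for out_line in to_bulk_lines(fin, index, doc_type):
            fout.write(out_line + "\n")
```

Calling convert_file("data.json", "new_data.json", "myindex1", "mytype1") would produce a file laid out like the one above, with a trailing newline after the last record.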
If the records in the JSON data file do not all belong to the same _index or _type, just change the {"index":{"_******** line above each record accordingly.
Here is an example of a valid bulk data file for Elasticsearch.
full_data.json
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"value1","key2":"value2","key3":"value3"}
{"index":{"_index":"myindex1","_type":"mytype1"}}
{"key1":"abcde","key2":"efg","key3":"klm"}
{"index":{"_index":"myindex2","_type":"mytype2"}}
{"newkey":"newvalue"}
Notice that there are two indexes in the file above: myindex1 and myindex2.
The data schema in index myindex2 is different from the one in index myindex1.
That is why every data line needs its own {"index":{"_******** line in the new data file.
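Since each document carries its own metadata line, a bulk file can be checked before it is imported. The sketch below pairs every metadata line with the data line after it and counts documents per index and type. It assumes the file contains only index actions, as in the examples here, and the function name count_by_target is hypothetical.

```python
import json
from collections import Counter


def count_by_target(bulk_lines):
    """Count documents per (_index, _type) in a bulk file of index actions."""
    counts = Counter()
    lines = [ln.strip() for ln in bulk_lines if ln.strip()]
    if len(lines) % 2:
        raise ValueError("every action line needs a source line after it")
    # Even positions hold action lines, odd positions hold the documents.
    for action_line, source_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)
        json.loads(source_line)  # the document only has to be valid JSON
        if "index" not in action:
            raise ValueError("expected an index action, got: " + action_line)
        meta = action["index"]
        counts[(meta["_index"], meta["_type"])] += 1
    return counts
```

Run against full_data.json (for example count_by_target(open("full_data.json"))), it would report two documents for myindex1/mytype1 and one for myindex2/mytype2.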
-----
Now I am writing a Python script to process some raw JSON data files.
Let's assume every line of the JSON data file follows the same schema. I will use something like this to extract the schema:
example_raw_data.json
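import sys


def get_schema():
    """Placeholder: return the schema of the raw JSON data file."""
    return None


if __name__ == "__main__":
    # Call the function; print(get_schema) would only print the function object.
    print(get_schema())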
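The post leaves get_schema as a stub, so here is one possible way to fill it in, offered as a sketch rather than the author's actual approach. It assumes that "schema" means a mapping from each key to the name of its Python type, takes the lines of the file as an argument (a signature chosen for this example), and raises an error when a later record does not match the first one.

```python
import json


def get_schema(lines):
    """Return {key: type name} from the first record; raise on a mismatch."""
    schema = None
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        current = {key: type(value).__name__ for key, value in record.items()}
        if schema is None:
            schema = current  # the first record defines the schema
        elif current != schema:
            raise ValueError(
                "line %d does not match the schema of the first record" % number
            )
    return schema
```

An empty input returns None, which matches the behavior of the original stub.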
-------------Updated on 27th Nov. 2015 ----------
I solved this by building a new wheel of my own.
You can check this out:
https://github.com/xros/json-py-es
-------------Updated on 28th Nov. 2015 at 01:33 A.M. ----------
pip install jsonpyes
I wrote this module and it works!
Happy hacking!