Comparison between Hive, Impala, Drill and SparkSQL

2018-01-16 18:12

Systems compared: Hive, Impala, Drill, SparkSQL

Project Goal
Hive: Offline batch processing; long-running jobs that perform data-heavy operations, such as joins on huge data sets.
Impala: Run real-time queries on top of an existing Hadoop warehouse.
Drill: Provide distributed query capability across multiple big data platforms. It can query any or all of those data sources at the same time and push operations down into the underlying storage system.
SparkSQL: Execute SQL queries, then process the result sets.

Similarity
Hive and Impala: Impala is designed based on Hive and uses the same metadata. Both are designed for the Hadoop environment.
Drill and SparkSQL: Both support querying data from a variety of data sources (RDBMS, NoSQL, files, JSON...).
All four: All support JDBC/ODBC drivers.

Difference
Hive: Suitable for offline data processing.
Impala: Focuses on online, real-time data processing.
Drill: Not only a Hadoop project. Schema-free: all data is internally represented as either a simple or complex JSON data structure. Fully supports SQL queries (ANSI SQL:2003). Supported by many BI tools. Better security support for data access.
SparkSQL: Only has SQL query capabilities, and supports a subset of SQL (SQL-like).

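Since all four engines are reachable over JDBC, the main practical difference for a client is the connection URL. The following is a minimal sketch of the commonly seen URL shapes, not an authoritative reference. The helper `jdbc_url`, the dictionaries, and the host name in the usage line are hypothetical names introduced here. The ports are the usual defaults (HiveServer2 and the Spark Thrift server on 10000, Impala on 21050, a Drillbit on 31010) and may differ in a given cluster. The `jdbc:impala://` scheme assumes the Cloudera Impala JDBC driver.

```python
# Commonly seen JDBC URL shapes per engine (assumed defaults, adjust per cluster).
JDBC_TEMPLATES = {
    "hive": "jdbc:hive2://{host}:{port}/{database}",       # HiveServer2
    "impala": "jdbc:impala://{host}:{port}/{database}",    # assumes Cloudera's Impala driver
    "drill": "jdbc:drill:drillbit={host}:{port}",          # direct Drillbit connection
    "sparksql": "jdbc:hive2://{host}:{port}/{database}",   # Spark Thrift server speaks the HiveServer2 protocol
}

# Usual default ports; a real deployment may override any of them.
DEFAULT_PORTS = {"hive": 10000, "impala": 21050, "drill": 31010, "sparksql": 10000}


def jdbc_url(engine, host, port=None, database="default"):
    """Build a JDBC URL for one of the four engines (hypothetical helper)."""
    key = engine.lower()
    if key not in JDBC_TEMPLATES:
        raise ValueError("unknown engine: " + engine)
    chosen_port = port if port is not None else DEFAULT_PORTS[key]
    # The Drill template has no database part; str.format ignores the unused argument.
    return JDBC_TEMPLATES[key].format(host=host, port=chosen_port, database=database)


if __name__ == "__main__":
    for name in ("hive", "impala", "drill", "sparksql"):
        print(name, jdbc_url(name, "gateway.example.com"))
```

Hive and SparkSQL share the same URL form here because the Spark Thrift server is compatible with HiveServer2 clients, so the same driver can usually talk to both.
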
References:

https://www.javacodegeeks.com/2015/12/apache-spark-vs-apache-drill.html