Apache Calcite:一款开源 SQL 解析工具

Apache Calcite 是什么?

Apache Calcite™ is a dynamic data management framework.

Calcite Mountain

(图片来源:Pixabay™ 官网,已确认无版权)

基本概念

Catelog

 用于定义 SQL 语义相关的元数据与命名空间

SQL Parser

 负责将 SQL 转化成 AST(Abstract Syntax Tree)

SQL Validator

 负责通过 Catalog 对 AST 进行校证

Query Optimizer

 负责将 AST 转化成物理执行计划、优化物理执行计划

SQL Generator

 负责将物理执行计划反向转化成 SQL 语句

特性

  • 支持标准 SQL 语言
  • 通过适配器(Adapter)可以支持连接任何数据源
  • 支持丰富的关系代数(并集、交集、连接、笛卡尔积等)
  • 支持对逻辑规划规则进行定制(例如 Filter 下推)
  • 支持成本模型优化(CBO, Cost-Based Optimizer 和 RBO, Rule-Based Optimizer)
  • 支持管理物化视图(Materialized view
  • 支持查询流式数据
  • 稳定可靠(开发迭代 10 年以上)
  • 已贡献给 Apache 基金会(于 2013 年)
  • 开源社区活跃(Apache Druid、Apache Hive、Apache Drill、Apache FlinkApache Phoenix 等项目均在使用)
Apache Calcite 借助开源的 JavaCC 完成 SQL 解析,将 SQL 语句转化为 Java 代码
Apache Calcite 还使用了轻量级 Janino 编译运行时 Java 代码,以便灵活地管理元数据

架构

整体架构

Apache Calcite Architecture

(图片来源:arxiv.org™)

专注的层面

graph TD

query_language(fa:fa-language Query Language)
query_optimization(fa:fa-fast-forward Query Optimization)
query_execution(fa:fa-spinner Query Execution)
data_management(fa:fa-cog Data Management)
data_storage(fa:fa-database Data Storage)

query_language ==> query_optimization
query_optimization ==> query_execution
query_execution ==> data_management
data_management ==> data_storage

style data_management fill:#808080
style data_storage fill:#808080
其中,数据管理和存储,交由第三方计算和存储引擎实现

解析流程

graph LR

SQL[SQL]
Parser(fa:fa-align-left Parser)
AST[AST]
Validate(fa:fa-check-square-o Validate)
RelNode[RelNode]
Optimize(fa:fa-fast-forward Optimize)
Plan[Plan]
Execute(fa:fa-spinner Execute)

SQL --> Parser
Parser --> AST
AST --> Validate
Validate --> RelNode
RelNode --> Optimize
Optimize --> Plan
Plan --> Execute

style Parser fill:#0099FF
style Validate fill:#0099FF
style Optimize fill:#0099FF
style Execute fill:#0099FF

RBO

谓词下推

graph BT

Set1(fa:fa-table Set1)
Set2(fa:fa-table Set2)
Join(fa:fa-compress Join)
Filter(fa:fa-filter Filter)
Set1'(fa:fa-table Set1)
Set2'(fa:fa-table Set2)
Join'(fa:fa-compress Join)
Filter'(fa:fa-filter Filter)

Set1 --> Join
Set2 ==> Join
Join ==> Filter

Set1' --> Join'
Set2' ==> Filter'
Filter' ==> Join'

style Filter fill:#009933
style Filter' fill:#009933

列裁剪

graph BT

Set1(fa:fa-table Set1)
Set2(fa:fa-table Set2)
Join(fa:fa-compress Join on Set1.id == Set2.id)
Res(fa:fa-table Set1.name, Set1.age)
Set1'(fa:fa-table Set1)
Set2'(fa:fa-table Set2)
Crop'(fa:fa-cut Crop and only keep id)
Join'(fa:fa-compress Join on Set1.id == Set2.id)
Res'(fa:fa-table Set1.name, Set1.age)

Set1 --> Join
Set2 ==> Join
Join ==> Res

Set1' --> Join'
Set2' ==> Crop'
Crop' ==> Join'
Join' ==> Res'

style Crop' fill:#009933

常量折叠

graph BT

Set1(fa:fa-table Set1)
Set2(fa:fa-table Set2)
Join(fa:fa-compress Join)
Res(fa:fa-table Set1.name, Set1.age, 1024 + 996)
Set1'(fa:fa-table Set1)
Set2'(fa:fa-table Set2)
Join'(fa:fa-compress Join)
Res'(fa:fa-table Set1.name, Set1.age, 2020)

Set1 --> Join
Set2 --> Join
Join --> Res

Set1' --> Join'
Set2' --> Join'
Join' --> Res'

style Res' fill:#009933

比对

查询语言 JDBC SQL 解析与校验 Relational Algebra Execution Engine
Apache Drill SQL Native
Apache Druid SQL Native
Apache Phoenix SQL Apache HBase
Apache Solr SQL Native, Enumerable, Apache Lucene
Apache Storm SQL Native
Apache Kylin SQL Enumerable, Apache HBase
MapD SQL Native
Lingual SQL Cascading
Apache Hive SQL Apache Tez, Apache Spark
Qubole Quark SQL Apache Hive, Presto
Apache Apex Streaming SQL Native
Apache Flink Streaming SQL Native
Apache Samza Streaming SQL Native

实战

下载

1
$ git clone --depth 1 --single-branch --branch master https://github.com/apache/calcite.git

编译

1
2
$ cd calcite/example/csv
$ ./sqlline

链接

1
sqlline> !connect jdbc:calcite:model=src/test/resources/model.json admin admin

展示所有 Tables

1
0: jdbc:calcite:model=src/test/resources/mode> !tables
1
2
3
4
5
6
7
8
9
+-----------+-------------+------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION |
+-----------+-------------+------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+
| | SALES | DEPTS | TABLE | | | | | | |
| | SALES | EMPS | TABLE | | | | | | |
| | SALES | SDEPTS | TABLE | | | | | | |
| | metadata | COLUMNS | SYSTEM TABLE | | | | | | |
| | metadata | TABLES | SYSTEM TABLE | | | | | | |
+-----------+-------------+------------+--------------+---------+----------+------------+-----------+---------------------------+----------------+

查询

1
0: jdbc:calcite:model=src/test/resources/mode> SELECT * FROM emps;
1
2
3
4
5
6
7
8
9
10
+-------+-------+--------+--------+---------------+-------+------+---------+---------+------------+
| EMPNO | NAME | DEPTNO | GENDER | CITY | EMPID | AGE | SLACKER | MANAGER | JOINEDAT |
+-------+-------+--------+--------+---------------+-------+------+---------+---------+------------+
| 100 | Fred | 10 | | | 30 | 25 | true | false | 1996-08-03 |
| 110 | Eric | 20 | M | San Francisco | 3 | 80 | | false | 2001-01-01 |
| 110 | John | 40 | M | Vancouver | 2 | null | false | true | 2002-05-03 |
| 120 | Wilma | 20 | F | | 1 | 5 | | true | 2005-09-07 |
| 130 | Alice | 40 | F | Vancouver | 2 | null | false | true | 2007-01-01 |
+-------+-------+--------+--------+---------------+-------+------+---------+---------+------------+
5 rows selected (1.266 seconds)

Join

1
2
3
4
0: jdbc:calcite:model=src/test/resources/mode> SELECT d.name, COUNT(*)
. . . . . . . . . . . . . . . . . . semicolon> FROM emps AS e
. . . . . . . . . . . . . . . . . . semicolon> JOIN depts AS d ON e.deptno = d.deptno
. . . . . . . . . . . . . . . . . . semicolon> GROUP BY d.name;
1
2
3
4
5
6
7
+-----------+--------+
| NAME | EXPR$1 |
+-----------+--------+
| Sales | 1 |
| Marketing | 2 |
+-----------+--------+
2 rows selected (0.336 seconds)

内置函数

1
0: jdbc:calcite:model=src/test/resources/mode> VALUES CHAR_LENGTH('Hello, ' || 'world!');
1
2
3
4
5
6
+--------+
| EXPR$0 |
+--------+
| 13 |
+--------+
1 row selected (0.092 seconds)

社区发展

Star 趋势

Apache Calcite Star History

(图片来源:star-history.t9t.io™ 官网)

个人贡献

 详见:《如何成为 Apache 的 PMC

资料

Doc

Paper

Github

Blog

Apache Calcite

JavaCC

欢迎加入我们的技术群,一起交流学习

群名称 群号
人工智能(高级)
人工智能(进阶)
BigData
算法