您的位置:首页 > 数据库

如何使用SQL窗口子句减少语法开销

2018-10-25 19:06 459 查看

SQL是一种冗长的语言,其中最冗长的特性之一是窗口函数.

在.最近遇到的堆栈溢出问题,有人要求计算某一特定日期的时间序列中的第一个值和最后一个值之间的差额:

输入

volume tstamp

29011 2012-12-28 09:00:00

28701 2012-12-28 10:00:00

28830 2012-12-28 11:00:00

28353 2012-12-28 12:00:00

28642 2012-12-28 13:00:00

28583 2012-12-28 14:00:00

28800 2012-12-29 09:00:00

28751 2012-12-29 10:00:00

28670 2012-12-29 11:00:00

28621 2012-12-29 12:00:00

28599 2012-12-29 13:00:00

28278 2012-12-29 14:00:00

期望输出

first last difference date

29011 28583 428 2012-12-28

28800 28278 522 2012-12-29

如何编写查询

请注意,值和时间戳级数可能不相关。所以,没有一条规定如果Timestamp2 > Timestamp1然后Value2 < Value1。否则,这个简单的查询就能工作(使用PostgreSQL语法):

SELECT

max(volume) AS first,

min(volume) AS last,

max(volume) - min(volume) AS difference,

CAST(tstamp AS DATE) AS date

FROM t

GROUP BY CAST(tstamp AS DATE);

有几种方法可以在不涉及窗口函数的组中找到第一个和最后一个值。例如:

在Oracle中,可以使用第一和最后函数,由于某些神秘原因,这些函数没有编写。FIRST(...) WITHIN GROUP (ORDER BY ...)或LAST(...) WITHIN GROUP (ORDER BY ...),与其他排序集聚合函数一样,但是some_aggregate_function(...) KEEP (DENSE_RANK FIRST ORDER BY ...)。围棋数字
在PostgreSQL中,可以使用DISTINCT ON语法与 ORDER BY和LIMIT
有关各种方法的更多细节可以在这里找到:
https://blog.jooq.org/2017/09/22/how-to-write-efficient-top-n-queries-in-sql

最好的方法是使用像Oracle这样的聚合函数,但是很少有数据库具有这种功能。所以,我们将使用FIRST_VALUE和LAST_VALUE窗口函数:

SELECT DISTINCT

first_value(volume) OVER (

PARTITION BY CAST(tstamp AS DATE)

ORDER BY tstamp

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

) AS first,

last_value(volume) OVER (

PARTITION BY CAST(tstamp AS DATE)

ORDER BY tstamp

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

) AS last,

first_value(volume) OVER (

PARTITION BY CAST(tstamp AS DATE)

ORDER BY tstamp

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

)

  • last_value(volume) OVER (

    PARTITION BY CAST(tstamp AS DATE)

    ORDER BY tstamp

    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

    ) AS diff,

    CAST(tstamp AS DATE) AS date

FROM t

ORDER BY CAST(tstamp AS DATE)

哎呀。

看上去不太容易读。但它将产生正确的结果。当然,我们可以包装列的定义。FIRST和LAST在派生表中,但这仍然会给我们留下两次窗口定义的重复:

PARTITION BY CAST(tstamp AS DATE)

ORDER BY tstamp

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

援救窗口条款

幸运的是,至少有3个数据库实现了sql标准。WINDOW条款:

MySQL
PostgreSQL
Sybase SQL Anywhere
上面的查询可以重构为这个查询:

SELECT DISTINCT

first_value(volume) OVER w AS first,

last_value(volume) OVER w AS last,

first_value(volume) OVER w

- last_value(volume) OVER w AS diff,

CAST(tstamp AS DATE) AS date

FROM t

WINDOW w AS (

PARTITION BY CAST(tstamp AS DATE)

ORDER BY tstamp

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

)

ORDER BY CAST(tstamp AS DATE)

请注意,如何使用窗口规范来指定窗口名称,就像定义公共表达式一样(WITH条款):

WINDOW

<window-name> AS (<window-specification>)

{ ,<window-name> AS (<window-specification>)... }

我不仅可以重用整个规范,还可以根据部分规范构建规范,并且只重用部分规范。我以前的查询可以这样重写:

SELECT DISTINCT

first_value(volume) OVER w3 AS first,

last_value(volume) OVER w3 AS last,

first_value(volume) OVER w3

- last_value(volume) OVER w3 AS diff,

CAST(tstamp AS DATE) AS date

FROM t

WINDOW

w1 AS (PARTITION BY CAST(tstamp AS DATE)),

w2 AS (w1 ORDER BY tstamp),

w3 AS (w2 ROWS BETWEEN UNBOUNDED PRECEDING

AND UNBOUNDED FOLLOWING)

ORDER BY CAST(tstamp AS DATE)

每个窗口规范可以从头创建,也可以基于先前定义的窗口规范。注在引用窗口定义时也是如此。如果我想重用PARTITION BY条款和ORDER BY子句,但请更改FRAME条款(ROWS ...),那么我就可以这样写了:

SELECT DISTINCT

first_value(volume) OVER (

w2 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS first,

last_value(volume) OVER (

w2 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

) AS last,

first_value(volume) OVER (

w2 ROWS UNBOUNDED PRECEDING

) - last_value(volume) OVER (

w2 ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING

) AS diff,

CAST(tstamp AS DATE) AS date

FROM t

WINDOW

w1 AS (PARTITION BY CAST(tstamp AS DATE)),

w2 AS (w1 ORDER BY tstamp)

ORDER BY CAST(tstamp AS DATE)

内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签:  SQL 数据库