您的位置：首页 > 数据库

PostgreSQL window function

2015-08-17 21:47 405 查看

Postgresql
window function 第一篇

window function，简单来说就是窗口函数。postgresql 在8.4版本开始有了窗口函数的特性。

看多很多中文解释，找不到合适的解释什么是窗口函数，有句英文很好的诠释了什么是窗口函数。“ The
whole idea behind window functions is to allow you to process several values of the result set at a time: you see through the window some peer rows and are able to compute a single output value from them, much like when using an aggregate function. ”

先来介绍一下postgresql里面关于窗口函数的语法吧。

function name over ( partition by column order by column [ RANGE | ROWS ] BETWEEN frame_start AND frame_end);

这是简化的版本，官网上对于语句写的更详细，有兴趣的可以自己去看看。对于第一次看见的人来说，这么长的语法可能一下子无法接受，那我们就慢慢一步一步来的说。

首先创建下面这张表

点击(此处)折叠或打开

create table empsalary

(

depname varchar(20),

empno varchar(20),

salary integer

)

然后插入数据

点击(此处)折叠或打开

postgres=# select * from empsalary;

depname | empno | salary

----------+-------+--------

develop | 11 | 5200

develop | 7 | 4200

develop | 9 | 4500

personel | 5 | 3500

personel | 6 | 6500

personel | 12 | 6500

personel | 15 | 8900

(7 行记录)

EXAMPLE 1

只使用 function name over（），这是窗口函数使用最简单的方式了吧。

点击(此处)折叠或打开

postgres=# select depname, empno, salary, sum(salary) over() from empsalary;

depname | empno | salary | sum

----------+-------+--------+-------

develop | 11 | 5200 | 39300

develop | 7 | 4200 | 39300

develop | 9 | 4500 | 39300

personel | 5 | 3500 | 39300

personel | 6 | 6500 | 39300

personel | 12 | 6500 | 39300

personel | 15 | 8900 | 39300

(7 行记录)

光看这个结果，可以看出最后相等的一列就是表是所有行的salary的总值。仔细体会上面那句英文：you see through the window some peer rows and are able to compute a single output value from them。这里的single
output value 就是sum，这里的rows就是说有的表中所有的行。这就可以很好的理解了窗口函数的意思了吧。由于这里没有对窗口进行什么设定，所以看到的就是整个表当中的数据。

关于窗口函数就先写到这里吧。

Postgresql
window function 第二篇

上一篇很好的解释了什么是窗口函数，并且举了一个最简单的例子，帮助理解什么是窗口函数。接下来我们来更深入的理解postgresql的窗口函数。还是借用上一篇新建的表来讲解。

点击(此处)折叠或打开

postgres=# \\d
empsalary

资料表 \"public.empsalary\"

栏位 | 型别 | 修饰词

---------+-----------------------+--------

depname | character varying(20) |

empno | character varying(20) |

salary | integer |

EXAMPLE 2：

function name over ( partition by column) 这里的partition by 子句用于对行进行分组的。

点击(此处)折叠或打开

postgres=# select depname, empno, salary, sum(salary) over(partition by depname)
from empsalary;

depname | empno | salary | sum

----------+-------+--------+-------

develop | 11 | 5200 | 13900

develop | 7 | 4200 | 13900

develop | 9 | 4500 | 13900

personel | 5 | 3500 | 25400

personel | 6 | 6500 | 25400

personel | 12 | 6500 | 25400

personel | 15 | 8900 | 25400

(7 行记录)

由于这次对窗口进行了限制，每一行只能看见自己的分组，所以develop组的sum都是一样的，而personel组的sum是一样的。

EXAMPLE3：

function name over (order by column)

点击(此处)折叠或打开

postgres=# select depname, empno, salary, sum(salary) over(order by salary) from

empsalary;

depname | empno | salary | sum

----------+-------+--------+-------

personel | 5 | 3500 | 3500

develop | 7 | 4200 | 7700

develop | 9 | 4500 | 12200

develop | 11 | 5200 | 17400

personel | 6 | 6500 | 30400

personel | 12 | 6500 | 30400

personel | 15 | 8900 | 39300

(7 行记录)

上面的结果是按照salary的正序排列的，但是sum列显得很奇怪，好像是累积的加法，好像不是。这就取决于，到底这里的每一行从窗口看看到了什么，我们可以用以下的语句看一下。

点击(此处)折叠或打开

postgres=# select depname, empno, salary,array_agg(salary) over(order by salary)

,sum(salary) over(order by salary) from empsalary;

depname | empno | salary | array_agg | sum

----------+-------+--------+--------------------------------------+-------

personel | 5 | 3500 | {3500}
| 3500

develop | 7 | 4200 | {3500,4200}
| 7700

develop | 9 | 4500 | {3500,4200,4500}
| 12200

develop | 11 | 5200 | {3500,4200,4500,5200}
| 17400

personel | 6 | 6500 | {3500,4200,4500,5200,6500,6500}
| 30400

personel | 12 | 6500 | {3500,4200,4500,5200,6500,6500}
| 30400

personel | 15 | 8900 | {3500,4200,4500,5200,6500,6500,8900} | 39300

(7 行记录)

每一行能看见的结果都在array_agg列当中可以看出，这就很容易理解为什么sum的值会变成现在像上面一样奇怪的结果吧。其实这里的

EXAMPLE 3:

function name over （order by column [rows | range ] between framestart and frameend）;

rows between 子句无法独自和over一起使用，这个子句的作用也是决定了那些行可以被每一行看到。

点击(此处)折叠或打开

postgres=# select depname, empno, salary,sum(salary) over(order by salary rows between
unbounded preceding and current row) from empsalary;

depname | empno | salary | sum

----------+-------+--------+-------

personel | 5 | 3500 | 3500

develop | 7 | 4200 | 7700

develop | 9 | 4500 | 12200

develop | 11 | 5200 | 17400

personel | 6 | 6500 | 23900

personel | 12 | 6500 | 30400

personel | 15 | 8900 | 39300

(7 行记录)

现在的结果是安salary从小到大排列的，sum刚好就是累积的加法运算。这里的framestart和frameend只能从固定的三个值当中取：unbounded
preceding ， current row， unbounded following。第一个表示第一行，第二个表示当前行，最后一个表示最后一行，而且他们顺序也不能调换。这能是unbounded
preceding and current row 或者是current row and unbounded following 再或者就是 unbounded
preceding and unbounded following。

至此所有的部分都已经讲完了，在复杂的用法就是把他们结合起来使用。相信大家能很好的使用窗口函数。

postgresql
windows function 第三篇

前面两篇已经很好的介绍了窗口函数，这篇我们来关注一下function
name这个部分。postgresql有一些内置的函数，专门用于窗口函数。如下

Function	Return Type	Description
row_number()	bigint	number of the current row within its partition, counting from 1
rank()	bigint	rank of the current row with gaps; same as row_number of its first peer
dense_rank()	bigint	rank of the current row without gaps; this function counts peer groups
percent_rank()	double precision	relative rank of the current row: (rank - 1) / (total rows - 1)
cume_dist()	double precision	relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)
ntile(num_buckets integer)	integer	integer ranging from 1 to the argument value, dividing the partition as equally as possible
lag(value any [, offsetinteger [, default any ]])	same type as value	returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default. Bothoffset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
lead(value any [, offsetinteger [, default any ]])	same type as value	returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default. Bothoffset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null
first_value(value any)	same type as value	returns value evaluated at the row that is the first row of the window frame
last_value(value any)	same type as value	returns value evaluated at the row that is the last row of the window frame
nth_value(value any, nthinteger)	same type as value	returns value evaluated at the row that is the nth row of the window frame (counting from 1); null if no such row

这里我们一个一个的来。

row_number()

点击(此处)折叠或打开

postgres=# select depname, empno, salary, row_number() over() from empsalary;

depname | empno | salary | row_number

----------+-------+--------+------------

develop | 11 | 5200 | 1

develop | 7 | 4200 | 2

develop | 9 | 4500 | 3

personel | 5 | 3500 | 4

personel | 6 | 6500 | 5

personel | 12 | 6500 | 6

personel | 15 | 8900 | 7

(7 行记录)

其实这个函数就是给窗口内的每一行编号，从一号开始。不过如果没有排序的话，这样的编号感觉也没有意思啊。

点击(此处)折叠或打开

postgres=# select depname, empno, salary, row_number() over() from empsalary
order by salary;

depname | empno | salary | row_number

----------+-------+--------+------------

personel | 5 | 3500 | 4

develop | 7 | 4200 | 2

develop | 9 | 4500 | 3

develop | 11 | 5200 | 1

personel | 6 | 6500 | 5

personel | 12 | 6500 | 6

personel | 15 | 8900 | 7

(7 行记录)

怪了，使用order by 但是这里放的地方错误，可以从这里推断，row_number(）是比order by 先执行的。

正确的写法是

点击(此处)折叠或打开

postgres=# select depname, empno, salary, row_number() over(order
by salary) from empsalary ;

depname | empno | salary | row_number

----------+-------+--------+------------

personel | 5 | 3500 | 1

develop | 7 | 4200 | 2

develop | 9 | 4500 | 3

develop | 11 | 5200 | 4

personel | 6 | 6500 | 5

personel | 12 | 6500 | 6

personel | 15 | 8900 | 7

(7 行记录)

rank（）

点击(此处)折叠或打开

postgres=# select depname, empno, salary, rank() over(order
by salary) from empsalary ;

depname | empno | salary | rank

----------+-------+--------+------

personel | 5 | 3500 | 1

develop | 7 | 4200 | 2

develop | 9 | 4500 | 3

develop | 11 | 5200 | 4

personel | 6 | 6500 | 5

personel | 12 | 6500 | 5

personel | 15 | 8900 | 7

(7 行记录)

从结果可以看出，rank其实和row_number()差不多，唯一的差别有两个5，没有6.这就是所谓的gap吧。

dense_rank()

点击(此处)折叠或打开

postgres=# select depname, empno, salary, dense_rank() over(order by salary) from
empsalary ;

depname | empno | salary | dense_rank

----------+-------+--------+------------

personel | 5 | 3500 | 1

develop | 7 | 4200 | 2

develop | 9 | 4500 | 3

develop | 11 | 5200 | 4

personel | 6 | 6500 | 5

personel | 12 | 6500 | 5

personel | 15 | 8900 | 6

(7 行记录)

dense_rank()和rank是一样的，但是区别还是有的，他是没有gap的。

percent_rank() 和cume_dist() 这两个函数不知道有什么用，不过计算的公式就在上面，看看就知道怎么算了。

ntile()

点击(此处)折叠或打开

postgres=# select depname, empno, salary, ntile(3) over(order
by salary) from empsalary ;

depname | empno | salary | ntile

----------+-------+--------+-------

personel | 5 | 3500 | 1

develop | 7 | 4200 | 1

develop | 9 | 4500 | 1

develop | 11 | 5200 | 2

personel | 6 | 6500 | 2

personel | 12 | 6500 | 3

personel | 15 | 8900 | 3

(7 行记录)

这个函数的作用是将结果分组，3 表示分为三组。每一组再编号。

lag(value any [, offsetinteger [, default any ]])

点击(此处)折叠或打开

postgres=# select depname, empno, salary, lag(salary,1,0) over(order
by salary)

from empsalary ;

depname | empno | salary | lag

----------+-------+--------+------

personel | 5 | 3500 | 0

develop | 7 | 4200 | 3500

develop | 9 | 4500 | 4200

develop | 11 | 5200 | 4500

personel | 6 | 6500 | 5200

personel | 12 | 6500 | 6500

personel | 15 | 8900 | 6500

(7 行记录)

获取前一行的对应列，如果没有的话，就用0表示。而lead刚好和他相反。自己试一下就知道了。

first_value()

点击(此处)折叠或打开

postgres=# select depname, empno, salary, first_value(salary) over () from
empsa

lary;

depname | empno | salary | first_value

----------+-------+--------+-------------

develop | 11 | 5200 | 5200

develop | 7 | 4200 | 5200

develop | 9 | 4500 | 5200

personel | 5 | 3500 | 5200

personel | 6 | 6500 | 5200

personel | 12 | 6500 | 5200

personel | 15 | 8900 | 5200

(7 行记录)

获取windows frame的第一行。

last_value() 刚好和first_value()相反，取最后一列。

点击(此处)折叠或打开

postgres=# select depname, empno, salary, last_value(salary) over () from
empsal

ary;

depname | empno | salary | last_value

----------+-------+--------+------------

develop | 11 | 5200 | 8900

develop | 7 | 4200 | 8900

develop | 9 | 4500 | 8900

personel | 5 | 3500 | 8900

personel | 6 | 6500 | 8900

personel | 12 | 6500 | 8900

personel | 15 | 8900 | 8900

(7 行记录)

再来一个

点击(此处)折叠或打开

postgres=# select depname, empno, salary, last_value(salary) over (order
by sala

ry) from empsalary;

depname | empno | salary | last_value

----------+-------+--------+------------

personel | 5 | 3500 | 3500

develop | 7 | 4200 | 4200

develop | 9 | 4500 | 4500

develop | 11 | 5200 | 5200

personel | 6 | 6500 | 6500

personel | 12 | 6500 | 6500

personel | 15 | 8900 | 8900

(7 行记录)

为什么这里会不一样的，仔细想想order by为什么会导致这么样的结果，如果你看了第一篇的话，就能明白了。

而nth_value()我想你自己从字面上就能理解了吧。这里就不讲解了。

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航