当前位置:Gxlcms > 数据库问题 > Calcite - StreamingSQL

Calcite - StreamingSQL

时间:2021-07-01 10:21:17 帮助过:17人阅读

STREAM rowtime, productId, units, SUM(units) OVER (ORDER BY rowtime RANGE INTERVAL 1 HOUR PRECEDING) unitsLastHour FROM Orders;

这个和groupby的区别在于,窗口触发的时机

对于groupby,时间整点触发,会将窗口里records计算成一个值

而OVER,是record by record,每来一条record都会触发一次计算,上面的例子是,对每条record都会触发一次前一个小时的sum

技术分享

这里更加复杂,

先声明Window product,表示order by rowtime,partition by productId
再基于product,OVER生成7天和10分钟的AVG(units)

 

 

子查询

The previous HAVING query can be expressed using a WHERE clause on a sub-query:

having也可以实现成子查询的形式

技术分享

 

Since then, SQL has become a mathematically closed language, which means that any operation you can perform on a table can also perform on a query.

The closure property of SQL is extremely powerful. Not only does it render HAVING obsolete (or, at least, reduce it to syntactic sugar), it makes views possible:

技术分享

sql具有闭合特性,即任何可以在table上执行的操作,也同样可以在query上执行,因为query的结果也是一个关系表

所以上面通过create view创建子查询

Many people find that nested queries and views are even more useful on streams than they are on relations.
Streaming queries are pipelines of operators all running continuously, and often those pipelines get quite long. Nested queries and views help to express and manage those pipelines.

嵌套查询对于Streaming非常有用,因为流其实就是一组operators的pipelines;以嵌套查询或view的方式去表示会很方便

 

And, by the way, a WITH clause can accomplish the same as a sub-query or a view:

技术分享

With关键词,用于实现子查询或view

 

Sorting

技术分享

 

Joining streams to tables

A stream-to-table join is straightforward if the contents of the table are not changing.

技术分享

这个很直接,但有个问题是,静态表是会变化的,当数据record流过来时,我们需要和record发生时静态表做join,但如果静态表已经变化了,我们只能取到最新值

要解决这个问题,我们需要为静态表,创建版本表,保存每个时间的版本

One way to implement this is to have a table that keeps every version with a start and end effective date, ProductVersions in the following example:

技术分享

当前会从productVersion里面,根据record rowtime找出包含这个时间的版本

 

Joining streams to streams

技术分享

 

DML

It’s not only queries that make sense against streams; it also makes sense to run DML statements (INSERT, UPDATE, DELETE, and also their rarer cousins UPSERT and REPLACE) against streams.

 

DML is useful because it allows you do materialize streams or tables based on streams, and therefore save effort when values are used often.

Calcite - StreamingSQL

标签:tle   nes   effect   ott   not   ant   height   where   nested   

人气教程排行