A common goal of stream processing is to aggregate events into temporal intervals, or windows. For example, to count the number of social media posts per minute or to calculate the average rainfall per hour.
Azure Stream Analytics includes native support for five kinds of temporal windowing functions. These functions enable you to define temporal intervals into which data is aggregated in a query. The supported windowing functions are Tumbling, Hopping, Sliding, Session, and Snapshot.
No, these windowing functions are not exclusive to Azure Stream Analytics. They are commonly used concepts in stream processing and are available in various stream processing frameworks and platforms beyond Azure, such as Apache Flink, Apache Kafka Streams, and Apache Spark Streaming. The syntax and implementation might vary slightly between different platforms, but the underlying concepts remain the same.
Five different types of Window functions
Tumbling Window (Azure Stream Analytics):
A Tumbling Window in Azure Stream Analytics segments data into non-overlapping, fixed-size time intervals. An example query for a Tumbling Window could be:
```sql
SELECT
System.Timestamp() AS WindowStart,
System.Timestamp() AS WindowEnd,
COUNT(*) AS EventCount
INTO
Output
FROM
Input
GROUP BY
TumblingWindow(second, 10)
```
Hopping Window (Azure Stream Analytics):
A Hopping Window in Azure Stream Analytics segments data into fixed-size time intervals, but with an overlap between adjacent windows. An example query for a Hopping Window could be:
```sql
SELECT
System.Timestamp() AS WindowStart,
System.Timestamp() AS WindowEnd,
COUNT(*) AS EventCount
INTO
Output
FROM
Input
GROUP BY
HoppingWindow(second, 10, 5)
```
Sliding Window (Azure Stream Analytics):
A Sliding Window in Azure Stream Analytics continuously moves over the data stream, with each window including a specified number of the most recent events. An example query for a Sliding Window could be:
```sql
SELECT
System.Timestamp() AS WindowStart,
System.Timestamp() AS WindowEnd,
COUNT(*) AS EventCount
INTO
Output
FROM
Input
GROUP BY
SlidingWindow(second, 30)
```
Session Window (Azure Stream Analytics):
A Session Window in Azure Stream Analytics groups events that occur within a specified period of inactivity into individual sessions. An example query for a Session Window could be:
```sql
SELECT
SessionWindow(),
COUNT(*) AS EventCount
INTO
Output
FROM
Input
GROUP BY
SessionWindow(), DeviceId
```
Snapshot Window (Azure Stream Analytics):
A Snapshot Window in Azure Stream Analytics captures the current state of a stream at a specific point in time. An example query for a Snapshot Window could be:
```sql
SELECT
System.Timestamp() AS SnapshotTime,
*
INTO
Output
FROM
Input
WHERE
System.Timestamp() >= '2024-05-11T12:00:00Z' AND
System.Timestamp() <= '2024-05-11T12:05:00Z'
```
Before ending our Data Analytics related Window function. Let's also check if there can be a general-purpose SQL window function. Here's a general SQL example using a window function to find the Nth highest salary:
```sql
SELECT DISTINCT Salary
FROM (
SELECT Salary, DENSE_RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employee
) AS RankedSalaries
WHERE Rank = N;
```
In this query:
- We first assign a rank to each salary using the `DENSE_RANK()` window function, ordering them in descending order of salary.
- Then, we select the distinct salaries where the rank matches the desired Nth highest value.
Replace `Employee` with your actual table name and `N` with the desired rank you're interested in.