Posted on December 15, 2017

If you’ve used Azure Stream Analytics, you’ve probably encountered the message “Exactly one temporal window is expected.” Obviously only one temporal window is possible, yet this error appears any time a query has more than one GROUP BY clause. It seems perfectly reasonable to me to specify a temporal window and then aggregate the resulting set of aggregates a second time. Many statistical methods use multiple levels of aggregation.

It’s possible to aggregate more than once by performing all but one of the aggregations inside a JavaScript user-defined aggregate (UDA).
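For readers who haven’t written one: following the pattern in Microsoft’s UDA samples, a Stream Analytics JavaScript UDA is a function named main that defines init, accumulate, deaccumulate, deaccumulateState and computeResult on this. The sketch below is the kind of simple running total those samples use. The alias Total, the column reading and the query in the comment are hypothetical, and the two deaccumulate methods only matter when the aggregate is configured to remove events as they leave the window.

```javascript
// Hypothetical query using this UDA (alias and column names are made up):
//   SELECT uda.Total(reading) AS total
//   FROM input
//   GROUP BY TumblingWindow(minute, 5)
function main() {
    // Set up fresh state for a window.
    this.init = function () {
        this.total = 0;
    };

    // An event enters the window: add its value.
    this.accumulate = function (value, timestamp) {
        this.total += value;
    };

    // An event leaves the window: subtract its value.
    this.deaccumulate = function (value, timestamp) {
        this.total -= value;
    };

    // Remove a previously accumulated state in one step.
    this.deaccumulateState = function (otherState) {
        this.total -= otherState.total;
    };

    // The aggregate value reported for the window.
    this.computeResult = function () {
        return this.total;
    };
}
```

Everything the UDA remembers lives on this, which is what makes the richer state in the rest of this post possible.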

The UDA examples provided by Microsoft all use simple variables to accumulate totals. However, a UDA can contain arbitrarily complex JavaScript. In particular, a UDA can keep a history of every value passed to it, and then group, aggregate, regroup and re-aggregate that history as many times as desired.
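A sketch of the two-level idea, not the post’s own code: the UDA stores every event, then computeResult averages the readings per device and returns the largest of those averages, an aggregate of aggregates that would otherwise need a second GROUP BY. It assumes each event arrives as a record with hypothetical fields deviceId and reading, and it implements only accumulate, so it assumes the state is rebuilt for each window (as with a tumbling window).

```javascript
function main() {
    this.init = function () {
        this.history = [];  // every event seen in this window
    };

    // Keep the whole event; value is assumed to be a record with
    // hypothetical fields deviceId and reading.
    this.accumulate = function (value, timestamp) {
        this.history.push(value);
    };

    this.computeResult = function () {
        // First level: sum and count of readings per device.
        var groups = {};
        for (var i = 0; i < this.history.length; i++) {
            var e = this.history[i];
            var key = String(e.deviceId);
            if (!Object.prototype.hasOwnProperty.call(groups, key)) {
                groups[key] = { sum: 0, count: 0 };
            }
            groups[key].sum += e.reading;
            groups[key].count += 1;
        }

        // Second level: the largest of the per-device averages.
        var best = null;
        var keys = Object.keys(groups);
        for (var j = 0; j < keys.length; j++) {
            var avg = groups[keys[j]].sum / groups[keys[j]].count;
            if (best === null || avg > best) {
                best = avg;
            }
        }
        return best;  // null for an empty window
    };
}
```

Further levels follow the same shape: each pass reads the previous pass’s results instead of the raw history.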

The basic pattern keeps the history in an array, and the simplest use of that history is returning the oldest event still in the window.
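A minimal sketch of this, not the post’s original code, assuming Stream Analytics deaccumulates events in the order it accumulated them, so the event leaving the window is always at the front of the array. The deaccumulateState body additionally assumes the other state holds this state’s oldest events; treat that as a guess rather than documented behavior.

```javascript
function main() {
    this.init = function () {
        this.history = [];  // events in arrival order, oldest first
    };

    this.accumulate = function (value, timestamp) {
        this.history.push(value);
    };

    // Assumption: events leave in arrival order, so the departing
    // event is always the first element.
    this.deaccumulate = function (value, timestamp) {
        this.history.shift();
    };

    // Assumption: otherState holds this state's oldest events.
    this.deaccumulateState = function (otherState) {
        this.history.splice(0, otherState.history.length);
    };

    // Oldest event still in the window, or null if the window is empty.
    this.computeResult = function () {
        return this.history.length > 0 ? this.history[0] : null;
    };
}
```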

For reference, the most recent event can be remembered without an array, in a single variable.
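A sketch of the array-free version, assuming the timestamp argument is something new Date() accepts (a Date, epoch milliseconds or an ISO string). It keeps whichever event has the latest timestamp, so an event accumulated out of order cannot overwrite a newer one. It implements only accumulate, so it does not handle the latest event leaving the window.

```javascript
function main() {
    this.init = function () {
        this.latest = null;      // most recent event seen so far
        this.latestTime = null;  // its time in epoch milliseconds
    };

    this.accumulate = function (value, timestamp) {
        var t = new Date(timestamp).getTime();
        // Replace only if this event is at least as new as the current one.
        if (this.latestTime === null || t >= this.latestTime) {
            this.latest = value;
            this.latestTime = t;
        }
    };

    this.computeResult = function () {
        return this.latest;
    };
}
```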

Once the history is populated, any logic at all can be applied to it. For example, a UDA can make a prediction using linear regression over the events in the window.
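As a sketch of the regression case, not the post’s code: computeResult fits an ordinary least-squares line of reading against event time and extrapolates it HORIZON_MS past the newest event. HORIZON_MS is hypothetical, each value is assumed to be a numeric reading, and the timestamp is assumed to be accepted by new Date(). Times are centered on their mean before squaring, because the one-pass formula applied to raw millisecond epoch values (around 10^12) suffers badly from floating-point cancellation.

```javascript
function main() {
    // Hypothetical: predict this many milliseconds past the newest event.
    var HORIZON_MS = 60000;

    this.init = function () {
        this.history = [];  // { x: epoch ms, y: reading }
    };

    this.accumulate = function (value, timestamp) {
        this.history.push({ x: new Date(timestamp).getTime(), y: value });
    };

    this.computeResult = function () {
        var n = this.history.length;
        if (n < 2) {
            return null;  // a line needs at least two points
        }

        // Means of time and reading, plus the newest time.
        var meanX = 0, meanY = 0, maxX = -Infinity;
        for (var i = 0; i < n; i++) {
            meanX += this.history[i].x;
            meanY += this.history[i].y;
            maxX = Math.max(maxX, this.history[i].x);
        }
        meanX /= n;
        meanY /= n;

        // Centered sum of squares and cross-products.
        var sxx = 0, sxy = 0;
        for (var j = 0; j < n; j++) {
            var dx = this.history[j].x - meanX;
            sxx += dx * dx;
            sxy += dx * (this.history[j].y - meanY);
        }
        if (sxx === 0) {
            return null;  // all events share one timestamp; slope undefined
        }

        // The fitted line passes through (meanX, meanY).
        var slope = sxy / sxx;
        return meanY + slope * (maxX + HORIZON_MS - meanX);
    };
}
```

With readings 1, 3 and 5 at 0, 1000 and 2000 ms, the slope is 0.002 per millisecond, so the prediction 60 seconds after the last event is about 125.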

The same technique can also be used for simple anomaly detection.
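A simple z-score test is one way to do this; the post doesn’t say which method it used. The sketch treats every event except the newest as the baseline, computes the baseline’s mean and sample standard deviation, and reports whether the newest reading is more than Z_THRESHOLD standard deviations away. Z_THRESHOLD = 3 is a hypothetical choice, each value is assumed to be a numeric reading, and the result stays false until the window holds at least three events.

```javascript
function main() {
    // Hypothetical threshold, in standard deviations.
    var Z_THRESHOLD = 3;

    this.init = function () {
        this.history = [];  // { t: epoch ms, y: reading }
    };

    this.accumulate = function (value, timestamp) {
        this.history.push({ t: new Date(timestamp).getTime(), y: value });
    };

    // true if the newest reading is an outlier against the rest of the window.
    this.computeResult = function () {
        var n = this.history.length;
        if (n < 3) {
            return false;  // need at least two baseline readings
        }

        // The newest event is tested; all others form the baseline.
        var newest = 0;
        for (var i = 1; i < n; i++) {
            if (this.history[i].t > this.history[newest].t) {
                newest = i;
            }
        }

        // Mean and sample variance of the n - 1 baseline readings.
        var mean = 0;
        for (var j = 0; j < n; j++) {
            if (j !== newest) {
                mean += this.history[j].y;
            }
        }
        mean /= n - 1;
        var variance = 0;
        for (var k = 0; k < n; k++) {
            if (k !== newest) {
                var d = this.history[k].y - mean;
                variance += d * d;
            }
        }
        variance /= n - 2;

        var y = this.history[newest].y;
        var sd = Math.sqrt(variance);
        if (sd === 0) {
            return y !== mean;  // flat baseline: any change counts as anomalous
        }
        return Math.abs(y - mean) / sd > Z_THRESHOLD;
    };
}
```

Excluding the newest event from the baseline matters: if it were included, a single spike would inflate the standard deviation it is measured against.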

The one caveat I feel obliged to mention is that this technique will not scale: the UDA stores every event in the window, so memory use and the work done on each computation grow with the window size. If your temporal window has a few thousand events, this works fine; with a few million, you’ll want a different approach. Still, short windows are preferable for most use cases, so this approach should be broadly applicable.

Of course, these examples are proof of concept rather than production ready. Do let me know if you’re interested in more detail.
