Download the sample script: Problem-Scalar-UDF-and-solution-andy-novick and a PDF of the slides: The Problem with scalar UDFs and a solution – Andy Novick Or watch the video on Youtube:
User-Defined functions (UDF) have been around SQL Server since the 2000 version. They are a great way to encapsulate business logic in a reusable form. There are there types of UDFs:
- Scalar Functions – Returns a single scalar value (int, real, varchar, etc.)
- Table Valued Functions – Returns a @Table of one or more columns
- Inline Functions – Returns the results of a SELECT statement. (A view with parameters)
However, from the start scalar and Table-Valued functions have suffered from a performance problem when they’re used in a query that touches a large number of rows. This article and the accompanying video on you-tube demonstrate the problem and the solution. There are two problems when using scalar and table-valued UDFs:
- use of the function causes Row-by-Agonizing-Row (RBAR) processing like a cursor
- Use of the function Inhibits Parallelism in the query
For an example function I’ll use the exponential-moving-average function (EMA). It’s used in many scientific and analytic fields including investment management were I’m working now. It gives an average over a time period but it’s advantage is that it can be updated on each new period by referring only to the previous period’s EMA and the new value.
Exponential Moving Average formula
This can be expressed as a scalar T-SQL User-Defined Function like this:
CREATE FUNCTION [dbo].[exp_moving_avg_scalar] (
@ema_yesterday FLOAT , @value_today FLOAT , @days INT , @smoothing FLOAT = 2.0 ) RETURNS FLOAT /* Single value exponential moving average (EMA) * * Example: select dbo.exp_moving_avg_scalar(3.545, 2.123, 21, default) * * When * 2013-10-11 Andy Novick Initial Coding ***********************************************************************/
AS BEGIN RETURN CASE WHEN @ema_yesterday IS NULL THEN @value_today ELSE @value_today * ( @smoothing / ( 1.0 + @days ) ) ) + ( @ema_yesterday * ( 1.0 – ( @smoothing / ( 1.0 + @days ) ) ) END
END
There is a straight forward but non-obvious solution to the performance problem that involves replacing the scalar or table-valued function with an inline function. Here’s the equivalent inline UDF:
CREATE FUNCTION [dbo].[exp_moving_avg_inline] (@ema_yesterday FLOAT , @value_today FLOAT , @days INT , @smoothing FLOAT = 2.0 ) returns TABLE /* Single value exponential moving average (EMA) * * Example: select ema_today from dbo.exp_moving_avg_inline(3.545, 2.123, 21, default) * * When * 2013-10-11 Andy Novick Initial Coding ***********************************************************************/ AS RETURN SELECT CASE WHEN @ema_yesterday IS NULL THEN @value_today ELSE @value_today * ( @smoothing / ( 1.0 + @days ) ) ) + ( @ema_yesterday * ( 1.0 – ( @smoothing / ( 1.0 + @days ) ) ) END AS ema_today
And there’s an important difference in the way that a inline UDF is called vs a scalar UDF. Calling a scalar UDF is pretty simple and very much like other programming languages. The UDF goes into the SQL any place that you could use an expression. Here’s an example with the function on the third line:
SELECT COALESCE (yesterday.security_id, today.security_id) security_id , yesterday.return_ema returns_ema_yesterday , dbo.exp_moving_avg_scalar(yesterday.return_ema , today.returns_today , 21 — @days , DEFAULT — @smoothing ) returns_ema_today INTO example_table FROM security_returns yesterday FULL OUTER JOIN todays_returns today ON yesterday.security_id = today.security_id
Unfortunately a inline UDF doesn’t just drop into place where the scalar UDF once was. Instead it has to be in a sub-query, which makes the change more difficult. Here’s what the above SELECT it looks like with the lnline UDF:
SELECT COALESCE (yesterday.security_id, today.security_id) security_id , yesterday.return_ema returns_ema_yesterday ,(SELECT ema_today FROM dbo.exp_moving_avg_inline(yesterday.return_ema , today.returns_today , 21 — @days , DEFAULT — @smoothing ) ) returns_ema_today INTO example_table FROM security_returns yesterday FULL OUTER JOIN todays_returns today ON yesterday.security_id = today.security_id
The proof is in the query plan. Instead of having a non-parallel plan with the scalar UDF like this:
query plan for using exponential weighted scalar UDF in a SELECT Statement.
Now take a look at the plan that uses the Inline UDF:
Query plan when the lnline UDF has been used to replace the scalar UDF. Now it’s parallel
Notice the small orange circiles with the double arrows pointing left. These symbols indicate that the query plan node is a parallel node. It will use as many processors as are available instead of just one like the query with the scalar UDF. On my system with 4 cores I find the results when using an inline UDF are 2 to 4 times faster than when using the scalar UDF. The performance difference can also