How to fill in gaps in time series

Microsoft SQL Server

In some cases it is necessary to have an unbroken time series, for instance to have all days during a year. But what if some days are missing from the data? How can we fill in gaps in time series?

By using a Numbers function in T-SQL, we can fill in the gaps in a serie of values. A common example is a time series of days, but any incrementing series of values can use the method I’ll describe in this blog post.

Let’s start by creating some sample data. This script will create a database with a table that will contain average temperatures over a year:

CREATE DATABASE FillInGaps
GO

USE FillInGaps
GO

CREATE TABLE dbo.Temperatures
	(
		TempDay DATE NOT NULL
		,Temperature NUMERIC(4,2) NOT NULL
	)
GO

To populate the table with some sample data, we’ll use a numbers function. The numbers function simply accepts the parameter @N and returns that many rows.

CREATE FUNCTION  dbo.GetNums2(@N AS BIGINT) RETURNS TABLE
AS
RETURN
SELECT TOP (@N) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS I
    FROM SYS.OBJECTS S1
	CROSS JOIN SYS.OBJECTS S2
	CROSS JOIN SYS.OBJECTS S3
	CROSS JOIN SYS.OBJECTS S4
	CROSS JOIN SYS.OBJECTS S5
GO

By using a T-SQL random function, we can generate some sample data. The following script will generate and insert all days 2015 and a random temperature between 0 and 30 (never mind the wildly fluctuating temperatures 🙂 )

DECLARE @START_DATE DATE

SET @START_DATE = '2014-12-31'

INSERT dbo.Temperatures
SELECT
	DATEADD(DAY, I, @START_DATE)
	,ABS(CHECKSUM(NewId())) % 30 AS TMP
FROM
	dbo.GetNums2(365)

SELECT * FROM dbo.Temperatures

But since the table contains all days for 2015, this is a unbroken time series. So let’s remove some of the days:

DELETE FROM dbo.Temperatures WHERE TempDay IN ('2015-01-13', '2015-01-14', '2015-01-15', '2015-03-14', '2015-04-23', '2015-07-26', '2015-09-19')

Note that I created a three day gap in January.

To create the unbroken time series for 2015, we once again use the numbers function together with a left outer join:

DECLARE @START_DATE DATE

SET @START_DATE = '2014-12-31'

SELECT
	DATEADD(DAY, I, @START_DATE) AS TempDay
	,Temperature
	,(
		SELECT Temperature
		FROM dbo.Temperatures
		WHERE TempDay =
			(SELECT MAX(TempDay) FROM dbo.Temperatures T_PREV WHERE T_PREV.TempDay <= DATEADD(DAY, I, @START_DATE))
	) AS TemperatureLastKnown
FROM
	dbo.GetNums2(365)
LEFT OUTER JOIN
	dbo.Temperatures T
ON
	DATEADD(DAY, I, @START_DATE) = T.TempDay
ORDER BY
	TempDay

This T-SQL gives us a result like the following:

gaps in time series demo results

gaps in time series demo results

The first column simply returns the date. Note that I use the function instead of the column TempDay, since that column misses some values.

The second column, Temperature, returns the temperature. Note that the value is NULL if there is no measurement for that day.

The last column, TemperatureLastKnown, shows a solution where I use the “last known” value for the temperature. This method can be useful if you want to avoid NULLs or if you want a smoothing effect.

2 comments » Write a comment

  1. Thanks Thomas, very useful article.

    I have to do this kind of filling gaps but have to do it gradually.
    I should not only take into account the previous value but also with the next one.
    There are gaps from 1 to a maximum of 9 days. Longer gaps may not be filled in my case.

    The most simple situation is a gap of 1 day where the filled data is the average of the previous and next value.
    Taking your example (where the previous value is 8 en the next known value = 0), the value of
    2015-01-13 would become 0.75*8 + 0.25*0 = 6
    2015-01-14 would become 0.50*8 + 0.50*0 = 4
    2015-01-15 would become 0.25*8 + 0.75*0 = 2

    It is becoming increasingly difficult as the gaps get bigger.
    Is there a function to gradually fill in the gaps to smooth the results.

Leave a Reply

Required fields are marked *.