Capturing Long Running Queries

Long running queries in a database may or may not be a problem, depending on the type of system. In a data warehouse solution, queries that runs for several minutes may be perfectly normal and nothing to worry about. But in an OLTP system, long running queries may cause performance problems. Read on to see how to monitor for long running queries in SQL Server.

Dynamic Management Views for currently running processes

The dynamic management view sys.dm_exec_requests contains information about requests that are currently running in a SQL Server instance. The column total_elapsed_time shows how long the query has been running in milliseconds. By regularly checking this view for any requests that has run for over a predetermined amount of time, for instance 20 seconds, we can start monitoring that request. When it finishes, we can store information about the request in a separate table for later investigation.

But sys.dm_exec_requests returns much more information that you probably want. After all, the point of monitoring for long running queries is to find statements in your system that can cause problems and that are good candidates for optimization. So we need to filter out some information from the view.

The first filter would be to get rid of SQL Server’s internal spid’s. They have an session id of 50 or lower.

Second, “idle” connections are not of interest here. Those have a status of background or sleeping. Read more about different request statuses here.

Thirdly, there are wait types that can safely be ignored. This is because they don’t consume or block any resources. The list of those wait types may be differing according to your needs and interests, but to simplify I reused a list created by Paul Randal in his article Wait statistics, or please tell me where it hurts.

Finally, we are not interested of “our own” spid. This solution will use a wait in order to check when a request finishes. This means the monitoring solution may run for almost as long as the monitored query. But it is not of our interest here, and the monitoring solution is not very resource consuming.

The DMV sys.dm_exec_requests is joined to other dynamic managent views for more information about the current request, and also about any blocking requests if any.

That covers the basics about what we are monitoring. I will not cover the details in the solution much more than that, so I’ll just go on with the solution and what you need to do and know to get it running.

Long Running Queries

The solution uses a SQL Agent job that runs every 10 seconds to check if there is any long running query that should be monitored. The time limit I use to classify a query as long running is 20 seconds. Another time limit could be used of course, but the reason I do not use a longer time period is that I want to make sure to catch queries that run 30 seconds and then time out. 30 seconds is the default time out setting in many connections.

So, create a new SQL Agent job called “LongRunningQueries”. In that job, create a step with the following T-SQL:

SET QUOTED_IDENTIFIER ON
SET NOCOUNT ON

DECLARE @StartMonitoringAfterSeconds INT = 20

IF (SELECT OBJECT_ID('#TLind_IGNORED_WAIT_TYPES')) IS NULL BEGIN
	CREATE TABLE #TLind_IGNORED_WAIT_TYPES
		(
			WaitType NVARCHAR(60) NOT NULL
		)
	--http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/
	INSERT #TLind_IGNORED_WAIT_TYPES (WaitType)
	VALUES
	('BROKER_EVENTHANDLER'),
	('BROKER_RECEIVE_WAITFOR'),
	('BROKER_TASK_STOP'),
	('BROKER_TO_FLUSH'),
	('BROKER_TRANSMITTER'),
	('CHECKPOINT_QUEUE'),
	('CHKPT'),
	('CLR_AUTO_EVENT'),
	('CLR_MANUAL_EVENT'),
	('CLR_SEMAPHORE'),
	('DBMIRROR_DBM_EVENT'),
	('DBMIRROR_EVENTS_QUEUE'),
	('DBMIRROR_WORKER_QUEUE'),
	('DBMIRRORING_CMD'),
	('DIRTY_PAGE_POLL'),
	('DISPATCHER_QUEUE_SEMAPHORE'),
	('EXECSYNC'),
	('FSAGENT'),
	('FT_IFTS_SCHEDULER_IDLE_WAIT'),
	('FT_IFTSHC_MUTEX'),
	('HADR_CLUSAPI_CALL'),
	('HADR_FILESTREAM_IOMGR_IOCOMPLETION'),
	('HADR_LOGCAPTURE_WAIT'),
	('HADR_NOTIFICATION_DEQUEUE'),
	('HADR_TIMER_TASK'),
	('HADR_WORK_QUEUE'),
	('KSOURCE_WAKEUP'),
	('LAZYWRITER_SLEEP'),
	('LOGMGR_QUEUE'),
	('ONDEMAND_TASK_QUEUE'),
	('PWAIT_ALL_COMPONENTS_INITIALIZED'),
	('QDS_PERSIST_TASK_MAIN_LOOP_SLEEP'),
	('QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP'),
	('REQUEST_FOR_DEADLOCK_SEARCH'),
	('RESOURCE_QUEUE'),
	('SERVER_IDLE_CHECK'),
	('SLEEP_BPOOL_FLUSH'),
	('SLEEP_DBSTARTUP'),
	('SLEEP_DCOMSTARTUP'),
	('SLEEP_MASTERDBREADY'),
	('SLEEP_MASTERMDREADY'),
	('SLEEP_MASTERUPGRADED'),
	('SLEEP_MSDBSTARTUP'),
	('SLEEP_SYSTEMTASK'),
	('SLEEP_TASK'),
	('SLEEP_TEMPDBSTARTUP'),
	('SNI_HTTP_ACCEPT'),
	('SP_SERVER_DIAGNOSTICS_SLEEP'),
	('SQLTRACE_BUFFER_FLUSH'),
	('SQLTRACE_INCREMENTAL_FLUSH_SLEEP'),
	('SQLTRACE_WAIT_ENTRIES'),
	('WAIT_FOR_RESULTS'),
	('WAITFOR'),
	('WAITFOR_TASKSHUTDOWN'),
	('WAIT_XTP_HOST_WAIT'),
	('WAIT_XTP_OFFLINE_CKPT_NEW_LOG'),
	('WAIT_XTP_CKPT_CLOSE'),
	('XE_DISPATCHER_JOIN'),
	('XE_DISPATCHER_WAIT'),
	('XE_TIMER_EVENT')
END

SELECT
	R.session_id
	,R.request_id
	,R.start_time
	,CAST('1900-01-01' AS DATETIME2) AS completed_time
	,S.[host_name]
	,S.[program_name]
	,S.login_name
	,R.[status]
	,R.command
	,R.database_id
	,R.wait_type
	,R.wait_time / 1000 AS blocked_wait_time_sec
	,R.last_wait_type
	,R.wait_resource
	,R.total_elapsed_time / 1000 AS total_elapsed_time_sec
	,ST.text
	,S.transaction_isolation_level AS blocked_transaction_isolation_level
	,R.blocking_session_id
	,S_B.[host_name] AS blocking_host_name
	,S_B.[program_name] AS blocking_program_name
	,S_B.login_name AS blocking_login_name
	,ST_B.text AS blocking_text
	,S_B.transaction_isolation_level AS blocking_transaction_isolation_level
INTO
	#TLind_LONG_RUNNING
FROM
	sys.dm_exec_requests R
CROSS APPLY
    sys.dm_exec_sql_text(R.sql_handle) AS ST
JOIN
	sys.dm_exec_sessions S
ON
	R.session_id = S.session_id
LEFT OUTER JOIN
	sys.dm_exec_connections C
ON
	R.blocking_session_id = C.session_id
OUTER APPLY
    sys.dm_exec_sql_text(C.most_recent_sql_handle) AS ST_B
LEFT OUTER JOIN
	sys.dm_exec_sessions S_B
ON
	R.blocking_session_id = S_B.session_id
WHERE
	R.[status] NOT IN ('background', 'sleeping')
AND
	R.wait_type NOT IN (SELECT WaitType FROM #TLind_IGNORED_WAIT_TYPES)
AND
	R.session_id > 50
AND
	R.session_id <> @@spid
AND
	(R.total_elapsed_time / 1000) > (@StartMonitoringAfterSeconds)


IF EXISTS (SELECT * FROM #TLind_LONG_RUNNING) BEGIN

	CREATE UNIQUE CLUSTERED INDEX TLind_LONG_RUNNING_CLU ON #TLind_LONG_RUNNING(session_id, request_id, start_time) WITH IGNORE_DUP_KEY

	WHILE EXISTS(SELECT * FROM #TLind_LONG_RUNNING WHERE completed_time = '1900-01-01') BEGIN
		WAITFOR DELAY '00:00:01'

		UPDATE
			#TLind_LONG_RUNNING
		SET
			completed_time = CASE WHEN R.session_id IS NULL AND LR.completed_time = '1900-01-01' THEN GETDATE() ELSE LR.completed_time END
			,[host_name] = COALESCE(S.[host_name], LR.[host_name])
			,[program_name] = COALESCE(S.[program_name], LR.[program_name])
			,login_name = COALESCE(S.[login_name], LR.[login_name])
			,[status] = COALESCE(R.[status], LR.[status])
			,command = COALESCE(R.command, LR.command)
			,database_id = COALESCE(R.database_id, LR.database_id)
			,wait_type = COALESCE(R.wait_type, LR.wait_type)
			,blocked_wait_time_sec = COALESCE(R.wait_time / 1000, LR.blocked_wait_time_sec)
			,last_wait_type = COALESCE(R.last_wait_type, LR.last_wait_type)
			,wait_resource = COALESCE(R.wait_resource, LR.wait_resource)
			,total_elapsed_time_sec = COALESCE(R.total_elapsed_time / 1000, LR.total_elapsed_time_sec)
			,text = COALESCE(ST.text, LR.text)
			,blocked_transaction_isolation_level = COALESCE(ST.text, LR.text)

			,blocking_session_id = COALESCE(NULLIF(R.blocking_session_id, 0), LR.blocking_session_id)
			,blocking_host_name = COALESCE(S_B.[host_name], LR.blocking_host_name)
			,blocking_program_name = COALESCE(S_B.[program_name], LR.blocking_program_name)
			,blocking_login_name = COALESCE(S_B.login_name, LR.blocking_login_name)
			,blocking_text = COALESCE(ST_B.text, LR.blocking_text)
		FROM
			#TLind_LONG_RUNNING LR
		LEFT OUTER JOIN
			sys.dm_exec_requests R
		ON
			LR.session_id = R.session_id
		AND
			LR.request_id = R.request_id
		AND
			LR.start_time = R.start_time
		OUTER APPLY
			sys.dm_exec_sql_text(R.sql_handle) AS ST
		LEFT OUTER JOIN
			sys.dm_exec_sessions S
		ON
			R.session_id = S.session_id
		LEFT OUTER JOIN
			sys.dm_exec_connections C
		ON
			R.blocking_session_id = C.session_id
		OUTER APPLY
			sys.dm_exec_sql_text(C.most_recent_sql_handle) AS ST_B
		LEFT OUTER JOIN
			sys.dm_exec_sessions S_B
		ON
			R.blocking_session_id = S_B.session_id

		INSERT
			#TLind_LONG_RUNNING
		SELECT
			R.session_id
			,R.request_id
			,R.start_time
			,CAST('1900-01-01' AS DATETIME2) AS completed_time
			,S.[host_name]
			,S.[program_name]
			,S.login_name
			,R.[status]
			,R.command
			,R.database_id
			,R.wait_type
			,R.wait_time / 1000 AS blocked_wait_time_sec
			,R.last_wait_type
			,R.wait_resource
			,R.total_elapsed_time / 1000 AS total_elapsed_time_sec
			,ST.text
			,R.blocking_session_id
			,S_B.[host_name] AS blocking_host_name
			,S_B.[program_name] AS blocking_program_name
			,S_B.login_name AS blocking_login_name
			,ST_B.text AS blocking_text
		FROM
			sys.dm_exec_requests R
		CROSS APPLY
			sys.dm_exec_sql_text(R.sql_handle) AS ST
		JOIN
			sys.dm_exec_sessions S
		ON
			R.session_id = S.session_id
		LEFT OUTER JOIN
			sys.dm_exec_connections C
		ON
			R.blocking_session_id = C.session_id
		OUTER APPLY
			sys.dm_exec_sql_text(C.most_recent_sql_handle) AS ST_B
		LEFT OUTER JOIN
			sys.dm_exec_sessions S_B
		ON
			R.blocking_session_id = S_B.session_id
		WHERE
			R.[status] NOT IN ('background', 'sleeping')
		AND
			R.wait_type NOT IN (SELECT WaitType FROM #TLind_IGNORED_WAIT_TYPES)
		AND
			R.session_id > 50
		AND
			R.session_id <> @@spid
		AND
			(R.total_elapsed_time / 1000) > (@StartMonitoringAfterSeconds)
	END

	IF (SELECT OBJECT_ID('TLind_LONG_RUNNING')) IS NULL BEGIN
		CREATE TABLE TLind_LONG_RUNNING
			(
				session_id SMALLINT NOT NULL
				,request_id INT NOT NULL
				,start_time DATETIME NOT NULL
				,completed_time DATETIME2 NOT NULL
				,[host_name] NVARCHAR(128) NULL
				,[program_name] NVARCHAR(128) NULL
				,login_name NVARCHAR(128) NOT NULL
				,[status] NVARCHAR(30) NOT NULL
				,command NVARCHAR(32) NOT NULL
				,database_id SMALLINT NOT NULL
				,wait_type NVARCHAR(60) NULL
				,blocked_wait_time_sec INT NOT NULL
				,last_wait_type NVARCHAR(60) NOT NULL
				,wait_resource NVARCHAR(256) NOT NULL
				,total_elapsed_time_sec INT NOT NULL
				,text NVARCHAR(MAX) NULL
				,blocking_session_id SMALLINT NULL
				,blocking_host_name NVARCHAR(128) NULL
				,blocking_program_name NVARCHAR(128) NULL
				,blocking_login_name NVARCHAR(128) NULL
				,blocking_text NVARCHAR(MAX) NULL
			)
	END

	INSERT TLind_LONG_RUNNING
	SELECT * FROM #TLind_LONG_RUNNING

END

DROP TABLE #TLind_LONG_RUNNING

Schedule the job to run every 10 seconds. (Note that a job running every 10 seconds will generate a lot of history in msdb. Make sure you purge the history via SQL Agent -> Properties -> History, by using sp_purge_jobhistory, or by some other means.)

Basically, that’s all you need to start to monitoring. The long running queries will be stored in the table TLind_LONG_RUNNING in tempdb. Using tempdb means you may want to pick up the content of that table regularly since it will be deleted when SQL Server is restarted.

The columns stored in TLind_LONG_RUNNING are:

session_id

session_id for the request. Can be useful to see if several requests are made from the same or different spid’s. Part of the key (1 of 3) to identifying a single request.

request_id

A unique ID in the session. Part of the key (2 of 3) to identifying a single request.

start_time

When the request arrived. Part of the key (3 of 3) to identifying a single request.

completed_time

The time when a long running query was finished. Since the solution “only” checks for completion of a request every second, this value has an error margin at about a second. The value is set to 1900-01-01 until a request is completed.

host_name

Name of the client workstation executing the request.

program_name

Name of the client program that is executing the request.

login_name

SQL Server login name or Windows domain user name of the user that is executing the request.

status

Status of the request. Shows the latest status before the request was finished. Can be one of the following: Running, Runnable or Suspended.

command

The type of command that is running. For instance SELECT, INSERT, UPDATE, EXECUTE and BACKUP LOG. Shows the latest command before the request was finished.

database_id

The ID of the database. Get the name with SELECT DB_NAME(X). Shows the latest ID before the request was finished.

wait_type

The type of the wait. See sys.dm_os_wait_stats. Many wait types are filtered out from this solution, see above. Shows the latest wait_type before the request was finished.

blocked_wait_time

If the request was blocked, this column shows the total amount of time spent waiting for a blocked resource. In seconds.

last_wait_type

If the request was blocked previously, this column shows the latest blocking type. Shows the last_wait_type before the request was finished.

wait_resource

Shows what resource for which the request is waiting. Shows the last wait_resource before the request was finished.

total_elapsed_time

The total time the request was running.

text

By using the function sys.dm_exec_sql_text the text of the request is gathered from the sql_hanlde in sys.dm_exec_requests. Shows the last text before the request was finished.

blocking_session_id

If the request is blocked by another, the spid of the blocker will show here. Use this column to understand if the reason for the long running is another blocking session. Shows the latest blocking_session_id before the request was finished.

blocking_host_name

Name of the client workstation executing the request that is blocking the long running query. Shows the last blocking_host_name before the request was finished.

blocking_program_name

Name of the client program that is executing the request that is blocking the long running query. Shows the last blocking_program_name before the request was finished.

blocking_login_name

SQL Server login name or Windows domain user name of the user that is executing the request that is blocking the long running query. Shows the last blocking_login_name before the request was finished.

blocking_text

Same as text above, but for the blocking request. Shows the last blocking_text before the request was finished.

As always, make sure to test the solution before running it in a production environment. Have fun! 🙂