SQL Server Integration Services (SSIS)
execution is sequential by default. SQL Server Integration Services (SSIS) does
allow parallel execution. Now question is how we can configure SSIS to run in
parallel? In SQL Server 2008 and later on version Integration Services,
pipeline parallelism has been enhanced with architecture improvements to the
data flow engine. SQL 2005 Integration Services each execution tree in a DataFlow task within a package is
generally assigned a single worker thread, and, under certain conditions, each
execution tree actually shares a thread with other execution trees. This approach has certain benefits
and also has some drawbacks. Comparing
SQL 2005 SSIS and SQL 2008 SSIS parallelism is out of scope of this article. Any
curios user who is interested to know more about it should try google/bing.
Now coming back to our original question and
let us continue our talk about SSIS 2008 parallelism. The answer is simple and parallelism
can be achieved in SSIS 2008 and later version in two different ways. It can be
controlled by using following two properties:
- MaxConcurrentExecutables – It is a property of SSIS package which defines how many tasks can run simultaneously. Its default value is -1.
- EngineThreads – It is a property of the Data Flow Task that defines how many work threads the scheduler will create and run in parallel. Its default value is 5 for SQL 2008 and default value is 10 in SQL 2012 Evaluation version.
Now let us take an example: Assume we
have a SSIS package with 5 Data Flow Tasks. Each task has 15 flows in the transform
of “OLE DB SQL Source to SQL Server Destination”.
Just Set MaxConcurrentExecutables to 5 for package properties, then all 5
Data Flow Tasks will run simultaneously. Please see below figure:
Now all 15 flows in each individual
Data Flow Task start concurrently or not, it is a good question. The answer is
quite interesting. The answer is No if you are using SQL 2008 and later
version. The default value of EngineThreads is 5 for SQL 2008 and it is 10 for
SQL 21012 evaluation version. All 15
flows in each individual Data Flow Task can run concurrently if we modify EngineThreads properties of Data Flow
Task and set it to 15. Please see below figure:
EngineThreads governs both source
threads (for source components) and work threads (for transformation and
destination components). Here when we say source threads than we are referring to
source components and when we say work threads than we are referring to transformation
and destination components. Please take a note that source threads and work
threads are both engine threads created by the Data Flow’s scheduler.