Saturday 29 September 2012

Parallel Execution in SSIS 2008 and later version


SQL Server Integration Services (SSIS) execution is sequential by default. SQL Server Integration Services (SSIS) does allow parallel execution. Now question is how we can configure SSIS to run in parallel? In SQL Server 2008 and later on version Integration Services, pipeline parallelism has been enhanced with architecture improvements to the data flow engine. SQL 2005 Integration Services each execution tree in a DataFlow task within a package is generally assigned a single worker thread, and, under certain conditions, each execution tree actually shares a thread with other execution trees.  This approach has certain benefits and also has some drawbacks.  Comparing SQL 2005 SSIS and SQL 2008 SSIS parallelism is out of scope of this article. Any curios user who is interested to know more about it should try google/bing.
Now coming back to our original question and let us continue our talk about SSIS 2008 parallelism. The answer is simple and parallelism can be achieved in SSIS 2008 and later version in two different ways. It can be controlled by using following two properties:
       
  •        MaxConcurrentExecutables – It is a property of SSIS package which defines how many tasks can run simultaneously. Its default value is -1.
  •    EngineThreads – It is a property of the Data Flow Task that defines how many work threads the scheduler will create and run in parallel. Its default value is 5 for SQL 2008 and default value is 10 in SQL 2012 Evaluation version.


Now let us take an example: Assume we have a SSIS package with 5 Data Flow Tasks. Each task has 15 flows in the transform of “OLE DB SQL Source to SQL Server Destination”.
Just Set MaxConcurrentExecutables to 5 for package properties, then all 5 Data Flow Tasks will run simultaneously. Please see below figure:


Now all 15 flows in each individual Data Flow Task start concurrently or not, it is a good question. The answer is quite interesting. The answer is No if you are using SQL 2008 and later version. The default value of EngineThreads is 5 for SQL 2008 and it is 10 for SQL 21012 evaluation version.  All 15 flows in each individual Data Flow Task can run concurrently if we modify EngineThreads properties of Data Flow Task and set it to 15. Please see below figure:




EngineThreads governs both source threads (for source components) and work threads (for transformation and destination components). Here when we say source threads than we are referring to source components and when we say work threads than we are referring to transformation and destination components. Please take a note that source threads and work threads are both engine threads created by the Data Flow’s scheduler.


4 comments:

  1. Great article; nice one. Some comments added.
    We can put as many numbers as we want but the max thread behind the scenes uses the windows threading framework. Which means it is dependent on the CPU. So we should not forget that large number of parallel threads may not be parallel actually because the m/c have only 2 CPUs; so on and so forth.
    This max number of parallism is also configurable and we can set it at runtime while calling package from job or any other mechanism.

    ReplyDelete
  2. Thanks Arnay. I completely agree with your comments.

    ReplyDelete