Tips for Performance Tuning in SSIS package
Data Flow Optimization Modes
The data flow task has a property called RunInOptimizedMode. When this property is enabled, any down-stream component that doesn’t use any of a source component columns is automatically disabled, and any unused column is also automatically disabled. The net result of enabling the RunInOptimizedMode property is the performance of the entire data flow task is improved.
SSIS projects also have an RunInOptimizedMode property. This indicates that the RunInOptimizedMode property of all the data flow tasks in the project are overridden at design time, and that all of the data flow tasks in the project run in optimized mode during debugging.
Buffers
A buffer is an in-memory dataset object utilized by the data flow engine to transform data. The data flow task has a configurable property called DefaultMaxBufferRows, which is set to 10,000 by default. The data flow task also has a configurable property called DefaultBufferSize, which is set to 10 MB by default. Additionally, the data flow task has a property called MaxBufferSize, which is set to 100 MB and cannot be changed.
Buffer Sizing
When performance tuning a data flow, the goal should be to pass as many records as possible through a single buffer while efficiently utilizing memory. This begs the question: what does “efficiently utilizing memory” mean? SSIS estimates the size of a buffer row by calculating the data source metadata at design-time. Optimally, the buffer row size should be as small as possible…which can be accomplished by employing the smallest possible data type for each column. SSIS automatically multiplies the estimated buffer row size by the DefaultMaxBufferRows setting to determine how much memory to allocate to each buffer in the data flow engine. If this amount of memory exceeds the MaxBufferSize (100 MB), SSIS automatically reduces the number of buffer rows to fit within the 100 MB memory boundary.
The data flow task has another property called MinBufferSize, which is 64 KB and cannot be changed. If the amount of memory estimated by SSIS to be allocated for each buffer is below 64 KB, SSIS will automatically increase the number of buffer rows per buffer in order to exceed the MinBufferSize memory boundary.
Buffer Tuning
The data flow task has a property called BufferSizeTuning. When the value of this property is set to true, SSIS will add information to the SSIS log indicating where SSIS has adjusted the buffer size. While buffer tuning, the goal should be to fit as many rows into a buffer as possible. Thus, the value for DefaultMaxBufferRows should be as large as possible without exceeding a total buffer size of 100 MB.
Parallelism
SSIS natively supports the parallel execution of packages, tasks, and transformations. Therefore, parallelism can greatly improve the performance of a package when it is configured within the constraints of system resources. A package has a property called MaxConcurrentExecutables, which can be configured to set the maximum number of threads that can execute in parallel per package. By default, this is set to -1, which translates to the number of logical machine processors plus 2. All or some of the operations in a package can execute in parallel.
Additonally, the data flow task has a property called EngineThreads, which defines how many threads the data flow engine can create and run in parallel. This property applies equally to both the source threads that the data flow engine creates for sources and the worker threads that the engine creates for transformations and destinations. For example, setting the EngineThreads property to 10 indicates that the data flow engine can create up to 10 source threads and 10 worker threads.
No comments:
Post a Comment