The SQL CAT team has loads of experience in loading large volumes of data using SQL Server Integration Services. Thanks to the team for putting this wonderful post together.
Here are highlights plus some of my own tips:
- Do all transforms inSSIS
- SSIS is an in-memory engine and transform will be faster here than in TSQL.
- Understand resource utilization
- Use correct performance counters to monitor and measure throughput and stress on resources.
- Baselineextract performance
- You need a baseline to measure improvements in performance.
- OptimizeTSQL,SSIS lookups and destination components
- Join, Group By and Sort operations are best in TSQL.
- Optimize destination components to ensure you bulk load data into SQL
- Tune your network
- Depending on networking you might get better performance adjusting your packet size property in SSIS
- Pay special attention to data types
- Narrow data types use less memory.
- Use set based operations
- Set based update statements are faster than row-by-row updates over OLEDB.
- TSQL Merge is faster than SSIS for slowly changing dimension operations.
- Use partitioning patterns
- Break the problem into smaller pieces and conquer.
- Create ranges of equal size.
- Use minimally logged operations
- Seriously consider enabling SQL trace flag 610.
- Use Truncate instead of Delete.
- Distribute workload
- Implement dynamic multi-threading for your ETL.