This is a question most customers ask the business intelligence (BI) consultants who implemented their data warehouse (DWH) or BI solution.
To state the obvious: a DWH does grow over time. What we need to pay attention to, however, is the database log space, which should always stay small.
Shouldn’t it be large due to the huge volumes of data being processed?
If your log space keeps growing and taking up a lot of disk space, then you are not performing minimally logged operations.
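A quick way to check whether the log is the culprit is to look at log size and usage directly. A sketch (the database name `MyDWH` is a placeholder):

```sql
-- Report log size and percent-used for every database on the instance
DBCC SQLPERF (LOGSPACE);

-- Or inspect a single database's log file size in MB
-- (MyDWH is a hypothetical database name; size is in 8 KB pages)
SELECT name, size * 8 / 1024 AS size_mb
FROM MyDWH.sys.database_files
WHERE type_desc = 'LOG';
```

If the percent-used figure climbs steadily during your loads, the loads are being fully logged.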
Batch operations can be fully logged, meaning the engine writes a log entry for every row you change (insert, update, or delete). This runs contrary to common perception: most people assume their bulk load operations are minimally logged, but that is not always true.
Here is what you need to do to keep your log space small and get good performance loading your DWH:
- Set your database recovery model to simple or bulk-logged.
- Next, know your target table's index structure, because it determines what the engine can minimally log.
- For clustered index table inserts:
- specify the TABLOCK hint and an ORDER BY that matches the clustered key
- turn on trace flag 610; otherwise inserts into a clustered index that already contains data will be fully logged (see here for more details)
- For non-clustered indexes:
- disable them before the insert and then rebuild them afterwards.
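Put together, the steps above look roughly like this. This is a sketch, not a complete load script; `MyDWH`, `dbo.FactSales`, `dbo.StageSales`, and the index and column names are all hypothetical:

```sql
-- 1. Use a recovery model that permits minimal logging
ALTER DATABASE MyDWH SET RECOVERY BULK_LOGGED;

-- 2. Allow minimally logged inserts into non-empty indexed tables
--    (-1 applies the trace flag instance-wide)
DBCC TRACEON (610, -1);

-- 3. Disable non-clustered indexes before the load...
ALTER INDEX IX_FactSales_Customer ON dbo.FactSales DISABLE;

-- 4. Clustered index target: take a table lock and insert in key order
--    (the ORDER BY should match the clustered key of the target table)
INSERT INTO dbo.FactSales WITH (TABLOCK) (SaleDate, CustomerId, Amount)
SELECT SaleDate, CustomerId, Amount
FROM dbo.StageSales
ORDER BY SaleDate;

-- 5. ...and rebuild them afterwards; the rebuild itself is minimally
--    logged under the simple or bulk-logged recovery model
ALTER INDEX IX_FactSales_Customer ON dbo.FactSales REBUILD;
```

With several non-clustered indexes you would disable and rebuild each in turn; rebuilding after the load is usually far cheaper than maintaining the indexes row by row during it.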
Tip: SELECT ... INTO is the fastest way to copy data out of a table, because it is minimally logged, unlike a fully logged INSERT into an existing table.
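For example, a minimally logged copy of a staging table might look like this (the table names are hypothetical):

```sql
-- SELECT ... INTO creates the target table on the fly and, under the
-- simple or bulk-logged recovery model, is minimally logged
SELECT SaleDate, CustomerId, Amount
INTO dbo.FactSales_Copy
FROM dbo.StageSales;
```

Note that the target table must not already exist; SELECT ... INTO creates it as part of the statement.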
This applies to Microsoft SQL Server 2008 and later.
Now you may wonder: will I lose all my data if a minimally logged insert fails? See my next post for the answer…