Dynamic source data type length calculation

On every BI project you will be required to pull in data from a source system into a staging database. In order to do this you will need to determine what the appropriate field lengths and data types. For example when the source system “Customer” table has a column “CustomerCountry” which is a string with maximum length 4000. The maximum length of the data it contains is only 98 characters. It would not make sense to set the column length so big on your staging database.

Now you can image if you have lots of tables this can become quite a tedious task.

What we did on our project was to create a TSQL script which we run on the SQL Server source. Thanks @Drickusa for the help. We specify the table name and the dynamic SQL script will determine, using a formula, what our best field length should be. We will then in the ETL convert the source data to our new best field length option.

The script returns 4 columns; columname, datatype, sourcefieldlength, bestlengthoption

PS. You need read access to the system tables.

The formula we use is the following.
1) Max source data length rounded up to nearest 10 times 2.
2) Double the value above.
3) Max source system data type times 1.5.
4) Compare and pick the lowest value between 1) & 3)

Example:
Column CustomerCountry varchar(4000) with max data length 98
1)98 -> 100 x 2 = 200
2)200 x 2 = 400
3) 4000 x 1.5 = 6000
4) 200 < 6000 thus we us 200

*************************
Declare @SQL VARCHAR(max) = ”

Declare @TableName sysname
Declare @ColName sysname

Set @TableName = ‘Customer’ –Add your Table name here

Select @SQL = @SQL + ‘SELECT ‘ + QUOTENAME(sc.name, ””) + ‘ AS ColumnName, ‘ + QUOTENAME(t.name, ””) + ‘ AS DataType, ‘ +
+ Convert(varchar,sc.max_length) + ‘AS SourceFieldLength,’ +
‘case when round((‘ + Convert(varchar,sc.max_length) + ‘*1.5),0) < ceiling( + MAX(Len(‘ + QUOTENAME(sc.name) + ‘)) / 10.0)*10*4 then round((‘ + Convert(varchar,sc.max_length) + ‘ * 1.5),0) else ceiling(MAX(Len(‘ + QUOTENAME(sc.name) + ‘))/ 10.0)*10*4
end as BestLengthOption ‘ +
‘From ‘+@TableName+ char(10) + ‘ Union ‘
FROM sys.columns sc
join sys.types t
on t.system_type_id = sc.system_type_id
and t.name != ‘sysname’
WHERE sc.OBJECT_ID = OBJECT_ID(@TableName)
order by sc.name

Set @SQL = Left(@SQL, Len(@SQL)-6)

–Print @SQl

Exec(@SQL)

*************************

Advertisements
This entry was posted in SQL Server 2014 and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s