Home » Tips for Developing a Data Warehouse 1

Tips for Developing a Data Warehouse 1

1. Think of the Business requirements first.
2. Get the list of source destinations
3. Setup a Staging Database to separate out the “Extract” out of ETL.
4. Always use master and child packages for making debugging easy later using execute package task.
5. Now think of a Audit table to store various values helpful in debugging the package while developing and also after deploying it to production.

Example of Audit Table Script below.
AuditTable_Script

Or read my post on designing Audit table for DW here.

6. First create a Master package which you can slowly populate as yougo.
7. In master package create a Execute SQL Task to get the AuditKey from AuditTable created above. This can be done using the query below in SQL Statement in Execute Query bit

Select CAST(SCOPE_IDENTITY() AS INT) AS AuditKey;

Now we need to set ResultSet to Single row for returning the AuditKey. This auditkey must be captured to a variable using Result set Tab with 0 mapped to user::auditkey(where Auditkey variable should be created already)

8. Think of implementing RowCounts for Source tables before inserting it into Staging into the Audit Table using variable and also RowCounts of Staging tables with any error rows count into separate variables.
For Achieving this, Use Execute SQL Task

Select Count(*) as RowCnt from [StagingTable];

Now we need to set ResultSet to Single row for returning the Row Count. This RowCount must be captured to a variable using Result set Tab with 0 mapped to user::RowCnt(where RowCnt variable should be created already)
9. For Staging its always recomended to Truncate/Reload the Source Date using a Execute SQL Task for Truncation bit.
10. Use the Auditkey again in child packages to insert package information. To get Auditkey use

Select CAST(SCOPE_IDENTITY() AS INT) AS AuditKey;

11. After the Staging part is implemented, then think of Actual Transformations needed like Data Cleaning, conversions, Aggregations etc.
12. It is always recomended to load data in batches of say 1000 rows instead of inserting it all at once. The error rows are then processed all at once. If there are still errors then this errorcount of the rows should be recored to a raw file(or a Table somewhere) for debugging purposes later. Please find screenshots below:


13. Now Create a Datawarehouse to the store the final output.
14. If the transformations needed multiple steps go for multiple layer approach of having a separate Staging layer Database, Enterprise layer Database, Presentation layer Database. Using a proper naming conventions is also helpful like putting Staging tables as “S_”, Enterprise tables with “E_”, Presentation table as “P_”. For dimension related tables use “Dim_” and Fact related tables with “F_”.
15. Always keep the source queries simple as possible in Data Source Connectors.

I will keep posting more Topic soon.

Leave a Reply

Your email address will not be published. Required fields are marked *