Surrogate keys and lookup Transformation in Developing Data Warehouse:
With conformed Dimensions approach, the Sk’s are created in when Dimensions are created by using Identity property (also can use alternative approach) but this has been convenient to me.
When adding identity to Dimension tables for Surrogate Keys make sure the Dummy rows like Unknown are added before you enable the identity.
Based on Business key, a Surrogate key lookup needs to be made in Package for Fact table and retrieve the Surrogate key from Dimension table by using lookup transformation.(Normally I use, Full Cache and Ole DB Data Connection).
By doing this, this will act as a RI (Referential Integrity) for joining facts and Dimensions tables together in Snow Flake Schema.
The major part in designing the Data Warehouse lies in understanding the underlying Business rules and data. Most of the time, you will need to write TSQL Scripts in SSMS first before designing SSIS packages using joins to analyse the data.
Another consideration in Dimensions is using Slowly changing Dimensions:
With slowly changing dimensions you can respond in one of three ways.
1.Type 1 Response: Overwrite the Old Data Value: You can overwrite the old data with the new data, which works best when the old data has no significance and can be discarded.
2.Type 2 Response: Create a New Dimension Record: You can create a new dimension record for the time period that defines the change, which works best with changes that neatly partition history.
3.Type 3 Response: Create an Old Value Column: You can also create an “old value” column in the slowly changing dimension to store the previous value, which works best with soft changes.
This can be achieved using Slowly Changing Dimensions Transformation which either inserts or UPdates the records based on a Businness key which needs to be defined as Key column with others as changing columns.
There has always been a Debate on adding Referential integrity constraints on Data Warehouse but if you like Data Load to fail instead of loading Wrong or unvalid data then, referential integrity would be handy. More over, if using analysis services then it would help it when you import the tables because it will see the relationships.
If you decide on adding the RI’s then below are few examples of them:
Add Default values to the Fact and Dimension table to columns where needed using a Default Constraint.
Add Primary key constraints and Foreign key contraints very carefully with effecting the package execution to Facts and Dimension tables.
Primary key for Date Dimension will be Datekey with Foreign key contraints on Releavant Fact Tables using it in Snow Flake Model.
A combined Primary key is normally built on Fact table like SalesOrderNum, SalesOrderLineNum,SalesOrderRevisionNUm on Orders Fact Table. We need to make sure its unique for all the table.
We can also add check contraints based on Auditkeys inserted inside the package for checking the audit key used in Dimensions are indeed already exists in AuditTable.