Quantcast
Channel: The Business Intelligence Palace » Saeed Barghi
Viewing all articles
Browse latest Browse all 15

SSIS, Tempdb Database, and SQL Server Log Files

$
0
0

The tempdb system database is a global resource that is available to all users connected to the instance of SQL Server. Tempdb is re-created every time SQL Server is started, which means the system always starts with a clean copy of the database and there is never anything in tempdb that is saved from one session of SQL Server to another.

The tempdb’s usage could be roughly separated into 3 categories:

User objects: Any user with permission to connect to an instance of SQL Server would be able to create temporary tables and table variables. Temporary user objects that are explicitly created, such as: global or local temporary tables, temporary stored procedures, table variables, or cursors, are stored in tempdb database.

Internal job: Tempdb holds the intermediate data when SQL Server is doing tasks like:

  • DBCC CHECKDB or DBCC CHECKTABLE;
  • Common Table Expressions (CTE);
  • ORDER BY, GROUP BY, UNION Queries;
  • Index rebuild or creation.

Version store: Version Store is a collection of data pages that hold the data rows used for row versioning. Triggers, MARS (Multiple Active Result Sets), and online index operations are examples of SQL Server features that use version store.

Here in this post, I’m going to talk about how tempdb is affected when SSIS is used to load data into target tables or update previously loaded data (specifically in fact tables), and provide some workaround for preventing the database from growing too large. The solution provided here will pretty much have the same effect on SQL Server logs, resulting in smaller log files and saving space.

Populating Fact Tables

A fact table in a data warehouse has direct or indirect (through mapping tables) relationship to the dimension tables it is related to. Whether or not the table that a fact table is related to is dimension or mapping, there must be a column (foreign key) in the fact table that refers to the primary key of the linked table. This means there are 2 sets of columns in a fact table: measure columns which hold the numeric values that should be analysed and foreign keys to dimension tables that are used to slice and dice the measure values.

The set of dimension and/or mapping foreign keys in a fact table define its granularity: the lowest level that data should be aggregated before loading into DW. In other words, data should be aggregated at the right granular level, which is defined by its foreign key columns.

A good way of aggregating data at the required granular level is using a GROUP BY clause in the query that is used to pump the data into DW’s fact table. For example, if the structure of the fact table to be populated is like the one depicted below:

Image

Then the following query is the best to populate the table:

SELECT     ‘ProductKey’ = p.[ProductKey]

,’DateKey’ = d.[DateKey]

,’SalesAmount’ = SUM(s.Sales_Value)

FROM       [OLTP].[dbo].[Sales] s

INNER JOIN [DW].[dbo].[DimProduct] p ON s.Product_Key = p.ProductKey

INNER JOIN [DW].[dbo].[DimDate] d ON s.Sales_DateTime = d.DateTime_Value

GROUP BY    p.[ProductKey]

,d.[DateKey]

What should be considered at this stage is that since populating a fact table, especially for the first time, needs a huge number of records to be transferred from OLTP database to the DW database and the query to do so has a GROUP BY clause that engages tempdb database in the process of preparing data, the ETL should be designed in the way that it doesn’t blow up tempdb: The ETL for populating fact tables must be designed and implemented in such a way that it doesn’t load the data in a single data transfer process, and breaks it into smaller chunks of data to be loaded once at a time.

The best candidate for breaking down data is the date-related factor at which the data is aggregated. In our example the data is aggregated at the date level (it could be at any other level like Week, Month, or Year.)

Design and implementation of ETL using SSIS:

As explained before, the ETL for populating fact table should be designed in such a way that it loads data into DW by period (day, week, month, and year) to prevent tempdb and log files from growing too large. This needs our SSIS package to do the following steps:

A- Get the list of distinct periods for the whole data that is going to be loaded,

B- Save the periods extracted in previous step into an SSIS variable,

C- Run the process that cleanses and inserts data into DW once at a time for each period.

Let’s see how each step could be implemented using SSIS:

1- Open BIDS, create a new SSIS project;

2- Add a new Execute SQL Task to your SSIS Project. This task is responsible for reading the data to be loaded and build a list of distinct periods;

3- Create a variable of type Object at the scope of Package. This variable is going to hold the list of periods and will be used by different tasks at different steps of loading data. I named it Periods;

4- Add another variable of type Integer, Int32, to your package again at the scope of Package. This variable will be used to hold each period of time read from Periods variable, for which the package is going to load data. In my example this variable is called CalendarDate;

5- Next step is to extract the periods for which the data should be loaded into the fact table. This would be done by writing a query that extracts the distinct list of time periods from the OLTP database, and configure the previously added Execute SQL Task to run the query and save the result in the Periods variable:

a-  Add the SQL script to your Execute SQL Task:

b- Change the ResultSet property of the Execute SQL Task to Full Result Set;

c- Click on the Result Set tab, and add a new result set to store the result of running your query to the Periods Variable:

6- To get the package to transfer data once for a period at a time, we need a Foreach Loop Container. The Foreach container is going to loop through what we have already loaded into Periods SSIS variable, and run the process for transferring the data once for each period:

a- Add a Foreach Loop container to your project and open its editor by double-clicking on it;

b- Go to Collection tab, and select “Foreach ADO Enumerator” from Enumarator drop down and “User::Periods” from ADO object source variable:

c- Click on on Variable Mapping tab of Foreach Loop Editor and add CalendarDate variable. This variable is going to hold the period for which the data is going to be transferred from OLTP to DW at each iteration:

7- Add the DFT and the sequence container (optional) to your project to do the transfer using CalendarDate variable. You should be able to find loads of resources on how to use a variable in an OLE DB source of a DFT on internet, in case you don’t know how to do that.


Filed under: Business Intelligence, Data Warehousing, SQL Server Database Engine, SSIS Tagged: Foreach Loop, SQL Server 2008, SQL Server Log Files, SSIS, tempdb

Viewing all articles
Browse latest Browse all 15

Trending Articles