Wednesday, May 25, 2011

Doing Partition Elimination

Let us assume, that we want to move old data to secondary filegroups. In this case, we would then partition on the timestamp of a row. However, our applications querys a lot on the primary key column "ID", so we prefer to make the ID instead of the timestamp the partition column.

The table therefore partitioned like this:

Partition 1: ID < 99 (before 01-February-2011)
Partition 2: ID 100..199 (February 2011)
Partition 3: ID 200..299 (March 2011)
Partition 4: ID 300..399 (April 2011)
Partition 5: ID > 400 (after 01-May-2011)

At the same time, there are a lot of queries with timely conditions such as
select * from table where timestamp > '2011-04-01' and error is not null

The SQL Server query optimzer would now search the whole table for entries that are younger then the 1st of April 2011 (and have the error column filled). Since the timestamp is not part of the partition column, the query optimizer doesn't know that all data from April 2011 and younger is in partition 4 and 5 and, hence, it can't do partition elimination but has to go through all partitions.

How can we help the query optimizer that he only has to look up the concrete partition.

Until now, I found three answers:

  • You can add a condition to the partition to the query:
    select * from table where timestamp > '2011-04-01' and error is not null and $partition.function(ID)>=4
    This way, the SQL Server goes through all rows with an clustered index seek and returns rows from partition 4 and greater. It's fast, but you have to change the query.

  • You can add a condition to the ID to the query:
    You can also limit the query to the boundary value of the partition - adding and ID>=300, and thus, the SQL Server uses a clustered index scan, which is faster than a clustered index seek (see Scans vs. Seeks from Freedman), and furthermore, only reads through the partitions 4 and 5.

  • You could add a non-clustered index on the timestamp:
    Thus, the ID is automatically included because it's in the clustered index. Then, you don't have to change your queries, and the SQL Server can use the index to find the IDs quite fast. This is the most convenient option because you don't change the query; however, the SQL Server needs more steps to find the result.
If you have more ideas, please let me know in a comment. :-)

Thursday, May 19, 2011

Move a Table to another Filegroup

In case you want to move a table to another filegroup, you would have to change the primary key of the table... and also all the foreign keys to that primary key.

This script allows you to do all this, it drops the constraints of the foreign keys, then it moves the primary key and then it creates again the constraints of the foreign keys.

You can just run the script - as a result you receive the needed SQL statements to move the table (the script itself doesn't make any changes!!).

declare @tablename varchar(50)
set @tablename = --'TableA
declare @newFilegroup varchar(50)
set @newFilegroup = --'FilegroupB'

declare @tableID int
select @tableID = object_id from sys.tables where name=@tablename

select 'BEGIN TRANSACTION'

UNION ALL

select 'ALTER TABLE [dbo].[' + t.name + '] DROP CONSTRAINT [' + fk.name + ']' from sys.tables t
inner join sys.foreign_keys fk on t.object_id = fk.parent_object_id
inner join sys.foreign_key_columns fkc on fk.object_id = fkc.constraint_object_id
inner join sys.tables tMove on fkc.referenced_object_id = tMove.object_id
where tMove.name = @tablename

UNION ALL select 'GO'

UNION ALL

select 'ALTER TABLE [dbo].[' + @tablename + '] DROP CONSTRAINT ' + kc.name + ' WITH (MOVE TO ' + @newFilegroup + ')' from sys.key_constraints kc where kc.parent_object_id = @tableID and kc.type='PK'

UNION ALL select 'GO'

UNION ALL

select 'ALTER TABLE [dbo].[' + @tablename + '] ADD CONSTRAINT ' + kc.name + ' PRIMARY KEY(' + c.name + ') ' from sys.key_constraints kc
inner join sys.columns c on kc.parent_object_id = c.object_id
where kc.parent_object_id = @tableID and c.is_identity = 1 and kc.type='PK'

UNION ALL select'GO'

UNION ALL

select 'ALTER TABLE [dbo].[' + t.name + '] WITH CHECK ADD CONSTRAINT [' + fk.name + '] FOREIGN KEY([' + cMove.name + ']) REFERENCES [dbo].[' + @tablename + '] ([' + c.name + '])' from sys.tables t
inner join sys.foreign_keys fk on t.object_id = fk.parent_object_id
inner join sys.foreign_key_columns fkc on fk.object_id = fkc.constraint_object_id
inner join sys.tables tMove on fkc.referenced_object_id = tMove.object_id
inner join sys.columns c on t.object_id = c.object_id
inner join sys.foreign_keys fkMove on fkMove.object_id=fkc.constraint_object_id
inner join sys.columns cMove on cMove.column_id=fkc.parent_column_id
where tMove.name = @tablename and c.is_identity = 1
and fkc.referenced_object_id=@tableID and fkc.parent_object_id=cMove.object_id

UNION ALL select 'GO'

UNION ALL

select 'ALTER TABLE [dbo].[' + t.name + '] CHECK CONSTRAINT [' + fk.name + ']' from sys.tables t
inner join sys.foreign_keys fk on t.object_id = fk.parent_object_id
inner join sys.foreign_key_columns fkc on fk.object_id = fkc.constraint_object_id
inner join sys.tables tMove on fkc.referenced_object_id = tMove.object_id
where tMove.name = @tablename

UNION ALL

select '--COMMIT'


The result of the script looks like this:

Have fun!

Wednesday, May 18, 2011

Eat Frozen Yogurt!


My first post in this blog was about motivating software developers with chocolate.

No, I have to advertise a frozen yogurt cafe of friends in Hamburg-Winterhude, which offers delicious frozen yogurt - ice cream mixed with fruits or chocolates - and good gepa coffee (which since all times has motivated developers). If you can't go there by foot (Dorotheenstraße 99), you should at least visit them online: FrozenYogurtWelt bei Facebook and the FrozenYogurtWelt homepage.


Wednesday, May 11, 2011

Good articels on PARTITION

Here are some articels about PARTITION in SQL Server 2008 that helped me:
And, a bit of propaganda for myself:

Wednesday, May 4, 2011

Query a SQL Server Database while it is still restoring

PARTITION is an SQL Server Enterprise Edition feature that can be used to spread data over different partitions, which can be put into different filegroups.
When you think about Backup and Restore strategies, a neat thing is that secondary filegroups can be set read-only and then don't have to be backuped permanently (simply because the data keeps unchanged). Thus, your backup runs faster, because you only backup readwrite filegroups, and thus the backup doesn't effect your database so much. Furthermore, you can do piecemeal restore after a disaster, which means that you can make the database available already after restoring the primary filegroup. So the users can query the restored part of the database until you restored all the filegroups of the database.

The following example shows how queries to the database work, if one filegroup isn't restored.

In the example, there are two tables with ID and CreateDate. The second table has a foreign key to the ID of the first table. The tables are partitioned with column ID at value 3, so that ID 1 and 2 of each table reside in partition 1, and ID 3 and 4 in partition 2. The first partition is on filegroup 1, the other on filegroup 2. A join query an the two tables returns this result:









In case of a crash, we would restore first the primary filegroup. If we run the same query against all the data in the tables, we get an error because not all data is available:
Msg 679, Level 16, State 1, Line 8
One of the partitions of index 'PK_Transfile' for table 'dbo.Transfile'(partition ID 72057594039238656) resides on a filegroup ("SECONDARY") that cannot be accessed because it is offline, restoring, or defunct. This may limit the query result.


In order to make your query run, you have to restrict it to the restored partitions. In this case, you have to restrict the query so that the ID columns of the tables are smaller or equal to 2. Since the partition column is known, the SQL processor knows that it only has to query the first partition and, thus, only enters the first, restored filegroup. The result then is not complete, but it doesn't return an error:







This way, you can use PARTITION to get your system run much faster, you can shorten your backup time and you can ease restore.