Archive for September, 2010


Optimising StringBuilder Appending

I have been working on some C# code optimisation, for speed.  The code in question does a lot of work with strings, so we already use StringBuilder to make it more efficient.  But what is the fastest way to append stuff to the StringBuilder?  In this instance, we’re building a lot of comma seperated lists, and even with StringBuilder there are several ways, for example:

Method 1, AppendFormat:

sb.AppendFormat(“{0},”,myStr);

Method 2, Multiple Append:

sb.Append(myStr);

sb.Append(“,”);

Method 3, Multiple Append (with a comma already in a string):

string sep = ”,”;

sb.Append(myStr);

sb.Append(sep);

Method 4, Single Append:

sb.Append(myStr+sep);

… so I decided to check it out.  Here are the results (in seconds), after using each technique one hundred million times (I did five tests and then averaged the results):

#1 #2 #3 #4 #5 Average
1 AppendFormat 17.20 17.56 17.36 17.58 17.55 17.45
2 Multiple Append 6.61 7.27 6.70 7.27 7.28 7.03
3 Multiple Append (comma in string) 6.65 7.20 6.75 7.20 7.20 7.00
4 Single Append 8.13 8.34 8.31 8.34 8.34 8.29

I was quite surprised at how much slower AppendFormat was (although I can understand why).  But it’s also worth remembering that the lazy coders way of using a single Append with the + operator to attach the comma is quite a bit slower than calling two Append methods (again, you can understand why when you think about it).

Here is the code that I used to test:

static void Main(string[] args)

{

DateTime start = DateTime.Now;

StringBuilder sb = new StringBuilder();

string myStr = ”hi”;

for (int t = 0; t < 100000000; t++)

{

sb.AppendFormat(“{0},”,myStr);

if ((t % 1000)==0) sb.Remove(0, sb.Length – 1);

}

TimeSpan gap = DateTime.Now – start;

Console.WriteLine(“AppendFormat takes: {0}”, gap);

sb.Remove(0, sb.Length – 1);

start = DateTime.Now;

for (int t = 0; t < 100000000; t++)

{

sb.Append(myStr);

sb.Append(“,”);

if ((t % 1000) == 0) sb.Remove(0, sb.Length – 1);

}

gap = DateTime.Now – start;

Console.WriteLine(“Multiple Appends takes: {0}”, gap);

sb.Remove(0, sb.Length – 1);

string sep = ”,”;

start = DateTime.Now;

for (int t = 0; t < 100000000; t++)

{

sb.Append(myStr);

sb.Append(sep);

if ((t % 1000) == 0) sb.Remove(0, sb.Length – 1);

}

gap = DateTime.Now – start;

Console.WriteLine(“Multiple Appends (comma in string) takes: {0}”, gap);

sb.Remove(0, sb.Length – 1);

start = DateTime.Now;

for (int t = 0; t < 100000000; t++)

{

sb.Append(myStr+sep);

if ((t % 1000) == 0) sb.Remove(0, sb.Length – 1);

}

gap = DateTime.Now – start;

Console.WriteLine(“Single Append (comma in string) takes: {0}”, gap);

sb.Remove(0, sb.Length – 1);

Console.ReadKey();

}

Partitioned Tables in SQL Server 2005

Partitioned tables are a new feature available in SQL Server version 2005, aimed mainly at improving the performance of large database systems. The feature is only available for enterprise and developer edition. For other editions you can get a similar functionality with a partitioned view.

This article focuses on how to create a partitioned table and manipulate the partitions, rather than exploring the performance aspects.

Partitioned tables are a new feature available in SQL Server version 2005, aimed mainly at improving the performance of large database systems. The feature is only available for enterprise and developer edition. For other editions you can get a similar functionality with a partitioned view.

This article focuses on how to create a partitioned table and manipulate the partitions, rather than exploring the performance aspects.

Creating the Partitioned Table

First of all, we will create a partitioned table and prove that it is acting in a partitioned manner, in that the queries will only access the partitions that are required.

To start with, we need to create a partition function. This will define how many partitions exist, and the values contained in partition columns within those partitions

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1,2)

The partition function has been named MyPartitionRange.

The partition column is an int.

‘Range left’ means that the value is the upper bound for the partition.

The above code will define a partitioned table that contains 3 partitions.

  • Partition 1 – Partition value <= 1
  • Partition 2 – Partition value > 1 and <= 2
  • Partition 3 – Partition value > 2

As the partition column is an integer the partitions will actually be

  • Partition 1 – Partition value <= 1
  • Partition 2 – Partition value =2
  • Partition 3 – Partition value > 2

Note: that the range left / range right determines the partition for data that matches the partition literal value. For integer data this would change the partition number for all the data .

Now we create a partition scheme.

CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])

The partition scheme is named MyPartitionScheme and references the partition function MyPartitionRange. All of the partitions are to be held on the primary filegroup.

Now we can create the table.

All that is needed is to reference the partition scheme naming the partition column in an “on” clause

CREATE TABLE MyPartitionedTable
       (
       i INT ,
       s CHAR(8000) ,
       PartCol INT
       )
ON
 MyPartitionScheme (PartCol)

We can check the table structure via…

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')

…which gives…

partition_id         object_id index_id  partition_number   hobt_id            rows
------------------ ----------- --------- ----------------   ------------------ ----
72057594038714368  229575856   0         1                  72057594038714368    0
72057594038779904  229575856   0         2                  72057594038779904    0
72057594038845440  229575856   0         3                  72057594038845440    0

Now we add some data

INSERT MyPartitionedTable (i, s, PartCol) SELECT 1, 'a', 1
INSERT MyPartitionedTable (i, s, PartCol) SELECT 2, 'a', 2
INSERT MyPartitionedTable (i, s, PartCol) SELECT 3, 'a', 2
INSERT MyPartitionedTable (i, s, PartCol) SELECT 4, 'a', 3
INSERT MyPartitionedTable (i, s, PartCol) SELECT 5, 'a', 3
INSERT MyPartitionedTable (i, s, PartCol) SELECT 6, 'a', 3
INSERT MyPartitionedTable (i, s, PartCol) SELECT 7, 'a', 4

… and check that the rows have been added to the correct partitions

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')

… giving as expected…

partition_id        object_id index_id    partition_number hobt_id              rows
------------------- --------- ----------- ---------------- -------------------- ----
72057594038714368   229575856 0           1                72057594038714368    1
72057594038779904   229575856 0           2                72057594038779904    2
72057594038845440   229575856 0           3                72057594038845440    4

A function, $partition, is available to give the partition number for the data.

SELECT PartitionNo = $partition.MyPartitionRange(PartCol), NumRows = COUNT(*)
FROM MyPartitionedTable
GROUP BY $partition.MyPartitionRange(PartCol)
ORDER BY $partition.MyPartitionRange(PartCol)

The following…

SELECT PartitionNo = $partition.MyPartitionRange(6)

…is a valid statement and shows that a row with partition value 6 would be added to partition 3.

The function gives the row counts for each partition and so can be useful to find which partition holds the data.

Now to test that queries only access the required partition. We make partition 2 very large (this is why we made s a char(8000)).

DECLARE @i INT
SELECT @i = 13
WHILE @i > 0
BEGIN
       SELECT @i = @i - 1 

INSERT MyPartitionedTable (i, s, PartCol)
       SELECT i, s, PartCol
       FROM MyPartitionedTable
       WHERE PartCol = 2
END

…and we check the rowcounts …

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')

…giving as expected

/*
partition_id        object_id   index_id  partition_number hobt_id              rows
------------------- ----------- --------- ---------------- -------------------- -----
72057594041532416   2117582582  0         1                72057594041532416    1
72057594041597952   2117582582  0         2                72057594041597952    16384
72057594041663488   2117582582  0         3                72057594041663488    4
*/

On my laptop that is enough to give an appreciable difference in query times – if the first query below is too quick to detect on your machine then increase the number of loops to add rows to the partition.

Now try the queries

SELECT COUNT(DISTINCT s) FROM MyPartitionedTable WHERE PartCol = 2
SELECT COUNT(DISTINCT s) FROM MyPartitionedTable WHERE PartCol = 1

You should find that the first takes a lot longer than the second because in both cases only one partition is accessed, a single read for partition 1, many reads for partition 2.

Adding and Removing Partitions

In the previous example, we added data to the partition by inserting rows into the partitioned table. It is also possible to populate a table, and then add that table to the partitioned table as a partition. There are many restrictions on the nature of the table and data that can be added.

The command to add the table as a partition is “alter table …. Switch…”. It actually swaps the table with a partition already existing in the partitioned table.

We will now add a new partition to MyPartitionedTable for partition value 3 and then swap it for a new tableMyNewPartition.

To add a new partition to MyPartitionedTable, use the split range command on the partition function.

ALTER PARTITION FUNCTION MyPartitionRange () split RANGE (3)

This has added a new partition for partition value 3.

The reverse of this is a merge range statement

ALTER PARTITION FUNCTION MyPartitionRange () merge RANGE (3)
SELECT *
FROM sys.partitions
WHERE
 OBJECT_ID = OBJECT_ID('MyPartitionedTable')

/*
partition_id         object_id   index_id  partition_number hobt_id              rows
-------------------- ----------- --------- ---------------- -------------------- ----
72057594042056704    322100188   0         1                72057594042056704    1
72057594042122240    322100188   0         2                72057594042122240    2
72057594042187776    322100188   0         4                72057594042187776    1
72057594042253312    322100188   0         3                72057594042253312    3
*/

Note that the existing rows for partition value 3 have been moved to partition 3. The row with partition value 4 has been moved to partition 4.

We create a new table to swap with a partition. This table must have the same structure as the partition. It must also include a check constraint to ensure that the partition column values in the table are included in the correct partition. The check constraint must be at least as restrictive as the partition range function for that partition.

CREATE TABLE MyNewPartition
       (
       i
 INT ,
       s CHAR(8000) ,
       PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
       )

I normally create and populate the table and add the check constraint later.

Now we add some data to the new table

INSERT MyNewPartition SELECT 1, 'a', 3

To perform this operation the partition must be empty so…

DELETE MyPartitionedTable WHERE PartCol = 3

And we can swap the table with the partition

ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Viewing the partitions

SELECT *
FROM
 sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')
/*
partition_id         object_id   index_id  partition_number hobt_id              rows
-------------------- ----------- --------- ---------------- -------------------- ----
72057594042056704    322100188   0         1                72057594042056704    1
72057594042122240    322100188   0         2                72057594042122240    2
72057594042187776    322100188   0         4                72057594042187776    1
72057594042318848    322100188   0         3                72057594042318848    1
*/

We see that the partition has been swapped with the new table.

The advantage of this is that the swap does not move the data – it just updates the metadata so that the table becomes the partition. This means that it is very fast. The table can be created, populated, and then added as a partition, thereby causing minimal impact on the partitioned table.

Switching a populated partition

Of course the partition that is being swapped out would often contain a lot of data – usually this would be used for adding another partition to the right to split up the data. Deleting the data from the partition would not be feasible – nor would the split on a populated partition.

To accomplish this, you would need to first populate the two tables to replace the ‘catch all’ partition (in our scenario partition 3).

Now create a table MyOldPartition3 – remember this must be the same structure as above. As this is a destination for a switch the check constraint must be less restrictive than that on the partitioned table.

Now switch the partition out

ALTER TABLE MyPartitionedTable switch PARTITION
 3 TO MyOldPartition3

Now the partition split can be carried out on an empty partition and the two partitions switched in without moving any data.

Identities in a partitioned table

Remember that an identity is not guaranteed to be unique or sequential. It just allocates the next value from the current seed. With that in mind nothing that follows should come as a surprise.

For this we will create a new partition function, scheme and table

CREATE PARTITION FUNCTION MyIdentityPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyIdentityPartitionScheme AS
PARTITION MyIdentityPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE MyIdentityPartitionedTable
       (
       i INT IDENTITY (1,1) ,
       s CHAR(10)
 ,
       PartCol INT
       )
ON MyPartitionScheme (PartCol)

--And we add some data
INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'a', 1
INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'a', 2
INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'b', 1
INSERT MyIdentityPartitionedTable (
s, PartCol) SELECT 'b', 2

SELECT * FROM MyIdentityPartitionedTable

i           s          PartCol
———– ———- ———–
1           a          1
3           b          1
2           a          2
4           b          2

Showing that the identity is a property of the partitioned table rather than the partition.

More interesting is what happens when partitions are swapped:

CREATE TABLE MyIdentityPartitionedTableNew
       (
       i INT IDENTITY (1,1) ,
       s CHAR(10
) ,
       PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
       )

INSERT MyIdentityPartitionedTableNew (s, PartCol) SELECT 'c', 3
INSERT MyIdentityPartitionedTableNew (s, PartCol) SELECT 'd', 3
ALTER TABLE MyIdentityPartitionedTableNew
       switch TO MyIdentityPartitionedTable PARTITION 3

SELECT * FROM MyIdentityPartitionedTable

i           s          PartCol
———– ———- ———–
1           a          1
3           b          1
2           a          2
4           b          2
1           c          3
2           d          3

And we see duplicate identity values from the new partition.

Adding a new row

INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'e', 1
SELECT * FROM MyIdentityPartitionedTable

i           s          PartCol
———– ———- ———–
1           a          1
3           b          1
5           e          1
2           a          2
4           b          2
1           c          3
2           d          3

We see that the partition swap has not affected the identity seed for the table.

Partitioned tables and Indexes

If the index contains the partitioning column then the index is referred to as being ‘aligned’ with the table.

If the index uses the same partitioning scheme as the table and is in the same filegroup then the index must be aligned with the table.

For a non-clustered non-unique index the partitioning column can be included to align the index rather than being indexed.

I think it is best to always explicitly include the partitioning column in your indexes.

The following assumes a single filegroup and scheme.

Clustered index

As stated this index must be aligned with the table. If it is not then the partitioning column will be implicitly added as the last column of the index.

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE MyPartitionedTable
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT

       )
ON MyPartitionScheme (PartCol)
CREATE TABLE MyNewPartition
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
       ) 

CREATE CLUSTERED INDEX cl_ix ON MyPartitionedTable (j)
CREATE CLUSTERED INDEX cl_ix ON MyNewPartition (j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

…gives the error…

ALTER TABLE SWITCH statement failed. There is no identical index in source table ‘tempdb.dbo.MyNewPartition’ for the index ‘cl_ix’ in target table ‘tempdb.dbo.MyPartitionedTable’ .

whereas these all succeed.

DROP INDEX MyNewPartition.cl_ix
DROP INDEX MyPartitionedTable.cl_ix
CREATE CLUSTERED INDEX cl_ix ON MyPartitionedTable (j, PartCol)
CREATE CLUSTERED INDEX cl_ix ON MyNewPartition (j, PartCol)
ALTER
 TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3
DROP INDEX MyNewPartition.cl_ix
DROP INDEX MyPartitionedTable.cl_ix
CREATE CLUSTERED INDEX cl_ix ON MyPartitionedTable (PartCol, j)
CREATE CLUSTERED indeex cl_ix ON MyNewPartition (PartCol, j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

This will also work

DROP INDEX MyNewPartition.cl_ix
DROP INDEX MyPartitionedTable.cl_ix
CREATE CLUSTERED INDEX cl_ix
 ON MyPartitionedTable (j)
CREATE CLUSTERED INDEX cl_ix ON MyNewPartition (j, PartCol)

…showing that the partitioning column has been implicitly added to the clustered index in the partitioned table.

Note that this extra column will not be shown by sp_helpindex on the partitioned table nor by scripting the index but it is shown by sys.index_columns

I don’t know why Microsoft decided to add the column to the index automatically. I think it would be less confusing to give an error, thereby forcing the user to add the column explicitly.

Unique index

Unique indexes must contain the partitioning column as an indexed column.

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY]) 

CREATE TABLE MyPartitionedTable
       (
       i
 INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT
       )
ON MyPartitionScheme (PartCol) 

CREATE TABLE MyNewPartition
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT CHECK (PartCol = 3
 AND PartCol IS NOT NULL)
       )
CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (j)

…gives the error

   Msg 1908, Level 16, State 1, Line 1 olumn 'PartCol' is partitioning
column of the index 'cl_ix'. Partition columns for a unique index must
be a subset of the index key.

… one of the more explanatory error messages.

The Partitioning column must be part of the index so …

CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (j) include (PartCol)

…will also fail.

Unlike clustered indexes the partitioning column must be explicitly part of the index.
These will both work

CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (j, PartCol)
CREATE UNIQUE INDEX cl_ix
ON MyPartitionedTable (j, PartCol)

ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3
CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (PartCol , j)
CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (PartCol , j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Non-unique index

Non-unique indexes do not need to have the partitioning column as part of the index, it can be an INCLUDE column. If it is not explicitly included, then the column will be automatically added – again this will not appear in sp_helpindex

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS
 RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE MyPartitionedTable
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT
       )
ON MyPartitionScheme (PartCol)

CREATE TABLE MyNewPartition
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
       )
CREATE INDEX cl_ix ON MyPartitionedTable (j)
CREATE INDEX cl_ix ON MyNewPartition (j) include (PartCol)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Is successful as are…

CREATE INDEX cl_ix ON MyPartitionedTable (j, PartCol)
CREATE INDEX cl_ix ON MyNewPartition (j, PartCol)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3
CREATE INDEX cl_ix ON MyPartitionedTable (PartCol, j)
CREATE INDEX cl_ix
 ON MyNewPartition (PartCol, j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Scenarios

Partitioning on multiple columns

Although the partitioning column must be a single column, it does not need to be numeric and it can be calculated so that the range can include multiple columns. For instance it is common to partition on datetime data by month. This will work well, because that data is usually in a single column, but what do you do if you have data for multiple companies and you also want to partition by company? For this you could use a computed column for the partitioning column. This will create a computed column using the ‘company id’ and ‘order month’ which is then used for the partitions. It will partition three companies for the first three months of 2007.

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES
       (1200701,1200702,1200703,2200701,2200702,2200703,3200701,3200702,3200703)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange

ALL TO ([PRIMARY])
CREATE TABLE CompanyOrders
       (
       Company_id      INT ,
       OrderDate       datetime ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5) ,
       PartCol AS Company_id * 10000 + CONVERT(VARCHAR(
4),OrderDate,112) persisted
       )
ON MyPartitionScheme (PartCol)

The computed column must be ‘persisted’ to form the partitioning column.

We will investigate the maintaining of partitioned data in this table later.

Monthly Data – the sliding range

A common requirement is to partition by month. This means that new month partitions need to be added and possibly old data partitions removed. I will describe the process for the addition of a new partition for later data, to remove an old partition the process is the same except that you swap out two partitions, merge the range and swap in a single table.

We create a partitioned table for data by OrderDate month

CREATE PARTITION FUNCTION MyPartitionRange (datetime)
AS RANGE RIGHT FOR VALUES ('20070101', '20070201', '20070301', '20070401')
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE Orders
       (

       OrderDate       datetime ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5)
       )
ON MyPartitionScheme (OrderDate)

This will give four partitions…

OrderDate < '20070101'
OrderDate >= '20070101' AND < '20070201'
OrderDate >= '20070201' AND < '20070301'
OrderDate >= '20070301' AND < '20070401'
OrderDate >= '20070401'

Therefore the data will be split into partitions by month.

And we insert some test data

--insert Orders select '19000101', 1, 1, 1
INSERT Orders SELECT '20070101', 1, 1, 1
INSERT Orders SELECT '20070201', 1, 1, 1
INSERT Orders SELECT '20070301', 1, 1, 1
INSERT Orders SELECT '20070401', 1, 1, 1
INSERT Orders SELECT '20070402', 1, 1, 1
INSERT Orders SELECT '20070501', 1, 1, 1

To add the next month’s partition it is possible to just split the range and let the system take care of the data. This though would mean that the data would be off-line for the duration of the operation. It is better to use the experience we have gained in switching partitions to create the new data in separate tables then switch them in. This means that the table would be off-line for a very short time – just while the switch operations are taking place.

If data is being continually added to the partitioned table then a snapshot can be taken, the new tables prepared on this and the switch-in operation will need to take the table off-line, merge the new data with the prepared snapshot data then perform the switch-in. An identity column on the table would help to identify new data added since the snapshot. This would mean longer downtime than for static data but still a lot less than splitting a populated range.

To add a new months partition of static data

  1. Create the table containing the new months data.
  2. Create the table containing the data after the new month
  3. Swap out the last month from the partitioned table
  4. Split the (empty) range in the partitioned table
  5. Swap in the new months data
  6. Swap in the table containing the data after the new month
  7. Check the result

1. Create the table containing the new months data.

In the initial discussion we created a table for this but you might find it easier to create a partitioned table for the operation.

Note that if you script the existing table and indexes make sure that you remember that not all indexed columns may appear in the index script.

Also if you take this route you will have to use a new partition function and scheme as you will need to split the range. I prefer to use non-partitioned tables for flexibility.

To population of the table will depend on where the data resides but wil only affect the production table if you need to read the data from that table.

CREATE TABLE Orders_200704
       (
       OrderDate       datetime CHECK (OrderDate >= '20070401'
                       AND OrderDate < '20070501' AND OrderDate IS NOT NULL) ,
       Item_id         INT
 ,
       Quantity        INT ,
       OrderValue      decimal(19,5)
       )

INSERT Orders_200704 SELECT * FROM Orders WHERE OrderDate >= '20070401
            AND OrderDate < '20070501'

2. Create the table containing the data after the new month

In the same way as the previous table populate with the data that is later than the new month.

CREATE TABLE
Orders_200705
       (
       OrderDate       datetime CHECK (OrderDate >= '20070501' AND OrderDate
                                           IS NOT NULL) ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5)
       )

INSERT Orders_200705 SELECT * FROM Orders WHERE OrderDate >= '20070501'

At this point take the production table off-line for the swap.

3. Swap out the last month from the partitioned table

For this you will need an empty table to swap the data into. It is tempting again to use a partitioned table for this – if so you will need to create a new partition function and scheme as you would not want to lose the data until after splitting the range on the production table.

 CREATE TABLE
Orders_200704_Old
       (
       OrderDate       datetime CHECK (OrderDate >= '20070401'
                                        AND OrderDate IS NOT NULL) ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5)
       )

ALTER TABLE Orders switch PARTITION 5 TO Orders_200704_Old

4 Split the (empty) range in the partitioned table

This is done by adding a new value to the partition function

ALTER PARTITION FUNCTION MyPartitionRange () split RANGE ('20070501')

As the partition is empty this should be quick as it just means creating a new empty partition.

5 Swap in the new months data

ALTER TABLE Orders_200704 switch TO Orders PARTITION 5

6 Swap in the table containing the data after the new month

ALTER TABLE Orders_200705 switch TO Orders PARTITION 6

7 Check the result

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('Orders')

Gives the expected result

partition_id         object_id   index_id  partition_number hobt_id              rows
——————– ———– ——— —————- ——————– —-
72057594038845440    213575799   0         1                72057594038845440    1
72057594038910976    213575799   0         2                72057594038910976    1
72057594038976512    213575799   0         3                72057594038976512    1
72057594039042048    213575799   0         4                72057594039042048    1
72057594039173120    213575799   0         5                72057594039173120    2
72057594039238656    213575799   0         6                72057594039238656    1

Now the table Orders_200704_Old can be dropped at your leisure.

Adding a new partition with a computed partition function

We return to the table partitioned by company and month.

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES
       (1200701,1200702,1200703,2200701,2200702,2200703,3200701,3200702,3200703)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])

CREATE TABLE CompanyOrders
       (
       Company_id      INT , 

OrderDate       datetime ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5) ,
       PartCol AS Company_id * 10000 + CONVERT(VARCHAR(4),OrderDate,112) persisted
       )
ON MyPartitionScheme (PartCol)

In order too add a new month to this table, you will need to split each company’s range for that month. To add a new company partition means splitting all months for that company.

There is no need to add all the partitions in one process, these operations can be performed one partition at a time. This should not affect the results of queries on the table.

The process of adding the new company or month is the same as for the sliding month data, only with many splits and swaps.

Other uses of partitioned tables

Usually, partitioned tables are used to horizontally, or vertically, partition the data. However, the partitions are sometimes used for different purposes – not partitioning the data at all.

I recently came across a reporting system that was querying a single flat table for aggregated results. The table was about 15Gb in size and could be filtered and grouped on any combination of columns. Indexing by date meant that a report for a year would take about 20 minutes – far too long (performance checked just before the release date of course), in fact anything more than 3 months was unacceptable. Normalising and using a view helped with the retrieval of the filter column data, but the actual report was limited by the amount of data it needed to aggregate.

Due to time constraints and policy, we were not allowed to create a cube and the report could not call a stored procedure. Oddly, there was a lot of flexibility in the query used to extract data but it had to access the single table.

The solution was to create a partitioned table. The first partition (partition value = 1) contained the old data table and would still be slow to access.

The second partition contained aggregated data, aggregated by filter columns that kept the table size less than 200K rows. All partitions have the same structure so those filter columns that were excluded were set to null. Accessing this partition was quick enough for any report.

The report application was then changed to check if the filter/grouping columns were all included in the second partition – if so it appended “and PartCol = 2” to the query otherwise it appended “and PartCol = 1”.

The reporting application could warn the users if they were about to do something that would take a long time.

We then checked how the reports were being used, and selected combinations of columns that could be used for other partitions. The partitions were added to the table (not affecting the system) and then, at a later time, the reporting application changed to use the new partitions.

This would have been easier with a stored procedure as the partitions could have been left as separate tables and the stored procedure could choose which to query, but a stored procedure call was not allowed by the reporting application.

Continue reading “Partitioned Tables in SQL Server 2005” »

Thank you very much for visiting this article. In case if you are not on the MSDN blogs then I would request you to please visit my blog at http://blogs.msdn.com/manisblog because at times I improve the existing articles after reading emails from people who enthusiastically provide their feedback. These improvements might not be reflected on the other blog sites who have indexed this article.


Hmm.. what am I going to cover today…. it is something that we all expect from a SQL Server database i.e. performance and I am going to write it in the same fashion that you like most i.e. simplify the contents, make it easy to understand and provide the steps with pictures.

Continue reading “Easy Table Partitions with SQL Server 2008” »

I have received call from my DBA friend who read my article SQL SERVER – 2005 – Introduction to Partitioning. He suggested that I should write simple tutorial about how to horizontal partition database table. Here is simple tutorial which explains how a table can be partitioned. Please read my articleIntroduction to Partitioning before continuing to this article.
Continue reading “SQL SERVER – 2005 – Database Table Partitioning Tutorial – How to Horizontal Partition Database Table” »

CSS Caching Hack

If you change the HTML and the CSS of a page there is a decent chance that a user will get the new HTML but not the new CSS. This is especially true for sites with high usage and users that come back to the site several times a day.

If they only get the new HTML but not the new CSS then the site will break, the user will get confused, and you will look unprofessional. On the development site of this the problem is already solved. Having a Dev server as a place to test code is standard practice but how does this translate to CSS?

The problem is that the CSS is cashing on the client side and there isn’t an obvious way of telling the browser to un-cache it. Luckily the key word is obvious. Though it’s not in wide use there is a quick hack that will keep your CSS as fresh as your HTML.

The trick is to pass a variable on the end of the CSS file like so:

<link rel="stylesheet" xhref="http://www.stefanhayden.com/style.css?version=1" type="text/css" />

What does ?version=1 mean? This is what a URL looks like if it’s passing a GET variable from one page to the next. To the browser it means the page is dynamic and it needs to get a new version because code may have changed. The browser has no way of knowing if the CSS file is actually dynamic or not.The trick is to change the number each time you update the CSS file to make sure the browser always downloads the new code.

When a browser looks to see if it has anything cashed it compares file names. If you have “style.css” in your cashe then it’s not going to download it again. But if the browser compares “style.css?version=1″ to what the new HTML is “style.css?version=2″ then the browser thinks they are different files and needs to download the new CSS file.

The other reason this works is because you can add anything you want after the ? and the web page just ignores it unless it’s an actually variable on the page.

This seems to be a really good solution to version css and yet so few seem to use it. The only 2 sites I know of who do is Odeo and Sconex. Yet clearly we are in the middle of a big web boom with CSS being used every where. How are other people versioning their CSS so it doesn’t break the user experience?

In general I can’t see to many other solutions. You could make the CSS file parsed by the web server and pass headers with different cacheing info. I have not tried this but I’ll bet the browser would still cache the file as it does not know it is dynamic even if it is. You could rename the CSS file with .php but clearly no one is doing that and I’ll bet there is a browser out there that would not apply the styles because of that.

No every one gets to work on a large production site but with so many jumping in the area and quickly updating the service I’m surprised this subject has not been covered.

- Tên phim: Tân Vịnh Xuân Quyền – Wing Chun
- Đạo diễn: Edmond Fung Yuen Man, Gary Sing
- Diễn viên: Tạ Ðình Phong, Hồng Kim Bảo, Nguyên Bưu, Hạ Khi
- Sản xuất: Universe
- Thể loại phim: Võ Thuật
- Quốc gia: Trung Quốc
- Năm sản xuất: 2006
- Thời lượng: 40 tập

Continue reading “Tân Vịnh Xuân Quyền [CDrama - Thuyết Minh] [DVDrip] [Mediafire]” »

Dbz 144 – Xuất hiện rồi !!! Siêu Kamehameha

Dbz 141 – Bốn người thiên tân phạm

Powered by WordPress | Theme: by 85ideas. Editor by Khoanguyen