Category: SQL SERVER


A very good book by the authors that I admire and appreciate mentioned the use of a class called SQLAsyncResult to access data asynchronously using ADO.NET 2.0 via the callback approach. My attempt to try out an example failed as I couldn’t find the SQLAsyncResult class, not in the namespace, not in the MSDN library. However, I did find a post on ASP.NET forum about a dev’s unsuccessful hunt for this class and an MVP answering that it might be a type in the example.
The reason I’m mentioning it at the start of this post is that, if any one developer finds this class, I would really appetite you leaving a comment.
Now, back to the Callback approach …

This post is in continuation to my previous post about Asynchronous data access using ADO.NET however, that post need not be referred if you are interesting in the callback only.
In this approach we leverage .NET infrastructure relating to threading, AsyncCallback delegate, the AsyncResult class and the IAsyncResult interface.
The approach is accomplished by:

  • Creating a call back method which will process the data (essentially call the end invoke command) which accepts the IAsyncResult
  • Creating an AsyncCallback delegate for the callback methods created &
  • Passing the delegate along with the command object in the begin invoke command.

Here’s a simple implementation of the call back approach:
The model class to load data:

public class Customer {

    public Customer() {
    }

    public Customer(SqlDataReader reader) {

        CustomerID = reader["CustomerID"].ToString();
        CompanyName = reader["CompanyName"] == null ? string.Empty : reader["CompanyName"].ToString();
        ContactName = reader["ContactName"] == null ? string.Empty : reader["ContactName"].ToString();
        Phone = reader["Phone"] == null ? string.Empty : reader["Phone"].ToString();
    }

    public string CustomerID { get; set; }
    public string CompanyName { get; set; }
    public string ContactName { get; set; }
    public string Phone { get; set; }

}

Here’s the data access class that fetches customer asynchronously using the callback approach:

public class CustomerDB {

    /// <summary>
    /// This variable keeps the track of number of asynchronous calls
    /// </summary>
    private static int counter = 0;

    /// <summary>
    /// Class member to store the result.
    /// </summary>
    private List<Customer> _customers = new List<Customer>();

    /// <summary>
    /// This is the main method that needs to be called from other application layer to get Customers.
    /// </summary>
    /// <returns></returns>
    public List<Customer> GetCustomers() {

        //Initiate Connection
        SqlConnection con = new SqlConnection();
        con.ConnectionString = ConfigurationManager.ConnectionStrings["NorthwindConnectionString"].ConnectionString;

        //Initiate Command
        SqlCommand cmd = con.CreateCommand();
        cmd.CommandText = "SELECT top 10 * FROM [Customers]";
        cmd.CommandType = CommandType.Text;

        try {

            con.Open();

            // create AsyncCallback delegate
            AsyncCallback callback = new AsyncCallback(RetrieveDataCallback);

            cmd.BeginExecuteReader(callback, cmd, CommandBehavior.CloseConnection);

            //Wait until the call is complete
            while (counter < 1) {
            }

        }
        catch {
            //Log error
        }

        return _customers;

    }

    /// <summary>
    /// The callback method to retrieve data.
    /// </summary>
    /// <param name="result"></param>
    private void RetrieveDataCallback(IAsyncResult result) {

        SqlDataReader dr = null;

        try {
            SqlCommand command = (SqlCommand)result.AsyncState;
            Customer customer = null;

            dr = command.EndExecuteReader(result);

            while (dr.Read()) {

                customer = new Customer(dr);
                _customers.Add(customer);
            }
        }
        catch {
            //Log error
        }
        finally {

            try {
                if (!dr.IsClosed) {
                    dr.Close();
                }
            }
            catch { } //Suppress any exceptions here

            //Increment value to notify that the call is complete.
            Interlocked.Increment(ref counter);

        }
    }
}

Let’s rewrite this code to make multiple calls to different databases using the same customer model class. To make sure that the data access is thread safe we need to add a static object. This static object needs to be locked every time a thread completes and attempts to update the Customers list. Here’s the new CustomerDB class:

public class CustomerDB {

    /// <summary>
    /// static object for thread safety.
    /// </summary>
    static object lockObject = new object();

    /// <summary>
    /// This variable keeps the track of number of asynchronous calls
    /// </summary>
    private static int counter = 0;

    /// <summary>
    /// Class member to store the result.
    /// </summary>
    private List<Customer> _customers = new List<Customer>();

    /// <summary>
    /// This is the main method that needs to be called from other application layer to get Customers.
    /// </summary>
    /// <returns></returns>
    public List<Customer> GetCustomers() {

        //Initiate Connection
        SqlConnection con1 = new SqlConnection();
        con1.ConnectionString = ConfigurationManager.ConnectionStrings["NorthwindConnectionString"].ConnectionString;
        SqlConnection con2 = new SqlConnection();
        con1.ConnectionString = ConfigurationManager.ConnectionStrings["SouthwindConnectionString"].ConnectionString;
        SqlConnection con3 = new SqlConnection();
        con1.ConnectionString = ConfigurationManager.ConnectionStrings["EastwindConnectionString"].ConnectionString;

        //Initiate Command
        SqlCommand cmd1 = con1.CreateCommand();
        SqlCommand cmd2 = con2.CreateCommand();
        SqlCommand cmd3 = con3.CreateCommand();

        //The table structures are same in my case.
        //In case of different table structures include logic in the call back method
        //or Customer constructor to DB specific load.
        cmd1.CommandText = "SELECT top 10 * FROM [Customers]";
        cmd1.CommandType = CommandType.Text;
        cmd2.CommandText = "SELECT top 10 * FROM [Customers]";
        cmd2.CommandType = CommandType.Text;
        cmd3.CommandText = "SELECT top 10 * FROM [Customers]";
        cmd3.CommandType = CommandType.Text;

        try {

            con1.Open();
            con2.Open();
            con3.Open();

            // create AsyncCallback delegate
            AsyncCallback callback1 = new AsyncCallback(RetrieveDataCallback);
            AsyncCallback callback2 = new AsyncCallback(RetrieveDataCallback);
            AsyncCallback callback3 = new AsyncCallback(RetrieveDataCallback);

            cmd1.BeginExecuteReader(callback1, cmd1, CommandBehavior.CloseConnection);
            cmd2.BeginExecuteReader(callback2, cmd2, CommandBehavior.CloseConnection);
            cmd3.BeginExecuteReader(callback3, cmd3, CommandBehavior.CloseConnection);

            //Wait until all the call are complete
            while (counter < 3) {
            }

        }
        catch {
            //Log error
        }

        return _customers;

    }

    /// <summary>
    /// The callback method to retrieve data.
    /// </summary>
    /// <param name="result"></param>
    private void RetrieveDataCallback(IAsyncResult result) {

        SqlDataReader dr = null;

        try {
            SqlCommand command = (SqlCommand)result.AsyncState;
            Customer customer = null;

            dr = command.EndExecuteReader(result);

            while (dr.Read()) {

                customer = new Customer(dr);

                // lock the static object when data is loaded
                lock (lockObject) {

                    _customers.Add(customer);
                }

            }
        }
        catch {
            //Log error
        }
        finally {

            try {
                if (!dr.IsClosed) {
                    dr.Close();
                }
            }
            catch { } //Suppress any exceptions here

            //Increment value to notify that the call is complete.
            Interlocked.Increment(ref counter);

        }
    }
}

Batch Updates in ADO.NET 2.0 for Improved Performance

When you updated a database using the DataAdapter in .NET 1.1 each command was sent to the database one at a time.  This caused a lot of roundtrips to the database.

ADO.NET 2.0 has introduced the concept of Batch Updates, which allows you to designate the number of commands sent to the database at a given time.  If used correctly, this can increase the performance of your data access layer by reducing the number of roundtrips to the database.

DataAdapter.UpdateBatchSize Property

The DataAdapter has an UpdateBatchSize Property that allows you to set the number of commands that will be sent to the database with each request.

  • UpdateBatchSize = 1, disables batch updates
  • UpdateBatchSize = X where X > 1, sends x statements to the database at a time
  • UpdateBatchSize = 0, sends the maximum number of statements at a time allowed by the server

Command.UpdatedRowSource Property

When using batch mode, the UpdatedRowSource property of the command can only be set to either UpdatedRowSource.None or UpdatedRowSource.OutputParameters

Continue reading “Batch Updates in ADO.NET 2.0 for Improved Performance” »

Monitor User Connections in SQL Server

Applications typically make connections to the SQL Server databases in large numbers but what if those connections are not successfully closed? How you can monitor them? This short article describes the methods we can use to monitor unclosed connections.

Sample Application

Let us start with a simple C# WinForms application which has the following UI.

The Connect button will have the below coding behind it:

private void Connect_Click(object sender, EventArgs e)

{

int i = 0;

while (1 == 1) // Loop indefinitely

{

//Connection is create and connectiong to database

string conn = “Data Source=.;Initial Catalog=UserConnection;

Integrated Security=SSPI;enlist=false”;

SqlConnection sqlConn = new SqlConnection(conn);

sqlConn.Open();

//Store procedure insert_connection_details is executed,

//looping instance value  (i) will be passed

SqlCommand com = new SqlCommand(“insert_connection_details”, sqlConn);

com.CommandType = CommandType.StoredProcedure;

com.Parameters.Add(new SqlParameter(“@connectionId”, SqlDbType.Int)).Value = i;

com.ExecuteNonQuery();

//if the Close check box is selected onlyConnection will be closed

if (chkClose.Checked)

sqlConn.Close();

//if the dealy is not 0, there will be delayed of millisecond of given value in Delay box

if (txtDelay.Text != “0″)

{

Thread.Sleep(Convert.ToInt32(txtDelay.Text));

}

i = i + 1;

}

Since this code is very simple and fully commented you should understand what is happening with the connect button.

From the SQL Server side, as seen from the above code, there should be a stored procedure as well as a table to store this information.

CREATE TABLE [dbo].[user_connection](

[ConnectionID] [int] NOT NULL,

[SPID] [int] NULL,

[DateTime] [datetime] NULL,

CONSTRAINT [PK_user_connection] PRIMARY KEY CLUSTERED

(

[ConnectionID] ASC

)

)

So after the table is created, the next step is to create the stored procedure:

CREATE PROC [dbo].[insert_connection_details]

@connectionId int

AS

SET NOCOUNT ON

INSERT INTO user_connection

SELECT

@connectionId,

@@SPID,

GETDATE()

So now we are ready for the testing. For testing, we will use standard tools like perfmon and SQL Server Profiler. In perfmon we will be using the User Connections counter of the SQL Server General Statics object. Meanwhile, in the SQL Profiler, the Audit Login event will be captured.

So are you ready to go.

Scenario 1: Not closing the database connection.

When we executed the above code, the code will fail with the following error.

We will further analyze this error is at the end of this article.

After this error if we look at the perfmon output you may see something like this:

Before running the application, the count was 5 which is caused by system processes which are connected to SQL Server.   When the application starts, the number of connections starts to increase rapidly and after reaching 105 it stops.

If you check the profiler or the table, you will see there are more than 100 connections that have been made to the databases. To be exact, there are 304 records in the table.

Let’s execute few queries against this table.

If you simply run the SELECT * statement, you will see following result set.

Here you can see that SPID is not a unique number and it is repeating. For example, SPID  57 can be seen in two places. Similarly, other  SPIDs are also duplicated.

SELECT SPID

,COUNT(1) SPID_COUNT

FROM user_connection

GROUP BY SPID

ORDER BY 1

The above result shows that SPID is used multiple times. For example, SPID 56 is used for 6  times, 68 is used 4 times etc.

However, total count for rows here is exactly matches 100. There you are – the Application is failing when it reaches unique SPID equal 100 not the number of connections.

Scenario 2: Closing the connection

In this scenario, what we are trying is close the connection by clicking the check box.

This time there are no failures since application can  use the same connection. If we observe the perfmon and query we can see that one SPID is used throughout the execution of the application.

When the application was started the number of connections are increased by one but no more. Moreover, throughout the execution period the same SPID is being used.

Scenario 3: Changing the Delay without closing the connection.

Now, lets try this with the delay option on. We want to see whether there is a different behavior with the delay option. We will start with delay of 10ms.

Let us look at the perfmon output:

Still the application is failing after reaching 100 unique SPIDs (you can verify this by running the T-SQL query which was executed before), however, the increase in the number of connections is not as steep as before. Also, now we can see no of actual connections has increased to 650 from 304.

Let us see this by increasing the delay.

Delay 25 50
Perfmon Graph
Delay 100
Perfmon

Graph

It is very clear that from the above graphs, whenever you increase the delay, the time taken for the application to fail has also increased.

Delay (ms) No of Connections
No Delay 304
10 650
25 730
50 727
100 723

Scenario 4: Multiple execution of application instance

In all the above scenarios we were looking at a single instance of the execution of the application. What of if you execute the application using two instances?

The  below perfmon graph results from exectuing the app using two instances

Now you can see that when two applications are running, you will see 200 unique SPIDs. This means allocation of 100 is per application instance.

Explanation

All the details above are “laboratory” experiments. What is the science behind this? Can we configure this?

Let us review the error message we received when we are doing the first review.

“Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.”

Before analyzing this error message, let us see technology behind when a client makes a connection to  SQL Server -  http://www.microsoft.com/sqlserver/en/us/product-info/future-editions.aspx has full details on this but I will summaries it using our laboratory details.

In practice, most applications use only one or a few different configurations for connections. This means that during application execution, many identical connections will be repeatedly opened and closed. To minimize the cost of opening connections, ADO.NET uses an optimization technique called connection pooling.

Connection pooling reduces the number of times that new connections must be opened. The pooler maintains ownership of the physical connection. It manages connections by keeping alive a set of active connections for each given connection configuration. Whenever a user calls Open on a connection, the pooler looks for an available connection in the pool. If a pooled connection is available, it returns it to the caller instead of opening a new connection. When the application calls Close on the connection, the pooler returns it to the pooled set of active connections instead of closing it. Once the connection is returned to the pool, it is ready to be reused on the next Open call. By default, maximum pool connection is 100 that is why application are failing after reaching the connection to 100.

When a new connection is opened, if the connection string is not an exact match to an existing pool, a new pool is created. Connections are pooled per process, per application domain, per connection string and when integrated security is used, per Windows identity.  That is why, when we execute the scenario 4, it uses two connection pools.

ADO.NET 2.0 introduced two new methods to clear the pool: ClearAllPools and ClearPool. ClearAllPools clears the connection pools for a given provider, and ClearPool clears the connection pool that is associated with a specific connection. If there are connections being used at the time of the call, they are marked appropriately. When they are closed, they are discarded instead of being returned to the pool.

Let me put this into our sample code.

private void Connect_Click(object sender, EventArgs e)

{

int i = 0;

while (1 == 1) // Loop indefinitely

{

//Connection is create and connectiong to database

try

{

string conn = “Data Source=.;Initial

Catalog=UserConnection;Integrated Security=SSPI;enlist=false”;

SqlConnection sqlConn = new SqlConnection(conn);

sqlConn.Open();

//Store procedure insert_connection_details is executed,

//looping instance value  (i) will be passed

SqlCommand com = new SqlCommand

(“insert_connection_details”, sqlConn);

com.CommandType = CommandType.StoredProcedure;

com.Parameters.Add(new SqlParameter

(“@connectionId”, SqlDbType.Int)).Value = i;

com.ExecuteNonQuery();

//if the Close check box is selected onlyConnection will be closed

if (chkClose.Checked)

{

sqlConn.Close();

}

//if the dealy is not 0, there will be delayed of millisecond of given value in Delay box

if (txtDelay.Text != “0″)

{

Thread.Sleep(Convert.ToInt32(txtDelay.Text));

}

}

catch (Exception ex)

{

SqlConnection.ClearAllPools();// Clearing the connection pool

}

finally

{

i = i + 1;

}

}

Now if we execute the previous code again this is what you will see in our  perfmon.

Now application is running happily without failing. However, you can observe that it is not clearing out the existing connection and the limit of 100 is not applicable. At every horizontal line you are executing your pool connection.

Conclusion

It is obvious to say you need to close your opened connections. However as DBAs you can verify whether the connections that were opened are closing by the application.

Resource: http://www.sql-server-performance.com/articles/dev/monitor_user_connections_sql_server_p1.aspx

Partitioned Tables and Indexes in SQL Server 2005

Table-based partitioning features in SQL Server 2005 provide flexibility and performance to simplify the creation and maintenance of partitioned tables. Trace the progression of capabilities from logically and manually partitioning tables to the latest partitioning features, and find out why, when, and how to design, implement, and maintain partitioned tables using SQL Server 2005. (41 printed pages)

Continue reading “Partitioned Tables and Indexes in SQL Server 2005” »

Partitioned Tables in SQL Server 2005

Partitioned tables are a new feature available in SQL Server version 2005, aimed mainly at improving the performance of large database systems. The feature is only available for enterprise and developer edition. For other editions you can get a similar functionality with a partitioned view.

This article focuses on how to create a partitioned table and manipulate the partitions, rather than exploring the performance aspects.

Partitioned tables are a new feature available in SQL Server version 2005, aimed mainly at improving the performance of large database systems. The feature is only available for enterprise and developer edition. For other editions you can get a similar functionality with a partitioned view.

This article focuses on how to create a partitioned table and manipulate the partitions, rather than exploring the performance aspects.

Creating the Partitioned Table

First of all, we will create a partitioned table and prove that it is acting in a partitioned manner, in that the queries will only access the partitions that are required.

To start with, we need to create a partition function. This will define how many partitions exist, and the values contained in partition columns within those partitions

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1,2)

The partition function has been named MyPartitionRange.

The partition column is an int.

‘Range left’ means that the value is the upper bound for the partition.

The above code will define a partitioned table that contains 3 partitions.

  • Partition 1 – Partition value <= 1
  • Partition 2 – Partition value > 1 and <= 2
  • Partition 3 – Partition value > 2

As the partition column is an integer the partitions will actually be

  • Partition 1 – Partition value <= 1
  • Partition 2 – Partition value =2
  • Partition 3 – Partition value > 2

Note: that the range left / range right determines the partition for data that matches the partition literal value. For integer data this would change the partition number for all the data .

Now we create a partition scheme.

CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])

The partition scheme is named MyPartitionScheme and references the partition function MyPartitionRange. All of the partitions are to be held on the primary filegroup.

Now we can create the table.

All that is needed is to reference the partition scheme naming the partition column in an “on” clause

CREATE TABLE MyPartitionedTable
       (
       i INT ,
       s CHAR(8000) ,
       PartCol INT
       )
ON
 MyPartitionScheme (PartCol)

We can check the table structure via…

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')

…which gives…

partition_id         object_id index_id  partition_number   hobt_id            rows
------------------ ----------- --------- ----------------   ------------------ ----
72057594038714368  229575856   0         1                  72057594038714368    0
72057594038779904  229575856   0         2                  72057594038779904    0
72057594038845440  229575856   0         3                  72057594038845440    0

Now we add some data

INSERT MyPartitionedTable (i, s, PartCol) SELECT 1, 'a', 1
INSERT MyPartitionedTable (i, s, PartCol) SELECT 2, 'a', 2
INSERT MyPartitionedTable (i, s, PartCol) SELECT 3, 'a', 2
INSERT MyPartitionedTable (i, s, PartCol) SELECT 4, 'a', 3
INSERT MyPartitionedTable (i, s, PartCol) SELECT 5, 'a', 3
INSERT MyPartitionedTable (i, s, PartCol) SELECT 6, 'a', 3
INSERT MyPartitionedTable (i, s, PartCol) SELECT 7, 'a', 4

… and check that the rows have been added to the correct partitions

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')

… giving as expected…

partition_id        object_id index_id    partition_number hobt_id              rows
------------------- --------- ----------- ---------------- -------------------- ----
72057594038714368   229575856 0           1                72057594038714368    1
72057594038779904   229575856 0           2                72057594038779904    2
72057594038845440   229575856 0           3                72057594038845440    4

A function, $partition, is available to give the partition number for the data.

SELECT PartitionNo = $partition.MyPartitionRange(PartCol), NumRows = COUNT(*)
FROM MyPartitionedTable
GROUP BY $partition.MyPartitionRange(PartCol)
ORDER BY $partition.MyPartitionRange(PartCol)

The following…

SELECT PartitionNo = $partition.MyPartitionRange(6)

…is a valid statement and shows that a row with partition value 6 would be added to partition 3.

The function gives the row counts for each partition and so can be useful to find which partition holds the data.

Now to test that queries only access the required partition. We make partition 2 very large (this is why we made s a char(8000)).

DECLARE @i INT
SELECT @i = 13
WHILE @i > 0
BEGIN
       SELECT @i = @i - 1 

INSERT MyPartitionedTable (i, s, PartCol)
       SELECT i, s, PartCol
       FROM MyPartitionedTable
       WHERE PartCol = 2
END

…and we check the rowcounts …

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')

…giving as expected

/*
partition_id        object_id   index_id  partition_number hobt_id              rows
------------------- ----------- --------- ---------------- -------------------- -----
72057594041532416   2117582582  0         1                72057594041532416    1
72057594041597952   2117582582  0         2                72057594041597952    16384
72057594041663488   2117582582  0         3                72057594041663488    4
*/

On my laptop that is enough to give an appreciable difference in query times – if the first query below is too quick to detect on your machine then increase the number of loops to add rows to the partition.

Now try the queries

SELECT COUNT(DISTINCT s) FROM MyPartitionedTable WHERE PartCol = 2
SELECT COUNT(DISTINCT s) FROM MyPartitionedTable WHERE PartCol = 1

You should find that the first takes a lot longer than the second because in both cases only one partition is accessed, a single read for partition 1, many reads for partition 2.

Adding and Removing Partitions

In the previous example, we added data to the partition by inserting rows into the partitioned table. It is also possible to populate a table, and then add that table to the partitioned table as a partition. There are many restrictions on the nature of the table and data that can be added.

The command to add the table as a partition is “alter table …. Switch…”. It actually swaps the table with a partition already existing in the partitioned table.

We will now add a new partition to MyPartitionedTable for partition value 3 and then swap it for a new tableMyNewPartition.

To add a new partition to MyPartitionedTable, use the split range command on the partition function.

ALTER PARTITION FUNCTION MyPartitionRange () split RANGE (3)

This has added a new partition for partition value 3.

The reverse of this is a merge range statement

ALTER PARTITION FUNCTION MyPartitionRange () merge RANGE (3)
SELECT *
FROM sys.partitions
WHERE
 OBJECT_ID = OBJECT_ID('MyPartitionedTable')

/*
partition_id         object_id   index_id  partition_number hobt_id              rows
-------------------- ----------- --------- ---------------- -------------------- ----
72057594042056704    322100188   0         1                72057594042056704    1
72057594042122240    322100188   0         2                72057594042122240    2
72057594042187776    322100188   0         4                72057594042187776    1
72057594042253312    322100188   0         3                72057594042253312    3
*/

Note that the existing rows for partition value 3 have been moved to partition 3. The row with partition value 4 has been moved to partition 4.

We create a new table to swap with a partition. This table must have the same structure as the partition. It must also include a check constraint to ensure that the partition column values in the table are included in the correct partition. The check constraint must be at least as restrictive as the partition range function for that partition.

CREATE TABLE MyNewPartition
       (
       i
 INT ,
       s CHAR(8000) ,
       PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
       )

I normally create and populate the table and add the check constraint later.

Now we add some data to the new table

INSERT MyNewPartition SELECT 1, 'a', 3

To perform this operation the partition must be empty so…

DELETE MyPartitionedTable WHERE PartCol = 3

And we can swap the table with the partition

ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Viewing the partitions

SELECT *
FROM
 sys.partitions
WHERE OBJECT_ID = OBJECT_ID('MyPartitionedTable')
/*
partition_id         object_id   index_id  partition_number hobt_id              rows
-------------------- ----------- --------- ---------------- -------------------- ----
72057594042056704    322100188   0         1                72057594042056704    1
72057594042122240    322100188   0         2                72057594042122240    2
72057594042187776    322100188   0         4                72057594042187776    1
72057594042318848    322100188   0         3                72057594042318848    1
*/

We see that the partition has been swapped with the new table.

The advantage of this is that the swap does not move the data – it just updates the metadata so that the table becomes the partition. This means that it is very fast. The table can be created, populated, and then added as a partition, thereby causing minimal impact on the partitioned table.

Switching a populated partition

Of course the partition that is being swapped out would often contain a lot of data – usually this would be used for adding another partition to the right to split up the data. Deleting the data from the partition would not be feasible – nor would the split on a populated partition.

To accomplish this, you would need to first populate the two tables to replace the ‘catch all’ partition (in our scenario partition 3).

Now create a table MyOldPartition3 – remember this must be the same structure as above. As this is a destination for a switch the check constraint must be less restrictive than that on the partitioned table.

Now switch the partition out

ALTER TABLE MyPartitionedTable switch PARTITION
 3 TO MyOldPartition3

Now the partition split can be carried out on an empty partition and the two partitions switched in without moving any data.

Identities in a partitioned table

Remember that an identity is not guaranteed to be unique or sequential. It just allocates the next value from the current seed. With that in mind nothing that follows should come as a surprise.

For this we will create a new partition function, scheme and table

CREATE PARTITION FUNCTION MyIdentityPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyIdentityPartitionScheme AS
PARTITION MyIdentityPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE MyIdentityPartitionedTable
       (
       i INT IDENTITY (1,1) ,
       s CHAR(10)
 ,
       PartCol INT
       )
ON MyPartitionScheme (PartCol)

--And we add some data
INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'a', 1
INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'a', 2
INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'b', 1
INSERT MyIdentityPartitionedTable (
s, PartCol) SELECT 'b', 2

SELECT * FROM MyIdentityPartitionedTable

i           s          PartCol
———– ———- ———–
1           a          1
3           b          1
2           a          2
4           b          2

Showing that the identity is a property of the partitioned table rather than the partition.

More interesting is what happens when partitions are swapped:

CREATE TABLE MyIdentityPartitionedTableNew
       (
       i INT IDENTITY (1,1) ,
       s CHAR(10
) ,
       PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
       )

INSERT MyIdentityPartitionedTableNew (s, PartCol) SELECT 'c', 3
INSERT MyIdentityPartitionedTableNew (s, PartCol) SELECT 'd', 3
ALTER TABLE MyIdentityPartitionedTableNew
       switch TO MyIdentityPartitionedTable PARTITION 3

SELECT * FROM MyIdentityPartitionedTable

i           s          PartCol
———– ———- ———–
1           a          1
3           b          1
2           a          2
4           b          2
1           c          3
2           d          3

And we see duplicate identity values from the new partition.

Adding a new row

INSERT MyIdentityPartitionedTable (s, PartCol) SELECT 'e', 1
SELECT * FROM MyIdentityPartitionedTable

i           s          PartCol
———– ———- ———–
1           a          1
3           b          1
5           e          1
2           a          2
4           b          2
1           c          3
2           d          3

We see that the partition swap has not affected the identity seed for the table.

Partitioned tables and Indexes

If the index contains the partitioning column then the index is referred to as being ‘aligned’ with the table.

If the index uses the same partitioning scheme as the table and is in the same filegroup then the index must be aligned with the table.

For a non-clustered non-unique index the partitioning column can be included to align the index rather than being indexed.

I think it is best to always explicitly include the partitioning column in your indexes.

The following assumes a single filegroup and scheme.

Clustered index

As stated this index must be aligned with the table. If it is not then the partitioning column will be implicitly added as the last column of the index.

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE MyPartitionedTable
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT

       )
ON MyPartitionScheme (PartCol)
CREATE TABLE MyNewPartition
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
       ) 

CREATE CLUSTERED INDEX cl_ix ON MyPartitionedTable (j)
CREATE CLUSTERED INDEX cl_ix ON MyNewPartition (j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

…gives the error…

ALTER TABLE SWITCH statement failed. There is no identical index in source table ‘tempdb.dbo.MyNewPartition’ for the index ‘cl_ix’ in target table ‘tempdb.dbo.MyPartitionedTable’ .

whereas these all succeed.

DROP INDEX MyNewPartition.cl_ix
DROP INDEX MyPartitionedTable.cl_ix
CREATE CLUSTERED INDEX cl_ix ON MyPartitionedTable (j, PartCol)
CREATE CLUSTERED INDEX cl_ix ON MyNewPartition (j, PartCol)
ALTER
 TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3
DROP INDEX MyNewPartition.cl_ix
DROP INDEX MyPartitionedTable.cl_ix
CREATE CLUSTERED INDEX cl_ix ON MyPartitionedTable (PartCol, j)
CREATE CLUSTERED indeex cl_ix ON MyNewPartition (PartCol, j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

This will also work

DROP INDEX MyNewPartition.cl_ix
DROP INDEX MyPartitionedTable.cl_ix
CREATE CLUSTERED INDEX cl_ix
 ON MyPartitionedTable (j)
CREATE CLUSTERED INDEX cl_ix ON MyNewPartition (j, PartCol)

…showing that the partitioning column has been implicitly added to the clustered index in the partitioned table.

Note that this extra column will not be shown by sp_helpindex on the partitioned table nor by scripting the index but it is shown by sys.index_columns

I don’t know why Microsoft decided to add the column to the index automatically. I think it would be less confusing to give an error, thereby forcing the user to add the column explicitly.

Unique index

Unique indexes must contain the partitioning column as an indexed column.

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY]) 

CREATE TABLE MyPartitionedTable
       (
       i
 INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT
       )
ON MyPartitionScheme (PartCol) 

CREATE TABLE MyNewPartition
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT CHECK (PartCol = 3
 AND PartCol IS NOT NULL)
       )
CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (j)

…gives the error

   Msg 1908, Level 16, State 1, Line 1 olumn 'PartCol' is partitioning
column of the index 'cl_ix'. Partition columns for a unique index must
be a subset of the index key.

… one of the more explanatory error messages.

The Partitioning column must be part of the index so …

CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (j) include (PartCol)

…will also fail.

Unlike clustered indexes the partitioning column must be explicitly part of the index.
These will both work

CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (j, PartCol)
CREATE UNIQUE INDEX cl_ix
ON MyPartitionedTable (j, PartCol)

ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3
CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (PartCol , j)
CREATE UNIQUE INDEX cl_ix ON MyPartitionedTable (PartCol , j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Non-unique index

Non-unique indexes do not need to have the partitioning column as part of the index, it can be an INCLUDE column. If it is not explicitly included, then the column will be automatically added – again this will not appear in sp_helpindex

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS
 RANGE LEFT FOR VALUES (1,2,3)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE MyPartitionedTable
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT
       )
ON MyPartitionScheme (PartCol)

CREATE TABLE MyNewPartition
       (
       i INT ,
       j INT ,
       s VARCHAR(MAX) ,
       PartCol INT CHECK (PartCol = 3 AND PartCol IS NOT NULL)
       )
CREATE INDEX cl_ix ON MyPartitionedTable (j)
CREATE INDEX cl_ix ON MyNewPartition (j) include (PartCol)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Is successful as are…

CREATE INDEX cl_ix ON MyPartitionedTable (j, PartCol)
CREATE INDEX cl_ix ON MyNewPartition (j, PartCol)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3
CREATE INDEX cl_ix ON MyPartitionedTable (PartCol, j)
CREATE INDEX cl_ix
 ON MyNewPartition (PartCol, j)
ALTER TABLE MyNewPartition switch TO MyPartitionedTable PARTITION 3

Scenarios

Partitioning on multiple columns

Although the partitioning column must be a single column, it does not need to be numeric and it can be calculated so that the range can include multiple columns. For instance it is common to partition on datetime data by month. This will work well, because that data is usually in a single column, but what do you do if you have data for multiple companies and you also want to partition by company? For this you could use a computed column for the partitioning column. This will create a computed column using the ‘company id’ and ‘order month’ which is then used for the partitions. It will partition three companies for the first three months of 2007.

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES
       (1200701,1200702,1200703,2200701,2200702,2200703,3200701,3200702,3200703)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange

ALL TO ([PRIMARY])
CREATE TABLE CompanyOrders
       (
       Company_id      INT ,
       OrderDate       datetime ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5) ,
       PartCol AS Company_id * 10000 + CONVERT(VARCHAR(
4),OrderDate,112) persisted
       )
ON MyPartitionScheme (PartCol)

The computed column must be ‘persisted’ to form the partitioning column.

We will investigate the maintaining of partitioned data in this table later.

Monthly Data – the sliding range

A common requirement is to partition by month. This means that new month partitions need to be added and possibly old data partitions removed. I will describe the process for the addition of a new partition for later data, to remove an old partition the process is the same except that you swap out two partitions, merge the range and swap in a single table.

We create a partitioned table for data by OrderDate month

CREATE PARTITION FUNCTION MyPartitionRange (datetime)
AS RANGE RIGHT FOR VALUES ('20070101', '20070201', '20070301', '20070401')
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])
CREATE TABLE Orders
       (

       OrderDate       datetime ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5)
       )
ON MyPartitionScheme (OrderDate)

This will give four partitions…

OrderDate < '20070101'
OrderDate >= '20070101' AND < '20070201'
OrderDate >= '20070201' AND < '20070301'
OrderDate >= '20070301' AND < '20070401'
OrderDate >= '20070401'

Therefore the data will be split into partitions by month.

And we insert some test data

--insert Orders select '19000101', 1, 1, 1
INSERT Orders SELECT '20070101', 1, 1, 1
INSERT Orders SELECT '20070201', 1, 1, 1
INSERT Orders SELECT '20070301', 1, 1, 1
INSERT Orders SELECT '20070401', 1, 1, 1
INSERT Orders SELECT '20070402', 1, 1, 1
INSERT Orders SELECT '20070501', 1, 1, 1

To add the next month’s partition it is possible to just split the range and let the system take care of the data. This though would mean that the data would be off-line for the duration of the operation. It is better to use the experience we have gained in switching partitions to create the new data in separate tables then switch them in. This means that the table would be off-line for a very short time – just while the switch operations are taking place.

If data is being continually added to the partitioned table then a snapshot can be taken, the new tables prepared on this and the switch-in operation will need to take the table off-line, merge the new data with the prepared snapshot data then perform the switch-in. An identity column on the table would help to identify new data added since the snapshot. This would mean longer downtime than for static data but still a lot less than splitting a populated range.

To add a new months partition of static data

  1. Create the table containing the new months data.
  2. Create the table containing the data after the new month
  3. Swap out the last month from the partitioned table
  4. Split the (empty) range in the partitioned table
  5. Swap in the new months data
  6. Swap in the table containing the data after the new month
  7. Check the result

1. Create the table containing the new months data.

In the initial discussion we created a table for this but you might find it easier to create a partitioned table for the operation.

Note that if you script the existing table and indexes make sure that you remember that not all indexed columns may appear in the index script.

Also if you take this route you will have to use a new partition function and scheme as you will need to split the range. I prefer to use non-partitioned tables for flexibility.

To population of the table will depend on where the data resides but wil only affect the production table if you need to read the data from that table.

CREATE TABLE Orders_200704
       (
       OrderDate       datetime CHECK (OrderDate >= '20070401'
                       AND OrderDate < '20070501' AND OrderDate IS NOT NULL) ,
       Item_id         INT
 ,
       Quantity        INT ,
       OrderValue      decimal(19,5)
       )

INSERT Orders_200704 SELECT * FROM Orders WHERE OrderDate >= '20070401
            AND OrderDate < '20070501'

2. Create the table containing the data after the new month

In the same way as the previous table populate with the data that is later than the new month.

CREATE TABLE
Orders_200705
       (
       OrderDate       datetime CHECK (OrderDate >= '20070501' AND OrderDate
                                           IS NOT NULL) ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5)
       )

INSERT Orders_200705 SELECT * FROM Orders WHERE OrderDate >= '20070501'

At this point take the production table off-line for the swap.

3. Swap out the last month from the partitioned table

For this you will need an empty table to swap the data into. It is tempting again to use a partitioned table for this – if so you will need to create a new partition function and scheme as you would not want to lose the data until after splitting the range on the production table.

 CREATE TABLE
Orders_200704_Old
       (
       OrderDate       datetime CHECK (OrderDate >= '20070401'
                                        AND OrderDate IS NOT NULL) ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5)
       )

ALTER TABLE Orders switch PARTITION 5 TO Orders_200704_Old

4 Split the (empty) range in the partitioned table

This is done by adding a new value to the partition function

ALTER PARTITION FUNCTION MyPartitionRange () split RANGE ('20070501')

As the partition is empty this should be quick as it just means creating a new empty partition.

5 Swap in the new months data

ALTER TABLE Orders_200704 switch TO Orders PARTITION 5

6 Swap in the table containing the data after the new month

ALTER TABLE Orders_200705 switch TO Orders PARTITION 6

7 Check the result

SELECT *
FROM sys.partitions
WHERE OBJECT_ID = OBJECT_ID('Orders')

Gives the expected result

partition_id         object_id   index_id  partition_number hobt_id              rows
——————– ———– ——— —————- ——————– —-
72057594038845440    213575799   0         1                72057594038845440    1
72057594038910976    213575799   0         2                72057594038910976    1
72057594038976512    213575799   0         3                72057594038976512    1
72057594039042048    213575799   0         4                72057594039042048    1
72057594039173120    213575799   0         5                72057594039173120    2
72057594039238656    213575799   0         6                72057594039238656    1

Now the table Orders_200704_Old can be dropped at your leisure.

Adding a new partition with a computed partition function

We return to the table partitioned by company and month.

CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES
       (1200701,1200702,1200703,2200701,2200702,2200703,3200701,3200702,3200703)
CREATE PARTITION SCHEME MyPartitionScheme AS
PARTITION MyPartitionRange
ALL TO ([PRIMARY])

CREATE TABLE CompanyOrders
       (
       Company_id      INT , 

OrderDate       datetime ,
       Item_id         INT ,
       Quantity        INT ,
       OrderValue      decimal(19,5) ,
       PartCol AS Company_id * 10000 + CONVERT(VARCHAR(4),OrderDate,112) persisted
       )
ON MyPartitionScheme (PartCol)

In order too add a new month to this table, you will need to split each company’s range for that month. To add a new company partition means splitting all months for that company.

There is no need to add all the partitions in one process, these operations can be performed one partition at a time. This should not affect the results of queries on the table.

The process of adding the new company or month is the same as for the sliding month data, only with many splits and swaps.

Other uses of partitioned tables

Usually, partitioned tables are used to horizontally, or vertically, partition the data. However, the partitions are sometimes used for different purposes – not partitioning the data at all.

I recently came across a reporting system that was querying a single flat table for aggregated results. The table was about 15Gb in size and could be filtered and grouped on any combination of columns. Indexing by date meant that a report for a year would take about 20 minutes – far too long (performance checked just before the release date of course), in fact anything more than 3 months was unacceptable. Normalising and using a view helped with the retrieval of the filter column data, but the actual report was limited by the amount of data it needed to aggregate.

Due to time constraints and policy, we were not allowed to create a cube and the report could not call a stored procedure. Oddly, there was a lot of flexibility in the query used to extract data but it had to access the single table.

The solution was to create a partitioned table. The first partition (partition value = 1) contained the old data table and would still be slow to access.

The second partition contained aggregated data, aggregated by filter columns that kept the table size less than 200K rows. All partitions have the same structure so those filter columns that were excluded were set to null. Accessing this partition was quick enough for any report.

The report application was then changed to check if the filter/grouping columns were all included in the second partition – if so it appended “and PartCol = 2” to the query otherwise it appended “and PartCol = 1”.

The reporting application could warn the users if they were about to do something that would take a long time.

We then checked how the reports were being used, and selected combinations of columns that could be used for other partitions. The partitions were added to the table (not affecting the system) and then, at a later time, the reporting application changed to use the new partitions.

This would have been easier with a stored procedure as the partitions could have been left as separate tables and the stored procedure could choose which to query, but a stored procedure call was not allowed by the reporting application.

Continue reading “Partitioned Tables in SQL Server 2005” »

Thank you very much for visiting this article. In case if you are not on the MSDN blogs then I would request you to please visit my blog at http://blogs.msdn.com/manisblog because at times I improve the existing articles after reading emails from people who enthusiastically provide their feedback. These improvements might not be reflected on the other blog sites who have indexed this article.


Hmm.. what am I going to cover today…. it is something that we all expect from a SQL Server database i.e. performance and I am going to write it in the same fashion that you like most i.e. simplify the contents, make it easy to understand and provide the steps with pictures.

Continue reading “Easy Table Partitions with SQL Server 2008” »

I have received call from my DBA friend who read my article SQL SERVER – 2005 – Introduction to Partitioning. He suggested that I should write simple tutorial about how to horizontal partition database table. Here is simple tutorial which explains how a table can be partitioned. Please read my articleIntroduction to Partitioning before continuing to this article.
Continue reading “SQL SERVER – 2005 – Database Table Partitioning Tutorial – How to Horizontal Partition Database Table” »

Powered by WordPress | Theme: by 85ideas. Editor by Khoanguyen