SQL Server Table Design Rules of Thumb

These are not meant to be comprehensive (and don’t get into advanced normalization or intentional denormalization) but will result in a decent table design without too much thought in most situations. A case can be made for natural primary keys but going the surrogate route is easier and works well too. A DDL sketch applying most of these rules follows the list.

  1. Always have a primary key. Typically a surrogate key using an identity column and not necessarily clustered.
  2. Almost always have at least one unique constraint if you used a surrogate primary key:
    • consider making this/one of these the clustered index as they may be the most common way rows are selected
    • these will frequently save you from bugs/errors that would result in duplicate data
    • name and abbreviation in a state table are a good example — only one state should have the abbreviation TX and only one should have the name Texas
  3. Always include create date and created by columns. If rows can be updated, include updated date and updated by columns as well.
  4. Consider effective and expiration date columns in place of “active” flags.
  5. Consider a timestamp (i.e. rowversion, not datetime) column if multiple simultaneous edits are possible, so conflicting updates can be detected.
  6. Avoid nullable columns (they result in ISNULL/COALESCE calls everywhere); use defaults instead, e.g. '' (empty string) for varchar columns, 0 or -1 for ints, etc. A large number of necessarily nullable columns suggests the table may need to be split.
  7. Avoid nullable foreign keys — they result in outer joins
  8. Consider indexes on any foreign keys
  9. Do not store compound values in one column.
  10. If you know in advance how your table will be queried and none of the indexes created from rules 1, 2, or 8 meet requirements, consider creating indexes on those columns if they are selective (have a wide distribution of values).
  11. You can break any rule except number 1 with good reason after very careful consideration
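
To make these concrete, here is the DDL sketch mentioned above. It is hypothetical: the table, column, and constraint names are mine, and it assumes a SalesRegion lookup table already exists.

CREATE TABLE Customer (
	CustomerId int IDENTITY(1,1) NOT NULL,	-- rule 1: surrogate key on an identity column
	CustomerCode varchar(20) NOT NULL,	-- rule 2: natural key kept unique below
	CustomerName varchar(100) NOT NULL,
	EffectiveDt datetime NOT NULL,	-- rule 4: effective/expiration dates instead of an "active" flag
	ExpirationDt datetime NOT NULL CONSTRAINT DF_Customer_ExpirationDt DEFAULT ('99991231'),
	SalesRegionId int NOT NULL,	-- rule 7: foreign key kept non-nullable
	RowVer timestamp NOT NULL,	-- rule 5: detects simultaneous edits
	CreateDt datetime NOT NULL CONSTRAINT DF_Customer_CreateDt DEFAULT (GETDATE()),	-- rule 3
	CreatedBy varchar(50) NOT NULL,
	UpdateDt datetime NOT NULL CONSTRAINT DF_Customer_UpdateDt DEFAULT (GETDATE()),	-- rule 6: defaults, not NULLs
	UpdatedBy varchar(50) NOT NULL CONSTRAINT DF_Customer_UpdatedBy DEFAULT (''),
	CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (CustomerId),	-- rule 1
	CONSTRAINT UQ_Customer_CustomerCode UNIQUE CLUSTERED (CustomerCode),	-- rule 2: clustered on the common lookup
	CONSTRAINT FK_Customer_SalesRegion FOREIGN KEY (SalesRegionId) REFERENCES SalesRegion (SalesRegionId)
)

CREATE INDEX IX_Customer_SalesRegionId ON Customer (SalesRegionId)	-- rule 8: index the foreign key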

Stored Procedures Versus LINQ to SQL for Dynamic WHERE Clauses

I was recently discussing the best way to solve the “bad plan gets cached for super-complex WHERE clause with a large number of @column IS NULL OR T.Column = @column conjuncts” problem with colleagues at work. This is another variant of the age-old “stored procedures versus code-generated SQL” (including ORMs here) debate.

I consider myself a T-SQL expert but I am also a firm believer in the “right tool for the right job.” LINQ to SQL (or even old-fashioned, manually-generated SQL) is clearly superior to stored procedures for UI-driven, single SELECT queries with large numbers of WHERE and/or ORDER BY options. In fact, I favor code-based solutions over stored procedures unless performance dictates otherwise (which, in practice, is not very often). And for the specific case I’m discussing here, I don’t think a stored procedure is even arguably the right way to go.

To make this point, I’m going to provide examples of all 4 scenarios and compare them. The base query is simplified to make the examples easier to follow. Obviously, this problem only becomes intractable using T-SQL when you have more than 3 search criteria, two tables, and two sort options. If your requirements were very stable (yeah, right) and that’s all the variability you had to deal with, you could write (3 x 2 =) 6 different “standard” stored procedures and call the appropriate one from code based on the mix of non-null parameters passed (or embed them all in one procedure within a bunch of IF statements). I’ve been the unfortunate victim (read “maintainer”) of this type of code and I still curse the person that left me that procedure to this day. Requirements aren’t usually that stable, and by the time they are, you should have solved the problem another way already.

But I digress. Here’s the “standard” stored procedure in all its glory:

CREATE PROCEDURE StandardWhere 
	@LastName varchar(50) = null, 
	@FirstName varchar(50) = null,
	@PostingDt datetime = null,
	@OrderBy varchar(50) = null
AS
BEGIN
	SELECT T.AccountNoId, A.LastName, A.FirstName, T.PostingDt, T.BillingAmount
	FROM AccountTransaction T (nolock)
	JOIN Account A (nolock) ON A.AccountNoId = T.AccountNoId
	WHERE (@LastName IS NULL OR A.LastName LIKE @LastName + '%')
	AND (@FirstName IS NULL OR A.FirstName LIKE @FirstName + '%')
	AND (@PostingDt IS NULL OR T.PostingDt = @PostingDt)
	-- two CASE expressions so the varchar and datetime sort keys aren't forced into one type
	ORDER BY CASE 
		WHEN @OrderBy = 'LastName' THEN A.LastName
		WHEN @OrderBy = 'FirstName' THEN A.FirstName
	END,
	CASE WHEN @OrderBy IN ('LastName', 'FirstName') THEN NULL ELSE T.PostingDt END
END

The beauty of the standard stored procedure is that you get compile-time syntax checking and the stored procedure is only parsed once. But…you also get *very* bad query plans if the first call to the procedure includes actual parameters that aren’t representative of the way the procedure is usually called.

“Standard” stored procedures potentially have four advantages: minimizing round trips, reducing network traffic (even for a single call), compile-time syntax checking, and single parse. Only the first two advantages are maintained when you introduce dynamic SQL (or use WITH RECOMPILE, which often fails to solve the problem anyway). If you only have one statement, round trips are no longer an issue and, with modern networks, the fact that the entire SQL statement has to be sent over the wire versus just the stored procedure name and the actual parameters is probably undetectable in all but the highest volume situations.

Here’s the dynamic SQL version of the stored procedure (note that I’m not going to even bother discussing the use of EXECUTE which completely sucks and should be avoided at all costs):

CREATE PROCEDURE DynamicWhere 
	@LastName varchar(50) = null,
	@FirstName varchar(50) = null,
	@PostingDt datetime = null,
	@OrderBy varchar(50) = null
AS
BEGIN
	DECLARE @where nvarchar(max)
	SELECT @where = '1 = 1'

	IF @LastName IS NOT NULL
		SELECT @where = @where + ' AND A.LastName LIKE @LastName + ''%'''

	IF @FirstName IS NOT NULL
		SELECT @where = @where + ' AND A.FirstName LIKE @FirstName + ''%'''

	IF @PostingDt IS NOT NULL
		SELECT @where = @where + ' AND T.PostingDt = @PostingDt'

	DECLARE @orderBySql nvarchar(max)
	SELECT @orderBySql = CASE
		WHEN @OrderBy = 'LastName' THEN 'A.LastName'
		WHEN @OrderBy = 'FirstName' THEN 'A.FirstName'
		ELSE 'T.PostingDt'
	END

	DECLARE @sql nvarchar(max)
	SELECT @sql = '
	SELECT T.AccountNoId, A.LastName, A.FirstName, T.PostingDt, T.BillingAmount
	FROM AccountTransaction T (nolock)
	JOIN Account A (nolock) ON A.AccountNoId = T.AccountNoId
	WHERE ' + @where + '
	ORDER BY ' + @orderBySql

	exec sp_executesql @sql, N'@LastName varchar(50), @FirstName varchar(50), @PostingDt datetime', 
		@LastName, @FirstName, @PostingDt
END

Even though I’ve formatted this to look nice, if you’ve ever worked with procedures like this you know they really suck to maintain and debug. All of the important SQL is actually in string variables and doesn’t get parsed and compiled until run-time. So when you compile this stored procedure, the only thing you know is that the T-SQL that generates the final SQL is valid, not the SQL that will actually run. This is exactly the same problem you have with SQL dynamically generated on the client. If I’m doing standard programming work like fancy string manipulation, I’d much rather do it in C# or Java than T-SQL, particularly since I really need to unit test the hell out of this to make sure valid SQL is generated for all the different combinations of parameters. Oh, and now I get Intellisense and Resharper, at least for the SQL-generating part of the code. If I’m using LINQ, I get them for everything, including the query itself! But I’m getting ahead of myself.

Before we move on to manually generated dynamic SQL on the client, I want to address plan caching and SQL injection, because stored procedure zealots often bring these up. It turns out that neither is an issue with properly parameterized dynamic SQL, whether it’s generated in a stored procedure or off-server in C#, Java, or any other modern language. So don’t let these two canards deter you.

Now let’s see the manually-generated C# version, which is just a translation of the T-SQL code above. For demo purposes, the function is self-contained, but you could (and probably should) have a base class with common functions (like handling the ugly DbNull.Value).

public DataTable GetTransactions(string lastName, string firstName, DateTime? postingDt, string orderBy)
{
	string sql = "SELECT T.AccountNoId, A.LastName, A.FirstName, T.PostingDt, T.BillingAmount " 
		+ "FROM AccountTransaction T (nolock) " 
		+ "JOIN Account A (nolock) ON A.AccountNoId = T.AccountNoId " 
		+ "WHERE 1 = 1 ";
	if (lastName != null) sql += " AND A.LastName LIKE @LastName + '%' ";
	if (firstName != null) sql += " AND A.FirstName LIKE @FirstName + '%' ";
	if (postingDt != null) sql += " AND T.PostingDt = @PostingDt ";
	sql += "ORDER BY ";
	sql += orderBy == "LastName"
		? "A.LastName "
		: orderBy == "FirstName"
			? "A.FirstName "
			: "T.PostingDt ";
	DataTable ret;
	using ( var cn = new SqlConnection(_connStr) )
	{
		var cmd = new SqlCommand(sql, cn);
		cmd.Parameters.AddWithValue("@LastName", (object) lastName ?? DBNull.Value);
		cmd.Parameters.AddWithValue("@FirstName", (object) firstName ?? DBNull.Value);
		cmd.Parameters.AddWithValue("@PostingDt", (object) postingDt ?? DBNull.Value);
		var adapter = new SqlDataAdapter(cmd);
		ret = new DataTable();
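		// Fill opens the connection itself if it isn't already open and closes it again when done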
		adapter.Fill(ret);
	}
	return ret;
}

Like I said above, if I’m not getting any advantages from working in T-SQL, I’d rather work in C#. But, from a language-neutral perspective, the code above is not clearly easier to maintain and debug than the corresponding T-SQL.

Here’s essentially the same code using LINQ to SQL and a couple of extension methods:

using (new TransactionScope(TransactionScopeOption.Required,
	new TransactionOptions { IsolationLevel = IsolationLevel.ReadUncommitted }))
{
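	// The ReadUncommitted TransactionScope plays the same role as the (nolock) hints in the T-SQL versions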
	var query =
		from t in Context.AccountTransaction
		join a in Context.Account on t.AccountNoId equals a.AccountNoId
		select new { t.AccountNoId, a.LastName, a.FirstName, t.Postingdt, t.Billingamount };
	query = query.AddWhere(lastName, t => t.LastName.StartsWith(lastName));
	query = query.AddWhere(firstName, t => t.FirstName.StartsWith(firstName));
	query = query.AddWhere(postingDt, t => t.Postingdt == postingDt);
	query = query.OrderBy(orderBy);
	query.ToList();
}
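
The AddWhere and OrderBy(string) helpers are the “couple of extension methods” mentioned above; they are not part of LINQ to SQL itself. Here is a minimal sketch of what they might look like (my own reconstruction for illustration, not the original code):

using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryExtensions
{
	// Appends the predicate only when a criterion value was actually supplied.
	public static IQueryable<T> AddWhere<T, TCriterion>(this IQueryable<T> query,
		TCriterion criterion, Expression<Func<T, bool>> predicate)
	{
		return criterion == null ? query : query.Where(predicate);
	}

	// Orders by a property name chosen at run time (e.g. "LastName");
	// leaves the query untouched when the name isn't recognized.
	public static IQueryable<T> OrderBy<T>(this IQueryable<T> query, string propertyName)
	{
		var property = propertyName == null ? null : typeof(T).GetProperty(propertyName);
		if (property == null) return query;

		var parameter = Expression.Parameter(typeof(T), "x");
		var keySelector = Expression.Lambda(Expression.Property(parameter, property), parameter);
		var call = Expression.Call(typeof(Queryable), "OrderBy",
			new[] { typeof(T), property.PropertyType }, query.Expression, Expression.Quote(keySelector));
		return query.Provider.CreateQuery<T>(call);
	}
}

Returning the query unchanged when a criterion is null is what lets the calling code chain the AddWhere calls unconditionally. Note that this OrderBy sketch simply skips ordering for an unrecognized name, whereas the stored procedure versions default to sorting by PostingDt.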

I mentioned the benefits of Intellisense above. Since this is all “code”, I get to access the table and field names using Intellisense (and likewise, they are checked by the compiler/Resharper). This is a huge advantage in my book.

I prefer to use known (as opposed to anonymous) types but I wanted to keep the sample code as self-contained as possible. I end up converting anything in a DataTable to a generic List of T anyway, so the fact that LINQ returns objects is a big bonus for me. If you like/want DataTables (you’re a masochist?), you could write an efficient reflection-based conversion routine.
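
That conversion routine isn’t shown here, but a minimal reflection-based sketch (mine, purely illustrative) might look like this:

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableConverter
{
	// Builds a DataTable whose columns mirror T's public properties, one row per item.
	public static DataTable ToDataTable<T>(IEnumerable<T> items)
	{
		var table = new DataTable(typeof(T).Name);
		var properties = typeof(T).GetProperties();
		foreach (var p in properties)
			table.Columns.Add(p.Name, Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType);
		foreach (var item in items)
			table.Rows.Add(properties.Select(p => p.GetValue(item, null) ?? (object)DBNull.Value).ToArray());
		return table;
	}
}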

The SQL generated by LINQ to SQL is quite good and you can see what it’s generating very easily. Once you understand how LINQ implements OUTER JOINS, it’s really quite easy to start thinking in LINQ instead of SQL. If you are persuaded by this article to give LINQ a try, I highly recommend LINQPad which provides an iterative and interactive way to develop LINQ queries similar to how I develop SQL queries in SSMS.


Check for Temp Table Existence in SQL Server

IF OBJECT_ID('tempdb..#sometable') IS NOT NULL
	DROP TABLE #sometable

Expressions versus Delegates in LINQ to SQL Performance

Recently, I was working on a class that stored a delegate and later used that delegate to retrieve a LINQ to SQL object from either the database or an EntitySet, depending on the situation. Unfortunately, the performance for the first case was terrible — I believe it was returning the entire table and then applying the delegate to each row in memory.

Here’s the relevant code (original version):

public abstract Func<D, bool> GetDbKeyQueryFunction(E entity);
//...
D dbEntity;
if ( isRoot ) {
	dbEntity = context.GetTable<D>().SingleOrDefault(GetDbKeyQueryFunction(entity));
} else {
	dbEntity = DbParentSet.Invoke(dbParent).SingleOrDefault(GetDbKeyQueryFunction(entity));
}

Here’s a typical implementation of GetDbKeyQueryFunction:

public override Func<SomeTable, bool> GetDbKeyQueryFunction(SomeEntity entity) {
	return dbEntity => dbEntity.Id == entity.Id;
}

Now before I created this abstract class, I had specific code that used a (seemingly) identical lambda expression to retrieve a row from the database quite quickly:

int id = 11;
dataContext.SomeTable.SingleOrDefault(dbEntity => dbEntity.Id == id);

When I hovered over SingleOrDefault in the IDE, I noticed that there were two overloads, one that took an Expression<Func<SomeTable, bool>> and another that took a Func<SomeTable, bool>. On the surface, these don’t seem much different. In fact, the reason I ended up using Func in my abstract class was because Expression is abstract and I couldn’t just put new Expression() around my lambda expression.

In practice, the difference is huge, as I mentioned above regarding the performance. The Func overload resolves to Enumerable.SingleOrDefault, which pulls the rows back and applies the delegate in memory, while the Expression overload resolves to Queryable.SingleOrDefault, which LINQ to SQL can translate into a WHERE clause. The tricky part is that all I had to do was change:

public abstract Func<D, bool> GetDbKeyQueryFunction(E entity);

to:

public abstract Expression<Func<D, bool>> GetDbKeyQueryFunction(E entity);

Surprisingly (to me, anyway), you don’t have to do anything in the implementing code other than modify the method signature to match. The body of the method remains unchanged, but like magic, the compiler is now creating an Expression instead of a Delegate for us.
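
For example, the override shown earlier keeps the exact same body; only the return type changes:

public override Expression<Func<SomeTable, bool>> GetDbKeyQueryFunction(SomeEntity entity) {
	return dbEntity => dbEntity.Id == entity.Id;
}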

Performance for the first case (querying the database) was back up to par. However, there was another minor hurdle to overcome.

public abstract Expression<Func<D, bool>> GetDbKeyQueryFunction(E entity);
//...
D dbEntity;
if ( isRoot ) {
	dbEntity = context.GetTable<D>().SingleOrDefault(GetDbKeyQueryFunction(entity));
} else {
	dbEntity = DbParentSet.Invoke(dbParent).SingleOrDefault(GetDbKeyQueryFunction(entity));
}

In the code above, DbParentSet.Invoke(dbParent) is returning an EntitySet instance. Interestingly enough, you cannot directly apply an Expression to an EntitySet, as I found documented here. However, you can compile an Expression, and in my scenario, the type of the value returned by Compile just happened to be Func (it all seems so obvious now, hehe). So here’s the final version of the code that works well for both scenarios:

public abstract Expression<Func<D, bool>> GetDbKeyQueryFunction(E entity);
//...
D dbEntity;
if ( isRoot ) {
	dbEntity = context.GetTable<D>().SingleOrDefault(GetDbKeyQueryFunction(entity));
} else {
	dbEntity = DbParentSet.Invoke(dbParent).SingleOrDefault(GetDbKeyQueryFunction(entity).Compile());
}

So remember, always favor Expressions over Delegates when using LINQ. If you have to have a Delegate, you can always use Compile to turn your Expression into a Delegate as needed. But you can’t turn a Delegate into an Expression (that I’m aware of).


Embedded Resource Queries or How to Manage SQL Code in Your .NET Projects

I developed a technique for managing SQL code in my .NET projects using the Embedded Resource “Build Action” back in the .NET 1.0 days that I still use to this day. I’ve been meaning to share this with the world for several years now and I’m finally getting around to it.

The benefits of this technique are many. Your SQL queries are contained in individually-editable, properly-formatted files in your .NET project instead of constructed as nasty concatenated strings in your code.

However, at run-time, you don’t have to worry about distributing extra files with your application: the SQL text is an embedded resource in your executable or class library. You get the best of both worlds. You have source control diffs available but no additional distribution burden.

Before I go further, let me address the issue of stored procedures. Stored procedures have their place. In the past, I’ve maintained 2000+ line stored procedures; I’m ashamed to admit that I’ve even written some back in the heyday of client/server. Stored procedures (not the 2000+ line variety, surely) can be a great way to minimize network round trips. However, for purely select-oriented SQL, I think stored procedures are an unnecessary administrative burden on most .NET projects. By using this technique, you never have to worry about your .NET code being out of sync with the stored procedure.

Modern database servers cache query plans equally well for parameterized ad hoc queries as they do for stored procedures. As a result, unless you are doing some strange things, you will not find a measurable performance difference between using a stored procedure and the embedded resource technique I am about to demonstrate for select-oriented SQL.

Finally, let me say that if you work in a change management environment where it is easier to recompile stored procedures than it is to push out DLLs, then this technique may not be for you. Also, if you expect your query to be re-used as-is by many different types of non-.NET clients (for example, an Excel spreadsheet or other stored procedures), then you may want to stick with a stored procedure. Keep in mind that you can easily re-use SQL stored as embedded resources in .NET DLLs among different .NET projects.

Many times I find myself in a query tool such as Query Analyzer, the SSMS Query Editor, or Oracle SQL Developer, iteratively developing a SQL query that returns a single recordset of results. Once I’ve got the query working reasonably well, I create a new text file under a folder named SqlQueries in my Visual Studio project, rename it to something.sql, and paste in the SQL from the query tool unaltered. I then comment out the section at the top of the query that declares the parameter variables (if I’m using SQL Server). Notice that I don’t delete them. This allows me to easily run the query in a query tool if I need to refine or debug the query.
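
For illustration, here is what a hypothetical SelectOrderWithExpensiveItemInfo.sql (the file name used in the sample code further down) might contain against AdventureWorks, with the parameter declarations commented out rather than deleted:

-- Parameter declarations kept for debugging in a query tool (just uncomment them);
-- at run time the application supplies @MinLineTotal through a SqlParameter.
-- DECLARE @MinLineTotal money
-- SELECT @MinLineTotal = 1000

SELECT H.SalesOrderID, H.OrderDate, D.ProductID, D.LineTotal
FROM Sales.SalesOrderHeader H
JOIN Sales.SalesOrderDetail D ON D.SalesOrderID = H.SalesOrderID
WHERE D.LineTotal >= @MinLineTotal
ORDER BY D.LineTotal DESC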

Once the file is in the project, you have to set the build action to embedded resource as shown below:

[Screenshot: setting the file’s Build Action property to Embedded Resource in Visual Studio]

We need a class that helps us conveniently access our embedded resources. I’ve always used a class with static methods like this one:

public class EmbeddedResource {
	private EmbeddedResource() {
	}

	public static StreamReader GetStream(System.Reflection.Assembly assembly, string name) {
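		// Match on the suffix: the full manifest name is prefixed with the project's default namespace,
		// which is hard to determine reliably at run time (see the note below about the foreach loop).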
		foreach (string resName in assembly.GetManifestResourceNames()) {
			if (resName.EndsWith(name)) {
				return new System.IO.StreamReader(assembly.GetManifestResourceStream(resName));
			}
		}
		return null;
	}

	public static string GetString(System.Reflection.Assembly assembly, string name) {
		System.IO.StreamReader sr = EmbeddedResource.GetStream(assembly, name);
		string data = sr.ReadToEnd();
		sr.Close();
		return data;
	}

	public static string GetString(string name)
	{
		return EmbeddedResource.GetString(typeof(EmbeddedResource).Assembly, name);
	}
}

If I wanted to retrieve some embedded SQL in a file named SelectOrderWithExpensiveItemInfo.sql under a project directory named SqlQueries as a string in a variable named sql, I would do this:

string sql = EmbeddedResource.GetString("SqlQueries.SelectOrderWithExpensiveItemInfo.sql");
SqlCommand cmd = new SqlCommand(sql);

You may be wondering about the foreach loop inside EmbeddedResource.GetStream. Initially, I specified the whole resource name, which included what I assumed was the assembly name at the beginning of the resource specifier. Unfortunately, the default namespace from the project properties is what is actually used (at least when I first wrote this way back when). I couldn’t find a reliable way to retrieve the default namespace, so I added the search logic. Also keep in mind that you can only use GetString without passing the assembly in if your code is executing in the same assembly in which the embedded resource is housed.

That’s all there is to it. Here’s a fully-functional demo solution showing this technique in action using the AdventureWorks database. Enjoy. I hope that you find embedded resource SQL queries as useful as I have.

Cross Joining a Table Created on the Fly Using Union All

Today I was working on a report to reconcile data between two systems that will be used to validate parallel testing as we work to move the new system into production. I mention this to provide you with some context around the two techniques I employed in one of the SQL queries I wrote that is the basis of this post. Sometimes it’s a very fine line between being clever and being obtuse, especially when you are writing SQL. Since this report will become obsolete once the new system goes to production, and I will most likely be the only person that has to maintain it for a short period of time, I look at these techniques as more clever than obtuse in these circumstances.

Creating and dropping temp tables in batches (not stored procs) can be a hassle in SQL Server T-SQL. In Oracle, you only have global temporary tables and you have to deal with the SELECT limitations of PL/SQL. Sometimes you need a table but you don’t want to explicitly create it, or you can’t because you are querying another system over which you have no control. You can create the table “on the fly” using a derived table (an inline view) and the UNION ALL clause. For example, I frequently need to transform a block of time into hourly blocks when I don’t have a permanent table available and don’t want to deal with temp tables. I can create the table I need on the fly like this:

(SELECT Hour = 0 UNION ALL SELECT Hour = 1 UNION ALL ... SELECT Hour = 23) H

In Oracle, you would have to specify the FROM clause (good ol’ DUAL). You also can’t use the = construct but you get the idea. In my experience the performance of this “table on the fly” is pretty good. Obviously, if you are working in a stored proc, you can create a temp table in T-SQL or a TABLE variable in PL/SQL to achieve the same result. This technique is nice when you can’t use a stored procedure. It’s also useful if you only need a few rows once (say four or fewer). With that few rows, it’s easier to create the table as a derived table of UNION ALLs than it is to create a temp table or table variable.
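
For example, expanding each block of time into one row per hour might look something like this (the Schedule table and its StartTime/EndTime columns are just illustrative names, not from the report I was working on):

SELECT S.ScheduleId, H.Hour
FROM Schedule S
JOIN (SELECT Hour = 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 /* ...and so on through 23 */) H
	ON H.Hour BETWEEN DATEPART(hour, S.StartTime) AND DATEPART(hour, S.EndTime)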

Generally, you want to avoid CROSS JOINs (a.k.a. the dreaded Cartesian product). However, there are many legitimate uses of CROSS JOIN. First, let me state that CROSS JOIN provides one of the most compelling arguments for using the ANSI-92 join syntax. If you see “CROSS JOIN” in a query, you can be fairly certain that the author of the query wanted a Cartesian product. If you see an old-style comma-separated FROM clause where one table has no join conditions in the WHERE clause, how certain can you be that the author intended a Cartesian product?

I digress. In the query I was developing today I was comparing schedules in two systems. In one of the systems, the schedule only included one side of a pair of transactions that offset each other. In the other system, both rows were there. I didn’t want the missing row from the one system to show up as a variance. Luckily, the existing query was already using a UNION to pick up this special case. I used a CROSS JOIN of a “table on the fly” to create the other row for me:

...
UNION (
SELECT
CASE WHEN D.Direction = 1 THEN 'BUY' ELSE 'SELL' END TransactionType,
A.Volume * D.Direction Volume
FROM OneSided A
CROSS JOIN (SELECT Direction = 1 UNION ALL SELECT Direction = -1) D
)
...

As you can see, I created the second row that I needed to make this system’s schedule match the other system’s schedule with just one line of SQL and minor modifications to the SELECT clause.