Data Types: Some new data types including lots of date/time types (probably the ones that were in the 2005 CTPs), File Stream (for files that are actually stored on disk) but the most interesting one was the Hierarchy Id. This allows the developer to construct real hierarchy relationships and ask questions such as who is the ancestor of x. However it does need to be manipulated via a whole set of related functions. The seminar also spent far too long on spatial types, ok map data is all the rage but who cares about this? Ever since SQL 2005 I’ve been worried about the potential explosion of CLR types in the database, and this just seems to meet my worst fears. I think you can smell a CLR type because of the extra functions, I really wonder how well (or not) they’ll perform.
Semi-Structured data: Interesting change, where you can add sparse columns to a table. The idea is that you can create many columns (wide table) to cover many attributes without overloading the table. This could be useful for storing derived types in the database. The classic example would be Animal classification. Not every animal has the same attributes but you can put the whole range in the table schema when in reality only a few will be used on any one row. I’m not sure how the Codd brigade will take to this!
Filtered Index: Didn’t really cover this but it seems that you can create an index that only includes data that meets a criteria. For example you could create a dog index for the animal table that would only contain data that matches the "dog" value.
Integration Services: The use of parallel execution and the inclusion of ADO.NET sources.
Tracking Changes: Oh yes this stuff could be great, but will it perform well? Two types; Change Tracking – aimed at synchronisation activities, conflict resolution, etc. Change Data Capture – aimed at replaying changes to a different database.
Merge: This looked very powerful, the ability to use one statement to merge data from one table to another. This seems to avoid the age old SQL problem of knowing if you need to insert a new row or update and existing one, with Merge you can do it all with one statement. Looks suspiciously like an XQuery to me, again the spectre of performance rises.
My colleagues were investigating a deadlock issue with a couple of long-ish transactions. The basic activity is:
TXA -> 1) Inserts Row X -> 2) Join using X 3) Complete
TXB -> 1) Inserts Row Y -> 2) Join using Y 3) Complete
The deadlock was occurring because (greatly simplified):
TXA -> 1) Exclusive key locks (on X) -> 2) Range Locks (X+ others)
TXB -> 2) Exclusive key locks (on Y) !Deadlock
So the problem seemed to be that the optimizer has asked for more rows to be locked in (2) than was necessary. Examining the query it was using a table var and the var usually only had one row in it, so why did it think it needed to take out such a wide lock? Drawing on my experience from OPENXML I suggested we try putting PRIMARY KEY on the table var. Previously I’d not done this because I was concerned about the overhead of creating a key for table var that would usually only have one row in it. However, the effect was dramatic. With the primary key in place the optimizer now seemed to realize that the table var would now only have key data in it, and therefore should be able to take out individual key locks rather than range locks. The result was that the deadlocks disappeared. The is obviously great new since this represents a significant boost in concurrency. I do have a warning note, should the number of rows inserted into the temp var grow to…well I don’t know and wouldn’t like to 2nd guess, the optimizer may at some point switch back to range locks. I hope not and by that time I hope to be exclusively on SQL 2005+ and can supply my own plans if necessary.
The (ahem) key point here is that don’t assume the query optimizer will get it right each time, especially on "tables" with no statistics, you need to give it as much information as you can.
SqlCommandBuilder.DeriveParameters is a helper function that returns the parameters used by a stored procedure, a bit like the old ADO Command.Refresh. However, there is a problem with it. Consider the following stored procedure…
Create Proc Bob.MyProcTest(@Name nvarchar(100))
select ‘hello’ + @Name
Create Proc Fred.MyProcTest(@Name nvarchar(100))
select ‘hello’ + @Name
So we’ve got the same procedure differing only by the owner. When you ask SqlCommandBuilder.DeriveParameters to fetch the parameters it incorrectly returns 4 parameters rather than two (remember that the return value is counted as a parameter). It’s fairly obvious why but it shouldn’t do it and can, as I’ve found to my cost, easily cause problems.
[Edit] I’ve since been informed that if you prefix the stored procedure name with the owner it works fine. It does, and that’s certainly better than nothing, but I still believe returning an appended result could never be the correct the answer. SqlCommand.DeriveParameters is there to aid in constructing calling code so I believe it should always make useful answers, so if the params match it should only return one, if they’re is more than one possible set it should raise an exception and ask you to be more specific.