We're speaking at
CFUnited 2008:
CFUnited - The Premiere ColdFusion Technical Conference

Search

Calendar

SunMonTueWedThuFriSat
    123
45678910
11121314151617
18192021222324
25262728293031

Subscribe Enter your email address to subscribe to this blog. You'll receive an email when we write a new post.

Recent Entries Come On In, Rails-The Water's Warm
Shan's Simple Examples: File uploads with Flex and ColdFusion

Recent Comments Google Calendar API - Creating a new Calendar with ColdFusion
Steve Julian said: When and where are you going to post the finished CFC's ? Thanks [more]

Three Phases of Programmer Development
Pat Branley said: I normally think of those phase 2 people as 'programmers' and the phase 3 people as 'developers'. I... [more]

New Job Title: Front End Engineer
Sean Corfield said: Well, there's always the excellent Fusion Authority Quarterly Journal... [more]

Down To The Wire: HTTP Sniffers
Brian M said: I second the mention of the Charles Web Debugging Proxy that Tariq mentioned. It is fantastic. It s... [more]

New Job Title: Front End Engineer
Patrick said: Heya Sean. Good point. I never understood how they did things over there at SysCon, and I understand... [more]

Archives By Subject Business of Software (4) [RSS]
ColdFusion (318) [RSS]
Conferences (6) [RSS]
Databases (87) [RSS]
Flex & Flash (109) [RSS]
Fusebox (87) [RSS]
General Development (29) [RSS]
Google (9) [RSS]
Hardware (5) [RSS]
JVM & Java (132) [RSS]
Linux (20) [RSS]
Miscellaneous (254) [RSS]
Performance (8) [RSS]
SeeFusion (36) [RSS]
Shan's Simple Examples (7) [RSS]
User Interface (3) [RSS]
Windows (5) [RSS]

Archives By Poster Daryl Banttari (10)
Nat Papovich (29)
Patrick Quinn (36)
Shannon Hicks (22)
Steve Nelson (21)
Tyson Vanek (3)


bottom corner

A Database Independent "Next ID"

A blog entry at: http://www.coldfusionmuse.com/index.cfm/2005/6/10/sqlInterbase reminded me of how people don't fully understand the multithreaded nature of ColdFusion with respect to databases when looking for the next ID value for a table.

People quite often do something like this in a page:

<cfquery name="qNextID" datasource="#request.dsn#">
select max(id) as lastid from sometable
</cfquery>

<cfquery datasource="#request.dsn#">
insert into sometable(id, data)
values (#lastid+1#, '#somedata#')
</cfquery>

However, this will give you duplicate IDs when two pages run concurrently. For example, request A and request B run at the same time:

Request A selects max(id) and gets 24999
Request B selects max(id) and gets 24999
Request A inserts into sometable with id 25000
Request B inserts into sometable with id 25000 and FAILS

Or worse, if there's no unique (eg, primary key) index on sometable.ID, you wind up with two identical ID values in the table.

Wrapping them in a transaction doesn't help. The default isolation level doesn't hold locks for reads, so the result is the same. Using a higher isolation level simply changes the failure mode to a deadlock. Better than a duplicate ID, but still a failure. An additional drawback of this method is that the database has to seek the index on ID every time, which for very large tables is expensive, relative to the following method.

The best cross-database method of implementing a unique ID is to create a table to store them. I usually name the table NextIDs(TableName varchar(50), NextID int)

Then, you start the transaction and do the update statement FIRST:

<cftransaction>

<cfquery datasource="#request.dsn#">
update NextIDs
set nextID = nextID + 1
where tablename = '#tablename#'
</cfquery>

<cfquery name="qNextID" datasource="#request.dsn#">
select NextID
from NextIDs
where tablename = '#tablename#'
</cfquery>

</cftransaction>

<cfquery datasource="#request.dsn#">
insert into tablename (id, data)
values (qNextID.NextID, '#data#')
</cfquery>

You /must/ do the update first, or you'll still have the race condition under the default transaction isolation level; if you set isolation to REPEATABLE_READ or SERIALIZABLE you'll get deadlocks when the race condition occurs.

To summarize:

Select max(id) is never a thread-safe way of implementing a unique identifier. If you want to implement a cross-database method of creating a unique, incrementing identifier, you should create a table for that purpose, and UPDATE before SELECTing from that table (in a transaction) to eliminate race conditions and deadlocks.

Comments
Hi Daryl, I know this is a cross-database post, but is the following okay for SQL Server only?

set nocount on
insert into ...
select @@identity as newid
set nocount off

I've been using that for years on SQL Server...
# Posted By Nat Papovich | 3/1/07 11:18 AM
Do note that this approach will single thread your entire server architecture for insert operations due to lock contention on the row/page of the single row key table. Anytime you issue the update, then select to get the next value, the RDBMS will lock the single row/page of the key table until the end of the transaction. The UPDATE lock is an exclusive lock which will prevent the next key generation call from being able to do its own UPDATE operation. This means you can only generate a single key within a single logical unit of work (i.e. critical section). Although this is indeed thread-safe it is also completely serialized. In other words, if you had an order entry system where 1,000 orders all occurred at the same time, they would all require generation of a primary key value and hence would not occur in parallel but in a purely serial FIFO fashion. There are several table based solutions folks have used including pre-allocating blocks keys into a keystaging table. With row-level locking by having a block of keys in the staging table and not bump into contention issues on the read/delete of the next available key.

Celko also has several great solutions for this issue.
# Posted By Dave Wolf | 3/1/07 11:27 AM
The @@identity trick in SQLServer is also very dubious and can return very erroneous results. Basically @@identity is a global variable, and returns back the last identity value generated. So take the case where you insert a row, and a trigger fires to populate a child table, or other metadata and that table itself has an identity value. @@identity returns the second identity value, the one caused by the insert from the trigger and not from the parent table. Since the child table could have less rows, it could very easily return a value for an already used key. @@identity is very much not thread safe.
# Posted By Dave Wolf | 3/1/07 11:34 AM
"Wrapping them in a transaction doesn't help. "

Would you mind going into more depth on that? I always thought when the docs said cftransaction "groups database queries into a unit," it pretty much meant that would be the case.
# Posted By Sam | 3/1/07 11:44 AM
Nat,

Using @@Identity is fine for SQL Server, so long as you don't have triggers that do inserts into other tables that have Identity columns.

Dave,

Any time you have any variable that you want to update consistently across threads, you MUST single-thread changes to the variable. To do so, you MUST have some sort of semaphore to control access across threads; in this case, it's the update lock. The transaction block where you're single threading is blindingly short, and should be complete in 4ms or less, so concurrency really only becomes a problem if you're doing 250+ inserts a second, in which case using something DBMS-specific is probably a good idea.

In SQL Server, @@Identity is not a "trick", it's a well defined feature, and will only give "dubious" or "very erroneous" results if you either (a) have triggers, or (b) don't do the SELECT @@IDENTITY operation before the next identity insert. One popular mistake is to do "SELECT @@IDENTITY FROM SomeTable", which simply does a Cartesian join from the constant returned by @@IDENTITY, and the rows in SomeTable. So, if SomeTable had 100,000 rows in this case, you'd get 100,000 copies of the @@IDENTITY value. And if your last identity insert was actually for SomeOtherTable, you'd get a completely unexpected value.

Also, @@IDENTITY is not a global variable, it's a connection variable. So it doesn't return the last identity value inserted, it returns the last identity value inserted /on the current connection/. Since ColdFusion pools connections on a per-page basis even if you have connection pooling turned off, it's completely safe from that perspective.

If you have triggers, the scope_identity() function is designed to return the proper identity value, like this:

<code>
<cfquery name="q" datasource="Northwind">
   INSERT INTO SomeTable
   VALUES ('foo')
</cfquery>
<cfquery name="q" datasource="Northwind">
   SELECT SCOPE_IDENTITY() AS ID
</cfquery>
</code>

Sam,

A transaction groups queries into a logical group of work, and ensures that all of the *changes* in the transaction either commit, or roll back. However, a transaction doesn't change the isolation level for reads, which by default is "READ COMMITTED". READ COMMITTED will acquire and hold read locks only long enough to read the row(s), and release the lock once the row is read. The next level, REPEATABLE READ will hold locks on any rows that were actually read until the end of the transaction, and SERIALIZABLE will go so far as to prevent inserts between ranges you selected from. But none of these prevent you from reading a value that another thread/connection has read (since read locks can be made concurrently), so the sort of read consistency you're expecting is never enforced. This is why I do the update first, as that will actually block other threads/connections from reading the value I updated until the end of the transaction. (The isolation level READ COMMITTED won't read the row until the transaction is committed.)

Hope that helps... :-)

--Daryl
# Posted By Daryl Banttari | 3/1/07 12:58 PM
Thanks, that did explain it quite well.
# Posted By Sam | 3/1/07 1:42 PM
Interesting timing on this post, Daryl. I discovered something yesterday that seems to have changed in CF although I'm not sure when. I'm developing an app on 7.0.2 w/ MSSQL and discovered that I could do the insert and select scope_identity() as newID within the same cfquery tag set and it works just fine. I remember having to use nocount on/off in the past (as Nat indicated) to pull this off. I'm not sure when this changed, just that it works now without it and in the same cfquery tag.

So:
[cfquery name="insQuery"]
INSERT INTO SomeTable
(stuff, things)
VALUES
("stuff","things")
SELECT SCOPE_IDENTITY() AS newID
[/cfquery]
will leave you with #insQuery.newID# as your new id.

While it seems to work just fine, I'm also not sure if there's a good reason to *not* do it that way so if anyone knows that to be a bad idea for some reason please enlighten. It's all done within the context of a transaction with isolation="SERIALIZABLE" (as the queryname.newID is used to populate child tables and such and performance/scalability isn't an issue in this case) so it all seems pretty safe.
# Posted By Scott Krebs | 3/1/07 3:18 PM
I wish more databases used sequences like oracle and postgres

http://www.psoug.org/reference/sequences.html
http://www.postgresql.org/docs/8.1/static/sql-crea...

really nice and simple to use
# Posted By zac spitzer | 3/1/07 6:55 PM
Scott,

SQL Server always seemed to support multiple statements in the same query, but the DataDirect drivers used by CF stopped supporting that a few releases ago. (Any query that began with the word "INSERT" was no longer allowed to return a resultset.) I didn't know that they fixed that-- good to hear.

In any case, for future compatibility reasons, I'd recommend using two CFQUERYs for that block of code. Also, SCOPE_IDENTITY() is guaranteed to be correct, so long as it's called before the next identity insert on the same connection. So, you don't need to bother with a cftransaction when using that; the transaction, in your case, only adds extra db call and locking overhead, and may contribute to deadlocking problems in the future.

Zac,

I really wish that SQL had defined a standard for getting generated sequence numbers. But then, they didn't even define how to format dates and times; that's implementation-dependent. (JDBC and ODBC define a standard for formatting dates and times; it's a responsibility of the JDBC or ODBC driver to convert dates and times in queries to a format palatable to the individual DBMS. That's why the helper function in CF is called "CreateODBCDateTime", as opposed to "CreateSQLDateTime".)

--Daryl
# Posted By Daryl Banttari | 3/1/07 10:27 PM
Welcome to our website for you World of Warcraft Gold,Wow Gold,Cheap World of Warcraft Gold,cheap wow gold,buy cheap wow gold,real wow gold,sell wow gold, ...
Here wow gold of 1000 gold at $68.99-$80.99 ,World Of Warcraft Gold,buy wow gold,sell world of warcraft gold(wow gold),buy euro gold wow Cheap wow gold,cheapest wow gold store ... <a href="http://www.33game.cn"><strong>ffxi gil</strong></a> buy euro gold wow wow gold--buy cheap wow gold,sell wow gold.welcome to buy cheap wow gold--cheap, easy, wow gold purchasing.World of Warcraft,wow gold Super ...
We can have your wow gold,buy wow gold,wow gold game,world of warcraft gold, wow Gold Cheap wow, Cheap wow gold,world of warcraft gold deal,Cheap WOW Gold ...

Welcome to our website for you World of Warcraft Gold,Wow Gold,Cheap World of Warcraft Gold,wow gold,buy cheap wow gold,real wow gold,sell wow gold, ...
Here wow gold of 1000 gold at $68.99-$80.99,World Of Warcraft Gold,buy wow gold,sell world of warcraft gold(wow gold),buy gold wow lightninghoof instock Cheap wow gold,cheapest wow gold store ...
<strong>ffxi gil</strong> wow gold--buy cheap wow gold,sell wow gold.welcome to buy cheap wow gold--cheap, easy, wow gold purchasing.World of Warcraft,wow gold Super ...
Wow gold- Gold for buy gold wow lightninghoof instock EU-Server: ...wow Gold EU: starting from 84,99?; 3000 WoW Gold EU: starting from 119,99?. wow Gold- Leveling Services: ...
We can have your wow Gold,buy wow Gold,wow Gold game,wow gold, Cheap wow Gold, Cheap World of Warcraft Gold,world of warcraft gold deal,buy cheap wow gold,Cheap WOW Gold ...

Here wow Gold of 1000 gold at $68.99-$80.99,World Of Warcraft Gold,buy wow Gold,sell world of warcraft gold(wow gold),Cheap wow gold,cheapest World of Warcraft Gold store ...
# Posted By xiaoxinwow | 4/6/07 1:08 AM
hi daryl

thanks for posting this. i wasn't aware of the
limitations that the transactional approach
to the issue has.

as a matter of fact, your explanation as to what
exactly happens in terms of locks vis-a-vis isolation levels made me realize that i'd really love to get my hands on a succinct summary of the technical basis behind database transactions.

specifically, i'm looking for a text that explains
isolation levels, locking mechanisms, performance
considerations and all that... basically all the
stuff you seem to know so well.

can you recommend a book, or a website that would help?

thanks,
franco
# Posted By franco | 10/13/07 4:38 AM

bottom corner