BEGIN TRAN
DECLARE @field int, @after_wait int
SELECT @field = field FROM users WHERE nick_name='user1'
WAITFOR DELAY '0:0:5'
SELECT @after_wait = field FROM users WHERE nick_name='user1'
UPDATE users SET field = @field - 5 WHERE nick_name='user1'
SELECT @field AS 'old val', @after_wait AS 'after wait', field AS 'new val' FROM users WHERE nick_name='user1'
COMMIT TRAN
I start this transaction in one session and, while it is waiting, run a concurrent update of the field from another:
UPDATE users SET field = 1000 WHERE nick_name='user1'
PRINT @@ERROR
Here is the result of the first transaction:
old val    after wait    new val
90,00      1000,00       85,00
As you can see, the second update interferes with the transaction: it changes the selected field before the transaction writes the field back. Suppose the second request is a user adding funds to their account. The money just disappears! Wrapping the second request in a transaction does not change the result. It seems that updates, as opposed to selects, do take locks:
SELECT @field = field FROM users WHERE nick_name='user1'
UPDATE users SET field = @field WHERE nick_name='user1' -- lock the row
WAITFOR DELAY '0:0:5'
...
Running another update while the first transaction is waiting now fails. But again, another party may still interfere between the select and the dummy update, since that pair is not atomic (which violates the notion of a transaction). Should I specify some locking hints on the select?
Here are the transaction locking options I have discovered:
1. TRANSACTION ISOLATION LEVEL REPEATABLE READ; and
2. SELECT WITH (HOLDLOCK, ROWLOCK).
Both of these make the update in the second request fail with a deadlock. Failing is better than losing data/money. I would like to know 1) what the differences between these two locking approaches are; and 2) what causes the deadlock.
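For reference, here is roughly how I applied each option (a sketch only; the table and variable names match the example at the top of this post):
-- Option 1: raise the isolation level for the whole transaction
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
DECLARE @field int
SELECT @field = field FROM users WHERE nick_name = 'user1'
WAITFOR DELAY '0:0:5'
UPDATE users SET field = @field - 5 WHERE nick_name = 'user1'
COMMIT TRAN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED -- back to the default

-- Option 2: keep the default level but hint the SELECT
BEGIN TRAN
DECLARE @field2 int
SELECT @field2 = field FROM users WITH (HOLDLOCK, ROWLOCK) WHERE nick_name = 'user1'
WAITFOR DELAY '0:0:5'
UPDATE users SET field = @field2 - 5 WHERE nick_name = 'user1'
COMMIT TRAN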
I think locking only one field of the user record would suffice; the other fields could be left free for concurrent updates.
Thank you for any advice.
A select will take a shared lock and then drop it by default whether or not it is in a transaction.
The transaction you have opened will have no effect until the update statement at which point it will lock the row and hold that lock until the commit.
>> It changes the selected field before the transaction writes the field back. Suppose the second request is a user adding funds to their account. The money just disappears!
Only if the people who wrote the application don't know what they are doing.
UPDATE users SET field = field - 5 WHERE nick_name='user1' solves the problem as it is guaranteed to be atomic - no need for an explicit transaction.
if you want to return the value
UPDATE users SET field = field - 5, @field = field - 5 WHERE nick_name='user1'
(in v2005 you can return a resultset from the statement so no need for the variable).
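For instance, a sketch of the v2005 approach using the OUTPUT clause against the same table (no variable needed):
UPDATE users
SET field = field - 5
OUTPUT deleted.field AS 'old val', inserted.field AS 'new val'
WHERE nick_name = 'user1'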
I have seen a system written relying on the transaction holding a lock on a select. They wouldn't believe me when I said it doesn't - even when I demonstrated it.
"I read it in this book" - ok show me - "um, can't find it but it's there".
|||
>> It changes the selected field before the
>> transaction writes the field back. Suppose the
>> second request is a user adding funds to their
>> account. The money just disappears!
> Only if the people who wrote the application don't
> know what they are doing.
I cannot imagine any multi-threaded (multiuser, distributed) application that does not share data structures between threads. Should I conclude that programmers who try to protect shared data access are insane?
> UPDATE users SET field = field - 5 WHERE nick_name='user1' solves the problem as it is guaranteed to be atomic - no need for an explicit transaction.
Thanks for the code. But I'm interested in checking the field (the user's money) before transferring it. Therefore, a select must be done first, and the update must be atomic with it.
> I have seen a system written relying on the
> transaction holding a lock on a select. They
> wouldn't believe me when I said it doesn't -
> even when I demonstrated it. "I read it in this
> book" - ok show me - "um, can't find it but it's
> there".
This is a typical answer I occasionally hear myself: they claim that the system will do it automatically for us. People prefer redundant design and pay no attention to concurrent access; they have to make money rather than study proper design and prevent integrity issues.
UPDATE users SET field = func(field) WHERE id = @user_id
I was looking for this form because I believe it to be more efficient, for several reasons, than the explicit select:
UPDATE users SET field = func((SELECT field FROM users WHERE id = @user_id)) WHERE id = @user_id
The advantages are:
1. Eliminated redundancy (you cannot accidentally select one field and update another). The other advantages usually accompany any redundancy-free solution.
2. Performance. You look up the record only once and then update it "in place". You not only avoid unintentionally touching irrelevant data, you also modify what is needed faster.
3. Improved readability, since removing the redundancy reduces code bloat.
If, in addition, the single update operation is atomic, then we have another benefit. I have used it many times. But I suppose, perhaps due to my limited skills, that it is not a panacea: nobody has responded with an example showing how to make a conditional self-update (only when the user has money in their account, for instance).
Since you claim that a self-update is atomic in the locking sense, it should be equivalent to:
UPDATE t1 SET field = (SELECT field FROM t1 WITH (HOLDLOCK, ROWLOCK) WHERE id = @id) + @delta WHERE id = @id
According to MSDN, HOLDLOCK is equivalent to doing the select and the update in a single SERIALIZABLE transaction.
What I would like to know about is the automatic locking/atomicity of the UPDATE operation. If it is really atomic, update triggers must run inside a SERIALIZABLE transaction rather than at the default READ COMMITTED level. Perhaps the best solution in my example would be to move the content of the stored procedure (which is in fact an add_record kind of procedure) into a corresponding insert trigger? The trigger would check the user's funds automatically and carry out the related manipulations.
I still look forward to getting feedback on the deadlock issue.
|||
>> Thanks for the code. But I'm interested in checking the field (the user's money) before transferring it. Therefore, a select must be done first, and the update must be atomic with it.
Are you sure you need to do that?
Usually you would do the update - conditional if you were worried about going negative - then check the result.
update account set bal = bal - @val, @done = 1 where userid = @userid and bal > @val
then you could check @done to see if it had been carried out
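A slightly fuller sketch of that pattern (the account/bal/userid names come from the snippet above; the literal values are just placeholders):
DECLARE @done bit, @val money, @userid int
SET @done = 0
SET @val = 5
SET @userid = 1

-- conditional, atomic update: @done is only set if a row qualified
UPDATE account
SET bal = bal - @val, @done = 1
WHERE userid = @userid AND bal > @val

IF @done = 1
    PRINT 'withdrawal carried out'
ELSE
    PRINT 'not carried out - no such account or insufficient funds'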
I can't see any purpose for selecting the balance before doing the update.
if you want to do a select first try
begin tran
select bal from account with (updlock, holdlock) where userid = @userid
update account set bal = bal - @val where userid = @userid
commit tran
as I say there is little point in this.
I've had people try to say it's necessary but can usually just change the design a bit to get rid of it.
Yes an update statement is atomic. All triggers will be run in the same transaction created by the update statement.
Still think you want a single statement.
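To illustrate the trigger point, a sketch only (the trigger name is made up, and it assumes the users/field schema from earlier in the thread): a trigger like this runs inside the same transaction as the UPDATE that fired it, so rolling back in the trigger undoes the UPDATE as well.
CREATE TRIGGER trg_users_check_field ON users
AFTER UPDATE
AS
BEGIN
    -- runs in the transaction created by the UPDATE statement
    IF EXISTS (SELECT 1 FROM inserted WHERE field < 0)
    BEGIN
        RAISERROR ('Insufficient funds', 16, 1)
        ROLLBACK TRANSACTION
    END
END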
|||As to the deadlock:
Your select has taken a shared lock on the row and held it.
The update takes an intent lock - which it can, as it is compatible with the shared lock.
It now tries to convert to an exclusive lock for the update but can't due to the shared lock held by the other spid.
Now the other spid tries to get an exclusive lock on the row to do the update but can't due to the intent lock by the other spid.
So neither spid can progress due to the other and we have a deadlock.
The terminology might not be correct - can't be bothered to look up the names for the locks but that is probably what's happening.
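A sketch of one common way to avoid that conversion deadlock: take an update lock at SELECT time (as in the UPDLOCK hint suggested above), so there is no shared lock that later needs converting:
BEGIN TRAN
DECLARE @field int
-- UPDLOCK: the read takes an update lock instead of a shared lock,
-- so a second transaction doing the same thing waits here instead of deadlocking later
SELECT @field = field
FROM users WITH (UPDLOCK, ROWLOCK)
WHERE nick_name = 'user1'
UPDATE users SET field = @field - 5 WHERE nick_name = 'user1'
COMMIT TRAN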
Thank you for the great explanations. But which error do I report to the application when the update has failed? When I select and check the balance explicitly, I know what prevents the operation from completing. Besides insufficient funds, there can be other reasons for not updating the user (no such user, for instance). Should I make another select to ensure the cause of not updating was insufficient funds? What if this posterior select shows that the user's balance is fine? Should I then report to the application that the operation failed for an unknown reason?|||
I can't imagine a situation where there has not been a previous SELECT to display the Client's account information.
It's on the screen, now the user (clerk, ATM, etc.) wants to hand over money and decrement the account. At this point, the Client Account is verified, so there will not be a case of 'no such user' -that has previously been resolved. Even if it is not on the screen, and is the result of a Card reader action, Verification has occurred before any further activity occurs.
Then, with a single atomic UPDATE statement as Nigel explained, the application can capture @@ROWCOUNT to verify whether a row was updated. If it was not updated, then the WHERE criteria failed (Balance > 0, etc.). So if @@ROWCOUNT = 0, the application can take whatever action is appropriate, including denying the transaction with the Client.
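A minimal sketch of that check (the account/bal names are borrowed from Nigel's earlier snippet; note that @@ROWCOUNT must be read immediately after the UPDATE):
DECLARE @userid int, @val money
SET @userid = 1
SET @val = 5

UPDATE account
SET bal = bal - @val
WHERE userid = @userid AND bal >= @val

IF @@ROWCOUNT = 0
BEGIN
    -- the WHERE criteria failed; deny the transaction with the Client
    RAISERROR ('Withdrawal denied', 16, 1)
END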
Of course, there is
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
From BOL:
SERIALIZABLE
Places a range lock on the data set, preventing other users from updating or inserting rows into the data set until the transaction is complete. This is the most restrictive of the four isolation levels. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.
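To make the equivalence concrete, a sketch against the earlier users example (per the BOL text, the two forms below should lock the same way):
-- explicit isolation level for the whole transaction...
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT field FROM users WHERE nick_name = 'user1'
UPDATE users SET field = field - 5 WHERE nick_name = 'user1'
COMMIT TRAN
SET TRANSACTION ISOLATION LEVEL READ COMMITTED -- back to the default

-- ...or HOLDLOCK on every SELECT in a default-level transaction
BEGIN TRAN
SELECT field FROM users WITH (HOLDLOCK) WHERE nick_name = 'user1'
UPDATE users SET field = field - 5 WHERE nick_name = 'user1'
COMMIT TRAN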
And as Nigel said earlier, it is possible for other users to read into an active TRANSACTION, just not change data.
But it really doesn't seem like your needs are quite so draconian... I would go with the single statement, and handle the @@ROWCOUNT appropriately.
|||
> I can't imagine a situation where there has not been a
> previous SELECT to display the Client's account information.
That SELECT returns a copy of the real data, which can become stale at any time. The transaction, therefore, should check the user's state itself. It should also lock the fields it reads, to prevent them from becoming inconsistent during the transaction itself.
> If it was not updated, then the WHERE criteria failed (Balance > 0, etc.).
... or such a user does not exist, or there is some other error (trigger-generated, for instance). Why do you think (bal > val) is the only failure criterion?
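What I have in mind is something like this (a sketch; the account/bal names follow the earlier snippets): run the update first, and only if nothing was updated make a second, purely diagnostic select to find out why:
DECLARE @userid int, @val money
SET @userid = 1
SET @val = 5

UPDATE account
SET bal = bal - @val
WHERE userid = @userid AND bal > @val

IF @@ROWCOUNT = 0
BEGIN
    IF NOT EXISTS (SELECT 1 FROM account WHERE userid = @userid)
        RAISERROR ('No such user', 16, 1)
    ELSE
        -- the balance may have been insufficient, or it may have
        -- changed again since the update - we cannot tell for sure
        RAISERROR ('Insufficient funds (probably)', 16, 1)
END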
|||
When I am concerned about data 'coherency' while a TRANSACTION is in play, I use TRANSACTION ISOLATION LEVEL REPEATABLE READ (or) SERIALIZABLE -depending on the need to control inserts as well as updates. With these two levels, there will be no changes to the queried data between a read and an update. The row doesn't 'disappear', change, morph, etc. What was read is stable until the TRANSACTION is complete. But while that works for THE TRANSACTION -it may not work for another user in another SPID...
Perhaps this will illustrate:
In one Connection, execute this code
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
PRINT 'TRANSACTION IS Active'
SELECT
EmployeeID,
LastName,
FirstName
FROM Employees
WHERE EmployeeID = 1
PRINT 'Waiting...'
WAITFOR DELAY '00:00:15'
PRINT 'UPDATE to Active TRANSACTION'
UPDATE Employees
SET LastName = 'Davolio-Jones'
WHERE EmployeeID = 1
COMMIT TRANSACTION
PRINT 'Waiting for Other Process to Complete'
PRINT ' If it was just BLOCKED, then it will complete'
WAITFOR DELAY '00:00:05'
PRINT 'CHECK Results of TRANSACTION'
SELECT
EmployeeID,
LastName,
FirstName
FROM Employees
WHERE EmployeeID = 1
And in a separate connection, execute this code as soon as the above begins running
PRINT 'This is NOT a TRANSACTION'
BEGIN
PRINT 'Read Into an Active Transaction'
SELECT
EmployeeID,
LastName,
FirstName
FROM Employees
WHERE EmployeeID = 1
PRINT 'Attempt to UPDATE into Active TRANSACTION'
UPDATE Employees
SET FirstName = 'Mildred'
WHERE EmployeeID = 1
END
GO
PRINT 'BATCH above was terminated by DEADLOCK'
PRINT 'Data State resulting from competing TRANSACTION'
SELECT
EmployeeID,
LastName,
FirstName
FROM Employees
WHERE EmployeeID = 1
While the second connection was able to read into the TRANSACTION, it was not allowed to alter the data since the TRANSACTION held LOCKS as soon as the SELECT QUERY executed. And since the two processes were competing for the same data, a DEADLOCK condition was created. Hopefully, the application would then appropriately handle a DEADLOCK (@@ERROR = 1205).
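For example, the victim side could be handled roughly like this (a sketch; it assumes SQL Server 2005's TRY/CATCH, and the table names and retry count are only illustrative):
DECLARE @retries int, @msg nvarchar(2048)
SET @retries = 3

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRAN
        UPDATE account SET bal = bal - 5 WHERE userid = 1 AND bal > 5
        COMMIT TRAN
        BREAK  -- success, leave the retry loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRAN
        IF ERROR_NUMBER() = 1205 AND @retries > 1
            SET @retries = @retries - 1  -- deadlock victim: try again
        ELSE
        BEGIN
            SET @msg = ERROR_MESSAGE()
            RAISERROR (@msg, 16, 1)  -- some other error, or out of retries
            BREAK
        END
    END CATCH
END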
Typically, I recommend adding a TIMESTAMP datatype column to each table, the initial SELECT statement collects that value, and the UPDATE statement uses that value in the WHERE clause. IF it has changed, the underlying data was changed. So in many 'simple' situations, it is not necessary to even use a TRANSACTION. However, TRANSACTIONs are there for a reason, and when it is the best tool (because the set of activities are complex, or changing multiple tables, or somehow not distilled down to a simple UPDATE statement) -TRANSACTIONS are the tool to use.
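A sketch of that idea (assuming a TIMESTAMP/rowversion column named rv has been added to the account table - the column name is made up):
DECLARE @rv binary(8), @bal money

-- initial read: show the balance and remember the row version
SELECT @bal = bal, @rv = rv FROM account WHERE userid = 1

-- ... time passes, the user decides what to do ...

-- update only if nobody has changed the row in the meantime
UPDATE account
SET bal = bal - 5
WHERE userid = 1 AND rv = @rv

IF @@ROWCOUNT = 0
    PRINT 'underlying data was changed - re-read and retry'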
And as your research clearly demonstrates, one 'should' not assume that a TRANSACTION offers some form of 'protection' without testing, testing, testing -and understanding the limitations ...
Now to the meat of your question: How can we prevent data from being read by other user processes while our process is considering that data -especially since the other user process can simply add [ WITH (NOLOCK) ] to their query to avoid all lock hints? Some folks just add something like 'RowLock' and 'LockTime' columns, and before starting an activity, they set that RowLock to the UserID and the Locktime to the current time. But again, other users can circumvent a well thought out process if the programming is sloppy. It really comes down to having a well thought out approach to serving data, and a consistent plan to manage concurrency. When there are deviations from the plan -nothing is guaranteed.
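Purely to illustrate that 'RowLock'/'LockTime' idea (not a recommendation; the column names come from the paragraph above, and the five-minute expiry is just an example):
DECLARE @userid int, @targetid int
SET @userid = 42     -- the user asking for the lock
SET @targetid = 1    -- the account row being worked on

-- claim the row, but only if nobody else holds it
-- (or their claim is older than five minutes)
UPDATE account
SET RowLock = @userid, LockTime = GETDATE()
WHERE userid = @targetid
  AND (RowLock IS NULL OR LockTime < DATEADD(minute, -5, GETDATE()))

IF @@ROWCOUNT = 0
    PRINT 'row is currently locked by another user'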
You might find Tony Rogerson's blog informative (he has several entries on locking issues).
Also, for another, more tightly controlled approach, see my reply to your post in the 'Getting Started' forum.