Wednesday, March 21, 2012

Package hangs on parallel "upserts"

I have two data flow tasks: one that validates the import file and one that processes the import file if the validation passed. The validation runs and pushes the three row types to three different recordset destinations. When I enter the processing data flow task, I have three parallel trees, each processing one of the recordsets saved in the previous task. I'm using a script component to generate the rows, which are then sorted and merged with the production database to find existing records. Based on this, I split to an OLE DB Command (running an UPDATE statement) or an OLE DB destination (to simply insert the records).
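For reference, the merge-and-split logic described above is equivalent to the following set-based upsert. This is only an illustrative sketch; the table and column names (dbo.ImportStaging, dbo.Production, BusinessKey, SomeColumn) are hypothetical stand-ins for the real schema.

    -- "Key matched" branch of the split: the row exists in production, so update it.
    UPDATE p
    SET    p.SomeColumn = s.SomeColumn
    FROM   dbo.Production AS p
    INNER JOIN dbo.ImportStaging AS s
            ON s.BusinessKey = p.BusinessKey;

    -- "Key is NULL on the production side" branch: the row is new, so insert it.
    INSERT INTO dbo.Production (BusinessKey, SomeColumn)
    SELECT s.BusinessKey, s.SomeColumn
    FROM   dbo.ImportStaging AS s
    LEFT JOIN dbo.Production AS p
            ON p.BusinessKey = s.BusinessKey
    WHERE  p.BusinessKey IS NULL;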

In this particular case, all records are being updated and nothing new is being inserted. Two of the three trees complete the sort but hang on the merge, split, and OLE DB Command components. The other does the same but also hangs on the split.

In another case, I truncated each destination table before running the package, and the package ran fine.

Are toes being stepped on in the data flow task and causing a deadlock?

Update: I removed the sort transformation and sorted the rows before pushing them to the recordsets and I still get the same results.

If you were getting a deadlock then I think SQL Server would capture this and throw an error. That's not what's happening here. It's more likely that a lock is blocking you. Use sp_who2 and sp_lock to determine if blocking is occurring.
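For example, run these from a separate SSMS window while the package is hung (the DMV query assumes SQL Server 2005 or later):

    -- A non-empty BlkBy column in the sp_who2 output identifies the blocking session.
    EXEC sp_who2;

    -- sp_lock lists the locks each session holds or is waiting on.
    EXEC sp_lock;

    -- The DMV equivalent, showing only requests that are currently blocked:
    SELECT session_id, blocking_session_id, wait_type, wait_time
    FROM   sys.dm_exec_requests
    WHERE  blocking_session_id <> 0;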

-Jamie

|||

A common issue when you're modifying the same table via multiple paths in your data flow is a deadlock on the destination table. To resolve:

1) Choose "Fast load" and "Don't lock table" in the OLE DB Destination (the lock check sketched below will show whether table locks are the culprit).

2) Ensure that you don't have a race condition between the paths, i.e. that any updates which depend on inserts _in the same run_ only execute after those inserts have reached the DB.
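As a quick check for point 1, something like this shows which sessions hold or are waiting on table-level locks on the destination (dbo.Production is a hypothetical stand-in for your real destination table):

    -- Run this in the destination database.
    -- BU (bulk update) locks in request_mode indicate a fast load holding "Table lock";
    -- request_status = 'WAIT' means that session is being blocked.
    SELECT request_session_id,
           request_mode,
           request_status
    FROM   sys.dm_tran_locks
    WHERE  resource_type = 'OBJECT'
      AND  resource_associated_entity_id = OBJECT_ID('dbo.Production');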

Cheers/Kristian

|||

I ran sp_who2 and nothing was in the BlkBy column.
Fast load is being used and I chose "don't lock table", but that didn't fix the issue.
The updates don't depend on the inserts. I merge the input with production on keys and split on NULL values on the production side.

I ended up splitting the three trees into three data flow tasks to see if that would fix the issue, but I get the same results, only this time I noticed that not all rows have left the merge. The merge is a left outer join on keys, and in this case all input records are already in the database. So I'm joining 62k input records with the same number of (identical) production records and only 9,937 rows leave the merge. Also, when running the package, that task remains yellow.
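If the downstream OLE DB Command is blocked, I understand the pipeline buffers fill up and upstream components stop emitting rows, which would look exactly like rows stuck in the merge and a permanently yellow task. As a sanity check on the database side (sketch only; dbo.ImportStaging and dbo.Production stand in for my real tables, assuming the import rows are landed in a staging table), this count should equal the 62k input rows if every key already exists in production:

    -- Expect this count to equal the input row count (62k here)
    -- if every import key already exists in production.
    SELECT COUNT(*)
    FROM   dbo.ImportStaging AS s
    INNER JOIN dbo.Production AS p
            ON p.BusinessKey = s.BusinessKey;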
|||

The solution is pretty simple. Do the inserts and updates in separate data flows. You can use raw files to pass data between data flows.
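Alternatively, if you can land the import rows in a staging table first, a single Execute SQL Task gets you the same serialization without two data flows. This is just a sketch with hypothetical names, and MERGE requires SQL Server 2008 or later:

    -- Insert and update in one serialized, set-based pass.
    MERGE dbo.Production AS p
    USING dbo.ImportStaging AS s
       ON p.BusinessKey = s.BusinessKey
    WHEN MATCHED THEN
        UPDATE SET p.SomeColumn = s.SomeColumn
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (BusinessKey, SomeColumn)
        VALUES (s.BusinessKey, s.SomeColumn);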

-Jamie
