Monday, March 12, 2012

Package attempts to query source db constantly when editing

Trying to set up a tranform task between a mysql db using and ADO.NET connection and sql server.
My query to pull from the mysql db is something like "select x,y,z from table where last_updated" > @.User::LastUpdated. This command is set up as an expression for the Data Flow Task and is the value for the [DataReader Source].[SqlCommand]

I have two questions.

Why does the package attempt a query against the mysql database all the time?
And Why is the query attempting to pull the entire table instead of having any regards for my where clause?

I've even added where last_updated > greatest('2006-08-15', '" + @.User::LastUpdated to attempt to get it a where clause even when the parameter isn't set yet.

What is the trick? This is not feasible when pulling from multi-million row tables.

lorijean44 wrote:

Why does the package attempt a query against the mysql database all the time?

Can you qualify what you mean by "all the time"? Do you mean at execution time or design time? What is the package doing when it issues these queries?

lorijean44 wrote:


And Why is the query attempting to pull the entire table instead of having any regards for my where clause?

How have you determined that it is pulling from the entire table?

-Jamie

|||Thank you for your response.
"All the time" means while in design mode, if I click on the Data Reader Source to edit it, when I go to column mappings, when I close the data reader source gui, when I attempt to go back to the control flow window.

I determined it was pulling from the entire table by looking at "show processlist" on the mysql db. I did discover that I needed to do "show full processlist", so that I could see the where clause. But once I figured that out, I could see that it was pulling "select x,y,z from table where last_seen > '1900-01-01'". 1900-01-01 is our default value anytime we are using datetime.

I ended up cheating the system by having it do "select x,y,z from table where last_seen > greatest('2006-08-15', @.last_updated)", so that when it ran the query it would look like "select x,y,z from table where last_seen > greatest('2006-08-15', '1900-01-01')", and not pull so much.
I guess I could set the default to be something more recent. I guess I'm just trying to figure out why does it run the query so often.

Thanks for your time.
-Lori

No comments:

Post a Comment