More permanent stuff at http://www.dusbabek.org/~garyd

28 August 2004

Database and Bulk-Loading

I've been working on an application (see earlier entries) that takes flat file data, parses it, and stuffs it into a database. I got it all working, and then I started on some performance tuning.

I found a bottleneck where the records were being inserted. "No surprise," I thought; I had expected that. My solution was to throw a couple of threads at the problem, each with its own database connection.

That did increase throughput.

...and unique index and foreign key constraint violations. (I am using Postgres, by the way.)

Then I got to thinking: databases are designed for storing and retrieving data, not for inserting it quickly. I mean, look at indexes, for goodness' sake: every insert has to update them too. They LIVE to make inserting difficult. Postgres does let you relax foreign key checking a bit (for example, by deferring the checks until the transaction commits), and this helps, but I have obviously hit a brick wall here, or maybe a bug in Postgres.
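One common way to relax foreign key checking is to declare the constraint deferrable, so it is checked once at commit instead of after every insert. The sketch below assumes that is the kind of relaxation meant above. To stay self-contained it uses Python's built-in sqlite3 module as a stand-in for Postgres; the parent/child tables and the make_db and bulk_load helpers are hypothetical. In Postgres the comparable column clause is REFERENCES parent(id) DEFERRABLE INITIALLY DEFERRED, or DEFERRABLE combined with SET CONSTRAINTS ALL DEFERRED inside the transaction.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema. isolation_level=None puts the sqlite3 module in
    # autocommit mode, so transactions are controlled with explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # SQLite only enforces foreign keys on a connection when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL"
        " REFERENCES parent(id) DEFERRABLE INITIALLY DEFERRED)"
    )
    return conn


def bulk_load(conn: sqlite3.Connection, children, parents) -> None:
    """Load child rows before their parents inside a single transaction.

    Because the foreign key is deferred, the insert order does not matter:
    the constraint is checked once, when COMMIT runs. If any child still
    points at a missing parent, COMMIT raises sqlite3.IntegrityError and
    the whole batch is rolled back.
    """
    conn.execute("BEGIN")
    try:
        conn.executemany(
            "INSERT INTO child (id, parent_id) VALUES (?, ?)", children
        )
        conn.executemany("INSERT INTO parent (id) VALUES (?)", parents)
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Only roll back if the transaction is still open.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    db = make_db()
    # Children arrive before their parents; this still commits cleanly.
    bulk_load(db, [(1, 10), (2, 10), (3, 20)], [(10,), (20,)])
    print(db.execute("SELECT COUNT(*) FROM child").fetchone()[0])
```

Deferral only changes when the check runs inside one transaction; it does not by itself coordinate several connections loading related rows at the same time.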

It is late, and I don't feel like dealing with it any longer. I'll work on it some more later, but if I can't come up with a solution, I am stuck with single-threaded database access.

Welcome to the 21st century.
