I've been working on an application (see earlier entries) that takes flat-file data, parses it, and stuffs it into a database. I got it all working, then I started to do some performance tuning.
I realized there was a bottleneck as records were getting inserted. "No surprise," I thought. I had expected that. My solution was to throw a couple of threads at the problem, each with its own connection.
That did increase throughput.
...And index and foreign key constraint violations. (I am using Postgres, by the way.)
Then I got to thinking: databases are designed from the standpoint of holding and retrieving data, not for inserting it quickly. I mean, look at indexes, for goodness' sake--they LIVE to make inserting difficult. With Postgres, there is the option to relax foreign keys a bit (and this helps), but I have obviously run into a brick wall at this point, or maybe a bug in Postgres.
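For the record, here is a minimal sketch of what "relaxing" a foreign key can look like, assuming that means deferring the check until commit. The table names (parent, child) and the function name are made up for illustration, and sqlite3 stands in for Postgres so the sketch runs without a server; the DEFERRABLE INITIALLY DEFERRED clause itself is also valid Postgres syntax.

```python
import sqlite3


def load_out_of_order():
    """Insert a child row before its parent inside one transaction."""
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER NOT NULL"
        " REFERENCES parent(id) DEFERRABLE INITIALLY DEFERRED)"
    )
    conn.execute("BEGIN")
    # Parent 10 does not exist yet; a non-deferred constraint would reject this.
    conn.execute("INSERT INTO child (id, parent_id) VALUES (1, 10)")
    conn.execute("INSERT INTO parent (id) VALUES (10)")
    # The deferred check runs here, by which time the parent row is present.
    conn.execute("COMMIT")
    count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
    conn.close()
    return count


if __name__ == "__main__":
    print(load_out_of_order())
```

Note that deferring only changes when the check happens within a single transaction. If the parent row is still uncommitted on a different connection when the child's transaction commits, the check fails anyway, which may be part of why several loader threads keep tripping over each other.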
It is late at this point, and I don't feel like dealing with it any longer. I'll work on it some more later, but if I cannot come up with a solution, I am stuck with single-threaded database access.
Welcome to the 21st century.
28 August 2004
Database and Bulk-Loading
Posted by Gary Dusbabek at 23:15
Labels: technology