GeoTools
  1. GeoTools
  2. GEOT-2000

Use batch of statements for big inserts

    Details

    • Type: Improvement Improvement
    • Status: Open Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 2.5.9
    • Component/s: jdbc
    • Labels:
      None

      Description

      When the FeatureStore.addFeatures(Colleciton) method is used try to use batch of statements to insert data in order to greatly speed up data transfer. This would allow for effiecient data copying towards a spatial db, especially when coupled with prepared statements.

        Activity

        Hide
        Christian Mueller added a comment -
        Very good idea to use batch statements but here is a warning.

        In my experience there are situations where batch statements fail. A logfile overflow on the db side can be a reason for this.

        Conclusion: Use batch statements where it is possible (btw, also works for non ps statements) but give the possibility to fall back
        to normal statement handling. This assures that we can handle big tables without running in trouble.
        Show
        Christian Mueller added a comment - Very good idea to use batch statements but here is a warning. In my experience there are situations where batch statements fail. A logfile overflow on the db side can be a reason for this. Conclusion: Use batch statements where it is possible (btw, also works for non ps statements) but give the possibility to fall back to normal statement handling. This assures that we can handle big tables without running in trouble.
        Hide
        Andrea Aime added a comment -
        Yeah, usually when I have to do big imports in a database independent way I use a mix of batches and multiple transactions exactly to avoid that kind of problem (of course, I need to be in a situation in which a partial import is an acceptable result). What I usually do is to batch statements in group of 100, and perform a commit every 1000-10000 items.
        Is the case in DB2 should be similar no? That is, in my experience what triggers the log issue is the big transaction, and not the fact that statements are being batched. Ah, but you say, in auto-commit mode the executeBatch call will be run in a single transaction no matter what.

        Anyways yes, I want to make the batch size configurable, just as the fetch size already is.
        Show
        Andrea Aime added a comment - Yeah, usually when I have to do big imports in a database independent way I use a mix of batches and multiple transactions exactly to avoid that kind of problem (of course, I need to be in a situation in which a partial import is an acceptable result). What I usually do is to batch statements in group of 100, and perform a commit every 1000-10000 items. Is the case in DB2 should be similar no? That is, in my experience what triggers the log issue is the big transaction, and not the fact that statements are being batched. Ah, but you say, in auto-commit mode the executeBatch call will be run in a single transaction no matter what. Anyways yes, I want to make the batch size configurable, just as the fetch size already is.
        Hide
        Christian Mueller added a comment -
        As I have experienced 2 weeks ago, batch statements do not work with auto generated keys. In this case, we have to do a fallback to non batch execution.
        Show
        Christian Mueller added a comment - As I have experienced 2 weeks ago, batch statements do not work with auto generated keys. In this case, we have to do a fallback to non batch execution.
        Hide
        Andrea Aime added a comment -
        So the DB2 jdbc driver is buggy? getGeneratedKeys should report back a list of the generated keys in the same order as the records were inserted.
        Show
        Andrea Aime added a comment - So the DB2 jdbc driver is buggy? getGeneratedKeys should report back a list of the generated keys in the same order as the records were inserted.
        Hide
        Christian Mueller added a comment -
        I found this fact here

        http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1470935&SiteID=1

        Ok, it is a M$ forum, I checked it with DB2 V 9.5 on my linux workstation (no M$ at all) .

        That is really true. If you use batchExecute the method getGeneratedKeys returns an empty ResultSet.

        For Oracle, look here

        http://kr.forums.oracle.com/forums/thread.jspa?threadID=505548&tstart=480

        So I think we have to implement the fallback.
        Show
        Christian Mueller added a comment - I found this fact here http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1470935&SiteID=1 Ok, it is a M$ forum, I checked it with DB2 V 9.5 on my linux workstation (no M$ at all) . That is really true. If you use batchExecute the method getGeneratedKeys returns an empty ResultSet. For Oracle, look here http://kr.forums.oracle.com/forums/thread.jspa?threadID=505548&tstart=480 So I think we have to implement the fallback.
        Hide
        Andrea Aime added a comment -
        Uag... I know that Hibernate uses batches of statements, so it's a good idea to go and check what they do. Ah, primary key wise, I already know, they never use getGenerateKeys(), they generate the key before hand by querying a sequence manually (and to avoid the overhead of the extra query, they have strategies to grab a value from the sequnece and use it to generate a number of keys, see for example http://www.hibernate.org/170.html)
        Show
        Andrea Aime added a comment - Uag... I know that Hibernate uses batches of statements, so it's a good idea to go and check what they do. Ah, primary key wise, I already know, they never use getGenerateKeys(), they generate the key before hand by querying a sequence manually (and to avoid the overhead of the extra query, they have strategies to grab a value from the sequnece and use it to generate a number of keys, see for example http://www.hibernate.org/170.html)

          People

          • Assignee:
            Andrea Aime
            Reporter:
            Andrea Aime
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated: