BTM
  1. BTM
  2. BTM-45

potential NPE in Recoverer.run

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.3.2
    • Fix Version/s: 1.3.3
    • Labels:
      None
    • Number of attachments :
      0

      Description

      Looking at this code in run()

                  Iterator it = ResourceRegistrar.getResourcesUniqueNames().iterator();
                  while (it.hasNext()) {
                      String name = (String) it.next();
                      registeredResources.put(name, ResourceRegistrar.get(name));
                  }
      

      Since ResourceRegistrar.getResourcesUniqueNames() returns not a view but a new HashSet, by the time ResourceRegistrar.get(name) is called that key/value pair may be gone and that get will return null.

        Activity

        Hide
        Nikita Tovstoles added a comment -

        Here's how I fixed the above locally:

        in Recoverer:

            public void run() {
                try {
                    committedCount = 0;
                    rolledbackCount = 0;
        
                    // Query resources from ResourceRegistrar and copy to registeredResources
                    //fix for BTM-45
                    registeredResources.putAll(ResourceRegistrar.getCopyOfResources());
        
                    // get list of in-flight transactions that should not be recovered and the list 
                    // of dangling journal records should be collected atomically to avoid race conditions
                    Map danglingRecords;
                    Set inFlightGtrids = new HashSet();
                    if (TransactionManagerServices.isTransactionManagerRunning()) {
                        Map inFlightTransactions = TransactionManagerServices.getTransactionManager().getInFlightTransactions();
                        synchronized (inFlightTransactions) {
                            inFlightGtrids = inFlightTransactions.keySet();
                            danglingRecords = TransactionManagerServices.getJournal().collectDanglingRecords();
                        }
                    }
                    else {
                        danglingRecords = TransactionManagerServices.getJournal().collectDanglingRecords();
                    }
        
                    // 1. call recover on all known resources
                    Set unrecoverableResourceNames = recoverAllResources(inFlightGtrids);
        
                    // 1.1. unregister unrecoverable resources
                    Iterator itUnrecoverable = unrecoverableResourceNames.iterator();
                    while (itUnrecoverable.hasNext()) {
                        String uniqueName = (String) itUnrecoverable.next();
                        if (log.isDebugEnabled()) log.debug("closing unrecoverable resource " + uniqueName);
                        XAResourceProducer producer = (XAResourceProducer) registeredResources.remove(uniqueName);
                        producer.close();
                    }
        
                    // 1.2. register a task to retry registration, if needed
                    if (isRetryUnrecoverableResourcesRegistrationEnabled()) {
                        int intervalInMinutes = TransactionManagerServices.getConfiguration().getRetryUnrecoverableResourcesRegistrationInterval();
                        TransactionManagerServices.getTaskScheduler().scheduleRetryUnrecoverableResourcesRegistration(new Date(System.currentTimeMillis() + intervalInMinutes * 60 * 1000));
                    }
                    else
                        if (log.isDebugEnabled()) log.debug("will not retry unrecoverable resources registration");
        
                    // 2. commit dangling COMMITTING transactions
                    Set committedGtrids = commitDanglingTransactions(inFlightGtrids, danglingRecords);
                    committedCount = committedGtrids.size();
        
                    // 3. rollback any remaining recovered transaction
                    rolledbackCount = rollbackAbortedTransactions(committedGtrids);
        
                    log.info("recovery committed " + committedCount + " dangling transaction(s) and rolled back " + rolledbackCount +
                            " aborted transaction(s) on " + registeredResources.size() + " resource(s) [" + getRegisteredResourcesUniqueNames() + "]" +
                            (isRetryUnrecoverableResourcesRegistrationEnabled() ? ", discarded " + unrecoverableResourceNames.size() + " unrecoverable resource(s) [" + buildUniqueNamesString(unrecoverableResourceNames) + "]": ""));
                    this.completionException = null;
                } catch (Exception ex) {
                    this.completionException = ex;
                    if (log.isDebugEnabled()) log.debug("recovery failed, registered resource(s): " + getRegisteredResourcesUniqueNames(), ex);
                }
                finally {
                    recoveredXidSets.clear();
                    registeredResources.clear();
                }
            }
        

        in ResourceRegistrar:

            //private static Map resources = new HashMap();
            final private static ConcurrentMap<String, XAResourceProducer> resources = new ConcurrentHashMap<String, XAResourceProducer>();
        
            //to address BTM-45
            public synchronized static Map<String, XAResourceProducer> getCopyOfResources() {
                return new HashMap<String, XAResourceProducer>(resources);
            }
        
        
        
        Show
        Nikita Tovstoles added a comment - Here's how I fixed the above locally: in Recoverer: public void run() { try { committedCount = 0; rolledbackCount = 0; // Query resources from ResourceRegistrar and copy to registeredResources //fix for BTM-45 registeredResources.putAll(ResourceRegistrar.getCopyOfResources()); // get list of in-flight transactions that should not be recovered and the list // of dangling journal records should be collected atomically to avoid race conditions Map danglingRecords; Set inFlightGtrids = new HashSet(); if (TransactionManagerServices.isTransactionManagerRunning()) { Map inFlightTransactions = TransactionManagerServices.getTransactionManager().getInFlightTransactions(); synchronized (inFlightTransactions) { inFlightGtrids = inFlightTransactions.keySet(); danglingRecords = TransactionManagerServices.getJournal().collectDanglingRecords(); } } else { danglingRecords = TransactionManagerServices.getJournal().collectDanglingRecords(); } // 1. call recover on all known resources Set unrecoverableResourceNames = recoverAllResources(inFlightGtrids); // 1.1. unregister unrecoverable resources Iterator itUnrecoverable = unrecoverableResourceNames.iterator(); while (itUnrecoverable.hasNext()) { String uniqueName = (String) itUnrecoverable.next(); if (log.isDebugEnabled()) log.debug("closing unrecoverable resource " + uniqueName); XAResourceProducer producer = (XAResourceProducer) registeredResources.remove(uniqueName); producer.close(); } // 1.2. register a task to retry registration, if needed if (isRetryUnrecoverableResourcesRegistrationEnabled()) { int intervalInMinutes = TransactionManagerServices.getConfiguration().getRetryUnrecoverableResourcesRegistrationInterval(); TransactionManagerServices.getTaskScheduler().scheduleRetryUnrecoverableResourcesRegistration(new Date(System.currentTimeMillis() + intervalInMinutes * 60 * 1000)); } else if (log.isDebugEnabled()) log.debug("will not retry unrecoverable resources registration"); // 2. commit dangling COMMITTING transactions Set committedGtrids = commitDanglingTransactions(inFlightGtrids, danglingRecords); committedCount = committedGtrids.size(); // 3. rollback any remaining recovered transaction rolledbackCount = rollbackAbortedTransactions(committedGtrids); log.info("recovery committed " + committedCount + " dangling transaction(s) and rolled back " + rolledbackCount + " aborted transaction(s) on " + registeredResources.size() + " resource(s) [" + getRegisteredResourcesUniqueNames() + "]" + (isRetryUnrecoverableResourcesRegistrationEnabled() ? ", discarded " + unrecoverableResourceNames.size() + " unrecoverable resource(s) [" + buildUniqueNamesString(unrecoverableResourceNames) + "]": "")); this.completionException = null; } catch (Exception ex) { this.completionException = ex; if (log.isDebugEnabled()) log.debug("recovery failed, registered resource(s): " + getRegisteredResourcesUniqueNames(), ex); } finally { recoveredXidSets.clear(); registeredResources.clear(); } } in ResourceRegistrar: //private static Map resources = new HashMap(); final private static ConcurrentMap<String, XAResourceProducer> resources = new ConcurrentHashMap<String, XAResourceProducer>(); //to address BTM-45 public synchronized static Map<String, XAResourceProducer> getCopyOfResources() { return new HashMap<String, XAResourceProducer>(resources); }
        Hide
        Ludovic Orban added a comment -

        I think this would solve the NPE but leave a race condition in the recovery: if a resource gets unregistered while the getCopyOfResources() method is running the recoverer might end up with an incorrect list of resources.

        Please also keep in mind that BTM 1.3.x still supports JDK 1.4 so your modifications including generics and ConcurrentHashMap cannot be included as-is. Even if I was convinced the issue is fixed I would still have to translate your code to JDK 1.4.

        Show
        Ludovic Orban added a comment - I think this would solve the NPE but leave a race condition in the recovery: if a resource gets unregistered while the getCopyOfResources() method is running the recoverer might end up with an incorrect list of resources. Please also keep in mind that BTM 1.3.x still supports JDK 1.4 so your modifications including generics and ConcurrentHashMap cannot be included as-is. Even if I was convinced the issue is fixed I would still have to translate your code to JDK 1.4.
        Hide
        Ludovic Orban added a comment -

        This issue has been fixed by synchronizing the resource names collection on the ResourceRegistrar's class. This will add more heat to an already know hot spot but this can be improved later on.

        Show
        Ludovic Orban added a comment - This issue has been fixed by synchronizing the resource names collection on the ResourceRegistrar's class. This will add more heat to an already know hot spot but this can be improved later on.

          People

          • Assignee:
            Ludovic Orban
            Reporter:
            Nikita Tovstoles
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: