Archiva / MRM-1097

Error 500 "too many open files"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.3
    • Fix Version/s: 1.3.3
    • Component/s: system
    • Labels: None
    • Environment: Linux RedHat, Tomcat 6 + Native Connector
    • Number of attachments: 1

      Description

      In our start script we have configured "ulimit -n 10000".
      But still, occasionally we get HTTP error 500 "java.io.FileNotFoundException ... (Too many open files)" on a Maven artifact request.

      I suppose the files are not properly closed after they are delivered.
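
      For illustration, a minimal sketch of the kind of leak being suspected here, with invented class and method names (this is not taken from the Archiva sources): if the stream used to deliver an artifact is not closed in a finally block, each request can leave a file descriptor open until the stream object is eventually garbage-collected.

      import java.io.*;

      // Hypothetical sketch of serving an artifact; not Archiva's actual code.
      public class ArtifactStreamer {

          // Leaky variant: the FileInputStream is never closed explicitly,
          // so its descriptor stays open until finalization.
          void serveLeaky(File artifact, OutputStream response) throws IOException {
              InputStream in = new FileInputStream(artifact);
              copy(in, response);
          }

          // Correct variant: the stream is always closed, even when copy() fails.
          void serveClosed(File artifact, OutputStream response) throws IOException {
              InputStream in = new FileInputStream(artifact);
              try {
                  copy(in, response);
              } finally {
                  in.close();
              }
          }

          private void copy(InputStream in, OutputStream out) throws IOException {
              byte[] buffer = new byte[8192];
              int read;
              while ((read = in.read(buffer)) != -1) {
                  out.write(buffer, 0, read);
              }
          }
      }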

      Attachments

      1. MRM-1097.patch (54 kB) - Maria Catherine Tan


          Activity

          Brett Porter added a comment -

          Marc, is it the same error as in MRM-1100?

          Marc Lustig added a comment -

          I can only say the exception looks different.
          But we seem to have fixed it at the OS level.

          Peter Lawrey added a comment - edited

          Do you know when 1.2-M2 will be released so I can test this?
          It is unusable as it is; see MRM-1103. I can reproduce the error in under 1 minute.

          BTW: I found that most of the fds left open were sockets rather than files. So hopefully this has been fixed too.

          Lou S added a comment -

          Can you elaborate on the OS fix? We have the same issue, but what we did was crank our FDs up to 65K and it seems to have gone away, presumably because it gives Archiva more room to breathe. We also changed the Apache socket timeout in front of our Archiva servers from 0 (never) to 30 minutes. The former seems a bit much considering the default is 1024 on RHEL. Thoughts?

          Brett Porter added a comment -

          I've been running HTTP requests from Maven, some that are proxying and hitting the web interface, and following along in dtrace and checking lsof - all the FDs are getting closed, and likewise the sockets. This is with trunk.

          I'll try hitting this with hundreds of proxy requests at a time and see if that produces some results.

          Brett Porter added a comment -

          Would you be able to test this with 1.2-M2 when it is released and see if it is still reproducible?

          Lou S added a comment -

          We only have one environment to test right now but I could certainly give it a try when a GA release is available.

          Ben Lidgey added a comment -

          We get this happening on almost a daily basis running Archiva 1.2 on Tomcat 5.5.17.

          Maria Odea Ching added a comment -

          Does this still occur in the latest release (1.2.2)?

          Fredrik Jonson added a comment -

          Marc, did you confirm that the "ulimit -n 10000" command was successfully executed?

          Normally the nofile hard limit, configured in /etc/security/limits.conf on most Linux distributions, regulates how far ulimit may raise the limit in the current shell. An ordinary user cannot raise the nofile limit beyond what is specified in limits.conf, and when nofile isn't explicitly set there, the default is usually 1024.
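
          One way to confirm from inside the running JVM whether the raised limit actually took effect is to log the open and maximum file descriptor counts. This is a diagnostic sketch only, not part of Archiva; it relies on the Sun/Oracle JDK's com.sun.management.UnixOperatingSystemMXBean, which is available on Unix-like systems:

          import java.lang.management.ManagementFactory;
          import java.lang.management.OperatingSystemMXBean;
          import com.sun.management.UnixOperatingSystemMXBean;

          // Diagnostic sketch; requires a Sun/Oracle JDK on a Unix-like OS.
          public class FdUsage {
              public static void main(String[] args) {
                  OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
                  if (os instanceof UnixOperatingSystemMXBean) {
                      UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
                      // If the maximum is still 1024, the ulimit call in the start script did not take effect.
                      System.out.println("open fds: " + unix.getOpenFileDescriptorCount());
                      System.out.println("max fds:  " + unix.getMaxFileDescriptorCount());
                  } else {
                      System.out.println("File descriptor counts are not available on this platform/JVM.");
                  }
              }
          }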

          Marc Lustig added a comment -

          Hi Fredrik,
          I call the ulimit command in the inetd script, which is itself called by user root.

          Brent N Atkinson added a comment -

          Marc, are you using LDAP authentication/authorization when you experience this? I just patched my Archiva servers because they would halt when downloading a large number of artifacts. It turned out to be caused by LDAP connection leaks. I have fixed the leaks and can now download hundreds/thousands of artifacts without hitting the file descriptor limit. If you think you might be experiencing this, check out http://jira.codehaus.org/browse/REDBACK-185. I'll be submitting a patch soon.

          Marc Lustig added a comment -

          Hi Brent, we did not use LDAP authentication.

          Steve Pugh added a comment - edited

          We also get this. Our max file handles is set to the default 1024.
          At the time the error was occurring, I did a
          ls -l /proc/PROCESSID/fd/
          and confirmed there were 1024 file handles open. It also showed that the vast majority of these were sockets.

          So then I looked at the outgoing connections for this process using netstat.
          I found there were about 800 connections stuck in the CLOSE_WAIT state. These were connections to the addresses of the external repositories we have configured in Archiva. Could it be possible that these connections are not being closed correctly in certain circumstances?

          (Additional info: our Archiva is embedded in Tomcat 6.0.16)
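
          Sockets stuck in CLOSE_WAIT generally mean the remote repository closed its end but the local side never closed the socket. If the HTTP layer in use is Commons HttpClient 3.x, the classic cause is a response body that is never fully consumed together with a connection that is never released. A hedged sketch of the safe pattern follows (this is not Archiva's actual proxy code, and the URL is a placeholder):

          import java.io.InputStream;
          import org.apache.commons.httpclient.HttpClient;
          import org.apache.commons.httpclient.methods.GetMethod;

          // Illustration of the consume-and-release pattern with Commons HttpClient 3.x.
          public class ProxyFetchSketch {
              public static void main(String[] args) throws Exception {
                  HttpClient client = new HttpClient();
                  GetMethod get = new GetMethod("http://repo.example.org/some/artifact.pom");
                  try {
                      int status = client.executeMethod(get);
                      System.out.println("HTTP " + status);
                      InputStream body = get.getResponseBodyAsStream();
                      if (body != null) {
                          try {
                              // In real code the body would be copied to the local repository;
                              // here it is simply drained so the connection can be reused.
                              byte[] buf = new byte[8192];
                              while (body.read(buf) != -1) {
                                  // discard
                              }
                          } finally {
                              body.close();
                          }
                      }
                  } finally {
                      // Without this, the underlying socket can linger in CLOSE_WAIT
                      // until the connection manager or the GC gets around to it.
                      get.releaseConnection();
                  }
              }
          }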

          Brett Porter added a comment -

          Thanks for the additional information. I believe we've checked that angle before, but I'll take another look.

          Tim Fulmer added a comment -

          This was passed to me after asking around about the "too many open files" error our Archiva server gives after running for about a week. Could this be what we're seeing?

          http://coldfused.blogspot.com/2007/02/mystery-of-too-many-open-files.html

          Andreyev Dias de Melo added a comment -

          @Tim Fulmer I guess we are hitting this error too, but our error isn't the './././.' one; ours involves deleted indexer cfs files. We opened MRM-1419 to track it.

          TIA!

          Maria Odea Ching added a comment -

          Here's the related community discussion: http://old.nabble.com/MRM-1097---too-many-open-file-td29856351.html

          Maria Catherine Tan added a comment -

          I found some streams that were not closed, so I attached a patch for that.

          I haven't reviewed the indexing part yet.
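
          For context, the usual shape of a fix for unclosed streams is to close each stream in a finally block, often via Commons IO's IOUtils.closeQuietly so a failing close cannot mask the original exception. The class below is a generic, hypothetical illustration of that idiom, not the contents of MRM-1097.patch:

          import java.io.File;
          import java.io.FileInputStream;
          import java.io.IOException;
          import java.io.InputStream;
          import org.apache.commons.io.IOUtils;

          // Generic close-in-finally illustration; names are invented.
          public class ChecksumFileReader {
              public static String readChecksum(File file) throws IOException {
                  InputStream in = new FileInputStream(file);
                  try {
                      // Read the whole file as a string and trim trailing whitespace.
                      return IOUtils.toString(in).trim();
                  } finally {
                      // Always runs, so the descriptor is released even if reading fails.
                      IOUtils.closeQuietly(in);
                  }
              }
          }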

          Brett Porter added a comment -

          Hi Maria,

          I haven't looked closely, but the ones for the tests might certainly speed them up.

          The only relevant change I can see is the Checksum one, but that is closed by the caller of the update. Do you know if this makes a difference? Are you sure none of these cause something to be closed twice?

          Gwen Harold Autencio added a comment -

          From what I have checked, this happens when doing a deployment to the repository while a repo scan is running on it.
          I think Archiva creates an IndexingContext for every file that is being added to the repository, and every IndexingContext may open streams for its IndexWriter and IndexReader.

          Maria Odea Ching added a comment -

          After reviewing the code for the repo scanning and indexing, these are the different scenarios where an indexing task may be executed:

          • When a repository scan is triggered, the beginScan(..) method of all the repository consumers is called. The indexing context is created in the beginScan(..) method of the indexer-consumer, where the index reader/writer is opened. For each artifact processed by the indexer-consumer, an indexing task is created and queued in the ArchivaIndexingTaskExecutor, which calls the Nexus indexer engine and indexes the artifact. Once the repository scan is finished, the completeScan(..) method of all the repository consumers is called. The indexing context created at the start by the indexer-consumer is then queued as an indexing task in the ArchivaIndexingTaskExecutor, but this time the task is for closing the context.
          • For each artifact deployment, and for each artifact fetched from the proxies, the RepositoryContentConsumers.executeConsumers(..) method is called. This method does the same thing as the repository scanner: it calls the beginScan(..) method of all repo consumers, processes the deployed/proxied artifact, and then calls the completeScan(..) method. This means that an indexing context is created for every deployed/proxied artifact, which in turn results in an open index reader/writer while the indexing task for the artifact is still queued.
          • It's also possible that a repo scan has already finished but the indexing tasks for the artifacts in the repo are still queued. This means that the indexing context is still open, because it is only closed once all the indexing tasks for the artifacts in the repo have finished executing. Since the repo scan has finished, a user could trigger another repository scan, or the repo scanning schedule could fire again, creating and opening yet another indexing context.

          If these three scenarios happen at the same time, especially when the repository being scanned is a large one, the "Too many open files" error is bound to happen.
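
          A simplified sketch of the lifecycle described above, and of one way to keep the number of simultaneously open contexts bounded; all class and method names below are invented for illustration and do not match Archiva's real ArchivaIndexingTaskExecutor or consumer code. Instead of opening a context before the task is queued and closing it only when a later "finish" task drains the queue, the context can be opened and closed inside the task execution itself:

          import java.util.LinkedList;
          import java.util.Queue;

          // Illustrative only; not Archiva's actual classes.
          class IndexingLifecycleSketch {

              static class Context {
                  void open()  { /* opens a Lucene IndexReader/IndexWriter, i.e. file descriptors */ }
                  void index(String artifact) { /* add the artifact to the index */ }
                  void close() { /* releases the reader/writer and their descriptors */ }
              }

              // Problematic pattern: a context is opened per deployed/proxied artifact
              // before the task is queued, so every queued task holds open descriptors.
              static void queueWithEagerContext(Queue<Runnable> queue, final String artifact) {
                  final Context ctx = new Context();
                  ctx.open(); // stays open for as long as the task sits in the queue
                  queue.add(new Runnable() {
                      public void run() {
                          ctx.index(artifact);
                          // ctx is only closed later, by a separate "finish" task
                      }
                  });
              }

              // Bounded pattern: the context is created and closed inside the task,
              // so only the currently running task holds descriptors.
              static void queueSelfContainedTask(Queue<Runnable> queue, final String artifact) {
                  queue.add(new Runnable() {
                      public void run() {
                          Context ctx = new Context();
                          ctx.open();
                          try {
                              ctx.index(artifact);
                          } finally {
                              ctx.close();
                          }
                      }
                  });
              }

              public static void main(String[] args) {
                  Queue<Runnable> queue = new LinkedList<Runnable>();
                  queueSelfContainedTask(queue, "org.example:demo:1.0");
                  while (!queue.isEmpty()) {
                      queue.poll().run();
                  }
              }
          }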

          Maria Odea Ching added a comment -

          I made some fixes for this in trunk -r1039004. These are the changes:

          • upgraded the Nexus Indexer to 3.0.1 and added the logger bean needed by NexusIndexer
          • for deployments/proxied artifacts, the indexing context is now created and closed when the indexing task is executed, as opposed to the original behavior where the context was created before the task was queued (so as the indexing task queue built up, more and more indexing contexts were being opened, which might be causing the too many open files problem)
          • updated the unit tests for these changes

          Brett Porter added a comment -

          Would this apply cleanly to 1.3.x? I'd like to get a 1.4 out soon, but it'd be nice to address some of the key performance concerns in 1.3.x if this helps.

          Maria Odea Ching added a comment - edited

          I checked the code of the classes that were changed in the fix I committed; they're mostly the same in 1.3.x and trunk. But we can't directly merge the changes using svn merge -c ... because some of the classes (like the IndexingTaskExecutor and the consumers) have been relocated to a different module in trunk.

          Brett Porter added a comment -

          I'll try to backport the fix.

          Elie Delorme added a comment - edited

          I just received this error on our Archiva standalone 1.3.4 installation, which we upgraded hoping that it would resolve the problem.
          The "Too many open files" error appeared after only 3 hours of uptime.

          2011-04-13 20:55:22,444 [WrapperSimpleAppMain] INFO  org.apache.maven.archiva.web.startup.Banner  - _________________________
                                    __________________________________
                         /\_       /                                  \
                        /`/@),    |  On behalf of all of the alpacas   |
                        |  (~'  __| toiling away on the Apache Archiva |
                _,--.___/  |    \      project team, I would like to   |
              ,' ,     (   |     \         welcome you to Archiva      |
              |  (      \  /      |               1.3.4                |
               \  )\_/  ,_/       |                                    |
               / /   ( |/         |     http://archiva.apache.org/     |
              ( |    ( |          |     users@archiva.apache.org       |
               \|     \|           \__________________________________/
          
          
          2011-04-14 00:09:43,639 [btpool0-8] WARN  org.apache.maven.archiva.proxy.DefaultRepositoryProxyConnectors  - Transfer error from repository "terracotta" for artifact org.hibernate:ejb3-persistence:1.0.2.GA::pom, continuing to next repository. Error message: Download failure on resource [http://www.terracotta.org/download/reflector/releases/org/hibernate/ejb3-persistence/1.0.2.GA/ejb3-persistence-1.0.2.GA.pom]:Error transferring file: Too many open files (cause: java.net.SocketException: Too many open files)
          
          Brett Porter added a comment -

          Elie, as this issue is closed, can you file a new one against 1.3.4? There may be an additional cause, or it may require tuning of your limits.

          Manfred Schürhoff added a comment -

          Hi,

          We're running Archiva 1.3.5 under Tomcat 7.0.4 and still run into the "Too many open files" issue. We cleared out the local Tomcat m2 repository, and soon after starting a build we get this error.
          We've also updated the /etc/security/limits.conf file for the tomcat user to:

          tomcat soft nofile 65535
          tomcat hard nofile 65535

          but without any luck.

          Is there any solution to this issue?

          Brett Porter added a comment -

          Manfred, it's probably best to pursue this through the users@ list, or a new issue if you have a way to reproduce it. This issue was about a specific thing that was fixed some time back, and we'll lose track of comments here since it is closed. Thanks!


            People

            • Assignee: Maria Odea Ching
            • Reporter: Marc Lustig
            • Votes: 6
            • Watchers: 5
