JETTY-1374

Hadoop's TaskTracker OOM after upgrade to jetty6

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 6.1.14
    • Fix Version/s: None
    • Component/s: Servlet
    • Labels:
      None
    • Environment:
      jetty 6.1.14
    • Number of attachments :
      0

      Description

      Our TaskTracker process runs out of memory (OOM) while serving map outputs. We dumped the heap and analyzed it with MAT. We found that one instance of "org.mortbay.thread.QueuedThreadPool" occupies 853,258,184 bytes (72.51%). This object contains a "java.lang.Runnable[]" with 7200 elements.

      The QueuedThreadPool of jetty6 owns an array of jobs. If an idle thread is available, a job is dispatched directly; otherwise the job is queued in the array. The initial size of the array is _maxThreads (40). When it is full, it grows to array.length + _maxThreads. Because this growth has no limit, the array can consume too much memory when there are many fetch requests from reduce tasks. So is this a jetty6 bug?
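
      The growth policy described above can be illustrated with a minimal sketch. This is a simplified model, not Jetty's actual QueuedThreadPool source: the class name GrowingJobQueue and its methods are hypothetical and exist only for illustration. It contrasts an array that grows by maxThreads slots whenever it is full with a bounded java.util.concurrent.ArrayBlockingQueue, which refuses new jobs once it is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Simplified, hypothetical model of the growth policy; not Jetty's real code. */
class GrowingJobQueue {
    private Runnable[] jobs;
    private int size;
    private final int maxThreads;

    GrowingJobQueue(int maxThreads) {
        this.maxThreads = maxThreads;
        this.jobs = new Runnable[maxThreads]; // initial capacity equals maxThreads
    }

    /** Always accepts the job; grows by maxThreads slots when full, with no upper limit. */
    void enqueue(Runnable job) {
        if (size == jobs.length) {
            Runnable[] bigger = new Runnable[jobs.length + maxThreads];
            System.arraycopy(jobs, 0, bigger, 0, size);
            jobs = bigger;
        }
        jobs[size++] = job;
    }

    int capacity() { return jobs.length; }

    int size() { return size; }

    public static void main(String[] args) {
        // Unbounded model: every queued job is retained, so memory use follows the backlog.
        GrowingJobQueue unbounded = new GrowingJobQueue(40);
        for (int i = 0; i < 7200; i++) {
            unbounded.enqueue(() -> {});
        }
        System.out.println("unbounded capacity: " + unbounded.capacity());

        // Bounded alternative: offer() returns false once the queue is full.
        BlockingQueue<Runnable> bounded = new ArrayBlockingQueue<>(40);
        int rejected = 0;
        for (int i = 0; i < 7200; i++) {
            if (!bounded.offer(() -> {})) {
                rejected++;
            }
        }
        System.out.println("bounded size: " + bounded.size() + ", rejected: " + rejected);
    }
}
```

      In this model, a backlog of 7200 queued jobs with maxThreads set to 40 leaves the array with 7200 slots, the same length as the array seen in the heap dump. A bounded queue trades that memory growth for rejected requests, which the caller then has to handle.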

        Activity

        Liyin Liang added a comment -

        Hi Thomas,
        Thanks for your suggestion. Because the Hadoop community still uses Jetty6, we can't upgrade to Jetty7 and face the risks alone.
        We have decided to build jetty based on 6.1.14, with the following patches to fix OOM bugs:
        JETTY-1157: Don't hold array passed in write(byte[]).
        JETTY-861: Switched buffer pools to ThreadLocal implementation.
        JETTY-1188: Null old jobs in QueuedThreadPool.

        Thomas Becker added a comment -

        Liyin, so is this a feasible workaround for you and we can close this issue here?

        Maybe we should push Hadoop to move to jetty7. Unless they've customized jetty, the migration shouldn't be too hard and is really worth it.

        Liyin Liang added a comment -

        Thomas, you can close this issue now.
        Many apps in our company have upgraded to jetty7. To push Hadoop to move to jetty7, we should first prove that it is stable enough and improves performance.

        Greg Wilkins added a comment -

        Liyin,

        If there are problems upgrading to 6.1.26, please do raise Jetty issues. However, we would be much more responsive and better able to assist if the effort was put into an upgrade to jetty-7.

        cheers

        Liyin Liang added a comment -

        Hi Greg,
        After upgrading to 6.1.26, our production cluster hit two problems:
        . MAPREDUCE-2529 ---> JETTY-1342
        . MAPREDUCE-2530 ---> JETTY-937

        Both of them cause fetch failures that delay the jobs. What's more, we don't know how to reproduce them. Another problem is that the checkpoint (which downloads about 10G of data through jetty) has become very slow.

        I'll try to do some testing with jetty7 in a Hadoop cluster. If it works well, we can raise Hadoop issues to push Hadoop to upgrade to jetty7.

        tks


          People

          • Assignee:
            Unassigned
            Reporter:
            Liyin Liang
          • Votes:
            0
            Watchers:
            2
