jira.codehaus.org

  • JRuby
  • JRUBY-688

Regexp and MatchData are terribly inefficient; examine all uses and make them as bare-metal to JRegex as possible


Details

  • Type: Bug
  • Status: Closed
  • Priority: Blocker
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: JRuby 1.1b1
  • Component/s: None
  • Labels:
    None

Description

Most internal consumers of Regexp and MatchData are using them wrong. They generally re-run the search for every match, when the matcher object created by java.util.regex (and now jregex) can just be told to advance to the next match. This makes performance very poor in many of these cases.
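
A minimal sketch of the difference described above, using java.util.regex rather than jregex. The class and method names (MatchScan, rescanEachTime, advanceMatcher) are hypothetical and are not taken from the JRuby source; the first method only approximates the re-searching pattern, and the second shows a single matcher being advanced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchScan {
    // Slow shape: create a new Matcher and restart the search for every match.
    static List<String> rescanEachTime(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        int pos = 0;
        while (pos <= input.length()) {
            Matcher m = pattern.matcher(input); // fresh matcher on each iteration
            if (!m.find(pos)) {
                break;
            }
            matches.add(m.group());
            // Continue after the match; step one char past an empty match
            // so the loop always makes progress.
            pos = m.end() > m.start() ? m.end() : m.end() + 1;
        }
        return matches;
    }

    // Faster shape: one Matcher, told to advance to the next match with find().
    static List<String> advanceMatcher(Pattern pattern, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }
}
```

Both methods return the same matches for a simple pattern such as \d+; the second avoids allocating and resetting a matcher per match.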

All consumers of Regexp and MatchData internally should be examined for improvements. In addition, any uses of Regexp and MatchData that are purely internal and never exposed to scripts should be replaced with direct usage of Pattern and Matcher.

This must be fixed by 0.9.9.

Activity

Charles Oliver Nutter added a comment - 04/Apr/07 12:30 AM

By 1.0. Perf isn't as high priority as correctness.

Thomas E Enebo added a comment - 30/Apr/07 5:22 PM

Bumping

Marcin Mielzynski added a comment - 12/May/07 12:44 PM

Optimizations pending... (with direct ByteList array access tuning and regexp caching)

Marcin Mielzynski added a comment - 23/May/07 9:50 AM

After r3650, only the #index, #[], #each_line, #[]=, #, #= and #match String methods don't use Matcher and Pattern directly; converting them won't be much work, though.
Moreover, there are some other String methods that could use the ByteList buffer more efficiently (also partially done in r3650).

Charles Oliver Nutter added a comment - 30/Sep/07 10:06 AM

Marcin: do you think there's more we can do for this one right now? If not, go ahead and close it, or perhaps open specific issues for specific cases. This one's lingered too long.

Marcin Mielzynski added a comment - 30/Sep/07 3:36 PM

Yeah, there are methods left that do not deal with Matcher/Pattern directly. I don't think it's worth changing them, though, especially since we'll have to rewire them with oni anyway (moreover, most of them should become simpler after that).

Charles Oliver Nutter added a comment - 30/Sep/07 4:46 PM

Most everything we can do has been done; truly clean, high-performing regexp is going to have to wait until Oniguruma is ported. Marking as fixed.


People

  • Assignee: Charles Oliver Nutter
  • Reporter: Charles Oliver Nutter

Dates

  • Created: 13/Mar/07 3:37 AM
  • Updated: 23/Apr/08 10:04 AM
  • Resolved: 30/Sep/07 4:46 PM