JRuby (please use github issues at http://bugs.jruby.org)
JRUBY-3732

UTF-8 characters don't pass through Hpricot gracefully after JRuby 1.1.6

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.2
    • Fix Version/s: None
    • Component/s: HelpWanted, Parser
    • Labels: None
    • Environment:
    • Number of attachments: 0

      Description

      UTF-8 characters no longer pass gracefully through Hpricot (after JRuby 1.1.6).

      The following code sample, tested with UTF-8 encoding, has an input string containing a Unicode em dash (U+2014):

      require 'rubygems'
      require 'hpricot'
      
      # U+2014 EM DASH, written as its three UTF-8 bytes
      input = "<p>TUCSON, Ariz. \342\200\224 The driver</p>"
      puts input
      
      doc = Hpricot.parse(input)
      
      puts doc.inner_html
      
      Here is comparative output:
      
      

      % ruby ./utf8_sample_2.rb
      <p>TUCSON, Ariz. The driver</p>
      <p>TUCSON, Ariz. The driver</p>
      % /opt/dist/jruby-1.1.6/bin/jruby ./utf8_sample_2.rb
      <p>TUCSON, Ariz. The driver</p>
      <p>TUCSON, Ariz. The driver</p>
      % /opt/dist/jruby-1.2.0/bin/jruby ./utf8_sample_2.rb
      <p>TUCSON, Ariz. The driver</p>
      <p>TUCSON, Ariz. €” The driver</p>
      % /opt/dist/jruby-1.3.0/bin/jruby ./utf8_sample_2.rb
      <p>TUCSON, Ariz. The driver</p>
      <p>TUCSON, Ariz. €” The driver</p>

      
      

      JRuby 1.2.0 and 1.3.0 print a mangled em dash (€”) in the parsed output.
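      The mangled "€”" is consistent with the em dash's three UTF-8 bytes (E2 80 94) being decoded as single-byte Windows-1252 characters somewhere between parsing and output; that diagnosis is an assumption, not something this ticket confirms. A minimal sketch reproducing the pattern and reversing it follows. It needs Ruby 1.9+ for String#encode and #force_encoding, which the Ruby 1.8 mode of JRuby 1.2/1.3 did not provide, and the variable names are illustrative only.

```ruby
# Sketch: reproduce the mojibake pattern from the report, assuming the
# em dash's UTF-8 bytes were misread as Windows-1252. Requires Ruby 1.9+.

em_dash = "\u2014" # U+2014 EM DASH

# The three UTF-8 bytes that encode U+2014.
hex = em_dash.bytes.map { |b| format("%02X", b) }.join(" ")
puts hex

# Relabel the raw bytes as Windows-1252 (no bytes change), then transcode
# to UTF-8 for display: each byte becomes its own character.
mangled = em_dash.dup.force_encoding("Windows-1252").encode("UTF-8")
puts mangled

# Undoing the mistake: transcode back to Windows-1252 bytes, then relabel
# those bytes as UTF-8, which recovers the original character.
repaired = mangled.encode("Windows-1252").force_encoding("UTF-8")
puts repaired == em_dash
```

      Running it prints E2 80 94, then â€”, then true. The "€”" in the JRuby 1.2.0 and 1.3.0 output matches the last two characters of that mangled sequence.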

        Activity

        Daniel Harrington added a comment -

        I'm currently testing the new version, but since it's not "officially" released yet, I'd also vote against closing the ticket now.

        Matthias Brandt added a comment -

        I go with Daniel...

        Daniel Hahn added a comment -

        So will this be fixed in the JRuby code, in Hpricot, or both? Seeing that _why (Hpricot's author) disappeared and someone else took over, how will the new Hpricot version be released?

        Charles Oliver Nutter added a comment -

        This is fixed in Hpricot, but I'm not sure who is responsible for releasing it. We need help to track that person down and get an Hpricot release out with the updated code.

        Nick Sieger added a comment -

        Fixed with new Hpricot 0.8.2 release.


          People

          • Assignee: Thomas E Enebo
            Reporter: David Kellum
          • Votes: 8
            Watchers: 6
