Maven Assembly Plugin
MASSEMBLY-543

Japanese filenames cannot be correctly assembled by maven-assembly-plugin

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5
    • Component/s: None
    • Labels:
      None
    • Environment:
      Windows XP SP3 Japanese, Eclipse Ganymede 3.4.2
    • Number of attachments :
      1

      Description

      I can successfully create the distribution zip file, but when I add a file whose name contains Japanese characters, the file is included in the zip file but its name is garbled.

      How to reproduce: add a file whose name contains Japanese characters to the files included in the distribution. I will provide a sample file.

        Issue Links

          Activity

          Eros Sy added a comment - - edited

          Oh! My attachment is also garbled.
          I will post the sample Japanese file name here.
          The content is not essential.

          I hope this will be supported.

          Japanese file name: 新規テキスト ドキュメント.xml

          Eros Sy added a comment -

          Has anybody encountered the same issue?
          If so, would you post here how you resolved it?

          Thanks in advance.

          Georg Berky added a comment - - edited

          I have similar issues with German umlauts.

          I found an article that seems to be related: https://blogs.oracle.com/xuemingshen/entry/non_utf_8_encoding_in

          Originally the ZIP standard required file names to be encoded in IBM Codepage 437, but the standard for JAR required them to be encoded in UTF-8. The Java classes to create ZIP files followed the JAR specification and used UTF-8, resulting in garbled characters, mojibake.

          The article mentions that the corresponding Java classes have new constructors now. I haven't read through the source, but is it possible that the plugin is still using the old constructors?
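
          The constructors mentioned above can be illustrated with plain JDK classes. The following is a minimal sketch, not the plugin's or plexus-archiver's code: it assumes Java 7 or later (where ZipOutputStream and ZipInputStream accept a Charset) and a JDK that ships the IBM850 and IBM866 charsets. The class and method names are hypothetical. It writes an entry name in one code page and reads it back with the same and with a different code page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; it only uses java.util.zip from the JDK.
class ZipNameCharsetDemo {

    /** Writes an in-memory ZIP with one empty entry whose name is encoded with nameCharset. */
    public static byte[] zipWithName(String entryName, Charset nameCharset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, nameCharset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads the first entry name, decoding the raw name bytes with nameCharset. */
    public static String firstEntryName(byte[] zipBytes, Charset nameCharset) {
        try (ZipInputStream zip =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes), nameCharset)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Gruesse.txt" with u-umlaut and sharp s, written as escapes so the
        // source file encoding does not matter.
        String name = "Gr\u00fc\u00dfe.txt";
        Charset cp850 = Charset.forName("IBM850"); // assumption: available in this JDK
        Charset cp866 = Charset.forName("IBM866"); // assumption: available in this JDK

        byte[] archive = zipWithName(name, cp850);

        // Reader and writer agree on the code page: the name survives.
        System.out.println("same charset:  " + firstEntryName(archive, cp850));
        // Reader assumes a different code page: the same bytes decode to other characters.
        System.out.println("other charset: " + firstEntryName(archive, cp866));
    }
}
```

          The point of the sketch is only that the name bytes in the archive carry no charset information of their own here, so writer and reader have to agree on the code page.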

          Dmitry Pelevin added a comment - - edited

          Try using plexus-archiver 2.4.4 as a dependency of the Maven Assembly Plugin. This version includes the fix for the LEF flag (PLXCOMP-222). Do not forget to run Maven with "MAVEN_OPTS=-Dfile.encoding=UTF-8" to force ZIP to use UTF-8 as its encoding.

          Markus KARG added a comment -

          I tried out plexus-archiver 2.4.4 with -Dfile.encoding=UTF-8, but German umlauts are still garbled in ZIP entry names.

          Markus KARG added a comment -

          Since the project lead has changed today, I'd like to ask the new project lead(s) to discuss this issue and post answers to the following questions here:

          • Is anyone already known to be planning a solution for this issue?
          • Is there a known problem that prevents a solution to this issue?
          • Are third-party contributions of a solution welcome, or do you want to provide a solution on your own?

          Depending on the answers, I might chime in and start working on a solution. But not if another person is already working on this, and not if the problem is already known to be unsolvable.

          Dmitry Pelevin added a comment -

          Markus, please try specifying the encoding for the assembly plugin as follows (and do not forget about "-Dfile.encoding=UTF-8"):

          <plugin>
              <artifactId>maven-assembly-plugin</artifactId>
              <version>2.4</version>
              <executions>
                  <execution>
                      <id>make-assembly</id>
                      <phase>package</phase>
                      <goals>
                          <goal>single</goal>
                      </goals>
                      <configuration>
                          <descriptors>
                              <descriptor>src/main/assembly/package-sources.xml</descriptor>
                              <descriptor>src/main/assembly/package.xml</descriptor>
                          </descriptors>
                          <archiverConfig>
                              <encoding>CP866</encoding>
                          </archiverConfig>
                      </configuration>
                  </execution>
              </executions>
              <dependencies>
                  <dependency>
                      <groupId>org.codehaus.plexus</groupId>
                      <artifactId>plexus-archiver</artifactId>
                      <version>2.4.4</version>
                      <exclusions>
                          <exclusion>
                              <groupId>org.codehaus.plexus</groupId>
                              <artifactId>plexus-component-api</artifactId>
                          </exclusion>
                      </exclusions>
                  </dependency>
              </dependencies>
          </plugin>
          

          Let me explain why this is required. ZIP is a fairly old format, and initially it knew nothing about Unicode encodings. The ZIP specification was later extended with Unicode support and the LEF (language encoding) flag. The problem is that not all software understands the LEF flag. It may surprise you, but Windows Explorer still does not understand it. As far as I know, there are no problems with LEF on Linux and Mac OS. Windows still reads ZIP files assuming that the file names use a non-Unicode encoding, namely the MS-DOS encoding corresponding to the user's locale. So if I want to see Cyrillic file names in ZIP files on Windows, I need to pack the ZIP using the CP866 encoding. I am not sure exactly, but I think you need to use the CP850 encoding to see file names containing characters from the German alphabet.

          If you set only the "-Dfile.encoding=UTF-8" flag without specifying the encoding as in my example, you will be able to see German umlauts using an archiver that understands the LEF flag; 7-Zip is one such archiver.

          I think this will solve all your problems with encoding, as it did for me.
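
          To make the LEF flag discussed above visible, here is a small sketch that inspects the general purpose bit flag of the first local file header. It is an illustration with plain JDK classes, not plexus-archiver code, and the class and method names are hypothetical. It assumes that the JDK's ZipOutputStream sets bit 11 when it is given UTF-8 and leaves it clear for other charsets, and that the IBM850 charset is available in the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; it only uses java.util.zip from the JDK.
class ZipLanguageEncodingFlagDemo {

    // Bit 11 of the general purpose bit flag marks entry names as UTF-8.
    private static final int LANGUAGE_ENCODING_FLAG = 1 << 11;

    /** Writes an in-memory ZIP with one empty entry whose name is encoded with nameCharset. */
    public static byte[] zipWithName(String entryName, Charset nameCharset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, nameCharset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Returns true if the first local file header has the language encoding flag set. */
    public static boolean hasLanguageEncodingFlag(byte[] zipBytes) {
        // Local file header layout: 4-byte signature, 2-byte "version needed",
        // then the 2-byte little-endian general purpose bit flag at offset 6.
        int flags = (zipBytes[6] & 0xFF) | ((zipBytes[7] & 0xFF) << 8);
        return (flags & LANGUAGE_ENCODING_FLAG) != 0;
    }

    public static void main(String[] args) {
        String name = "Gr\u00fc\u00dfe.txt"; // escapes avoid source-encoding issues

        byte[] utf8Archive = zipWithName(name, StandardCharsets.UTF_8);
        byte[] cp850Archive = zipWithName(name, Charset.forName("IBM850"));

        System.out.println("UTF-8 archive has flag: " + hasLanguageEncodingFlag(utf8Archive));
        System.out.println("CP850 archive has flag: " + hasLanguageEncodingFlag(cp850Archive));
    }
}
```

          A reader that ignores this bit has to guess the code page, which matches the behaviour described for Windows Explorer above; writing the names in the code page the reader expects avoids the guess.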

          Markus KARG added a comment -

          Thank you for this kind explanation. In fact, everything works like a charm using plexus-archiver 2.4.4, file.encoding, and CP850, even in Windows Explorer. And yes, it is true that 7-Zip shows umlauts even when using UTF-8 instead of CP850.

          On the other hand, I do not understand why the project leads do not simply add the needed dependency on plexus-archiver:2.4.4 by default and publish that fix as maven-assembly-plugin:2.4.1. Together with a short note on the plugin's web site that support for umlauts etc. is now available by simply providing file.encoding and <encoding>CP850</encoding>, that would be exactly the fix everybody wants to have.

          Regards
          -Markus

          Dmitry Pelevin added a comment - - edited

          I agree with you that it would be nice if they released a new version with plexus-archiver 2.4.4. But it would be even better if they also accepted the fix that lets the user specify the encoding for the TAR archiver, as can already be done for ZIP: https://github.com/sonatype/plexus-archiver/pull/11

          Kristian Rosenvold made changes -
          Field            Original Value    New Value
          Fix Version/s                      2.5 [ 18952 ]
          Kristian Rosenvold made changes -
          Link                               This issue depends upon PLXCOMP-253 [ PLXCOMP-253 ]
          Kristian Rosenvold made changes -
          Resolution                         Fixed [ 1 ]
          Assignee                           Kristian Rosenvold [ krosenvold ]
          Status           Open [ 1 ]        Closed [ 6 ]

            People

            • Assignee:
              Kristian Rosenvold
              Reporter:
              Eros Sy
            • Votes:
              3
              Watchers:
              4
