Unmarshaled symbol has the wrong encoding #1329

@DavidEGrayson

JRuby behaves differently from MRI when unmarshaling symbols. The unmarshaled symbol always seems to have the US-ASCII encoding, even if it contains non-ASCII Unicode characters.

To reproduce this, I needed two separate scripts. (It seems that the state of JRuby's symbol table affects how Marshal.load behaves.)

In test1.rb, I have:

# coding: UTF-8
mu = 'µ'.to_sym
File.open('mu.dat', 'wb') { |f| f.write(Marshal.dump(mu)) }

In test2.rb, I have:

dump = File.open('mu.dat', 'rb') { |f| f.read }
p dump.bytes.to_a
mu = Marshal.load(dump)
puts mu.to_s.encoding

Here is the output I get from running these scripts, and also information about the versions of Ruby I am using:

$ jruby -v && jruby test1.rb && jruby test2.rb
jruby 1.7.9 (1.9.3p392) 2013-12-06 87b108a on Java HotSpot(TM) 64-Bit Server VM 1.7.0_07-b10 [Windows 8-amd64]
[4, 8, 73, 58, 7, 194, 181, 6, 58, 6, 69, 84]
US-ASCII
$ ruby -v && ruby test1.rb && ruby test2.rb
ruby 2.0.0p0 (2013-02-24) [x64-mingw32]
[4, 8, 73, 58, 7, 194, 181, 6, 58, 6, 69, 84]
UTF-8

From this we can see that JRuby and MRI marshal the symbol identically (the byte dumps match), but when JRuby unmarshals it, it sets the encoding to US-ASCII instead of UTF-8.
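To make the "same bytes" point concrete, here is a rough sketch that decodes the dump shown above by hand. It assumes the standard Marshal 4.8 layout for a symbol wrapped in an instance-variable record, where small positive integers are stored as the value plus 5. The helper name describe_symbol_dump is made up for illustration and only handles this one shape of dump.

```ruby
# Hypothetical helper, for illustration only: decodes a Marshal dump of the
# form  version, 'I', ':', length, name bytes, ivar count, ':', length, key, value.
def describe_symbol_dump(bytes)
  len     = bytes[4] - 5                  # small ints are stored as n + 5
  key_pos = 5 + len                       # index of the ivar count byte
  key_len = bytes[key_pos + 2] - 5
  {
    :version    => [bytes[0], bytes[1]],  # 4, 8
    :wrapper    => bytes[2].chr,          # "I" = object with instance variables
    :type       => bytes[3].chr,          # ":" = symbol
    :name       => bytes[5, len].pack('C*').force_encoding('UTF-8'),
    :ivar_count => bytes[key_pos] - 5,
    :ivar_key   => bytes[key_pos + 3, key_len].pack('C*'),  # "E" = encoding flag
    :ivar_value => bytes[key_pos + 3 + key_len].chr         # "T" = true, i.e. UTF-8
  }
end

info = describe_symbol_dump([4, 8, 73, 58, 7, 194, 181, 6, 58, 6, 69, 84])
p info
```

The trailing E/T pair is the flag that marks the symbol name as UTF-8, so the encoding information is present in the data that both implementations write.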

This issue came up because I am trying to use YARD to generate documentation for JRuby code that has special characters in a few method alias names. When I run "yard doc", the data about those methods is marshaled and written to disk, and when I run "yard server --reload" it is unmarshaled with the wrong encoding.

One workaround for this issue is to create a symbol with the proper encoding before running Marshal.load.
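A minimal sketch of that workaround, assuming the symbol names that need protecting are known in advance. The helper load_with_preinterned_symbols is hypothetical (it is not part of JRuby or YARD), and the dump is rebuilt from the byte list above instead of being read from mu.dat.

```ruby
# Hypothetical helper: intern each expected name as a UTF-8 symbol first,
# so the symbol already exists with the right encoding when Marshal.load runs.
def load_with_preinterned_symbols(dump, names)
  names.each { |name| name.encode('UTF-8').to_sym }
  Marshal.load(dump)
end

# Same bytes as printed by test2.rb above.
dump = [4, 8, 73, 58, 7, 194, 181, 6, 58, 6, 69, 84].pack('C*')
mu = load_with_preinterned_symbols(dump, ["\u00B5"])
puts mu.to_s.encoding
```

I have only described the effect on the JRuby version listed above; on MRI the extra step is harmless.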

Sorry if this is a duplicate. This could be related to the issue with symbol literal encoding that I just reported, #1328. I also see there is another open issue about method that is probably related to symbol encoding: #914.
