Cannot make two symbols with same bytes and different encodings #1348

@DavidEGrayson

I noticed that JRuby's symbol table cannot seem to store two different symbols that have the same bytes, even when their encodings differ.

Here is a script demonstrating the problem:

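# Intern the UTF-16 symbol first, then a plain symbol with the same bytes.
sym1 = "ab".force_encoding("UTF-16").to_sym
sym2 = "ab".to_sym
puts sym2.encoding  # expected: US-ASCII

# Same check with the order reversed.
sym3 = "cd".to_sym
sym4 = "cd".force_encoding("UTF-16").to_sym
puts sym4.encoding  # expected: UTF-16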

Here is the output from my shell. MRI reports the correct encodings for sym2 and sym4, while JRuby reports the wrong ones because a symbol with the same bytes already exists in the symbol table:

$ ruby -v && ruby test_symbol_table.rb
ruby 2.0.0p0 (2013-02-24) [x64-mingw32]
US-ASCII
UTF-16
$ source use_jruby_179.sh
$ jruby -v && jruby test_symbol_table.rb
jruby 1.7.9 (1.9.3p392) 2013-12-06 87b108a on Java HotSpot(TM) 64-Bit Server VM 1.7.0_45-b18 [Windows 8-amd64]
UTF-16
US-ASCII
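
To illustrate what I suspect is going on, here is a toy model of a symbol table. It is only a sketch and not JRuby's actual implementation: FakeSymbol, BytesOnlyTable, and BytesAndEncodingTable are hypothetical names made up for this example. A table keyed only on the raw bytes returns whichever entry was interned first, which matches the JRuby output above, while a table keyed on bytes plus encoding keeps the two entries separate.

```ruby
# Hypothetical stand-in for an interned symbol; not a real JRuby class.
FakeSymbol = Struct.new(:bytes, :encoding)

# Keyed on raw bytes only: the first encoding seen for those bytes wins.
class BytesOnlyTable
  def initialize
    @entries = {}
  end

  def intern(str)
    # String#b returns a binary copy, so the key ignores the encoding.
    @entries[str.b] ||= FakeSymbol.new(str.b, str.encoding)
  end
end

# Keyed on bytes plus encoding: same-byte strings stay distinct.
class BytesAndEncodingTable
  def initialize
    @entries = {}
  end

  def intern(str)
    @entries[[str.b, str.encoding]] ||= FakeSymbol.new(str.b, str.encoding)
  end
end

utf16 = "ab".dup.force_encoding(Encoding::UTF_16)
ascii = "ab".dup.force_encoding(Encoding::US_ASCII)

naive = BytesOnlyTable.new
naive.intern(utf16)
puts naive.intern(ascii).encoding  # prints UTF-16 (the stale entry)

keyed = BytesAndEncodingTable.new
keyed.intern(utf16)
puts keyed.intern(ascii).encoding  # prints US-ASCII
```

If the real table behaves like the first class, a fix would presumably need to take the encoding into account during lookup, though I have not confirmed that this is how JRuby's table is structured.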

I think I will work on making a pull request to fix this; advice and objections are welcome.

I have made some progress on issue #1329 (properly setting the encoding of unmarshaled symbols), but before I can truly succeed in fixing that, I think I need to fix this and maybe a few other fundamental things about symbols.
