While investigating fixes for #8682 I discovered that rbByteEncode does not properly handle the case of source and destination encoding being the same.
As described in #8682 (comment):
IOOutputStream calls RubyIO.write and passes only the incoming bytes, offsets, and encoding:
|
@Override |
|
public void write(final byte[] b,final int off, final int len) throws IOException { |
|
ThreadContext context = runtime.getCurrentContext(); |
|
|
|
RubyIO realIO = this.realIO; |
|
if (realIO != null) { |
|
realIO.write(context, b, off, len, encoding); |
|
} else { |
|
IRubyObject io = this.io; |
|
writeAdapter.call(context, io, io, RubyString.newStringLight(runtime, new ByteList(b, off, len, encoding, false))); |
|
} |
|
} |
This path was added as an optimization in 236f7ba to avoid constructing string objects just to immediately unwrap them for IO writes.
Unfortunately it hits a bug eventually in EncodingUtils.rbByteEncode where two identical encodings will not no-op, but instead will trigger an encoding error because it rejects encoding from and to the same encoding.
The short-circuit checks here:
|
if (encoding.isAsciiCompatible() && to.isAsciiCompatible()) { |
|
if (cr == StringSupport.CR_7BIT) { |
|
return null; |
|
} |
|
} else if (encodingEqual(sname, dname)) { |
|
return null; |
|
} |
...were never updated when the sister logic for RubyString was updated in 621369c. The correct logic, with additional refactoring, looks like this:
|
if (senc != null && senc == denc) { |
|
return strTranscodeScrub(context, forceEncoding, str, ecflags, ecopts, result, explicitlyInvalidReplace, denc, senc); |
|
} else if (is7BitCompat(str, denc, senc)) { |
|
return result.apply(context, str, denc, str); |
|
} else if (encodingEqual(sname, dname)) { |
|
if (forceEncoding.isNil()) denc = null; |
|
return result.apply(context, str, denc, str); |
|
} |
Ignoring the problems with IOOutputStream only being able to specify a single encoding for all String writes, we need to fix this issue in rbByteEncode and ensure same-encoding calls no-op the same way as rbStrEncode.
While investigating fixes for #8682 I discovered that rbByteEncode does not properly handle the case of source and destination encoding being the same.
As described in #8682 (comment):
IOOutputStream calls RubyIO.write and passes only the incoming bytes, offsets, and encoding:
jruby/core/src/main/java/org/jruby/util/IOOutputStream.java
Lines 152 to 163 in 14ec399
This path was added as an optimization in 236f7ba to avoid constructing string objects just to immediately unwrap them for IO writes.
Unfortunately it hits a bug eventually in EncodingUtils.rbByteEncode where two identical encodings will not no-op, but instead will trigger an encoding error because it rejects encoding from and to the same encoding.
The short-circuit checks here:
jruby/core/src/main/java/org/jruby/util/io/EncodingUtils.java
Lines 983 to 989 in 844c151
...were never updated when the sister logic for RubyString was updated in 621369c. The correct logic, with additional refactoring, looks like this:
jruby/core/src/main/java/org/jruby/util/io/EncodingUtils.java
Lines 1094 to 1101 in 844c151
Ignoring the problems with IOOutputStream only being able to specify a single encoding for all String writes, we need to fix this issue in rbByteEncode and ensure same-encoding calls no-op the same way as rbStrEncode.