Skip to content

Conversation

@chaokunyang
Copy link
Collaborator

What does this PR do?

add strip flag in meta string encoding spec

Related issues

#1540

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

@chaokunyang
Copy link
Collaborator Author

CI failure seems related to actions/setup-python#850

@chaokunyang chaokunyang force-pushed the add_strp_flag_in_meta_string branch from 049b1c0 to b67ad54 Compare April 24, 2024 11:47
Copy link
Member

@theweipeng theweipeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@qingoba
Copy link
Contributor

qingoba commented Apr 26, 2024

In MetaStringEncoder.java, we use 0x08:

boolean stripLastChar = bytes.length * 8 >= totalBits + bitsPerChar;
if (stripLastChar) {
bytes[0] = (byte) (bytes[0] | 0x80);
}

In MetaString.java, we use 0b1:

if (encoding != Encoding.UTF_8) {
  Preconditions.checkArgument(bytes.length > 0);
  this.stripLastChar = (bytes[0] & 0b1) != 0;
} else {
  this.stripLastChar = false;
}

These two are not consistent.

@LiangliangSui
Copy link
Contributor

In MetaStringEncoder.java, we use 0x08:

boolean stripLastChar = bytes.length * 8 >= totalBits + bitsPerChar;
if (stripLastChar) {
bytes[0] = (byte) (bytes[0] | 0x80);
}

In MetaString.java, we use 0b1:

if (encoding != Encoding.UTF_8) {
  Preconditions.checkArgument(bytes.length > 0);
  this.stripLastChar = (bytes[0] & 0b1) != 0;
} else {
  this.stripLastChar = false;
}

These two are not consistent.

Thanks for pointing out this issue, this is indeed a bug and our unit tests don't cover it.

0x80 should also be used in MetaString.

if (encoding != Encoding.UTF_8) {
  Preconditions.checkArgument(bytes.length > 0);
  this.stripLastChar = (bytes[0] & 0x80) != 0;
} else {
  this.stripLastChar = false;
}

Do you want to submit a PR to fix this issue and add unit tests? @qingoba

@qingoba
Copy link
Contributor

qingoba commented Apr 26, 2024

Sure, I will do this after I finish #1566

@chaokunyang
Copy link
Collaborator Author

In MetaStringEncoder.java, we use 0x08:

boolean stripLastChar = bytes.length * 8 >= totalBits + bitsPerChar;
if (stripLastChar) {
bytes[0] = (byte) (bytes[0] | 0x80);
}

In MetaString.java, we use 0b1:

if (encoding != Encoding.UTF_8) {
  Preconditions.checkArgument(bytes.length > 0);
  this.stripLastChar = (bytes[0] & 0b1) != 0;
} else {
  this.stripLastChar = false;
}

These two are not consistent.

The MetaStringDecoder parse the flag by itself, and didn't use this flag. Currently the org.apache.fury.meta.MetaString#stripLastChar is not called by other classes. So the fury serializaiton ut still succceed.

@qingoba
Copy link
Contributor

qingoba commented Apr 26, 2024

There is another question.
If input string is empty, encoded bytes is also empty:

public MetaString encode(String input, Encoding encoding) {
  Preconditions.checkArgument(
      input.length() < Short.MAX_VALUE, "Long meta string than 32767 is not allowed");
  if (input.isEmpty()) {
    return new MetaString(input, Encoding.UTF_8, specialChar1, specialChar2, new byte[0]);
  }
}

However, decode procedure doesn't check empty. It will access bytes[0] directly:

private String decodeLowerSpecial(byte[] data) {
  StringBuilder decoded = new StringBuilder();
  int totalBits = data.length * 8; // Total number of bits in the data
  boolean stripLastChar = (data[0] & 0x80) != 0; // Check the first bit of the first byte
}

So decode procedure should check empty.

@chaokunyang
Copy link
Collaborator Author

There is another question. If input string is empty, encoded bytes is also empty:

public MetaString encode(String input, Encoding encoding) {
  Preconditions.checkArgument(
      input.length() < Short.MAX_VALUE, "Long meta string than 32767 is not allowed");
  if (input.isEmpty()) {
    return new MetaString(input, Encoding.UTF_8, specialChar1, specialChar2, new byte[0]);
  }
}

However, decode procedure doesn't check empty. It will access bytes[0] directly:

private String decodeLowerSpecial(byte[] data) {
  StringBuilder decoded = new StringBuilder();
  int totalBits = data.length * 8; // Total number of bits in the data
  boolean stripLastChar = (data[0] & 0x80) != 0; // Check the first bit of the first byte
}

So decode procedure should check empty.

Hi @qingoba , do you have time to fix this today? We plan to start release fury v0.5.0 today and include this fix. If you don't have time, I can take over it and submit a fix later

@qingoba
Copy link
Contributor

qingoba commented Apr 27, 2024

I can fix this now.

chaokunyang pushed a commit that referenced this pull request Apr 27, 2024
…1587)

## What does this PR do?
Fix two bugs:
+ fix method `MetaString.stripLastChar()` logical error
+ add empty encoded bytes check in `MetaStringDecoder.decode()`


## Related issues
- #1565 


## Does this PR introduce any user-facing change?
- [No] Does this PR introduce any public API change?
- [No] Does this PR introduce any binary protocol compatibility change?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants