Skip to content

Conversation

@gaodayue
Copy link
Contributor

@gaodayue gaodayue commented Sep 6, 2019

The following error is encountered when testing V2 table.

Page checksum mismatch, actual=3372108349 vs expect=3375987274

The reasons are that HashUtil::crc_hash doesn't generate standard CRC-32 checksum and produces wrong checksum on vector of Slice.

I have changed V2 to use CRC-32C checksum, which is the de factor file checksum algorithm nowadays that is being used in storage systems like RocksDB, Ceph, etc.

@imay
Copy link
Contributor

imay commented Sep 6, 2019

@gaodayue

I have two questions about this PR.

  1. Even if HashUtils::crc_hash doesn't generate standard CRC-32 checksum, the checksums should be equal. Because all the checksums are compute by this function.
  2. we have already had a CRC-32 implementation in be/src/olap/utils.cpp. Can you unify this and your new implementation?

@gaodayue
Copy link
Contributor Author

gaodayue commented Sep 6, 2019

@imay comments inline

Even if HashUtils::crc_hash doesn't generate standard CRC-32 checksum, the checksums should be equal. Because all the checksums are compute by this function.

The real problem is that the result of crc_hash can't be used as the input of the next crc_hash call. That is, crc_hash({"hello ", "world"}, 0) != crc_hash({"hello"}, 0).

we have already had a CRC-32 implementation in be/src/olap/utils.cpp. Can you unify this and your new implementation?

olap_adler32 and olap_crc32 are both used in V1 segment, which we can't change because of compatibility issue. I think they can be removed when we're removing V1 related code.

@imay
Copy link
Contributor

imay commented Sep 6, 2019

@imay comments inline

Even if HashUtils::crc_hash doesn't generate standard CRC-32 checksum, the checksums should be equal. Because all the checksums are compute by this function.

The real problem is that the result of crc_hash can't be used as the input of the next crc_hash call. That is, crc_hash({"hello ", "world"}, 0) != crc_hash({"hello"}, 0).

we have already had a CRC-32 implementation in be/src/olap/utils.cpp. Can you unify this and your new implementation?

olap_adler32 and olap_crc32 are both used in V1 segment, which we can't change because of compatibility issue. I think they can be removed when we're removing V1 related code.

OK, we can do it later.

Copy link
Contributor

@imay imay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@imay imay merged commit 65dcabf into apache:master Sep 6, 2019
swjtu-zhanglei pushed a commit to swjtu-zhanglei/incubator-doris that referenced this pull request Jul 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants