Skip to content

Bug Fix+Portability: Improve portability by integral division#136

Merged
jonjoliver merged 1 commit intotrendmicro:masterfrom
a4lg:more-portable-ratios
Oct 8, 2024
Merged

Bug Fix+Portability: Improve portability by integral division#136
jonjoliver merged 1 commit intotrendmicro:masterfrom
a4lg:more-portable-ratios

Conversation

@a4lg
Copy link
Copy Markdown
Contributor

@a4lg a4lg commented Feb 20, 2024

This is a continuation after the commit 755f31e ("3.12.0") which removed the dependency to the float log.

Once we find quartile values in TLSH, we perform a float division to get q1/q3 and q2/q3 in percentage (and taken the mod 16) but lacks portability (and has a bug which the committer explains later).

Note: q1 <= q2 <= q3.

If we have a 32-bit divider, a 64-bit / 32-bit division is not that a big cost (it's theoretically one instruction in x86 because the DIV instruction performs a 64-bit / 32-bit division, not 32-bit / 32-bit).

It also fixes the problem of arithmetic overflow calculating unsigned 32-bit (q1*100) and (q2*100) (if they overflow, the ratio will be an unexpected value).

Note:
(q1*100) and (q2*100) in u32 is clearly a bug (in a normal sense) but I'm not sure whether you consider this a "specification" (as a developer of ssdeep, a fuzzy hash implementation, I acknowledge that we should be very aware of the compatibility). If this is considered a bug, I consider this is a good chance to fix this issue along with improving the portability / reproducibility.

Condition (this change may slightly change the hash value):
q2 >= 167773 (or: q2 > (1 << 24) / 100) (due to the rough "percentage" metric, it's unlikely to change the hash in practice but I found (q1 or q2, q3) pairs that would change the hash value)

Condition (the arithmetic overflow will occur):
q2 >= 42949673 (or: q2 > 0xffffffff / 100)

This is a continuation after the commit 755f31e ("3.12.0")
which removed the dependency to the float log.

Once we find quartile values in TLSH, we perform a float division to
get q1/q3 and q2/q3 in percentage (and taken the mod 16) but lacks
portability (and has a bug which the committer explains later).

If we have a 32-bit divider, a 64-bit / 32-bit division is not that a
big cost (it's theoretically one instruction in x86 because the DIV
instruction performs a 64-bit / 32-bit division).

It also fixes the problem of arithmetic overflow calculating
unsigned 32-bit (q1*100) and (q2*100) (if they overflow, the ratio
will be an unexpected value).
@a4lg a4lg force-pushed the more-portable-ratios branch from d8f8506 to eb6f8c5 Compare February 24, 2024 03:16
@jonjoliver
Copy link
Copy Markdown
Collaborator

jonjoliver commented Sep 18, 2024

Hi @a4lg

good spot! :-)
ignore my previous comment - that I just edited.
@merces has straightened out the way that outside collaborators can maintain TLSH! :-)

Cheers jono

@jonjoliver jonjoliver mentioned this pull request Sep 19, 2024
@merces merces added this to the 4.12.1 milestone Sep 19, 2024
@merces merces added the bug label Sep 19, 2024
@jonjoliver jonjoliver merged commit 809279e into trendmicro:master Oct 8, 2024
jonjoliver pushed a commit to jonjoliver/tlsh that referenced this pull request Oct 8, 2024
<PRE>
08/10/2024
        Merge pull request trendmicro#146 - Remove call to sprintf() to avoid warnings
        Merge pull request trendmicro#141 - py_ext: use PyVarObject_HEAD instead of PyObject_HEAD_INIT
        Merge pull request trendmicro#138 - Build: Define default options only on "default"
        Merge pull request trendmicro#136 - Bug Fix+Portability: Improve portability by integral division
</PRE>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants