contrib: refactor: dedup deserialization routines in utxo-to-sqlite script #32116

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Closed

theStack wants to merge 1 commit into bitcoin:master from theStack:202503-contrib-utxo_to_sqlite-dedup_deser_functions

+10 −49

Contributor

theStack commented Mar 21, 2025

The compact-size/varint/amount deserialization routines already exist in the functional test framework (with the advantage of having proper unit tests executed in CI), so we can just import them from there instead of leaving duplicated code in the repository.


          contrib: dedup deserialization routines in utxo-to-sqlite script

f535b2f

These routines already exist in the functional test framework
(with the advantage of having proper unit tests executed in CI), so we
can just import them from there instead of leaving duplicated code
in the repository.

Contributor

DrahtBot commented Mar 21, 2025 •

edited

Loading

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32116.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	rkrux, ajtowns
Concept ACK	l0rinc, yancyribbens

If your review is incorrectly listed, please copy-paste  into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#32621 (contrib: utxo_to_sqlite.py: add option to store txid/spk as BLOBs by theStack)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot added the Scripts and tools label

rkrux reviewed

View reviewed changes

Contributor

rkrux left a comment

utACK f535b2f

I did realise there'd be some duplication when I previously supported adding and using these deserialisation functions in #31907 (comment), good that you found a way to get rid of the recently introduced duplication.

l0rinc suggested changes

View reviewed changes

Contributor

l0rinc left a comment

Concept ACK - I think we can do a bit more cleanup, though

contrib/utxo-tools/utxo_to_sqlite.py

    
              def decompress_script(f):

                  """Equivalent of `DecompressScript()` (see compressor module)."""

                  size = read_varint(f)  # sizes 0-5 encode compressed script types

                  size = deser_varint(f)  # sizes 0-5 encode compressed script types

Contributor

l0rinc Mar 24, 2025

👍, read_varint and deser_varint implementations match exactly (except for description, which we may move over there)

contrib/utxo-tools/utxo_to_sqlite.py

    
                          prevout_hash = f.read(32)[::-1].hex()

                          coins_per_hash_left = read_compactsize(f)

                      prevout_index = read_compactsize(f)

                          coins_per_hash_left = deser_compact_size(f)

Contributor

l0rinc Mar 24, 2025

Seems to me deser_compact_size does and extra little endian conversion - for a single byte, which seems redundant, we might as well remove it from there to match this implementation

contrib/utxo-tools/utxo_to_sqlite.py

    
              UTXO_DUMP_MAGIC = b'utxo\xff'

              UTXO_DUMP_VERSION = 2

Contributor

l0rinc Mar 24, 2025

Could we import the MAGIC_BYTES from test_framework.messages as well (we might have to extend it with Testnet4)?

Contributor Author

theStack Mar 27, 2025

Good find, will do if we end up still following the current approach of the PR (now less likely in light of the discussion triggered by #32116 (comment)).

contrib/utxo-tools/utxo_to_sqlite.py

    
                      height = code >> 1

                      is_coinbase = code & 1

                      amount = decompress_amount(read_varint(f))

                      amount = decompress_amount(deser_varint(f))

Contributor

l0rinc Mar 24, 2025

👍, decompress_amount implementations match exactly

contrib/utxo-tools/utxo_to_sqlite.py

    
              import sys

              import time

              sys.path.insert(0, os.path.join(os.path.dirname(__file__), '../../test/functional'))

Contributor

l0rinc Mar 24, 2025

this is a bit hacky, but I don't mind if we can dedup

Contributor

rkrux Mar 24, 2025

+1, I felt the same but I feel it's worth getting rid of duplication.

Contributor

l0rinc Mar 24, 2025

Though this means this utility file isn't as self-contained anymore (i.e. you can't just copy it out of the project - not even sure that was possible before)

Contributor

yancyribbens commented Mar 25, 2025

Concept ACK

Member

laanwj commented Mar 26, 2025

Concept ~0. i understand the reasoning but not sure whether this is worth it. This isn't a lot of code, and it's very well-defined code that isn't likely to change over time. Making the script depend on the test framework means that the test framework needs to maintain an interface for external use, something we don't want to encourage in general as it limits flexibility to change. Also this use itself isn't tested. If the functions move, the script breaks.

with the advantage of having proper unit tests executed in CI

My preferred solution there would be to add a CI test for this script instead.

Contributor

rkrux commented Mar 26, 2025 •

edited

Loading

#32116 (comment) contains good points to consider.

This isn't a lot of code, and it's very well-defined code that isn't likely to change over time. Making the script depend on the test framework means that the test framework needs to maintain an interface for external use, something we don't want to encourage in general as it limits flexibility to change.

Agree, it's not a lot of code and is unlikely to change over time. Also, agree that we don't want this to be encouraged in general.

If the functions move, the script breaks.

Some brittleness is indeed introduced. Given other points, I'd ask how likely it is for these utility functions to move in the functional test framework?

My preferred solution there would be to add a CI test for this script instead.

This script is a part of the CI tests via the functional tests here:

bitcoin/test/functional/tool_utxo_to_sqlite.py

Lines 105 to 108 in c0b7159

    
           base_dir = self.config["environment"]["SRCDIR"] 
        
           utxo_to_sqlite_path = os.path.join(base_dir, "contrib", "utxo-tools", "utxo_to_sqlite.py") 
        
           subprocess.run([sys.executable, utxo_to_sqlite_path, input_filename, output_filename], 
        
                          check=True, stderr=subprocess.STDOUT)

Contributor

rkrux commented Mar 26, 2025

One reason I find deduplicating here appealing is that these (de)compress/deserialize utility functions have quite a custom implementation with lack of comments/explanation. My preference would be to have them present once only instead of seeing the implementation in multiple places in the repo.

Member

laanwj commented Mar 26, 2025 •

edited

Loading

This script is a part of the CI tests via the functional tests here:

OK, good!

My preference would be to have them present once only instead of seeing the implementation in multiple places in the repo.

Right. A cleaner approach to deduplication would be introduce a python utility module that's used by both the test framework and the tools (but is in a third location), this makes it clear that there's an interface at least.

Contributor Author

theStack commented Mar 27, 2025

@laanwj: Fair points! My reasoning to consider this (somewhat hacky) "import code from test framework" approach as acceptable was the fact that we already do it in other contrib scripts as well (e.g. the signet miner), but yeah, that doesn't necessarily mean it's a very good idea.

My preference would be to have them present once only instead of seeing the implementation in multiple places in the repo.

Right. A cleaner approach to deduplication would be introduce a python utility module that's used by both the test framework and the tools (but is in a third location), this makes it clear that there's an interface at least.

Agree. Will give this a few more thoughts on how this could look like. Related discussion: #27432 (comment) #27432 (comment)

Member

laanwj commented Mar 27, 2025

Fair points! My reasoning to consider this (somewhat hacky) "import code from test framework" approach as acceptable was the fact that we already do it in other contrib scripts as well (e.g. the signet miner), but yeah, that doesn't necessarily mean it's a very good idea.

Yes, to be fair i've used that hack plenty of times in my own code outside of bitcoin core, the test framework can be very useful to import to do low-level stuff with bitcoin from Python.

It would be quite some work to separate "this is purely testing code and depending on it is dangerously brittle" parts from "this is useful for tooling", but useful for clarity. i don't insist it needs to be done in this PR, i'm not against merging this as-is.

Agree. Will give this a few more thoughts on how this could look like.

In the past we've also avoided it because calling it anything else than "functional testing framework" may make other projects decide to depend on it, limiting the flexibility for change required for our tests. But we can be clear that it's meant for internal use only, like pretty much everything under contrib/, so placing it under there may be a good choice.

Also coming up with a convenient name could be a challenge, everything in the combinatorial namespace of "bitcoin" "lib" "python" is very likely already used 😄

Contributor

davidgumberg commented Apr 17, 2025

Another interesting user of the functional test framework is the warnet tool: https://github.com/bitcoin-dev-project/warnet.

I suspect one challenge/tradeoff with an additional common framework/interface is that this would increase maintenance burden today with only a hypothetical future benefit, the utility framework might languish featureless and unmaintained and/or duplicating things more first-class external interfaces like the kernel do, and doing things the other way around trades a future hypothetical¹ maintenance burden for zero cost today.

The other problem is that I think part of the reason the test framework is so useful for doing things in general is that the functional tests set up incentives nicely for the framework to be versatile, useful, battle-tested, and get loads of review since basically every single Bitcoin Core contributor is a contributor to the test framework.

Maybe this is my naivety showing, but why couldn't Bitcoin Core just say, "Internal use only, do not use the library, if you do you are on your own to follow us, we won't apologize if we break you, or you lose funds, and we won't accept any bug reports or pull requests related to external users of this interface." and enforce that internal users of the framework like the /contrib scripts have CI checks so they don't get accidentally broken

Likely, but not certain. ↩

DrahtBot added CI failed and removed CI failed labels

DrahtBot mentioned this pull request

contrib: utxo_to_sqlite.py: add option to store txid/spk as BLOBs #32621

Open

achow101 requested review from Eunovo and ajtowns

October 22, 2025 14:40

ajtowns removed their request for review

October 30, 2025 15:43

Contributor

ajtowns commented Oct 30, 2025

Right. A cleaner approach to deduplication would be introduce a python utility module that's used by both the test framework and the tools (but is in a third location), this makes it clear that there's an interface at least.

Big agree on this, eager to review if someone wants to do it. https://github.com/jamesob/verystable or https://pypi.org/project/verystable/ is somewhat like this, except external. It would also be nice to export it onto pypi (or whatever) for easy quick-and-dirty python/bitcoin scripting.

Tested ACK f535b2f ; this PR looks good to me.

DrahtBot requested a review from l0rinc

October 30, 2025 15:45

Member

achow101 commented Jan 14, 2026

I'm unconvinced that contrib scripts should be importing anything from the test framework. It seems like it makes the scripts less useful if they are not easily portable.

Contributor Author

theStack commented Jan 16, 2026

Closing as per #32116 (comment) and #32116 (comment).

Still not a fan of code duplication, but I agree it's not a lot and that keeping this script self-contained arguably has merit.

Right. A cleaner approach to deduplication would be introduce a python utility module that's used by both the test framework and the tools (but is in a third location), this makes it clear that there's an interface at least.

Big agree on this, eager to review if someone wants to do it. https://github.com/jamesob/verystable or https://pypi.org/project/verystable/ is somewhat like this, except external. It would also be nice to export it onto pypi (or whatever) for easy quick-and-dirty python/bitcoin scripting.

+1 as well, feel free to ping me if someone wants to tackle that.

theStack closed this

l0rinc mentioned this pull request

Add utxo_to_csv.py tool #34324

Open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Scripts and tools