Skip to content

Conversation

@fjahr
Copy link
Contributor

@fjahr fjahr commented Nov 15, 2025

This is a second slice carved out of #28792. It contains the following changes that are crucial for the embedding of asmap data which is added the following PR in the series (probably this will remain in #28792).

The changes are:

  • Modernizes and simplifies the asmap code by operating on std::byte instead of bits
  • Unifies asmap version calculation and naming (previously it was called version and checksum interchangeably)
  • Operate on a span rather than a vector in the asmap internal to prevent holding the asmap data in memory twice
  • Add more extensive documentation to the asmap implementation
  • Unify asmap casing in implemetation function names

The first three commits were already part of #28792, the others are new.

The documentation commit came out of feedback gathered at the latest CoreDev. The primary input for the documentation was the documentation that already existed in the Python implementation (contrib/asmap/asmap.py) but there are several other comments as well. Please note: I have also asked several LLMs to provide suggestions on how to explain pieces of the implementation and better demonstrate how the parts work together. I have copied bits and pieces that I liked but everything has been edited further by me and obviously all mistakes here are my own.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 15, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33878.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK hodlinator, sedited

If your review is incorrectly listed, please copy-paste <!--meta-tag:bot-skip--> into the comment that the bot should ignore.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #34302 (fuzz: Restore SendMessages coverage in process_message(s) fuzz targets by maflcko)
  • #28792 (build: Embedded ASMap [3/3]: Build binary dump header file by fjahr)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • " # not, failure (represented by 0) is returned. If a default instruction has been executed before, instead" -> " # not, failure (represented by 0) is returned. If a default instruction has been executed before, instead of failure the default instruction's argument is returned." [The added comment line ends with "instead" leaving the sentence incomplete and therefore unclear; completing it restores the intended meaning.]

2025-12-31

@fjahr
Copy link
Contributor Author

fjahr commented Nov 15, 2025

@sipa would love to get your eyes on the documentation commit before anyone else gets confused by the mistakes I might have made :)

@@ -21,140 +21,235 @@
#include <utility>
#include <vector>

/*
* ASMap (Autonomous System Map) Implementation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the documentation changes in this commit look correct to me.

One thing that may be worth mentioning early on is that it's a bit-packed format, i.e., the entire compressed mapping is treated as a sequence of bits, packed together at 8 per byte, which are concatenated encodings of instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick feedback! Added a small paragraph on the bit-packed format in the introduction comment at the top.

@fjahr
Copy link
Contributor Author

fjahr commented Nov 16, 2025

All the feedback from #28792 that applied to the commits here has been addressed here now.

@fjahr fjahr force-pushed the 2025-11-asmap-slice-2 branch 2 times, most recently from 7a2bf8e to eaf5e12 Compare November 16, 2025 22:54
@laanwj
Copy link
Member

laanwj commented Nov 17, 2025

The first three PRs were already part of #28792, the others are new.

i'm sure you mean the first three commits? 😄

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed eaf5e12

Thanks for peeling off this PR as I suggested!

@fjahr
Copy link
Contributor Author

fjahr commented Nov 19, 2025

The first three PRs were already part of #28792, the others are new.

i'm sure you mean the first three commits? 😄

Yepp, fixed :)

@fjahr fjahr force-pushed the 2025-11-asmap-slice-2 branch 2 times, most recently from 7de7c21 to 3011c84 Compare November 19, 2025 15:20
@fjahr
Copy link
Contributor Author

fjahr commented Nov 19, 2025

Rebased since #33026 was merged 🥳 and addressed all comments. Thanks for the reviews!

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed b41f5a29f7d56da8fd157770ad29b5776c3684c6

Sorry for not finding these earlier, getting closer to A-C-K.

@fjahr
Copy link
Contributor Author

fjahr commented Nov 24, 2025

Addressed the latest review comments from @hodlinator , thanks for the detailed patch suggestions and rationale, co-authorship added!

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 825d677

Operating on std::span<std::byte> rather than std::vector<bool>& enables for efficient use of asmap data embedded in the binary (#28792).

I have thoroughly reviewed and contributed patches for this refactoring up to the point of having a conflict of interest in getting it merged. :)

To build more confidence of the bool -> byte transition, I developed a C++ test with more readable asmap input data - cf34bb9 (https://github.com/hodlinator/bitcoin/tree/pr/33878_suggestions2 for interleaved branch). Not sure whether it's worth including in some form.

@fjahr
Copy link
Contributor Author

fjahr commented Nov 25, 2025

To build more confidence of the bool -> byte transition, I developed a C++ test with more readable asmap input data - cf34bb9 (https://github.com/hodlinator/bitcoin/tree/pr/33878_suggestions2 for interleaved branch). Not sure whether it's worth including in some form.

Thanks, looks like a very nice visualization at least. I will consider including it and addressing your other comments until I have to retouch here. If that doesn't happen there are follow-ups already planned where these could be included. Thanks again for the reviews :)

* * number of elements
* * for each element: index in the serialized "all new addresses"
* * asmap checksum
* * asmap version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit b89ffbc

Is version really clearer? In the asmap-data repository the entries seem to be versioned by their unix timestamp, that seems clearer to me and also conveys a temporal ordering between data dumps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the version/checksum and timestamps are two different things. My goal here was just to not have two different names (version and checksum) for the same thing in the code base, for clarity, greppability etc. The timestamps are important outside of core because we do these coordinated launches in the asmap-data project and the timestamp ties the data back to one of these events. It might be that there are other means of getting an asmap in the future where people don't care so much about the exact timestamp and they don't care about it, but that's just theoretical. The version here is a hash, so a data integrity check and I think that is important to have even though we currently only print it. We currently don't have a concept to get the timestamp of a file into bitcoin core. We could add a header to the file similar to what we have done with the assumeutxo dumps but even then the timestamp could be lying. Other than assumeutxo we don't have anything to compare and check against either the hash or the timestamp and I guess that's ok, or rather, unavoidable.

Sorry for rambling a bit. At second thought, maybe your point was rather that we should use checksum instead of version? I think I could get behind that if we say the timestamp is the "version" and the "checksum" is the integrity check :)

* asmap data.
*/
const std::span<const std::byte> m_asmap;
std::vector<std::byte> m_loaded_asmap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit dc0c431

You mention that this dual ownership structure is done to avoid holding the asmap in memory twice. Where is this currently the case? Does that only happen in test code at the moment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was the case an earlier version of the code. I guess referencing something that isn't actually there anymore isn't something we should be doing in docs, even if it's implicitly. But in this case I think there should be some documentation in this regard so it clears up why we need m_asmap and m_loaded_asmap here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more asking why keep both of them at all. Seems a bit too complicated to me. Requiring the map to be moved in seems good enough, no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the next PR, where a default asmap data blob becomes part of the binary, we have a constant compile-time array for that default map. Loading it into an std::vector as well means doubling its memory usage (2x ~1.5 Mb). However, to support loading a custom asmap at startup, the vector (with ownership) is needed too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the next PR, where a default asmap data blob becomes part of the binary, we have a constant compile-time array for that default map. Loading it into an std::vector as well means doubling its memory usage (2x ~1.5 Mb). However, to support loading a custom asmap at startup, the vector (with ownership) is needed too.

@fjahr fjahr force-pushed the 2025-11-asmap-slice-2 branch from 825d677 to 5d65d9d Compare December 31, 2025 16:33
@fjahr
Copy link
Contributor Author

fjahr commented Dec 31, 2025

Addressed comments by @hodlinator and @sedited

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Task test max 6 ancestor commits: https://github.com/bitcoin/bitcoin/actions/runs/20623068864/job/59228690078
LLM reason (✨ experimental): CI failed due to uncommitted changes in the git index blocking the rebase step after a successful build.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

fjahr and others added 5 commits January 1, 2026 15:48
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Calculate the asmap version only in one place: A dedicated function in util/asmap.

The version was also referred to as asmap checksum in several places. To avoid confusion call it asmap version everywhere.
This prevents holding the asmap data in memory twice.

The version hash changes due to spans being serialized without their size-prefix (unlike vectors).
Also makes minor improvement on the python implementation documentation.
This aligns it more with SanityCheckAsmap and reduces variable scope.

Also unify asmap casing in SanityCheckAsmap function name.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
@fjahr fjahr force-pushed the 2025-11-asmap-slice-2 branch from 5d65d9d to c5251f3 Compare January 1, 2026 14:50
@maflcko maflcko removed the CI failed label Jan 1, 2026
Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK c5251f3

Changes since my first ACK (#33878 (review)) are only minor adjustments based off that review and another (#33878 (review)):

  • Made NetGroupManager-ctor temporarily take r-value
  • Avoided span-references
  • Cleaned up whitespace across commits

Copy link
Contributor

@sedited sedited left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c5251f3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants