-
Notifications
You must be signed in to change notification settings - Fork 38.7k
refactor, docs: Embedded ASMap [2/3]: Refactor asmap internals and add documentation #33878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33878. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
2025-12-31 |
|
@sipa would love to get your eyes on the documentation commit before anyone else gets confused by the mistakes I might have made :) |
| @@ -21,140 +21,235 @@ | |||
| #include <utility> | |||
| #include <vector> | |||
|
|
|||
| /* | |||
| * ASMap (Autonomous System Map) Implementation | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the documentation changes in this commit look correct to me.
One thing that may be worth mentioning early on is that it's a bit-packed format, i.e., the entire compressed mapping is treated as a sequence of bits, packed together at 8 per byte, which are concatenated encodings of instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick feedback! Added a small paragraph on the bit-packed format in the introduction comment at the top.
93205dd to
9da0160
Compare
|
All the feedback from #28792 that applied to the commits here has been addressed here now. |
7a2bf8e to
eaf5e12
Compare
i'm sure you mean the first three commits? 😄 |
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed eaf5e12
Thanks for peeling off this PR as I suggested!
Yepp, fixed :) |
7de7c21 to
3011c84
Compare
|
Rebased since #33026 was merged 🥳 and addressed all comments. Thanks for the reviews! |
3011c84 to
b41f5a2
Compare
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed b41f5a29f7d56da8fd157770ad29b5776c3684c6
Sorry for not finding these earlier, getting closer to A-C-K.
b41f5a2 to
825d677
Compare
|
Addressed the latest review comments from @hodlinator , thanks for the detailed patch suggestions and rationale, co-authorship added! |
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 825d677
Operating on std::span<std::byte> rather than std::vector<bool>& enables for efficient use of asmap data embedded in the binary (#28792).
I have thoroughly reviewed and contributed patches for this refactoring up to the point of having a conflict of interest in getting it merged. :)
To build more confidence of the bool -> byte transition, I developed a C++ test with more readable asmap input data - cf34bb9 (https://github.com/hodlinator/bitcoin/tree/pr/33878_suggestions2 for interleaved branch). Not sure whether it's worth including in some form.
Thanks, looks like a very nice visualization at least. I will consider including it and addressing your other comments until I have to retouch here. If that doesn't happen there are follow-ups already planned where these could be included. Thanks again for the reviews :) |
| * * number of elements | ||
| * * for each element: index in the serialized "all new addresses" | ||
| * * asmap checksum | ||
| * * asmap version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit b89ffbc
Is version really clearer? In the asmap-data repository the entries seem to be versioned by their unix timestamp, that seems clearer to me and also conveys a temporal ordering between data dumps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the version/checksum and timestamps are two different things. My goal here was just to not have two different names (version and checksum) for the same thing in the code base, for clarity, greppability etc. The timestamps are important outside of core because we do these coordinated launches in the asmap-data project and the timestamp ties the data back to one of these events. It might be that there are other means of getting an asmap in the future where people don't care so much about the exact timestamp and they don't care about it, but that's just theoretical. The version here is a hash, so a data integrity check and I think that is important to have even though we currently only print it. We currently don't have a concept to get the timestamp of a file into bitcoin core. We could add a header to the file similar to what we have done with the assumeutxo dumps but even then the timestamp could be lying. Other than assumeutxo we don't have anything to compare and check against either the hash or the timestamp and I guess that's ok, or rather, unavoidable.
Sorry for rambling a bit. At second thought, maybe your point was rather that we should use checksum instead of version? I think I could get behind that if we say the timestamp is the "version" and the "checksum" is the integrity check :)
| * asmap data. | ||
| */ | ||
| const std::span<const std::byte> m_asmap; | ||
| std::vector<std::byte> m_loaded_asmap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit dc0c431
You mention that this dual ownership structure is done to avoid holding the asmap in memory twice. Where is this currently the case? Does that only happen in test code at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the case an earlier version of the code. I guess referencing something that isn't actually there anymore isn't something we should be doing in docs, even if it's implicitly. But in this case I think there should be some documentation in this regard so it clears up why we need m_asmap and m_loaded_asmap here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm more asking why keep both of them at all. Seems a bit too complicated to me. Requiring the map to be moved in seems good enough, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the next PR, where a default asmap data blob becomes part of the binary, we have a constant compile-time array for that default map. Loading it into an std::vector as well means doubling its memory usage (2x ~1.5 Mb). However, to support loading a custom asmap at startup, the vector (with ownership) is needed too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the next PR, where a default asmap data blob becomes part of the binary, we have a constant compile-time array for that default map. Loading it into an std::vector as well means doubling its memory usage (2x ~1.5 Mb). However, to support loading a custom asmap at startup, the vector (with ownership) is needed too.
825d677 to
5d65d9d
Compare
|
Addressed comments by @hodlinator and @sedited |
|
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Calculate the asmap version only in one place: A dedicated function in util/asmap. The version was also referred to as asmap checksum in several places. To avoid confusion call it asmap version everywhere.
This prevents holding the asmap data in memory twice. The version hash changes due to spans being serialized without their size-prefix (unlike vectors).
Also makes minor improvement on the python implementation documentation.
This aligns it more with SanityCheckAsmap and reduces variable scope. Also unify asmap casing in SanityCheckAsmap function name. Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
5d65d9d to
c5251f3
Compare
hodlinator
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK c5251f3
Changes since my first ACK (#33878 (review)) are only minor adjustments based off that review and another (#33878 (review)):
- Made
NetGroupManager-ctor temporarily take r-value - Avoided span-references
- Cleaned up whitespace across commits
sedited
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK c5251f3
This is a second slice carved out of #28792. It contains the following changes that are crucial for the embedding of asmap data which is added the following PR in the series (probably this will remain in #28792).
The changes are:
std::byteinstead of bitsspanrather than a vector in the asmap internal to prevent holding the asmap data in memory twiceThe first three commits were already part of #28792, the others are new.
The documentation commit came out of feedback gathered at the latest CoreDev. The primary input for the documentation was the documentation that already existed in the Python implementation (
contrib/asmap/asmap.py) but there are several other comments as well. Please note: I have also asked several LLMs to provide suggestions on how to explain pieces of the implementation and better demonstrate how the parts work together. I have copied bits and pieces that I liked but everything has been edited further by me and obviously all mistakes here are my own.