extension-bolt: simple taproot channels (feature 80/81) #995
Roasbeef wants to merge 19 commits into lightning:master
Some things that came up in meatspace discussions:
|
|
I think the commit_sig should contain the sender's "remote nonce" and the revoke_and_ack contain the sender's "local nonce". Also since funding_locked will be sent repeatedly with scid-alias when that is merged and deployed, then there should probably be language to define that the nonces are only sent the first time? |
|
let's try to pick naming conventions for nonces that don't make me cry over the asymmetry |
|
Some points: This interacts with the 2-of-3 goal of @moneyball . If one participant uses a 2-of-3 and owns ALL 3 keys, then it is fine and we can just have MuSig2 with both channel endpoints. But the 2-of-3 goal is that one channel endpoint is really a nodelet-like setup: there is one sub-participant with 2 keys and another "server" participant with 1 key, a la GreenWallet. This requires composable MuSig2. Now I think composable MuSig2, if it can be proven safe, just requires two -- This interacts with VLS as well @ksedgwic . The nonce A similar technique may also be useful for the server in the 2-of-3 of @moneyball; rather than maintain a state for each channel of each client, the client could store the per-channel |
|
So I talked to @jonasnick, and as I understand it, we can work with just two |
|
Re recursive musig2: I'm gonna give the implementation a shot (outside the LN context, just the musig-within-musig) just to double check my assumptions re not needing to modify the (revised) nonce exchange flow. |
|
i made a pull request on this pull request with script fixes |
Why not the revocation key? When I publish an old state, the remote party can claim my output and HTLCs with the key path, but not their own output, and they also have to wait a block. If we set the internal key to the revocation key, it gives the remote party more privacy: nobody on chain can see which outputs were to_local and to_remote (and HTLCs, if they are swept along). It is also more consistent with the other outputs, which already have the revocation key as internal key. It is also cheaper (or gets a higher fee rate for the same amount of sats): this only requires a signature from a taptweaked revocation key (65) instead of a signature (65), the script (36) and the control block (34) (incl. length prefix).
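The size comparison in the comment above can be checked with a few lines of arithmetic. This is a minimal sketch that only uses the byte counts quoted there (65, 36 and 34, each including its length prefix); the dictionary and function names are illustrative and do not come from the spec.

```python
# Witness item sizes in bytes, as quoted in the comment (length prefixes included).
KEY_PATH = {"signature": 65}
SCRIPT_PATH = {"signature": 65, "script": 36, "control_block": 34}


def witness_bytes(items):
    """Total serialized size of the listed witness items."""
    return sum(items.values())


def saved_vbytes(script_path, key_path):
    """Witness bytes cost 1 weight unit each, and 4 weight units make 1 vbyte."""
    return (witness_bytes(script_path) - witness_bytes(key_path)) / 4


print(witness_bytes(KEY_PATH))               # 65
print(witness_bytes(SCRIPT_PATH))            # 135
print(saved_vbytes(SCRIPT_PATH, KEY_PATH))   # 17.5
```

Under these numbers the key-path sweep saves 70 witness bytes per input, which is the "cheaper" claim made above.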
|
#995 (comment) makes it impossible for outside observers to identify the to_remote output in case of a revoked commitment.

If there are some HTLCs on it that are long expired and the second stage is broadcast (like in the fee siphoning attack), the funds go to the local delayed pubkey + relative timelock. Outside observers can now see which output was the to_local one: just search for the output of an HTLC 2nd stage tx in the commitment transaction.

Example ctx: 15c262aeaa0c5a44e9e5f25dd6ad51b4162ec4e23668d568dc2c6ad98ae31023 (testnet). The transaction with the expired HTLC reveals the to_local output. (It is already revealed by the script, but this wouldn't be the case with a revoked taproot ctx.)

This can be fixed by tweaking the local delayed pubkey with the hash of

EDIT: no secret is needed, instead a taptweak-like tweak can be done. Everywhere a local delayed pubkey is used, it is tweaked with

For clarity: HTLC outputs that send funds to the local delayed pubkey use a tweaked local delayed pubkey where the output index of the HTLC output on the commitment transaction is used, not the HTLC success or timeout tx
this would preserve privacy, but you'd also need to do this for the |
hmmm true, so it is either privacy, with no key reuse or no utxo set bloat. btw another idea about anchors and less utxo set bloat: |
The |
if this is a problem B can tweak the key before using it without A even knowing (also A has to do this because the lexicographically smaller key is tweaked), but i dont think it is, lnd uses a separate bip32 tree for this (separate from the wallet) (btw without taproot funding pubkeys were revealed every time a channel was closed) |
afaict the algorithm in this bip is generalized for 32 byte pubkeys and more than 2 signers, the 'simple' musig2 with the pubkey's with the parity bit known looks like this equation i used, btw i got it here https://github.com/t-bast/lightning-docs/blob/master/schnorr.md#musig2 |
instagibbs
left a comment
Some old comments I forgot to submit
Most recent comment is noting that partial sigs are 32 bytes, so this needs to be defined explicitly somewhere, since signature types seem to assume 64 (I may have missed it).
nvm this wouldn't work because keys are only revealed when swept without signature to make this problem somewhat easier i suggest to remove the now that the this special case can of course be seen from both sides:
even more rare: revocation no anchor keys are revealed here because with the revocation key the taproot key path is used. i don't know to make anchor sweepable in this case long story short:
Questions/feedback welcome! |
|
@t-bast addressed your comments, and also added JSON test vectors! The vectors are just pasted at the very end, if we think it's too large, I can move them into another file. |
t-bast
left a comment
I have a couple of nits to fix remaining inconsistencies in option_simple_close extensions.
t-bast
left a comment
I think there are a few mistakes in the test vectors, this is worth investigating further.
Remove the redundant nSequence paragraph from the cooperative close section. The `option_simple_close` spec in BOLT #2 already specifies that `nSequence` MUST be exactly `0xFFFFFFFD`, so restating it here with the weaker "less-than-or-equal" phrasing was both redundant and inconsistent. Also fix "next closee nonce" to "closer nonce" in the JIT nonce description for `closing_complete`, matching the terminology used throughout the rest of the spec.
…ions

Update the embedded JSON test vectors to match the corrected lnd test vector generator output. The key changes are:

The `remote_partial_sig` fields now contain actual 32-byte MuSig2 partial signature scalars instead of the dummy 8-byte DER stub `3006020100020100` that was leaking through from the zeroed `CommitSig` field. Each test case also now includes `local_nonce` and `remote_nonce` fields (66-byte compressed public nonces) so that verifiers can reconstruct the combined MuSig2 signature independently.

The HTLC resolution transactions have been corrected in two ways: the `remote_partial_sig_hex` values are now correctly mapped to their transactions using BIP 69 output index ordering (fixing the invalid HTLC-timeout signatures that eclair reported), and the HTLC-success witness stacks now include the preimage and redeem script in the correct witness slots.

The "commitment tx with some HTLCs trimmed" test case now uses dust_limit=2500 instead of 546, which properly exercises zero-fee HTLC trimming by pushing the three smallest test HTLCs (1000, 2000, 2000 sats) below the dust threshold, reducing the HTLC count from 5 to 2.
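The trimming arithmetic in this commit message can be reproduced with a short sketch. It assumes zero-fee HTLC transactions, so an HTLC is kept exactly when its amount is at least the dust limit. The two larger amounts (3000 and 4000 sats) are an assumption taken from the standard BOLT 3 test HTLCs; the commit message itself only names the three smallest ones. The function name is hypothetical.

```python
def untrimmed_htlcs(htlc_amounts_sat, dust_limit_sat, htlc_tx_fee_sat=0):
    """Return the HTLC amounts that still get an output on the commitment tx.

    With zero-fee HTLC transactions the second-level fee is 0, so the result
    depends only on the dust limit, not on the commitment fee rate.
    """
    return [a for a in htlc_amounts_sat if a - htlc_tx_fee_sat >= dust_limit_sat]


# Assumed test HTLC set: 1000, 2000, 2000 are named in the commit message,
# 3000 and 4000 are assumed from the BOLT 3 appendix vectors.
htlcs = [1000, 2000, 2000, 3000, 4000]

print(len(untrimmed_htlcs(htlcs, 546)))    # 5
print(untrimmed_htlcs(htlcs, 2500))        # [3000, 4000]
```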
Add detailed explanatory sections to the test vectors appendix so that implementers can reproduce the vectors without needing to reverse-engineer the generator code. The new documentation covers:

- The complete set of key derivation labels used with the `SHA256(seed || label)` pattern, so implementations can derive all private keys from the seed independently.
- MuSig2 nonce generation: the deterministic signing randomness labels (`"local-signing-rand"` / `"remote-signing-rand"`) that produce reproducible nonces. Each test case now includes `local_nonce` and `remote_nonce` fields so implementations can cross-check their nonce derivation before attempting signature verification.
- The distinction between commitment transaction signatures (32-byte MuSig2 partial sig scalars using a keyspend path) and HTLC resolution signatures (64-byte Schnorr signatures using a tapscript spend path).
- The BIP 69 output index ordering convention for `htlc_descs` entries, matching the order that `htlc_signature` messages are exchanged during the commitment protocol.
- The full taproot witness stack layout for both HTLC-success (5 elements including preimage) and HTLC-timeout (4 elements) transactions.
- The zero-fee HTLC trimming semantics, explaining why trimming depends solely on the dust limit rather than the fee rate.
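The `SHA256(seed || label)` pattern mentioned above can be sketched as follows. The two labels are the ones quoted in the commit message; the seed value and the ASCII encoding of the label are assumptions made for illustration, and the real seed and the full label set are defined in the test vector appendix.

```python
import hashlib


def derive_from_seed(seed: bytes, label: str) -> bytes:
    """Derive 32 bytes as SHA256(seed || label).

    Assumption: the label is appended as raw ASCII bytes with no separator.
    """
    return hashlib.sha256(seed + label.encode("ascii")).digest()


# Hypothetical seed, for illustration only.
example_seed = bytes(32)

local_rand = derive_from_seed(example_seed, "local-signing-rand")
remote_rand = derive_from_seed(example_seed, "remote-signing-rand")

print(local_rand.hex())
print(remote_rand.hex())
```

Because each value depends only on the seed and the label, two implementations that agree on both can cross-check every derived key before attempting any signature verification.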
bolt-simple-taproot.md
Outdated
* `htlc_timeout`:
    ```
    <local_delayedpubkey> OP_CHECKSIG
    <to_self_delay> OP_CHECKSEQUENCEVERIFY OP_DROP
    ```
The prod tapscript variants that use OP_CHECKSIGVERIFY instead of OP_CHECKSIG ... OP_DROP leave the time-lock value as the final (and only) stack element. Script success then depends on that value being non-zero (since 0 is falsy in Bitcoin script).
This affects three scripts:
- to_delay_script: `<key> OP_CHECKSIGVERIFY <to_self_delay> OP_CHECKSEQUENCEVERIFY` - final stack is [to_self_delay]
- Accepted HTLC timeout: `<key> OP_CHECKSIGVERIFY 1 OP_CSV OP_VERIFY <cltv_expiry> OP_CLTV` - final stack is [cltv_expiry]
- 2nd-level HTLC outputs: `<key> OP_CHECKSIGVERIFY <delay> OP_CSV` - final stack is [delay]
Note that to_remote_script is not affected since it hardcodes 1 OP_CHECKSEQUENCEVERIFY, which always leaves a truthy [1] on the stack.
In practice this is safe, since to_self_delay must be positive per BOLT 2 negotiation, CLTV expiries are always future block heights, and CSV delays are always > 0. But unlike the legacy OP_CHECKSIG ... OP_DROP scripts where the final stack element was always the signature check result (1), these scripts have an implicit invariant that the time-lock parameters must be non-zero for the script to succeed.
Should we add a brief note in the spec calling out this invariant? Something like:
Note: because these scripts use OP_CHECKSIGVERIFY (which consumes the boolean result) followed by a terminal OP_CHECKSEQUENCEVERIFY or OP_CHECKLOCKTIMEVERIFY (which leaves its argument on the stack), the time-lock value serves as the final truthy stack element. Implementations MUST ensure these values are non-zero.
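The stack-semantics point can be traced with a toy interpreter. This is a sketch only: it assumes every signature check and every timelock check passes, models just the handful of opcodes used in these scripts, and applies the tapscript rule that exactly one truthy element must remain. The function name and the placeholder strings are hypothetical.

```python
def run_tapscript(script, witness=("<sig>",)):
    """Toy evaluation: returns (success, final_stack).

    Assumes signatures are valid and timelocks are satisfied; only the
    resulting stack shape is modelled.
    """
    stack = list(witness)
    for op in script:
        if op == "OP_CHECKSIG":
            stack.pop(); stack.pop()
            stack.append(1)              # pushes the boolean result
        elif op == "OP_CHECKSIGVERIFY":
            stack.pop(); stack.pop()     # consumes the boolean result
        elif op in ("OP_CHECKSEQUENCEVERIFY", "OP_CHECKLOCKTIMEVERIFY"):
            pass                         # leaves its argument on the stack
        elif op == "OP_DROP":
            stack.pop()
        elif op == "OP_VERIFY":
            if not stack.pop():
                return False, stack
        else:
            stack.append(op)             # data push
    return len(stack) == 1 and bool(stack[0]), stack


legacy = ["<key>", "OP_CHECKSIG", 144, "OP_CHECKSEQUENCEVERIFY", "OP_DROP"]
prod = ["<key>", "OP_CHECKSIGVERIFY", 144, "OP_CHECKSEQUENCEVERIFY"]
prod_zero = ["<key>", "OP_CHECKSIGVERIFY", 0, "OP_CHECKSEQUENCEVERIFY"]

print(run_tapscript(legacy))     # (True, [1])
print(run_tapscript(prod))       # (True, [144])
print(run_tapscript(prod_zero))  # (False, [0])
```

In the legacy form the final element is always the signature check result, while in the CHECKSIGVERIFY form it is the delay itself, so a zero delay would make the script fail.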
There is also a minor specification consistency issue previously pointed out by @sstone and currently under review in an LND PR by @gijswijs. The HTLC-Timeout second-level output description (under “HTLC-Timeout Transactions”) still shows the legacy OP_CHECKSIG … OP_DROP pattern, while the test vectors use OP_CHECKSIGVERIFY for both success and timeout second-level outputs. The prose likely needs updating to match the test vectors.
> Should we add a brief note in the spec calling out this invariant?
I think this is already explained in the accepted HTLCs section (https://github.com/Roasbeef/lightning-rfc/blob/simple-taproot-chans/bolt-simple-taproot.md#accepted-htlcs)?
> I think this is already explained in the accepted HTLCs section (https://github.com/Roasbeef/lightning-rfc/blob/simple-taproot-chans/bolt-simple-taproot.md#accepted-htlcs)?
That note explains why CSV is omitted from certain HTLC paths. It doesn't address the stack semantics point. The broader issue is that across all three CHECKSIGVERIFY + terminal timelock scripts, the timelock value itself ends up as the final stack element, so script success implicitly requires it to be non-zero. That invariant isn't called out anywhere currently.
It's not strictly necessary since these values are always non-zero in practice, but a brief note would help readers who are tracing the stack logic and wondering what the final truthy element is.
In this commit, we expand the MuSig2 Nonce Generation section of the test vector documentation to describe the newly added secret nonce fields (local_sec_nonce and remote_sec_nonce). Different MuSig2 implementations may produce different nonces from the same randomness input, so providing raw 97-byte secret nonces allows interop implementations to inject them directly rather than re-derive them. The documentation now clarifies that all nonces in a test case correspond to the same commitment transaction (the local party's commitment), explains the distinction between the local verification nonce and the remote JIT signing nonce, and provides step-by-step instructions for replaying MuSig2 signing from the test vectors.
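A quick sanity check on the nonce and signature fields can catch encoding mistakes before any MuSig2 math is attempted. The sketch below uses the field names and sizes quoted in this thread (66-byte public nonces, 97-byte secret nonces, 32-byte partial signatures). The split of a secret nonce into `k1 || k2 || pubkey` follows the BIP 327 layout and is an assumption about how the vectors are encoded; the helper names are hypothetical.

```python
EXPECTED_SIZES = {
    "local_nonce": 66,         # two 33-byte compressed points
    "remote_nonce": 66,
    "local_sec_nonce": 97,     # assumed layout: k1 (32) || k2 (32) || pubkey (33)
    "remote_sec_nonce": 97,
    "remote_partial_sig": 32,  # a scalar, not a 64-byte Schnorr signature
}


def wrong_size_fields(test_case):
    """Return the names of hex fields whose decoded length is unexpected."""
    return [
        name
        for name, size in EXPECTED_SIZES.items()
        if name in test_case and len(bytes.fromhex(test_case[name])) != size
    ]


def split_sec_nonce(sec_nonce_hex):
    """Split a 97-byte secret nonce into (k1, k2, pubkey), BIP 327 style."""
    raw = bytes.fromhex(sec_nonce_hex)
    if len(raw) != 97:
        raise ValueError("secret nonce must be 97 bytes")
    return raw[:32], raw[32:64], raw[64:]


# Dummy data with the right shapes, not real test vector values.
dummy_case = {"local_nonce": "02" * 66, "local_sec_nonce": "01" * 97,
              "remote_partial_sig": "00" * 64}
print(wrong_size_fields(dummy_case))   # ['remote_partial_sig']
```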
Update the embedded JSON test vectors to include the new local_sec_nonce and remote_sec_nonce fields, and correct the local_nonce values. The local_nonce now reflects the local party's verification nonce for their own commitment transaction (from LocalSession), rather than the JIT signing nonce for the remote party's commitment that was previously (incorrectly) used.
|
Gijs noticed what looks to be a typo (?) in the spec (I think incomplete search and replace when we modified the scripts). lnd uses the same scripts for the second level HTLC, but right now the spec has a combo: timeout: success: We have interop, so safe to assume we're using the uniform version (the Miniscript generated |
… form The HTLC-timeout second-level transaction script was using the non-prod form (OP_CHECKSIG + OP_CSV OP_DROP) while the HTLC-success script was already using the prod form (OP_CHECKSIGVERIFY + OP_CSV). Both scripts should be identical in structure since they share the same tapscript tree construction. This aligns the spec with the implementation which uses WithProdScripts() for all taproot channel scripts.
We went back and forth a ton re this in the past. The scripts were modified to be slightly more compatible with Miniscript, which generated this variant. Optimistically pushed a commit to fix this (note that the test vectors were generated assuming uniform scripts for success+timeout). |
Good catch, I missed that one! We're always using the first version in <local_delayedpubkey> OP_CHECKSIGVERIFY So it's likely indeed just an issue in the spec, not in the implementations. I think you forgot to push your fix @Roasbeef? |
Indeed! Just pushed. |
t-bast
left a comment
I'm getting a match on commitment transactions with a3ea39c, but I'm unable to generate the same HTLC signatures. Can you share how you're generating those signatures? I'm using deterministic schnorr sigs in eclair (not providing any auxrand data to secp256k1), are you by any chance using randomized signatures in lnd, which would make those test vectors non deterministic?
Update the embedded test vectors with HTLC signatures that use BIP-340 standard nonce derivation (zero auxrand) instead of RFC6979. This ensures HTLC second-level transaction signatures are reproducible across different Schnorr implementations (libsecp256k1, btcd, etc). Also document that implementations must use BIP-340 deterministic signing with zero auxrand to reproduce the HTLC signatures in the test vectors.
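For readers who want to see what "BIP-340 deterministic signing with zero auxrand" means in practice, here is a pure-Python sketch of the BIP-340 default signing algorithm with `aux_rand` fixed to 32 zero bytes. It is not the lnd or eclair code, it is slow and not constant-time, and it is only meant to show why two implementations that follow this derivation produce the same signature. Production code should use libsecp256k1 or an equivalent library.

```python
import hashlib

# secp256k1 field prime, group order and generator.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None                      # point at infinity
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1] % P, -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def point_mul(pt, k):
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result


def tagged_hash(tag, msg):
    t = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(t + t + msg).digest()


def schnorr_sign(msg32, seckey32, aux32=bytes(32)):
    """BIP-340 signing; aux32 defaults to zero for reproducible output."""
    d0 = int.from_bytes(seckey32, "big")
    assert 1 <= d0 < N
    pub = point_mul(G, d0)
    d = d0 if pub[1] % 2 == 0 else N - d0
    px = pub[0].to_bytes(32, "big")
    t = (d ^ int.from_bytes(tagged_hash("BIP0340/aux", aux32), "big")).to_bytes(32, "big")
    k0 = int.from_bytes(tagged_hash("BIP0340/nonce", t + px + msg32), "big") % N
    assert k0 != 0
    r = point_mul(G, k0)
    k = k0 if r[1] % 2 == 0 else N - k0
    rx = r[0].to_bytes(32, "big")
    e = int.from_bytes(tagged_hash("BIP0340/challenge", rx + px + msg32), "big") % N
    return px, rx + ((k + e * d) % N).to_bytes(32, "big")


def schnorr_verify(msg32, px, sig):
    x = int.from_bytes(px, "big")
    y_sq = (pow(x, 3, P) + 7) % P
    y = pow(y_sq, (P + 1) // 4, P)
    if x >= P or y * y % P != y_sq:
        return False
    pk = (x, y if y % 2 == 0 else P - y)
    r, s = int.from_bytes(sig[:32], "big"), int.from_bytes(sig[32:], "big")
    if r >= P or s >= N:
        return False
    e = int.from_bytes(tagged_hash("BIP0340/challenge", sig[:32] + px + msg32), "big") % N
    R = point_add(point_mul(G, s), point_mul(pk, N - e))
    return R is not None and R[1] % 2 == 0 and R[0] == r


msg = hashlib.sha256(b"example sighash").digest()   # placeholder message
key = (3).to_bytes(32, "big")                       # placeholder secret key
pubkey_x, sig = schnorr_sign(msg, key)
print(sig.hex(), schnorr_verify(msg, pubkey_x, sig))
```

Signing the same message twice with the same key and zero auxrand yields identical bytes, which is the property the updated test vectors rely on. A signer that uses RFC 6979 or fresh randomness for the nonce still produces valid signatures, but they will not match the vectors.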
|
@t-bast yeah so lnd uses a version of RFC 6979 for nonce generation. I'll modify the test vectors to use the BIP 340 EDIT: pushed up! |
t-bast
left a comment
ACK 13b2110, the test vectors match what eclair generates in ACINQ/eclair#3144 🎉
I think this is ready to go 🚀
This PR puts forth two concepts:
The extensions described in this document have purposefully excluded any gossip related changes, as there doesn't yet appear to be a predominant direction we'd all like to head in (nu nu gossip vs kick the can and add schnorr).
Most of the changes described here are pretty routine: use musig2 when relevant, and create simple tapscript trees to fold in areas where the script has multiple conditional paths. The main consideration with `musig2` is ofc: how to handle nonces. This document takes a very conservative stance, and simply proposes that all nonces be 100% ephemeral, and forgotten, even after a connection has been dropped. This has some non-obvious implications w.r.t. the retransmission flow. Beyond that, it's mostly: piggyback the set of nonces (4 public nonces total, since there're "two" messages) on a message to avoid having to add additional round trips.

The other "new" thing this adds is the generation/existence of a NUMS point, which is used to ensure that certain paths can only be spent via the script spend path (like the to remote output for the remote party, as this inherits anchor outputs semantics).
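To illustrate the general idea of a NUMS ("nothing up my sleeve") point, the sketch below hashes a public string to a candidate x-coordinate and retries with a counter until the value lies on secp256k1. This is a generic hash-and-lift construction shown under stated assumptions; it is not the derivation used by this proposal, and the seed string and function names are hypothetical. The intent is that nobody knows a private key for the resulting point, so an output that uses it as the taproot internal key can only be spent through a script path.

```python
import hashlib

P = 2**256 - 2**32 - 977   # secp256k1 field prime


def lift_x(x):
    """Return the even-y point with this x-coordinate, or None if off-curve."""
    if x >= P:
        return None
    y_sq = (pow(x, 3, P) + 7) % P
    y = pow(y_sq, (P + 1) // 4, P)
    if y * y % P != y_sq:
        return None
    return (x, y if y % 2 == 0 else P - y)


def nums_point(seed: bytes):
    """Hash seed || counter until the digest is a valid x-coordinate."""
    counter = 0
    while True:
        digest = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        point = lift_x(int.from_bytes(digest, "big"))
        if point is not None:
            return point, counter
        counter += 1


# Hypothetical public seed; anyone can re-run the derivation to audit it.
point, tries = nums_point(b"example NUMS seed")
print(hex(point[0]), tries)
```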
This is still marked as draft, as it's just barely to the point of being readable, and still has a lot of cleanups to be done w.r.t. notation, clarity, wording, and full specification.