Skip to content

docs(ADR): #113 Modular transaction hashing#2241

Merged
melekes merged 34 commits intomainfrom
anton/342-txi
Jun 20, 2024
Merged

docs(ADR): #113 Modular transaction hashing#2241
melekes merged 34 commits intomainfrom
anton/342-txi

Conversation

@melekes
Copy link
Collaborator

@melekes melekes commented Feb 5, 2024

RENDERED

Closes #342

@melekes melekes requested a review from a team as a code owner February 5, 2024 12:30
@melekes melekes requested a review from a team February 5, 2024 12:30
@melekes melekes changed the title feat(ADR): ADR-112: Modular transaction hashing docs(ADR): #112: Modular transaction hashing Feb 5, 2024
@melekes melekes changed the title docs(ADR): #112: Modular transaction hashing docs(ADR): #112 Modular transaction hashing Feb 5, 2024
@melekes melekes self-assigned this Feb 5, 2024
@adizere adizere added this to the 2024-Q1 milestone Feb 5, 2024
@adizere
Copy link
Contributor

adizere commented Feb 5, 2024

@itsdevbear @fedekunze @JayT106 super simple ADR, please have a look.

Copy link
Contributor

@adizere adizere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, thanks Anton!

I pinged a couple of users to gather some feedback on this, let's wait on their input before merging.

@cason

This comment was marked as resolved.

Copy link

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice write up.

I agree with the approach, but I think we should really restrict the scope of the proposed changes. I know, for instance, that the RPC endpoints use some weird way to "print" (i.e., representing as strings) transactions and their IDs/hashes. But this is a problem of String() method. That I am afraid that is not solved using this approach.

}
```

And then use it to calculate a transaction's hash.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific regarding that?

In the case of types/Tx, calculating the hash is hard-coded. How can we parameterize this?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case we want this change to apply to the RPC endpoints and, possibly, indexer is one thing. In this case we can use any customized hash function.

But if we want to adopt customized hash functions for our types, this is a completely different discussion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we want to adopt customized hash functions for our types, this is a completely different discussion.

I'd like for us to have both discussions.

@cason

This comment was marked as resolved.

@melekes melekes changed the title docs(ADR): #112 Modular transaction hashing docs(ADR): #113 Modular transaction hashing Feb 8, 2024
Copy link
Collaborator

@andynog andynog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall approach seems OK, but my concern is related to the scope and implications of this change.

For example, in types/tx.go the code below might affect the Merkle root hash

func (txs Txs) Hash() []byte {
	hl := txs.hashList()
	return merkle.HashFromByteSlices(hl)
}

because in txs.hashList() the Hash() is invoked

func (txs Txs) hashList() [][]byte {
	hl := make([][]byte, len(txs))
	for i := 0; i < len(txs); i++ {
		hl[i] = txs[i].Hash()
	}
	return hl
}

which in merkle.HashFromByteSlices(hl) will make the Merkle root hash different

func HashFromByteSlices(items [][]byte) []byte {
	return hashFromByteSlices(sha256.New(), items)
}

maybe this will be OK, but will this have implications later on ? Like during an upgrade, could the hash algorithm be changed ? what are the implications on verifying and validating if the Merkle is not valid.

Also, is this OK with IBC and proofs ? Like what if one chain decides to switch the hash algorithm? what are the validity of proof guarantees ?

Anyway, the scope and implications of this change seems to be a lot broader and it is a good idea to think this through. Maybe my concerns are not valid but just bringing this up 😉

@cason
Copy link

cason commented Feb 12, 2024

The overall approach seems OK, but my concern is related to the scope and implications of this change.

Fully agree with that, much better explained by @andynog. Hashing transactions is something useful for the mempool reactor and for RPC endpoints.

Other uses of this library includes block, merkle tree, evidence production and verification. For this case, being compatible is essential. A suggestion here, for instance, is to adopt ed25519 directly, so it is clear to everyone that this is the standard for blockchain hashes.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale For use by stalebot label Feb 23, 2024
@github-actions github-actions bot removed the stale For use by stalebot label Feb 27, 2024
@robert-zaremba
Copy link

NOTE: We should be able to customize the Merkle Tree hashing as well. But this has to be done in the IAVL level.


## Detailed Design

Add `hashFn HashFn` option to `NewNode` in `node.go`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be specific and add Transaction or Tx to the name and description of the methods?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide to restrict the scope to just txs, yes.


```json
{
"hash": 5, // crypto.SHA256 (https://pkg.go.dev/crypto#Hash)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "5" mean here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go's stdlib Hash https://pkg.go.dev/crypto#Hash. We can also use strings like SHA256 if that's more practical

Copy link
Collaborator

@sergio-mena sergio-mena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Comment on lines +67 to +71
Hash = crypto.SHA256

stringFunc = func(bz []byte) string {
return fmt.Sprintf("%X", bz)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see things differently. A core dev will be less likely to call FmtHash instead of calling the String() func. Moreover the %s operand calls String() so loggers and print statements will be less verbose than calling the FmtHash function

using the Stringer interface also requires fewer code changes. Devs only would have to set the hasher struct once and then forget about the rest

@melekes melekes changed the title docs(ADR): #113 Modular transaction hashing docs(ADR): #113 Modular hashing Jun 6, 2024
@melekes
Copy link
Collaborator Author

melekes commented Jun 7, 2024

@fedekunze If we only allowed changing the hash function for transactions (how they are hashed in mempool and displayed, plus the Merkle root hash of transactions—DataHash in the block's header), would it suffice for your use case?

@melekes melekes changed the title docs(ADR): #113 Modular hashing docs(ADR): #113 Modular transaction hashing Jun 10, 2024
Copy link

@cason cason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A further review round.

Should we consider a prototype implementation to test the idea?


## Context

Hashing in CometBFT is currently implemented using `crypto/tmhash`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we directly import crypto/sha256 in many places:

$ grep crypto/sha256 . -R 2> /dev/null
./statesync/snapshots.go:	"crypto/sha256"
./crypto/merkle/tree.go:	"crypto/sha256"
./crypto/merkle/bench_test.go:	"crypto/sha256"
./crypto/bls12381/key_bls12381.go:	"crypto/sha256"
./crypto/secp256k1/secp256k1.go:	"crypto/sha256"
./crypto/tmhash/hash_test.go:	"crypto/sha256"
./crypto/tmhash/hash.go:	"crypto/sha256"
./crypto/tmhash/bench_test.go:	"crypto/sha256"
./crypto/hash.go:	"crypto/sha256"
./types/tx.go:	"crypto/sha256"
./test/e2e/app/state.go:	"crypto/sha256"
./p2p/conn/secret_connection.go:	"crypto/sha256"
./mempool/cache_test.go:	"crypto/sha256"
./mempool/mempool.go:	"crypto/sha256"

## Alternative Approaches

1. Do nothing => not flexible.
2. Add `HashFn` argument to `NewNode` and pass this function down the stack =>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashFn is a transaction hasher implementation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an option that we pass to NewNode and forward to mempool, tx indexer, and tx RPC endpoints. However, the use case is rare, so I chose to opt for global variables instead.


Give app developers a way to provide their own hash function.

## Detailed Design
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One solution I see in several places in the code base is:

  1. Introduce an interface
  2. Produce a standard implementation for the interface
  3. Enable changing the implementation for the interface, which is essentially only used for testing

So I wonder why this wouldn't be a better solution.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2241 (comment)

My main argument against passing a hasher everywhere is that it's rare when somebody wants a different hashing function, so changing the interface of mempool, tx indexer and RPC endpoints is unjustified. But maybe I'm wrong. Please let me know what you think.

@cason
Copy link

cason commented Jun 10, 2024

I think we should also add a short paragraph mentioning that it is probably not possible/feasible to implement this via configuration parameter or command-line option, am I right? Because this could be seen as a trivial solution, but... unfeasible.

@cason
Copy link

cason commented Jun 19, 2024

Did we mention tendermint/tendermint#6539 and tendermint/tendermint#6773 here? It would be nice, also to see the discussions back then.

@melekes
Copy link
Collaborator Author

melekes commented Jun 19, 2024

I think we should also add a short paragraph mentioning that it is probably not possible/feasible to implement this via configuration parameter or command-line option, am I right?

It's possible but I'm against this because

a) we're not sure this will be a final design/solution
b) we're optimizing for the default hashing function (by analogy, we may support different consensus algorithms, but we're optimizing for the default one)

@melekes melekes requested a review from adizere June 19, 2024 16:04
@melekes melekes enabled auto-merge June 20, 2024 05:53
@melekes melekes removed wip Work in progress backport-to-v1.x labels Jun 20, 2024
@melekes melekes dismissed adizere’s stale review June 20, 2024 06:17

comments have been addressed

@melekes melekes added this pull request to the merge queue Jun 20, 2024
Merged via the queue into main with commit 31220bf Jun 20, 2024
@melekes melekes deleted the anton/342-txi branch June 20, 2024 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ADR for modular transaction hashing

8 participants