CVE-2023-34451: Prevent a transaction to appear twice in the mempool by otrack · Pull Request #890 · cometbft/cometbft
CometBFT is a Byzantine Fault Tolerant (BFT) middleware that takes a state transition machine and replicates it on many machines. The mempool maintains two data structures to keep track of outstanding transactions: a list and a map.
These two data structures are supposed to be in sync at all times, in the sense that the map tracks the index (if any) of the transaction in the list. In v0.37.0 and v0.37.1, as well as in v0.34.28 and all previous releases of CometBFT, it is possible for them to get out of sync. When this happens, the list may contain several copies of the same transaction. Because the map tracks a single index, it is then no longer possible to remove all the copies of the transaction from the list. This happens even if the duplicated transaction is later committed in a block. The only way to remove the transaction is by restarting the node.
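To make the failure mode concrete, here is a minimal, self-contained sketch (hypothetical names, not CometBFT's actual types) of a list-plus-map mempool in which a duplicate entry strands a copy in the list:

```go
package main

import "fmt"

// Illustrative sketch of the two mempool structures: an ordered list
// of transactions and a map from transaction to its index in the list.
type mempool struct {
	txs    []string
	txsMap map[string]int // tracks a single index per transaction
}

// addTx appends unconditionally, modelling the buggy path where a
// duplicate slips past the admission checks and into the list.
func (m *mempool) addTx(tx string) {
	m.txs = append(m.txs, tx)
	m.txsMap[tx] = len(m.txs) - 1 // silently overwrites the old index
}

// removeTx deletes only the single indexed copy, so any duplicate
// stays stranded in the list until the node restarts.
func (m *mempool) removeTx(tx string) {
	idx, ok := m.txsMap[tx]
	if !ok {
		return
	}
	m.txs = append(m.txs[:idx], m.txs[idx+1:]...)
	delete(m.txsMap, tx)
}

func main() {
	m := &mempool{txsMap: map[string]int{}}
	m.addTx("apple=100")
	m.addTx("apple=100") // out of sync: two copies, one index
	m.removeTx("apple=100")
	fmt.Println(len(m.txs)) // prints 1: one copy is stuck
}
```

Removing the transaction, even after it is committed, only clears the indexed copy; the other copy has no map entry pointing at it and cannot be reached.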
The above problem can be repeated over and over until a sizable number of transactions are stuck in the mempool, in an attempt to bring down the target node. The problem is fixed in releases v0.34.29 and v0.37.2. Some workarounds are available: increasing the value of `cache_size` in `config.toml` makes it very difficult to effectively attack a full node, and not exposing the transaction-submission RPC endpoints reduces the probability of a successful attack, as the attacker would then have to create a modified (byzantine) full node to perform the attack via p2p.
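As a sketch of the first workaround, the cache size can be raised in the node's configuration (assuming the standard `[mempool]` section of `config.toml`; the right value depends on the deployment):

```toml
# config.toml (sketch): a larger cache makes the overflow the attack
# relies on far more expensive to trigger
[mempool]
cache_size = 1000000   # default is 10000
```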
Summary
A transaction might appear twice in the mempool: it is present more than once in `txs`, while `txsMap` links only one of these occurrences. The bug occurs when the cache overflows.
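A minimal sketch of how a bounded FIFO cache can re-admit a still-pending transaction once the cache overflows (hypothetical names; CometBFT's real cache and mempool are more involved):

```go
package main

import "fmt"

// node models a full node whose only duplicate check is a bounded
// FIFO cache in front of the mempool.
type node struct {
	cacheSize int
	cache     []string // FIFO eviction order
	cached    map[string]bool
	pool      []string // stands in for the mempool's txs list
}

// submit admits a tx unless the cache still remembers it. Once the
// cache overflows and evicts a still-pending tx, that same tx can be
// admitted a second time.
func (n *node) submit(tx string) {
	if n.cached[tx] {
		return // duplicate rejected while still cached
	}
	if len(n.cache) == n.cacheSize { // cache overflow: evict the oldest
		oldest := n.cache[0]
		n.cache = n.cache[1:]
		delete(n.cached, oldest)
	}
	n.cache = append(n.cache, tx)
	n.cached[tx] = true
	n.pool = append(n.pool, tx)
}

func main() {
	n := &node{cacheSize: 2, cached: map[string]bool{}}
	n.submit("apple=100")
	n.submit("orange=1")  // cache is now full
	n.submit("orange=2")  // overflow evicts apple=100 from the cache
	n.submit("apple=100") // accepted again: the pool holds it twice
	fmt.Println(n.pool)
}
```

This mirrors the e2e reproduction below: one `apple=100`, enough `orange=${i}` submissions to overflow the cache, then `apple=100` again.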
Context
The bug is related to issues #3865 and #5281 in Tendermint; both issues are closed by this fix. They can easily be reproduced in e2e, e.g.:
```shell
./build/generator -m v0.37.1:1 -d networks/generated/
./build/runner -f networks/generated/gen-0003.toml setup
find networks/generated/gen-0003/ -iname "config.toml" | xargs sed -i s,"cache_size = 10000","cache_size = 100",
find networks/generated/gen-0003/ -iname "config.toml" | xargs sed -i s,"prepare_proposal_delay = 200ms","prepare_proposal_delay = 3000ms",
./build/runner -f networks/generated/gen-0003.toml start
curl -s http://localhost:5701/broadcast_tx_sync?tx=\"apple=100\"; for i in $(seq 1 1001); do curl -s http://localhost:5701/broadcast_tx_sync?tx=\"orange=${i}\"; done; curl -s http://localhost:5701/broadcast_tx_sync?tx=\"apple=100\"
```
Remark
The present fix works because the mempool receives all the asynchronous ABCI callbacks sequentially and in order. Note that in the past this was not the case for the gRPC client, as outlined here. Lifting this constraint is feasible, but remains an open item.
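A sketch of the ordering assumption the fix relies on: if all callbacks are drained by a single goroutine, the check-then-insert on the list and the map can never interleave (illustrative only, not CometBFT's actual callback plumbing):

```go
package main

import "fmt"

// processInOrder funnels every callback through one channel consumed
// by a single goroutine, so callbacks run sequentially and in
// submission order; the list and map therefore stay in sync.
func processInOrder(submissions []string) (txs []string, txsMap map[string]int) {
	txsMap = map[string]int{}
	callbacks := make(chan func(), len(submissions))
	done := make(chan struct{})

	// Single consumer: no two callbacks ever interleave their updates.
	go func() {
		for cb := range callbacks {
			cb()
		}
		close(done)
	}()

	for _, tx := range submissions {
		tx := tx
		callbacks <- func() {
			if _, ok := txsMap[tx]; !ok { // atomic w.r.t. other callbacks
				txs = append(txs, tx)
				txsMap[tx] = len(txs) - 1
			}
		}
	}
	close(callbacks)
	<-done
	return txs, txsMap
}

func main() {
	txs, txsMap := processInOrder([]string{"a", "b", "a", "c"})
	fmt.Println(len(txs), len(txsMap)) // prints 3 3: structures in sync
}
```

If the callbacks could run concurrently or out of order, the check and the insert could interleave and the duplicate would slip back in, which is why lifting the constraint is non-trivial.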
Related news

The security advisory's Impact section repeats the description above and then gives the steps to cause the duplication problem. Everything should happen within one height, that is, no `FinalizeBlock` or `BeginBlock` ABCI calls should happen while these steps are reproduced:

1. Send transaction tx1 to the target full node via RPC.
2. Send N more different transactions to the target full no...