Introduction
In November 2024, we discovered an inflation bug in the Aleo mainnet and immediately reported it to the Aleo team. The team confirmed the bug and fixed it quickly. Fortunately, a thorough scan of the chain found no evidence of exploitation.
Thanks to the Aleo team for their prompt and professional action.
In this article, I will explain the context of the bug, how it could have been exploited, and how it was fixed.
How Does Aleo Work?
Aleo is a blockchain network that uses zero-knowledge proofs (ZKPs) to achieve privacy and programmability. It uses the Varuna proof system, which is adapted from Marlin. On Aleo, the user generates a proof of a transaction’s execution, and the validators (the network) check the validity of the transaction by verifying that proof. This approach makes the network efficient, as validators do not need to re-execute transactions. Moreover, users can preserve their privacy by hiding the execution details.
Transition
Aleo’s transactions are composed of transitions. A transition represents the execution of a function of an Aleo contract. Aleo’s contract language is Leo, a straightforward Rust-like DSL (domain-specific language). For more details about Leo, read Exploring Leo: A Primer on Aleo Program Security. Below is a minimal executable function (a transition) on Aleo:
transition add_private_number(public a: u32, private b: u32) -> u32 {
    let c: u32 = a + b;
    return c;
}
In the code above, add_private_number accepts one public input a and one private input b, adds the two numbers, and returns the private result c (on Aleo, inputs and outputs are private by default). As b and c are private, their values are encrypted in the transaction and not revealed to others. Only the transaction sender can decrypt them. When a user sends a transaction calling the function, the network ensures that c is the sum of a and b without learning the values of b and c.
The Record Type
So far, the transitions we have seen handle only plain data types like bool and u64, which can be cloned freely and are therefore not suitable for representing assets (like a token). For this purpose, Aleo introduces the Record model. A Record is a special data type that is basically a leaf of a shielded pool. It generalizes the design of Zcash to allow for various private objects.
A transition can take a Record as input and output a Record as well. Unlike plain values, a Record can only be created by returning one from a transition. Each Record has an owner and can only be spent by that owner. A Record is consumed when it is passed as an input to a transition. When it is consumed, a Zcash-like nullifier is emitted, which ensures the Record cannot be used again. In other words, a Record represents a non-clonable asset, similar to the Object concept in the Move language, but private and invisible to others.
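The spend-once behavior described above can be illustrated with a small Rust sketch. It is a toy model only: ToyRecord, ToyLedger, and the serial field are hypothetical names, the nullifier is simply the serial number, and none of the encryption or commitment machinery of real Aleo records is modeled.

```rust
use std::collections::HashSet;

/// Toy stand-in for a record; real Aleo records are encrypted and committed to.
struct ToyRecord {
    owner: String,
    amount: u64,
    serial: u64, // hypothetical unique identifier used here as the nullifier
}

/// Toy ledger that only tracks which nullifiers have already been seen.
#[derive(Default)]
struct ToyLedger {
    spent_nullifiers: HashSet<u64>,
}

impl ToyLedger {
    /// Consumes `record` on behalf of `spender` and returns the released amount.
    /// Non-owners and double spends are rejected.
    fn consume(&mut self, record: &ToyRecord, spender: &str) -> Result<u64, String> {
        if record.owner != spender {
            return Err("only the owner can spend a record".to_string());
        }
        // `insert` returns false when the nullifier was already present.
        if !self.spent_nullifiers.insert(record.serial) {
            return Err("record already spent".to_string());
        }
        Ok(record.amount)
    }
}
```

In this model, the first call to consume by the owner succeeds, while a second call with the same record fails because its nullifier is already in the set.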
A program can define its own Record types with custom fields. The fields are also private by default. Only the creator and owner are able to decrypt the private fields. Below is an example program used in Exploring Leo: A Primer on Aleo Program Security:
program example_program0.aleo {
    record Token {
        owner: address,
        amount: u128
    }
    transition private_transfer_token(private receiver: address, private input_token: Token) -> Token {
        let new_token: Token = Token {
            owner: receiver,
            amount: input_token.amount
        };
        return new_token;
    }
}
The code above implements private token transfers. It defines a record called Token with owner and amount fields, which are private by default. The private_transfer_token transition takes a Token record as input, consumes it, and outputs a new token with the same amount and a new owner. The input_token is consumed because it is passed as an input to the function. The new_token is created because it is returned by the function. This ensures the total supply of the token remains unchanged. As private_transfer_token is executed off-chain and its inputs and outputs are private, the network does not learn the sender, receiver, or amount.
On-Chain Finalize Logic
A transition is useful for off-chain computation. However, it can’t access on-chain state. For example, for a decentralized exchange (DEX) to perform a swap between two tokens, it needs to fetch and update the current on-chain price. For this purpose, an Aleo function can define finalize logic that is executed publicly on-chain.
The finalize logic can read and write on-chain state (e.g., the data_map in the example below). When a user creates a transaction, they execute the transition and generate the proof locally, and the transition can emit a Future that specifies the logic to be executed on-chain. Once the transaction is verified and accepted by the network, all the validators execute the Future. This approach also prevents multiple users from generating proofs that update the same value, which could cause conflicting updates in multi-user applications. Here’s an Aleo program that makes use of the finalize functionality:
program example_program1.aleo {
    mapping data_map: address => u64;
    // transition is executed off-chain
    async transition square_counter(public a: u64) -> Future {
        let square: u64 = a * a;
        return finalize_square_counter(self.caller, square);
    }
    // finalize is executed on-chain
    async function finalize_square_counter(caller: address, square: u64) {
        let v: u64 = data_map.get_or_use(caller, 0u64);
        data_map.set(caller, square + v);
    }
}
The square_counter transition accepts one input a and computes its square. It then outputs a Future, which contains the function (finalize_square_counter) and its parameters (self.caller and square). The Future here represents on-chain execution logic that is “deferred” at proving time, similar to an async task in Rust or JavaScript. In this example, the on-chain execution logic is the finalize_square_counter function defined by the contract. As the function is executed publicly by all validators, in a similar fashion to Ethereum, it can read and write the on-chain data_map mapping. Given the caller and square, the function retrieves the previous value in the caller’s slot, adds square, and writes the result back to the mapping.
If you run the above square_counter transition in the Leo playground with leo run square_counter 1u64, you can see that the output of the transition is exactly a Future type. The future contains the program ID, the function name, and the arguments of the function. If the transition is valid, the validators will execute the finalize logic with the given arguments.
Finally, we can get a high-level idea of the full lifecycle of a transaction that uses all of these features in Aleo. Here the user wants to generate the execution of a smart contract method which is already deployed on Aleo:
- The user executes the transition locally, generating the transition input and output.
- The user generates a proof of execution for that transition (using ZKPs) and includes it in a transaction along with:
  - cryptographic commitments for each input used (as well as each output produced)
  - relevant plaintext data for each public input/output and encrypted data for each private input/output.
  Finally, the user sends all of that in a transaction to the network.
- The validator verifies the proof of the transaction. That is, it checks that, given the function and the commitments to the inputs, the commitments to the outputs are correct. The validator also verifies that the given input/output data correctly corresponds to the commitments.
- If the transaction proof is valid, the network accepts the transaction. If the transition outputs a Future, the validators then execute the corresponding finalize logic on-chain.
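The validator-side half of this lifecycle can be sketched in Rust as follows. Everything here is a hypothetical simplification: ToyTransaction and its fields are invented for illustration, proof verification is reduced to a boolean, and toy_commit uses the non-cryptographic standard-library hasher in place of a real commitment scheme.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy commitment: a non-cryptographic std hash standing in for the real scheme.
fn toy_commit(data: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical, heavily simplified transaction shape.
struct ToyTransaction {
    proof_is_valid: bool,          // stands in for running the proof verifier
    output_commitment: u64,        // commitment that the proof speaks about
    output_data: Vec<u64>,         // data attached to the transaction
    future_args: Option<Vec<u64>>, // arguments of the finalize call, if any
}

/// Verifies the proof, checks the data against its commitment, then finalizes.
fn validate_and_finalize(tx: &ToyTransaction, on_chain_state: &mut Vec<u64>) -> Result<(), String> {
    if !tx.proof_is_valid {
        return Err("invalid proof".to_string());
    }
    if toy_commit(&tx.output_data) != tx.output_commitment {
        return Err("output data does not match its commitment".to_string());
    }
    // Finalize logic runs only when the transition emitted a Future.
    if let Some(args) = &tx.future_args {
        on_chain_state.extend(args.iter().copied());
    }
    Ok(())
}
```

Note that in this model the decision to run finalize logic depends entirely on whether a Future is present among the outputs.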
Now you know enough to understand the bug that we found. Let’s get into it!
The Vulnerability
As part of our continuous security process with Aleo, we managed to uncover an important bug. The bug was caused by an insecure way to commit to the input/output, allowing an attacker to bypass the finalization logic.
In Aleo, a transition is checked by the validator in two steps:
- A validator first checks the proof of execution contained in a user’s transaction to confirm the associated claim: “given the executed function and commitments to its inputs, the resulting output commitment is correct”.
- Then, the validator verifies that the given input/output data correctly corresponds to those commitments.
Below is the code snippet for the second step, which checks the output data. Given an output, the verify function checks that the Output correctly corresponds to the output commitment/hash. This is off-circuit code executed by every validator for each transaction. Can you find where the vulnerability is introduced?
/// The transition output.
#[derive(Clone, PartialEq, Eq)]
pub enum Output<N: Network> {
    [...]
    /// The ciphertext hash and (optional) ciphertext.
    Private(Field<N>, Option<Ciphertext<N>>),
    /// The output commitment of the external record. Note: This is **not** the record commitment.
    ExternalRecord(Field<N>),
    /// The future hash and (optional) future.
    Future(Field<N>, Option<Future<N>>),
}
impl<N: Network> Output<N> {
    pub fn verify(&self, function_id: Field<N>, tcm: &Field<N>, index: usize) -> bool {
        // Ensure the hash of the value (if the value exists) is correct.
        let result = || match self {
            [...]
            Output::Private(hash, Some(value)) => {
                match value.to_fields() {
                    // Ensure the hash matches.
                    Ok(fields) => match N::hash_psd8(&fields) {
                        Ok(candidate_hash) => Ok(hash == &candidate_hash),
                        Err(error) => Err(error),
                    },
                    Err(error) => Err(error),
                }
            }
            Output::Future(hash, Some(output)) => {
                match output.to_fields() {
                    Ok(fields) => {
                        // Construct the (future) output index as a field element.
                        let index = Field::from_u16(index as u16);
                        // Construct the preimage as `(function ID || output || tcm || index)`.
                        let mut preimage = Vec::new();
                        preimage.push(function_id);
                        preimage.extend(fields);
                        preimage.push(*tcm);
                        preimage.push(index);
                        // Ensure the hash matches.
                        match N::hash_psd8(&preimage) {
                            Ok(candidate_hash) => Ok(hash == &candidate_hash),
                            Err(error) => Err(error),
                        }
                    }
                    Err(error) => Err(error),
                }
            }
            [...]
            Output::ExternalRecord(_) => Ok(true),
        };
        [...]
    }
}
The vulnerability arises from the hash/commitment scheme used for the different output types. The commitment to an Output covers only the data, without absorbing its subtype (variant). That means we can create two Output values of different subtypes that have the same commitment. For example, Output::Future(hash, Some(output)) is checked by concatenating the data into preimage and hashing it, whereas Output::Private(hash, Some(value)) is checked by hashing value directly. We can therefore create an Output::Private with exactly the same hash by simply setting the value in the Output::Private to the preimage of the Output::Future.
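The collision can be reproduced in a few lines of self-contained Rust. This is a sketch under loud assumptions: field elements are modeled as u64, toy_hash uses the non-cryptographic DefaultHasher as a stand-in for N::hash_psd8, and ToyOutput, future_preimage, and forge_private are hypothetical names that do not exist in the real codebase.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy stand-in for `N::hash_psd8`: hashes a slice of "field elements".
/// `DefaultHasher` is NOT cryptographic; it only keeps the sketch dependency-free.
fn toy_hash(fields: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    fields.hash(&mut hasher);
    hasher.finish()
}

/// Simplified outputs: a stored hash plus the attached data.
enum ToyOutput {
    Private(u64, Vec<u64>),
    Future(u64, Vec<u64>),
}

/// Builds `(function ID || output || tcm || index)` as in the snippet above.
fn future_preimage(function_id: u64, fields: &[u64], tcm: u64, index: u16) -> Vec<u64> {
    let mut preimage = vec![function_id];
    preimage.extend_from_slice(fields);
    preimage.push(tcm);
    preimage.push(u64::from(index));
    preimage
}

impl ToyOutput {
    /// Mirrors the structure of the vulnerable check: no variant tag is hashed.
    fn verify(&self, function_id: u64, tcm: u64, index: u16) -> bool {
        match self {
            ToyOutput::Private(hash, value) => *hash == toy_hash(value),
            ToyOutput::Future(hash, fields) => {
                *hash == toy_hash(&future_preimage(function_id, fields, tcm, index))
            }
        }
    }
}

/// Re-labels a valid `Future` output as a `Private` output with the same hash.
fn forge_private(function_id: u64, future_fields: &[u64], tcm: u64, index: u16) -> ToyOutput {
    let preimage = future_preimage(function_id, future_fields, tcm, index);
    ToyOutput::Private(toy_hash(&preimage), preimage)
}
```

Both the honest Future output and the forged Private output pass verify against the same hash, because nothing in the preimage records which variant the data belongs to.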
This means that the commitment scheme is not binding across output types. Unfortunately, Aleo relied solely on commitment checks for input/output validation, without explicit type checks elsewhere. Therefore, an attacker could replace an output in the transaction with data of a different type and still pass the check. This design flaw introduced a critical vulnerability.
Going a step further, remember that the Future data represents the finalize logic that is to be executed on-chain. If the Future data is replaced by another type of data (e.g., Output::Private), the finalize logic won’t be executed on-chain. In this way an attacker can bypass any finalize logic in their transaction. One of the impacts is that the attacker can issue an arbitrary amount of Aleo tokens.
Consider the following function which exists in Aleo’s token contract:
program example_program2.aleo {
    record Token {
        owner: address,
        amount: u64
    }
    mapping public_balance: address => u64;
    async transition transfer_public_to_private(recipient: address, amount: u64) -> (Token, Future) {
        let new_token: Token = Token {
            owner: recipient,
            amount: amount
        };
        let f: Future = finalize_transfer(self.caller, amount);
        return (new_token, f);
    }
    async function finalize_transfer(caller: address, amount: u64) {
        let balance: u64 = public_balance.get_or_use(caller, 0u64);
        let new_balance: u64 = balance - amount; // transaction will revert if underflow happens
        public_balance.set(caller, new_balance);
    }
}
The transfer_public_to_private transition converts public tokens (stored on-chain in the public_balance mapping) into private tokens (stored as records). It issues the new token record and subtracts the same amount from the public balance in one transaction. If the public balance is insufficient, the transaction reverts. However, if the finalize_transfer logic is skipped, the transaction is only partially executed: the new token record is still created by the transition, but the on-chain balance is not reduced. In this way an attacker can issue an arbitrary amount of the token.
Here is how the attack can be performed:
- The attacker executes the transfer_public_to_private transition with recipient as their own address and a large amount. Then the attacker generates the proof and transaction as normal.
- The transaction contains two outputs of the transition: an Output::Token type and an Output::Future type. The attacker replaces the Output::Future with data of the Output::Private type that has the same commitment value. The attacker sends the transaction to the network.
- The validator verifies the transaction and decides it’s valid: the proof of the transaction is valid (because it is untouched) and the attached input/output data are valid (because they have the correct commitments).
- The validator accepts the transaction. It creates the emitted Token record. It finds that there is no Future type in the output (it was replaced by the Output::Private type) and therefore won’t execute the on-chain finalize logic.
- After the transaction is executed, the new Token is created but the on-chain public balance is not subtracted. In this way the attacker creates a new Token for free.
Note that if such an attack had happened, it could be detected by scanning transactions for input/output data whose type does not match the one declared by the function.
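The effect on the token supply can be illustrated with a toy Rust model of a pre-fix validator. All names here (ToyChain, SecondOutput, accept_transfer_public_to_private) are hypothetical. The sketch assumes the proof and commitment checks have already passed, tracks private records only as a running total, and runs the finalize step before crediting the record purely to model the fact that a revert discards the whole transaction.

```rust
use std::collections::HashMap;

/// Toy chain state: public balances plus the total amount held in private records.
#[derive(Default)]
struct ToyChain {
    public_balance: HashMap<String, u64>,
    private_supply: u64,
}

/// What the validator believes the second output of the transition is.
enum SecondOutput {
    Future { caller: String, amount: u64 },
    Private, // a forged output that carries the same commitment
}

impl ToyChain {
    fn total_supply(&self) -> u64 {
        self.public_balance.values().sum::<u64>() + self.private_supply
    }

    /// Models a pre-fix validator: the record is created, and finalize runs
    /// only if a Future output is present.
    fn accept_transfer_public_to_private(&mut self, amount: u64, second: SecondOutput) -> Result<(), String> {
        if let SecondOutput::Future { caller, amount: debit } = second {
            let balance = self.public_balance.get(&caller).copied().unwrap_or(0);
            // Underflow reverts the whole transaction, so no record is created.
            let new_balance = balance.checked_sub(debit).ok_or("insufficient public balance")?;
            self.public_balance.insert(caller, new_balance);
        }
        self.private_supply = self.private_supply.checked_add(amount).ok_or("supply overflow")?;
        Ok(())
    }
}
```

With an honest Future output, the total supply is unchanged by a transfer, and an oversized transfer is rejected. With the forged Private output, the same call succeeds and the total supply grows by the full amount.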
The Fix
Upon discovering the issue, we immediately reported it to the Aleo team. The Aleo team confirmed the issue and scanned all existing transactions for signs of exploitation. Fortunately, no exploitation was found. A fix was then proposed and merged. Since the commitment scheme involves multiple parts of the protocol (including the circuit), it was challenging to modify. Therefore, the issue was fixed by adding explicit checks that ensure the input/output data types of a transition are correct.
pub fn verify_execution(&self, execution: &Execution<N>) -> Result<()> {
    [...]
    // Ensure the input and output types are equivalent to the ones defined in the function.
    // We only need to check that the variant type matches because we already check the hashes in
    // the `Input::verify` and `Output::verify` functions.
    let transition_input_variants = transition.inputs().iter().map(Input::variant).collect::<Vec<_>>();
    let transition_output_variants = transition.outputs().iter().map(Output::variant).collect::<Vec<_>>();
    ensure!(function.input_variants() == transition_input_variants, "The input variants do not match");
    ensure!(function.output_variants() == transition_output_variants, "The output variants do not match");
    [...]
}
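The idea behind this check can be shown in isolation with a small standalone Rust sketch. The Variant enum and check_output_variants function below are hypothetical. The real variant methods and their return types may differ, and only three of the output variants are modeled.

```rust
/// Hypothetical variant tags; the real `variant` methods may return something else.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Variant {
    Private,
    Record,
    Future,
}

/// Compares the variants declared by the function with those found in the transition.
/// Slice equality checks both the length and the order of the variants.
fn check_output_variants(declared: &[Variant], found: &[Variant]) -> Result<(), String> {
    if declared == found {
        Ok(())
    } else {
        Err(format!(
            "The output variants do not match: expected {:?}, found {:?}",
            declared, found
        ))
    }
}
```

Under this check, a transition whose function declares a record and a future as outputs is rejected if the transaction supplies a record and a private output instead, even when the hashes match.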
Later, the fix was deployed to all validators. More tests were added to ensure that malformed inputs and outputs are detected.
Timeline
- November 24, 2024: The issue was found and reported to Aleo. Aleo confirmed the issue.
- November 25, 2024: The Aleo team scanned the entire history of the blockchain and found no evidence of exploitation.
- December 3, 2024: The fix was proposed and merged.
- December 4, 2024: The fix, along with other normal upgrades, was rolled out to all validators.
Summary
We discovered a critical vulnerability in Aleo that could have allowed arbitrary token minting. We promptly reported the issue to the Aleo team, and together, we worked to fix it.
The key takeaway from this experience is the importance of following the “TLV” pattern (Type, Length, Value) when committing to and hashing data. Including the type in the preimage ensures that data of different types cannot share a commitment, which is exactly the property that was missing here.
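As a minimal illustration of the pattern, the Rust sketch below prefixes the data with a type tag and a length before hashing. The tag constants and function names are hypothetical, field elements are modeled as u64, and DefaultHasher is a non-cryptographic stand-in for a real hash function. It is not how Aleo encodes its commitments.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical type tags, one per kind of committed data.
const TAG_PRIVATE: u64 = 1;
const TAG_FUTURE: u64 = 2;

/// Builds the preimage as `(type || length || value)`.
fn tlv_preimage(type_tag: u64, value: &[u64]) -> Vec<u64> {
    let mut preimage = Vec::with_capacity(value.len() + 2);
    preimage.push(type_tag);
    preimage.push(value.len() as u64);
    preimage.extend_from_slice(value);
    preimage
}

/// Hashes the TLV preimage with a non-cryptographic stand-in hasher.
fn tlv_hash(type_tag: u64, value: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    tlv_preimage(type_tag, value).hash(&mut hasher);
    hasher.finish()
}
```

Because the tag is the first element of the preimage, the same value committed under TAG_PRIVATE and TAG_FUTURE always yields different preimages, so a collision would require breaking the hash function itself.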
Security is a continuous and evolving process. Zero-knowledge projects that are not transparent about their security practices are more prone to vulnerabilities. By consistently raising the bar and rigorously reviewing systems, we can identify and mitigate critical issues early. As the zk ecosystem grows, we remain committed to securing zk projects and contributing to their long-term reliability and safety.