aHash is not a PRF

aHash is a fast Rust hasher. It is popular, useful, and very explicitly not cryptographic.

That warning is good. But the documentation also says something stronger: “because aHash is keyed, hashes cannot be predicted without knowing the keys”. That is not true.

For a hash table, keyed hashing usually means something much more modest. It means an attacker should not be able to precompute one pile of colliding keys and reuse it against every process on the internet. Per-map randomization is a practical defense against boring collision-flooding attacks.

But “cannot be predicted without knowing the keys” sounds like a keyed random function. It sounds PRF-like. It suggests that even after seeing hashes for inputs I choose, I still cannot predict the hash of a fresh input.

That is a much higher bar, and aHash does not clear it.

The construction is not AES

The usual fast path uses AES instructions. That detail is easy to overread.

Using AES round instructions does not make the whole construction AES, any more than using xor makes a protocol a stream cipher. The surrounding construction still has to absorb inputs safely, separate domains, and finalize with enough mixing for the security claim being made.

aHash state keeps three 128-bit values:

enc
sum
key

The basic block update is roughly:

enc = aesdec(enc, block)
sum = shuffle_and_add(sum, block)

Then finish() combines the state with a small number of AES instructions and returns 64 bits.

This is a good shape for a fast hash-table hasher. It is not AES-CMAC. It is not AES as a block cipher. It is not a standard PRF construction.

And in practice, it leaks simple key-independent relations between outputs.

aHash doesn’t securely hash transcripts

The Rust Hasher trait exposes several ways to feed data:

write(&[u8])
write_u8(...)
write_u64(...)
write_u128(...)

In aHash, the integer methods collapse into the same internal operation:

write_u8(i)   -> write_u64(i as u64)
write_u16(i)  -> write_u64(i as u64)
write_u32(i)  -> write_u64(i as u64)
write_u64(i)  -> write_u128(i as u128)
write_u128(i) -> hash_in(i)

So these are different public transcripts:

write_u8(0xab)
write_u16(0xab)
write_u32(0xab)
write_u64(0xab)
write_u128(0xab)

But they all become:

hash_in(0xab)

For every key, they produce the same final hash. There’s no context separation. Developers may not expect this.

There’s a more surprising one:

write(&[])
write_u128(0)

For non-empty byte strings, aHash adds the input length before absorbing bytes. For the empty string, it writes zero directly. The typed u128 path also hashes zero directly.

So two very different transcripts can produce the same hash, independently from the key.

That being said, the Hasher trait doesn’t really define how inputs are supposed to be handled. Being type-safe for hashing transcripts is an option, concactenating everything is another option, but anything else is technically valid as well, even if it is prone to misuse.

An integral distinguisher

Consider u128 inputs constructed that way:

X[a,b] = (a << 56) | (b << 120)

a and b each range from 0x00 to 0xff.

So these are 128-bit values where byte 7 and byte 15 vary over every possible value, and all other bytes are zero.

Now, consider every query:

h.write_u128(X[a,b])
h.finish()

That gives 65,536 different typed inputs. Their 64-bit outputs have a zero XOR sum for any key.

Again, that is not how a PRF from u128 inputs to 64-bit outputs behaves. For independent random outputs, the XOR should be zero with probability 2^-64. In aHash, it is zero every time.

The reason is that write_u128() feeds the value directly into hash_in(), and these byte positions create a balanced set through the reduced AES-based finalization. The key changes the individual hashes. It does not change the XOR relation.

This also makes the output predictable. Leave out one X[a,b], query the other 65,535 values, XOR their outputs, and the result is the missing output.

Here is a trivial PoC:

use aHash::RandomState;
use std::hash::{BuildHasher, Hasher};

fn hash_u128(state: &RandomState, x: u128) -> u64 {
    let mut h = state.build_hasher();
    h.write_u128(x);
    h.finish()
}

fn main() {
    let state = RandomState::with_seeds(1, 2, 3, 4);
    let mut xor = 0u64;
    for a in 0u8..=255 {
        for b in 0u8..=255 {
            let x = ((a as u128) << 56) | ((b as u128) << 120);
            xor ^= hash_u128(&state, x);
        }
    }
    assert_eq!(xor, 0);
    println!("xor {xor:016x}");
}

This prints:

xor 0000000000000000

Of course, typed u128 inputs are still easy to wave away. Many real keys are strings or byte slices. But they are affected by the same issue.

Transposing the distinguisher to strings

Consider this family of 16-byte strings:

S[a,b] = 00 00 00 00 00 00 00 aa 00 00 00 00 00 00 00 bb

a and b each range from 0x00 to 0xff.

That gives 65,536 different non-empty byte strings. No typed integer writes. No empty input. No repeated input. No weird transcript ambiguity.

For each string, do the ordinary thing:

h.write(&S[a,b])
h.finish()

The XOR of all 65,536 outputs is always zero, for any key.

That should not happen for a real PRF. For 65,536 independent 64-bit outputs, the XOR would be zero with probability 2^-64. Here, it happens every time.

The reason is that the varying bytes are placed as the high bytes of the two 64-bit lanes consumed by the 16-byte byte-slice path. Those positions avoid carry trouble in the lane additions, so the chosen set stays balanced before it reaches the reduced AES-based finalization. The key changes the individual outputs. It does not remove the relation.

This is the same kind of balanced-set behavior behind square/integral attacks on AES-like constructions, showing through a construction that uses too little AES-style mixing to hide it.

Pick a target:

T = S[0x42, 0xa5]

Now hash the other 65,535 strings in the family, but not T. XOR their outputs together.

The result is exactly the missing output:

prediction = xor_{(a,b) != (0x42,0xa5)} aHash_k(S[a,b])
prediction = aHash_k(T)

No key recovery. No brute force. Just a key-independent relation.

Another quick PoC

use aHash::RandomState;
use std::hash::{BuildHasher, Hasher};

fn hash(state: &RandomState, bytes: &[u8]) -> u64 {
    let mut h = state.build_hasher();
    h.write(bytes);
    h.finish()
}

fn main() {
    let state = RandomState::with_seeds(1, 2, 3, 4);
    let target = (0x42, 0xa5);
    let mut prediction = 0u64;
    let mut actual = 0u64;
    let mut s = [0u8; 16];
    for a in 0u8..=255 {
        for b in 0u8..=255 {
            s[7] = a;
            s[15] = b;
            let out = hash(&state, &s);
            if (s[7], s[15]) == target {
                actual = out;
            } else {
                prediction ^= out;
            }
        }
    }
    assert_eq!(prediction, actual);
    println!("predicted {prediction:016x}");
}

This prints:

predicted 2929121e3dfcdd2a

The exact value is not the point. The assertion is. The program never adds the target hash to prediction. It only hashes the other 65,535 strings.

What this proves

This does not prove that aHash is useless.

It does not give a giant collision set. It does not recover the key. It does not immediately produce a practical HashMap denial-of-service attack. It does not mean every application using aHash is broken.

Hash-table hashers are not MACs. They are judged by different criteria: speed, distribution, and enough randomization to make precomputed collision attacks unattractive.

But aHash is not a PRF-like keyed hash over byte strings. Some outputs are algebraically related. One output in this family can be predicted exactly from the others without knowing the key.

That is incompatible with the strong reading of “cannot be predicted without knowing the keys”.

aHash can still be a perfectly reasonable fast hash-table hasher as long as the output remains secret. The crate is also right to say that it is not cryptographically secure.

But do not use it as a MAC. Do not use it as a cryptographic keyed hash. Do not use it as a PRF. Do not use its output as randomness. Be careful with non-cryptographic designs such as sharded routing if an adversary gets to choose inputs and observe outputs.