<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Frank DENIS random thoughts.</title>
  <link href="https://00f.net/atom.xml" rel="self"/>
  <link href="https://00f.net"/>
  <updated>2026-05-04T11:06:12+02:00</updated>
  <id>https://00f.net</id>
  
  <author>
    <name>Frank Denis (Jedi/Sector One)</name>
    <email>blog at pureftpd.org</email>
  </author>
  
  
  <entry>
    <title>aHash is not a PRF</title>
    <link href="https://00f.net/2026/04/26/ahash-is-not-a-prf/"/>
   <updated>2026-04-26T00:00:00+02:00</updated>
   <id>https://00f.net/2026/04/26/ahash-is-not-a-prf</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;https://crates.io/crates/ahash&quot;&gt;aHash&lt;/a&gt; is a fast Rust hasher. It is popular, useful, and very explicitly not cryptographic.&lt;/p&gt;

&lt;p&gt;That warning is good. But the documentation also says something stronger: &lt;em&gt;“because &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt; is keyed, hashes cannot be predicted without knowing the keys”&lt;/em&gt;. That is not true.&lt;/p&gt;

&lt;p&gt;For a hash table, keyed hashing usually means something much more modest. It means an attacker should not be able to precompute one pile of colliding keys and reuse it against every process on the internet. Per-map randomization is a practical defense against boring collision-flooding attacks.&lt;/p&gt;

&lt;p&gt;But “cannot be predicted without knowing the keys” sounds like a keyed random function. It sounds PRF-like. It suggests that even after seeing hashes for inputs I choose, I still cannot predict the hash of a fresh input.&lt;/p&gt;

&lt;p&gt;That is a much higher bar, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt; does not clear it.&lt;/p&gt;

&lt;h2 id=&quot;the-construction-is-not-aes&quot;&gt;The construction is not AES&lt;/h2&gt;

&lt;p&gt;The usual fast path uses AES instructions. That detail is easy to overread.&lt;/p&gt;

&lt;p&gt;Using AES round instructions does not make the whole construction AES, any more than using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;xor&lt;/code&gt; makes a protocol a stream cipher. The surrounding construction still has to absorb inputs safely, separate domains, and finalize with enough mixing for the security claim being made.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt; state keeps three 128-bit values:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;enc
sum
key
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The basic block update is roughly:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;enc = aesdec(enc, block)
sum = shuffle_and_add(sum, block)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;finish()&lt;/code&gt; combines the state with a small number of AES instructions and returns 64 bits.&lt;/p&gt;

&lt;p&gt;This is a good shape for a fast hash-table hasher. It is not AES-CMAC. It is not AES as a block cipher. It is not a standard PRF construction.&lt;/p&gt;

&lt;p&gt;And in practice, it leaks simple key-independent relations between outputs.&lt;/p&gt;

&lt;h2 id=&quot;ahash-doesnt-securely-hash-transcripts&quot;&gt;aHash doesn’t securely hash transcripts&lt;/h2&gt;

&lt;p&gt;The Rust &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Hasher&lt;/code&gt; trait exposes several ways to feed data:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;write(&amp;amp;[u8])
write_u8(...)
write_u64(...)
write_u128(...)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt;, the integer methods collapse into the same internal operation:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;write_u8(i)   -&amp;gt; write_u64(i as u64)
write_u16(i)  -&amp;gt; write_u64(i as u64)
write_u32(i)  -&amp;gt; write_u64(i as u64)
write_u64(i)  -&amp;gt; write_u128(i as u128)
write_u128(i) -&amp;gt; hash_in(i)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So these are different public transcripts:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;write_u8(0xab)
write_u16(0xab)
write_u32(0xab)
write_u64(0xab)
write_u128(0xab)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But they all become:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;hash_in(0xab)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For every key, they produce the same final hash. There’s no domain separation between integer widths, which developers may not expect.&lt;/p&gt;

&lt;p&gt;There’s a more surprising one:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;write(&amp;amp;[])
write_u128(0)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For non-empty byte strings, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt; adds the input length before absorbing bytes. For the empty string, it writes zero directly. The typed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;u128&lt;/code&gt; path also hashes zero directly.&lt;/p&gt;

&lt;p&gt;So two very different transcripts can produce the same hash, independently of the key.&lt;/p&gt;

&lt;p&gt;That being said, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Hasher&lt;/code&gt; trait doesn’t really define how inputs are supposed to be handled. Keeping typed writes distinct is one option, concatenating everything is another, and anything else is technically valid as well, even if it is prone to misuse.&lt;/p&gt;

&lt;h2 id=&quot;an-integral-distinguisher&quot;&gt;An integral distinguisher&lt;/h2&gt;

&lt;p&gt;Consider &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;u128&lt;/code&gt; inputs constructed this way:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;X[a,b] = (a &amp;lt;&amp;lt; 56) | (b &amp;lt;&amp;lt; 120)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt; each range from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xff&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So these are 128-bit values where byte 7 and byte 15 (counting from the least significant byte, starting at 0) vary over every possible value, and all other bytes are zero.&lt;/p&gt;

&lt;p&gt;Now, consider every query:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;h.write_u128(X[a,b])
h.finish()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That gives 65,536 different typed inputs. Their 64-bit outputs have a zero XOR sum for any key.&lt;/p&gt;

&lt;p&gt;Again, that is not how a PRF from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;u128&lt;/code&gt; inputs to 64-bit outputs behaves. For independent random outputs, the XOR would be zero only with probability &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2^-64&lt;/code&gt;. In &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt;, it is zero every time.&lt;/p&gt;

&lt;p&gt;The reason is that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;write_u128()&lt;/code&gt; feeds the value directly into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hash_in()&lt;/code&gt;, and these byte positions create a balanced set through the reduced AES-based finalization. The key changes the individual hashes. It does not change the XOR relation.&lt;/p&gt;

&lt;p&gt;This also makes the output predictable. Leave out one &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;X[a,b]&lt;/code&gt;, query the other 65,535 values, XOR their outputs, and the result is the missing output.&lt;/p&gt;

&lt;p&gt;Here is a trivial PoC:&lt;/p&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;ahash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RandomState&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BuildHasher&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Hasher&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;hash_u128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RandomState&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u64&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.build_hasher&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.write_u128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.finish&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;RandomState&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;with_seeds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;xor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0u64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0u8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;255&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0u8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;255&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;56&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;120&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;xor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;hash_u128&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;assert_eq!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;xor {xor:016x}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This prints:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;xor 0000000000000000
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Of course, typed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;u128&lt;/code&gt; inputs are still easy to wave away. Many real keys are strings or byte slices. But byte strings are affected by the same issue.&lt;/p&gt;

&lt;h2 id=&quot;transposing-the-distinguisher-to-strings&quot;&gt;Transposing the distinguisher to strings&lt;/h2&gt;

&lt;p&gt;Consider this family of 16-byte strings:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;S[a,b] = 00 00 00 00 00 00 00 aa 00 00 00 00 00 00 00 bb
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;a&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;b&lt;/code&gt; each range from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0x00&lt;/code&gt; to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0xff&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That gives 65,536 different non-empty byte strings. No typed integer writes. No empty input. No repeated input. No weird transcript ambiguity.&lt;/p&gt;

&lt;p&gt;For each string, do the ordinary thing:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;h.write(&amp;amp;S[a,b])
h.finish()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The XOR of all 65,536 outputs is always zero, for any key.&lt;/p&gt;

&lt;p&gt;That should not happen for a real PRF. For 65,536 independent 64-bit outputs, the XOR would be zero with probability &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2^-64&lt;/code&gt;. Here, it happens every time.&lt;/p&gt;

&lt;p&gt;The reason is that the varying bytes are placed as the high bytes of the two 64-bit lanes consumed by the 16-byte byte-slice path. Those positions avoid carry trouble in the lane additions, so the chosen set stays balanced before it reaches the reduced AES-based finalization. The key changes the individual outputs. It does not remove the relation.&lt;/p&gt;

&lt;p&gt;This is the same kind of balanced-set behavior behind square/integral attacks on AES-like constructions, showing through a construction that uses too little AES-style mixing to hide it.&lt;/p&gt;

&lt;p&gt;Pick a target:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;T = S[0x42, 0xa5]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Now hash the other 65,535 strings in the family, but not &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;T&lt;/code&gt;. XOR their outputs together.&lt;/p&gt;

&lt;p&gt;The result is exactly the missing output:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;prediction = xor_{(a,b) != (0x42,0xa5)} aHash_k(S[a,b])
prediction = aHash_k(T)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;No key recovery. No brute force. Just a key-independent relation.&lt;/p&gt;

&lt;h2 id=&quot;another-quick-poc&quot;&gt;Another quick PoC&lt;/h2&gt;

&lt;div class=&quot;language-rust highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;ahash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RandomState&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nn&quot;&gt;hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BuildHasher&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Hasher&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RandomState&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;u64&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.build_hasher&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.write&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bytes&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;h&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;.finish&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;state&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;RandomState&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;with_seeds&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0x42&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0xa5&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0u64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0u64&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;mut&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0u8&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;16&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0u8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;255&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0u8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;255&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;let&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;hash&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;15&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;target&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;assert_eq!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prediction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actual&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;println!&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;predicted {prediction:016x}&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This prints:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;predicted 2929121e3dfcdd2a
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The exact value is not the point. The assertion is. The program never adds the target hash to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;prediction&lt;/code&gt;. It only hashes the other 65,535 strings.&lt;/p&gt;

&lt;h2 id=&quot;what-this-proves&quot;&gt;What this proves&lt;/h2&gt;

&lt;p&gt;This does not prove that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt; is useless.&lt;/p&gt;

&lt;p&gt;It does not give a giant collision set. It does not recover the key. It does not immediately produce a practical &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HashMap&lt;/code&gt; denial-of-service attack. It does not mean every application using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt; is broken.&lt;/p&gt;

&lt;p&gt;Hash-table hashers are not MACs. They are judged by different criteria: speed, distribution, and enough randomization to make precomputed collision attacks unattractive.&lt;/p&gt;

&lt;p&gt;But &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt; is not a PRF-like keyed hash over byte strings. Some outputs are algebraically related. One output in this family can be predicted exactly from the others without knowing the key.&lt;/p&gt;

&lt;p&gt;That is incompatible with the strong reading of “cannot be predicted without knowing the keys”.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;aHash&lt;/code&gt; can still be a perfectly reasonable fast hash-table hasher as long as the output remains secret. The crate is also right to say that it is not cryptographically secure.&lt;/p&gt;

&lt;p&gt;But do not use it as a MAC. Do not use it as a cryptographic keyed hash. Do not use it as a PRF. Do not use its output as randomness. Be careful with non-cryptographic designs such as sharded routing if an adversary gets to choose inputs and observe outputs.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Swival is the AI agent I actually wanted</title>
    <link href="https://00f.net/2026/04/13/swival-ai-agent/"/>
   <updated>2026-04-13T00:00:00+02:00</updated>
   <id>https://00f.net/2026/04/13/swival-ai-agent</id>
   <content type="html">&lt;p&gt;Swival is a CLI AI agent, and &lt;a href=&quot;https://swival.dev&quot;&gt;Swival 1.0.0&lt;/a&gt; has just been tagged.&lt;/p&gt;

&lt;p&gt;People are going to ask the obvious question.&lt;/p&gt;

&lt;p&gt;Why build a new agent when Codex, Claude Code and Opencode already exist, are well established, and already look good enough for most people?&lt;/p&gt;

&lt;p&gt;Because I wanted an agent that fixes the things existing agents still get wrong in actual daily use.&lt;/p&gt;

&lt;h2 id=&quot;privacy-local-models-and-not-leaking-secrets-to-a-provider&quot;&gt;Privacy, local models, and not leaking secrets to a provider&lt;/h2&gt;

&lt;p&gt;Current agents are built around genuinely incredible models.&lt;/p&gt;

&lt;p&gt;But I still don’t trust companies such as Anthropic with my data. For open source work, fine. For closed-source work, or anything sensitive or personal, I think the default posture of most current tools just isn’t acceptable.&lt;/p&gt;

&lt;p&gt;Using an AI agent inevitably leaks internal information. Sometimes a lot of it.&lt;/p&gt;

&lt;p&gt;That includes access tokens, internal project names, URLs, company names, and all the little bits of context that look harmless until they aren’t.&lt;/p&gt;

&lt;p&gt;Mitigating that risk is something I have cared about for a long time. I even gave a &lt;a href=&quot;https://www.youtube.com/watch?v=FUpEAMUQkCA&quot;&gt;Zigtoberfest talk&lt;/a&gt; about that exact topic.&lt;/p&gt;

&lt;p&gt;So I wanted an agent that does two things properly.&lt;/p&gt;

&lt;p&gt;First, it needs mitigations for leaking secrets to providers.&lt;/p&gt;

&lt;p&gt;That means transparently encrypting secrets before sending them to providers, then decrypting them locally so that models can still reason about them without actually seeing them. And it means being able to block and redact specific strings such as internal project names, company names and URLs.&lt;/p&gt;

&lt;p&gt;The fact that current agents, including the ones heavily used by corporations, still don’t ship these features is, to me, irresponsible and unacceptable.&lt;/p&gt;

&lt;p&gt;Swival has &lt;a href=&quot;https://swival.dev/pages/secrets.html&quot;&gt;transparent secret encryption&lt;/a&gt; and &lt;a href=&quot;https://swival.dev/pages/llm-filter.html&quot;&gt;outbound LLM filters&lt;/a&gt; specifically to reduce how much information leaves your machine.&lt;/p&gt;

&lt;p&gt;Second, it needs to work well with open source models. Not as a checkbox. Actually well.&lt;/p&gt;

&lt;p&gt;Local models have predictable behavior and predictable cost. They don’t suddenly get worse because a provider changed something. They don’t suddenly get more expensive because pricing moved. You control them, not a third party.&lt;/p&gt;

&lt;p&gt;And of course, they also don’t leak sensitive data to anyone.&lt;/p&gt;

&lt;p&gt;Open source models are getting good fast. Gemma 4, Qwen 3.6, and GLM-5.1 make that pretty obvious. Plenty of exciting models are uploaded to Hugging Face every day.&lt;/p&gt;

&lt;p&gt;At the same time, efficient local inference is turning into a basic requirement for modern devices, and it’s only going to improve. Apple M5 chips are a good illustration of where this is going.&lt;/p&gt;

&lt;p&gt;No, local models aren’t a replacement for everything. But they’re already good enough for a lot of work, they have a bright future, and they can be fine-tuned.&lt;/p&gt;

&lt;p&gt;Even modern small models have surprisingly strong agentic capabilities.&lt;/p&gt;

&lt;p&gt;The frustrating part is that most agents are still optimized and tested mainly for frontier models. Even the ones that advertise support for many providers and local models usually behave badly with local models. Tools are used poorly. Context is managed poorly. Everything gets slower. Output quality gets unreliable. Then people blame the model instead of the tool.&lt;/p&gt;

&lt;p&gt;I wanted an agent that performs well with any model. From large frontier models all the way down to small local models with a small context window that anyone can run on their own machine.&lt;/p&gt;

&lt;p&gt;And if it fails, I want the first instinct to be improving the agent so that it helps the model deliver as much as it can, not immediately declaring the model useless.&lt;/p&gt;

&lt;p&gt;That’s one of the main reasons Swival &lt;a href=&quot;https://swival.dev/pages/open-models.html&quot;&gt;isn’t just for frontier models&lt;/a&gt;. A lot of that comes from &lt;a href=&quot;https://swival.dev/pages/context-management.html&quot;&gt;excellent context management&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Swival has a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/compact&lt;/code&gt; command. But in practice, it’s rarely needed, if at all.
The agent keeps trying to deliver regardless of the constraints, and the context window isn’t something you should have to babysit during a long session.&lt;/p&gt;

&lt;p&gt;And when I want to test new models, there’s nothing more convenient than Hugging Face CPU-less inference. So I wanted that to be trivial as well.&lt;/p&gt;

&lt;p&gt;With Swival, it’s as easy as:&lt;/p&gt;

&lt;div class=&quot;language-sh highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;swival &lt;span class=&quot;nt&quot;&gt;--provider&lt;/span&gt; huggingface &lt;span class=&quot;nt&quot;&gt;--model&lt;/span&gt; zai-org/GLM-5.1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;agent-to-agent-is-too-useful-to-remain-niche&quot;&gt;Agent-to-Agent is too useful to remain niche&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://a2a-protocol.org&quot;&gt;A2A protocol&lt;/a&gt; is great.&lt;/p&gt;

&lt;p&gt;People who actually use it know how powerful it is, and they usually don’t want to go back to a single isolated agent.&lt;/p&gt;

&lt;p&gt;Unfortunately, for most people, A2A is still one of those things they have vaguely heard about but never really use, mostly because mainstream tools such as Claude Code still don’t support it.&lt;/p&gt;

&lt;p&gt;That’s a shame, because A2A changes what an agent can be.&lt;/p&gt;

&lt;p&gt;With A2A, you can run multiple agents with different configurations and different models, and let tasks naturally reach for the right one.&lt;/p&gt;

&lt;p&gt;So instead of stuffing documentation and skills into one local agent, you can have a dedicated documentation agent with direct access to the material, while other agents don’t need to carry all of that context.&lt;/p&gt;

&lt;p&gt;Then, when an agent needs to know how to do something, it asks the documentation agent to research it and return a concise, accurate answer instead of dumping blind &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;grep&lt;/code&gt; results.&lt;/p&gt;

&lt;p&gt;And of course, that specialized documentation agent can run a small, cheap, local model.&lt;/p&gt;

&lt;p&gt;That’s the larger idea. Don’t restrict yourself to one model, or even a tiny set of related models. Use as many models as you want, including open source ones, depending on what needs to be done.&lt;/p&gt;

&lt;p&gt;I wanted A2A to be simple enough that people would actually use it.&lt;/p&gt;

&lt;p&gt;Swival comes with &lt;a href=&quot;https://swival.dev/pages/a2a.html&quot;&gt;built-in support for A2A&lt;/a&gt; and can act both as a server and as a client.&lt;/p&gt;

&lt;p&gt;You can set up a network of specialized agents in minutes.&lt;/p&gt;

&lt;h2 id=&quot;open-source-readable-powerful-and-not-a-circus&quot;&gt;Open source, readable, powerful, and not a circus&lt;/h2&gt;

&lt;p&gt;Existing agents have become ridiculously bloated over time.&lt;/p&gt;

&lt;p&gt;Claude Code is enormous. It flickers. It crashes. New versions keep adding gadgets I don’t care about while making the whole thing even heavier.&lt;/p&gt;

&lt;p&gt;I don’t want a kitchen sink. I want a small, reliable tool.&lt;/p&gt;

&lt;p&gt;Swival is lean, fully open source, doesn’t depend on any company, and isn’t optimized around a specific provider. It’s totally free and open, and has nothing to sell.&lt;/p&gt;

&lt;p&gt;It’s written in Python because Python is simple, readable and maintainable. Nothing is obfuscated. Anyone can read the code, understand it, and modify it for their own needs.&lt;/p&gt;

&lt;p&gt;And this isn’t a toy. It’s a workhorse. A boring one, which is exactly what I want here.&lt;/p&gt;

&lt;p&gt;It focuses on the tools a developer actually needs, not on gimmicks. But it still has the features you would expect from a modern agent, and then some.&lt;/p&gt;

&lt;h2 id=&quot;benchmarking-needs-a-real-environment&quot;&gt;Benchmarking needs a real environment&lt;/h2&gt;

&lt;p&gt;I also wanted to run benchmarks.&lt;/p&gt;

&lt;p&gt;I wanted a tool to evaluate models, settings, skills, MCP servers, and similar pieces on real-world tasks, in an environment that actually resembles how a user works.&lt;/p&gt;

&lt;p&gt;A lot of benchmarking tools aren’t designed that way.&lt;/p&gt;

&lt;p&gt;They either assume tools optimized for specific models, or they provide an environment that doesn’t feel much like the real thing.&lt;/p&gt;

&lt;p&gt;And if you want to learn anything useful from evaluations, you need traces. Detailed ones. Accurate ones.&lt;/p&gt;

&lt;p&gt;You need to be able to look at what happened and understand how a model behaved under different conditions.&lt;/p&gt;

&lt;p&gt;Swival comes with strong &lt;a href=&quot;https://swival.dev/pages/reports.html&quot;&gt;reporting&lt;/a&gt; features. Combined with &lt;a href=&quot;https://calibra.swival.dev&quot;&gt;calibra&lt;/a&gt;, you can compare traces, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;diff&lt;/code&gt; them, and run evaluations that are actually meaningful.&lt;/p&gt;

&lt;p&gt;Evaluating many configurations can burn through a lot of tokens.&lt;/p&gt;

&lt;p&gt;That’s yet another reason I cared so much about making the agent work well with open source models running locally. For evaluations, cost is often more important than wall-clock speed.&lt;/p&gt;

&lt;h2 id=&quot;llms-fail-on-the-first-try&quot;&gt;LLMs fail on the first try&lt;/h2&gt;

&lt;p&gt;The real problem is that models produce terrible code and then confidently tell you everything is fine.&lt;/p&gt;

&lt;p&gt;Watching a model generate code is impressive. It’s hard not to be impressed when you type a prompt and a feature, or sometimes a whole project, appears in one shot.&lt;/p&gt;

&lt;p&gt;And the final report is always soothing. Everything is done. Everything works.&lt;/p&gt;

&lt;p&gt;Of course.&lt;/p&gt;

&lt;p&gt;The reality is that AI-generated code is almost always poor quality.&lt;/p&gt;

&lt;p&gt;It may compile. It may appear to work. But from a correctness perspective, it’s often terrible.&lt;/p&gt;

&lt;p&gt;You may be very happy with the code generated by Claude Code with Opus 4.6 max pro high thinking max, and maybe even want to deploy it to production, merge it into open source projects, or write triumphant blog posts about it.&lt;/p&gt;

&lt;p&gt;But there’s a good chance that the code is inefficient, buggy, hard to maintain, and going to cost you later.&lt;/p&gt;

&lt;p&gt;There’s a trivial experiment anyone can try.&lt;/p&gt;

&lt;p&gt;Ask your favorite agent to generate code, or even just a plan.&lt;/p&gt;

&lt;p&gt;Then, in a separate environment, ask another AI agent, even one running the same model, to review that code or plan.&lt;/p&gt;

&lt;p&gt;It’s very likely to find issues immediately. Sometimes critical ones.&lt;/p&gt;

&lt;p&gt;As much as I like AI, in my own projects I refuse pull requests blindly generated by tools such as Claude Code for exactly that reason. And in a company context, I wouldn’t deploy that output to production either.&lt;/p&gt;

&lt;p&gt;There are two ways to significantly improve quality and confidence.&lt;/p&gt;

&lt;p&gt;First, write the tests first, then force the agent not to declare the task complete until the tests pass.&lt;/p&gt;

&lt;p&gt;The tests don’t even have to be part of the application’s formal test suite. A simple shell script with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;curl&lt;/code&gt; commands can be enough.&lt;/p&gt;

&lt;p&gt;What matters is that this becomes a contract the agent has to satisfy.&lt;/p&gt;

&lt;p&gt;That’s much stricter than a prompt, because a contract can’t be hand-waved away or interpreted creatively.&lt;/p&gt;

&lt;p&gt;Second, use a loop with an LLM-as-a-judge.&lt;/p&gt;

&lt;p&gt;Let one agent write code, documentation or a plan. Then let another agent review that work against the original instructions, and force the implementer to retry until the reviewer thinks it’s correct.&lt;/p&gt;

&lt;p&gt;Swival makes both of these approaches trivial because they’re built in.&lt;/p&gt;

&lt;p&gt;Before starting a task, you can give the agent a script that will act as a reviewer.&lt;/p&gt;

&lt;p&gt;That reviewer can be another &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;swival&lt;/code&gt; instance with a custom configuration. Or, even more simply, you can start with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--self-review&lt;/code&gt;, and tasks will be reviewed by the same instance and same model in a dedicated context.&lt;/p&gt;

&lt;p&gt;There’s nothing else to wire together.&lt;/p&gt;

&lt;p&gt;One of the most interesting things to watch is how bad the initial output of an LLM agent can be, especially for code, and how honest and picky a model can suddenly become when it’s reviewing its own output without realizing it.&lt;/p&gt;

&lt;p&gt;After a couple of iterations, the code, plan or documentation is often far better than the first attempt.&lt;/p&gt;

&lt;p&gt;This is also one of the main reasons I wrote a new agent at all.&lt;/p&gt;

&lt;p&gt;I don’t want to use AI to generate a mountain of code just so I can brag about productivity if the result is unreliable, insecure and unmaintainable.&lt;/p&gt;

&lt;p&gt;I wanted an agent that optimizes for quality rather than raw time savings.&lt;/p&gt;

&lt;p&gt;It can be slow. It can be expensive. But I want the output to be something I can trust and deploy to production.&lt;/p&gt;

&lt;h2 id=&quot;long-sessions-shouldnt-make-the-agent-dumber&quot;&gt;Long sessions shouldn’t make the agent dumber&lt;/h2&gt;

&lt;p&gt;Another thing I wanted was continuity.&lt;/p&gt;

&lt;p&gt;I wanted an agent that remembers what I did before, and what it did before.&lt;/p&gt;

&lt;p&gt;When I come back to the same project the next day, I want the agent to remember prior work without filling the live context with junk.&lt;/p&gt;

&lt;p&gt;Swival does that in a way that feels much more natural than in other agents.&lt;/p&gt;

&lt;p&gt;I also wanted it to stop making the same mistakes twice.&lt;/p&gt;

&lt;p&gt;So Swival has a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/learn&lt;/code&gt; command: at the end of a session, the agent can reflect on the issues it ran into and write concise instructions about how to avoid repeating them.
And once those learnings exist, it will keep updating them automatically.&lt;/p&gt;

&lt;p&gt;That has turned out to be much more effective than premade agent skills.
Or, more accurately, it’s an extremely effective way to produce the right skills, because the agent discovers what it actually needs from real sessions instead of from speculation.&lt;/p&gt;

&lt;h2 id=&quot;goals&quot;&gt;Goals&lt;/h2&gt;

&lt;p&gt;One of the most frustrating things about working with an agent on a non-trivial task is how often it declares victory too early.&lt;/p&gt;

&lt;p&gt;It runs for a few turns, fixes something, and then politely tells you it’s done. Except it’s not. There are still failing tests, missing pieces, or edge cases it quietly decided weren’t worth its time.&lt;/p&gt;

&lt;p&gt;You re-prompt. It does a bit more work. It tells you it’s done again. And so on, until you give up and finish the task yourself.&lt;/p&gt;

&lt;p&gt;People have been working around this with the so-called Ralph-style loop, which essentially means feeding the same prompt back to the agent in a shell loop until it stops finding things to do. It works, but it’s clumsy and wasteful.&lt;/p&gt;

&lt;p&gt;I wanted something more structured, built directly into the agent.&lt;/p&gt;

&lt;p&gt;In Swival, you set an objective with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/goal &amp;lt;objective&amp;gt;&lt;/code&gt; in the REPL, and the agent stays on task across turns. The original objective is fed back to the model after every answer, and the loop only ends when the agent itself signals that the goal is complete after a real evidence-based audit, declares a blocker that needs your input, or hits an optional token budget.&lt;/p&gt;

&lt;p&gt;The agent doesn’t get to walk away after one turn just because it feels good about its work.&lt;/p&gt;

&lt;p&gt;This makes it practical to point Swival at ambitious long-running tasks such as refactors, audits or end-to-end fixes, and let it grind for hours without giving up halfway. You can pause, resume, replace, or clear the goal at any time.&lt;/p&gt;

&lt;p&gt;It’s a small feature on paper, but in practice it changes what you can reasonably ask an agent to do.&lt;/p&gt;

&lt;h2 id=&quot;modern-features-but-without-the-usual-mess&quot;&gt;Modern features, but without the usual mess&lt;/h2&gt;

&lt;p&gt;Skills, MCP, parallel subagents and similar capabilities are table stakes for serious agent use now. Of course Swival supports all of that.&lt;/p&gt;

&lt;p&gt;But I also wanted it to avoid the usual security and reliability mistakes.&lt;/p&gt;

&lt;p&gt;So tool and MCP output are explicitly tagged as untrusted in order to reduce prompt-injection risk.&lt;/p&gt;

&lt;p&gt;And markdown comments are ignored, so what you see in a rendered skill isn’t different from what the agent actually interprets.&lt;/p&gt;

&lt;p&gt;There’s another common failure mode I have always found silly.&lt;/p&gt;

&lt;p&gt;If an MCP command or tool returns a large output, many agents either stuff the whole thing into the context window or fail in some awkward way.&lt;/p&gt;

&lt;p&gt;I wanted an agent that handles that properly. Swival writes large outputs to a temporary file and lets the agent access them in chunks later instead.&lt;/p&gt;

&lt;p&gt;I also wanted a clean way to share files such as agent memories and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AGENTS.md&lt;/code&gt; across multiple devices working on the same project, without committing them into a Git repository.&lt;/p&gt;

&lt;p&gt;Swival has &lt;a href=&quot;https://swival.dev/pages/lifecycle-hooks.html&quot;&gt;lifecycle hooks&lt;/a&gt; specifically for that sort of workflow.&lt;/p&gt;

&lt;p&gt;Arbitrary commands are easy too.
In &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;~/.config/swival/commands/&lt;/code&gt;, you can place either scripts or plain files. Then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;! command_name&lt;/code&gt; will inject either the content of the file or the output of the script into the prompt.&lt;/p&gt;

&lt;p&gt;Yes, other agents have versions of this.
But I wanted it to be trivial from a user perspective.&lt;/p&gt;

&lt;p&gt;Not five overlapping systems with five different names for basically the same thing. Just one simple mechanism.&lt;/p&gt;

&lt;p&gt;The same goes for shell command inspection and rewriting.&lt;/p&gt;

&lt;p&gt;I didn’t want people to need to learn some complicated generic hook system.&lt;/p&gt;

&lt;p&gt;In Swival, enabling &lt;a href=&quot;https://swival.dev/pages/command-middleware.html&quot;&gt;command middleware&lt;/a&gt; is safe and straightforward.&lt;/p&gt;

&lt;p&gt;And more importantly, I wanted the agent to be usable programmatically, not just from the CLI.&lt;/p&gt;

&lt;p&gt;I also didn’t want that to require a separate SDK with its own worldview and its own behavior. 
Everything the CLI agent does should be accessible in a consistent way from Python code.&lt;/p&gt;

&lt;p&gt;This is why Swival can be used as a CLI, but also as a library. It exposes &lt;a href=&quot;https://swival.dev/pages/python-api.html&quot;&gt;a very simple API&lt;/a&gt; so anyone can build custom agents, or more general applications, on top of a batteries-included agentic environment.&lt;/p&gt;

&lt;h2 id=&quot;small-things-matter&quot;&gt;Small things matter&lt;/h2&gt;

&lt;p&gt;Some of the things I cared about aren’t glamorous.&lt;/p&gt;

&lt;p&gt;They’re just the kind of rough edges that make a daily tool annoying.&lt;/p&gt;

&lt;p&gt;For example, markdown rendering for LLM output looks nice, but I dislike the fact that copy-pasting rendered output often strips the markdown markers.&lt;/p&gt;

&lt;p&gt;I also don’t love the idea of an agent accidentally deleting files.&lt;/p&gt;

&lt;p&gt;These are small things. But they matter if you actually use the tool every day.&lt;/p&gt;

&lt;p&gt;Swival renders LLM markdown output while preserving the formatting tags.&lt;/p&gt;

&lt;p&gt;So the output looks good, but can still be copied and pasted without losing the markdown.&lt;/p&gt;

&lt;p&gt;And even in full YOLO mode, it has safety guards against dangerous commands, plus built-in support for the AgentFS copy-on-write filesystem overlay.&lt;/p&gt;

&lt;p&gt;Also, when a file is deleted using Swival’s own tools, it isn’t actually deleted. It’s moved to a Trash directory instead.&lt;/p&gt;

&lt;p&gt;I have never personally seen an agent delete the wrong file. But if it ever does happen, I want recovery to be possible.&lt;/p&gt;

&lt;h2 id=&quot;why-i-use-it&quot;&gt;Why I use it&lt;/h2&gt;

&lt;p&gt;At this point, I use Swival almost exclusively. It’s reliable, and I’m happy with the output I get from it.&lt;/p&gt;

&lt;p&gt;I use open source models as much as possible, both locally and via Hugging Face inference endpoints.&lt;/p&gt;

&lt;p&gt;But when I need a frontier model, I use it with GPT-5.5. This is a fantastic model.
It works amazingly well with a regular ChatGPT subscription, and I have never hit any usage limit.&lt;/p&gt;

&lt;p&gt;If you are happy with your current agent, there is no reason to switch.&lt;/p&gt;

&lt;p&gt;But I would still strongly encourage you to try Swival. Maybe even use it alongside the agent you already use.&lt;/p&gt;

&lt;p&gt;Because even with the very same models, it’s likely to find bugs your other agent never found.
That isn’t magic: different agents expose different environments to models, and models behave differently in those environments.&lt;/p&gt;

&lt;p&gt;I have used &lt;a href=&quot;https://swival.dev/pages/audit.html&quot;&gt;the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/audit&lt;/code&gt; built-in command&lt;/a&gt; and it found bugs and vulnerabilities in pretty much every code base I tried it on, including code bases that had already been audited with Codex and Claude Code.&lt;/p&gt;

&lt;p&gt;That’s the kind of difference I care about. Whether the tool helps me find real problems and ship better work.&lt;/p&gt;

&lt;p&gt;So yes, give it a try.&lt;/p&gt;

&lt;h2 id=&quot;whats-next&quot;&gt;What’s next&lt;/h2&gt;

&lt;p&gt;Version 1.0.0 doesn’t mean Swival is done.&lt;/p&gt;

&lt;p&gt;It means it now checks all the boxes from my original todo list, and it’s stable enough for daily use.&lt;/p&gt;

&lt;p&gt;The API is also unlikely to see major breaking changes, so it’s something you can reasonably rely on if you want to build applications and agents on top of it.&lt;/p&gt;

&lt;p&gt;There are still many planned changes and features.
But Swival will remain driven by real-world usage and user feedback, not by a roadmap for the sake of having a roadmap.&lt;/p&gt;

&lt;p&gt;Related projects are going to keep expanding as well.&lt;/p&gt;

&lt;p&gt;Right now, they’re:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://calibra.swival.dev&quot;&gt;Calibra&lt;/a&gt;, to evaluate models, MCP servers, skills, and related configurations&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Swival/skillscheck&quot;&gt;Skillscheck&lt;/a&gt;, a linter for agent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;SKILL.md&lt;/code&gt; files&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://marketplace.visualstudio.com/items?itemName=jedisct1.agent-skill-lint&quot;&gt;Agent Skill Lint&lt;/a&gt;, a Visual Studio extension for agent skills linting&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://github.com/Swival/swival-commands&quot;&gt;Swival commands&lt;/a&gt;, a repository for user-contributed commands&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Swival exists because I wanted an agent that takes privacy seriously, works with the models I actually want to run, gets better over long sessions, and pushes code quality up instead of pretending quality doesn’t matter.&lt;/p&gt;

&lt;p&gt;Most agents still optimize for marketing. I wanted one that optimizes for the work.&lt;/p&gt;

&lt;p&gt;That’s Swival.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Configuration flags are where software goes to rot</title>
    <link href="https://00f.net/2026/04/11/config-flags/"/>
   <updated>2026-04-11T00:00:00+02:00</updated>
   <id>https://00f.net/2026/04/11/config-flags</id>
   <content type="html">&lt;p&gt;People love configurable software.&lt;/p&gt;

&lt;p&gt;They say flexibility is always good. More flags, more knobs, more environment variables, more ways to make the software fit every possible use case.&lt;/p&gt;

&lt;p&gt;But in practice, configuration flags are often just a polite way to ship uncertainty.&lt;/p&gt;

&lt;p&gt;A feature is added, but no one is completely sure it should be enabled by default. So, it gets a flag.&lt;/p&gt;

&lt;p&gt;A behavior is changed, but backward compatibility is scary. So, it gets a flag.&lt;/p&gt;

&lt;p&gt;Two users want opposite things. So, both paths stay in the code forever, behind a flag.
At first sight, this looks reasonable.&lt;/p&gt;

&lt;p&gt;Of course, a flag can be useful. Experimental features need a way to be tested. Migrations sometimes need a temporary escape hatch. And some software is genuinely used in environments that are different enough to justify a couple of switches.&lt;/p&gt;

&lt;p&gt;But temporary flags are rarely temporary.
Once a flag exists, it starts attracting dependencies.&lt;/p&gt;

&lt;p&gt;Documentation has to mention it. Support has to ask whether it is enabled. Bug reports have to include it. Tests need to cover both states. New features have to decide which side they are compatible with. And if the flag affects a file format, a protocol, or anything persisted, removing it later becomes painful.
That is the real cost.&lt;/p&gt;

&lt;p&gt;The code behind a flag is not one feature. It is two possible worlds that the maintainers now have to keep alive.&lt;/p&gt;

&lt;p&gt;This gets worse when flags are not independent.
One flag changes timeouts. Another one changes buffering. Another one changes concurrency. Individually, each sounds harmless. Together, they create a configuration space no one has actually tested.&lt;/p&gt;

&lt;p&gt;Users then report that “it doesn’t work” on a setup that technically should be supported, but only with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;FAST_MODE=0&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;LEGACY_IO=1&lt;/code&gt;, the old parser, and a kernel old enough to vote.&lt;/p&gt;

&lt;p&gt;Nobody designed that combination. It just happened.
And now it is your problem.&lt;/p&gt;

&lt;p&gt;A lot of software teams treat flags as free because adding a boolean looks cheap.
It isn’t.&lt;/p&gt;

&lt;p&gt;A boolean in the interface usually means a branching factor in maintenance.&lt;/p&gt;

&lt;p&gt;This is especially obvious in open source.
If a user asks for a niche feature, adding a flag feels like a compromise. The maintainer doesn’t have to bless the behavior as the new normal, and the user gets what they want.&lt;/p&gt;

&lt;p&gt;But what actually happened is that the maintainer accepted long-term responsibility for behavior they may never use themselves.&lt;/p&gt;

&lt;p&gt;The contributor will disappear.
The flag will stay.&lt;/p&gt;

&lt;p&gt;And five years later, some poor soul will ask why &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--compat-relaxed-fsync&lt;/code&gt; cannot be combined with the new backend on FreeBSD.
Because software has memory.&lt;/p&gt;

&lt;p&gt;The scary part is that flags often hide design problems.&lt;/p&gt;

&lt;p&gt;If users regularly need a flag to disable a subsystem, maybe that subsystem is too eager.&lt;/p&gt;

&lt;p&gt;If performance requires half a dozen tuning variables, maybe the defaults are bad or the architecture is brittle.&lt;/p&gt;

&lt;p&gt;If a migration needs three generations of compatibility toggles, maybe the old behavior was never clearly isolated from the new one.&lt;/p&gt;

&lt;p&gt;Flags can solve real problems.
But they can also keep bad design alive by preventing the moment when someone has to say: this behavior was wrong, and it has to go.&lt;/p&gt;

&lt;p&gt;Sure, removing options can upset people.
But keeping everything forever quietly upsets the maintainers instead.&lt;/p&gt;

&lt;p&gt;That cost is less visible, because it shows up as hesitation, slower releases, defensive coding, weird bugs, and documentation that reads like legal terms.&lt;/p&gt;

&lt;p&gt;So, should software have no flags at all?
Obviously not.&lt;/p&gt;

&lt;p&gt;But flags should have the same status as debt: sometimes necessary, never free, and always suspicious.&lt;/p&gt;

&lt;p&gt;Every new flag should come with an expiration story.&lt;/p&gt;

&lt;p&gt;Why does it exist? Who needs it? What breaks if it goes away? When will that be acceptable?
If nobody can answer these questions, the flag is probably not a feature.&lt;/p&gt;

&lt;p&gt;It’s a fossil in progress.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Fast sorting, branchless by design</title>
    <link href="https://00f.net/2026/02/17/sorting-without-leaking-secrets/"/>
   <updated>2026-02-17T00:00:00+01:00</updated>
   <id>https://00f.net/2026/02/17/sorting-without-leaking-secrets</id>
   <content type="html">&lt;p&gt;Sorting is one of the most studied problems in computer science. Every language ships a built-in sort, and for most applications, picking the right one is a matter of performance. Quicksort, mergesort, pdqsort. They all produce the correct output. The main question is how fast they get there.&lt;/p&gt;

&lt;p&gt;But there’s a class of applications where correctness and speed aren’t enough. When the data being sorted is sensitive, the sort itself can become a security vulnerability. Not because it produces the wrong result, but because an attacker who observes execution time can learn something about the data.&lt;/p&gt;

&lt;p&gt;This is not theoretical. Post-quantum cryptosystems like Classic McEliece and NTRU Prime use sorting as a core subroutine, and if the sort leaks timing information, the entire cryptosystem is compromised.&lt;/p&gt;

&lt;p&gt;There is, however, a family of algorithms that eliminates this problem entirely, and when implemented well, can even outperform traditional sorts.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;why-sorting-leaks-information&quot;&gt;Why sorting leaks information&lt;/h2&gt;

&lt;p&gt;Consider quicksort.&lt;/p&gt;

&lt;p&gt;It picks a pivot, partitions the array around it, and recurses. Which pivot gets chosen, how the partitioning plays out, and how deep the recursion goes all depend on the values in the array. Different inputs produce different branch patterns, different memory access sequences, and different execution times.&lt;/p&gt;

&lt;p&gt;An attacker who can measure execution time, even remotely over a network, can use those timing differences to deduce information about what’s being sorted. This is a timing side-channel attack.&lt;/p&gt;

&lt;p&gt;The problem isn’t specific to quicksort. Any sort whose control flow depends on the data is vulnerable. Mergesort’s merge step branches on comparisons. Insertion sort’s inner loop length depends on how far each element needs to move. Even pdqsort, which combines several strategies, makes data-dependent decisions at every level.&lt;/p&gt;

&lt;p&gt;What we need is a sort where the sequence of operations is completely fixed: determined only by the array length, never by the values. The same comparisons, the same memory accesses, the same number of instructions, regardless of whether the array contains &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1, 2, 3]&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[3, 1, 2]&lt;/code&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;sorting-networks&quot;&gt;Sorting networks&lt;/h2&gt;

&lt;p&gt;A sorting network does exactly this. It’s a fixed circuit of compare-and-swap operations wired together in a specific pattern. Each comparator takes two values and outputs them in order: the smaller one goes to one wire, the larger one to the other. The entire network is determined at construction time, before any data is seen.&lt;/p&gt;

&lt;p&gt;The simplest sorting network sorts two elements. It’s a single comparator: take two values, output &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(min, max)&lt;/code&gt;. Done. For three elements, you need three comparators. For four, you need five. The pattern grows, but the key property remains: the comparisons don’t depend on the data. The same wires get compared in the same order every time.&lt;/p&gt;
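
&lt;p&gt;As a rough illustration, here’s a minimal Python sketch of these small networks. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NETWORKS&lt;/code&gt; table and the function names are made up for this example, and the four-input layout is just one well-known five-comparator arrangement. Python’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;min&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;max&lt;/code&gt; aren’t constant-time, so this only shows the fixed comparison schedule, not a side-channel-free implementation.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Each network is a fixed list of (i, j) wire pairs, chosen before any data is seen.
# Illustrative table: sizes 2, 3 and 4 only.
NETWORKS = {
    2: [(0, 1)],
    3: [(0, 1), (1, 2), (0, 1)],
    4: [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)],
}


def compare_and_swap(values, i, j):
    # One comparator: the smaller value goes to wire i, the larger to wire j.
    lo, hi = min(values[i], values[j]), max(values[i], values[j])
    values[i], values[j] = lo, hi


def sort_with_network(values):
    # Apply the comparators for this length in their fixed order.
    out = list(values)
    for i, j in NETWORKS[len(out)]:
        compare_and_swap(out, i, j)
    return out
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Calling &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort_with_network([3, 1, 2])&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort_with_network([1, 2, 3])&lt;/code&gt; walks through the same three wire pairs in the same order. Only the values on the wires differ.&lt;/p&gt;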

&lt;p&gt;This is what makes sorting networks attractive for cryptography. The comparison pattern is data-oblivious: an attacker watching the execution sees the exact same sequence of operations regardless of what’s being sorted.&lt;/p&gt;
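
&lt;p&gt;To make that concrete, here is a minimal sketch of the three-element network in Zig. The names &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;compareExchange&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort3&lt;/code&gt; are made up for this illustration, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@min&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@max&lt;/code&gt; stand in for a comparator. Whether they compile to branchless code depends on the target.&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

// One comparator: afterwards items[i] &lt;= items[j].
fn compareExchange(items: []i32, i: usize, j: usize) void {
    const lo = @min(items[i], items[j]);
    const hi = @max(items[i], items[j]);
    items[i] = lo;
    items[j] = hi;
}

// Three comparators, always the same wires in the same order.
fn sort3(items: *[3]i32) void {
    compareExchange(items, 0, 1);
    compareExchange(items, 1, 2);
    compareExchange(items, 0, 1);
}

test &quot;sort3 sorts every permutation&quot; {
    const inputs = [_][3]i32{
        .{ 1, 2, 3 }, .{ 1, 3, 2 }, .{ 2, 1, 3 },
        .{ 2, 3, 1 }, .{ 3, 1, 2 }, .{ 3, 2, 1 },
    };
    for (inputs) |input| {
        var v = input;
        sort3(&amp;v);
        try std.testing.expectEqualSlices(i32, &amp;[_]i32{ 1, 2, 3 }, &amp;v);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The three calls are the whole algorithm. There is no loop condition or branch that looks at the values.&lt;/p&gt;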

&lt;hr /&gt;

&lt;h2 id=&quot;bubble-sort-as-a-sorting-network&quot;&gt;Bubble sort as a sorting network&lt;/h2&gt;

&lt;p&gt;It turns out that bubble sort, often dismissed as the worst sorting algorithm, is already a sorting network.&lt;/p&gt;

&lt;p&gt;Think about what bubble sort actually does. On each pass, it compares adjacent pairs from left to right: elements &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt;, then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;1&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt;, then &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;2&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;3&lt;/code&gt;, and so on. If a pair is out of order, it swaps them. After &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n-1&lt;/code&gt; passes over an array of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; elements, the array is sorted.&lt;/p&gt;

&lt;p&gt;The comparison pattern is always the same. It doesn’t matter what values are in the array: bubble sort always compares the same pairs in the same order. It’s a fixed schedule of compare-and-swap operations. That’s exactly the definition of a sorting network.&lt;/p&gt;

&lt;p&gt;The problem is efficiency. Bubble sort needs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n)&lt;/code&gt; passes, each with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n)&lt;/code&gt; comparisons, for a total of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n²)&lt;/code&gt; comparators. For tiny arrays that fit in L1/L2 caches, bubble sort can be surprisingly fine in practice. But for large arrays, it’s far too slow.&lt;/p&gt;
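
&lt;p&gt;Written out, the fixed schedule could look like the sketch below. The function name &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;bubbleNetwork&lt;/code&gt; is invented for this example. It deliberately omits the usual early exit when a pass makes no swap, since that exit would depend on the data.&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

// Bubble sort without early exit. Which pairs (i, i+1) get visited
// depends only on items.len, never on the values.
fn bubbleNetwork(items: []i32) void {
    const n = items.len;
    if (n &lt; 2) return;
    for (0..n - 1) |pass| {
        // After each pass the largest remaining value is in place,
        // so the next pass can stop one position earlier.
        for (0..n - 1 - pass) |i| {
            const lo = @min(items[i], items[i + 1]);
            const hi = @max(items[i], items[i + 1]);
            items[i] = lo;
            items[i + 1] = hi;
        }
    }
}

test &quot;bubbleNetwork sorts&quot; {
    var data = [_]i32{ 5, 1, 4, 2, 8 };
    bubbleNetwork(&amp;data);
    try std.testing.expectEqualSlices(i32, &amp;[_]i32{ 1, 2, 4, 5, 8 }, &amp;data);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;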

&lt;hr /&gt;

&lt;h2 id=&quot;odd-even-transposition-sort&quot;&gt;Odd-even transposition sort&lt;/h2&gt;

&lt;p&gt;A natural improvement is to parallelize. In odd-even transposition sort, instead of sweeping left-to-right, you alternate between two phases:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Phase A: compare and swap pairs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(0,1), (2,3), (4,5), ...&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;Phase B: compare and swap pairs &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(1,2), (3,4), (5,6), ...&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Within each phase, all the comparisons are independent: they operate on disjoint pairs of elements, so they can all happen simultaneously.
And after &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; phases, the array is sorted.&lt;/p&gt;
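
&lt;p&gt;Here is a scalar sketch of the two alternating phases, with a hypothetical function name. The inner loop runs serially, but its iterations touch disjoint pairs, so nothing in the algorithm stops them from running side by side.&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

fn oddEvenTranspositionSort(items: []i32) void {
    const n = items.len;
    for (0..n) |phase| {
        // Even-numbered phases are phase A (pairs start at 0),
        // odd-numbered phases are phase B (pairs start at 1).
        var i: usize = phase % 2;
        while (i + 1 &lt; n) : (i += 2) {
            const lo = @min(items[i], items[i + 1]);
            const hi = @max(items[i], items[i + 1]);
            items[i] = lo;
            items[i + 1] = hi;
        }
    }
}

test &quot;odd-even transposition sorts in n phases&quot; {
    var data = [_]i32{ 6, 3, 7, 1, 5, 2, 8, 4 };
    oddEvenTranspositionSort(&amp;data);
    try std.testing.expectEqualSlices(i32, &amp;[_]i32{ 1, 2, 3, 4, 5, 6, 7, 8 }, &amp;data);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;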

&lt;p&gt;On a single processor, this is no faster than bubble sort. But it demonstrates something important about sorting networks: they expose parallelism. All comparisons within a phase are independent. Data-dependent sorts like quicksort cannot offer that, because which elements they compare next depends on the outcome of earlier comparisons.&lt;/p&gt;

&lt;p&gt;Still, odd-even transposition sort has &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n)&lt;/code&gt; depth (sequential phases) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n²)&lt;/code&gt; total comparators. We can do much better.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;bitonic-sequences&quot;&gt;Bitonic sequences&lt;/h2&gt;

&lt;p&gt;The key insight that leads to efficient sorting networks comes from a special kind of sequence: a bitonic sequence.&lt;/p&gt;

&lt;p&gt;A bitonic sequence is one that first increases and then decreases, like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[1, 4, 7, 5, 2]&lt;/code&gt;. A cyclic rotation of such a sequence also counts, so &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;[5, 2, 1, 4, 7]&lt;/code&gt; is bitonic too (it falls, then rises, which is the same shape rotated).&lt;/p&gt;

&lt;p&gt;Why do we care? Because a bitonic sequence of length &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; can be sorted by a fixed network with only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(log n)&lt;/code&gt; depth.&lt;/p&gt;

&lt;p&gt;The trick is called a bitonic merger, and it works like this: compare each element in the first half with the corresponding element in the second half (elements at distance &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n/2&lt;/code&gt;). This single layer of comparators splits the bitonic sequence into two smaller bitonic sequences, one in each half. Recurse on each half, halving the distance each time. After &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;log n&lt;/code&gt; layers, the sequence is sorted.&lt;/p&gt;

&lt;p&gt;A bitonic merger is a small, efficient, fixed-structure circuit. It is the building block for sorting any array, not just bitonic ones.&lt;/p&gt;
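
&lt;p&gt;Here is a small sketch of the merger for a power-of-two length, written as a loop rather than a recursion. The function name is invented for this example. The only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;if&lt;/code&gt; tests the index, not the data.&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

// Sorts a bitonic sequence in ascending order.
// Assumes items.len is a power of two.
fn bitonicMerge(items: []i32) void {
    std.debug.assert(@popCount(items.len) == 1);
    var d = items.len / 2; // distance between the two wires of a comparator
    while (d &gt; 0) : (d /= 2) {
        for (0..items.len) |i| {
            // i is the lower wire of a pair exactly when bit d of i is clear.
            if ((i &amp; d) == 0) {
                const lo = @min(items[i], items[i + d]);
                const hi = @max(items[i], items[i + d]);
                items[i] = lo;
                items[i + d] = hi;
            }
        }
    }
}

test &quot;bitonicMerge sorts a bitonic input&quot; {
    var data = [_]i32{ 1, 3, 6, 7, 8, 5, 4, 2 }; // rises, then falls
    bitonicMerge(&amp;data);
    try std.testing.expectEqualSlices(i32, &amp;[_]i32{ 1, 2, 3, 4, 5, 6, 7, 8 }, &amp;data);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;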

&lt;hr /&gt;

&lt;h2 id=&quot;batchers-bitonic-sort&quot;&gt;Batcher’s bitonic sort&lt;/h2&gt;

&lt;p&gt;In 1968, Ken Batcher &lt;a href=&quot;https://www.cs.kent.edu/~batcher/sort.pdf&quot;&gt;showed&lt;/a&gt; how to use the bitonic merger to build a full sorting network. The construction is recursive:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Split the array in half.&lt;/li&gt;
  &lt;li&gt;Recursively sort the first half in ascending order and the second half in descending order.&lt;/li&gt;
  &lt;li&gt;The result is a bitonic sequence. Merge it with the bitonic merger.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For clarity, assume &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; is a power of two. Practical implementations handle arbitrary lengths with padding or tailored networks.&lt;/p&gt;

&lt;p&gt;Here’s a concrete example with 8 elements. Start with:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[6, 3, 7, 1, 5, 2, 8, 4]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After recursively sorting the first half ascending and the second half descending (which itself applies the same construction on smaller subarrays), we get:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[1, 3, 6, 7,  8, 5, 4, 2]
 ascending    descending
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is a bitonic sequence: it rises then falls. Now we apply the bitonic merger. The rule is simple: for each pair &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;(i, i+d)&lt;/code&gt;, keep the minimum at position &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i&lt;/code&gt; and the maximum at position &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i+d&lt;/code&gt;. Halve &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;d&lt;/code&gt; after each layer:&lt;/p&gt;

&lt;div class=&quot;language-text highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    positions: [0, 1, 2, 3, 4, 5, 6, 7]

        input: [1, 3, 6, 7, 8, 5, 4, 2]

d = 4   pairs: (0,4), (1,5), (2,6), (3,7)
          out: [1, 3, 4, 2, 8, 5, 6, 7]

d = 2   pairs: (0,2), (1,3), (4,6), (5,7)
          out: [1, 2, 4, 3, 6, 5, 8, 7]

d = 1   pairs: (0,1), (2,3), (4,5), (6,7)
          out: [1, 2, 3, 4, 5, 6, 7, 8]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Three layers, four comparisons each, and the array is sorted. The comparison schedule was the same regardless of what values were in the array: only the array length determined which pairs got compared.&lt;/p&gt;
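
&lt;p&gt;The whole construction fits in a few lines. What follows is a sketch under the power-of-two assumption, not code from any particular library. It uses the common iterative formulation instead of explicit recursion, and it also counts comparators so the numbers can be checked.&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

// Sorts `items` ascending with a bitonic network and returns the number
// of comparators used. Assumes items.len is a power of two.
fn bitonicSort(items: []i32) usize {
    const n = items.len;
    if (n &lt; 2) return 0;
    std.debug.assert(@popCount(n) == 1);
    var comparators: usize = 0;
    var k: usize = 2; // length of the bitonic sequences being merged
    while (k &lt;= n) : (k *= 2) {
        var d = k / 2; // distance between the two wires of a comparator
        while (d &gt; 0) : (d /= 2) {
            for (0..n) |i| {
                const j = i ^ d;
                if (j &gt; i) {
                    // The direction depends only on the index, never on the data.
                    const ascending = (i &amp; k) == 0;
                    const lo = @min(items[i], items[j]);
                    const hi = @max(items[i], items[j]);
                    items[i] = if (ascending) lo else hi;
                    items[j] = if (ascending) hi else lo;
                    comparators += 1;
                }
            }
        }
    }
    return comparators;
}

test &quot;sorts the 8-element example&quot; {
    var data = [_]i32{ 6, 3, 7, 1, 5, 2, 8, 4 };
    const used = bitonicSort(&amp;data);
    try std.testing.expectEqualSlices(i32, &amp;[_]i32{ 1, 2, 3, 4, 5, 6, 7, 8 }, &amp;data);
    try std.testing.expectEqual(@as(usize, 24), used);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For 8 elements the sort runs 1 + 2 + 3 = 6 layers of 4 comparators, hence 24. The last three layers are the merger traced above.&lt;/p&gt;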

&lt;p&gt;How many comparisons does this take? Each level of recursion halves the array, so there are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(log n)&lt;/code&gt; levels. At each level, the mergers collectively cover all &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; elements, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n/2&lt;/code&gt; comparators per layer and at most &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;log n&lt;/code&gt; layers. That’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log n)&lt;/code&gt; comparators per level, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(log n)&lt;/code&gt; levels, for a total of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log²n)&lt;/code&gt; comparators.&lt;/p&gt;

&lt;p&gt;This is more work than quicksort’s average-case &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log n)&lt;/code&gt;. But the comparison pattern is completely fixed, which is exactly the property we need for constant-time sorting. And &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log²n)&lt;/code&gt; is close enough to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log n)&lt;/code&gt; to be practical. The extra log factor is a modest price for side-channel resistance. More importantly, layers are made of nothing but simple, disjoint operations that can run in parallel.&lt;/p&gt;

&lt;p&gt;This structure is also a natural fit for GPUs. GPUs execute threads in lockstep groups, and when threads within a group take different branches, performance collapses. Data-dependent sorts like quicksort suffer from this. Sorting networks avoid it entirely: every thread executes the same compare-and-swap regardless of the data. Bitonic sort has been one of the standard GPU sorting algorithms for this reason, and is included in NVIDIA’s CUDA samples as a textbook example.&lt;/p&gt;

&lt;p&gt;Note that there exists an asymptotically optimal sorting network with only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log n)&lt;/code&gt; comparators, the AKS network, but its constants are so enormous that it’s unusable in practice. As Knuth put it, Batcher’s method is better unless &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;n&lt;/code&gt; exceeds the total memory capacity of all computers on earth.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;vectorizing-sorting-networks&quot;&gt;Vectorizing sorting networks&lt;/h2&gt;

&lt;p&gt;Vectorized merge-exchange sorting networks have been around since the 1980s. The idea is natural. All comparisons within a layer operate on disjoint pairs, and the CPU can exploit this independence in two ways: SIMD instructions process multiple pairs per instruction, and out-of-order execution overlaps independent scalar operations. The structure of the network exposes both forms of parallelism.&lt;/p&gt;

&lt;p&gt;Daniel J. Bernstein’s &lt;a href=&quot;https://sorting.cr.yp.to/&quot;&gt;sorting library&lt;/a&gt; takes this further, squeezing more speed out of these networks on current CPUs and adding formal verification tooling that operates on the compiled binary.&lt;/p&gt;

&lt;p&gt;In a vectorized Batcher network, the wire ordering is rearranged to be SIMD-friendly.&lt;/p&gt;

&lt;p&gt;The core primitive is still the compare-and-swap, but there are two ways to implement it.&lt;/p&gt;

&lt;p&gt;The fast path uses hardware min/max instructions. On modern CPUs, packed SIMD min and max operate on entire vectors of elements at once and are inherently branchless. One min and one max instruction together handle 4, 8, or 16 compare-and-swaps at once, with no data-dependent control flow.&lt;/p&gt;
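
&lt;p&gt;In Zig, this can be sketched with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@Vector&lt;/code&gt;, since &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@min&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@max&lt;/code&gt; work lane by lane on vectors. The helper name and the fixed width of 4 are assumptions made for the example, and which instructions come out depends on the target and the compiler.&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

const Vec = @Vector(4, i32);

// Four disjoint compare-and-swaps at once:
// lane k of `a` is paired with lane k of `b`.
fn compareExchangeVec(a: *Vec, b: *Vec) void {
    const lo = @min(a.*, b.*);
    const hi = @max(a.*, b.*);
    a.* = lo;
    b.* = hi;
}

test &quot;first merger layer of the 8-element example&quot; {
    var a: Vec = .{ 1, 3, 6, 7 }; // positions 0..3
    var b: Vec = .{ 8, 5, 4, 2 }; // positions 4..7
    compareExchangeVec(&amp;a, &amp;b);
    const lo: [4]i32 = a;
    const hi: [4]i32 = b;
    try std.testing.expectEqualSlices(i32, &amp;[_]i32{ 1, 3, 4, 2 }, &amp;lo);
    try std.testing.expectEqualSlices(i32, &amp;[_]i32{ 8, 5, 6, 7 }, &amp;hi);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;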

&lt;p&gt;The fallback path uses pure arithmetic. It subtracts the two values to get a difference, extracts the sign bit via a right shift, negates it to produce either an all-ones or all-zeros mask, then XOR-swaps the values conditioned on that mask. As long as the compiler doesn’t recognize the pattern and turn it back into a branch, we get no branches, no data-dependent memory accesses, and only operations that are expected to run in constant time on the target architecture. The same operations execute regardless of whether the pair was already in order.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;a-zig-implementation&quot;&gt;A Zig implementation&lt;/h2&gt;

&lt;p&gt;I &lt;a href=&quot;https://github.com/jedisct1/zig-ctsort&quot;&gt;ported djbsort to Zig&lt;/a&gt;. The library exposes two functions.&lt;/p&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort&lt;/code&gt; is the fast path for native numeric types (integers and floats of any width):&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctsort&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;@import&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;ctsort&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;data&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;42&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;13&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;99&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ctsort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;asc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;data&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;// data is now { -99, -7, 0, 13, 42 }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sortWith&lt;/code&gt; handles arbitrary types, including structs. It follows the same interface as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;std.sort.pdq&lt;/code&gt;. You provide a comparison function and an optional context:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Point&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;i32&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;points&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Point&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;};&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ctsort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;sortWith&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Point&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;points&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{},&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lessThan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Point&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Point&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;bool&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;lessThan&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Both functions use the same Batcher sorting network topology. They differ in how the compare-and-swap is implemented.&lt;/p&gt;

&lt;p&gt;The native path has two levels of optimization. On modern CPUs with SIMD support, it processes elements in vector-wide chunks using packed min/max. For the leftover elements that don’t fill a full vector, it falls back to a scalar compare-and-swap.&lt;/p&gt;

&lt;p&gt;The scalar path is gated by a comptime flag called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;has_ct_minmax&lt;/code&gt;. When side-channel mitigations are disabled (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;side_channels_mitigations == .none&lt;/code&gt;), it’s always true. In that specific case, we don’t need constant-time code, so &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@min&lt;/code&gt;/&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@max&lt;/code&gt; are fine on any architecture. But when mitigations are enabled (and by default, they are), it’s only true on x86/x86_64 and aarch64, where &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@min&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@max&lt;/code&gt; have been verified to lower to branchless instructions (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;cmov&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;csel&lt;/code&gt;). On other architectures, the code falls back to the arithmetic sign-bit extraction and XOR swap:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;order&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;asc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b_int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a_int&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a_int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b_int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sign_bit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;u1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;@truncate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;@as&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UWInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;@bitCast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;diff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;@intCast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bits&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mask_word&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;@as&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-%&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;@as&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sign_bit&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The subtraction produces a negative result when the pair is out of order. The sign bit is extracted via a right shift, then spread to a full-width mask via &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;0 -% sign_bit&lt;/code&gt; (wrapping subtraction: 0 minus 1 wraps to all-ones, 0 minus 0 stays zero). The mask drives an XOR swap of the raw bytes.&lt;/p&gt;
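
&lt;p&gt;Putting the pieces together for a single &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i32&lt;/code&gt; pair, a simplified sketch could look like this. It is not the library’s code: the helper name is invented, the width is fixed, only ascending order is handled, and no optimization barrier is included, so a compiler remains free to rewrite it.&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

// Hypothetical helper: masked compare-and-swap, ascending order.
fn ctCompareExchange(a: *i32, b: *i32) void {
    // Widen first so the subtraction cannot overflow.
    const diff: i64 = @as(i64, b.*) - @as(i64, a.*);
    // The sign bit is 1 exactly when b &lt; a, i.e. the pair is out of order.
    const sign: u32 = @truncate(@as(u64, @bitCast(diff)) &gt;&gt; 63);
    // 0 -% 1 wraps to all-ones, 0 -% 0 stays zero.
    const mask: u32 = @as(u32, 0) -% sign;
    const ua: u32 = @bitCast(a.*);
    const ub: u32 = @bitCast(b.*);
    // t is a ^ b when the mask is all-ones, and 0 otherwise.
    const t = (ua ^ ub) &amp; mask;
    a.* = @bitCast(ua ^ t);
    b.* = @bitCast(ub ^ t);
}

test &quot;swaps only when out of order&quot; {
    var a: i32 = 42;
    var b: i32 = -7;
    ctCompareExchange(&amp;a, &amp;b);
    try std.testing.expectEqual(@as(i32, -7), a);
    try std.testing.expectEqual(@as(i32, 42), b);

    var c: i32 = std.math.minInt(i32);
    var d: i32 = std.math.maxInt(i32);
    ctCompareExchange(&amp;c, &amp;d); // already in order: nothing moves
    try std.testing.expectEqual(@as(i32, std.math.minInt(i32)), c);
    try std.testing.expectEqual(@as(i32, std.math.maxInt(i32)), d);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;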

&lt;p&gt;The generic path (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sortWith&lt;/code&gt;) behaves differently depending on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;std.options.side_channels_mitigations&lt;/code&gt;. When mitigations are disabled, it uses a plain conditional swap: a branch and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;std.mem.swap&lt;/code&gt;. Fast, but not necessarily constant-time. When mitigations are enabled, it uses &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ctCondSwap&lt;/code&gt;, which XOR-masks the raw byte representation in machine-word-sized chunks. Even with mitigations enabled, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sortWith&lt;/code&gt; is only timing-safe if the caller-supplied comparison function is itself constant-time. A data-dependent &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;lessThan&lt;/code&gt; would leak through the comparator, not the swap.&lt;/p&gt;

&lt;p&gt;Both paths use the same optimization barrier to prevent the compiler from undoing the constant-time code:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;mask_word&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;asm&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;volatile&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;=r&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;usize&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;0&quot;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mask_word&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is a no-op at the hardware level. It generates no instructions. But it tells the compiler that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mask_word&lt;/code&gt; might have been modified, which prevents LLVM from reasoning about its value and converting the branchless XOR swap back into a conditional branch. This is more efficient than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;std.mem.doNotOptimizeAway(&amp;amp;mask)&lt;/code&gt;, which includes a memory clobber.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;float-ordering&quot;&gt;Float ordering&lt;/h2&gt;

&lt;p&gt;IEEE 754 floats don’t have a total order. NaN is unordered. It’s not less than, equal to, or greater than anything, including itself. And &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-0.0&lt;/code&gt; compares equal to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;+0.0&lt;/code&gt;, even though they have different bit representations.&lt;/p&gt;

&lt;p&gt;For a sorting network, we need a total order. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort&lt;/code&gt; function imposes one: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-NaN &amp;lt; -inf &amp;lt; ... &amp;lt; -0.0 &amp;lt; +0.0 &amp;lt; ... &amp;lt; +inf &amp;lt; +NaN&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The solution is djbsort’s “useint” technique: don’t sort the floats at all. Instead, reinterpret their bits as signed integers, apply a transform that makes integer comparison match the desired float ordering, sort using the integer sorting network, and transform back.&lt;/p&gt;

&lt;p&gt;The transform is a function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;floatSortKey&lt;/code&gt;. Positive floats already sort correctly as integers: larger floats have larger bit values. Negative floats sort in the wrong direction: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-1.0&lt;/code&gt; has a smaller bit pattern than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-2.0&lt;/code&gt;, but should come after it. The fix is to arithmetic-right-shift the value so that the sign bit fills the entire word (all-ones for negative, all-zeros for positive), AND that with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;maxInt&lt;/code&gt; so the sign bit itself is left alone, and XOR the result into the value. This flips the magnitude bits of negative values, reversing their order while leaving positive values unchanged:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;fn&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;floatSortKey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;comptime&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;@bitSizeOf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;^&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mask&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;math&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;maxInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The transform is self-inverse: applying it twice gives back the original value. So the whole float sort is just a thin wrapper around the integer sort:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;@typeInfo&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;std&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;meta&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;signed&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;@bitSizeOf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int_items&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;@ptrCast&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;items&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;py&quot;&gt;ptr&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int_slice&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int_items&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;n&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;];&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;applyFloatSortKey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int_slice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;// O(n) transform, vectorized&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;sort&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;order&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int_slice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;        &lt;span class=&quot;c&quot;&gt;// reuse the integer sort&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;applyFloatSortKey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SInt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;int_slice&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;  &lt;span class=&quot;c&quot;&gt;// O(n) reverse (self-inverse)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The pre/post transform is itself vectorized, processing elements in SIMD-width chunks. The cost is two linear passes over the array, negligible next to the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log²n)&lt;/code&gt; sorting work. In benchmarks, float sorting runs within a few percent of integer sorting speed.&lt;/p&gt;
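
&lt;p&gt;Both properties of the key transform, order preservation and self-inversion, are easy to check in isolation. The following is a minimal sketch that repeats &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;floatSortKey&lt;/code&gt; so it stands alone; the test values are arbitrary, and NaNs are deliberately left out:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

fn floatSortKey(comptime SInt: type, s: SInt) SInt {
    const mask = s &gt;&gt; (@bitSizeOf(SInt) - 1);
    return s ^ (mask &amp; std.math.maxInt(SInt));
}

test &quot;floatSortKey preserves order and is self-inverse&quot; {
    // Ascending floats, reinterpreted below as i32 bit patterns.
    const values = [_]f32{ -2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 2.5 };
    var prev: i32 = std.math.minInt(i32);
    for (values) |v| {
        const bits: i32 = @bitCast(v);
        const key = floatSortKey(i32, bits);
        // Signed integer order of the keys matches float order.
        try std.testing.expect(key &gt; prev);
        // Applying the transform twice gives back the original bits.
        try std.testing.expectEqual(bits, floatSortKey(i32, key));
        prev = key;
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;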

&lt;hr /&gt;

&lt;p&gt;Sorting networks trade a modest increase in total work, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log²n)&lt;/code&gt; instead of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log n)&lt;/code&gt;, for a property that mainstream sorts like quicksort and pdqsort cannot offer: a fixed, data-independent comparison schedule. SIMD vectorization and the instruction-level parallelism inherent in sorting networks make this practical.&lt;/p&gt;

&lt;p&gt;On Apple Silicon, the SIMD path (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sort&lt;/code&gt;) runs 2x to 5x faster than the standard library’s &lt;a href=&quot;https://github.com/orlp/pdqsort&quot;&gt;pdqsort&lt;/a&gt; across all types and sizes tested, from 16 to 1M elements. For &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;i32&lt;/code&gt; at 1M elements, the vectorized sorting network is still about 3x faster.&lt;/p&gt;

&lt;p&gt;On AMD Zen4, the speedups are even larger at small and medium sizes, up to 5.5x at 16K elements. At 1M elements the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;O(n log²n)&lt;/code&gt; work catches up, but the sorting network still wins by about 1.3x.&lt;/p&gt;

&lt;p&gt;Floats sort at the same speed as their integer counterparts on both architectures, within a few percent. The generic path (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;sortWith&lt;/code&gt;), which cannot use SIMD, beats pdqsort up to around 65K elements, then gradually loses as the extra log factor dominates.&lt;/p&gt;

&lt;p&gt;For a data-oblivious sort, being faster than a mainstream data-dependent sort across most practical sizes is a welcome bonus. And for the cryptographic use cases that motivate constant-time sorting, arrays tend to be well within that range.&lt;/p&gt;

&lt;p&gt;The code is on &lt;a href=&quot;https://github.com/jedisct1/zig-ctsort&quot;&gt;GitHub&lt;/a&gt;.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Don't pass on small block ciphers</title>
    <link href="https://00f.net/2026/02/10/small-block-ciphers/"/>
   <updated>2026-02-10T00:00:00+01:00</updated>
   <id>https://00f.net/2026/02/10/small-block-ciphers</id>
   <content type="html">&lt;p&gt;Although they are omnipresent in constrained environments and lightweight protocols, small (32-bit, 64-bit) block ciphers have a bad reputation. They are perceived as something antiquated, insecure, and to stay away from for any new application, especially in software implementations.&lt;/p&gt;

&lt;p&gt;To some extent, thinking that way is safe: as a general building block, larger block ciphers are more versatile and provide better security and higher usage limits than small block ciphers.
In fact, with modern applications and protocols, the trend in some contexts is to recognize that 128 bits are not enough, and to shift to large block ciphers such as Rijndael-256 and &lt;a href=&quot;https://eprint.iacr.org/2025/976&quot;&gt;Vistrutah&lt;/a&gt; (&lt;a href=&quot;https://csrc.nist.gov/files/pubs/sp/800/197/iprd/docs/2_vistrutah.pdf&quot;&gt;NIST submission&lt;/a&gt;), part of NIST’s &lt;a href=&quot;https://csrc.nist.gov/pubs/sp/800/197/iprd&quot;&gt;wide-block standardization effort&lt;/a&gt;, or to public permutations such as Keccak.&lt;/p&gt;

&lt;p&gt;The main problem with small block ciphers is that well… they are small. Generally, we want the set of possible inputs and outputs to be large enough that it’s practically impossible to enumerate. If that set is small, that doesn’t hold true. Generally, we also want the output of a cipher to be indistinguishable from random for an adversary. But with a 32-bit block cipher, collisions become likely around 2&lt;sup&gt;16&lt;/sup&gt; blocks for a given key, which is ridiculously low, and the distinguishing advantage only grows from there. Using them securely is possible, but requires very careful design of application-specific protocols.&lt;/p&gt;

&lt;p&gt;Nonetheless, small block ciphers can remain very useful, even in modern applications, because in spite of their limitations, they remain block ciphers.&lt;/p&gt;

&lt;h2 id=&quot;block-ciphers&quot;&gt;Block ciphers&lt;/h2&gt;

&lt;p&gt;A block cipher is a fundamental building block in symmetric cryptography.&lt;/p&gt;

&lt;p&gt;Imagine a list of N elements. Put them all in a bag, shake the bag really well, and randomly grab the elements one by one to form a new list. There are still N elements, just in a completely different order. The first one may now be the 23948234th one, etc.
We just applied a random-looking permutation.&lt;/p&gt;

&lt;p&gt;If we know what the permutation is, we can easily invert the process, and map element 23948234 back to the first one.&lt;/p&gt;

&lt;p&gt;A block cipher is a set of two functions:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;P(k, x) -&amp;gt; x&apos;&lt;/code&gt; that takes a secret key &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k&lt;/code&gt;, generates a random-looking permutation from it, and returns the image of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; under that permutation.&lt;/li&gt;
  &lt;li&gt;A function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;P&apos;(k, x&apos;) -&amp;gt; x&lt;/code&gt; that applies the inverse permutation and recovers &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;P(k, x)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without the secret key &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;k&lt;/code&gt;, the permutation cannot be recovered, so given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&apos;&lt;/code&gt;, guessing what &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is requires brute-forcing the key space or, for small block sizes, exhausting the (limited) set of possible inputs, as discussed below.&lt;/p&gt;

&lt;h2 id=&quot;counters-and-other-small-inputs&quot;&gt;Counters and other small inputs&lt;/h2&gt;

&lt;p&gt;A common way to use block ciphers is to encrypt a counter. With a 32-bit block cipher, you can safely encrypt a counter from 0 to 2&lt;sup&gt;32&lt;/sup&gt; - 1 without getting a collision, as long as the counter doesn’t wrap around and the same key isn’t reused for a different counter domain. This is still counting, just in a shuffled order.&lt;/p&gt;

&lt;p&gt;And in real-world applications, counters are everywhere: packet/frame indices, account identifiers, message numbers, version numbers, and anything using a monotonically incrementing key in a database.&lt;/p&gt;

&lt;p&gt;The value of some of these counters may reveal sensitive information. Creating accounts on a website and observing changes in account identifiers is a pretty reliable way to guess how many customers the company has, and whether that number is increasing or decreasing. Quite sensitive data for a public company. Similarly, opening support tickets and observing the ticket number leaks information about the company’s business.&lt;/p&gt;

&lt;p&gt;A common mitigation is to replace counters with UUIDs or random identifiers, even in environments providing strong consistency guarantees.&lt;/p&gt;

&lt;p&gt;That works, with large enough identifiers.&lt;/p&gt;

&lt;p&gt;With UUIDv4’s 122 random bits, generating ~2&lt;sup&gt;48&lt;/sup&gt; identifiers gives a collision probability around 2&lt;sup&gt;-27&lt;/sup&gt;, which is small but not negligible at scale.&lt;/p&gt;
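
&lt;p&gt;As a back-of-the-envelope check, assuming the usual birthday approximation p ≈ n&lt;sup&gt;2&lt;/sup&gt; / 2&lt;sup&gt;b+1&lt;/sup&gt; for n uniformly random b-bit identifiers: with n = 2&lt;sup&gt;48&lt;/sup&gt; and b = 122, p ≈ 2&lt;sup&gt;96&lt;/sup&gt; / 2&lt;sup&gt;123&lt;/sup&gt; = 2&lt;sup&gt;-27&lt;/sup&gt;.&lt;/p&gt;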

&lt;p&gt;But with a 64-bit block cipher and a counter, up to 2&lt;sup&gt;64&lt;/sup&gt; identifiers can be safely generated with a collision probability of exactly 0.&lt;/p&gt;

&lt;p&gt;If your application doesn’t need more than 2&lt;sup&gt;32&lt;/sup&gt; identifiers, a 32-bit block cipher will do the same job, and keep storage requirements down to a minimum.&lt;/p&gt;

&lt;p&gt;Keep using a regular counter, encrypt it with a block cipher before making it public, and decrypt that later if needed to recover the original counter value.&lt;/p&gt;

&lt;p&gt;No trusted source of randomness is necessary, and no accurate clock is necessary. The encrypted values themselves won’t be ordered, but you can keep the original counter for internal indexing and only expose the encrypted form publicly.&lt;/p&gt;
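
&lt;p&gt;Here is a minimal sketch of that workflow with a 32-bit block cipher. It follows the SPECK32/64 parameters as I understand them (16-bit words, 22 rounds, rotations by 7 and 2); the function names and the key are made up for illustration, and the code should be checked against the published test vectors before being relied on:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

const rounds = 22; // assumed round count for SPECK32/64

// Expand a 64-bit key, given as four 16-bit words, into round keys.
fn expandKey(key: [4]u16) [rounds]u16 {
    var rk: [rounds]u16 = undefined;
    var l = [3]u16{ key[1], key[2], key[3] };
    var k = key[0];
    for (0..rounds) |i| {
        rk[i] = k;
        const j = i % 3;
        l[j] = (std.math.rotr(u16, l[j], 7) +% k) ^ @as(u16, @intCast(i));
        k = std.math.rotl(u16, k, 2) ^ l[j];
    }
    return rk;
}

// Counter in, public identifier out.
fn encrypt(rk: [rounds]u16, block: u32) u32 {
    var x: u16 = @truncate(block &gt;&gt; 16);
    var y: u16 = @truncate(block);
    for (rk) |k| {
        x = (std.math.rotr(u16, x, 7) +% y) ^ k;
        y = std.math.rotl(u16, y, 2) ^ x;
    }
    return (@as(u32, x) &lt;&lt; 16) | y;
}

// Public identifier in, original counter out.
fn decrypt(rk: [rounds]u16, block: u32) u32 {
    var x: u16 = @truncate(block &gt;&gt; 16);
    var y: u16 = @truncate(block);
    var i: usize = rounds;
    while (i &gt; 0) {
        i -= 1;
        y = std.math.rotr(u16, y ^ x, 2);
        x = std.math.rotl(u16, (x ^ rk[i]) -% y, 7);
    }
    return (@as(u32, x) &lt;&lt; 16) | y;
}

test &quot;encrypted counters round-trip&quot; {
    // Illustrative key only; a real key must be secret and random.
    const rk = expandKey(.{ 0x0100, 0x0908, 0x1110, 0x1918 });
    var counter: u32 = 0;
    while (counter &lt; 1000) : (counter += 1) {
        const public_id = encrypt(rk, counter);
        try std.testing.expectEqual(counter, decrypt(rk, public_id));
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Since decryption recovers every counter value, two distinct counters can never map to the same public identifier under the same key.&lt;/p&gt;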

&lt;h2 id=&quot;uuids-and-small-block-ciphers&quot;&gt;UUIDs and small block ciphers&lt;/h2&gt;

&lt;p&gt;What about UUIDs?&lt;/p&gt;

&lt;p&gt;UUIDv1 allows adversaries to recover the exact creation time (down to 100 ns resolution), the clock sequence behavior, and a unique identifier of the machine that generated it. The timestamp alone is enough to leak traffic patterns, record creation rates, and sometimes even enable correlation across systems. It’s also predictable. Because UUIDv1 is basically “timestamp + sequence,” once an attacker sees a few UUIDs, they can often predict nearby values. That makes UUIDv1 completely unsuitable as a public identifier, or for anything used in URLs or APIs where guessing matters.&lt;/p&gt;

&lt;p&gt;UUIDv4 is 122 random bits (the remaining 6 are fixed version and variant markers).&lt;/p&gt;

&lt;p&gt;UUIDv6 is “ordered UUIDv1,” mostly for legacy. RFC 9562 basically says: v6 exists to improve DB locality for environments already using v1, but if you’re not tied to v1, you should use v7 instead.&lt;/p&gt;

&lt;p&gt;So what about UUIDv7? It’s explicitly time-ordered using a Unix-epoch millisecond timestamp, with the remaining bits used for randomness and/or a counter to keep monotonicity within the same millisecond.&lt;/p&gt;

&lt;p&gt;This gives you much better insertion locality than UUIDv4 in many B-tree-style indexes. However, this requires properly synchronized clocks, and v7 leaks creation time. RFC 9562 calls out that timestamps create a (small) attack surface, and if UUIDs are used for security-sensitive purposes, it recommends UUIDv4 instead.&lt;/p&gt;

&lt;p&gt;So, you can use UUIDv7 as the database primary key, but don’t treat it as secret.&lt;/p&gt;

&lt;p&gt;Meanwhile, the output of a block cipher hides the underlying value, while being much smaller than a UUID, which definitely matters at scale in databases. Being deterministic, it still leaks equality (same input produces same output), distinctness, and frequency patterns, but the actual values remain hidden.&lt;/p&gt;

&lt;p&gt;Small block ciphers work well with counters, but they work equally well with anything else that fits within the block size.&lt;/p&gt;

&lt;p&gt;And timestamps are no exception: if they represent sensitive information, a small block cipher is a simple, efficient, usually format-preserving way to encrypt and decrypt them.&lt;/p&gt;

&lt;p&gt;If you need UUIDs, why not combine them with a small block cipher, so that the timestamp is not leaked, while the UUIDs’ properties are retained?&lt;/p&gt;

&lt;p&gt;UUIDv7 is essentially a 48-bit timestamp followed by randomness plus version/variant bits, and optionally a counter to ensure monotonicity within the same millisecond.&lt;/p&gt;

&lt;p&gt;Concatenate a 48-bit timestamp with either a 16-bit counter or 16 bits of randomness, followed by 64 bits of extra randomness, to form your UUIDs.&lt;/p&gt;

&lt;p&gt;Then, encrypt the first 64 bits with a 64-bit block cipher before exposing them publicly.&lt;/p&gt;

&lt;p&gt;The encrypted portion won’t be lexically ordered, but the original UUIDv7 can still serve as the internal primary key for indexing. Expose only the UUID with the encrypted prefix publicly, and decrypt when you need the original back.&lt;/p&gt;
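
&lt;p&gt;The layout described above can be sketched as follows, assuming SPECK64/128 (32-bit words, 27 rounds, rotations by 8 and 3, as I understand the parameters) for the 64-bit block cipher. All function names are hypothetical, RFC version and variant bits are ignored, and the cipher should be checked against the published test vectors:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

const rounds = 27; // assumed round count for SPECK64/128

fn expandKey(key: [4]u32) [rounds]u32 {
    var rk: [rounds]u32 = undefined;
    var l = [3]u32{ key[1], key[2], key[3] };
    var k = key[0];
    for (0..rounds) |i| {
        rk[i] = k;
        const j = i % 3;
        l[j] = (std.math.rotr(u32, l[j], 8) +% k) ^ @as(u32, @intCast(i));
        k = std.math.rotl(u32, k, 3) ^ l[j];
    }
    return rk;
}

fn encrypt64(rk: [rounds]u32, block: u64) u64 {
    var x: u32 = @truncate(block &gt;&gt; 32);
    var y: u32 = @truncate(block);
    for (rk) |k| {
        x = (std.math.rotr(u32, x, 8) +% y) ^ k;
        y = std.math.rotl(u32, y, 3) ^ x;
    }
    return (@as(u64, x) &lt;&lt; 32) | y;
}

fn decrypt64(rk: [rounds]u32, block: u64) u64 {
    var x: u32 = @truncate(block &gt;&gt; 32);
    var y: u32 = @truncate(block);
    var i: usize = rounds;
    while (i &gt; 0) {
        i -= 1;
        y = std.math.rotr(u32, y ^ x, 3);
        x = std.math.rotl(u32, (x ^ rk[i]) -% y, 8);
    }
    return (@as(u64, x) &lt;&lt; 32) | y;
}

// Internal identifier: 48-bit timestamp | 16-bit counter | 64 random bits.
fn makeInternalId(timestamp_ms: u48, counter: u16, random: u64) u128 {
    const prefix = (@as(u64, timestamp_ms) &lt;&lt; 16) | counter;
    return (@as(u128, prefix) &lt;&lt; 64) | random;
}

// Public form: only the first 64 bits are encrypted.
fn toPublic(rk: [rounds]u32, id: u128) u128 {
    const prefix: u64 = @truncate(id &gt;&gt; 64);
    const suffix: u64 = @truncate(id);
    return (@as(u128, encrypt64(rk, prefix)) &lt;&lt; 64) | suffix;
}

fn toInternal(rk: [rounds]u32, public_id: u128) u128 {
    const prefix: u64 = @truncate(public_id &gt;&gt; 64);
    const suffix: u64 = @truncate(public_id);
    return (@as(u128, decrypt64(rk, prefix)) &lt;&lt; 64) | suffix;
}

test &quot;public identifier maps back to the internal one&quot; {
    // Illustrative key and field values only.
    const rk = expandKey(.{ 0x03020100, 0x0b0a0908, 0x13121110, 0x1b1a1918 });
    const internal = makeInternalId(1_770_000_000_000, 7, 0x0123456789abcdef);
    const public_id = toPublic(rk, internal);
    try std.testing.expectEqual(internal, toInternal(rk, public_id));
    // The random suffix is exposed unchanged.
    try std.testing.expectEqual(@as(u64, 0x0123456789abcdef), @as(u64, @truncate(public_id)));
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;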

&lt;h2 id=&quot;downsides&quot;&gt;Downsides&lt;/h2&gt;

&lt;p&gt;So far, small block ciphers sound pretty useful. But they do come with real limitations that you need to understand before reaching for one.&lt;/p&gt;

&lt;p&gt;The obvious downside of a block cipher is that, for a given key, after having observed M distinct outputs (encryption of M distinct inputs), we know that if we encrypt inputs that we didn’t see before, their encryption will not be part of the outputs we already observed.&lt;/p&gt;

&lt;p&gt;So, for a set of N possible outputs, after having observed M distinct outputs, when a new input is encrypted, the probability of guessing its encryption is 1/(N-M) per attempt.&lt;/p&gt;

&lt;p&gt;This is where small block ciphers fall short. If an adversary’s goal is to decrypt a ciphertext, and they have an oracle telling them if a guess is right, depending on the protocol, they may be able to do it.&lt;/p&gt;

&lt;p&gt;Small block ciphers are thus generally a bad idea against active adversaries.&lt;/p&gt;

&lt;p&gt;However, they can be very useful against passive adversaries whose capability is limited to observing identifiers, who are then unable to map them to the original value.&lt;/p&gt;

&lt;p&gt;Usage of small block ciphers is also not incompatible with traditional mechanisms for tamper detection such as MACs.&lt;/p&gt;

&lt;p&gt;Another downside is that, for any cipher, the secret key must remain secure. With random identifiers, only the random number generator needs to be secure.&lt;/p&gt;

&lt;h2 id=&quot;so-what-small-block-ciphers-should-i-use&quot;&gt;So, what small block ciphers should I use?&lt;/h2&gt;

&lt;p&gt;There are plenty of small block ciphers designed for hardware that don’t perform as well in software.&lt;/p&gt;

&lt;p&gt;In software on general-purpose CPUs, and as scary as it may sound, the best options were designed by the NSA: the &lt;a href=&quot;https://eprint.iacr.org/2013/404.pdf&quot;&gt;SIMON and SPECK&lt;/a&gt; Families of Lightweight Block Ciphers.&lt;/p&gt;

&lt;p&gt;While their design is not radically different from well-known, well-trusted ciphers, there was originally a lot of pushback due to their origin.&lt;/p&gt;

&lt;p&gt;And, as a side effect, they got a ton of 3rd-party analysis. Due to their simplicity, they are also the first targets of new tools and techniques for cryptanalysis. However, after more than 13 years of 3rd-party analysis, there are still zero practical attacks against them, and they still have a decent security margin. New cryptanalysis techniques have been developed, but they are generic techniques that aren’t vastly more effective on these block ciphers than other ciphers of the same size.&lt;/p&gt;

&lt;p&gt;And if the NSA really knows of completely novel techniques to break these ciphers that no one in academia ever explored, which is very unlikely, they can apply the same techniques to other widely used ciphers sharing the same construction.&lt;/p&gt;

&lt;p&gt;Unlike the Dual_EC_DRBG backdoor, there’s little room to hide a backdoor in plain sight in such ciphers. SIMON’s round constants are generated by a simple LFSR with a fixed primitive polynomial. The base value doesn’t come with a justification like “digits of pi”, but in that context, anything not showing signs of symmetry is fine. SPECK avoids constants almost entirely.&lt;/p&gt;

&lt;p&gt;TLDR: after more than 13 years, extensive cryptanalysis exists, no practical backdoors have been found, and attacks generally align with what the designers claimed. The only expectations they don’t meet are cultural. After all that time, it’s time to move on and consider them for their properties and applications.&lt;/p&gt;

&lt;p&gt;SIMON and SPECK are small, simple, and fast. They can be implemented in ~20 lines of code in any programming language, and will have good performance on a wide range of CPUs.&lt;/p&gt;

&lt;p&gt;They are versatile, supporting block sizes of 32, 48, 64, 96 and 128 bits, with a key size from 64 to 256 bits. For a 128-bit block cipher, you’d better use AES. But for something smaller, they are a solid choice.&lt;/p&gt;

&lt;p&gt;And if, in spite of existing analysis, you still feel nervous about structural attacks and some edge-case distinguishers, there’s a simple, cheap addition you can make: key whitening (the FX/Even-Mansour construction). XOR extra key material into the input before the first round and into the output after the last round. This has been formally shown to improve security bounds under idealized models. Done.&lt;/p&gt;
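
&lt;p&gt;As a rough sketch, whitening can be written as a wrapper around any 32-bit cipher exposing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;encrypt&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;decrypt&lt;/code&gt;. The names are hypothetical, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ToyPermutation&lt;/code&gt; is only an insecure stand-in so the example is self-contained; a real cipher such as SPECK would take its place:&lt;/p&gt;

&lt;div class=&quot;language-zig highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;const std = @import(&quot;std&quot;);

// Wrap an inner 32-bit cipher with pre- and post-whitening keys.
fn Whitened(comptime Inner: type) type {
    return struct {
        inner: Inner,
        pre: u32,
        post: u32,

        const Self = @This();

        pub fn encrypt(self: Self, block: u32) u32 {
            return self.inner.encrypt(block ^ self.pre) ^ self.post;
        }

        pub fn decrypt(self: Self, block: u32) u32 {
            return self.inner.decrypt(block ^ self.post) ^ self.pre;
        }
    };
}

// NOT a cipher: an invertible placeholder standing in for a real one.
const ToyPermutation = struct {
    k: u32,

    pub fn encrypt(self: ToyPermutation, block: u32) u32 {
        return std.math.rotl(u32, block +% self.k, 5);
    }

    pub fn decrypt(self: ToyPermutation, block: u32) u32 {
        return std.math.rotr(u32, block, 5) -% self.k;
    }
};

test &quot;whitened cipher round-trips&quot; {
    // Illustrative key material only.
    const cipher = Whitened(ToyPermutation){
        .inner = .{ .k = 0x1234 },
        .pre = 0xdeadbeef,
        .post = 0x0badcafe,
    };
    const x: u32 = 42;
    try std.testing.expectEqual(x, cipher.decrypt(cipher.encrypt(x)));
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;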

&lt;p&gt;Want to give them a spin? Try, for example, these easy-to-use implementations of SPECK, SIMON and SIMECK in &lt;a href=&quot;https://github.com/jedisct1/zig-lwbc32&quot;&gt;Zig&lt;/a&gt; and &lt;a href=&quot;https://github.com/jedisct1/rust-lwbc&quot;&gt;Rust&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;should-you-use-small-block-ciphers&quot;&gt;Should you use small block ciphers?&lt;/h2&gt;

&lt;p&gt;Generally, no. They are not generic ciphers that are safe to use in a wide range of scenarios. If they are large enough and the random number generator can be trusted, random identifiers are also a more secure option. Maybe dates, frame numbers, account identifiers and other numbers publicly leaked are not considered sensitive data, so you don’t have to worry about them at all. It all depends on your applications and protocols.&lt;/p&gt;

&lt;p&gt;However, small block ciphers represent something to keep in your toolbox. You don’t have to systematically reach for AES-GCM. But even if you do, maybe the public nonce leaks information? In that case, there’s a simple way to hide it. Use a small block cipher.&lt;/p&gt;
</content>
  </entry>
  
</feed>

