<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
 <channel>
  <title>Frank DENIS random thoughts.</title>
  <link>https://00f.net</link>
  <description>Frank DENIS blog</description>
  <language>en-US</language>
  <lastBuildDate>2026-06-01T11:36:11+02:00</lastBuildDate>
  <ttl>50000</ttl>
  
  <author>Frank Denis (Jedi/Sector One)</author>
  

  
  <item>
    <title>AI slop is hard to fork</title>
    <link>https://00f.net/2026/05/31/ai-slop-is-hard-to-fork/</link>
    <pubDate>2026-05-31T00:00:00+02:00</pubDate>
    <content:encoded><![CDATA[ <p>If you’ve ever maintained a fork of a project, or a pull request that takes forever to land, you’ve been there.</p>

<p>At some point, upstream moves. Then you run <code class="language-plaintext highlighter-rouge">git rebase</code> or <code class="language-plaintext highlighter-rouge">git merge</code>, and suddenly your nice isolated change becomes a mess of merge conflicts.</p>

<p>Sometimes this is manageable. A nearby function changed. A file moved. Someone renamed a type. Easy enough. You spend a few minutes fixing things, run the tests again, and move on.</p>

<p>Sometimes it’s super annoying and time consuming.</p>

<p>The maintainer reformatted the entire project. Or reorganized every module. Or rewrote internal structures your change depends on. Now your fork isn’t really a fork any more. It’s an archaeology project.</p>

<p>This is painful.</p>

<p>It’s not just boring mechanical work. It’s also risky work. The version you originally wrote was tested against a specific upstream state. You understood the code around it. You probably had a coherent reason for every line.</p>

<p>But after three painful rebases, the goal quietly changes.</p>

<p>You’re no longer trying to preserve the quality of the original change. You’re just trying to make the damn thing compile again. You’re trying to make tests pass again. You’re trying not to lose your mind while resolving the same conceptual conflict in five slightly different files.</p>

<p>And boom, bugs and vulnerabilities that didn’t originally exist get introduced. The quality of the fork degrades after every rebase.</p>

<p>The annoying part is that this used to be relatively rare.</p>

<p>Big, sweeping upstream changes happened, but they cost real human time. A maintainer had to decide that a massive refactor was worth the pain. They had to do the work. So most projects had some natural friction.</p>

<p>But AI removes a lot of that friction. Prompting is all you need.</p>

<p>Cheap experimentation is useful. But it also changes the shape of commits.</p>

<p>In vibe-coded projects, individual commits tend to be large. Way larger than regular commits traditionally made by humans. A single prompt often produces a diff that touches a lot of surface. The maintainer looks at the result, runs the tests, likes the direction, and commits it. Single commit, large changes.</p>

<p>The cost of producing that diff was tiny. The cost of everyone else integrating with it was not.</p>

<p>A large AI-generated refactor may be cheap for the person pressing enter, but it can be extremely expensive for anybody maintaining local changes, a downstream patch set, or a long-running pull request.</p>

<p>The project didn’t just change behavior. It changed shape.</p>

<p>And forks depend on shape.</p>

<p>A fork isn’t only a copy of code. It’s a set of assumptions about where things live, how functions are split, which internal boundaries are stable enough to build on, and which files can be changed without touching the rest of the world.</p>

<p>When every other upstream commit reshuffles that shape, the fork loses its anchor. This is especially bad for changes that are important but not immediately mergeable.</p>

<p>Maybe the maintainer agrees with the idea but wants a different API. Maybe the change needs more testing. Maybe it’s useful for one deployment but too specific for upstream. Maybe the project moves slowly on review because everybody is busy.</p>

<p>And if the upstream project is constantly being rewritten by prompts, the window for a sane merge gets much smaller. Either your work lands quickly, or it starts rotting immediately. Not because the logic became wrong. Because the surrounding code was churned into another shape.</p>

<p>After a while, maintaining the fork becomes virtually impossible.</p>

<p>You can stop updating from upstream, which means the fork slowly becomes its own project.</p>

<p>You can keep rebasing, which means spending more and more time repairing damage caused by unrelated global edits.</p>

<p>Or you can give up and ask an AI to generate your own competing version from scratch.</p>

<p>That last option sucks, but it’s exactly where the incentives point. If upstream treats code as disposable text that can be globally regenerated whenever the mood changes, downstream users will eventually treat upstream the same way.</p>

<p>Why maintain a careful fork of something that refuses to keep a stable shape?</p>

<p>There’s a difference between intentional large changes and casual churn.</p>

<p>A human refactor usually carries some scar tissue. You can see the maintainer trying to minimize damage. The diff has boundaries. The commit message explains why this had to happen. Compatibility layers appear. Old paths survive for a while. Reviewers ask whether this will hurt downstream users.</p>

<p>AI slop doesn’t naturally care about any of that.</p>

<p>It optimizes for satisfying the prompt in the current checkout. It doesn’t know which patch series exists in someone’s fork. It doesn’t care that a small function rename creates conflicts in ten open pull requests. It doesn’t feel the social cost of making everybody else redo work.</p>

<p>That’s why I don’t even bother contributing to vibe-coded projects anymore.</p>

<p>Forkability is (was?) a project quality. AI makes it easier than ever to destroy it.</p>
 ]]></content:encoded>
    <guid isPermaLink="true">https://00f.net/2026/05/31/ai-slop-is-hard-to-fork</guid>
  </item>
  
  <item>
    <title>Pushing to a pull request that isn't yours</title>
    <link>https://00f.net/2026/05/27/pushing-to-someone-elses-pr/</link>
    <pubDate>2026-05-27T00:00:00+02:00</pubDate>
    <content:encoded><![CDATA[ <p>I’ve been using Git and GitHub for 18 years.</p>

<p>But today I had to push to someone else’s pull request, and I couldn’t do it without asking an LLM, which didn’t even do what I wanted. Yay, that’s embarrassing.</p>

<p>Modifying files that the PR already changes is trivial and can be done directly through the GitHub UI. But if additional files need to be modified, some local Git command magic is required.</p>

<p>So this is the reference I wish I’d had open in a tab.</p>

<h2 id="find-the-source-branch">Find the source branch</h2>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gh <span class="nb">pr </span>view 42 <span class="nt">--repo</span> user/project <span class="se">\</span>
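  <span class="nt">--json</span> headRepository,headRepositoryOwner,headRefName,isCrossRepository
</code></pre></div></div>

<p>You need two things from that output: the repository that owns the PR branch (<code class="language-plaintext highlighter-rouge">headRepositoryOwner</code> and <code class="language-plaintext highlighter-rouge">headRepository</code>), and the branch name (<code class="language-plaintext highlighter-rouge">headRefName</code>).</p>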

<p>If the PR came from the same repository, the branch lives on <code class="language-plaintext highlighter-rouge">origin</code>.</p>

<p>If the PR came from a fork, the branch lives on the fork. <code class="language-plaintext highlighter-rouge">origin</code> is still the upstream repository. Pushing to <code class="language-plaintext highlighter-rouge">origin some-local-name:their-branch</code> will create or update a branch in the upstream repository, not in the contributor’s fork.</p>

<h2 id="fetch-the-pr">Fetch the PR</h2>

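<p>The key is the refspec <code class="language-plaintext highlighter-rouge">pull/&lt;pr-number&gt;/head:&lt;local-branch-name&gt;</code>. GitHub exposes the head of every pull request as <code class="language-plaintext highlighter-rouge">pull/&lt;pr-number&gt;/head</code> in the base repository, even when the PR comes from a fork, so you can fetch it from <code class="language-plaintext highlighter-rouge">origin</code> into a local branch of your choice.</p>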

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git fetch origin pull/42/head:pr-42
git switch pr-42
</code></pre></div></div>

<p>Now make your edits and commit them normally.</p>

<h2 id="push-to-the-source-branch">Push to the source branch</h2>

<p>For a PR branch in the same repository:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git push origin HEAD:their-branch
</code></pre></div></div>

<p>For a PR branch in a fork (this only works if the contributor allowed edits by maintainers on the pull request):</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git push git@github.com:contributor/project.git HEAD:their-branch
</code></pre></div></div>

<p>You can add the fork as a remote if you prefer:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git remote add contributor git@github.com:contributor/project.git
git push contributor HEAD:their-branch
</code></pre></div></div>

<p>The important part is the destination repository. The left side is your local branch or commit. The right side is the branch that GitHub is using as the pull request’s head.</p>

<h2 id="refuse-to-force-push">Refuse to force push</h2>

<p>After a while, you may want to squash commits and force push. Be careful.</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git fetch git@github.com:contributor/project.git their-branch
git merge-base <span class="nt">--is-ancestor</span> FETCH_HEAD HEAD <span class="se">\</span>
  <span class="o">&amp;&amp;</span> <span class="nb">echo</span> <span class="s2">"fast-forward ok"</span>
</code></pre></div></div>

<p>If it isn’t a fast-forward, stop. A force push would silently delete the author’s commits. Talk to them first.</p>
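
<p>The one-liner above treats a diverged branch and a failing <code class="language-plaintext highlighter-rouge">git</code> command (for example, a mistyped ref) the same way: neither prints “fast-forward ok”. According to the <code class="language-plaintext highlighter-rouge">git merge-base</code> documentation, <code class="language-plaintext highlighter-rouge">--is-ancestor</code> exits with status 0 when the first commit is an ancestor of the second, 1 when it is not, and another nonzero status on errors. If you script this check, here is a minimal Rust sketch that keeps those cases apart. The <code class="language-plaintext highlighter-rouge">PushCheck</code> type and the helper names are hypothetical, and it assumes a <code class="language-plaintext highlighter-rouge">git fetch</code> has already set <code class="language-plaintext highlighter-rouge">FETCH_HEAD</code>.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::process::Command;

/// Hypothetical result type for `git merge-base --is-ancestor FETCH_HEAD HEAD`.
#[derive(Debug, PartialEq)]
enum PushCheck {
    /// Exit status 0: the remote tip is an ancestor of HEAD.
    FastForward,
    /// Exit status 1: pushing would require a force push.
    NotFastForward,
    /// Any other status (or no status, e.g. killed by a signal): git failed.
    Error(Option&lt;i32&gt;),
}

/// Maps the exit codes documented for `git merge-base --is-ancestor`.
fn classify(code: Option&lt;i32&gt;) -&gt; PushCheck {
    match code {
        Some(0) =&gt; PushCheck::FastForward,
        Some(1) =&gt; PushCheck::NotFastForward,
        other =&gt; PushCheck::Error(other),
    }
}

/// Runs the same check as the shell version in the current repository.
fn check_fast_forward() -&gt; std::io::Result&lt;PushCheck&gt; {
    let status = Command::new("git")
        .args(["merge-base", "--is-ancestor", "FETCH_HEAD", "HEAD"])
        .status()?;
    Ok(classify(status.code()))
}

fn main() {
    match check_fast_forward() {
        Ok(PushCheck::FastForward) =&gt; println!("fast-forward ok"),
        Ok(PushCheck::NotFastForward) =&gt; println!("not a fast-forward: talk to the author"),
        Ok(PushCheck::Error(code)) =&gt; eprintln!("git merge-base failed: {:?}", code),
        Err(e) =&gt; eprintln!("could not run git: {}", e),
    }
}
</code></pre></div></div>

<p>Only the <code class="language-plaintext highlighter-rouge">NotFastForward</code> case means “stop and talk to the author”. An <code class="language-plaintext highlighter-rouge">Error</code> means the check itself didn’t run, so it says nothing about whether a push would be safe.</p>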

<p>Hope this helps.</p>
 ]]></content:encoded>
    <guid isPermaLink="true">https://00f.net/2026/05/27/pushing-to-someone-elses-pr</guid>
  </item>
  
  <item>
    <title>Bun's problem may be developing in the open</title>
    <link>https://00f.net/2026/05/17/developping-in-the-open/</link>
    <pubDate>2026-05-17T00:00:00+02:00</pubDate>
    <content:encoded><![CDATA[ <p>As soon as people found a Bun branch mentioning an experiment to use an LLM to port the existing Zig code to Rust, they went mad.</p>

<p>It was a personal experiment, on a non-default Git branch, not announced anywhere.</p>

<p>That didn’t matter. Rust advocates popped champagne. Zig people were shocked. Actual Bun users got confused. Everybody fought like kids in a kindergarten playground.</p>

<p>Jarred Sumner has issues with Bun. Maybe the root cause is original technical decisions rather than anything inherent to Zig. Maybe moving to another language could help with some of them. Maybe not. But trying is not insane.</p>

<p>Rewriting software is often a good way to revisit how the code works, remove technical debt, and produce something better. Especially when the mature shape of the product is already known. You don’t have to rediscover every edge case, every awkward feature, and every hack that got added because the original design didn’t anticipate reality.</p>

<p>Rewriting Bun would still be a significant task.</p>

<p>But despite everybody talking about a Bun rewrite, that was not really the plan.</p>

<p>The plan looked more like this:</p>

<ol>
  <li>Transpile the Zig code to Rust. Every file, function, and structure should remain as close as possible to the original, so that a Zig file and its Rust equivalent are functionally interchangeable.</li>
  <li>Incrementally tweak, rewrite, and refactor individual functions and types until the code becomes idiomatic, maintainable Rust.</li>
  <li>Eventually, get a Rust rewrite.</li>
</ol>

<p>Bun completed phase 1 only.</p>

<p>This is transpiled code. Automatically transpiled code. Since there are no Zig-to-Rust transpilers, Jarred used an AI prompt.</p>

<p>And it worked.</p>

<p>As in: the tests passed.</p>

<p>And that is not very surprising. LLMs are good at translating text, and code is text with a lot of structure. If you give them a large test suite to keep them from going off the rails, this kind of mechanical translation is exactly the sort of thing they can do well.</p>

<p>Anthropic can use it to show off Claude, but most coding models could probably do something similar with the right instructions.</p>

<p>The resulting Rust code was of poor quality.</p>

<p>Of course it was.</p>

<p>An experienced Rust developer would not accept code like that in their own project. And obviously, it was mocked, including by Rust advocates who had originally rejoiced at the idea of Bun being “rewritten” in Rust.</p>

<p>But idiomatic Rust was not the goal at that point.</p>

<p>Phase 1 was meant to be a literal, direct translation of the Zig code. In that context, Rust is almost an intermediate representation. The source of truth is still Zig. The Rust output is there to see whether the translation pipeline can preserve behavior before anyone starts improving the result.</p>

<p>Phase 1 was not about producing beautiful Rust.</p>

<p>It was about answering a narrow question: can a large Zig code base be translated automatically, with enough fidelity for the test suite to pass?</p>

<p>Apparently, yes.</p>

<p>With AI, that is a simple experiment to run. It is quick. It is cheap when you don’t pay for tokens. Failure is acceptable because it doesn’t require much commitment. That is exactly why Jarred described it as an experiment.</p>

<p>From the outside, though, the story looked very different.</p>

<p>People saw a leaked intention to do something controversial. Then they saw the controversial thing materialize. Then they saw horrific-looking generated Rust. In a few days, Bun went from a tool people trusted and used in production to, in their minds, a tool being blindly rewritten by AI with minimal quality control.</p>

<p>So they panicked.</p>

<p>Here is the uncomfortable part: the root cause of the drama may be that Jarred acted transparently.</p>

<p>He could have kept the Rust branch private until phase 3 was complete. And if the experiment failed, he could have never mentioned it.</p>

<p>Then, much later, he could have announced a new Bun version, actually rewritten in Rust, with code everyone agreed was good and stability numbers better than the original implementation.</p>

<p>Language advocates would still have fought. Of course. But actual users would mostly have judged the result.</p>

<p>Does it work? Is it reliable? Is the new version better than the previous one?</p>

<p>If yes, cool. If no, that is a good reason to leave.</p>

<p>Most users do not really care how software is built. They care whether the tool they rely on keeps working.</p>

<p>But Jarred did not do the safe corporate thing. He worked in the open, including the earliest and ugliest part of the experiment.</p>

<p>And that backfired.</p>

<p>Unfortunately, this is a common open source problem.</p>

<p>Maintainers need to experiment. That is a normal part of working on a project. It is fine to write unfinished, ugly, unsafe, half-broken code while exploring an idea. It is also convenient to push these branches to GitHub.</p>

<p>Yes, alternatives exist. But if you work from multiple devices, or want to share the branch with one person, pushing it to the same remote as everything else is by far the easiest workflow. Anything else adds friction.</p>

<p>The problem is that every bit pushed to a public repository becomes material for interpretation.</p>

<p>People will look at it. They will judge it. They will assign intent to it. They will write the missing context themselves, usually in the least charitable way possible.</p>

<p>If a project is open source, maintainers are somehow expected to make every public commit look clean, coherent, and strategically final.</p>

<p>That is absurd.</p>

<p>It also changes behavior. The more popular a project becomes, the less freedom maintainers have to think in public. A random branch stops being a scratchpad and becomes a press release. A quick experiment becomes a roadmap. A bad intermediate result becomes proof that the project has lost its mind.</p>

<p>So maintainers learn the obvious lesson: hide the messy parts.</p>

<p>That is bad for everyone.</p>

<p>The lesson for Bun is not that AI rewrites are good. It is not that Rust would fix Bun. It is not that Zig is the problem. None of that is proven by this experiment.</p>

<p>The lesson is that public development and public experimentation are not the same thing.</p>

<p>Users want transparency, but they also punish unfinished work. Maintainers want openness, but they also need private space to try stupid ideas before deciding whether they are stupid.</p>

<p>For large projects, maybe the practical answer is boring: keep experiments private until they have enough context to be understood, or label them so aggressively that nobody can pretend they are production plans.</p>

<p>That is less romantic than developing everything in the open.</p>

<p>But if every ugly branch becomes drama, more maintainers will stop showing ugly branches.</p>

<p>And then everybody will complain that open source became less open.</p>
 ]]></content:encoded>
    <guid isPermaLink="true">https://00f.net/2026/05/17/developping-in-the-open</guid>
  </item>
  
  <item>
    <title>aHash is not a PRF</title>
    <link>https://00f.net/2026/04/26/ahash-is-not-a-prf/</link>
    <pubDate>2026-04-26T00:00:00+02:00</pubDate>
    <content:encoded><![CDATA[ <p><a href="https://crates.io/crates/ahash">aHash</a> is a fast Rust hasher. It is popular, useful, and very explicitly not cryptographic.</p>

<p>That warning is good. But the documentation also says something stronger: <em>“because <code class="language-plaintext highlighter-rouge">aHash</code> is keyed, hashes cannot be predicted without knowing the keys”</em>. That is not true.</p>

<p>For a hash table, keyed hashing usually means something much more modest. It means an attacker should not be able to precompute one pile of colliding keys and reuse it against every process on the internet. Per-map randomization is a practical defense against boring collision-flooding attacks.</p>

<p>But “cannot be predicted without knowing the keys” sounds like a keyed random function. It sounds PRF-like. It suggests that even after seeing hashes for inputs I choose, I still cannot predict the hash of a fresh input.</p>

<p>That is a much higher bar, and <code class="language-plaintext highlighter-rouge">aHash</code> does not clear it.</p>

<h2 id="the-construction-is-not-aes">The construction is not AES</h2>

<p>The usual fast path uses AES instructions. That detail is easy to overread.</p>

<p>Using AES round instructions does not make the whole construction AES, any more than using <code class="language-plaintext highlighter-rouge">xor</code> makes a protocol a stream cipher. The surrounding construction still has to absorb inputs safely, separate domains, and finalize with enough mixing for the security claim being made.</p>

<p>The <code class="language-plaintext highlighter-rouge">aHash</code> state keeps three 128-bit values:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>enc
sum
key
</code></pre></div></div>

<p>The basic block update is roughly:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>enc = aesdec(enc, block)
sum = shuffle_and_add(sum, block)
</code></pre></div></div>

<p>Then <code class="language-plaintext highlighter-rouge">finish()</code> combines the state with a small number of AES instructions and returns 64 bits.</p>

<p>This is a good shape for a fast hash-table hasher. It is not AES-CMAC. It is not AES as a block cipher. It is not a standard PRF construction.</p>

<p>And in practice, it leaks simple key-independent relations between outputs.</p>

<h2 id="ahash-doesnt-securely-hash-transcripts">aHash doesn’t securely hash transcripts</h2>

<p>The Rust <code class="language-plaintext highlighter-rouge">Hasher</code> trait exposes several ways to feed data:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>write(&amp;[u8])
write_u8(...)
write_u64(...)
write_u128(...)
</code></pre></div></div>

<p>In <code class="language-plaintext highlighter-rouge">aHash</code>, the integer methods collapse into the same internal operation:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>write_u8(i)   -&gt; write_u64(i as u64)
write_u16(i)  -&gt; write_u64(i as u64)
write_u32(i)  -&gt; write_u64(i as u64)
write_u64(i)  -&gt; write_u128(i as u128)
write_u128(i) -&gt; hash_in(i)
</code></pre></div></div>

<p>So these are different public transcripts:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>write_u8(0xab)
write_u16(0xab)
write_u32(0xab)
write_u64(0xab)
write_u128(0xab)
</code></pre></div></div>

<p>But they all become:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>hash_in(0xab)
</code></pre></div></div>

<p>For any key, all five produce the same final hash. There’s no separation between integer widths, which developers may not expect.</p>

<p>There’s a more surprising one:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>write(&amp;[])
write_u128(0)
</code></pre></div></div>

<p>For non-empty byte strings, <code class="language-plaintext highlighter-rouge">aHash</code> adds the input length before absorbing bytes. For the empty string, it writes zero directly. The typed <code class="language-plaintext highlighter-rouge">u128</code> path also hashes zero directly.</p>

<p>So two very different transcripts can produce the same hash, regardless of the key.</p>

<p>That being said, the <code class="language-plaintext highlighter-rouge">Hasher</code> trait doesn’t really define how inputs are supposed to be handled. Hashing transcripts in a type-safe way is one option, concatenating everything is another, and anything else is technically valid as well, even if it is prone to misuse.</p>
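
<p>Context separation is easy to sketch. In the code below, a hypothetical <code class="language-plaintext highlighter-rouge">Tagged</code> wrapper (not part of <code class="language-plaintext highlighter-rouge">aHash</code> or <code class="language-plaintext highlighter-rouge">std</code>) prefixes every typed write with a width tag, and every byte slice with a tag and its length, so each transcript can be decoded unambiguously. A hypothetical <code class="language-plaintext highlighter-rouge">Recorder</code> inner hasher only records the bytes it absorbs, which makes it easy to check that the colliding transcripts above become distinct. This is one possible convention, not a description of any existing hasher.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::hash::Hasher;

/// Hypothetical wrapper: every write is prefixed with a tag naming the
/// input kind, and byte slices are also prefixed with their length.
struct Tagged&lt;H: Hasher&gt;(H);

impl&lt;H: Hasher&gt; Tagged&lt;H&gt; {
    fn tag(&amp;mut self, t: u8) {
        self.0.write(&amp;[t]);
    }
}

impl&lt;H: Hasher&gt; Hasher for Tagged&lt;H&gt; {
    fn write(&amp;mut self, bytes: &amp;[u8]) {
        self.tag(0);
        // Length prefix: even the empty slice absorbs a tag and a length.
        self.0.write(&amp;(bytes.len() as u64).to_le_bytes());
        self.0.write(bytes);
    }
    fn write_u8(&amp;mut self, i: u8) {
        self.tag(1);
        self.0.write(&amp;[i]);
    }
    fn write_u16(&amp;mut self, i: u16) {
        self.tag(2);
        self.0.write(&amp;i.to_le_bytes());
    }
    fn write_u32(&amp;mut self, i: u32) {
        self.tag(4);
        self.0.write(&amp;i.to_le_bytes());
    }
    fn write_u64(&amp;mut self, i: u64) {
        self.tag(8);
        self.0.write(&amp;i.to_le_bytes());
    }
    fn write_u128(&amp;mut self, i: u128) {
        self.tag(16);
        self.0.write(&amp;i.to_le_bytes());
    }
    fn finish(&amp;self) -&gt; u64 {
        self.0.finish()
    }
}

/// Hypothetical inner "hasher" that only records what it absorbs,
/// so transcripts can be compared byte for byte.
#[derive(Default)]
struct Recorder(Vec&lt;u8&gt;);

impl Hasher for Recorder {
    fn write(&amp;mut self, bytes: &amp;[u8]) {
        self.0.extend_from_slice(bytes);
    }
    fn finish(&amp;self) -&gt; u64 {
        self.0.len() as u64 // placeholder: only the transcript matters here
    }
}

/// Returns the bytes absorbed by the inner hasher for one transcript.
fn transcript(f: impl FnOnce(&amp;mut Tagged&lt;Recorder&gt;)) -&gt; Vec&lt;u8&gt; {
    let mut h = Tagged(Recorder::default());
    f(&amp;mut h);
    h.0.0
}

fn main() {
    // Same value, different integer widths: different transcripts.
    assert_ne!(transcript(|h| h.write_u8(0xab)), transcript(|h| h.write_u128(0xab)));
    // Empty byte string vs. a zero u128: different transcripts.
    assert_ne!(transcript(|h| h.write(&amp;[])), transcript(|h| h.write_u128(0)));
    println!("{:02x?}", transcript(|h| h.write_u8(0xab)));
}
</code></pre></div></div>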

<h2 id="an-integral-distinguisher">An integral distinguisher</h2>

<p>Consider <code class="language-plaintext highlighter-rouge">u128</code> inputs constructed this way:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X[a,b] = (a &lt;&lt; 56) | (b &lt;&lt; 120)
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> each range from <code class="language-plaintext highlighter-rouge">0x00</code> to <code class="language-plaintext highlighter-rouge">0xff</code>.</p>

<p>So these are 128-bit values where byte 7 and byte 15 vary over every possible value, and all other bytes are zero.</p>

<p>Now, consider every query:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h.write_u128(X[a,b])
h.finish()
</code></pre></div></div>

<p>That gives 65,536 different typed inputs. Their 64-bit outputs have a zero XOR sum for any key.</p>

<p>That is not how a PRF from <code class="language-plaintext highlighter-rouge">u128</code> inputs to 64-bit outputs behaves. For independent random outputs, the XOR would be zero with probability <code class="language-plaintext highlighter-rouge">2^-64</code>. In <code class="language-plaintext highlighter-rouge">aHash</code>, it is zero every time.</p>

<p>The reason is that <code class="language-plaintext highlighter-rouge">write_u128()</code> feeds the value directly into <code class="language-plaintext highlighter-rouge">hash_in()</code>, and these byte positions create a balanced set through the reduced AES-based finalization. The key changes the individual hashes. It does not change the XOR relation.</p>

<p>This also makes the output predictable. Leave out one <code class="language-plaintext highlighter-rouge">X[a,b]</code>, query the other 65,535 values, XOR their outputs, and the result is the missing output.</p>

<p>Here is a trivial PoC:</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">ahash</span><span class="p">::</span><span class="n">RandomState</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">hash</span><span class="p">::{</span><span class="n">BuildHasher</span><span class="p">,</span> <span class="n">Hasher</span><span class="p">};</span>

<span class="k">fn</span> <span class="nf">hash_u128</span><span class="p">(</span><span class="n">state</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">RandomState</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="nb">u128</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">u64</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">h</span> <span class="o">=</span> <span class="n">state</span><span class="nf">.build_hasher</span><span class="p">();</span>
    <span class="n">h</span><span class="nf">.write_u128</span><span class="p">(</span><span class="n">x</span><span class="p">);</span>
    <span class="n">h</span><span class="nf">.finish</span><span class="p">()</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">state</span> <span class="o">=</span> <span class="nn">RandomState</span><span class="p">::</span><span class="nf">with_seeds</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">xor</span> <span class="o">=</span> <span class="mi">0u64</span><span class="p">;</span>
    <span class="k">for</span> <span class="n">a</span> <span class="k">in</span> <span class="mi">0u8</span><span class="o">..=</span><span class="mi">255</span> <span class="p">{</span>
        <span class="k">for</span> <span class="n">b</span> <span class="k">in</span> <span class="mi">0u8</span><span class="o">..=</span><span class="mi">255</span> <span class="p">{</span>
            <span class="k">let</span> <span class="n">x</span> <span class="o">=</span> <span class="p">((</span><span class="n">a</span> <span class="k">as</span> <span class="nb">u128</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="mi">56</span><span class="p">)</span> <span class="p">|</span> <span class="p">((</span><span class="n">b</span> <span class="k">as</span> <span class="nb">u128</span><span class="p">)</span> <span class="o">&lt;&lt;</span> <span class="mi">120</span><span class="p">);</span>
            <span class="n">xor</span> <span class="o">^=</span> <span class="nf">hash_u128</span><span class="p">(</span><span class="o">&amp;</span><span class="n">state</span><span class="p">,</span> <span class="n">x</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="nd">assert_eq!</span><span class="p">(</span><span class="n">xor</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"xor {xor:016x}"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This prints:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>xor 0000000000000000
</code></pre></div></div>
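
<p>For contrast, here is a sketch that runs the same <code class="language-plaintext highlighter-rouge">X[a,b]</code> family through the standard library’s <code class="language-plaintext highlighter-rouge">RandomState</code> instead of <code class="language-plaintext highlighter-rouge">aHash</code>. The standard hasher is currently SipHash-1-3, which the documentation describes as subject to change. This proves nothing by itself. It only shows what the expected behavior looks like: with freshly drawn random keys, the XOR sum should be nonzero except with negligible probability.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// XOR of the 64-bit outputs over the 65,536 inputs
/// X[a,b] = (a &lt;&lt; 56) | (b &lt;&lt; 120), each fed with `write_u128`.
fn xor_over_family(state: &amp;RandomState) -&gt; u64 {
    let mut acc = 0u64;
    for a in 0u8..=255 {
        for b in 0u8..=255 {
            let x = ((a as u128) &lt;&lt; 56) | ((b as u128) &lt;&lt; 120);
            let mut h = state.build_hasher();
            h.write_u128(x);
            acc ^= h.finish();
        }
    }
    acc
}

fn main() {
    // RandomState::new() draws fresh random keys for the standard hasher.
    let state = RandomState::new();
    // With no structural relation, an all-zero XOR is a ~2^-64 event.
    println!("xor {:016x}", xor_over_family(&amp;state));
}
</code></pre></div></div>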

<p>Of course, typed <code class="language-plaintext highlighter-rouge">u128</code> inputs are still easy to wave away. Many real keys are strings or byte slices. But they are affected by the same issue.</p>

<h2 id="transposing-the-distinguisher-to-strings">Transposing the distinguisher to strings</h2>

<p>Consider this family of 16-byte strings:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>S[a,b] = 00 00 00 00 00 00 00 aa 00 00 00 00 00 00 00 bb
</code></pre></div></div>

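<p>Here <code class="language-plaintext highlighter-rouge">aa</code> and <code class="language-plaintext highlighter-rouge">bb</code> stand for the bytes <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code>, which each range from <code class="language-plaintext highlighter-rouge">0x00</code> to <code class="language-plaintext highlighter-rouge">0xff</code>.</p>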

<p>That gives 65,536 different non-empty byte strings. No typed integer writes. No empty input. No repeated input. No weird transcript ambiguity.</p>

<p>For each string, do the ordinary thing:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>h.write(&amp;S[a,b])
h.finish()
</code></pre></div></div>

<p>The XOR of all 65,536 outputs is always zero, for any key.</p>

<p>That should not happen for a real PRF. For 65,536 independent 64-bit outputs, the XOR would be zero with probability <code class="language-plaintext highlighter-rouge">2^-64</code>. Here, it happens every time.</p>

<p>The reason is that the varying bytes are placed as the high bytes of the two 64-bit lanes consumed by the 16-byte byte-slice path. Those positions avoid carry trouble in the lane additions, so the chosen set stays balanced before it reaches the reduced AES-based finalization. The key changes the individual outputs. It does not remove the relation.</p>
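
<p>The string family and the typed family share the same bit pattern. The small check below uses hypothetical helpers <code class="language-plaintext highlighter-rouge">s_ab</code>, <code class="language-plaintext highlighter-rouge">x_ab</code> and <code class="language-plaintext highlighter-rouge">lanes</code>. It shows that reading <code class="language-plaintext highlighter-rouge">S[a,b]</code> as two little-endian 64-bit lanes puts <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> in the top byte of each lane, and that reading it as one little-endian <code class="language-plaintext highlighter-rouge">u128</code> gives exactly <code class="language-plaintext highlighter-rouge">X[a,b]</code>. It only checks the byte layout, assuming little-endian lane loads. It does not model how <code class="language-plaintext highlighter-rouge">aHash</code> processes the lanes.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// S[a,b]: byte 7 is `a`, byte 15 is `b`, every other byte is zero.
fn s_ab(a: u8, b: u8) -&gt; [u8; 16] {
    let mut s = [0u8; 16];
    s[7] = a;
    s[15] = b;
    s
}

/// X[a,b] = (a &lt;&lt; 56) | (b &lt;&lt; 120), the typed family from earlier.
fn x_ab(a: u8, b: u8) -&gt; u128 {
    ((a as u128) &lt;&lt; 56) | ((b as u128) &lt;&lt; 120)
}

/// Splits a 16-byte string into two little-endian 64-bit lanes.
fn lanes(s: &amp;[u8; 16]) -&gt; (u64, u64) {
    let mut lo = [0u8; 8];
    let mut hi = [0u8; 8];
    lo.copy_from_slice(&amp;s[..8]);
    hi.copy_from_slice(&amp;s[8..]);
    (u64::from_le_bytes(lo), u64::from_le_bytes(hi))
}

fn main() {
    for (a, b) in [(0x00, 0x00), (0x42, 0xa5), (0xff, 0x01)] {
        let s = s_ab(a, b);
        // `a` and `b` sit in the most significant byte of each lane.
        assert_eq!(lanes(&amp;s), ((a as u64) &lt;&lt; 56, (b as u64) &lt;&lt; 56));
        // Read as a single little-endian u128, S[a,b] is exactly X[a,b].
        assert_eq!(u128::from_le_bytes(s), x_ab(a, b));
    }
    println!("S[a,b] and X[a,b] have the same little-endian layout");
}
</code></pre></div></div>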

<p>This is the same kind of balanced-set behavior behind square/integral attacks on AES-like constructions, showing through a construction that uses too little AES-style mixing to hide it.</p>

<p>Pick a target:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>T = S[0x42, 0xa5]
</code></pre></div></div>

<p>Now hash the other 65,535 strings in the family, but not <code class="language-plaintext highlighter-rouge">T</code>. XOR their outputs together.</p>

<p>The result is exactly the missing output:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>prediction = xor_{(a,b) != (0x42,0xa5)} aHash_k(S[a,b])
prediction = aHash_k(T)
</code></pre></div></div>

<p>No key recovery. No brute force. Just a key-independent relation.</p>

<h2 id="another-quick-poc">Another quick PoC</h2>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">ahash</span><span class="p">::</span><span class="n">RandomState</span><span class="p">;</span>
<span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">hash</span><span class="p">::{</span><span class="n">BuildHasher</span><span class="p">,</span> <span class="n">Hasher</span><span class="p">};</span>

<span class="k">fn</span> <span class="nf">hash</span><span class="p">(</span><span class="n">state</span><span class="p">:</span> <span class="o">&amp;</span><span class="n">RandomState</span><span class="p">,</span> <span class="n">bytes</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">u8</span><span class="p">])</span> <span class="k">-&gt;</span> <span class="nb">u64</span> <span class="p">{</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">h</span> <span class="o">=</span> <span class="n">state</span><span class="nf">.build_hasher</span><span class="p">();</span>
    <span class="n">h</span><span class="nf">.write</span><span class="p">(</span><span class="n">bytes</span><span class="p">);</span>
    <span class="n">h</span><span class="nf">.finish</span><span class="p">()</span>
<span class="p">}</span>

<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">state</span> <span class="o">=</span> <span class="nn">RandomState</span><span class="p">::</span><span class="nf">with_seeds</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
    <span class="k">let</span> <span class="n">target</span> <span class="o">=</span> <span class="p">(</span><span class="mi">0x42</span><span class="p">,</span> <span class="mi">0xa5</span><span class="p">);</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">prediction</span> <span class="o">=</span> <span class="mi">0u64</span><span class="p">;</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">actual</span> <span class="o">=</span> <span class="mi">0u64</span><span class="p">;</span>
    <span class="c1">// 16-byte input: only bytes 7 and 15 vary, the other 14 stay zero.</span>
    <span class="k">let</span> <span class="k">mut</span> <span class="n">s</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0u8</span><span class="p">;</span> <span class="mi">16</span><span class="p">];</span>
    <span class="k">for</span> <span class="n">a</span> <span class="k">in</span> <span class="mi">0u8</span><span class="o">..=</span><span class="mi">255</span> <span class="p">{</span>
        <span class="k">for</span> <span class="n">b</span> <span class="k">in</span> <span class="mi">0u8</span><span class="o">..=</span><span class="mi">255</span> <span class="p">{</span>
            <span class="n">s</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span><span class="p">;</span>
            <span class="n">s</span><span class="p">[</span><span class="mi">15</span><span class="p">]</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
            <span class="k">let</span> <span class="n">out</span> <span class="o">=</span> <span class="nf">hash</span><span class="p">(</span><span class="o">&amp;</span><span class="n">state</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">s</span><span class="p">);</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">s</span><span class="p">[</span><span class="mi">7</span><span class="p">],</span> <span class="n">s</span><span class="p">[</span><span class="mi">15</span><span class="p">])</span> <span class="o">==</span> <span class="n">target</span> <span class="p">{</span>
                <span class="n">actual</span> <span class="o">=</span> <span class="n">out</span><span class="p">;</span>
            <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
                <span class="c1">// Only the other 65,535 outputs go into the prediction.</span>
                <span class="n">prediction</span> <span class="o">^=</span> <span class="n">out</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
    <span class="nd">assert_eq!</span><span class="p">(</span><span class="n">prediction</span><span class="p">,</span> <span class="n">actual</span><span class="p">);</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"predicted {prediction:016x}"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This prints:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>predicted 2929121e3dfcdd2a
</code></pre></div></div>

<p>The exact value is not the point. The assertion is. The program never XORs the target hash into <code class="language-plaintext highlighter-rouge">prediction</code>. The prediction is built only from the other 65,535 outputs, and the target is hashed separately, only to check the assertion.</p>

<h2 id="what-this-proves">What this proves</h2>

<p>This does not prove that <code class="language-plaintext highlighter-rouge">aHash</code> is useless.</p>

<p>It does not give a giant collision set. It does not recover the key. It does not immediately produce a practical <code class="language-plaintext highlighter-rouge">HashMap</code> denial-of-service attack. It does not mean every application using <code class="language-plaintext highlighter-rouge">aHash</code> is broken.</p>

<p>Hash-table hashers are not MACs. They are judged by different criteria: speed, distribution, and enough randomization to make precomputed collision attacks unattractive.</p>

<p>But <code class="language-plaintext highlighter-rouge">aHash</code> is not a PRF-like keyed hash over byte strings. Some outputs are algebraically related. One output in this family can be predicted exactly from the others without knowing the key.</p>

<p>That is incompatible with the strong reading of “cannot be predicted without knowing the keys”.</p>

<p><code class="language-plaintext highlighter-rouge">aHash</code> can still be a perfectly reasonable fast hash-table hasher as long as the output remains secret. The crate is also right to say that it is not cryptographically secure.</p>

<p>But do not use it as a MAC. Do not use it as a cryptographic keyed hash. Do not use it as a PRF. Do not use its output as randomness. And be careful even in non-cryptographic designs, such as sharded routing, if an adversary gets to choose inputs and observe outputs.</p>
 ]]></content:encoded>
    <guid isPermaLink="true">https://00f.net/2026/04/26/ahash-is-not-a-prf</guid>
  </item>
  
  <item>
    <title>Swival is the AI agent I actually wanted</title>
    <link>https://00f.net/2026/04/13/swival-ai-agent/</link>
    <pubDate>2026-04-13T00:00:00+02:00</pubDate>
    <content:encoded><![CDATA[ <p>Swival is a CLI AI agent, and <a href="https://swival.dev">Swival 1.0.0</a> has just been tagged.</p>

<p>People are going to ask the obvious question.</p>

<p>Why build a new agent when Codex, Claude Code and Opencode already exist, are well established, and already look good enough for most people?</p>

<p>Because I wanted an agent that fixes the things existing agents still get wrong in actual daily use.</p>

<h2 id="privacy-local-models-and-not-leaking-secrets-to-a-provider">Privacy, local models, and not leaking secrets to a provider</h2>

<p>Current agents are built around genuinely incredible models.</p>

<p>But I still don’t trust companies such as Anthropic with my data. For open source work, fine. For closed-source work, or anything sensitive or personal, I think the default posture of most current tools just isn’t acceptable.</p>

<p>Using an AI agent inevitably leaks internal information. Sometimes a lot of it.</p>

<p>That includes access tokens, internal project names, URLs, company names, and all the little bits of context that look harmless until they aren’t.</p>

<p>Mitigating that risk is something I have cared about for a long time. I even gave a <a href="https://www.youtube.com/watch?v=FUpEAMUQkCA">Zigtoberfest talk</a> about that exact topic.</p>

<p>So I wanted an agent that does two things properly.</p>

<p>First, it needs mitigations for leaking secrets to providers.</p>

<p>That means transparently encrypting secrets before sending them to providers, then decrypting them locally so that models can still reason about them without actually seeing them. And it means being able to block and redact specific strings such as internal project names, company names and URLs.</p>

<p>The fact that current agents, including the ones heavily used by corporations, still don’t ship these features is, to me, irresponsible and unacceptable.</p>

<p>Swival has <a href="https://swival.dev/pages/secrets.html">transparent secret encryption</a> and <a href="https://swival.dev/pages/llm-filter.html">outbound LLM filters</a> specifically to reduce how much information leaves your machine.</p>
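
<p>The sketch below illustrates only the round trip. It is not Swival’s code. For simplicity it replaces encryption with reversible placeholder substitution. The <code class="language-plaintext highlighter-rouge">Vault</code> type, its methods and the <code class="language-plaintext highlighter-rouge">[[SECRET_n]]</code> placeholder format are all hypothetical. Outgoing text carries only placeholders, the model can still refer to them in its answer, and the real values are restored locally.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// Hypothetical outbound filter. It swaps known secrets for opaque
/// placeholders before a prompt leaves the machine, and swaps them back
/// in the model's reply. Plain substitution, NOT encryption.
struct Vault {
    // (secret, placeholder) pairs, longest secret first, so that a secret
    // containing another secret is replaced before the shorter one.
    pairs: Vec&lt;(String, String)&gt;,
}

impl Vault {
    fn new(secrets: &amp;[&amp;str]) -&gt; Self {
        let mut sorted: Vec&lt;&amp;str&gt; = secrets.to_vec();
        sorted.sort_by_key(|s| std::cmp::Reverse(s.len()));
        let pairs = sorted
            .into_iter()
            .enumerate()
            // Brackets keep [[SECRET_1]] from matching inside [[SECRET_10]].
            .map(|(i, s)| (s.to_string(), format!("[[SECRET_{}]]", i + 1)))
            .collect();
        Vault { pairs }
    }

    /// Applied to outgoing text, before it is sent to the provider.
    fn protect(&amp;self, text: &amp;str) -&gt; String {
        self.pairs
            .iter()
            .fold(text.to_string(), |acc, (secret, tag)| acc.replace(secret.as_str(), tag))
    }

    /// Applied locally to the provider's reply.
    fn reveal(&amp;self, text: &amp;str) -&gt; String {
        self.pairs
            .iter()
            .fold(text.to_string(), |acc, (secret, tag)| acc.replace(tag.as_str(), secret))
    }
}

fn main() {
    let vault = Vault::new(&amp;["hunter2", "git.acme-internal.example"]);
    let outbound = vault.protect("Push to git.acme-internal.example using token hunter2");
    // Only placeholders reach the provider.
    assert_eq!(outbound, "Push to [[SECRET_1]] using token [[SECRET_2]]");
    // The model can still reason about, and refer to, the placeholders.
    let reply = "Use [[SECRET_2]] to authenticate against [[SECRET_1]]";
    println!("{}", vault.reveal(reply));
}
</code></pre></div></div>

<p>Blocking and redacting strings such as internal project names would use the same forward replacement, without keeping the reverse mapping.</p>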

<p>Second, it needs to work well with open source models. Not as a checkbox. Actually well.</p>

<p>Local models have predictable behavior and predictable cost. They don’t suddenly get worse because a provider changed something. They don’t suddenly get more expensive because pricing moved. You control them, not a third party.</p>

<p>And of course, they also don’t leak sensitive data to anyone.</p>

<p>Open source models are getting good fast. Gemma 4, Qwen 3.6, and GLM-5.1 make that pretty obvious. Plenty of exciting models are uploaded to Hugging Face every day.</p>

<p>At the same time, efficient local inference is turning into a basic requirement for modern devices, and it’s only going to improve. Apple M5 chips are a good illustration of where this is going.</p>

<p>No, local models aren’t a replacement for everything. But they’re already good enough for a lot of work, they have a bright future, and they can be fine-tuned.</p>

<p>Even modern small models have surprisingly strong agentic capabilities.</p>

<p>The frustrating part is that most agents are still optimized and tested mainly for frontier models. Even the ones that advertise support for many providers and local models usually behave badly with local models. Tools are used poorly. Context is managed poorly. Everything gets slower. Output quality gets unreliable. Then people blame the model instead of the tool.</p>

<p>I wanted an agent that performs well with any model. From large frontier models all the way down to small local models with a small context window that anyone can run on their own machine.</p>

<p>And if it fails, I want the first instinct to be improving the agent so that it helps the model deliver as much as it can, not immediately declaring the model useless.</p>

<p>That’s one of the main reasons Swival <a href="https://swival.dev/pages/open-models.html">isn’t just for frontier models</a>. A lot of that comes from <a href="https://swival.dev/pages/context-management.html">excellent context management</a>.</p>

<p>Swival has a <code class="language-plaintext highlighter-rouge">/compact</code> command. But in practice, it’s rarely needed, if at all.
The agent keeps trying to deliver regardless of the constraints, and the context window isn’t something you should have to babysit during a long session.</p>

<p>And when I want to test new models, nothing is more convenient than Hugging Face’s hosted inference, which doesn’t require any local hardware. So I wanted that to be trivial as well.</p>

<p>With Swival, it’s as easy as:</p>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>swival <span class="nt">--provider</span> huggingface <span class="nt">--model</span> zai-org/GLM-5.1
</code></pre></div></div>

<h2 id="agent-to-agent-is-too-useful-to-remain-niche">Agent-to-Agent is too useful to remain niche</h2>

<p>The <a href="https://a2a-protocol.org">A2A protocol</a> is great.</p>

<p>People who actually use it know how powerful it is, and they usually don’t want to go back to a single isolated agent.</p>

<p>Unfortunately, for most people, A2A is still one of those things they have vaguely heard about but never actually use, mostly because mainstream tools such as Claude Code still don’t support it.</p>

<p>That’s a shame, because A2A changes what an agent can be.</p>

<p>With A2A, you can run multiple agents with different configurations and different models, and let tasks naturally reach for the right one.</p>

<p>So instead of stuffing documentation and skills into one local agent, you can have a dedicated documentation agent with direct access to the material, while other agents don’t need to carry all of that context.</p>

<p>Then, when an agent needs to know how to do something, it asks the documentation agent to research it and return a concise, accurate answer instead of dumping blind <code class="language-plaintext highlighter-rouge">grep</code> results.</p>

<p>And of course, that specialized documentation agent can run a small, cheap, local model.</p>

<p>That’s the larger idea. Don’t restrict yourself to one model, or even a tiny set of related models. Use as many models as you want, including open source ones, depending on what needs to be done.</p>

<p>I wanted A2A to be simple enough that people would actually use it.</p>

<p>Swival comes with <a href="https://swival.dev/pages/a2a.html">built-in support for A2A</a> and can act both as a server and as a client.</p>

<p>You can set up a network of specialized agents in minutes.</p>

<h2 id="open-source-readable-powerful-and-not-a-circus">Open source, readable, powerful, and not a circus</h2>

<p>Existing agents have become ridiculously bloated over time.</p>

<p>Claude Code is enormous. It flickers. It crashes. New versions keep adding gadgets I don’t care about while making the whole thing even heavier.</p>

<p>I don’t want a kitchen sink. I want a small, reliable tool.</p>

<p>Swival is lean, fully open source, doesn’t depend on any company, and isn’t optimized around a specific provider. It’s totally free and open, and has nothing to sell.</p>

<p>It’s written in Python because Python is simple, readable and maintainable. Nothing is obfuscated. Anyone can read the code, understand it, and modify it for their own needs.</p>

<p>And this isn’t a toy. It’s a workhorse. A boring one, which is exactly what I want here.</p>

<p>It focuses on the tools a developer actually needs, not on gimmicks. But it still has the features you would expect from a modern agent, and then some.</p>

<h2 id="benchmarking-needs-a-real-environment">Benchmarking needs a real environment</h2>

<p>I also wanted to run benchmarks.</p>

<p>I wanted a tool to evaluate models, settings, skills, MCP servers, and similar pieces on real-world tasks, in an environment that actually resembles how a user works.</p>

<p>A lot of benchmarking tools aren’t designed that way.</p>

<p>They either assume tools optimized for specific models, or they provide an environment that doesn’t feel much like the real thing.</p>

<p>And if you want to learn anything useful from evaluations, you need traces. Detailed ones. Accurate ones.</p>

<p>You need to be able to look at what happened and understand how a model behaved under different conditions.</p>

<p>Swival comes with strong <a href="https://swival.dev/pages/reports.html">reporting</a> features. Combined with <a href="https://calibra.swival.dev">Calibra</a>, it lets you compare traces, <code class="language-plaintext highlighter-rouge">diff</code> them, and run evaluations that are actually meaningful.</p>

<p>Evaluating many configurations can burn through a lot of tokens.</p>

<p>That’s yet another reason I cared so much about making the agent work well with open source models running locally. For evaluations, cost is often more important than wall-clock speed.</p>

<h2 id="llms-fail-on-the-first-try">LLMs fail on the first try</h2>

<p>The real problem is that models produce terrible code and then confidently tell you everything is fine.</p>

<p>Watching a model generate code is impressive. It’s hard not to be impressed when you type a prompt and a feature, or sometimes a whole project, appears in one shot.</p>

<p>And the final report is always soothing. Everything is done. Everything works.</p>

<p>Of course.</p>

<p>The reality is that AI-generated code is almost always poor quality.</p>

<p>It may compile. It may appear to work. But from a correctness perspective, it’s often terrible.</p>

<p>You may be very happy with the code generated by Claude Code with Opus 4.6 max pro high thinking max, and maybe even want to deploy it to production, merge it into open source projects, or write triumphant blog posts about it.</p>

<p>But there’s a good chance that the code is inefficient, buggy, hard to maintain, and going to cost you later.</p>

<p>There’s a trivial experiment anyone can try.</p>

<p>Ask your favorite agent to generate code, or even just a plan.</p>

<p>Then, in a separate environment, ask another AI agent, even one running the same model, to review that code or plan.</p>

<p>It’s very likely to find issues immediately. Sometimes critical ones.</p>

<p>As much as I like AI, in my own projects I refuse pull requests blindly generated by tools such as Claude Code for exactly that reason. And in a company context, I wouldn’t deploy that output to production either.</p>

<p>There are two ways to significantly improve quality and confidence.</p>

<p>First, write the tests up front, then force the agent not to declare the task complete until the tests pass.</p>

<p>The tests don’t even have to be part of the application’s formal test suite. A simple shell script with <code class="language-plaintext highlighter-rouge">curl</code> commands can be enough.</p>

<p>What matters is that this becomes a contract the agent has to satisfy.</p>

<p>That’s much stricter than a prompt, because a contract can’t be hand-waved away or interpreted creatively.</p>
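
<p>Mechanically, enforcing such a contract is simple. Here is a minimal sketch, assuming the contract is a shell script whose exit status decides success. The function names and the example <code class="language-plaintext highlighter-rouge">localhost:8080</code> endpoints are hypothetical, and this is not Swival’s code.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::process::Command;

/// Hypothetical completion gate: runs the user's contract script
/// (for example a few `curl -fsS` checks) through `sh -c`.
fn contract_satisfied(script: &amp;str) -&gt; bool {
    Command::new("sh")
        .arg("-c")
        .arg(script)
        .status()
        .map(|status| status.success())
        // A script that cannot even be started counts as a failure.
        .unwrap_or(false)
}

/// The agent's claim of completion only counts if the contract holds.
fn accept_completion(agent_says_done: bool, script: &amp;str) -&gt; bool {
    agent_says_done &amp;&amp; contract_satisfied(script)
}

fn main() {
    // In practice the script might be something like:
    //   curl -fsS http://localhost:8080/health &amp;&amp; curl -fsS http://localhost:8080/api/items
    // Here, trivial scripts stand in for passing and failing tests.
    assert!(accept_completion(true, "exit 0"));
    assert!(!accept_completion(true, "exit 1"));
    assert!(!accept_completion(false, "exit 0"));
    println!("contract enforced");
}
</code></pre></div></div>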

<p>Second, use a loop with an LLM-as-a-judge.</p>

<p>Let one agent write code, documentation or a plan. Then let another agent review that work against the original instructions, and force the implementer to retry until the reviewer thinks it’s correct.</p>
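
<p>The loop itself is small. Here is a rough sketch with the two LLM calls abstracted as closures. <code class="language-plaintext highlighter-rouge">judge_loop</code>, <code class="language-plaintext highlighter-rouge">Verdict</code> and the attempt limit are hypothetical, not Swival’s API.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// A reviewer's verdict (hypothetical type).
enum Verdict {
    Approved,
    Rejected(String), // feedback for the implementer
}

/// Hypothetical implement/review loop: the implementer retries with the
/// reviewer's feedback until the reviewer approves or attempts run out.
fn judge_loop&lt;I, R&gt;(instructions: &amp;str, mut implement: I, mut review: R, max_attempts: usize) -&gt; Option&lt;String&gt;
where
    I: FnMut(&amp;str, Option&lt;&amp;str&gt;) -&gt; String,
    R: FnMut(&amp;str, &amp;str) -&gt; Verdict,
{
    let mut feedback: Option&lt;String&gt; = None;
    for _ in 0..max_attempts {
        let draft = implement(instructions, feedback.as_deref());
        // The draft is judged against the ORIGINAL instructions, not a summary.
        match review(instructions, &amp;draft) {
            Verdict::Approved =&gt; return Some(draft),
            Verdict::Rejected(why) =&gt; feedback = Some(why),
        }
    }
    None // still rejected: escalate to a human instead of claiming success
}

fn main() {
    // Stand-ins for two LLM calls: the first draft forgets a test,
    // the retry addresses the reviewer's feedback.
    let result = judge_loop(
        "write add() with a unit test",
        |_, feedback| match feedback {
            None =&gt; "fn add(a: i32, b: i32) -&gt; i32 { a + b }".to_string(),
            Some(_) =&gt; "fn add(a: i32, b: i32) -&gt; i32 { a + b }\n#[test] fn t() { assert_eq!(add(1, 2), 3); }".to_string(),
        },
        |_, draft| {
            if draft.contains("#[test]") {
                Verdict::Approved
            } else {
                Verdict::Rejected("missing a unit test".to_string())
            }
        },
        3,
    );
    assert!(result.expect("reviewer approved").contains("#[test]"));
}
</code></pre></div></div>

<p>Capping the number of attempts is a choice made for this sketch. It ensures that a loop that never converges ends up in front of a human instead of spinning forever.</p>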

<p>Swival makes both of these approaches trivial because they’re built in.</p>

<p>Before starting a task, you can give the agent a script that will act as a reviewer.</p>

<p>That reviewer can be another <code class="language-plaintext highlighter-rouge">swival</code> instance with a custom configuration. Or, even more simply, you can start with <code class="language-plaintext highlighter-rouge">--self-review</code>, and tasks will be reviewed by the same instance and same model in a dedicated context.</p>

<p>There’s nothing else to wire together.</p>

<p>One of the most interesting things to watch is how bad the initial output of an LLM agent can be, especially for code, and how honest and picky a model can suddenly become when it’s reviewing its own output without realizing it.</p>

<p>After a couple of iterations, the code, plan or documentation is often far better than the first attempt.</p>

<p>This is also one of the main reasons I wrote a new agent at all.</p>

<p>I don’t want to use AI to generate a mountain of code just so I can brag about productivity if the result is unreliable, insecure and unmaintainable.</p>

<p>I wanted an agent that optimizes for quality rather than raw time savings.</p>

<p>It can be slow. It can be expensive. But I want the output to be something I can trust and deploy to production.</p>

<h2 id="long-sessions-shouldnt-make-the-agent-dumber">Long sessions shouldn’t make the agent dumber</h2>

<p>Another thing I wanted was continuity.</p>

<p>I wanted an agent that remembers what I did before, and what it did before.</p>

<p>When I come back to the same project the next day, I want the agent to remember prior work without filling the live context with junk.</p>

<p>Swival does that in a way that feels much more natural than in other agents.</p>

<p>I also wanted it to stop making the same mistakes twice.</p>

<p>So Swival has a <code class="language-plaintext highlighter-rouge">/learn</code> command: at the end of a session, the agent can reflect on the issues it ran into and write concise instructions about how to avoid repeating them.
And once those learnings exist, it will keep updating them automatically.</p>

<p>That has turned out to be much more effective than premade agent skills.
Or, more accurately, it’s an extremely effective way to produce the right skills, because the agent discovers what it actually needs from real sessions instead of from speculation.</p>

<h2 id="goals">Goals</h2>

<p>One of the most frustrating things about working with an agent on a non-trivial task is how often it declares victory too early.</p>

<p>It runs for a few turns, fixes something, and then politely tells you it’s done. Except it’s not. There are still failing tests, missing pieces, or edge cases it quietly decided weren’t worth its time.</p>

<p>You re-prompt. It does a bit more work. It tells you it’s done again. And so on, until you give up and finish the task yourself.</p>

<p>People have been working around this with the so-called Ralph-style loop, which essentially means feeding the same prompt back to the agent in a shell loop until it stops finding things to do. It works, but it’s clumsy and wasteful.</p>

<p>I wanted something more structured, built directly into the agent.</p>

<p>In Swival, you set an objective with <code class="language-plaintext highlighter-rouge">/goal &lt;objective&gt;</code> in the REPL, and the agent stays on task across turns. The original objective is fed back to the model after every answer, and the loop only ends when the agent itself signals that the goal is complete after a real evidence-based audit, declares a blocker that needs your input, or hits an optional token budget.</p>

<p>The agent doesn’t get to walk away after one turn just because it feels good about its work.</p>
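
<p>The control flow described above can be sketched as follows. Everything here is hypothetical: <code class="language-plaintext highlighter-rouge">run_goal</code>, <code class="language-plaintext highlighter-rouge">TurnOutcome</code>, <code class="language-plaintext highlighter-rouge">Stop</code> and the token accounting. The model call is a closure, and the evidence-based audit is reduced to the agent returning <code class="language-plaintext highlighter-rouge">GoalComplete</code>. Pausing, resuming and replacing the goal are left out.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/// What the agent reports at the end of a turn (hypothetical).
enum TurnOutcome {
    Continue,
    GoalComplete, // in the real feature, only after an evidence-based audit
    Blocked(String),
}

/// Why the goal loop stopped.
#[derive(Debug, PartialEq)]
enum Stop {
    Complete,
    Blocked(String),
    BudgetExhausted,
}

/// Hypothetical goal loop: the objective is re-sent after every turn,
/// and only completion, a blocker, or the optional budget end it.
fn run_goal&lt;F&gt;(objective: &amp;str, token_budget: Option&lt;u64&gt;, mut turn: F) -&gt; Stop
where
    F: FnMut(&amp;str) -&gt; (TurnOutcome, u64), // (outcome, tokens used this turn)
{
    let mut used = 0u64;
    loop {
        let (outcome, tokens) = turn(objective);
        used += tokens;
        match outcome {
            TurnOutcome::GoalComplete =&gt; return Stop::Complete,
            TurnOutcome::Blocked(why) =&gt; return Stop::Blocked(why),
            TurnOutcome::Continue =&gt; {}
        }
        if token_budget.map_or(false, |budget| used &gt;= budget) {
            return Stop::BudgetExhausted;
        }
    }
}

fn main() {
    // A stand-in agent that only finishes on its fourth turn.
    let mut turns = 0;
    let stop = run_goal("make all tests pass", None, |_| {
        turns += 1;
        let outcome = if turns &lt; 4 { TurnOutcome::Continue } else { TurnOutcome::GoalComplete };
        (outcome, 1_000)
    });
    assert_eq!(stop, Stop::Complete);
    assert_eq!(turns, 4);

    // An agent that never finishes is stopped by a 2,500-token budget.
    let stop = run_goal("refactor the parser", Some(2_500), |_| (TurnOutcome::Continue, 1_000));
    assert_eq!(stop, Stop::BudgetExhausted);

    // A blocker hands control back to the user.
    let stop = run_goal("deploy", None, |_| (TurnOutcome::Blocked("need credentials".to_string()), 10));
    assert_eq!(stop, Stop::Blocked("need credentials".to_string()));
}
</code></pre></div></div>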

<p>This makes it practical to point Swival at ambitious long-running tasks such as refactors, audits or end-to-end fixes, and let it grind for hours without giving up halfway. You can pause, resume, replace, or clear the goal at any time.</p>

<p>It’s a small feature on paper, but in practice it changes what you can reasonably ask an agent to do.</p>

<h2 id="modern-features-but-without-the-usual-mess">Modern features, but without the usual mess</h2>

<p>Skills, MCP, parallel subagents and similar capabilities are table stakes for serious agent use now. Of course Swival supports all of that.</p>

<p>But I also wanted it to avoid the usual security and reliability mistakes.</p>

<p>So tool and MCP output are explicitly tagged as untrusted in order to reduce prompt-injection risk.</p>

<p>And markdown comments are ignored, so what you see in a rendered skill isn’t different from what the agent actually interprets.</p>

<p>There’s another common failure mode I have always found silly.</p>

<p>If an MCP command or tool returns a large output, many agents either stuff the whole thing into the context window or fail in some awkward way.</p>

<p>I wanted an agent that handles that properly. Swival writes large outputs to a temporary file and lets the agent access them in chunks later instead.</p>
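
<p>Here is a minimal sketch of that pattern, assuming a fixed size threshold and one file per tool call in the system temporary directory. <code class="language-plaintext highlighter-rouge">INLINE_LIMIT</code>, <code class="language-plaintext highlighter-rouge">capture</code> and <code class="language-plaintext highlighter-rouge">read_chunk</code> are hypothetical names and values, not Swival’s.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::fs;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::{Path, PathBuf};

/// Hypothetical size above which tool output is not put in the context.
const INLINE_LIMIT: usize = 4096;

enum ToolResult {
    Inline(String),
    Spilled { path: PathBuf, len: usize },
}

/// Keeps small outputs inline; writes large ones to a temporary file and
/// returns a reference the agent can page through later.
fn capture(tool: &amp;str, output: &amp;str) -&gt; io::Result&lt;ToolResult&gt; {
    if output.len() &lt;= INLINE_LIMIT {
        return Ok(ToolResult::Inline(output.to_string()));
    }
    let name = format!("tool-output-{}-{}.txt", tool, std::process::id());
    let path = std::env::temp_dir().join(name);
    fs::write(&amp;path, output)?;
    Ok(ToolResult::Spilled { path, len: output.len() })
}

/// Reads up to `size` bytes starting at `offset`, on demand.
fn read_chunk(path: &amp;Path, offset: u64, size: usize) -&gt; io::Result&lt;Vec&lt;u8&gt;&gt; {
    let mut file = fs::File::open(path)?;
    file.seek(SeekFrom::Start(offset))?;
    let mut buf = Vec::with_capacity(size);
    file.take(size as u64).read_to_end(&amp;mut buf)?;
    Ok(buf)
}

fn main() -&gt; io::Result&lt;()&gt; {
    // Small output: stays in the context as-is.
    assert!(matches!(capture("ls", "a.txt")?, ToolResult::Inline(ref s) if s == "a.txt"));

    // Large output (18,000 bytes): only a path and a length go to the model.
    let log = "line of build log\n".repeat(1000);
    match capture("build", &amp;log)? {
        ToolResult::Inline(_) =&gt; unreachable!("output is above the limit"),
        ToolResult::Spilled { path, len } =&gt; {
            assert_eq!(len, 18_000);
            assert_eq!(read_chunk(&amp;path, 0, 18)?, b"line of build log\n");
            fs::remove_file(&amp;path)?;
        }
    }
    Ok(())
}
</code></pre></div></div>

<p>Returning raw bytes keeps the chunking simple. A real tool would likely page by lines, or at least respect UTF-8 boundaries, before showing a chunk to the model.</p>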

<p>I also wanted a clean way to share files such as agent memories and <code class="language-plaintext highlighter-rouge">AGENTS.md</code> across multiple devices working on the same project, without committing them into a Git repository.</p>

<p>Swival has <a href="https://swival.dev/pages/lifecycle-hooks.html">lifecycle hooks</a> specifically for that sort of workflow.</p>

<p>Arbitrary commands are easy too.
In <code class="language-plaintext highlighter-rouge">~/.config/swival/commands/</code>, you can place either scripts or plain files. Then <code class="language-plaintext highlighter-rouge">! command_name</code> will inject either the content of the file or the output of the script into the prompt.</p>

<p>Yes, other agents have versions of this.
But I wanted it to be trivial from a user perspective.</p>

<p>Not five overlapping systems with five different names for basically the same thing. Just one simple mechanism.</p>

<p>The same goes for shell command inspection and rewriting.</p>

<p>I didn’t want people to need to learn some complicated generic hook system.</p>

<p>In Swival, enabling <a href="https://swival.dev/pages/command-middleware.html">command middleware</a> is safe and straightforward.</p>

<p>And more importantly, I wanted the agent to be usable programmatically, not just from the CLI.</p>

<p>I also didn’t want that to require a separate SDK with its own worldview and its own behavior. 
Everything the CLI agent does should be accessible in a consistent way from Python code.</p>

<p>This is why Swival can be used as a CLI, but also as a library. It exposes <a href="https://swival.dev/pages/python-api.html">a very simple API</a> so anyone can build custom agents, or more general applications, on top of a batteries-included agentic environment.</p>

<h2 id="small-things-matter">Small things matter</h2>

<p>Some of the things I cared about aren’t glamorous.</p>

<p>They’re just the kind of rough edges that make a daily tool annoying.</p>

<p>For example, markdown rendering for LLM output looks nice, but I dislike the fact that copy-pasting rendered output often strips the markdown markers.</p>

<p>I also don’t love the idea of an agent accidentally deleting files.</p>

<p>These are small things. But they matter if you actually use the tool every day.</p>

<p>Swival renders LLM markdown output while preserving the formatting tags.</p>

<p>So the output looks good, but can still be copied and pasted without losing the markdown.</p>

<p>And even in full YOLO mode, it has safety guards against dangerous commands, plus built-in support for the AgentFS copy-on-write filesystem overlay.</p>

<p>Also, when a file is deleted using Swival’s own tools, it isn’t actually deleted. It’s moved to a Trash directory instead.</p>

<p>I have never personally seen an agent delete the wrong file. But if it ever does happen, I want recovery to be possible.</p>
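
<p>A soft delete of this kind can be as simple as a rename. Here is a minimal sketch, assuming the caller chooses the trash directory. <code class="language-plaintext highlighter-rouge">soft_delete</code> and the timestamped naming scheme are hypothetical, not Swival’s implementation.</p>

<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical soft delete: moves the file into a trash directory
/// instead of unlinking it, and returns where it went.
fn soft_delete(file: &amp;Path, trash_dir: &amp;Path) -&gt; io::Result&lt;PathBuf&gt; {
    fs::create_dir_all(trash_dir)?;
    let name = file
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    // A timestamp prefix keeps two deleted files with the same name apart.
    let stamp = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let dest = trash_dir.join(format!("{}-{}", stamp, name.to_string_lossy()));
    // rename() only moves within one filesystem; across mount points it
    // returns an error, and a real tool would copy and then remove.
    fs::rename(file, &amp;dest)?;
    Ok(dest)
}

fn main() -&gt; io::Result&lt;()&gt; {
    let work = std::env::temp_dir().join(format!("trash-demo-{}", std::process::id()));
    fs::create_dir_all(&amp;work)?;
    let victim = work.join("notes.txt");
    fs::write(&amp;victim, "important")?;

    let saved = soft_delete(&amp;victim, &amp;work.join(".trash"))?;
    assert!(!victim.exists()); // gone from its original location...
    assert_eq!(fs::read_to_string(&amp;saved)?, "important"); // ...but recoverable

    fs::remove_dir_all(&amp;work)?;
    Ok(())
}
</code></pre></div></div>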

<h2 id="why-i-use-it">Why I use it</h2>

<p>At this point, I use Swival almost exclusively. It’s reliable, and I’m happy with the output I get from it.</p>

<p>I use open source models as much as possible, both locally and via Hugging Face inference endpoints.</p>

<p>But when I need a frontier model, I use Swival with GPT-5.5. It’s a fantastic model.
It works amazingly well with a regular ChatGPT subscription, and I have never hit any usage limit.</p>

<p>If you are happy with your current agent, there’s no reason to switch.</p>

<p>But I would still strongly encourage you to try Swival. Maybe even use it alongside the agent you already use.</p>

<p>Because even with the very same models, it’s likely to find bugs your other agent never found.
That isn’t magic: different agents expose different environments to models, and models behave differently in those environments.</p>

<p>I have used <a href="https://swival.dev/pages/audit.html">the <code class="language-plaintext highlighter-rouge">/audit</code> built-in command</a> and it found bugs and vulnerabilities in pretty much every code base I tried it on, including code bases that had already been audited with Codex and Claude Code.</p>

<p>That’s the kind of difference I care about: whether the tool helps me find real problems and ship better work.</p>

<p>So yes, give it a try.</p>

<h2 id="whats-next">What’s next</h2>

<p>Version 1.0.0 doesn’t mean Swival is done.</p>

<p>It means it now checks all the boxes from my original todo list, and it’s stable enough for daily use.</p>

<p>The API is also unlikely to see major breaking changes, so it’s something you can reasonably rely on if you want to build applications and agents on top of it.</p>

<p>There are still many planned changes and features.
But Swival will remain driven by real-world usage and user feedback, not by a roadmap for the sake of having a roadmap.</p>

<p>Related projects are going to keep expanding as well.</p>

<p>Right now, they’re:</p>

<ul>
  <li><a href="https://calibra.swival.dev">Calibra</a>, to evaluate models, MCP servers, skills, and related configurations</li>
  <li><a href="https://github.com/Swival/skillscheck">Skillscheck</a>, a linter for agent <code class="language-plaintext highlighter-rouge">SKILL.md</code> files</li>
  <li><a href="https://marketplace.visualstudio.com/items?itemName=jedisct1.agent-skill-lint">Agent Skill Lint</a>, a Visual Studio extension for agent skills linting</li>
  <li><a href="https://github.com/Swival/swival-commands">Swival commands</a>, a repository for user-contributed commands</li>
</ul>

<p>Swival exists because I wanted an agent that takes privacy seriously, works with the models I actually want to run, gets better over long sessions, and pushes code quality up instead of pretending quality doesn’t matter.</p>

<p>Most agents still optimize for marketing. I wanted one that optimizes for the work.</p>

<p>That’s Swival.</p>
 ]]></content:encoded>
    <guid isPermaLink="true">https://00f.net/2026/04/13/swival-ai-agent</guid>
  </item>
  
 </channel>
</rss>
