<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Liam DeVoe</title><link href="https://tybug.dev/" rel="alternate"/><link href="https://tybug.dev/feed.xml" rel="self"/><id>https://tybug.dev/</id><updated>2026-06-03T00:00:00-04:00</updated><entry><title>Swarm testing</title><link href="https://tybug.dev/swarm-testing" rel="alternate"/><published>2026-06-03T00:00:00-04:00</published><updated>2026-06-03T00:00:00-04:00</updated><author><name/></author><id>tag:tybug.dev,2026-06-03:/swarm-testing</id><summary type="html">&lt;style&gt;
    /* push/pop bug region — shared by both figures */
    .figure__chart .bug-region__fill {
        fill: #de1010;
        fill-opacity: 0.18;
    }
    .figure__chart .bug-region__frontier {
        fill: none;
        stroke: #d62828;
        stroke-width: 1.4px;
        stroke-opacity: 0.8;
    }
    .figure__chart .bug-region__label {
        fill: #d62828;
        font-size: 12px;
        font-style: italic;
        text-anchor: middle;
    }
    /* "unlikely to be explored" band — article-local one-off grey, not a palette token */
    .figure__chart …&lt;/style&gt;</summary><content type="html">&lt;style&gt;
    /* push/pop bug region — shared by both figures */
    .figure__chart .bug-region__fill {
        fill: #de1010;
        fill-opacity: 0.18;
    }
    .figure__chart .bug-region__frontier {
        fill: none;
        stroke: #d62828;
        stroke-width: 1.4px;
        stroke-opacity: 0.8;
    }
    .figure__chart .bug-region__label {
        fill: #d62828;
        font-size: 12px;
        font-style: italic;
        text-anchor: middle;
    }
    /* "unlikely to be explored" band — article-local one-off grey, not a palette token */
    .figure__chart .unexplored__hatch {
        stroke: #9a9a9a;
        stroke-width: 1px;
        stroke-opacity: 0.55;
    }
    .figure__chart .unexplored__fill {
        fill: url(#swarm-unexplored-hatch);
        stroke: #9a9a9a;
        stroke-width: 1px;
        stroke-opacity: 0.5;
        stroke-dasharray: 3 3;
    }
    .figure__chart .unexplored__label {
        fill: #6f6f6f;
        font-size: 12px;
        font-style: italic;
        text-anchor: middle;
    }
    /* swarm decomposition figure */
    /* This chart's viewBox is 520 wide vs the JointDensity charts' 460, so at
       width:100% its text renders ~0.885x smaller. Scale the fonts up by 520/460
       so axis titles (12px) and ticks (11px) match the first two figures. */
    .figure__chart.swarm .axis-label {
        font-size: 13.5px;
    }
    .figure__chart.swarm .axis text {
        font-size: 12.5px;
    }
    .swarm__contour {
        fill: var(--color-figure-rust);
        stroke: none;
    }
    .swarm__area {
        fill: var(--swarm-color);
        fill-opacity: 0.13;
        stroke: none;
    }
    .swarm__line {
        fill: none;
        stroke: var(--swarm-color);
        stroke-width: 1.5px;
        stroke-linejoin: round;
        stroke-linecap: round;
    }
    .swarm__bracket {
        fill: none;
        stroke: var(--swarm-color);
        stroke-width: 1px;
        stroke-opacity: 0.35;
        stroke-dasharray: 2 3;
    }
    .swarm__component--both {
        --swarm-color: var(--color-figure-rust);
    }
    .swarm__component--push {
        --swarm-color: var(--color-figure-teal);
    }
    .swarm__component--pop {
        --swarm-color: var(--color-figure-gold);
    }
    /* interactive "build up the swarm distribution" figure */
    .swarm-build__controls {
        display: flex;
        flex-wrap: wrap;
        gap: 0.6em;
        align-items: center;
        justify-content: center;
        margin-top: 0.8em;
    }
    /* break spacer: no-op on desktop, forces count + reset onto a new line on mobile */
    .swarm-build__break {
        display: none;
    }
    .swarm-build__btn {
        cursor: pointer;
        user-select: none;
        padding: 0.3em 0.9em;
        border: 1px solid #ccc;
        border-radius: 3px;
        color: #444;
        font-size: 0.9em;
    }
    .swarm-build__btn:hover {
        border-color: var(--color-figure-rust);
        color: var(--color-figure-rust);
    }
    .swarm-build__count {
        color: #6f6f6f;
        font-size: 0.9em;
        font-variant-numeric: tabular-nums;
        min-width: 6em;
    }
    @media (max-width: 40em) {
        .swarm-build__controls {
            row-gap: 0.4em;
        }
        .swarm-build__break {
            display: block;
            flex-basis: 100%;
            height: 0;
        }
    }
    /* interactive "pick one activation config" hover figure */
    .figure__chart .swarm-hover__dot {
        fill: var(--color-figure-rust);
        stroke: #fff;
        stroke-width: 1px;
    }
    .figure__chart .swarm-hover__guide {
        stroke: var(--color-figure-rust);
        stroke-width: 1px;
        stroke-opacity: 0.4;
        stroke-dasharray: 3 3;
    }
    /* idle "ping" cue: solid dot pinging outward; signals the panel is interactive.
       Geometry/cadence are JS-driven (see the figure script); SCSS owns only the fill. */
    .figure__chart .swarm-hover__cue {
        pointer-events: none;
    }
    .figure__chart .swarm-hover__ping,
    .figure__chart .swarm-hover__core {
        fill: var(--color-figure-rust);
    }
    .figure__chart .swarm-hover__cue-label {
        fill: #6f6f6f;
        font-size: 11px;
        font-style: italic;
        text-anchor: middle;
    }
    .swarm-hover__readout {
        text-align: center;
        margin-top: 0.6em;
        color: #6f6f6f;
        font-size: 0.9em;
        font-variant-numeric: tabular-nums;
    }
    .swarm-hover__readout .val {
        color: var(--color-figure-rust);
    }
&lt;/style&gt;

&lt;script&gt;
    // Shared helpers for the two figures below, so the bug-region geometry and the
    // axis-title subscripts can be tweaked in one place. (Article-local, not part of
    // the shared figures.js system.)

    // Append a subscript (1 to the x title, 2 to the rotated y title) to every
    // .axis-label in `svg`. SVG text has no markup, so we add a smaller, lowered tspan.
    function swarmAxisSubscripts(svg) {
        svg.selectAll('text.axis-label').each(function () {
            const t = d3.select(this);
            const sub = (t.attr('transform') || '').includes('rotate') ? '2' : '1';
            t.append('tspan').attr('dy', '0.28em').attr('font-size', '0.72em').text(sub);
        });
    }

    // Wavy inner edge of an L-shaped band hugging both axes, at perpendicular
    // distance `off` from each axis (data coords, domain [0,20]). Both near-axis
    // bands below — the "bug" band and the "unlikely to be explored" band just
    // outside it — share this wave, so their edges stay parallel and contiguous.
    // The wobble keeps an edge from reading as a rigid contour. Tweak the look here.
    function swarmBandEdge(off) {
        const amp = 0.4;        // wobble amplitude
        const base = (t, s) =&gt; 0.82 * Math.sin(t * 1.05 + s) + 0.18 * Math.sin(t * 2.3 + s * 1.7);   // one dominant wave + slight irregularity
        const ramp = t =&gt; Math.min(1, (t - 1) / 3);   // damp wobble to 0 near the corner so the arms meet cleanly
        const wob = (t, s) =&gt; amp * ramp(t) * base(t, s);
        const horiz = d3.range(20, off - 1e-6, -0.5).map(xv =&gt; [xv, off + wob(xv, 0)]);     // along x-axis, 20 → corner
        const vert  = d3.range(off, 20 + 1e-6, 0.5).map(yv =&gt; [off + wob(yv, 11.3), yv]);   // along y-axis, corner → 20
        return horiz.concat(vert);
    }

    // Catmull-Rom line through band-edge points, projected through the scales.
    function swarmBandLine(x, y) {
        return d3.line().curve(d3.curveCatmullRom.alpha(0.5)).x(d =&gt; x(d[0])).y(d =&gt; y(d[1]));
    }

    // Clip an overlay to the data panel [0,20]^2 so a wobbly, spline-smoothed band
    // edge can't bleed past the axes. Each call mints a fresh id: several charts
    // share this page, and clip-path: url(#id) resolves to the *first* match in the
    // document, so a shared id would clip one figure against another's bounds.
    let swarmClipSeq = 0;
    function swarmPanelClip(svg, x, y) {
        const id = `swarm-panel-clip-${++swarmClipSeq}`;
        let defs = svg.select('defs');
        if (defs.empty()) defs = svg.append('defs');
        defs.append('clipPath').attr('id', id).append('rect')
            .attr('x', x(0)).attr('y', y(20))
            .attr('width', x(20) - x(0)).attr('height', y(0) - y(20));
        return `url(#${id})`;
    }

    // The wavy L-shaped "bug" region hugging both axes (band width ~1). Bugs masked
    // by combining two rules live here; styled by the shared .bug-region__* rules.
    function swarmBugRegion(svg, x, y, labelX = 13) {
        const inner = swarmBandEdge(1);
        const line = swarmBandLine(x, y);
        const g = svg.append('g').attr('class', 'bug-region');
        g.append('path').attr('class', 'bug-region__fill')   // outer axes corner + wavy inner edge
            .attr('d', `M${x(0)},${y(0)} L${x(20)},${y(0)} ` + line(inner).replace(/^M/, 'L') + ` L${x(0)},${y(20)} Z`);
        g.append('path').attr('class', 'bug-region__frontier')
            .attr('d', line(inner));
        g.append('text').attr('class', 'bug-region__label')
            .attr('x', x(labelX)).attr('y', y(0.5)).attr('dy', '0.3em').text('bug');
        return g;
    }

    // The grey, diagonally-hatched "unlikely to be explored" band: an L-shaped ribbon
    // contiguous with the bug band but one step further from each axis (higher up in x
    // and y). Even with swarm enabled a disabled rule runs ~0 times and an enabled one
    // ~10, so test cases land on the axes or in the centre, almost never in this gap.
    function swarmUnexploredRegion(svg, x, y) {
        const inner = swarmBandEdge(1);     // = the bug band's outer edge (contiguous)
        const outer = swarmBandEdge(4);     // a handful of calls off each axis
        const ring = inner.concat(outer.slice().reverse());
        const line = swarmBandLine(x, y);

        // diagonal hatch fill (article-local grey; not a palette colour)
        svg.append('defs').append('pattern')
            .attr('id', 'swarm-unexplored-hatch').attr('patternUnits', 'userSpaceOnUse')
            .attr('width', 6).attr('height', 6).attr('patternTransform', 'rotate(45)')
            .append('line').attr('class', 'unexplored__hatch')
            .attr('x1', 0).attr('y1', 0).attr('x2', 0).attr('y2', 6);

        const g = svg.append('g').attr('class', 'unexplored').attr('clip-path', swarmPanelClip(svg, x, y));
        g.append('path').attr('class', 'unexplored__fill').attr('d', line(ring) + 'Z');
        g.append('text').attr('class', 'unexplored__label')
            .attr('x', x(10)).attr('y', y(2.3)).text('unlikely to be explored');
        return g;
    }

    // The trapezoidal "optimize bug" wedge: a bug that needs a fixed pop:push ratio
    // (5:1) once the history is long enough. A fixed ratio is an angular wedge from the
    // origin (near the pop axis); the "sufficiently long" condition lops off the apex,
    // leaving the outer trapezoid — which lands inside the under-explored grey band.
    // Shared by the optimize-bug figure and the continuous-swarm build-up figure that
    // floods this region (so both name the same bug).
    function swarmOptimizeBugRegion(svg, x, y) {
        const ratio = 5;                               // pop : push
        const slope = 1 / ratio;                       // push per unit pop along the nominal ray
        const fan = 0.07;                              // ±half-fan in slope (symmetric about the ratio ray)
        const y0 = 9;                                  // "sufficiently long history": wedge starts out here
        const onRay = (yv, s) =&gt; yv * s;               // push at pop = yv for a ray of the given slope
        const sNear = slope - fan, sFar = slope + fan;   // near-axis edge (less push) / far edge (more push)
        const wedge = [
            [onRay(y0, sNear), y0], [onRay(20, sNear), 20],   // near-axis edge
            [onRay(20, sFar), 20], [onRay(y0, sFar), y0],     // far edge
        ];
        const pts = wedge.map(([a, b]) =&gt; `${x(a)},${y(b)}`).join(' ');
        const g = svg.append('g').attr('class', 'bug-region');
        g.append('polygon').attr('class', 'bug-region__fill').attr('points', pts);
        g.append('polygon').attr('class', 'bug-region__frontier').attr('points', pts);   // outline the region (no centre spike)
        g.append('text').attr('class', 'bug-region__label')
            .attr('x', x(onRay(16, slope) + 2.65)).attr('y', y(16)).text('bug');
        return g;
    }
&lt;/script&gt;

&lt;p&gt;Swarm testing is a technique for increasing behavioral diversity in randomized testing. It's conceptually simple, yet powerful, which makes it a favorite of mine. In this post, I describe a natural extension to swarm testing which yields an additional increase in behavioral diversity.&lt;/p&gt;
&lt;h2 id="traditional-swarm-testing"&gt;Traditional swarm testing&lt;/h2&gt;
&lt;p&gt;Consider a stack machine with three instructions (&lt;code&gt;push&lt;/code&gt;, &lt;code&gt;pop&lt;/code&gt;, &lt;code&gt;add&lt;/code&gt;), and the corresponding stateful test&lt;sup id="fnref:1"&gt;&lt;a class="footnote-ref" href="#fn:1"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;span class="sidenote"&gt;&lt;sup&gt;1&lt;/sup&gt; In pseudocode, because I want to emphasize the behavior before any particular testing framework changes the distribution.&amp;#160;&lt;/span&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StackMachineTest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="nd"&gt;@rule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;integers&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;

    &lt;span class="nd"&gt;@rule&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# early-returns if less than one value on stack&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;

    &lt;span class="nd"&gt;@rule&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# early-returns if less than two values on stack&lt;/span&gt;
        &lt;span class="o"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Here is one simple approach to exercising this test. For each test case, sample the number of rules &lt;span class="arithmatex"&gt;\(n\)&lt;/span&gt; to run from some distribution centered on the desired average test case size. Then pick the next rule to run uniformly at random (from &lt;code&gt;{push, pop, add}&lt;/code&gt;), until you've run &lt;span class="arithmatex"&gt;\(n\)&lt;/span&gt; rules total.&lt;/p&gt;
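&lt;p&gt;As a minimal sketch of this baseline strategy in Python: the names &lt;code&gt;sample_test_case&lt;/code&gt;, &lt;code&gt;mean_size&lt;/code&gt; and &lt;code&gt;spread&lt;/code&gt; are hypothetical, and the rounded normal distribution for the size is only one possible choice of a distribution centered on the desired average size. The sketch only generates the sequence of rule names; it does not execute them against a stack machine.&lt;/p&gt;

```python
import random

RULES = ["push", "pop", "add"]


def sample_test_case(rng, mean_size=30, spread=5):
    """Return the sequence of rule names for one test case."""
    # Number of rules to run, drawn from a distribution centred on mean_size
    # (assumption: a rounded normal, clamped at zero).
    n = max(0, round(rng.gauss(mean_size, spread)))
    # Each step picks the next rule uniformly at random.
    return [rng.choice(RULES) for _ in range(n)]


rng = random.Random(0)
case = sample_test_case(rng)
counts = {rule: case.count(rule) for rule in RULES}
print(len(case), counts)
```

&lt;p&gt;Tallying &lt;code&gt;counts&lt;/code&gt; over many test cases gives the per-rule call counts that the plots in this post are drawn over.&lt;/p&gt;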
&lt;p&gt;This testing strategy has a weakness for our test. Suppose the stack machine implementation has a bug which manifests only when the stack is large (say, at least 10 values). We can visualize whether the test finds this bug by plotting the number of calls to &lt;code&gt;push&lt;/code&gt; against the number of calls to &lt;code&gt;pop&lt;/code&gt;:&lt;/p&gt;
&lt;script&gt;
    Figures.figure(() =&gt; Figures.JointDensity(
        { x: Figures.dist.normal(10, 2.5), y: Figures.dist.normal(10, 2.5), xDomain: [0, 20], yDomain: [0, 20] },
        {
            equal: true,
            xLabel: '# calls to push',
            yLabel: '# calls to pop',
            // The bug needs a deep stack, so pushes must outrun pops by ~k: the
            // frontier is the line push - pop = k, and below it (few pops) the
            // bug becomes reachable. decorate() hands us the final data-&gt;pixel
            // scales, so we shade that region without touching layout geometry.
            decorate({ svg, x, y }) {
                const k = 10;                                  // stack-depth threshold
                const region = [[k, 0], [20, 0], [20, 20 - k]];   // push - pop &gt;= k, clipped to [0,20]^2
                const g = svg.append('g').attr('class', 'bug-region');
                g.append('polygon')
                    .attr('class', 'bug-region__fill')
                    .attr('points', region.map(([a, b]) =&gt; `${x(a)},${y(b)}`).join(' '));
                g.append('line')                              // the frontier itself: push - pop = k
                    .attr('class', 'bug-region__frontier')
                    .attr('x1', x(k)).attr('y1', y(0))
                    .attr('x2', x(20)).attr('y2', y(20 - k));
                g.append('text')
                    .attr('class', 'bug-region__label')
                    .attr('x', x(17.5)).attr('y', y(3))
                    .text('bug');
            },
        }
    ));
&lt;/script&gt;

&lt;p&gt;This plot shows the joint distribution of the number of calls to &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;pop&lt;/code&gt; within a test case&lt;sup id="fnref:2"&gt;&lt;a class="footnote-ref" href="#fn:2"&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;span class="sidenote"&gt;&lt;sup&gt;2&lt;/sup&gt; &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;pop&lt;/code&gt; are individually approximately normally distributed: each uniformly random pick of the next rule is a Bernoulli trial, so the number of times a given rule is picked is binomial, which is approximately normal over enough draws.&amp;#160;&lt;/span&gt; Each "point" on the plot represents a single test case. The bug lives in the lower right corner, where &lt;code&gt;push - pop ≥ 10&lt;/code&gt;. Because we expect to draw roughly as many &lt;code&gt;pop&lt;/code&gt; rules as &lt;code&gt;push&lt;/code&gt; rules, the stack is unlikely to grow large enough to trigger the bug.&lt;/p&gt;
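&lt;p&gt;As a rough sanity check on how rare that corner is, suppose every test case runs exactly &lt;span class="arithmatex"&gt;\(n = 30\)&lt;/span&gt; rules (an illustrative value I am assuming here, not one fixed by the strategy). Each step picks &lt;code&gt;push&lt;/code&gt; with probability &lt;span class="arithmatex"&gt;\(1/3\)&lt;/span&gt;, so the number of &lt;code&gt;push&lt;/code&gt; calls is &lt;span class="arithmatex"&gt;\(\mathrm{Binomial}(30, 1/3)\)&lt;/span&gt;, with mean &lt;span class="arithmatex"&gt;\(30 \cdot \tfrac{1}{3} = 10\)&lt;/span&gt; and standard deviation &lt;span class="arithmatex"&gt;\(\sqrt{30 \cdot \tfrac{1}{3} \cdot \tfrac{2}{3}} = \sqrt{20/3} \approx 2.58\)&lt;/span&gt;; the same holds for &lt;code&gt;pop&lt;/code&gt;. Each step changes &lt;code&gt;push - pop&lt;/code&gt; by &lt;span class="arithmatex"&gt;\(+1\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(-1\)&lt;/span&gt;, or &lt;span class="arithmatex"&gt;\(0\)&lt;/span&gt; with equal probability, so the difference has mean &lt;span class="arithmatex"&gt;\(0\)&lt;/span&gt; and standard deviation &lt;span class="arithmatex"&gt;\(\sqrt{30 \cdot \tfrac{2}{3}} = \sqrt{20} \approx 4.47\)&lt;/span&gt;. A gap of 10 is therefore about 2.2 standard deviations out, which under a normal approximation is on the order of one test case in a hundred.&lt;/p&gt;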
&lt;p&gt;This leads us to the following general observation: some features, like &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;pop&lt;/code&gt; here, actively mask bugs when combined. Conceptually, such bugs live along the axes of our plots:&lt;/p&gt;
&lt;script&gt;
    Figures.figure(() =&gt; Figures.JointDensity(
        { x: Figures.dist.normal(10, 2.5), y: Figures.dist.normal(10, 2.5), xDomain: [0, 20], yDomain: [0, 20] },
        {
            equal: true,
            xLabel: '# calls to Rule',   // subscript appended in decorate()
            yLabel: '# calls to Rule',
            // Bugs masked by combining two rules surface only when one rule
            // runs many times and the other barely at all — i.e. along the axes.
            // We shade a continuous L-shaped band hugging both axes (meeting at the
            // origin); the central blob (both rules run ~equally) never reaches it.
            decorate({ svg, x, y }) {
                swarmAxisSubscripts(svg);
                swarmBugRegion(svg, x, y, 13);
            },
        }
    ));
&lt;/script&gt;

&lt;p&gt;The central blob of test cases rarely reaches the axes, so such bugs are unlikely to be triggered.&lt;/p&gt;
&lt;p&gt;We would like a testing strategy which explores this part of the search space. The insight of swarm testing is that one can achieve this by randomly disabling certain features for an individual test case. For example, one might assign a 50% probability of disabling each rule&lt;sup id="fnref:3"&gt;&lt;a class="footnote-ref" href="#fn:3"&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;span class="sidenote"&gt;&lt;sup&gt;3&lt;/sup&gt; This is the algorithm used in the &lt;a href="https://users.cs.utah.edu/~regehr/papers/swarm12.pdf"&gt;swarm testing paper&lt;/a&gt;. Note however that this is a poor choice for other reasons: it is unlikely to disable either almost all, or almost no, rules as the number of rules grows. The fix is straightforward, but orthogonal to this article.&amp;#160;&lt;/span&gt; For the interaction of Rule&lt;span class="subscript"&gt;1&lt;/span&gt; and Rule&lt;span class="subscript"&gt;2&lt;/span&gt;, there are four equally-likely possibilities in a test case:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Both rules are enabled. As above.&lt;/li&gt;
&lt;li&gt;Rule&lt;span class="subscript"&gt;1&lt;/span&gt; is enabled, but not Rule&lt;span class="subscript"&gt;2&lt;/span&gt;. We see some exploration along &lt;span class="arithmatex"&gt;\(y = 0\)&lt;/span&gt;.&lt;/li&gt;
&lt;li&gt;Rule&lt;span class="subscript"&gt;2&lt;/span&gt; is enabled, but not Rule&lt;span class="subscript"&gt;1&lt;/span&gt;. We see some exploration along &lt;span class="arithmatex"&gt;\(x = 0\)&lt;/span&gt;.&lt;/li&gt;
&lt;li&gt;Neither is enabled. No exploration; uninteresting.&lt;/li&gt;
&lt;/ul&gt;
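&lt;p&gt;These cases fall out of one coin flip per rule, made once per test case. A rough sketch, assuming the 50% disable probability from the example and a fixed test case size; &lt;code&gt;sample_swarm_case&lt;/code&gt;, &lt;code&gt;n&lt;/code&gt; and &lt;code&gt;p_disable&lt;/code&gt; are hypothetical names, not taken from the paper or any framework:&lt;/p&gt;

```python
import random

RULES = ["push", "pop", "add"]


def sample_swarm_case(rng, n=30, p_disable=0.5):
    """Return (enabled_rules, sequence) for one swarm test case."""
    # Decide once per test case which rules are switched off.
    enabled = [rule for rule in RULES if rng.random() >= p_disable]
    if not enabled:
        # Every rule was disabled: nothing to run (the uninteresting case).
        return enabled, []
    # Pick uniformly among the rules that are still enabled.
    return enabled, [rng.choice(enabled) for _ in range(n)]


rng = random.Random(0)
cases = [sample_swarm_case(rng) for _ in range(1000)]
# Fraction of test cases that can call push but never pop.
push_no_pop = sum(
    1 for enabled, seq in cases if "push" in enabled and "pop" not in enabled
) / len(cases)
print(push_no_pop)
```

&lt;p&gt;About a quarter of the sampled test cases should enable &lt;code&gt;push&lt;/code&gt; but not &lt;code&gt;pop&lt;/code&gt;, which is the configuration that explores along &lt;span class="arithmatex"&gt;\(y = 0\)&lt;/span&gt;.&lt;/p&gt;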
&lt;p&gt;We can visualize the resulting distribution as the sum of the first three cases:&lt;/p&gt;
&lt;script&gt;
    Figures.figure(() =&gt; {
        const N = Figures.dist.normal(10, 2.5);    // an enabled rule: ~normal
        const SPIKE = Figures.dist.normal(0, 0.5); // disabled rule, centre only: a tight blob at 0
                                                   // (σ kept small so the outermost contour stays within the y=1/x=1 band)
        // The three non-trivial swarm configurations. xOn / yOn say whether push
        // (x) / pop (y) is enabled; a disabled rule is called exactly 0 times.
        const COMPONENTS = [
            { key: 'both', xOn: true,  yOn: true  },
            { key: 'pop',  xOn: true,  yOn: false },   // pop disabled
            { key: 'push', xOn: false, yOn: true  },   // push disabled
        ];
        const dens = on =&gt; on ? N : SPIKE;             // density used for the centre mixture

        const width = 520, height = 480;
        const levels = COMPONENTS.length;
        const dStag = 40, curve = 30, pad = 10;       // stagger step, curve height, breathing room
        const mLeft = 52, mBottom = 44;
        const mTop = (levels - 1) * dStag + curve + pad;
        const mRight = (levels - 1) * dStag + curve + pad;

        const PL = mLeft, PT = mTop;
        const size = Math.min(width - mRight - PL, height - mBottom - PT);   // square panel
        const PR = PL + size, PB = PT + size;

        const X = d3.scaleLinear().domain([0, 20]).range([PL, PR]);
        const Y = d3.scaleLinear().domain([0, 20]).range([PB, PT]);

        const svg = d3.create('svg').attr('viewBox', [0, 0, width, height]).attr('class', 'figure__chart swarm');

        // grid
        svg.append('g').attr('class', 'grid').attr('transform', `translate(${PL},0)`)
            .call(d3.axisLeft(Y).ticks(5).tickSize(-(PR - PL)).tickFormat(''));
        svg.append('g').attr('class', 'grid').attr('transform', `translate(0,${PB})`)
            .call(d3.axisBottom(X).ticks(5).tickSize(-(PB - PT)).tickFormat(''));

        // center: contours of the combined mixture (the swarm distribution).
        // Each component carries equal *mass* (1/3), but a disabled rule is a tight
        // spike at 0, so its mass is squeezed into a sliver and its *density* blows
        // up — the axis ridges would swamp the centre blob. Since all three configs
        // are equally likely, we normalise each component to unit peak before summing,
        // so the contours read at equal visual weight rather than by density.
        const n = 90;
        const g1 = i =&gt; (i / (n - 1)) * 20;
        const densPeak = on =&gt; on ? N(10) : SPIKE(0);             // peak of each 1D density
        const compPeak = c =&gt; densPeak(c.xOn) * densPeak(c.yOn);   // peak of the 2D component
        const joint = (a, b) =&gt; d3.sum(COMPONENTS, c =&gt; dens(c.xOn)(a) * dens(c.yOn)(b) / compPeak(c)) / levels;
        const vals = new Array(n * n);
        for (let j = 0; j &lt; n; j++)
            for (let i = 0; i &lt; n; i++) vals[j * n + i] = joint(g1(i), g1(j));
        const bands = 6, max = d3.max(vals);
        const contours = d3.contours().size([n, n])
            .thresholds(d3.range(1, bands + 1).map(k =&gt; (k / (bands + 1)) * max))(vals);
        const project = d3.geoTransform({ point(cx, cy) { this.stream.point(X(g1(cx)), Y(g1(cy))); } });
        svg.append('g').selectAll('path').data(contours).join('path')
            .attr('class', 'swarm__contour')
            .attr('fill-opacity', (d, i) =&gt; 0.10 + 0.46 * i / (bands - 1))
            .attr('d', d3.geoPath(project));

        // axes
        svg.append('g').attr('class', 'axis').attr('transform', `translate(0,${PB})`)
            .call(d3.axisBottom(X).ticks(5).tickSizeOuter(0));
        svg.append('g').attr('class', 'axis').attr('transform', `translate(${PL},0)`)
            .call(d3.axisLeft(Y).ticks(5).tickSizeOuter(0));
        svg.append('text').attr('class', 'axis-label')
            .attr('x', (PL + PR) / 2).attr('y', height - 6).text('# calls to Rule');
        svg.append('text').attr('class', 'axis-label')
            .attr('transform', `translate(16,${(PT + PB) / 2}) rotate(-90)`).text('# calls to Rule');
        swarmAxisSubscripts(svg);

        // the same wavy L-shaped bug region as the figure above (tweak in swarmBugRegion)
        swarmBugRegion(svg, X, Y, 18.5);

        // marginals. An enabled rule is ~normal (scaled to fill the strip); a
        // disabled rule never runs, so its marginal is drawn flat at zero.
        const nPeak = d3.max(d3.range(n + 1).map(i =&gt; N((i / n) * 20)));
        const pts = d3.range(n + 1).map(i =&gt; { const t = (i / n) * 20; return [t, N(t)]; });

        COMPONENTS.forEach((c, L) =&gt; {
            const g = svg.append('g').attr('class', `swarm__component swarm__component--${c.key}`);

            // top: x-marginal
            const baseT = PT - L * dStag, hT = v =&gt; baseT - (v / nPeak) * curve;
            if (c.xOn) {
                g.append('path').attr('class', 'swarm__area')
                    .attr('d', d3.area().curve(d3.curveBasis).x(d =&gt; X(d[0])).y0(baseT).y1(d =&gt; hT(d[1]))(pts));
                g.append('path').attr('class', 'swarm__line')
                    .attr('d', d3.line().curve(d3.curveBasis).x(d =&gt; X(d[0])).y(d =&gt; hT(d[1]))(pts));
            } else {
                g.append('path').attr('class', 'swarm__line')      // disabled: flat at zero
                    .attr('d', `M${X(0)},${baseT} L${X(20)},${baseT}`);
            }

            // right: y-marginal
            const baseR = PR + L * dStag, wR = v =&gt; baseR + (v / nPeak) * curve;
            if (c.yOn) {
                g.append('path').attr('class', 'swarm__area')
                    .attr('d', d3.area().curve(d3.curveBasis).y(d =&gt; Y(d[0])).x0(baseR).x1(d =&gt; wR(d[1]))(pts));
                g.append('path').attr('class', 'swarm__line')
                    .attr('d', d3.line().curve(d3.curveBasis).y(d =&gt; Y(d[0])).x(d =&gt; wR(d[1]))(pts));
            } else {
                g.append('path').attr('class', 'swarm__line')      // disabled: flat at zero
                    .attr('d', `M${baseR},${Y(0)} L${baseR},${Y(20)}`);
            }

            if (L &gt; 0) {   // level 0's square is the panel itself
                const cx = PR + L * dStag, cy = PT - L * dStag;
                g.append('path').attr('class', 'swarm__bracket')
                    .attr('d', `M${PR},${cy} L${cx},${cy} L${cx},${PT}`);
            }
        });

        return svg.node();
    });
&lt;/script&gt;

&lt;p&gt;This testing strategy now explores the previously unlikely state space that contains this type of bug. It would easily find our &lt;code&gt;push&lt;/code&gt; / &lt;code&gt;pop&lt;/code&gt; bug, for example.&lt;/p&gt;
&lt;h2 id="a-problem"&gt;A problem&lt;/h2&gt;
&lt;p&gt;Up to this point, I've described traditional swarm testing. And it's great; we get a nice increase in diversity. Specifically, we can explore states which require one or more rules to be completely disabled.&lt;/p&gt;
&lt;p&gt;But, as you may have noticed, some under-explored areas remain&lt;sup id="fnref:4"&gt;&lt;a class="footnote-ref" href="#fn:4"&gt;4&lt;/a&gt;&lt;/sup&gt;:&lt;span class="sidenote"&gt;&lt;sup&gt;4&lt;/sup&gt; I am intentionally ignoring the search space represented by the upper right area. This area can easily be covered by increasing the average number of rules run in a test case.&amp;#160;&lt;/span&gt;&lt;/p&gt;
&lt;script&gt;
    Figures.figure(() =&gt; {
        // The swarm distribution from the previous figure, as one combined density.
        const N = Figures.dist.normal(10, 2.5);    // an enabled rule: ~normal
        const SPIKE = Figures.dist.normal(0, 0.5); // a disabled rule: a tight blob at 0
        const COMPONENTS = [
            { xOn: true,  yOn: true  },
            { xOn: true,  yOn: false },
            { xOn: false, yOn: true  },
        ];
        const dens = on =&gt; on ? N : SPIKE;
        const densPeak = on =&gt; on ? N(10) : SPIKE(0);
        const compPeak = c =&gt; densPeak(c.xOn) * densPeak(c.yOn);   // normalise each component to unit peak
        const joint = (a, b) =&gt; d3.sum(COMPONENTS, c =&gt; dens(c.xOn)(a) * dens(c.yOn)(b) / compPeak(c)) / COMPONENTS.length;

        return Figures.JointDensity(
            { joint, xDomain: [0, 20], yDomain: [0, 20] },
            {
                equal: true,
                marginals: false,
                xLabel: '# calls to Rule',
                yLabel: '# calls to Rule',
                // The bug band hugs the axes; just outside it sits the grey hatched
                // band swarm leaves under-explored (a rule is ~0 or ~10, never in
                // between), which the simple extension below fills in.
                decorate({ svg, x, y }) {
                    swarmAxisSubscripts(svg);
                    swarmBugRegion(svg, x, y, 18.5);
                    swarmUnexploredRegion(svg, x, y);
                },
            }
        );
    });
&lt;/script&gt;

&lt;p&gt;The newly-highlighted area corresponds to test cases where Rule&lt;span class="subscript"&gt;1&lt;/span&gt; is enabled, but runs substantially less often than Rule&lt;span class="subscript"&gt;2&lt;/span&gt;; or vice versa.&lt;/p&gt;
&lt;p&gt;To give a concrete example of why we might care about this case, suppose our stack machine gains a new &lt;code&gt;optimize&lt;/code&gt; opcode. When run, &lt;code&gt;optimize&lt;/code&gt; looks at the execution history of the machine and performs a dynamic JIT-style optimization. Now suppose that &lt;code&gt;optimize&lt;/code&gt; has a bug only when the execution history is sufficiently long, and there is the right ratio of &lt;code&gt;pop&lt;/code&gt; calls to &lt;code&gt;push&lt;/code&gt; calls; say, 5 to 1:&lt;/p&gt;
&lt;script&gt;
    Figures.figure(() =&gt; {
        // Same swarm distribution as the figure above, so the under-explored grey
        // band reads identically — only now we drop a concrete bug into it.
        const N = Figures.dist.normal(10, 2.5);    // an enabled rule: ~normal
        const SPIKE = Figures.dist.normal(0, 0.5); // a disabled rule: a tight blob at 0
        const COMPONENTS = [
            { xOn: true,  yOn: true  },
            { xOn: true,  yOn: false },
            { xOn: false, yOn: true  },
        ];
        const dens = on =&gt; on ? N : SPIKE;
        const densPeak = on =&gt; on ? N(10) : SPIKE(0);
        const compPeak = c =&gt; densPeak(c.xOn) * densPeak(c.yOn);   // normalise each component to unit peak
        const joint = (a, b) =&gt; d3.sum(COMPONENTS, c =&gt; dens(c.xOn)(a) * dens(c.yOn)(b) / compPeak(c)) / COMPONENTS.length;

        return Figures.JointDensity(
            { joint, xDomain: [0, 20], yDomain: [0, 20] },
            {
                equal: true,
                marginals: false,
                // Concrete example, so the axes name the concrete rules (no Rule subscripts).
                xLabel: '# calls to push',
                yLabel: '# calls to pop',
                // The under-explored grey band, with the concrete optimize bug (the
                // trapezoidal 5:1-ratio wedge) dropped into it.
                decorate({ svg, x, y }) {
                    swarmUnexploredRegion(svg, x, y);
                    swarmOptimizeBugRegion(svg, x, y);
                },
            }
        );
    });
&lt;/script&gt;

&lt;p&gt;This bug has two conditions: that &lt;code&gt;push&lt;/code&gt; and &lt;code&gt;pop&lt;/code&gt; have the right ratio, and that both rules are enabled. It is therefore unlikely to be caught by either the original testing strategy (which rarely produces the right ratio) or the swarm testing strategy (which either fully disables a rule or leaves both rules running at their original rates).&lt;/p&gt;
&lt;h2 id="a-simple-extension"&gt;A simple extension&lt;/h2&gt;
&lt;p&gt;With this motivating example in mind, I propose a simple extension to swarm testing. Traditionally, each rule is disabled with 50% probability. Instead, I propose that for each test case, each rule &lt;span class="arithmatex"&gt;\(r\)&lt;/span&gt; is assigned an activation probability &lt;span class="arithmatex"&gt;\(r_p \in [0, 1]\)&lt;/span&gt;, sampled uniformly. Then, whenever a rule would normally be run, it is instead skipped with probability &lt;span class="arithmatex"&gt;\(1 - r_p\)&lt;/span&gt;.&lt;/p&gt;
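&lt;p&gt;As a minimal sketch of the idea, assuming a generator that picks one rule uniformly at each step: the names &lt;code&gt;mulberry32&lt;/code&gt;, &lt;code&gt;sampleActivations&lt;/code&gt;, and &lt;code&gt;generateTestCase&lt;/code&gt; are illustrative and don't come from any particular testing library.&lt;/p&gt;

```javascript
// Small seeded PRNG so runs are reproducible; any uniform [0, 1) source would do.
function mulberry32(seed) {
    let a = seed >>> 0;
    return function () {
        let t = a += 0x6D2B79F5;
        t = Math.imul(t ^ t >>> 15, t | 1);
        t ^= t + Math.imul(t ^ t >>> 7, t | 61);
        return ((t ^ t >>> 14) >>> 0) / 4294967296;
    };
}

// Once per test case: draw an activation probability r_p ~ U[0, 1) for every rule.
function sampleActivations(rules, rng) {
    const activation = new Map();
    for (const rule of rules) activation.set(rule, rng());
    return activation;
}

// Build one test case of at most `steps` rule calls. At each step a rule is
// chosen uniformly (an assumption of this sketch), then skipped with
// probability 1 - r_p.
function generateTestCase(rules, steps, rng) {
    const activation = sampleActivations(rules, rng);
    const calls = [];
    for (let step = 0; step !== steps; step++) {
        const rule = rules[Math.floor(rng() * rules.length)];
        if (activation.get(rule) > rng()) calls.push(rule);   // runs with probability r_p
    }
    return { activation, calls };
}

const example = generateTestCase(['push', 'pop', 'optimize'], 60, mulberry32(1));
console.log(Object.fromEntries(example.activation), example.calls.length);
```

&lt;p&gt;Restricting every activation probability to exactly 0 or 1 recovers classic swarm testing, so this is a generalization of swarm testing and not a replacement for it.&lt;/p&gt;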
&lt;p&gt;It might be helpful to play around with the figure below to see why this algorithm covers the previously rare regions:&lt;/p&gt;
&lt;script&gt;
    Figures.figure(() =&gt; {
        // "Pick one activation config" — an empty joint panel that, on hover, shows the
        // distribution a single (r_push, r_pop) draw induces: the cursor is the centre of
        // the component, so r_push = push/20, r_pop = pop/20 (a fully-enabled rule peaks
        // at 20 calls — RULES/3 — matching the build-up figure below). The two marginal
        // strips show each rule's call-count distribution; a faint contour blob previews
        // the joint the build-up figure stamps down. (Article-local &amp; interactive.)
        const DOM = 20;
        const RULES = 60;                               // total rules per test case (matches the build-up figure)
        const MEAN = p =&gt; (RULES / 3) * p;              // activation p -&gt; expected calls (fully-enabled peaks at 20)
        const SIGMA_FLOOR = 0.5;
        const SIGMA = p =&gt; Math.max(Math.sqrt(RULES * (p / 3) * (1 - p / 3)), SIGMA_FLOOR);   // sd of the rule-selection (binomial), floored

        const n = 80;                                   // density grid resolution
        const g = i =&gt; (i / (n - 1)) * DOM;             // grid index -&gt; data coord

        // layout identical to the build-up figure below, so the two stack consistently
        const width = 460, height = 420;
        const margin = { top: 12, right: 12, bottom: 48, left: 52 };
        const marginal = 48;                            // thickness of each marginal strip
        let PL = margin.left, PR = width - margin.right - marginal;
        let PT = margin.top + marginal, PB = height - margin.bottom;
        const side = Math.min(PR - PL, PB - PT);        // square panel, centred in the slack
        PL += (PR - PL - side) / 2; PR = PL + side;
        PT += (PB - PT - side) / 2; PB = PT + side;
        const X = d3.scaleLinear().domain([0, DOM]).range([PL, PR]);
        const Y = d3.scaleLinear().domain([0, DOM]).range([PB, PT]);
        const TOP = { base: PT, lo: PT - marginal };    // x-marginal strip, above the panel
        const RIGHT = { base: PR, hi: PR + marginal };  // y-marginal strip, right of the panel
        const levels = 6;

        // One activation config as a separable 2D Gaussian centred at (mx,my): its
        // per-axis pdfs (also the marginals) plus the n×n joint grid for the contour.
        function componentGrid(mx, my, sx, sy) {
            const colX = new Float64Array(n), colY = new Float64Array(n);
            for (let i = 0; i &lt; n; i++) {
                colX[i] = Math.exp(-0.5 * ((g(i) - mx) / sx) ** 2) / (sx * Math.sqrt(2 * Math.PI));
                colY[i] = Math.exp(-0.5 * ((g(i) - my) / sy) ** 2) / (sy * Math.sqrt(2 * Math.PI));
            }
            const vals = new Array(n * n);
            for (let j = 0; j &lt; n; j++)
                for (let i = 0; i &lt; n; i++) vals[j * n + i] = colX[i] * colY[j];
            return { vals, colX, colY };
        }

        // contours + marginals, same rust styling as the build-up figure
        function drawContours(group, vals) {
            const max = d3.max(vals) || 1;
            const contours = d3.contours().size([n, n])
                .thresholds(d3.range(1, levels + 1).map(k =&gt; (k / (levels + 1)) * max))(vals);
            const project = d3.geoTransform({ point(cx, cy) { this.stream.point(X(g(cx)), Y(g(cy))); } });
            group.selectAll('path').data(contours).join('path')
                .attr('class', 'swarm__contour')
                .attr('fill-opacity', (d, i) =&gt; 0.10 + 0.50 * i / (levels - 1))
                .attr('d', d3.geoPath(project));
        }
        function topMarginal(group, pts) {
            const h = d3.scaleLinear().domain([0, d3.max(pts, d =&gt; d[1]) || 1]).range([TOP.base, TOP.lo]);
            group.append('path').attr('class', 'density')
                .attr('d', d3.area().curve(d3.curveBasis).x(d =&gt; X(d[0])).y0(TOP.base).y1(d =&gt; h(d[1]))(pts));
            group.append('path').attr('class', 'density__line')
                .attr('d', d3.line().curve(d3.curveBasis).x(d =&gt; X(d[0])).y(d =&gt; h(d[1]))(pts));
        }
        function rightMarginal(group, pts) {
            const w = d3.scaleLinear().domain([0, d3.max(pts, d =&gt; d[1]) || 1]).range([RIGHT.base, RIGHT.hi]);
            group.append('path').attr('class', 'density')
                .attr('d', d3.area().curve(d3.curveBasis).y(d =&gt; Y(d[0])).x0(RIGHT.base).x1(d =&gt; w(d[1]))(pts));
            group.append('path').attr('class', 'density__line')
                .attr('d', d3.line().curve(d3.curveBasis).y(d =&gt; Y(d[0])).x(d =&gt; w(d[1]))(pts));
        }

        // static chart: grid + axes + labels, plus an (initially empty) hover layer
        const svg = d3.create('svg').attr('viewBox', [0, 0, width, height]).attr('class', 'figure__chart')
            .style('touch-action', 'none');   // claim tap-drag for the panel; on the root &lt;svg&gt; (WebKit ignores it on inner SVG els)
        svg.append('g').attr('class', 'grid').attr('transform', `translate(${PL},0)`)
            .call(d3.axisLeft(Y).ticks(5).tickSize(-(PR - PL)).tickFormat(''));
        svg.append('g').attr('class', 'grid').attr('transform', `translate(0,${PB})`)
            .call(d3.axisBottom(X).ticks(5).tickSize(-(PB - PT)).tickFormat(''));
        const hoverG = svg.append('g').attr('class', 'swarm-hover');
        svg.append('g').attr('class', 'axis').attr('transform', `translate(0,${PB})`)
            .call(d3.axisBottom(X).ticks(5).tickSizeOuter(0));
        svg.append('g').attr('class', 'axis').attr('transform', `translate(${PL},0)`)
            .call(d3.axisLeft(Y).ticks(5).tickSizeOuter(0));
        svg.append('text').attr('class', 'axis-label')
            .attr('x', (PL + PR) / 2).attr('y', height - 8).text('# calls to push');
        svg.append('text').attr('class', 'axis-label')
            .attr('transform', `translate(18,${(PT + PB) / 2}) rotate(-90)`).text('# calls to pop');

        // the same under-explored band + optimize bug as the figures below, for context
        swarmUnexploredRegion(svg, X, Y);
        swarmOptimizeBugRegion(svg, X, Y);

        // Idle cue: a solid dot at panel centre that pings outward and fades (two
        // staggered pings), so a blank panel reads as interactive rather than broken.
        // JS-driven for the ease-out expansion, opacity fade, and staggered cadence;
        // removed for good on the first hover/tap. CSS owns only the rust fill.
        const cueCx = (PL + PR) / 2, cueCy = (PT + PB) / 2;
        const cueG = svg.append('g').attr('class', 'swarm-hover__cue');
        const cuePings = d3.range(2).map(() =&gt; cueG.append('circle')
            .attr('class', 'swarm-hover__ping').attr('cx', cueCx).attr('cy', cueCy));
        cueG.append('circle').attr('class', 'swarm-hover__core').attr('cx', cueCx).attr('cy', cueCy).attr('r', 2.5);
        cueG.append('text').attr('class', 'swarm-hover__cue-label').attr('x', cueCx).attr('y', cueCy - 40).text('hover');
        const CUE = { pingR: 3.75, maxScale: 9.1, period: 5200, startOp: 0.55, stagger: 0.55 };
        const cueEase = t =&gt; 1 - (1 - t) * (1 - t);   // ease-out: shoots out, slows
        const cueTimer = d3.timer(elapsed =&gt; {
            cuePings.forEach((p, k) =&gt; {
                const t = (((elapsed / CUE.period) - k * CUE.stagger) % 1 + 1) % 1;   // ping k starts k*stagger into the cycle
                p.attr('r', CUE.pingR * (1 + (CUE.maxScale - 1) * cueEase(t)))
                    .attr('opacity', CUE.startOp * (1 - t));                          // fades to 0 by cycle end → invisible reset
            });
        });
        let engaged = false;
        function engage() {
            if (engaged) return;
            engaged = true;
            cueTimer.stop();
            cueG.remove();
        }

        // The labels stay put; only the two numbers blank out when not hovering. The
        // value spans always hold a 4-char tabular number ("0.00"), so hiding just their
        // visibility leaves a blank gap of exactly the right width — nothing reflows.
        const readout = document.createElement('div');
        readout.className = 'swarm-hover__readout';
        readout.innerHTML =
            `push&lt;span class="subscript"&gt;p&lt;/span&gt; = &lt;span class="val" data-v="push"&gt;0.00&lt;/span&gt;`
            + `&amp;nbsp;&amp;nbsp;&amp;nbsp; pop&lt;span class="subscript"&gt;p&lt;/span&gt; = &lt;span class="val" data-v="pop"&gt;0.00&lt;/span&gt;`;
        const valPush = readout.querySelector('[data-v="push"]');
        const valPop = readout.querySelector('[data-v="pop"]');
        function setVals(rPush, rPop, visible) {
            valPush.textContent = rPush.toFixed(2);
            valPop.textContent = rPop.toFixed(2);
            valPush.style.visibility = valPop.style.visibility = visible ? 'visible' : 'hidden';
        }
        setVals(0, 0, false);   // labels shown, numbers blank but width-reserved

        function clear() {
            hoverG.selectAll('*').remove();
            setVals(0, 0, false);
        }
        function update(px, py) {
            const cx = Math.max(PL, Math.min(PR, px)), cy = Math.max(PT, Math.min(PB, py));
            const dataX = X.invert(cx), dataY = Y.invert(cy);
            const rPush = dataX / DOM, rPop = dataY / DOM;             // MEAN(p) = 20p  =&gt;  p = mean / 20
            const { vals, colX, colY } = componentGrid(dataX, dataY, SIGMA(rPush), SIGMA(rPop));

            hoverG.selectAll('*').remove();
            drawContours(hoverG.append('g'), vals);
            topMarginal(hoverG.append('g'), d3.range(n).map(i =&gt; [g(i), colX[i]]));
            rightMarginal(hoverG.append('g'), d3.range(n).map(i =&gt; [g(i), colY[i]]));
            hoverG.append('line').attr('class', 'swarm-hover__guide')   // drop to x-axis
                .attr('x1', cx).attr('y1', cy).attr('x2', cx).attr('y2', PB);
            hoverG.append('line').attr('class', 'swarm-hover__guide')   // across to y-axis
                .attr('x1', cx).attr('y1', cy).attr('x2', PL).attr('y2', cy);
            hoverG.append('circle').attr('class', 'swarm-hover__dot').attr('cx', cx).attr('cy', cy).attr('r', 3);

            setVals(rPush, rPop, true);
        }

        // hover (mouse) + tap/drag (touch) over the panel; touch pins the last config
        svg.append('rect')
            .attr('x', PL).attr('y', PT).attr('width', PR - PL).attr('height', PB - PT)
            .attr('fill', 'transparent').style('cursor', 'crosshair')
            .on('pointermove pointerdown', function (e) { engage(); const [px, py] = d3.pointer(e); update(px, py); })
            .on('pointerleave', function (e) { if (e.pointerType === 'mouse') clear(); });

        const wrap = document.createElement('div');
        wrap.appendChild(svg.node());
        wrap.appendChild(readout);
        return wrap;
    });
&lt;/script&gt;

&lt;p&gt;Conceptually, we are letting the distribution "roam around" our graph uniformly. Because we're uniformly sampling push&lt;span class="subscript"&gt;p&lt;/span&gt; &lt;span class="arithmatex"&gt;\(\in [0, 1]\)&lt;/span&gt; and pop&lt;span class="subscript"&gt;p&lt;/span&gt; &lt;span class="arithmatex"&gt;\(\in [0, 1]\)&lt;/span&gt;, we're equally likely to get a distribution centered on any point in the space of &lt;code&gt;# calls to push&lt;/code&gt; vs &lt;code&gt;# calls to pop&lt;/code&gt;.&lt;/p&gt;
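&lt;p&gt;To spell out the arithmetic behind these figures (this is the simplified model the figures assume, not a general result): a test case has &lt;span class="arithmatex"&gt;\(n\)&lt;/span&gt; steps, each step selects a given rule &lt;span class="arithmatex"&gt;\(r\)&lt;/span&gt; with probability &lt;span class="arithmatex"&gt;\(1/3\)&lt;/span&gt;, and a selected rule then runs with probability &lt;span class="arithmatex"&gt;\(r_p\)&lt;/span&gt;. The number of calls &lt;span class="arithmatex"&gt;\(C_r\)&lt;/span&gt; to that rule is then binomial:&lt;/p&gt;
&lt;div class="arithmatex"&gt;\[C_r \sim \mathrm{Binomial}\left(n, \tfrac{r_p}{3}\right), \qquad \mathbb{E}[C_r] = \frac{n \, r_p}{3}, \qquad \sigma_{C_r} = \sqrt{n \cdot \tfrac{r_p}{3}\left(1 - \tfrac{r_p}{3}\right)}\]&lt;/div&gt;
&lt;p&gt;With &lt;span class="arithmatex"&gt;\(n = 60\)&lt;/span&gt;, the expected count is &lt;span class="arithmatex"&gt;\(20 \, r_p\)&lt;/span&gt;, so sampling &lt;span class="arithmatex"&gt;\(r_p\)&lt;/span&gt; uniformly spreads the center of each rule's distribution uniformly over 0 to 20 calls.&lt;/p&gt;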
&lt;p&gt;Here, you can see that exploration in practice:&lt;/p&gt;
&lt;script&gt;
    Figures.figure(() =&gt; {
        // The continuous-p swarm distribution, built up by Monte Carlo. Each click
        // samples one activation config (push_p, pop_p) ~ U[0,1] and adds its exact
        // density — a Gaussian centered at (20*push_p, 20*pop_p) — to an accumulator
        // grid. The rendered contours are the running sum; in the limit they converge
        // to the true swarm distribution: concentrated in the lower-left and fading
        // toward the far corner, flooding the under-explored band. (Article-local &amp;
        // interactive, so it's a bespoke inline figure rather than a figures.js chart.)
        //
        // We size the test case so a fully-enabled rule averages 20 calls (a 60-rule
        // test), rather than 10 (30 rules). Uniform r_p only ever *reduces* call counts,
        // so with the shorter test the cloud would top out near the middle of the panel;
        // lengthening it lets the same distribution fill the [0,20]^2 window the
        // always-on figures used — the shape is exact, just scaled to the same canvas.
        const DOM = 20;
        const RULES = 60;                               // total rules run in a test case (vs 30 always-on); see above
        const MEAN = p =&gt; (RULES / 3) * p;              // activation p -&gt; expected calls (a fully-enabled rule peaks at 20)
        const SIGMA_FLOOR = 0.5;                        // keep the near-zero-p spike wider than a grid cell
        const SIGMA = p =&gt; Math.max(Math.sqrt(RULES * (p / 3) * (1 - p / 3)), SIGMA_FLOOR);   // sd of the rule-selection (binomial), floored

        const n = 80;                                   // density grid resolution
        const acc = new Float64Array(n * n);            // accumulated (summed) mixture density, row-major
        let N = 0;
        const g = i =&gt; (i / (n - 1)) * DOM;             // grid index -&gt; data coord

        // One activation config as a separable 2D Gaussian, given as its per-axis
        // column weights (each a proper pdf, so every config carries equal *mass*).
        // Used both to accumulate into `acc` and to draw the just-added component alone.
        function component() {
            const p1 = Math.random(), p2 = Math.random();
            const mx = MEAN(p1), my = MEAN(p2);
            const sx = SIGMA(p1), sy = SIGMA(p2);
            const colX = new Float64Array(n), colY = new Float64Array(n);
            for (let i = 0; i &lt; n; i++) {
                colX[i] = Math.exp(-0.5 * ((g(i) - mx) / sx) ** 2) / (sx * Math.sqrt(2 * Math.PI));
                colY[i] = Math.exp(-0.5 * ((g(i) - my) / sy) ** 2) / (sy * Math.sqrt(2 * Math.PI));
            }
            return { colX, colY };
        }

        function addSample() {
            const { colX, colY } = component();
            for (let j = 0; j &lt; n; j++)
                for (let i = 0; i &lt; n; i++) acc[j * n + i] += colX[i] * colY[j];
            N++;
            return { colX, colY };                      // hand the new component back so it can be flashed
        }

        // square data panel flanked by top (x) and right (y) marginal strips, matching
        // the other joint figures. PL/PR/PT/PB are the panel edges (PT is the top — the
        // smaller pixel y); the strips hug the panel along its top and right.
        const width = 460, height = 420;
        const margin = { top: 12, right: 12, bottom: 48, left: 52 };
        const marginal = 48;                            // thickness of each marginal strip
        let PL = margin.left, PR = width - margin.right - marginal;
        let PT = margin.top + marginal, PB = height - margin.bottom;
        const side = Math.min(PR - PL, PB - PT);        // square panel (both domains are [0,20]), centred in the slack
        PL += (PR - PL - side) / 2; PR = PL + side;
        PT += (PB - PT - side) / 2; PB = PT + side;
        const X = d3.scaleLinear().domain([0, DOM]).range([PL, PR]);
        const Y = d3.scaleLinear().domain([0, DOM]).range([PB, PT]);
        const TOP = { base: PT, lo: PT - marginal };    // x-marginal strip, above the panel
        const RIGHT = { base: PR, hi: PR + marginal };  // y-marginal strip, right of the panel
        const levels = 6;

        // Filled contour bands of an n×n density grid, drawn into `group` with the
        // shared rust .swarm__contour style (faint outer → dense inner), thresholds
        // relative to the grid's own peak.
        function drawContours(group, vals) {
            const max = d3.max(vals) || 1;
            const contours = d3.contours().size([n, n])
                .thresholds(d3.range(1, levels + 1).map(k =&gt; (k / (levels + 1)) * max))(vals);
            const project = d3.geoTransform({ point(cx, cy) { this.stream.point(X(g(cx)), Y(g(cy))); } });
            group.selectAll('path').data(contours).join('path')
                .attr('class', 'swarm__contour')
                .attr('fill-opacity', (d, i) =&gt; 0.10 + 0.50 * i / (levels - 1))
                .attr('d', d3.geoPath(project));
        }

        // The two marginal strips: each axis's density collapsed from a grid (a Riemann
        // sum over the other variable), scaled to its own peak so the curve fills the
        // strip — shape is what matters as N grows. Same rust .density / .density__line
        // style as the other joint figures.
        function topMarginal(group, pts) {
            const h = d3.scaleLinear().domain([0, d3.max(pts, d =&gt; d[1]) || 1]).range([TOP.base, TOP.lo]);
            group.append('path').attr('class', 'density')
                .attr('d', d3.area().curve(d3.curveBasis).x(d =&gt; X(d[0])).y0(TOP.base).y1(d =&gt; h(d[1]))(pts));
            group.append('path').attr('class', 'density__line')
                .attr('d', d3.line().curve(d3.curveBasis).x(d =&gt; X(d[0])).y(d =&gt; h(d[1]))(pts));
        }
        function rightMarginal(group, pts) {
            const w = d3.scaleLinear().domain([0, d3.max(pts, d =&gt; d[1]) || 1]).range([RIGHT.base, RIGHT.hi]);
            group.append('path').attr('class', 'density')
                .attr('d', d3.area().curve(d3.curveBasis).y(d =&gt; Y(d[0])).x0(RIGHT.base).x1(d =&gt; w(d[1]))(pts));
            group.append('path').attr('class', 'density__line')
                .attr('d', d3.line().curve(d3.curveBasis).y(d =&gt; Y(d[0])).x(d =&gt; w(d[1]))(pts));
        }

        // A whole density — centre contours + both marginals — drawn into one group.
        // Shared by the running accumulation and the single-component flash, so both
        // read identically (centre and margins always agree).
        function drawDensity(group, vals) {
            drawContours(group.append('g'), vals);
            const mx = d3.range(n).map(i =&gt; { let s = 0; for (let j = 0; j &lt; n; j++) s += vals[j * n + i]; return [g(i), s]; });
            const my = d3.range(n).map(j =&gt; { let s = 0; for (let i = 0; i &lt; n; i++) s += vals[j * n + i]; return [g(j), s]; });
            topMarginal(group.append('g'), mx);
            rightMarginal(group.append('g'), my);
        }

        function render() {
            const svg = d3.create('svg').attr('viewBox', [0, 0, width, height]).attr('class', 'figure__chart');

            svg.append('g').attr('class', 'grid').attr('transform', `translate(${PL},0)`)
                .call(d3.axisLeft(Y).ticks(5).tickSize(-(PR - PL)).tickFormat(''));
            svg.append('g').attr('class', 'grid').attr('transform', `translate(0,${PB})`)
                .call(d3.axisBottom(X).ticks(5).tickSize(-(PB - PT)).tickFormat(''));

            if (N &gt; 0) drawDensity(svg.append('g'), Array.from(acc));   // the running sum: contours + marginals

            svg.append('g').attr('class', 'axis').attr('transform', `translate(0,${PB})`)
                .call(d3.axisBottom(X).ticks(5).tickSizeOuter(0));
            svg.append('g').attr('class', 'axis').attr('transform', `translate(${PL},0)`)
                .call(d3.axisLeft(Y).ticks(5).tickSizeOuter(0));
            svg.append('text').attr('class', 'axis-label')
                .attr('x', (PL + PR) / 2).attr('y', height - 8).text('# calls to push');
            svg.append('text').attr('class', 'axis-label')
                .attr('transform', `translate(18,${(PT + PB) / 2}) rotate(-90)`).text('# calls to pop');

            // the under-explored band, with the optimize bug (trapezoidal wedge) the
            // continuous-p swarm now floods into — same regions as the figure above
            swarmUnexploredRegion(svg, X, Y);
            swarmOptimizeBugRegion(svg, X, Y);
            return svg.node();
        }

        const FLASH_MS = 600;     // fade-out duration of a single flashed component
        const FLASH_STAGGER_MS = 0;   // delay between successive flashes in a multi-add cascade

        // Flash the just-added config as its own independent density — centre contours
        // and both marginals, rendered exactly like the accumulation (it's already
        // folded into the running sum, so this briefly doubles its region) — then faded
        // out by opacity alone, so you watch the new component settle into the total
        // without any motion. `delay` staggers a cascade when several are added at once.
        function flashComponent(svgNode, c, delay = 0) {
            const vals = new Array(n * n);
            for (let j = 0; j &lt; n; j++)
                for (let i = 0; i &lt; n; i++) vals[j * n + i] = c.colX[i] * c.colY[j];
            const grp = d3.select(svgNode).append('g');
            drawDensity(grp, vals);
            grp.style('opacity', 1).transition().delay(delay).duration(FLASH_MS).ease(d3.easeQuadIn)
                .style('opacity', 0)
                .on('end', function () { d3.select(this).remove(); });
        }

        // chart + controls, wired to re-render on each click
        const wrap = document.createElement('div');
        let node = render();
        wrap.appendChild(node);

        const count = document.createElement('div');
        count.className = 'swarm-build__count';
        function refresh(flashes) {
            const next = render();
            wrap.replaceChild(next, node);
            node = next;
            count.textContent = `${N} test case${N === 1 ? '' : 's'}`;
            (flashes || []).forEach((c, i) =&gt; flashComponent(next, c, i * FLASH_STAGGER_MS));   // staggered cascade for multi-adds
        }
        function button(label, onClick) {
            const b = document.createElement('div');
            b.className = 'swarm-build__btn';
            b.setAttribute('role', 'button');
            b.setAttribute('tabindex', '0');
            b.textContent = label;
            b.addEventListener('click', onClick);
            b.addEventListener('keydown', e =&gt; { if (e.key === 'Enter' || e.key === ' ') { e.preventDefault(); onClick(); } });
            return b;
        }
        const controls = document.createElement('div');
        controls.className = 'swarm-build__controls';
        controls.appendChild(button('×1', () =&gt; refresh([addSample()])));
        controls.appendChild(button('×5', () =&gt; refresh(d3.range(5).map(() =&gt; addSample()))));
        controls.appendChild(button('×100', () =&gt; { for (let i = 0; i &lt; 100; i++) addSample(); refresh(); }));
        controls.appendChild(button('×1000', () =&gt; { for (let i = 0; i &lt; 1000; i++) addSample(); refresh(); }));
        const controlsBreak = document.createElement('div');   // wraps count + reset to a new line on mobile
        controlsBreak.className = 'swarm-build__break';
        controls.appendChild(controlsBreak);
        controls.appendChild(count);   // sample count sits before reset, setting reset apart
        controls.appendChild(button('reset', () =&gt; { acc.fill(0); N = 0; refresh(); }));
        wrap.appendChild(controls);
        count.textContent = `${N} test cases`;

        return wrap;
    });
&lt;/script&gt;

&lt;p&gt;This testing strategy will easily find both the new &lt;code&gt;optimize&lt;/code&gt; bug and our original &lt;code&gt;push&lt;/code&gt; / &lt;code&gt;pop&lt;/code&gt; bug. I view it as a straightforward improvement on swarm testing.&lt;/p&gt;
&lt;h2 id="one-step-further"&gt;One step further&lt;/h2&gt;
&lt;p&gt;I'll conclude with a teaser. Above, I said the activation probabilities are sampled from a uniform distribution on &lt;span class="arithmatex"&gt;\([0, 1]\)&lt;/span&gt;. Let's consider a program with more features; say, 10. Now suppose this program has a bug only in some particular configuration of relative feature probabilities. For example, the bug might require one set of three features to be half as common as another set of three. Uniformly sampling the activation probabilities is very unlikely to produce this configuration, and so we will miss this bug.&lt;/p&gt;
&lt;p&gt;We want a distribution of activation probabilities that is likely to produce this configuration. Not only that, we want a distribution of activation probabilities that is also likely to produce any other possible bug-inducing configuration: feature A half as likely as B half as likely as C; A ten times as likely as all other features; feature probabilities distributed according to some power law; and many others besides.&lt;/p&gt;
&lt;p&gt;This implies the distribution of activation probabilities &lt;em&gt;should itself be randomly sampled from the space of distributions&lt;/em&gt;.&lt;/p&gt;
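&lt;p&gt;As a rough sketch of what a single extra level might look like: first sample a shape parameter for the test case, then sample each feature's activation probability from the distribution that shape defines. The family used here (a uniform draw raised to a power &lt;code&gt;k&lt;/code&gt;), the range of &lt;code&gt;k&lt;/code&gt;, and the function names are all arbitrary choices made for illustration.&lt;/p&gt;

```javascript
// Level 1: pick a distribution. k is log-uniform on [1/8, 8): k near 1 is plain
// uniform, large k pushes probabilities toward 0, small k pushes them toward 1.
function sampleShape(rng) {
    return Math.pow(2, 6 * rng() - 3);
}

// Level 2: pick each feature's activation probability from that distribution.
function sampleActivationsHierarchical(features, rng) {
    const k = sampleShape(rng);                        // one shape per test case
    const activation = new Map();
    for (const feature of features) activation.set(feature, Math.pow(rng(), k));
    return { k, activation };
}

const features = Array.from({ length: 10 }, (_, i) => 'feature' + i);
const sample = sampleActivationsHierarchical(features, Math.random);
console.log(sample.k.toFixed(2), [...sample.activation.values()].map(p => p.toFixed(2)));
```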
&lt;p&gt;It's swarms all the way down.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Thanks to &lt;a href="https://zhd.dev/"&gt;Zac&lt;/a&gt; for bouncing swarm testing ideas around with me.&lt;/em&gt;&lt;/p&gt;</content><category term="articles"/><category term="coding"/></entry><entry><title>My agent management software</title><link href="https://tybug.dev/plait" rel="alternate"/><published>2026-04-29T00:00:00-04:00</published><updated>2026-04-29T00:00:00-04:00</updated><author><name/></author><id>tag:tybug.dev,2026-04-29:/plait</id><summary type="html">&lt;p&gt;I write a lot of code. Or rather, I &lt;em&gt;used&lt;/em&gt; to write a lot of code. After Claude Opus ~4.5, it's now more accurate to say that I review and design a lot of code.&lt;/p&gt;
&lt;p&gt;Around the release of Opus 4.5 was also when I started working on …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I write a lot of code. Or rather, I &lt;em&gt;used&lt;/em&gt; to write a lot of code. After Claude Opus ~4.5, it's now more accurate to say that I review and design a lot of code.&lt;/p&gt;
&lt;p&gt;Around the release of Opus 4.5 was also when I started working on &lt;a href="https://hegel.dev/"&gt;Hegel&lt;/a&gt;. As a greenfield project spanning multiple repositories, my work on Hegel surfaced pain points I don't normally encounter when working on Hypothesis or other projects, such as managing the frequent small PRs and merge conflicts that come with a young, active codebase.&lt;/p&gt;
&lt;p&gt;Thanks to some combination of these two factors, I found myself settling on a wishlist for tooling around my development flow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I now context switch—a lot. I'm writing a feature spec one moment, bouncing design ideas off an agent the next, before getting pulled away to review a third agent's work. All while waiting for a long-running research or implementation agent in the background. I need something that manages my various task states, so I always feel that I can walk away and come back later.&lt;/li&gt;
&lt;li&gt;Coordinating a change across multiple repositories requires a context switch to manage their branches, PRs, and GitHub interlinks. It shouldn't have to. I want to say what I want once, across all repositories, and let the agents get the git details right.&lt;/li&gt;
&lt;li&gt;I never want to manually resolve a merge conflict again. The agents are here. We have the technology.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And, well—seeing as coding agents have made personalized tooling cheap (but &lt;em&gt;not&lt;/em&gt; free, despite some claims to the contrary!), I figured I'd spend a week building exactly such a tool.&lt;/p&gt;
&lt;h1 id="plait"&gt;Plait&lt;/h1&gt;
&lt;p&gt;Here's Plait, my agent management software&lt;sup id="fnref:1"&gt;&lt;a class="footnote-ref" href="#fn:1"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;span class="sidenote"&gt;&lt;sup&gt;1&lt;/sup&gt; Heavily vibecoded, but not entirely. I gave detailed guidance on all the UI and the actual semantics, and on several gritty technical decisions.&amp;#160;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;img alt="" src="/images/plait_homepage.png"&gt;&lt;/p&gt;
&lt;p&gt;The unit of work in a repository is a &lt;em&gt;worktop&lt;/em&gt;. Each worktop has a git worktree, and has a nullable 1:1 correspondence with a pull request. That is, you can think of a worktop as scoped to the same unit of work as a PR, except that it may not have an associated PR yet.&lt;/p&gt;
&lt;p&gt;A worktop can contain multiple Claude sessions:&lt;/p&gt;
&lt;p&gt;&lt;img alt="" src="/images/plait_worktop.png"&gt;&lt;/p&gt;
&lt;p&gt;Claude sessions are standard &lt;code&gt;claude&lt;/code&gt; processes. Claude Code persists sessions to disk automatically, and Plait resumes them on demand with &lt;code&gt;claude --resume &amp;lt;session_id&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;My most used workflow is to open a new worktop and talk with its session, eventually telling it to PR its changes. Many worktops only need this single session. Others, especially more involved features, benefit from the advanced context management you get with multiple sessions.&lt;/p&gt;
&lt;p&gt;In the background, every 5 minutes, Plait kicks off a daemon process. This daemon checks for state changes in any worktops with associated pull requests. Is there a merge conflict? Has the CI turned from green to red? Are there new PR comments or reactions?&lt;/p&gt;
&lt;p&gt;If so, the daemon starts a &lt;em&gt;tend&lt;/em&gt; session. This is a Claude session with instructions to resolve the merge conflict, fix the CI if the failure was caused by our changes, and resolve any comments addressed to it. Tend sessions are saved for each worktop in case I need to inspect them later.&lt;sup id="fnref:2"&gt;&lt;a class="footnote-ref" href="#fn:2"&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;2&lt;/sup&gt; Useful for debugging why a tend session didn't respect some part of its system prompt, for example.&amp;#160;&lt;/span&gt;&lt;/p&gt;
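&lt;p&gt;The state-change check can be pictured roughly like this. It is a sketch only: the snapshot fields &lt;code&gt;hasConflict&lt;/code&gt;, &lt;code&gt;ciStatus&lt;/code&gt;, and &lt;code&gt;commentCount&lt;/code&gt; and the function &lt;code&gt;tendReasons&lt;/code&gt; are hypothetical names, not Plait's actual code.&lt;/p&gt;

```javascript
// Compare the previous and current snapshot of a worktop's pull request and
// list the reasons (if any) to start a tend session. Field names are hypothetical.
function tendReasons(previous, current) {
    const reasons = [];
    if (current.hasConflict) {
        if (!previous.hasConflict) reasons.push('merge conflict');
    }
    if (previous.ciStatus === 'green') {
        if (current.ciStatus === 'red') reasons.push('ci went red');
    }
    if (current.commentCount > previous.commentCount) reasons.push('new comments');
    return reasons;   // non-empty: start a tend session for this worktop
}

const before = { hasConflict: false, ciStatus: 'green', commentCount: 2 };
const after = { hasConflict: true, ciStatus: 'red', commentCount: 2 };
console.log(tendReasons(before, after));
```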
&lt;p&gt;Finally, Plait has a higher-order notion called a &lt;em&gt;slate&lt;/em&gt;. A slate orchestrates multiple worktops, potentially across repositories.&lt;/p&gt;
&lt;p&gt;&lt;img alt="" src="/images/plait_slate.png"&gt;&lt;/p&gt;
&lt;p&gt;I start a slate whenever a change touches more than one repository. I talk with the slate's session until I'm confident it has enough context to spawn sessions whose instructions I won't need to immediately revise. The slate then creates the appropriate worktops, spawning a session in each with instructions to implement its portion of the feature.&lt;/p&gt;
&lt;p&gt;From here, I have two options. I can either dip down to a specific worktop to manually manage its sessions. Or, if I realize I need to make a cross-repo adjustment, I can tell that to the slate session, and have it spawn and manage the worktop sessions for me.&lt;/p&gt;
&lt;p&gt;As an escape hatch to the underlying tools, I can always click &lt;code&gt;VS Code&lt;/code&gt; on a worktop to open a VS Code window at that worktree. And I can click &lt;code&gt;VS Code&lt;/code&gt; on a Claude session to open the same, additionally with a terminal window opened to that Claude session.&lt;sup id="fnref:3"&gt;&lt;a class="footnote-ref" href="#fn:3"&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;3&lt;/sup&gt; I don't need these often, but when I do, I &lt;em&gt;really&lt;/em&gt; need them.&amp;#160;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/Liam-DeVoe/plait"&gt;Plait is open-source here&lt;/a&gt;. I make no guarantees of support or stability. In fact, I almost guarantee it &lt;em&gt;won't&lt;/em&gt; work for you!&lt;/p&gt;
&lt;p&gt;To be clear, I fully expect Plait to be obsolete within 12 months. Either because one of the AI labs releases an AI-native GitHub that I feel is as good or better than Plait, or because the AI labs have made substantially more than just this workflow obsolete. For now, I'm enjoying it!&lt;/p&gt;</content><category term="articles"/><category term="coding"/></entry><entry><title>Property-based testing is about to rule the (software) world</title><link href="https://tybug.dev/specs" rel="alternate"/><published>2026-02-11T00:00:00-05:00</published><updated>2026-02-11T00:00:00-05:00</updated><author><name/></author><id>tag:tybug.dev,2026-02-11:/specs</id><summary type="html">&lt;p&gt;&lt;em&gt;And what can we do to prepare?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Many people have strong opinions about the next few years of AI progress. Regardless of yours, I claim that (1) the models will continue to improve for at least another 6 months; and (2) even if that stopped &lt;em&gt;today&lt;/em&gt;, Opus 4.6-tier models …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;em&gt;And what can we do to prepare?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Many people have strong opinions about the next few years of AI progress. Regardless of yours, I claim that (1) the models will continue to improve for at least another 6 months; and (2) even if that stopped &lt;em&gt;today&lt;/em&gt;, Opus 4.6-tier models are already powerful enough to dramatically change how many developers write software.&lt;/p&gt;
&lt;p&gt;I characterize this change as "AI code is treated as a black box". AI-pilled programmers care only about the observable outcome of code, not the implementation. In other words: the only thing that matters anymore &lt;em&gt;is the guarantees on the box&lt;/em&gt;. When I ask the black-box z3 solver for a satisfying assignment, I don't care how it got there, only that the result actually satisfies the formula.&lt;/p&gt;
&lt;p&gt;If we are to embrace AI code as an industry, we will and must adopt better ways to place guarantees on these black boxes. And I think property-based testing will quickly emerge as the forerunner.&lt;sup id="fnref:2"&gt;&lt;a class="footnote-ref" href="#fn:2"&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;1&lt;/sup&gt; At least until we can autonomously formally verify code according to the theorem statement "this code has no bugs". I expect this to be many years away even at current model progress rates.&amp;#160;&lt;/span&gt;&lt;sup id="fnref:3"&gt;&lt;a class="footnote-ref" href="#fn:3"&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;2&lt;/sup&gt; Or fuzzing, if you prefer that framing. I largely see fuzzing and PBT as two views on the identical problem, and think it's unfortunate we don't have more communication between these two worlds.&amp;#160;&lt;/span&gt;&lt;/p&gt;
&lt;h1 id="property-based-testing"&gt;Property-based testing&lt;/h1&gt;
&lt;p&gt;I have always been surprised at how under-adopted property-based testing is. Do companies not care about testing? Is it not mentioned enough in university curriculums? (Yes, but I digress). Has PBT just not permeated the cultural zeitgeist?&lt;/p&gt;
&lt;p&gt;It doesn't really matter. AI is about to provide the forcing function for PBT to become a developer household name. Or, to put it another way: &lt;em&gt;PBT is about to get a lot more users&lt;/em&gt;.&lt;/p&gt;
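&lt;p&gt;To make "guarantees on the box" concrete, here is a deliberately tiny property-based test using only the Python standard library. This is a sketch, not the Hypothesis API: it has no shrinking, no database, and a naive generator, and the names &lt;code&gt;black_box_sort&lt;/code&gt;, &lt;code&gt;buggy_sort&lt;/code&gt;, and &lt;code&gt;find_counterexample&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```python
import random
from collections import Counter


def black_box_sort(xs):
    """Stand-in for code we did not read; only its behavior is checked."""
    return sorted(xs)


def buggy_sort(xs):
    """Silently drops duplicates, which a few hand-written examples can miss."""
    return sorted(set(xs))


def is_sorted_permutation(inputs, outputs):
    """The guarantee on the box: same elements, in non-decreasing order."""
    ordered = all(max(a, b) == b for a, b in zip(outputs, outputs[1:]))
    return ordered and Counter(inputs) == Counter(outputs)


def find_counterexample(function, prop, trials=200, seed=0):
    """Try random small integer lists; return the first input violating prop."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        if not prop(xs, function(list(xs))):
            return xs
    return None


print(find_counterexample(black_box_sort, is_sorted_permutation))  # None
print(find_counterexample(buggy_sort, is_sorted_permutation))
```

&lt;p&gt;The test never looks inside either function. It only states what must be true of the output, which is exactly the posture I'm describing for AI-written code.&lt;/p&gt;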
&lt;p&gt;And yet, the PBT ecosystem is underprepared for this influx. In Python, I maintain &lt;a href="https://github.com/hypothesisWorks/hypothesis"&gt;Hypothesis&lt;/a&gt;, which I have no qualms in claiming as the most successful PBT library of all time.&lt;sup id="fnref:1"&gt;&lt;a class="footnote-ref" href="#fn:1"&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;3&lt;/sup&gt; See &lt;a href="https://hypothesis.readthedocs.io/en/latest/usage.html"&gt;https://hypothesis.readthedocs.io/en/latest/usage.html&lt;/a&gt;. For example, &lt;a href="https://lp.jetbrains.com/python-developers-survey-2024/"&gt;4% of 2024 PSF survey respondents report using Hypothesis&lt;/a&gt;.&amp;#160;&lt;/span&gt; Python might well weather this storm.&lt;/p&gt;
&lt;p&gt;But as much as I love Python, it comprises a small percentage of production code. What about other languages? Most do have a PBT library. And, to be clear, many years of development effort have gone into them. But I think even their maintainers will acknowledge most other libraries don't match the breadth and depth of Hypothesis:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://drmaciver.github.io/papers/reduction-via-generation-preview.pdf"&gt;Internal shrinking&lt;/a&gt;, which is &lt;a href="https://github.com/jlink/shrinking-challenge"&gt;consistently world-class&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hypothesis.readthedocs.io/en/latest/extensions.html#alternative-backends"&gt;Pluggable backends&lt;/a&gt;, including &lt;a href="https://github.com/pschanely/hypothesis-crosshair"&gt;z3 integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hypothesis.readthedocs.io/en/latest/reference/integrations.html#observability"&gt;Observability&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hypofuzz.com/"&gt;Coverage-guided fuzzing integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/HypothesisWorks/hypothesis/issues/3921"&gt;A powerful internal test case representation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hypothesis.readthedocs.io/en/latest/stateful.html"&gt;Stateful testing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hypothesis.readthedocs.io/en/latest/tutorial/replaying-failures.html"&gt;Test case database&lt;/a&gt;, for regressions&lt;/li&gt;
&lt;li&gt;Test case deduplication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My point is not to glorify Hypothesis. Even after 11 years of development, there is always more to improve. Rather, the demand for PBT is about to explode, and I don't think any language is prepared for it—maybe not even Python.&lt;/p&gt;
&lt;p&gt;My concrete call to action: as a PBT ecosystem, we need to figure out how to share improvements among all libraries, to consolidate and amplify the best of our development effort. I am not the first to say this, but it has never been more true than today. The open &lt;a href="https://hypothesis.readthedocs.io/en/latest/reference/integrations.html#observability"&gt;PBT observability spec&lt;/a&gt; is designed for any language and is a step in this direction.&lt;/p&gt;
&lt;p&gt;What else can we standardize? Shrinking? The database? The choice sequence? How can we take the best parts of &lt;em&gt;every&lt;/em&gt; library and combine them into one, in preparation for the PBT renaissance?&lt;/p&gt;
&lt;p&gt;If you maintain a PBT library and want to collaborate with Hypothesis on this, &lt;a href="mailto:orionldevoe@gmail.com"&gt;reach out&lt;/a&gt;.&lt;/p&gt;</content><category term="articles"/><category term="coding"/></entry><entry><title>Homebrew catan</title><link href="https://tybug.dev/homebrew-catan" rel="alternate"/><published>2024-08-27T00:00:00-04:00</published><updated>2024-08-27T00:00:00-04:00</updated><author><name/></author><id>tag:tybug.dev,2024-08-27:/homebrew-catan</id><summary type="html">&lt;p&gt;My family's board game of choice is Catan. We've probably played close to 50 games of it in my lifetime. We've experimented with some small homebrew rules before, and more recently I saw &lt;a href="https://robert.ocallahan.org/2024/06/real-time-settlers.html"&gt;real-time Catan&lt;/a&gt;, which we played two games of. Even after two games it was clear to us …&lt;/p&gt;</summary><content type="html">&lt;p&gt;My family's board game of choice is Catan. We've probably played close to 50 games of it in my lifetime. We've experimented with some small homebrew rules before, and more recently I saw &lt;a href="https://robert.ocallahan.org/2024/06/real-time-settlers.html"&gt;real-time Catan&lt;/a&gt;, which we played two games of. Even after two games it was clear to us that real-time Catan is an enormous improvement, and I doubt we'll ever go back to regular Catan again.&lt;/p&gt;
&lt;p&gt;That said, we did find we needed to tweak the rules. Here's our full homebrew ruleset, building off cities and knights + seafarers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Turns have a set time limit. We generally start with 45 seconds a turn, and increase to 60 seconds later in the game if it's clear people need more time for more complex turns.&lt;/li&gt;
&lt;li&gt;You may take any action on anybody's turn, including trading with anyone else.&lt;/li&gt;
&lt;li&gt;The only exception to this is progress cards, which must be played on your turn.&lt;/li&gt;
&lt;li&gt;When a player takes an action that requires a response from another player (e.g. master merchant), pause the timer for all players.&lt;/li&gt;
&lt;li&gt;When a player reaches 13 victory points, the game does not end immediately. Instead there is an (indefinite, but reasonable) rebuttal period for the remainder of the turn where players continue to play.&lt;/li&gt;
&lt;li&gt;If a player still has 13 VPs at the end of the turn, they win.&lt;/li&gt;
&lt;li&gt;If two players are tied for VPs at the end of the turn, play continues until one player is ahead at the end of a turn.&lt;/li&gt;
&lt;li&gt;If any actions conflict, ties are broken by turn order, with the person whose turn it is having priority, and so on continuing clockwise.&lt;/li&gt;
&lt;li&gt;You may declare any progress card you own as tradeable by placing it face up in front of you.&lt;/li&gt;
&lt;li&gt;You can barter with other players using tradeable progress cards as you would any other resource.&lt;/li&gt;
&lt;li&gt;They are still progress cards in every respect. They count toward your progress card limit, they can be stolen by the spy, and you can still play them.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All other rules that interact with turns are still in play: you cannot play a progress card on the same turn you receive it, the player who rolls a 7 moves the robber, etc. The purpose of the rebuttal period is to deter players from waiting until the last second to reach 13 victory points. And the purpose of not immediately ending the game when a player "wins" is to avoid a mad rush to reach 13 victory points before anyone else on a turn! Requiring progress cards to be played on your turn is both to nerf them, as we found they were otherwise too powerful, and to reduce the potential for conflicting actions.&lt;/p&gt;
&lt;p&gt;In my opinion, breaking ties by turn order is more elegant than casually deciding each case at the table, as the original post described. We found conflicting actions to be a large problem – they only happened ~once a game, but could turn the course of the game (such as a wedding played right as someone builds a settlement).&lt;/p&gt;
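&lt;p&gt;For the programmers at the table, the tie-break rule is a short function. This is just a sketch of the rule as stated above; the player names and the &lt;code&gt;resolve_conflict&lt;/code&gt; helper are invented for the example.&lt;/p&gt;

```python
def resolve_conflict(claimants, turn_order, current_player):
    """Return the claimant with priority: the player whose turn it is goes
    first, then priority continues clockwise around the table."""
    start = turn_order.index(current_player)
    clockwise = turn_order[start:] + turn_order[:start]
    for player in clockwise:
        if player in claimants:
            return player
    raise ValueError("no claimant is seated at the table")


table = ["Ana", "Ben", "Cal", "Dee"]  # hypothetical players, listed clockwise
print(resolve_conflict({"Ana", "Dee"}, table, current_player="Cal"))  # Dee
```

&lt;p&gt;Here it is Cal's turn, so priority runs Cal, Dee, Ana, Ben, and Dee's action resolves before Ana's.&lt;/p&gt;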
&lt;p&gt;While we're on the topic of homebrews, we've long been searching for a way to make the green commodity's ability in cities and knights less powerful, but haven't found anything thematically satisfying while not nerfing it into the ground.&lt;/p&gt;
&lt;p&gt;Thanks to Robert O'Callahan for describing the original idea!&lt;/p&gt;</content><category term="articles"/><category term="board games"/></entry><entry><title>Gödel's incompleteness theorem</title><link href="https://tybug.dev/incompleteness" rel="alternate"/><published>2022-03-16T00:00:00-04:00</published><updated>2022-03-16T00:00:00-04:00</updated><author><name/></author><id>tag:tybug.dev,2022-03-16:/incompleteness</id><summary type="html">&lt;p&gt;Ah, Gödel's incompleteness theorem. I won't say it's the most misused theorem in all of mathematics, but I would argue it has the worst ratio of "people who actually understand it" to "people who misapply it".&lt;/p&gt;
&lt;p&gt;Here it is:&lt;/p&gt;
&lt;div class="quote"&gt;For any sufficiently strong theory $T$, there is a sentence $\sigma …&lt;/div&gt;</summary><content type="html">&lt;p&gt;Ah, Gödel's incompleteness theorem. I won't say it's the most misused theorem in all of mathematics, but I would argue it has the worst ratio of "people who actually understand it" to "people who misapply it".&lt;/p&gt;
&lt;p&gt;Here it is:&lt;/p&gt;
&lt;div class="quote"&gt;For any sufficiently strong theory $T$, there is a sentence $\sigma$ which is independent of $T$.&lt;/div&gt;

&lt;p&gt;Before we can unpack it, you need a crash course in model theory. This will be a little bit painful, but I promise it's critically important.&lt;/p&gt;
&lt;h1 id="sentences-and-theories"&gt;Sentences and Theories&lt;/h1&gt;
&lt;p&gt;Here are the axioms of group theory, which you'll find at the beginning of any standard textbook:&lt;/p&gt;
&lt;div class="arithmatex"&gt;\[
\begin{equation}
\begin{aligned}
&amp;amp; \sigma_0: \forall a \forall b \forall c \ (c*(a*b) = (c*a)*b) \\
&amp;amp; \sigma_1: \forall a \ (e*a = a \land a*e = a) \\
&amp;amp; \sigma_2: \forall a \exists b \ (a*b = e \land b*a = e) \\
\end{aligned}
\end{equation}
\]&lt;/div&gt;
&lt;p&gt;(If you're not familiar with group theory, don't worry; the actual content of these axioms is largely irrelevant for us. They state that &lt;span class="arithmatex"&gt;\(*\)&lt;/span&gt; is associative, has an identity &lt;span class="arithmatex"&gt;\(e\)&lt;/span&gt;, and every element has an inverse respectively).&lt;/p&gt;
&lt;p&gt;What your group theory textbook probably didn't tell you is that there's a language &lt;span class="arithmatex"&gt;\(L_{group}\)&lt;/span&gt; associated with group theory. This is the set of special symbols we'd like to be able to refer to in our sentences: &lt;span class="arithmatex"&gt;\(L_{group} = \{e, *\}\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;The three axioms above are examples of sentences. A &lt;strong&gt;sentence&lt;/strong&gt; is a statement in first-order logic which contains only logical symbols (&lt;span class="arithmatex"&gt;\(\lnot\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(\land\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(\lor\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(\implies\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(\iff\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(\forall\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(\exists\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(=\)&lt;/span&gt;), variables bound by a quantifier, and symbols from our language &lt;span class="arithmatex"&gt;\(L\)&lt;/span&gt;. For instance, the following is not an &lt;span class="arithmatex"&gt;\(L_{group}\)&lt;/span&gt;-sentence:&lt;/p&gt;
&lt;div class="arithmatex"&gt;\[\forall a \ (a + a = a)\]&lt;/div&gt;
&lt;p&gt;because &lt;span class="arithmatex"&gt;\(+\)&lt;/span&gt; isn't in &lt;span class="arithmatex"&gt;\(L_{group}\)&lt;/span&gt;. Note that whether a statement is a sentence or not depends on the language, which is why we say &lt;span class="arithmatex"&gt;\(L_{group}\)&lt;/span&gt;-sentence instead of just sentence. When the language is clear from context, we'll drop the &lt;span class="arithmatex"&gt;\(L\)&lt;/span&gt;- prefix and call it a sentence.&lt;/p&gt;
&lt;p&gt;We can bundle these axioms together into an object called a &lt;strong&gt;theory&lt;/strong&gt;. &lt;span class="arithmatex"&gt;\(T_{group} = \{\sigma_0, \sigma_1, \sigma_2\}\)&lt;/span&gt; is the theory of groups. A theory is any set of sentences.&lt;sup id="fnref:1"&gt;&lt;a class="footnote-ref" href="#fn:1"&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;1&lt;/sup&gt; Since theories consist of sentences, and sentences depend on a language, you would be right to suspect that theories also depend on a language. Formally we call a theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; an &lt;span class="arithmatex"&gt;\(L\)&lt;/span&gt;-theory, where &lt;span class="arithmatex"&gt;\(L\)&lt;/span&gt; is the language of the theory. We again drop the &lt;span class="arithmatex"&gt;\(L\)&lt;/span&gt;- prefix when the language is clear from context.&amp;#160;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Here's the incompleteness theorem again:&lt;/p&gt;
&lt;div class="quote"&gt;For any sufficiently strong theory $T$, there is a sentence $\sigma$ which is independent of $T$.&lt;/div&gt;

&lt;p&gt;We've defined &lt;strong&gt;sentence&lt;/strong&gt; and &lt;strong&gt;theory&lt;/strong&gt;. Let's tackle &lt;strong&gt;independent&lt;/strong&gt; next.&lt;/p&gt;
&lt;h1 id="models"&gt;Models&lt;/h1&gt;
&lt;p&gt;To discuss independence of sentences, we first need to talk about models. We say that &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; is a model of a theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; if &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true in &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; for all &lt;span class="arithmatex"&gt;\(\sigma \in T\)&lt;/span&gt;.&lt;sup id="fnref:2"&gt;&lt;a class="footnote-ref" href="#fn:2"&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;2&lt;/sup&gt; If &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true in &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt;, you'll see this written in the literature as &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash \sigma\)&lt;/span&gt;. However, you'll see very shortly that this is an overloading of the &lt;span class="arithmatex"&gt;\(\vDash\)&lt;/span&gt; operator; its meaning changes depending on if the right hand side is a sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; or a theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;. I've avoided writing &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash \sigma\)&lt;/span&gt; here for clarity, but it is the more precise usage.&amp;#160;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Don't be scared by the notation. If you wanted to check whether something is a group or not, what do you do? You check that it satisfies all the axioms of being a group. That's all this definition is stating. Saying "&lt;span class="arithmatex"&gt;\((\mathbb{Z}, +)\)&lt;/span&gt; is a group" is equivalent to saying "&lt;span class="arithmatex"&gt;\((\mathbb{Z}, +)\)&lt;/span&gt; models &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt;". And if &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; models &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;, we write &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt;.&lt;/p&gt;
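&lt;p&gt;If it helps to see "check every axiom" as code, here is a brute-force sketch for finite structures. It uses the integers mod 5 as a finite stand-in, since we can't loop over all of &lt;span class="arithmatex"&gt;\(\mathbb{Z}\)&lt;/span&gt;, and it checks the identity and inverse axioms on both sides. The function name &lt;code&gt;models_group_theory&lt;/code&gt; is my own invention for this example.&lt;/p&gt;

```python
from itertools import product


def models_group_theory(elements, op, e):
    """Brute-force check that (elements, op, e) satisfies the group axioms."""
    # sigma_0: c*(a*b) = (c*a)*b for all a, b, c
    associative = all(
        op(c, op(a, b)) == op(op(c, a), b)
        for a, b, c in product(elements, repeat=3)
    )
    # sigma_1: e is an identity
    identity = all(op(e, a) == a and op(a, e) == a for a in elements)
    # sigma_2: every a has some b with a*b = e
    inverses = all(
        any(op(a, b) == e and op(b, a) == e for b in elements)
        for a in elements
    )
    return associative and identity and inverses


z5 = range(5)
print(models_group_theory(z5, lambda a, b: (a + b) % 5, 0))  # True
print(models_group_theory(z5, lambda a, b: (a * b) % 5, 1))  # False
```

&lt;p&gt;The second structure fails only the inverse axiom, because 0 has no multiplicative inverse mod 5, so it is not a model of the theory.&lt;/p&gt;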
&lt;p&gt;We need just one more definition. Let &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; be any theory. Then &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is &lt;strong&gt;complete&lt;/strong&gt; if, for all sentences &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; and for all models &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt; and &lt;span class="arithmatex"&gt;\(\mathcal{B} \vDash T\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true in &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; iff &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true in &lt;span class="arithmatex"&gt;\(\mathcal{B}\)&lt;/span&gt;. In other words, "every model of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; agrees on the truth value of every sentence".&lt;/p&gt;
&lt;p&gt;A natural question is whether &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; is complete. Can you think of a sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; which is true in some group &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T_{group}\)&lt;/span&gt; but false in another group &lt;span class="arithmatex"&gt;\(\mathcal{B} \vDash T_{group}\)&lt;/span&gt;? Hint: the answer is yes, there are several such sentences, and they aren't that complicated. Try and think of one now before you read on, if you like.&lt;/p&gt;
&lt;p&gt;(pause...) If you said "&lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; is abelian" (i.e., &lt;span class="arithmatex"&gt;\(*\)&lt;/span&gt; commutes in &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt;), you're correct! I also would have accepted "&lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; has an element of order &lt;span class="arithmatex"&gt;\(n\)&lt;/span&gt;" for some &lt;span class="arithmatex"&gt;\(n\)&lt;/span&gt;. Here's a sentence that is true in &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; iff &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; is abelian:&lt;/p&gt;
&lt;div class="arithmatex"&gt;\[
\sigma_{abelian} = \forall a \forall b \ (a*b = b*a)
\]&lt;/div&gt;
&lt;p&gt;To see that &lt;span class="arithmatex"&gt;\(\sigma_{abelian}\)&lt;/span&gt; proves that &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; is not complete, pick your favorite abelian group, say &lt;span class="arithmatex"&gt;\((\mathbb{Z}, +)\)&lt;/span&gt;, and your favorite non-abelian group, say &lt;span class="arithmatex"&gt;\(GL(2, \mathbb{R})\)&lt;/span&gt;. &lt;span class="arithmatex"&gt;\(\sigma_{abelian}\)&lt;/span&gt; is true in &lt;span class="arithmatex"&gt;\((\mathbb{Z}, +)\)&lt;/span&gt; and false in &lt;span class="arithmatex"&gt;\(GL(2, \mathbb{R})\)&lt;/span&gt;, since addition commutes and matrix multiplication does not. But both &lt;span class="arithmatex"&gt;\((\mathbb{Z}, +)\)&lt;/span&gt; and &lt;span class="arithmatex"&gt;\(GL(2, \mathbb{R})\)&lt;/span&gt; are models of &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; — after all, they're both groups and thus satisfy the three axioms of &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt;. So &lt;span class="arithmatex"&gt;\(\sigma_{abelian}\)&lt;/span&gt; is true in &lt;span class="arithmatex"&gt;\((\mathbb{Z}, +) \vDash T_{group}\)&lt;/span&gt; and false in &lt;span class="arithmatex"&gt;\(GL(2, \mathbb{R}) \vDash T_{group}\)&lt;/span&gt;, so &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; is not complete.&lt;/p&gt;
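&lt;p&gt;The same argument can be run mechanically on small finite groups. In this sketch I swap in the integers mod 6 for &lt;span class="arithmatex"&gt;\((\mathbb{Z}, +)\)&lt;/span&gt; and the symmetric group &lt;span class="arithmatex"&gt;\(S_3\)&lt;/span&gt; for &lt;span class="arithmatex"&gt;\(GL(2, \mathbb{R})\)&lt;/span&gt;, purely so that the quantifiers range over finitely many elements. The helper names are made up for the example.&lt;/p&gt;

```python
from itertools import permutations, product


def is_abelian(elements, op):
    """Evaluate sigma_abelian: a*b == b*a for all a, b."""
    return all(op(a, b) == op(b, a) for a, b in product(elements, repeat=2))


def add_mod_6(a, b):
    """The group operation of the integers mod 6."""
    return (a + b) % 6


def compose(p, q):
    """Composition of permutations of (0, 1, 2): apply q, then p."""
    return tuple(p[q[i]] for i in range(3))


z6 = list(range(6))                 # an abelian group
s3 = list(permutations(range(3)))   # a non-abelian group

print(is_abelian(z6, add_mod_6))  # True
print(is_abelian(s3, compose))    # False
```

&lt;p&gt;Two models of the same theory disagree on one sentence, which is all that incompleteness of &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; asks for.&lt;/p&gt;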
&lt;h1 id="independence"&gt;Independence&lt;/h1&gt;
&lt;p&gt;What about independence? Let &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; be any theory and &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; be any sentence. Then &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is &lt;strong&gt;independent&lt;/strong&gt; of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; if there are two models &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt; and &lt;span class="arithmatex"&gt;\(\mathcal{B} \vDash T\)&lt;/span&gt; such that &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true in &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; and false in &lt;span class="arithmatex"&gt;\(\mathcal{B}\)&lt;/span&gt;. In the example above, &lt;span class="arithmatex"&gt;\(\sigma_{abelian}\)&lt;/span&gt; is independent of &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt;. A corollary is that a theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is not complete iff there is some sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; which is independent of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;. I'll do the proof explicitly below, but it's nothing more than unpacking the respective definitions.&lt;/p&gt;
&lt;div class="quote"&gt;
$\implies$ Let $T$ be not complete. So there are some sentence $\sigma$ and some models $\mathcal{A} \vDash T$ and $\mathcal{B} \vDash T$ such that either $\sigma$ is true in $\mathcal{A}$ and false in $\mathcal{B}$, or false in $\mathcal{A}$ and true in $\mathcal{B}$. In either case (swapping the roles of $\mathcal{A}$ and $\mathcal{B}$ in the second), $\sigma$ is independent of $T$.
&lt;/div&gt;

&lt;div class="quote"&gt;
$\impliedby$ Let $\sigma$ be a sentence independent of $T$. Then there are $\mathcal{A} \vDash T$ and $\mathcal{B} \vDash T$ such that $\sigma$ is true in $\mathcal{A}$ and false in $\mathcal{B}$. So $T$ is not complete. $\blacksquare$
&lt;/div&gt;

&lt;p&gt;If a theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is "not complete", we call &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; incomplete.&lt;/p&gt;
&lt;p&gt;Let's take a closer look at the incompleteness theorem, as stated:&lt;/p&gt;
&lt;div class="quote"&gt;For any sufficiently strong theory $T$, there is a sentence $\sigma$ which is independent of $T$.&lt;/div&gt;

&lt;p&gt;We just proved that &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is incomplete iff there is a sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; which is independent of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;. So we can restate the theorem as:&lt;/p&gt;
&lt;div class="quote"&gt;Any sufficiently strong theory $T$ is incomplete.&lt;/div&gt;

&lt;p&gt;This is where the "incompleteness" portion of the theorem's name comes from. Although these statements are equivalent, I'll continue to use the first, longer version, since I feel it's more intuitive (as it doesn't require you to unpack the definition of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; being incomplete).&lt;/p&gt;
&lt;p&gt;Before I make our final definition of &lt;strong&gt;sufficiently strong&lt;/strong&gt;, I want to take a detour into euclidean geometry as a final example to round out our discussion of theories and models.&lt;/p&gt;
&lt;h1 id="euclidean-geometry"&gt;Euclidean geometry&lt;/h1&gt;
&lt;p&gt;Euclidean geometry is another example of a theory. It contains five axioms and three "undefined terms": point, line, and plane are undefined and are referenced in the axioms without definition. Does that sound familiar? We did the exact same thing in groups, using the "undefined terms" &lt;span class="arithmatex"&gt;\(*\)&lt;/span&gt; and &lt;span class="arithmatex"&gt;\(e\)&lt;/span&gt; in our axioms, and defining them to be part of our language &lt;span class="arithmatex"&gt;\(L_{group}\)&lt;/span&gt;. It turns out that the notion of a language has always been hiding in euclidean geometry. The language of euclidean geometry is just &lt;span class="arithmatex"&gt;\(L_{euclid} = \{\text{point}, \text{line}, \text{plane}\}\)&lt;/span&gt;. I'll call &lt;span class="arithmatex"&gt;\(T_{EG}\)&lt;/span&gt; the theory of euclidean geometry, which is the set of the five axioms of euclidean geometry.&lt;/p&gt;
&lt;p&gt;You'll notice that I'm not giving a precise mathematical definition of the axioms, but that's because Euclid himself didn't really give precise mathematical definitions either. Euclidean geometry can in fact be made precise (see &lt;a href="https://en.wikipedia.org/wiki/Tarski%27s_axioms"&gt;Tarski's Axioms&lt;/a&gt;), and everything I say below will still hold, but I'll avoid deviating too much from the euclidean geometry described in Euclid's Elements.&lt;/p&gt;
&lt;p&gt;You may also know of the particularly contentious parallel postulate (PP), the fifth axiom of euclidean geometry. Some people thought that the parallel postulate could be proven from the rest of the axioms, and gave the name "neutral geometry" to the set of axioms of euclidean geometry without PP. I'll call the theory of neutral geometry &lt;span class="arithmatex"&gt;\(T_{NG} = T_{EG} \setminus \{PP\}\)&lt;/span&gt;. They then showed PP could not be proven from the rest of the axioms by constructing two models of &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt;: one in which PP is true (a model of euclidean geometry) and one in which PP is false (a model of hyperbolic geometry).&lt;/p&gt;
&lt;p&gt;Once they had shown PP could not be proven from neutral geometry, they called PP independent of neutral geometry. Does this term "independent" sound familiar? It should — we defined &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; to be independent of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; if there are &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(\mathcal{B} \vDash T\)&lt;/span&gt; where &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true in &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; and false in &lt;span class="arithmatex"&gt;\(\mathcal{B}\)&lt;/span&gt;. Here, &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is the parallel postulate, &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is neutral geometry, &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; is a model of euclidean geometry, and &lt;span class="arithmatex"&gt;\(\mathcal{B}\)&lt;/span&gt; is a model of hyperbolic geometry. In general, proving that an axiom &lt;span class="arithmatex"&gt;\(\sigma \in T\)&lt;/span&gt; is "independent" of (cannot be proven from) the other axioms of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is equivalent to proving that &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is independent of &lt;span class="arithmatex"&gt;\(T \setminus \{\sigma\}\)&lt;/span&gt;, in the formal sense of independence described above.&lt;/p&gt;
&lt;p&gt;Because PP is independent of &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt; is incomplete. However, it turns out that &lt;span class="arithmatex"&gt;\(T_{EG} = T_{NG} \cup \{\text{PP}\}\)&lt;/span&gt; is complete, so by taking PP as a new axiom we've created a complete theory.&lt;sup id="fnref:3"&gt;&lt;a class="footnote-ref" href="#fn:3"&gt;3&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;3&lt;/sup&gt; Proving that euclidean geometry is complete requires a more formal axiomatization than what Euclid gave, and so we turn to &lt;a href="https://en.wikipedia.org/wiki/Tarski%27s_axioms"&gt;Tarski's Axioms&lt;/a&gt; (sometimes called elementary euclidean geometry) instead. This is outside the scope of this post, but Tarski proved that his theory was complete by showing that it admits quantifier elimination. Completeness follows from this since the language has no constants, which means the only sentences without quantifiers are &lt;span class="arithmatex"&gt;\(\top\)&lt;/span&gt; and &lt;span class="arithmatex"&gt;\(\bot\)&lt;/span&gt;, which are true and false in every model respectively.&amp;#160;&lt;/span&gt; We'll discuss this concept of "completing" a theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; by adding new axioms again later, and whether this can save us from the consequences of the incompleteness theorem. Spoiler: it can't.&lt;/p&gt;
&lt;h1 id="sufficiently-strong"&gt;Sufficiently strong&lt;/h1&gt;
&lt;p&gt;I've left the simplest – or at least, easiest to informally explain – for last. A theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is &lt;strong&gt;sufficiently strong&lt;/strong&gt; if it contains the natural numbers, addition on the natural numbers, and multiplication on the natural numbers (or contains objects isomorphic to them). More formally, &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is sufficiently strong if it contains &lt;a href="https://en.wikipedia.org/wiki/Robinson_arithmetic"&gt;Robinson arithmetic&lt;/a&gt;, called &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt;. If you're familiar with Peano arithmetic, &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt; is Peano arithmetic without induction.&lt;/p&gt;
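&lt;p&gt;For reference, here is one standard way of writing down &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt;. The numbering and the choice of language &lt;span class="arithmatex"&gt;\(L_Q = \{0, S, +, \cdot\}\)&lt;/span&gt;, where &lt;span class="arithmatex"&gt;\(S\)&lt;/span&gt; is the successor function, follow the usual textbook presentation rather than anything specific to this post, and presentations differ in small details:&lt;/p&gt;
&lt;div class="arithmatex"&gt;\[
\begin{aligned}
&amp;amp; Q_1: \forall x \ (Sx \neq 0) \\
&amp;amp; Q_2: \forall x \forall y \ (Sx = Sy \implies x = y) \\
&amp;amp; Q_3: \forall y \ (y = 0 \lor \exists x \ (Sx = y)) \\
&amp;amp; Q_4: \forall x \ (x + 0 = x) \\
&amp;amp; Q_5: \forall x \forall y \ (x + Sy = S(x + y)) \\
&amp;amp; Q_6: \forall x \ (x \cdot 0 = 0) \\
&amp;amp; Q_7: \forall x \forall y \ (x \cdot Sy = (x \cdot y) + x) \\
\end{aligned}
\]&lt;/div&gt;
&lt;p&gt;Just like &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt;, this is nothing more than a finite set of sentences in a language, so &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt; is a theory in exactly the sense we defined earlier.&lt;/p&gt;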
&lt;p&gt;Understanding &lt;em&gt;why&lt;/em&gt; containing &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt; is necessary gets to the heart of the proof of the incompleteness theorem and is a much deeper discussion than we can get into here, so I hope you'll forgive me for not going into any more detail.&lt;/p&gt;
&lt;h1 id="bringing-it-all-together"&gt;Bringing it all together&lt;/h1&gt;
&lt;p&gt;Let's recap:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is a statement in first order logic, potentially containing symbols from some language &lt;span class="arithmatex"&gt;\(L\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;A theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is a set of sentences&lt;/li&gt;
&lt;li&gt;&lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; is a model of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; (written &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt;) if &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true in &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; for all &lt;span class="arithmatex"&gt;\(\sigma \in T\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;A sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is independent of a theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; if there are models &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt;, &lt;span class="arithmatex"&gt;\(\mathcal{B} \vDash T\)&lt;/span&gt; with &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; true in &lt;span class="arithmatex"&gt;\(\mathcal{A}\)&lt;/span&gt; and false in &lt;span class="arithmatex"&gt;\(\mathcal{B}\)&lt;/span&gt;&lt;/li&gt;
&lt;li&gt;A theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is sufficiently strong if it contains &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt;, aka Robinson arithmetic (informally, if it contains the natural numbers, addition, and multiplication)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And finally, the incompleteness theorem itself:&lt;/p&gt;
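&lt;div class="quote"&gt;For any sufficiently strong theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;, there is a sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; which is independent of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;.&lt;/div&gt;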

&lt;p&gt;Congratulations — you now know everything you need to understand the statement of the incompleteness theorem. If that was your goal, you can walk away here, but I'll discuss the consequences of this theorem next.&lt;/p&gt;
&lt;h1 id="consequences"&gt;Consequences&lt;/h1&gt;
&lt;p&gt;To start, a question: is the converse of the incompleteness theorem true? No. We saw above that &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; has a sentence &lt;span class="arithmatex"&gt;\(\sigma_{abelian}\)&lt;/span&gt; which is independent of &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt;, but &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; certainly does not contain &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt;. Informally, this means that theories can be incomplete for "other reasons" than the incompleteness theorem (actually, it's quite easy to create incomplete theories; much easier than creating complete ones). The reason why the incompleteness theorem is so important is not because it applies to a large number of theories, but because the theories it does apply to are important ones that we would really prefer to be complete.&lt;/p&gt;
&lt;p&gt;In particular, the incompleteness theorem often comes up adjacent to the foundations of mathematics, with theories like &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt;. Although perhaps not obvious just by looking at the axioms, &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt; can prove the axioms of &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt;, and is in fact much, much stronger than it. So &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt; is sufficiently strong and thus subject to the incompleteness theorem, so there is some &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; which is independent of &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt;. In other words, there are theorems (sentences) which we will never be able to prove or disprove from the axioms of &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;This probably doesn't sound too bad. So what? Well, think about your favorite mathematical field (which almost certainly uses &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt; as its mathematical foundation, unless you're a category theorist). Then think about some famous unsolved conjecture in that field. Most people think there are only two options: either that conjecture is true, or it's false. The incompleteness theorem says there's a third possibility: the conjecture is one of these independent sentences &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt;, and thus can never be proven or disproven in &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;I would say that proving a famous theorem independent is a much worse fate than proving it either true or false. Consider &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt;, the theory of neutral geometry we discussed above, and the theorem under discussion to be the parallel postulate PP, which we know is independent of &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt;. When PP was found to be independent of &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt;, it split the world of euclidean geometry in two. In one camp are the worlds in which PP is true; we call these euclidean geometries, with theory &lt;span class="arithmatex"&gt;\(T_{NG} \cup \{\text{PP}\}\)&lt;/span&gt;. In the other camp are the worlds in which PP is false; we call these non-euclidean geometries, with theory &lt;span class="arithmatex"&gt;\(T_{NG} \cup \{\lnot \text{PP}\}\)&lt;/span&gt;. Because PP is independent of &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt;, both of these worlds are "equally valid". In my opinion, having two possible worlds is worse than knowing for certain which "world" we live in, like we would if PP was not independent of &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt; (and therefore either true or false).&lt;/p&gt;
&lt;p&gt;However, there's a reason why non-euclidean geometries are significantly less studied: most people believe PP is "intuitively true", and study euclidean geometry instead of non-euclidean geometry. This is true of PP, but it's not true of all independent sentences. Sometimes an independent sentence really does fracture a theory into multiple, equally popular camps. In other words, it's not always obvious which "choice" to make (eg whether to add PP or &lt;span class="arithmatex"&gt;\(\lnot\)&lt;/span&gt;PP).&lt;/p&gt;
&lt;p&gt;For instance, in set theory, the &lt;a href="https://en.wikipedia.org/wiki/Continuum_hypothesis"&gt;Continuum Hypothesis&lt;/a&gt; (CH) is the most well known example of a statement independent of &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt;. When it was proven to be independent, it split the world of set theory in two, just like PP did. But this time it's worse, because there is a large amount of disagreement among set theorists about whether CH is intuitively true. If you tried to get &lt;span class="arithmatex"&gt;\(\text{ZFC} \cup \{\text{CH}\}\)&lt;/span&gt; accepted as the foundation of mathematics (instead of &lt;span class="arithmatex"&gt;\(\text{ZFC} \cup \{\lnot \text{CH}\}\)&lt;/span&gt; or just &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt;), you would get significant pushback from set theorists, because to them, both worlds are equally interesting.&lt;/p&gt;
&lt;p&gt;You might hold out hope that alright, fine, &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt; has some independent sentences, but they're sentences we didn't really care about anyway. This is actually mostly true if you're not a set theorist and don't work with graduate level math! Most sentences independent of &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt; come from set theory, and the rest are complicated statements in other fields, most of which I don't even understand the statement of.&lt;sup id="fnref:4"&gt;&lt;a class="footnote-ref" href="#fn:4"&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;4&lt;/sup&gt; See &lt;a href="https://en.wikipedia.org/wiki/List_of_statements_independent_of_ZFC"&gt;List of statements independent of ZFC&lt;/a&gt;.&amp;#160;&lt;/span&gt; But the incompleteness theorem puts a "cap", so to speak, on &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt; (and thus mathematics): the deeper into a subject you go, the closer and closer you brush up against independent statements. And if you're particularly unlucky, you'll actually run into a theorem in your work which is independent of &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt;, and you'll curse the incompleteness theorem when you do.&lt;/p&gt;
&lt;p&gt;So, sentences being independent of a theory is bad, and all theories which can serve as the foundation of mathematics have independent sentences (because they are, without exception, sufficiently strong). This is the single most important implication of the incompleteness theorem.&lt;/p&gt;
&lt;h2 id="incompleteness-of-t_group"&gt;Incompleteness of &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;But wait — if a theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; being incomplete is bad, and we proved that &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; is incomplete above, isn't that bad news for group theorists? Well, it's not &lt;em&gt;good&lt;/em&gt;, but it's also not bad. It's true that &lt;span class="arithmatex"&gt;\(\sigma_{abelian}\)&lt;/span&gt; splits &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; into two theories: &lt;span class="arithmatex"&gt;\(T_{group} \cup \{\sigma_{abelian}\}\)&lt;/span&gt; and &lt;span class="arithmatex"&gt;\(T_{group} \cup \{\lnot \sigma_{abelian}\}\)&lt;/span&gt;. But these are just the theories of abelian and non-abelian groups respectively. If I had asked you whether studying abelian and non-abelian groups separately bothers you, you would have looked at me like I'm crazy. After all, if you want to prove something about abelian groups, you just assume that &lt;span class="arithmatex"&gt;\(G\)&lt;/span&gt; is abelian (but note that this is identical to working in &lt;span class="arithmatex"&gt;\(T_{group} \cup \{\sigma_{abelian}\}\)&lt;/span&gt;).&lt;/p&gt;
&lt;p&gt;The difference lies in that &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt; is not trying to be a theory of mathematics. You don't particularly care if you can't prove every possible statement for all groups, because if you can't, you can always look at a specific group you care about and prove whether that statement is true in that group or not. This isn't possible in a theory of mathematics.&lt;sup id="fnref:5"&gt;&lt;a class="footnote-ref" href="#fn:5"&gt;5&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;5&lt;/sup&gt; This is because a theory of mathematics can't prove that it has any models, or else it would prove its own consistency, which contradicts Gödel's second incompleteness theorem. So, working inside such a theory, there are no "specific models" of it available to look at.&amp;#160;&lt;/span&gt;&lt;/p&gt;
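&lt;p&gt;As a small worked illustration of "looking at a specific group", assume &lt;span class="arithmatex"&gt;\(\sigma_{abelian}\)&lt;/span&gt; is the sentence &lt;span class="arithmatex"&gt;\(\forall x \forall y \, (x \cdot y = y \cdot x)\)&lt;/span&gt; (the exact symbols are an assumption for this sketch). In &lt;span class="arithmatex"&gt;\((\mathbb{Z}, +)\)&lt;/span&gt; the sentence is true, since &lt;span class="arithmatex"&gt;\(a + b = b + a\)&lt;/span&gt; for all integers. In &lt;span class="arithmatex"&gt;\(GL(2, \mathbb{R})\)&lt;/span&gt; it is false, and a single pair of invertible matrices is enough to show it:&lt;/p&gt;
&lt;div class="arithmatex"&gt;\[
\begin{pmatrix} 1 &amp; 1 \\ 0 &amp; 1 \end{pmatrix} \begin{pmatrix} 1 &amp; 0 \\ 1 &amp; 1 \end{pmatrix} = \begin{pmatrix} 2 &amp; 1 \\ 1 &amp; 1 \end{pmatrix} \neq \begin{pmatrix} 1 &amp; 1 \\ 1 &amp; 2 \end{pmatrix} = \begin{pmatrix} 1 &amp; 0 \\ 1 &amp; 1 \end{pmatrix} \begin{pmatrix} 1 &amp; 1 \\ 0 &amp; 1 \end{pmatrix}
\]&lt;/div&gt;
&lt;p&gt;Both groups are models of &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt;, so together they witness the independence of &lt;span class="arithmatex"&gt;\(\sigma_{abelian}\)&lt;/span&gt;, and in each individual group the question has a definite answer.&lt;/p&gt;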
&lt;h2 id="completeness-of-t_eg"&gt;Completeness of &lt;span class="arithmatex"&gt;\(T_{EG}\)&lt;/span&gt;&lt;/h2&gt;
&lt;p&gt;But wait — we said earlier that the theory of euclidean geometry, &lt;span class="arithmatex"&gt;\(T_{EG}\)&lt;/span&gt;, was complete. Does this contradict the incompleteness theorem? No, because &lt;span class="arithmatex"&gt;\(T_{EG}\)&lt;/span&gt; is not "sufficiently strong". There are a number of interesting theories which are complete, like &lt;span class="arithmatex"&gt;\(T_{EG}\)&lt;/span&gt;, but aren't strong enough to be subject to the incompleteness theorem.&lt;/p&gt;
&lt;h2 id="completing-a-theory"&gt;"Completing" a theory&lt;/h2&gt;
&lt;p&gt;Recall that we saw &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt; (neutral geometry), which is incomplete, could be extended to a complete theory &lt;span class="arithmatex"&gt;\(T_{EG}\)&lt;/span&gt; (euclidean geometry) by adding the parallel postulate. We say that &lt;span class="arithmatex"&gt;\(T_{EG}\)&lt;/span&gt; is a "completion" of &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt;, that &lt;span class="arithmatex"&gt;\(T_{NG}\)&lt;/span&gt; can be "completed" by adding PP, etc.&lt;/p&gt;
&lt;p&gt;You might wonder if we could pull the same trick for theories affected by the incompleteness theorem. Given some sufficiently strong theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;, the incompleteness theorem says there is some &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; independent of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;. Could we complete &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; by adding either &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; or &lt;span class="arithmatex"&gt;\(\lnot \sigma\)&lt;/span&gt; to &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; as an axiom? The answer is no, regardless of which we choose. Adding an axiom to a theory never makes that theory weaker (ie prove fewer sentences); it can only make it stronger. This new theory &lt;span class="arithmatex"&gt;\(T' = T \cup \{\sigma\}\)&lt;/span&gt; would still be sufficiently strong and thus satisfy the incompleteness theorem, so there is some new sentence &lt;span class="arithmatex"&gt;\(\sigma'\)&lt;/span&gt; which is independent of &lt;span class="arithmatex"&gt;\(T'\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;So no matter how many independent sentences we add as axioms to a sufficiently strong theory, it will still be sufficiently strong and subject to the incompleteness theorem. A sufficiently strong theory can never be "completed".&lt;/p&gt;
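&lt;p&gt;Written out as a chain, the argument looks roughly like this (the indexed names &lt;span class="arithmatex"&gt;\(T_n\)&lt;/span&gt; and &lt;span class="arithmatex"&gt;\(\sigma_n\)&lt;/span&gt; are notation introduced only for this sketch):&lt;/p&gt;
&lt;div class="arithmatex"&gt;\[
T_0 = T, \qquad T_{n+1} = T_n \cup \{\sigma_n\} \quad \text{where } \sigma_n \text{ is independent of } T_n
\]&lt;/div&gt;
&lt;div class="arithmatex"&gt;\[
Q \subseteq T_0 \subseteq T_1 \subseteq T_2 \subseteq \cdots
\]&lt;/div&gt;
&lt;p&gt;Each &lt;span class="arithmatex"&gt;\(T_{n+1}\)&lt;/span&gt; still has a model, namely any model of &lt;span class="arithmatex"&gt;\(T_n\)&lt;/span&gt; in which &lt;span class="arithmatex"&gt;\(\sigma_n\)&lt;/span&gt; is true (one exists precisely because &lt;span class="arithmatex"&gt;\(\sigma_n\)&lt;/span&gt; is independent of &lt;span class="arithmatex"&gt;\(T_n\)&lt;/span&gt;), and it still contains &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt;. So the incompleteness theorem applies again at every step and hands us a fresh &lt;span class="arithmatex"&gt;\(\sigma_{n+1}\)&lt;/span&gt;.&lt;/p&gt;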
&lt;h1 id="independent-sentences-are-true"&gt;Independent sentences are "true"&lt;/h1&gt;
&lt;p&gt;This misunderstanding (I'm tempted to say "abuse") is the singular reason I wrote this post, so you'll have to forgive me if I rant a bit here. The single most common misuse of the incompleteness theorem is stating that the independent sentence is somehow "true". Here's a direct quote &lt;a href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems"&gt;from Wikipedia&lt;/a&gt;:&lt;/p&gt;
&lt;div class="quote"&gt;For any such consistent formal system, there will always be statements about natural numbers that are true, but that are unprovable within the system.&lt;/div&gt;

&lt;p&gt;(They're using "consistent formal system" to be some theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;, "statements about the natural numbers" to be some sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt;, and "unprovable within the system" to mean "independent of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;").&lt;/p&gt;
&lt;p&gt;Except this is wrong. An independent sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is absolutely not "true". It is, &lt;em&gt;by definition&lt;/em&gt;, true in some model &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt; and false in some other model &lt;span class="arithmatex"&gt;\(\mathcal{B} \vDash T\)&lt;/span&gt;, so calling it "true" is nonsense. It's neither true nor false; it's independent.&lt;/p&gt;
&lt;p&gt;What people really mean when they say that an independent sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true is that it's true in the "standard model", and therefore, they argue, intuitively true. What is the standard model? Nothing more than a particular model &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt; we have arbitrarily chosen as intuitive for visualizing &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;. For instance, the standard model of euclidean geometry &lt;span class="arithmatex"&gt;\(T_{EG}\)&lt;/span&gt; is the plane &lt;span class="arithmatex"&gt;\(\mathbb{R}^2\)&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;But for other theories, it's not clear at all what the standard model is – say, for &lt;span class="arithmatex"&gt;\(T_{group}\)&lt;/span&gt;. You might suggest &lt;span class="arithmatex"&gt;\((\mathbb{Z}, +)\)&lt;/span&gt;, but there's no good reason to choose that group over, say, &lt;span class="arithmatex"&gt;\((\mathbb{Z}_8, +)\)&lt;/span&gt;, or even &lt;span class="arithmatex"&gt;\(GL(2, \mathbb{R})\)&lt;/span&gt;. Here, the concept of a "standard model" breaks down.&lt;/p&gt;
&lt;p&gt;For theories which have a standard model, this line of thinking does have some philosophical merit. I just wish people would say "there is a sentence which cannot be proven from &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; but is true in the standard model", instead of saying "there is a true sentence which cannot be proven", which sounds like a contradiction. This seeming contradiction bothered me for many years when reading about the incompleteness theorem, and I was greatly relieved to eventually learn that people were simply misinterpreting the theorem.&lt;/p&gt;
&lt;h2 id="godels-completeness-theorem"&gt;Gödel's completeness theorem&lt;/h2&gt;
&lt;p&gt;Before his incompleteness theorem, Gödel proved another theorem about the completeness of first order logic. Informally, this theorem says that for all theories &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; and sentences &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt;, if &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true in every model &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt;, then there is a proof of &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; from the axioms of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;. In other words, there is a proof of every true statement.&lt;/p&gt;
&lt;p&gt;The naming of these theorems suggests a contradiction: how can we have both Gödel's completeness theorem and Gödel's incompleteness theorem?&lt;/p&gt;
&lt;p&gt;Well, because they refer to two different notions of completeness. "Completeness" in the completeness theorem means that "everything which is true is provable". However, "incompleteness" in the incompleteness theorem means that some theories &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; have sentences which are neither true nor false in &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;. These independent sentences don't even satisfy the conditions of the completeness theorem (since they're not true in every model), so these two theorems are entirely orthogonal.&lt;/p&gt;
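&lt;p&gt;Put side by side, the two statements have quite different shapes. In this sketch, &lt;span class="arithmatex"&gt;\(T \vdash \sigma\)&lt;/span&gt; means "there is a proof of &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; from &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;" and &lt;span class="arithmatex"&gt;\(T \vDash \sigma\)&lt;/span&gt; means "&lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is true in every model of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;"; this notation is an addition for illustration and isn't used elsewhere in the post:&lt;/p&gt;
&lt;div class="arithmatex"&gt;\[
\begin{aligned}
\text{completeness:} \quad &amp; T \vDash \sigma \implies T \vdash \sigma \quad \text{(for every } T \text{ and } \sigma \text{)} \\
\text{incompleteness:} \quad &amp; \text{there is a } \sigma \text{ with } T \nvDash \sigma \text{ and } T \nvDash \lnot \sigma \quad \text{(for sufficiently strong } T \text{)}
\end{aligned}
\]&lt;/div&gt;
&lt;p&gt;The second line is just the definition of independence restated: &lt;span class="arithmatex"&gt;\(T \nvDash \sigma\)&lt;/span&gt; gives a model where &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is false, and &lt;span class="arithmatex"&gt;\(T \nvDash \lnot \sigma\)&lt;/span&gt; gives a model where it is true. The hypothesis of the first line never holds for such a &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt;, so the two statements don't collide.&lt;/p&gt;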
&lt;h1 id="technicalities"&gt;Technicalities&lt;/h1&gt;
&lt;p&gt;I haven't been entirely truthful with you. There are two extra assumptions we need to add before we get the true incompleteness theorem. They deal with what are essentially edge cases – though very important edge cases.&lt;/p&gt;
&lt;h2 id="satisfiable"&gt;Satisfiable&lt;/h2&gt;
&lt;p&gt;First, we require that the theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; be &lt;strong&gt;satisfiable&lt;/strong&gt;. A theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is &lt;strong&gt;satisfiable&lt;/strong&gt; if there is any model &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt; at all. Equivalently, &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; is satisfiable if its axioms are consistent, ie you can't derive a contradiction from them. If we allowed &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; to be unsatisfiable, then the incompleteness theorem would fail in the trivial case: let &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; be any sufficiently strong, unsatisfiable theory. Then there are no models &lt;span class="arithmatex"&gt;\(\mathcal{A} \vDash T\)&lt;/span&gt;, so vacuously, there are no independent sentences &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; (since an independent sentence requires at least two models). But this would contradict the incompleteness theorem.&lt;/p&gt;
&lt;p&gt;Here's our updated incompleteness theorem:&lt;/p&gt;
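&lt;div class="quote"&gt;For any sufficiently strong, satisfiable theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;, there is a sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; which is independent of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;.&lt;/div&gt;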

&lt;h2 id="recursively-enumerable"&gt;Recursively enumerable&lt;/h2&gt;
&lt;p&gt;For what are actually pretty technical reasons, we also require &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; to be "recursively enumerable". This is equivalent to saying that the elements of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt; are "computable", ie there is an algorithm which, given any sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt;, returns true if &lt;span class="arithmatex"&gt;\(\sigma \in T\)&lt;/span&gt; and false otherwise. It's not worth getting into the details here, but this basically rules out crazy theories where you just throw in so many axioms that you're eventually able to prove everything in all models. Any "reasonable" theory like &lt;span class="arithmatex"&gt;\(\text{ZFC}\)&lt;/span&gt; or &lt;span class="arithmatex"&gt;\(Q\)&lt;/span&gt; is recursively enumerable.&lt;/p&gt;
&lt;p&gt;You might also see such theories being called "decidable", as in, you can "decide" whether a sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; is an element of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;.&lt;sup id="fnref:6"&gt;&lt;a class="footnote-ref" href="#fn:6"&gt;6&lt;/a&gt;&lt;/sup&gt;&lt;span class="sidenote"&gt;&lt;sup&gt;6&lt;/sup&gt; The multitude of names is thanks to computability theory, which proved that several distinct notions of computability (all with their own names) are actually exactly equivalent, and thus the names are interchangeable.&amp;#160;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;So our updated incompleteness theorem is then:&lt;/p&gt;
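&lt;div class="quote"&gt;For any sufficiently strong, satisfiable, recursively enumerable theory &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;, there is a sentence &lt;span class="arithmatex"&gt;\(\sigma\)&lt;/span&gt; which is independent of &lt;span class="arithmatex"&gt;\(T\)&lt;/span&gt;.&lt;/div&gt;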

&lt;p&gt;You can see why I didn't want to lead with this definition :)&lt;/p&gt;
&lt;p&gt;I promise that I'm not holding anything back anymore — this is the genuine, full incompleteness theorem which Gödel himself proved&lt;sup id="fnref:7"&gt;&lt;a class="footnote-ref" href="#fn:7"&gt;7&lt;/a&gt;&lt;/sup&gt;.&lt;span class="sidenote"&gt;&lt;sup&gt;7&lt;/sup&gt; Ok, fine, you got me: Gödel's original proof required something called &lt;a href="https://en.wikipedia.org/wiki/%CE%A9-consistent_theory"&gt;&lt;span class="arithmatex"&gt;\(\omega\)&lt;/span&gt;-consistency&lt;/a&gt;, a strengthening of consistency. However, it turns out this condition can be weakened to consistency alone, with &lt;a href="https://en.wikipedia.org/wiki/Rosser%27s_trick"&gt;Rosser's trick&lt;/a&gt;.&amp;#160;&lt;/span&gt; These extra assumptions rarely come up in casual discussions, which is why I left them until now to discuss.&lt;/p&gt;
&lt;h1 id="afterword"&gt;Afterword&lt;/h1&gt;
&lt;p&gt;I debated a lot about which examples of theories to use, and in fact originally wrote a draft where I used the theory of real-valued vector spaces. Unfortunately, its axiomatization is quite dirty, and so I dropped it in favor of the theory of groups, even though I think more people would be familiar with vector spaces than groups. Oh well.&lt;/p&gt;
&lt;p&gt;There are also some philosophical implications I wanted to include, but I don't feel qualified to discuss them. I'm also not really convinced how big the philosophical implications of the incompleteness theorem are.&lt;/p&gt;</content><category term="articles"/><category term="math"/></entry></feed>