[{"content":"In December 2024, I published a post arguing that AI coding assistants would transform rather than replace software developers. It was a reasonable piece. It was also, in the way that reasonable pieces often are, a little cowardly.\nEighteen months is enough time to score predictions. The post made five claims worth testing.\nThe \u0026ldquo;can\u0026rsquo;t do\u0026rdquo; list. I wrote that AI models \u0026ldquo;struggle to design scalable microservice architectures or to make security-critical design decisions.\u0026rdquo; That claim has a shelf life of roughly twelve months, and the sell-by date has passed. Claude Code, Cursor, and Windsurf are now doing substantial architectural work — not perfectly, not without supervision, but not \u0026ldquo;struggling\u0026rdquo; either. The honest update: AI moved faster up the capability stack than I implied. The list of things AI demonstrably cannot do has shrunk, and will keep shrinking.\nThe \u0026ldquo;AI-augmented developer\u0026rdquo; framing. This one held. Developers haven\u0026rsquo;t been replaced; they\u0026rsquo;ve changed shape. But here is the uncomfortable part: the framing was already the mainstream consensus in 2024. I wasn\u0026rsquo;t saying something the crowd hadn\u0026rsquo;t agreed to. I was summarising the median position and dressing it in slightly better sentences. Vindicated, and also irrelevant — which is a particular kind of failure.\n\u0026ldquo;AGI would be needed to fully replace programmers.\u0026rdquo; Still probably true. Still also a hedge disguised as an argument. By the time we have AGI, the question of whether programmers specifically are replaced will be uninteresting. I put this in the piece because it closed a rhetorical gap; it doesn\u0026rsquo;t do analytic work.\nThe autopilot analogy. I wrote: \u0026ldquo;even in an age of autopilot, pilots still exist — their role has evolved rather than disappeared.\u0026rdquo; This analogy was stale when I wrote it. 
Every \u0026ldquo;AI won\u0026rsquo;t steal your job\u0026rdquo; piece in 2023 and 2024 ended with pilots. It was borrowed reassurance, and I should have caught it. The irony is that the analogy, used carefully, actually points somewhere more interesting than I took it. A modern pilot\u0026rsquo;s job is not flying the plane — the autopilot does that. The pilot\u0026rsquo;s job is monitoring the autopilot, understanding its failure modes, and knowing exactly when to grab the controls. That is a harder cognitive task than flying, not an easier one. I gestured at the transformation but didn\u0026rsquo;t press on what the transformation demands.\nThe skills to develop. I said: \u0026ldquo;system architecture, business domain expertise, and cross-functional collaboration.\u0026rdquo; Still true. Still incomplete. There is a fourth skill I didn\u0026rsquo;t name, and it is the one that actually differentiates working developers in 2026.\nHere it is: verification.\nThe question in 2024 was whether AI could write the code. That question is answered — yes, increasingly, for a widening range of tasks. The question in 2026 is whether you can verify what it wrote. Not \u0026ldquo;does it compile?\u0026rdquo; and not \u0026ldquo;do the existing tests pass?\u0026rdquo; — both of those are table stakes the model checks itself. The question is: does it do what you specified? And the harder version: did you specify clearly enough to know?\nThis is not a comfortable shift. Writing code is a generative task, and humans are tolerably good at generating things. Verification is an adversarial task — you are trying to find the flaw in something that was constructed to look correct. The model is fluent, confident, and wrong in ways that are structurally difficult to detect without the domain knowledge you were supposedly able to outsource.\nThe architectural skill I described in 2024 is real. But architecture is upstream. 
Verification is the daily work now — the thing you do after the model has produced something that mostly makes sense and you have to decide whether \u0026ldquo;mostly\u0026rdquo; is good enough to ship.\nI was more navigator than pilot in 2024. In 2026 I am more inspector than navigator, and the inspection is the hard part.\nOne honest limit to this argument: verification has always been part of the job. Code review exists. QA exists. What has changed is the ratio: generation is so fast, and so voluminous, that the verification burden has grown disproportionately. If you ship at the speed the model can generate, you will ship mistakes at the same speed. The constraint is not cycles; it is attention.\nThe post I wrote in 2024 got the direction right and got the pace wrong. It implied a gradual evolution. What happened was an abrupt one — not in the sense that everything changed overnight, but in the sense that the threshold shifted faster than the comfortable \u0026ldquo;transformation not replacement\u0026rdquo; framing anticipated.\nThe question was never whether AI could write the code. The question was always whether you could read it.\n","permalink":"https://blog.randomdomain.co.za/posts/2026/05/the-bottleneck-moved/","summary":"\u003cp\u003eIn December 2024, I published \u003ca href=\"/posts/2024/12/ai-coding-assistants-rise/\"\u003ea post\u003c/a\u003e arguing that AI coding assistants would transform rather than replace software developers.\nIt was a reasonable piece.\nIt was also, in the way that reasonable pieces often are, a little cowardly.\u003c/p\u003e","title":"The bottleneck moved"},{"content":"This project was built entirely in Claude Code with the Cloudflare MCP server — not a line of code was written by hand. Given the low stakes — a novelty site about South African politics — the brief was simple: make it fun, make it fast, and see how far you get in a morning. 
To write this post, I handed Claude the project chat transcripts and asked it to pull out the parts worth explaining. What follows is the result.\nThere are 396 people sitting in the South African National Assembly. Most South Africans couldn\u0026rsquo;t name more than ten. So I built a site that shows you two of them, side by side, and asks the only question that matters: who would you rather have as president?\nThe mechanic is simple: head-to-head matchups. Pick one, get the next pair. With enough votes across a big enough sample, you collapse all those pairwise preferences into a single ranked list — a total ordering. The corpus started as all 396 MPs, then grew to include a handful of non-MP politicians worth judging.\nIt\u0026rsquo;s at whoshouldbepresident.org. Tap a face, get the next pair, repeat. There\u0026rsquo;s a leaderboard. The whole thing runs on Cloudflare and took a morning.\nThis is how it came together — and the three or four interesting fights along the way.\nThe constraint: all-Cloudflare, planned through an MCP I started with a constraint, not a stack search. The very first prompt I sent Claude was:\nI want to host this website on CloudFlare, all in JavaScript, on cloudflare workers, backed by a cloudflare database. Please figure out what the simplest configuration is, and then lets build a local version, so we can test it and get ready for a cloudflare key etc.\nTwo things in that sentence matter. First, all Cloudflare — Workers for compute, D1 for the database, no third-party services to wire up, one bill that\u0026rsquo;s probably zero. Second, \u0026ldquo;figure out what the simplest configuration is\u0026rdquo; — meaning you go read the docs, I don\u0026rsquo;t want to.\nThe \u0026ldquo;go read the docs\u0026rdquo; bit was the interesting one. Before that prompt I\u0026rsquo;d installed the Cloudflare Docs MCP — a Model Context Protocol server that gives Claude direct, tool-call access to Cloudflare\u0026rsquo;s own documentation. 
So instead of dredging up half-remembered Wrangler trivia, the model could query the real docs and ground its architectural choices in current pages, version-correct flags, current limits.\nAcross the project there were exactly four search_cloudflare_documentation calls. They are, in order:\nWorkers D1 database local development wrangler setup Workers R2 static assets images serve Workers static assets limits files size upload wrangler rate limiting binding configuration wrangler.toml not unsafe The first three landed in the first five minutes. From them, the shape of the system fell out almost immediately:\nWorkers for the API and HTML — single src/index.js handling routes D1 (Cloudflare\u0026rsquo;s SQLite) for the ratings table — two tables, members and ratings Workers Static Assets for the 396 headshot JPGs — total 52 MB, well inside the static assets limit R2 was investigated and rejected, because static assets are simpler and the photos are tiny That fourth query, about rate limiting, came hours later when the anti-abuse work started — more on that in a minute. But the headline is: the MCP let me skip the usual \u0026ldquo;what does Cloudflare even call this thing\u0026rdquo; tax. I didn\u0026rsquo;t have to debate KV-vs-D1 in my head; the docs answered it. By the end of the architecture session there was a working wrangler.jsonc, a schema, and a seed script — and the model had never written a line of pre-2024 Wrangler config from memory.\nIf you\u0026rsquo;re building anything on a cloud provider with an MCP-backed doc set, this is the workflow. Use it. The cost is the price of one tool call per question; the saving is not shipping last-year\u0026rsquo;s API surface.\nScraping parliament: 21 minutes, three problems, one nasty WAF Where do you get a list of all 396 National Assembly members? The parliament\u0026rsquo;s own site, of course: parliament.gov.za/group-details?chamber=2. 
(chamber=2 is the NA; chamber=3 is the NCOP, which I didn\u0026rsquo;t want.)\nI\u0026rsquo;d love to tell you this took an afternoon. But time-stamps on the chat transcripts say it took 21 minutes end to end, from first prompt to a working scrape.py returning 396 of 396 members. The reason it took that long and not five minutes is the three problems in a row.\nProblem 1: the empty shell I did the obvious thing first — curl. The response was a navigation skeleton with zero member data. Predictable in retrospect: the site is an AngularJS single-page app, and the member list is injected into the DOM after Angular boots. A plain HTTP fetch never sees the data.\nClaude\u0026rsquo;s first-take diagnosis from the transcript:\n\u0026ldquo;The site is JavaScript-rendered. Let me use the Chrome DevTools tools to inspect the actual page and find the underlying API.\u0026rdquo;\nThere was a brief detour chasing a phantom API. The browser\u0026rsquo;s network tab surfaced a PHP route at /themes/parliament/assets/ajax/foldertree.php — but it responded 200 OK with content-length: 0 and a request body of dir=%2Fmnt%2Fweb%2Fhtml%2Fundefined. The literal string undefined in a server path. Some long-departed contractor\u0026rsquo;s bug, faithfully calling itself every page load. Not an API.\nProblem 2: the AngularJS scope hack The breakthrough came from poking around in the live page. AngularJS attaches its data to DOM elements via scopes, and the whole member dataset turned out to be sitting on the .tabs-content element:\nconst scope = angular.element(document.querySelector(\u0026#39;.tabs-content\u0026#39;)).scope(); scope.members // { \u0026#39;a-d\u0026#39;: [...], \u0026#39;e-g\u0026#39;: [...], ..., \u0026#39;x-z\u0026#39;: [...] } scope.memberCount // \u0026#34;396\u0026#34; Seven alphabetical buckets, one object per member. No pagination, no API to reverse-engineer, no rate limiting to dance around. 
One page.evaluate() call from Playwright grabs the lot.\nEach member object had everything I needed:\n{ \u0026#34;id\u0026#34;: 6067, \u0026#34;full_name\u0026#34;: \u0026#34;Zelna Saira Abader\u0026#34;, \u0026#34;party\u0026#34;: \u0026#34;MK\u0026#34;, \u0026#34;province\u0026#34;: \u0026#34;Gauteng\u0026#34;, \u0026#34;national\u0026#34;: 0, \u0026#34;profile_pic_url\u0026#34;: \u0026#34;/storage/app/media/MemberImages/6067.jpg\u0026#34; } So the plan was: launch a real browser with Playwright, wait for Angular to settle, reach into the scope, pull everything, then download the JPGs from parliament.gov.za/storage/app/media/MemberImages/{id}.jpg. Easy.\nProblem 3: \u0026ldquo;Attack ID: 20000051\u0026rdquo; It was not easy. The first Playwright run came back not with member data but with a block page — \u0026ldquo;Web Page Blocked! Attack ID: 20000051\u0026rdquo;. The site sits behind a WAF (Imperva/Incapsula, by the look of the block-page signature) that detects stock headless browsers by fingerprint. It was happy to talk to my real Chrome during the DevTools poking; it slammed the door on Playwright the moment it tried.\nStandard headless-detection fingerprints. The fix is the well-known stealth dance:\nbrowser = await playwright.chromium.launch( headless=True, args=[\u0026#34;--disable-blink-features=AutomationControlled\u0026#34;], ) context = await browser.new_context() await context.add_init_script(\u0026#34;\u0026#34;\u0026#34; Object.defineProperty(navigator, \u0026#39;webdriver\u0026#39;, { get: () =\u0026gt; undefined }); Object.defineProperty(navigator, \u0026#39;plugins\u0026#39;, { get: () =\u0026gt; [1, 2, 3, 4, 5] }); \u0026#34;\u0026#34;\u0026#34;) That flag turns off Blink\u0026rsquo;s AutomationControlled feature, the switch that makes Chromium advertise itself as automated; the init script patches navigator.webdriver (the giveaway property the WAF was checking) and stuffs a fake plugins array. With those in place, the next run rendered 142 members. 
The run after that, all 396.\nThe whole arc — empty shell → API red herring → scope discovery → WAF block → bypass → 396/396 — fits in the transcript inside about a fifteen-minute window. The scraper hasn\u0026rsquo;t needed to change since. I can run uv run scrape.py every couple of months; it diffs the current list against the local one, downloads new MPs\u0026rsquo; photos, deletes departed ones, and rewrites members.json.\nThe vote endpoint was wide open With the data scraped, a Worker stood up, and a leaderboard rendering, I had a working site. I also had a glaring security hole.\nThe first version of the API was naive in the most boring way. POST /api/vote accepted a JSON body of { winner_id, loser_id } and incremented wins on the first and losses on the second. There was nothing tying that POST to a pair the server had actually offered. A single fetch() loop in any browser tab could send arbitrary winner/loser combos and run a candidate\u0026rsquo;s score to infinity in a minute.\nA few hours in I sent the prompt I should have sent up front:\nPlease review the current implementation of our ranking website. We want to make sure that … once it is deployed, we don\u0026rsquo;t have people gaming the system, or spamming it with bots.\nThe diagnosis came back blunt:\n\u0026ldquo;The vote endpoint is effectively wide open — there\u0026rsquo;s nothing tying a vote to a human, a real session, or even a pair the server actually offered.\u0026rdquo;\nI sat down and thought about it for a few minutes. The threat model is:\nBots spamming votes — easy if there\u0026rsquo;s no rate limit. Bots crafting arbitrary pairs — easy if /api/vote doesn\u0026rsquo;t check that the (winner, loser) pair was ever offered by /api/pair. Cross-origin abuse — easy if CORS is *. Each one has a cheap fix. I ended up with three:\nLayer 1: HMAC-signed pair tokens When the client asks /api/pair, the server picks two random members and returns them — plus a token. 
The token is a base64url-encoded JSON payload:\n{ w: 6067, // winner candidate id l: 4521, // loser candidate id n: nonce, // 16 random bytes, hex e: 1747... // expiry, unix seconds } …concatenated with an HMAC-SHA256 signature over the payload, keyed by a HMAC_SECRET stored as a Wrangler secret. On /api/vote, the worker:\nSplits the token into payload.sig Recomputes the HMAC and constant-time-compares Decodes the payload, checks the expiry Checks that the body\u0026rsquo;s winner_id/loser_id are exactly the w/l from the token (in either order — the user can pick whichever side they want) If anything fails, return 400. Now the only valid votes are ones the server itself just handed out, and they\u0026rsquo;re only valid for a short window. A bot can\u0026rsquo;t craft pairs; it can at best replay the one the server gave it.\nLayer 2: rate limit Cloudflare\u0026rsquo;s RATE_LIMITER binding does this in two lines of wrangler.jsonc and one await env.RATE_LIMITER.limit({ key: ip }) in the handler. I tuned the cap a few times. First pass was 10/minute, which I almost immediately bumped:\n\u0026ldquo;That changes the rate limit to be permissive for enthusiastic humans (one vote every 2 seconds is still faster than a person naturally clicks).\u0026rdquo;\nIt\u0026rsquo;s at 120/minute now. The point isn\u0026rsquo;t to stop bots dead — they can rotate IPs — it\u0026rsquo;s to make a single-IP attack uneconomic.\nLayer 3: CORS lock-down ALLOWED_ORIGIN was set to the production domain. No more *. POST cross-origin now eats a CORS error.\nWhat I cut: Turnstile I\u0026rsquo;d originally planned a fourth layer: Cloudflare Turnstile, the invisible-Captcha widget, verified server-side on every vote. I built the docs for it before I built the code, looked at the resulting flow, and ripped it back out. 
The HMAC pair-token plus a rate limit is enough for a site whose worst-case outcome is \u0026ldquo;DA\u0026rsquo;s leaderboard score is slightly inflated.\u0026rdquo; Turnstile is real friction on every vote for a benefit that\u0026rsquo;s mostly theatre at this scale. YAGNI.\nToken expiry, the UX kicker The token has a 60-second expiry. That number is the result of one piece of real-world pain. I left a tab open for ten minutes, came back, voted, got \u0026ldquo;Vote failed — please try again.\u0026rdquo; Tried again. Still failed. The token was dead.\nThe fix wasn\u0026rsquo;t to extend the expiry — long-lived signed tokens are worse, not better, for replay. The fix was to make the client treat any vote error as \u0026ldquo;fetch a new pair silently.\u0026rdquo; If the token\u0026rsquo;s stale, the user sees the next two MPs without ever seeing the failure. The server logs the failure regardless, so I can still tell from wrangler tail whether a single IP is repeatedly probing with bad tokens versus benign expired-tab traffic.\nWilson score, not Elo The first version of the leaderboard used Elo. It looked sophisticated; it was actually wrong for this problem.\nThe issue with Elo for a \u0026ldquo;rate things from a corpus of 396\u0026rdquo; task is that it\u0026rsquo;s designed for opponents who play many games. Most MPs in this dataset would have one or two head-to-head matchups before anyone looked at the leaderboard. Elo with two games of data is essentially random.\nI asked Claude about it directly:\n\u0026ldquo;I see you\u0026rsquo;re using Elo to rank members. Why do we use that instead of say a beta distribution with up and down votes?\u0026rdquo;\nThe honest answer was that Elo was the first thing it reached for. 
The Bayesian Beta posterior, or the Wilson lower bound, is the right tool for \u0026ldquo;is this success rate real or just noise from a small sample.\u0026rdquo; For 396 MPs with sparse vote counts, Wilson lower bound at 95% confidence is the answer:\nfunction wilsonScore(wins, total, z = 1.96) { if (total === 0) return 0; const p = wins / total; const z2 = z * z; return ( (p + z2 / (2 * total) - z * Math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)) / (1 + z2 / total) ); } Why I like it for this:\nA member with zero wins scores 0 regardless of how many losses, so they sink to the bottom without an arbitrary \u0026ldquo;minimum votes\u0026rdquo; threshold. A member with 10 wins, 0 losses scores around 0.72 — better than a member with 1-and-0 (who scores 0.21), because we trust the larger sample more. It degrades gracefully. Nobody dominates from a single lucky win. The leaderboard renders a dash (—) instead of 0% for members who haven\u0026rsquo;t won a matchup yet: a Wilson score of zero is technically correct, but \u0026ldquo;0%\u0026rdquo; reads as a value judgement when it\u0026rsquo;s really \u0026ldquo;we don\u0026rsquo;t have data yet.\u0026rdquo;\nThere\u0026rsquo;s also a bias fix in /api/pair. The naive version is \u0026ldquo;pick two random members.\u0026rdquo; Random pairing means popular candidates show up too often and obscure ones never. The version that ships picks 5 candidates at random, then takes the two with the fewest total matchups (wins + losses), then shuffles which side each is shown on. That keeps head-to-heads spread evenly across all 396 instead of clumping around whoever\u0026rsquo;s already at the top.\nWhat got tuned, what got cut A few smaller things worth mentioning:\nStatic assets, not R2. The 396 JPGs total 52 MB — well inside Workers Static Assets limits, no R2 needed. The MCP query \u0026ldquo;Workers static assets limits files size upload\u0026rdquo; was the one that settled this. Symlinks across the worker boundary. 
worker/public/images is a symlink to the repo\u0026rsquo;s ../../images directory. Wrangler follows the symlink in production. One source of truth, no copy step. Worth checking the symlink isn\u0026rsquo;t accidentally an empty directory after a fresh clone — mine was, once. TDD as a retrofit. The project\u0026rsquo;s CLAUDE.md now says \u0026ldquo;New features must be built with TDD.\u0026rdquo; I added that line after the MVP shipped, having noticed how many regressions a small test suite would have caught. The right time to write that line was at the start; the second-best time is when you\u0026rsquo;re closing the barn door. Deploy SHA in the version tag. I twice told myself I\u0026rsquo;d shipped a fix only to find the old code still serving. Now npm run deploy injects the current git SHA into the page as a \u0026lt;meta\u0026gt; tag, and the worker exposes it at /version. If the SHA in the browser doesn\u0026rsquo;t match the SHA in git, I haven\u0026rsquo;t deployed. Closing It runs on Cloudflare\u0026rsquo;s free tier — Workers, D1, Static Assets, the rate-limiter binding, all in. The data refresh is one command; the deploy is one command; the schema reset is one command.\nThe interesting question isn\u0026rsquo;t can you build this in a morning — of course you can, it\u0026rsquo;s a CRUD app over 396 rows. The interesting question is where did the time actually go. Most of it wasn\u0026rsquo;t the scraper or the Worker. It was the anti-abuse layer (designing it, then walking back the Turnstile bit), the leaderboard math, the bias fix, the UX polish, and the contrast pass to hit Lighthouse 100/100.\nIf I were doing it again, the lesson I\u0026rsquo;d write on a sticky note is: build the threat model before you build the leaderboard. A leaderboard implies a vote endpoint, and a vote endpoint is going to get hit by something dumb the moment it\u0026rsquo;s public. 
The pair-token + rate-limit pattern is two evenings of work if you do it last, and forty minutes if you do it first. The difference is in how confident you can be about the numbers you publish.\nVote on a few MPs while you\u0026rsquo;re here. Some of the matchups are genuinely hard.\nOne last thing: this post was assembled the same way the site was built. I handed Claude the chat transcripts from the project and asked it to read back through them and pull out what was actually interesting. The structure above — the four technical problems worth writing about — came out of that pass. The same tool that wrote the code also read the conversations about writing the code, and decided what was worth your time.\n","permalink":"https://blog.randomdomain.co.za/posts/2026/05/pick-a-president/","summary":"\u003cp\u003e\u003cem\u003eThis project was built entirely in Claude Code with the Cloudflare MCP server — not a line of code was written by hand. Given the low stakes — a novelty site about South African politics — the brief was simple: make it fun, make it fast, and see how far you get in a morning. To write this post, I handed Claude the project chat transcripts and asked it to pull out the parts worth explaining. What follows is the result.\u003c/em\u003e\u003c/p\u003e","title":"Building \"Who Should Be President?\" in a morning, on Cloudflare"},{"content":"My home fibre drops. Not constantly — but regularly enough that I\u0026rsquo;ve developed a habit of opening the router admin page when Netflix stalls, just to confirm what I already suspect.\nThe TP-Link ER605 I use is a decent small-business router: dual WAN, PPPoE on WAN1 for fibre, DHCP on WAN2 for a mobile backup. When the fibre goes down the router fails over silently. The backup connection works. 
Everything sort of continues — browsing is slower, calls drop, but nothing catastrophically fails.\nThe problem is I don\u0026rsquo;t know it happened until much later, and I can\u0026rsquo;t easily tell how long it was down for. The only real evidence is a log line buried in the router\u0026rsquo;s syslog:\n2 2026-02-23 12:35:14 PPPoE Client WARNING [7C-F1-7E-74-E5-52]: PPPoE \u0026lt;account\u0026gt;@isp.co failed to connect to the server because sending PADI times out. I wanted to fix this. I wanted an email the moment the fibre dropped, hourly reminders while it was still down, and a \u0026ldquo;connection restored\u0026rdquo; email with total downtime included. Could I write a script that polled the router?\nThis is the story of using Claude Code to figure out how — starting from nothing and ending, 35 minutes later, with working code.\nThere is no API The ER605 is part of TP-Link\u0026rsquo;s Omada ecosystem — the same lineup as the EAP access points and Omada SDN Controller software. If you run an Omada Controller, you get a proper REST API and alerting options. I\u0026rsquo;m not running a controller. The ER605 sits on its own in standalone mode, and in standalone mode you get a web UI and nothing else.\nTP-Link publishes no API documentation for the standalone web UI. There is a Python library on PyPI — tplinkrouterc6u — that claims to support TP-Link routers. We tried it first.\nFirst attempt: just use the library Claude Code installed tplinkrouterc6u and wrote a quick test:\nfrom tplinkrouterc6u import TplinkRouterProvider router = TplinkRouterProvider.get_client(\u0026#39;https://192.168.1.1\u0026#39;, \u0026#39;admin\u0026#39;, \u0026#39;\u0026lt;password\u0026gt;\u0026#39;) router.authorize() The traceback:\nssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: EE certificate key too weak (_ssl.c:1000) The phrase \u0026ldquo;key too weak\u0026rdquo; is the interesting part. 
A normal self-signed cert failure — expired cert, wrong hostname, untrusted CA — produces \u0026ldquo;self-signed certificate.\u0026rdquo; You can bypass that with verify=False. \u0026ldquo;Key too weak\u0026rdquo; is different: modern OpenSSL, at its default security level 2 (standard in Ubuntu 22.04+), refuses to complete an SSL handshake with any endpoint using less than a 2048-bit RSA key. The ER605\u0026rsquo;s self-signed cert uses a 1024-bit key. verify=False won\u0026rsquo;t help — the handshake won\u0026rsquo;t even start.\nClaude\u0026rsquo;s read at the time: \u0026ldquo;The SSL issue is that the cert key is \u0026rsquo;too weak\u0026rsquo; — not just untrusted. Setting verify=False won\u0026rsquo;t fix this; we need to lower the cipher security level.\u0026rdquo;\nAnd even past SSL, the library was a dead end. tplinkrouterc6u targets consumer TP-Link routers (Archer, Deco series) whose auth flow is completely different from the ER605\u0026rsquo;s. So we had two problems: fix SSL, then figure out the auth from scratch.\nBreaking through the SSL wall The fix is a custom HTTPAdapter for requests that builds its own SSL context with SECLEVEL=0, dropping the key strength requirement entirely:\nclass WeakSSLAdapter(HTTPAdapter): \u0026#34;\u0026#34;\u0026#34;Router uses a 1024-bit RSA cert which modern Python rejects.\u0026#34;\u0026#34;\u0026#34; def init_poolmanager(self, *args, **kwargs): ctx = create_urllib3_context() ctx.check_hostname = False ctx.verify_mode = ssl.CERT_NONE ctx.set_ciphers(\u0026#34;DEFAULT:@SECLEVEL=0\u0026#34;) kwargs[\u0026#34;ssl_context\u0026#34;] = ctx return super().init_poolmanager(*args, **kwargs) Mount this adapter on a requests.Session and Python will talk to 1024-bit cert endpoints again. It\u0026rsquo;s not great for general internet traffic, but for a local LAN connection to a known device it\u0026rsquo;s fine.\nProbing the web UI With SSL working, the next step was understanding the auth flow. 
I suggested Claude try the router\u0026rsquo;s login page directly: https://192.168.1.1/webpages/login.html. The HTML loaded. The JS bundles it referenced — /webpages/js/su/data/proxy.js and others — came back as:\n\u0026lt;h1\u0026gt;Not Found\u0026lt;/h1\u0026gt; Strange. The HTML itself loaded fine, but its static dependencies 404\u0026rsquo;d. The fix turned out to be a User-Agent header. The router\u0026rsquo;s embedded web server serves static files to browsers but 404s anything that doesn\u0026rsquo;t look like one. Adding User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 to all requests unlocked the JS files immediately. It\u0026rsquo;s presumably some kind of anti-scraping heuristic baked into the LuCI config, though it\u0026rsquo;s a flimsy one.\nOnce the JS was accessible, Claude downloaded password.js to a temp file and started reading.\nThe password widget reveals the auth scheme Inside password.js, which handles the login form\u0026rsquo;s password field:\nif ($.type(encrypt) == \u0026#34;function\u0026#34; \u0026amp;\u0026amp; check){ value = encrypt(value + (obj.withTimestamp ? \u0026#39;_\u0026#39; + $.su.locale.uptime : \u0026#39;\u0026#39;), param); }; Two discoveries in eight lines of code:\nFirst: the password isn\u0026rsquo;t sent in plaintext. It\u0026rsquo;s passed to $.su.encrypt with a parameter called param — which, looking higher up in the file, was [\u0026quot;n\u0026quot;, \u0026quot;e\u0026quot;]. RSA encryption using public key components fetched from the server.\nSecond: when withTimestamp is set, the plaintext being encrypted isn\u0026rsquo;t just the password — it\u0026rsquo;s password_\u0026lt;uptime\u0026gt;. The uptime value comes from $.su.locale.uptime. So somewhere there must be a /locale endpoint that returns the router\u0026rsquo;s current uptime, and the encrypted payload changes every second because the uptime changes every second. 
A replay attack won\u0026rsquo;t work.\nFinding the uptime endpoint once we knew to look for it: POST /cgi-bin/luci/;stok=/locale?form=lang.\nWhich uncovered the next complication.\nTwo request formats, no documentation Almost every endpoint in the ER605\u0026rsquo;s API takes a data field containing a JSON blob:\nresp = session.post(url, data={\u0026#34;data\u0026#34;: json.dumps({\u0026#34;method\u0026#34;: \u0026#34;get\u0026#34;, \u0026#34;params\u0026#34;: {...}})}) The locale endpoint is the exception. It takes raw form-encoded data:\nresp = session.post(url, data=\u0026#34;operation=read\u0026#34;, headers={\u0026#34;Content-Type\u0026#34;: \u0026#34;application/x-www-form-urlencoded\u0026#34;}) We hit this because JSON-formatted requests to the locale endpoint returned errors. Switching to raw form data worked. The two formats ended up as separate helpers in the client class, _post_json and _post_form, called where appropriate:\ndef _post_json(self, path, data): \u0026#34;\u0026#34;\u0026#34;POST with JSON-wrapped data payload (most endpoints).\u0026#34;\u0026#34;\u0026#34; resp = self.session.post(self._url(path), data={\u0026#34;data\u0026#34;: json.dumps(data)}) resp.raise_for_status() return resp.json() def _post_form(self, path, data): \u0026#34;\u0026#34;\u0026#34;POST with form-encoded data (locale endpoint).\u0026#34;\u0026#34;\u0026#34; resp = self.session.post( self._url(path), data=data, headers={\u0026#34;Content-Type\u0026#34;: \u0026#34;application/x-www-form-urlencoded\u0026#34;}, ) resp.raise_for_status() return resp.json() The RSA encryption: no padding The next find was in encrypt.js. When you see \u0026ldquo;RSA encryption,\u0026rdquo; the default assumption is PKCS#1 v1.5 padding — the format Python\u0026rsquo;s pycryptodome implements in Crypto.Cipher.PKCS1_v1_5. The ER605 doesn\u0026rsquo;t use it.\nThe router\u0026rsquo;s encrypt.js implements a custom nopadding() function. 
It UTF-8 encodes the plaintext, zero-pads the byte array to the key length, converts that to a big integer, and runs raw modular exponentiation: c = pow(m, e, n). No randomness, no padding scheme. Just textbook RSA applied directly to the zero-padded plaintext.\nUsing PKCS1_v1_5 would produce a ciphertext the router would silently reject as a wrong password, with no indication that the crypto format was the problem. Instead, Claude replicated the custom scheme directly in Python:\ndef _rsa_encrypt(self, plaintext, n_hex, e_hex): \u0026#34;\u0026#34;\u0026#34;RSA encrypt with no-padding (matches router\u0026#39;s encrypt.js).\u0026#34;\u0026#34;\u0026#34; n = int(n_hex, 16) e = int(e_hex, 16) key_len = (n.bit_length() + 7) \u0026gt;\u0026gt; 3 # UTF-8 encode then zero-pad to key length ba = [] for ch in plaintext: c = ord(ch) if c \u0026lt; 128: ba.append(c) elif c \u0026lt; 2048: ba.append((c \u0026amp; 63) | 128) ba.append((c \u0026gt;\u0026gt; 6) | 192) else: ba.append((c \u0026amp; 63) | 128) ba.append(((c \u0026gt;\u0026gt; 6) \u0026amp; 63) | 128) ba.append((c \u0026gt;\u0026gt; 12) | 224) while len(ba) \u0026lt; key_len: ba.append(0) m = int.from_bytes(ba, byteorder=\u0026#34;big\u0026#34;) c = pow(m, e, n) return format(c, \u0026#34;x\u0026#34;).zfill(256) It\u0026rsquo;s not much code. The entirety of the crypto lives in pow(m, e, n). Why would a router vendor write a custom no-padding RSA scheme? Most likely: someone needed to encrypt a short string in a browser context without adding a crypto dependency, saw that textbook RSA could be written in a dozen lines of JavaScript, and never thought about the consequences of deterministic encryption. The 1024-bit key is probably the same decision — small enough to generate in the browser, before anyone was paying attention to key sizes.\nIncidentally, pycryptodome shows up in the project\u0026rsquo;s dependencies (as a transitive requirement from tplinkrouterc6u) but is never used in our code. 
We didn\u0026rsquo;t need it.\nFinding the status endpoint With a working login that returned a session token (stok), the next problem was finding an endpoint that returned WAN interface data.\nThe obvious guesses produced a consistent response:\n{\u0026#34;id\u0026#34;: 1, \u0026#34;error_code\u0026#34;: \u0026#34;1014\u0026#34;} 1014 is the router\u0026rsquo;s \u0026ldquo;no such endpoint\u0026rdquo; code. We tried variations: /admin/wan?form=status, /admin/interface?form=status, /admin/network?form=wan_status — all returned 1014.\nThe solution was to look at the web UI pages that display WAN status in a browser. Each page under /webpages/pages/userrpm/ contains inline JavaScript with calls shaped like $.su.url(\u0026quot;/admin/...\u0026quot;) that reveal the actual API paths the page uses. Fetching interface_wan.html with a browser User-Agent and grepping for those calls produced:\n/admin/interface?form=status2 Note the trailing 2. There\u0026rsquo;s presumably an older v1 endpoint the firmware kept around for compatibility, and this is the current one. The generalizable lesson here: when reversing a web-app-style admin UI, the endpoint you want is almost always referenced in the page that displays the data. Read the source of that page before guessing endpoint names.\nThe login payload: a Lua error helps One more footgun before everything worked together. The login POST body must be structured as:\n{ \u0026#34;method\u0026#34;: \u0026#34;login\u0026#34;, \u0026#34;type\u0026#34;: \u0026#34;default\u0026#34;, \u0026#34;params\u0026#34;: { \u0026#34;username\u0026#34;: \u0026#34;admin\u0026#34;, \u0026#34;password\u0026#34;: \u0026#34;\u0026lt;encrypted\u0026gt;\u0026#34; } } The params wrapper around username and password is not optional. Send those fields at the top level and the router returns:\nattempt to index field \u0026#39;params\u0026#39; (a nil value) That\u0026rsquo;s a raw Lua traceback leaking through the JSON API. 
The router\u0026rsquo;s backend is OpenWrt\u0026rsquo;s Luci framework with Lua handler scripts, and the handler was doing request.params.username without a nil check. Our malformed JSON exposed the internals. In hindsight this is a useful diagnostic: a Lua error in a JSON response tells you exactly what you\u0026rsquo;re talking to, and roughly how the request parsing works on the other side.\nThe first successful run After about 35 minutes of work — most of it reading JavaScript files — Claude ran the first real query. The output:\nInterface Type Status IP Address Gateway ---------------------------------------------------------------------- WAN (WAN1) pppoe DOWN - - WAN/LAN1 (WAN2) dhcp UP 192.168.2.101 192.168.2.1 LAN (LAN1) static UP 192.168.1.1 - The PPPoE connection was actually down at the moment we ran the script for the first time. The monitor worked, and it confirmed the original complaint, in the same instant. The fibre had been dropping. The script could see it.\nBuilding the monitor The actual monitoring logic is straightforward once you can query the router. monitor.py runs in a loop, polling every five minutes:\nLogin to the router, query /admin/interface?form=status2, logout. Write each interface\u0026rsquo;s status to a daily CSV file (logs/YYYY-MM-DD.csv). Track PPPoE state transitions: send an immediate email when it goes DOWN, hourly reminders while it stays down, and a \u0026ldquo;restored\u0026rdquo; email when it comes back — including how long it was out. Persist state to state.json so the script survives a restart mid-outage. The state persistence led to one useful addition: a recover_state_from_logs() function that scans the CSV history to reconstruct the last known PPPoE state if state.json is missing. If the process is restarted and we have no state, we need to know whether the current \u0026ldquo;DOWN\u0026rdquo; just started or has been going on for six hours.\nThe script runs as a systemd user service. 
One non-obvious detail: the service needs PYTHONUNBUFFERED=1 in its environment. Without it, Python\u0026rsquo;s output buffering swallows print() calls and nothing appears in journalctl. The symptom — an empty log when the service is clearly running — is confusing until you know what to look for.\nEmail goes through Gmail SMTP with an app password. The down and reminder emails include the ISP\u0026rsquo;s support contact details, because that\u0026rsquo;s the information you actually need during an outage and you want it in the same notification.\nWhat I learned doing this with Claude Code A few things about this project worth generalizing:\nI contributed two hints. The reverse engineering required two pieces of information from me: the URL to start from (https://192.168.1.1/webpages/login.html), and the router password. Everything else — the SSL issue and its fix, the User-Agent requirement, the password.js discovery, the timestamp scheme, the RSA no-padding, the two request formats, the params wrapper, the correct endpoint — Claude figured out by reading the device\u0026rsquo;s own code. I was more navigator than pilot.\nLLMs are surprisingly good at \u0026ldquo;read this and tell me what it does.\u0026rdquo; The bottleneck in reversing an admin API like this isn\u0026rsquo;t network probing — it\u0026rsquo;s understanding the auth flow. The ER605 made this tractable because it ships its crypto and form logic as readable browser JavaScript. Claude read password.js, recognised the RSA pattern, fetched encrypt.js, saw the custom no-padding scheme, and replicated it in Python without me having to understand the underlying math. This generalizes: any time your reverse engineering target is a web app that ships its JS unminified, an LLM can do a lot of the reading work.\nFailures were diagnostic, not destructive. An LLM trying wrong endpoint names and getting 1014 errors isn\u0026rsquo;t wasted effort — it\u0026rsquo;s narrowing the search space. 
The Lua traceback from the malformed login payload told us exactly what to fix. The router couldn\u0026rsquo;t be broken from our probing; we just got error codes back. This is a better failure mode than, say, reversing a binary where a wrong guess means a segfault.\nDocument at the end of the session. At the end of the 35-minute session, Claude wrote a CLAUDE.md file in the project directory summarising all the API gotchas: the no-padding RSA, the params wrapper, the SECLEVEL issue, how to find new endpoints by reading the UI pages. Three months later, that file was the source of truth for this blog post. If you\u0026rsquo;re using Claude Code for exploration or reverse engineering, have it write down what it found before you close the session. The knowledge is perishable if it only lives in a terminal.\nOne honest limit: this worked because the router ships readable JavaScript. A device that compiled its admin UI to WebAssembly, or that handled auth in a native binary outside the browser, would be significantly harder. The TP-Link engineering team did us a favour by writing their crypto in plain JS.\nThree months later The script has been running on an Intel NUC on the same LAN since February. It\u0026rsquo;s produced 56 daily CSV logs and sent alerts for half a dozen outages — some brief, some long enough that the ISP contact details in the reminder email actually got used.\nAs I write this, state.json shows:\n{ \u0026#34;pppoe_up\u0026#34;: false, \u0026#34;down_since\u0026#34;: \u0026#34;2026-05-14T14:00:49\u0026#34;, \u0026#34;last_down_email\u0026#34;: \u0026#34;2026-05-16T14:33:19\u0026#34; } The fibre has been down for two days. The monitor has been emailing me about it every hour. The ISP is presumably working on it.\n","permalink":"https://blog.randomdomain.co.za/posts/2026/05/router-api/","summary":"\u003cp\u003eMy home fibre drops. 
Not constantly — but regularly enough that I\u0026rsquo;ve developed a habit of opening the router admin page when Netflix stalls, just to confirm what I already suspect.\u003c/p\u003e\n\u003cp\u003eThe TP-Link ER605 I use is a decent small-business router: dual WAN, PPPoE on WAN1 for fibre, DHCP on WAN2 for a mobile backup. When the fibre goes down the router fails over silently. The backup connection works. Everything \u003cem\u003esort of\u003c/em\u003e continues — browsing is slower, calls drop, but nothing catastrophically fails.\u003c/p\u003e\n\u003cp\u003eThe problem is I don\u0026rsquo;t \u003cem\u003eknow\u003c/em\u003e it happened until much later, and I can\u0026rsquo;t easily tell how long it was down for. The only real evidence is a log line buried in the router\u0026rsquo;s syslog:\u003c/p\u003e\n\u003cpre tabindex=\"0\"\u003e\u003ccode\u003e2 2026-02-23 12:35:14 PPPoE Client WARNING [7C-F1-7E-74-E5-52]: PPPoE \u0026lt;account\u0026gt;@isp.co failed\nto connect to the server because sending PADI times out.\n\u003c/code\u003e\u003c/pre\u003e\u003cp\u003eI wanted to fix this. I wanted an email the moment the fibre dropped, hourly reminders while it was still down, and a \u0026ldquo;connection restored\u0026rdquo; email with total downtime included. Could I write a script that polled the router?\u003c/p\u003e\n\u003cp\u003eThis is the story of using Claude Code to figure out how — starting from nothing and ending, 35 minutes later, with working code.\u003c/p\u003e","title":"Reverse-engineering my router's secret API so an AI could watch it"},{"content":"In June 2020, the Constitutional Court handed down New Nation Movement NPC and Others v President of the Republic of South Africa and Others [2020] ZACC 11. 
The order, in the part everyone quotes, says this:\nIt is declared that the Electoral Act 73 of 1998 is unconstitutional to the extent that it requires that adult citizens may be elected to the National Assembly and Provincial Legislatures only through their membership of political parties.\nBut neither the court, nor anyone else, actually read the Electoral Act.\nThe declaration was suspended for 24 months. Parliament missed the deadline. There was an extension. Then another extension. Then the Electoral Amendment Act 1 of 2023 was rushed onto the statute book. Then One Movement South Africa NPC v President of the Republic of South Africa [2023] ZACC 42 struck down parts of that. Then the Electoral Matters Amendment Act 2024 happened. Then we had an election in 2024 and almost no independent candidate got anywhere near a seat.\nIt was, by any measure, a saga.\nIt was also — and this is the bit I want to argue — completely unnecessary. The Electoral Act, in the schedule that everyone was busy declaring unconstitutional, already told you exactly how to be an independent candidate. You just had to read it. The Court did not. The applicants did not. The Minister did not. The IEC did not. Nobody asked, in the technical sense the statute itself required them to ask, what a \u0026ldquo;party\u0026rdquo; actually is.\nSo before we get to the punchline — which involves a Khoisan princess and an enormous court-built scaffold — let me walk you through how we got here.\nThe cast of characters The case began in the Western Cape High Court as New Nation Movement NPC and Others v President of the Republic of South Africa and Others (Case No 17223/18). The applicants were:\nNew Nation Movement NPC — a registered not-for-profit, the lead applicant. 
Chantal Dawn Revell — described in her own founding affidavit as \u0026ldquo;a Princess of the Korona Royal Household which is one of the five official Royal Priesthoods of the Khoe and the San First Nations\u0026rdquo; (WCHC judgment, para 4). She is the Khoisan princess of this story. Mediation Foundation for Peace and Justice — present at the High Court, dropped out by the time the matter reached the Constitutional Court. GRO — a small civil-society outfit. Indigenous First Nation Advocacy SA PBP (FNASA) — another civic body. Cited as respondents were the President, the Minister of Home Affairs, the Electoral Commission, and the Speaker of the National Assembly. (The NCOP was later roped in.) The President and Speaker effectively abided the court\u0026rsquo;s decision; the Minister and the IEC opposed.\nThe complaint was simple to state and, on its face, hard to argue with: the Electoral Act 73 of 1998 forces every aspirant to public office at national and provincial level to do so as a member of a political party. There is no legal route for an individual to stand alone. That, said the applicants, is unconstitutional because it limits the right under section 19(3)(b) of the Constitution — \u0026ldquo;Every adult citizen has the right \u0026hellip; to stand for public office and, if elected, to hold office\u0026rdquo; — and the right under section 18 to freedom of association (which, said the applicants, includes the right not to associate).\nStop one: the Western Cape High Court (Desai J, 17 April 2019) The application was urgent — the 2019 general election was three weeks away on 8 May 2019. The High Court had little patience.\nDesai J\u0026rsquo;s judgment is brisk. He dismissed the application, reasoning along these lines:\nSection 19(3)(b) is silent. It says \u0026ldquo;Every adult citizen has the right \u0026hellip; to stand for public office\u0026rdquo;. 
It does not say \u0026ldquo;as an independent candidate as opposed to a member of a political party\u0026rdquo;. A textual reading does not give the applicants what they want.\nThe Constitution as a whole points away. Section 1(d) declares South Africa a state founded on \u0026ldquo;a multi-party system of democratic government\u0026rdquo;. Sections 46(1)(a) and 105(1)(a) leave the design of the electoral system to \u0026ldquo;national legislation\u0026rdquo;, subject to the requirement in sections 46(1)(d) and 105(1)(d) that the system \u0026ldquo;results, in general, in proportional representation\u0026rdquo;. The party-list system the Electoral Act adopts is the result of that legitimate parliamentary choice.\nMs Revell\u0026rsquo;s reasons for not joining a party are \u0026ldquo;rather tenuous\u0026rdquo;. Desai J was particularly unimpressed with the second applicant\u0026rsquo;s affidavit. She said she did not want to belong to a political party because she had no confidence in any of them, and because the Royal Houses she represented had committed themselves to political non-partisanship. That explanation, the judge said with a verbal shrug, \u0026ldquo;warrants little comment and is hardly compelling\u0026rdquo;. She could join or form a party; she simply elected not to.\nMogoeng CJ\u0026rsquo;s \u0026ldquo;dictum\u0026rdquo; in My Vote Counts II was obiter and not binding. The applicants had leaned heavily on a passage in My Vote Counts NPC v Minister of Justice and Correctional Services [2018] ZACC 17, where the then-Chief Justice said \u0026ldquo;every adult citizen may in terms of the Constitution stand as an independent candidate to be elected to municipalities, Provincial Legislatures or the National Assembly\u0026rdquo;. 
Desai J dismissed it as \u0026ldquo;quite patently obiter\u0026rdquo; — and worse, said it was directly contradicted by the Ramakatsa statement that \u0026ldquo;the Constitution itself obliges every citizen to exercise the franchise through a political party\u0026rdquo; (drawing on Ramakatsa v Magashule [2012] ZACC 31).\nThe right does not exist, in other words. Parliament was entitled to choose a closed-list PR system. Once it had chosen, that choice was not unconstitutional.\nThe application was dismissed. The applicants raced to the Constitutional Court on direct appeal.\nStop two: the urgency hearing (ZACC 27/2019, 2 May 2019) The Constitutional Court, on the eve of the 2019 election, heard argument only on whether the matter was urgent. It concluded that it was not — six days from polling day was not a sensible time to be redrawing the electoral system — and postponed the merits hearing to 15 August 2019. The 2019 election proceeded under the existing party-list system. Nobody — least of all the urgent applicants — managed to vote for an independent candidate. The case lived to fight another day.\nStop three: the main Constitutional Court judgment (ZACC 11/2020, 11 June 2020) This is the judgment that everyone cites. Madlanga J wrote the majority (with Cameron J, Jafta J, Khampepe J, Mathopo AJ, Mhlantla J, Theron J and Victor AJ concurring). Jafta J wrote a concurrence on slightly different grounds. Froneman J dissented.\nThe majority\u0026rsquo;s reasoning, stripped of footnotes, runs like this.\nSection 19 is broader than its bullet points Section 19(1) says \u0026ldquo;Every citizen is free to make political choices, which includes the right—(a) to form a political party; (b) to participate in the activities of, or recruit members for, a political party; and (c) to campaign for a political party or cause.\u0026rdquo; The word \u0026ldquo;includes\u0026rdquo;, Madlanga J said, is not exhaustive — the listed choices are examples. 
\u0026ldquo;A conscious choice not to form or join a political party is as much a political choice as is the choice to form or join a political party; and it must equally be deserving of protection\u0026rdquo;.\nSection 18\u0026rsquo;s negative right Section 18 — \u0026ldquo;Everyone has the right to freedom of association\u0026rdquo; — was read to include the negative right not to associate. Madlanga J spent a chunk of paragraphs on European and African human-rights jurisprudence — Young, James and Webster v UK; Sigurjónsson v Iceland; Chassagnou v France; Tanganyika Law Society v Tanzania — to build the proposition that the right to freedom of association protects the right not to be coerced into joining one. Forcing someone to join a party in order to access the political process is exactly that kind of coercion.\nReconciling section 19(3)(b) with the rest of the Constitution The High Court — and the respondents — had said section 1(d), section 46(1)(a), section 105(1)(a) and section 157(2)(a) all pointed to a party-only system. Madlanga J accepted that section 157(2)(a) (governing municipalities) genuinely does require an exclusively-party system at municipal level, but treated this as a \u0026ldquo;discrete and narrow limitation\u0026rdquo; peculiar to local government — flowing from the unique negotiated history around municipal transformation. The general right under section 19(3)(b) remained the governing norm everywhere else. And the apparent tension with the transitional Schedule 2 arrangements was, by 2020, spent: those arrangements were always meant to be temporary.\nJustification under section 36 The Court asked whether the limitation was reasonable and justifiable. The Minister had filed no real justification evidence in the High Court — his answering affidavits \u0026ldquo;make no attempt to deal with the question of justification\u0026rdquo;. 
What was offered in argument compared the existing system to a single Bill that had been introduced and not adopted, which \u0026ldquo;says nothing about why the exclusion of independent candidates by the Electoral Act is justified\u0026rdquo;. The state had failed to discharge the onus on it. The limitation was not justified.\nRemedy The Court declared the Electoral Act unconstitutional \u0026ldquo;to the extent that it requires that adult citizens may be elected to the National Assembly and Provincial Legislatures only through their membership of political parties\u0026rdquo;, and suspended the declaration for 24 months to allow Parliament to remedy the defect.\nWhat the majority leaned on I want to be fair to Madlanga J. The judgment is well-written and the textual argument about section 19(1) — \u0026ldquo;includes\u0026rdquo; is not \u0026ldquo;is\u0026rdquo; — is genuinely good. The negative-right-to-associate reasoning is supported by a respectable body of international case law. The point that the Minister did not bother to defend the limitation on section 36 grounds is, frankly, the most damning thing about the respondents\u0026rsquo; case.\nBut the judgment never engages, at all, with how the Electoral Act actually allocates seats. It never goes inside the schedule. It never asks how a \u0026ldquo;party\u0026rdquo; is defined for purposes of contesting an election. It never asks whether the impugned system might already contemplate parties that look indistinguishable from independent candidates. It treats \u0026ldquo;party\u0026rdquo; as if it must necessarily mean a mass-membership organisation rather than a legal vehicle.\nThat is the bit I want to come back to. Hold the thought.\nStop four: the extensions (ZACC 24/2022 and ZACC 12/2023) Parliament, in the unsurprising way of Parliaments, missed the 24-month deadline. The Minister came back to the Court asking for an extension. 
The Court granted one in Minister of Home Affairs v New Nation Movement NPC [2022] ZACC 24, and then a further one in [2023] ZACC 12. The Electoral Amendment Act 1 of 2023 was eventually enacted in April 2023.\nWhat the Electoral Amendment Act did, in short, was bolt independent candidates onto the existing party-list architecture. A new section 31B was inserted, requiring independent candidates to gather a fixed quantity of supporting signatures before they could nominate. The seat-allocation formulas in Schedule 1A were rewritten so that independent candidates\u0026rsquo; votes counted toward filling the regional half, but not the national-list \u0026ldquo;compensatory\u0026rdquo; half, of the National Assembly\u0026rsquo;s 400 seats. The architecture remained recognisably a closed-list PR system; independent candidates were a graft, not a redesign.\nStop five: One Movement (ZACC 42/2023, 4 December 2023) Predictably, the graft was challenged. One Movement South Africa NPC v President [2023] ZACC 42 came directly to the Constitutional Court. Zondo CJ wrote the lead judgment; Kollapen J wrote separately on the signature challenge; Theron J dissented in part.\nTwo issues mattered:\nThe signature requirement. The Electoral Amendment Act required independents to gather signatures equal to 15% of the previous-election quota in each region they wished to contest. The Court held this was an unjustifiable limitation on the rights to associate, to make political choices, and to stand for office. The remedy was a reading-in: 1,000 signatures per region became the interim threshold, with Parliament given 24 months to fix it.\nThe recalculation challenge. OSA also attacked items 7, 12 and 23 of Schedule 1A — the provisions governing forfeiture, surplus distribution and recalculation of seats when independents are in the mix. 
The applicant said these put independent candidates at a competitive disadvantage compared to political parties because, where parties could share the spoils of \u0026ldquo;surplus\u0026rdquo; votes, independents could not. The Court (majority) dismissed this challenge. The basic structural choice — that votes for an independent could not be recycled across regions or topped up from the national list — was held to be a defensible legislative choice flowing from the very nature of an independent candidacy.\nNotice what was being argued and what was being assumed. Everyone — applicants, respondents, the Court — was treating Schedule 1A as the thing that needed new rules to accommodate independent candidates. Nobody asked whether the original Schedule 1A had, in its quiet bureaucratic way, already done so.\nThe bit I really want to talk about In 2003, by section 25 of the Electoral Laws Amendment Act 34 of 2003, Parliament inserted Schedule 1A into the Electoral Act. Schedule 1A is the seat-allocation engine. It is dense, formulaic, and reads like a chartered accountant\u0026rsquo;s nightmare. Buried inside it is item 7(1). I will quote it in full, because the whole argument turns on it:\n7. (1) If a party has submitted a national or a regional list containing fewer names than the number of its provisional allocation of seats which would have been filled from such list in terms of item 8 or 9 had such provisional allocation been the final allocation, it forfeits a number of seats equal to the deficit.\nRead that again. Slowly.\nWhat it says is: if a party has, say, three names on its list, and its share of the vote provisionally entitles it to five seats, it does not lose its three seats. It loses only the deficit — the two seats it cannot fill because it has run out of names. The three names get seated. The two extra seats are redistributed among the other parties under the recalculation rules in items 7(2) and 7(3).\nNow consider the limiting case. 
Suppose a party — and remember, the Electoral Act, as a piece of statutory drafting, defines \u0026ldquo;registered party\u0026rdquo; as a party registered in terms of section 15 of the Electoral Commission Act, a provision which, throughout all of this litigation, was never itself impugned — fields a list with one name. Suppose that name\u0026rsquo;s share of the national vote, when it is run through the items 5 and 6 quotas, produces a provisional allocation of, say, four seats.\nWhat does item 7(1) say happens? The single named candidate takes their seat. The other three seats are forfeited and recalculated. Job done. Our one-name party is now represented in the National Assembly by the single human being whose name was on the list.\nA \u0026ldquo;party of one\u0026rdquo; is, in any sense that matters to the person actually being elected, an independent candidate.\nThe Electoral Act, in other words, already contained a path by which an individual could stand for the National Assembly without joining anyone else\u0026rsquo;s organisation. They simply had to register a vehicle — a \u0026ldquo;party\u0026rdquo;, in the dry administrative sense of the term — with their name on the list. The vehicle could have one member. It could have one name on its list. The forfeiture rule in item 7(1) would catch any \u0026ldquo;excess\u0026rdquo; seats and redistribute them. The candidate would still be seated.\nThis is not a strained or formalistic reading — it is what item 7(1) says. Is the result any stranger than the post-New Nation architecture, where independent candidates have to gather 1,000 signatures per region and where their \u0026ldquo;surplus\u0026rdquo; votes are essentially binned? Hardly.\nDid the applicants make this argument? No.\nDid the Constitutional Court engage with it? Also no.\nThe High Court came closest. 
Desai J said, almost in passing, at paragraph 7 of his judgment: \u0026ldquo;The First Applicant registered or intended to register a political party. If that party wins enough votes in the national election, any member of that party could be elected to public office.\u0026rdquo; He used this to suggest the applicants did not actually need the relief they sought. He used it badly — because forming a multi-member party is not the same as standing alone — but the seed of the point was right there in his judgment, and nobody on appeal picked it up.\nMadlanga J\u0026rsquo;s majority opinion is 128 paragraphs long. The word \u0026ldquo;Schedule\u0026rdquo; appears, but Schedule 1A — the operative mechanism that actually translates votes into seats — never gets a proper unpacking. The Court took it as a given that the existing system required party membership for a seat. It then spent paragraph after paragraph proving, with international comparators and constitutional values, that requiring party membership was bad. It pulled values out of section 19, out of section 18, out of section 1(a), out of August, out of Ramakatsa. It produced lovely jurisprudence on negative rights of association. It cited Alexis de Tocqueville.\nWhat it never did was read the formula. Had any of the parties run the numbers through Schedule 1A, they would have noticed that the system was already perfectly capable of seating a single human being whose name appeared on a registered party\u0026rsquo;s list. The list could have one entry. The \u0026ldquo;party\u0026rdquo; could be a paper-thin legal vehicle. The seat would still be the seat.\nThe Constitutional Court did not, as it must when interpreting a statute, read the statute. It read its own values and announced that the right was there because it had to be there. 
It then, in deference to Parliament, suspended the declaration so Parliament could go and build the very thing the statute, on a less squeamish reading, already had.\nThis is what I mean when I say the Court sucked values out of the air. There was no need to. They were sitting in item 7(1) of a Schedule that had been on the statute book since 2003.\nThere is a serious version of this critique that is worth stating without the snark.\nWhen a court is asked to declare a statute unconstitutional for what it fails to provide, the court\u0026rsquo;s first duty is to read the statute the way Parliament drafted it and ask whether the alleged gap really is a gap. If a workable interpretation already exists within the four corners of the Act, the court should prefer it. That is the avoidance canon; it is also the rule of constitutional subsidiarity, and the rule about reading-down statutes to save them. The Constitutional Court itself has said as much, repeatedly, in Investigating Directorate: Serious Economic Offences v Hyundai Motor Distributors (Hyundai) and in countless cases since.\nIn New Nation Movement, the Court did not perform that exercise on Schedule 1A. It accepted at face value that the Electoral Act required party membership, even though the seat-allocation mechanics did not in fact require it. It then leapt to the constitutional values level and declared the Act invalid. The remedy was not a reading-in or a reading-down. It was a strike-down, suspended, on the basis that Parliament would build the missing thing from scratch.\nYou can think this is fine — that the right ought to be enforced at the constitutional level even if a workable statutory route exists. You can also think this is a missed opportunity to do less judicial damage. 
I think it is the latter.\nThe final irony After all of this — five judgments, two extensions, two Amendment Acts, and hundreds of pages of jurisprudence on the negative right to associate — when the 2024 general election finally rolled around, our Khoisan princess, Chantal Dawn Revell, the second applicant whose case carried the whole challenge from the Western Cape High Court to the Constitutional Court and back, did not even stand as an independent candidate.\nThe principal beneficiary of the right she had fought five years to establish did not exercise it. Not because she had to — of course she didn\u0026rsquo;t, that is precisely the negative-right-to-associate point Madlanga J spent so much energy defending — but because the entire constitutional architecture of independent candidacy in South Africa was built around an asserted need that, in the end, the person asserting it did not personally have.\nThe plumbing did not need to be ripped out. The plumbing already worked. The Constitutional Court walked into a perfectly serviceable bathroom, declared it unconstitutional, and told Parliament to rebuild it from a bare slab. Parliament obliged, badly, with a graft of signature thresholds and recalculation rules that promptly got challenged again in One Movement.\nMeanwhile, the Electoral Act, in item 7(1) of Schedule 1A, had quietly been saying for two decades: if you want to be a party of one, just put one name on the list. The forfeiture rule will tidy up the rest.\nWas there precedent for the notion of a \u0026ldquo;party of one\u0026rdquo;? Yes — and it predates all of this by almost two years. Even Brandi Carlile knew a party of one could exist:\nA party of one. It was always there. 
Nobody asked.\n","permalink":"https://blog.randomdomain.co.za/posts/2026/05/party-of-one/","summary":"\u003cp\u003eIn June 2020, the Constitutional Court handed down \u003cem\u003eNew Nation Movement NPC and Others v President of the Republic of South Africa and Others\u003c/em\u003e [2020] ZACC 11. The order, in the part everyone quotes, says this:\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003eIt is declared that the Electoral Act 73 of 1998 is unconstitutional to the extent that it requires that adult citizens may be elected to the National Assembly and Provincial Legislatures only through their membership of political parties.\u003c/p\u003e\u003c/blockquote\u003e\n\u003cp\u003eBut neither the court nor anyone else actually read the Electoral Act.\u003c/p\u003e","title":"Party of One - How the Constitutional Court manufactured a right to be an independent candidate"},{"content":"A Political Critique of Maya CJ’s Judgment in EFF v Speaker (CCT 35/24)\nThe first judgment in Economic Freedom Fighters and Another v Speaker of the National Assembly and Others [2026] ZACC 17 is being read, in the public conversation, as a vindication of constitutional accountability against a recalcitrant parliamentary majority. That reading is comfortable, and it is wrong. It is comfortable because it tracks an emotionally satisfying narrative: a powerful President accused of corruption, a governing party closing ranks, and a Chief Justice who refuses to let the matter rest. It is wrong because it mistakes the form of Maya CJ’s judgment for its substance. The substance, prised free of its rhetorical packaging, is a piece of constitutional politics. The first judgment delivers, by judicial means, a censure of President Ramaphosa that the National Assembly declined to deliver politically. It is not the role of the Constitutional Court to be the appeals branch of a defeated opposition motion. 
Maya CJ’s judgment treats the Court as though it were.\nThree features of the first judgment, taken together, identify what kind of judgment it is. The first is its prosecutorial scene-setting: the lavish recitation of the Phala Phala material before the constitutional question has been formally posed. The second is its construction of an accountability obligation in section 89 that the section’s text does not contain and that its grammar cannot bear. The third is the asymmetric remedy ordered by the Court — a positive Panel finding binds the National Assembly, a negative finding does not. No principled constitutional theory unifies those three features. They are unified by the political result they were designed to deliver. The Chief Justice did not write a constitutional judgment that happened to land on a politically favourable outcome. She wrote a politically favourable outcome and arranged a constitutional judgment around it. The rest of this essay shows that, and explains why it matters.\nBegin with the most conspicuous tell. A Chief Justice writing on the constitutional validity of a parliamentary rule does not need to summarise the smell of the alleged crime that produced the parliamentary vote she is reviewing. She especially does not need to do so in the first quarter of her judgment, before the constitutional question has been formally posed. Yet that is precisely what Maya CJ does. Paragraphs [12] to [24] of the first judgment marshal, in the gravest possible register, the Panel’s account of foreign currency hidden inside cushions, of police officials asked to “handle the matter with discretion”, of suspects allegedly paid R150,000 each to buy their silence, of a President who “abused his position as Head of State”. The function of this material is not legal. The function is political. 
It pre-loads the reader’s moral register before any constitutional principle is articulated, so that when the principle finally arrives — that section 89 must be read as imposing an accountability obligation, and that rule 129I is incompatible with that obligation — any rebuttal feels like a defence of impunity rather than what it actually is: a defence of the constitutional text the National Assembly actually has, as against the text Maya CJ wishes it had.\nIt helps to state, plainly, what the constitutional question actually was. The Court was asked whether rule 129I of the National Assembly Rules — the rule the Assembly had adopted in response to this Court\u0026rsquo;s earlier orders in EFF I and EFF II — was consistent with section 89(1) of the Constitution, read with section 55. That is a narrow structural question. It concerns the design of a procedural rule against the requirements of the Constitution\u0026rsquo;s impeachment framework. It does not, on any characterisation, require a court to assess the strength of the Panel\u0026rsquo;s findings, the guilt or innocence of the President, or the moral valence of the events at Phala Phala farm. Majiedt J, writing the third judgment in the same matter on the same papers, understood this. His second paragraph — [303] — puts the question as follows: \u0026ldquo;The central question is whether rule 129I ensures that the National Assembly can effectively discharge its obligation under section 89(1), read with section 55 of the Constitution\u0026rdquo;. He takes ten words to dispose of the Phala Phala material at [310]: \u0026ldquo;the merits of the Report\u0026hellip; is not the issue here.\u0026rdquo; A colleague on the same bench, addressing the same case, reached the constitutional question in his second paragraph and had no use for the factual narrative. That contrast is not incidental. It shows that the prosecutorial scene-setting in the first judgment was not compelled by the case. 
It was a choice.\nA judge who writes this way is not performing the function of a court. Prosecutors set the scene; their case stands or falls on whether the reader feels the weight of what was done. Courts are supposed to ask a narrower and colder question: does the law authorise this rule, or does it not? Maya CJ allows the moral register of the prosecutor to do the work that her constitutional analysis cannot. The Phala Phala material is the rhetorical scaffolding of a judgment which, without it, would have to confess that section 89 says “may”, that the word “accountability” appears nowhere in the section, that section 89 confers a discretion on the National Assembly, and that the body that exercised that discretion in December 2022 did so by a majority of 214 to 149. None of those facts changes if the President was guilty. None changes if he was innocent. They are facts about the Constitution. The first judgment cannot bear to leave them alone.\nThe political character of the first judgment becomes clearer still once one sees what it does with section 89. The textual problem is plain on the face of the section. The word “accountable” appears nowhere in section 89; the section’s operative verb is the permissive “may”; and Kollapen J was right to observe that an obligation cannot be smuggled in alongside a discretion that the very same sentence confers. The political problem follows. Section 89 confers a discretionary power on the National Assembly to remove a President. The drafters used the word “may”. They could have written “must”. They wrote “may” because they understood that the question of removal is an intensely political question, properly answered by an elected body in the political register, not extracted by judicial command from the silences in the text. Maya CJ’s response to that drafting choice is to convert the discretion into a duty whenever, in her view, the political body has exercised the discretion the wrong way. 
The conversion is dressed in interpretive language — section 89 is to be “read in light of” sections 1(d), 42(3) and 55(2), and so on — but its substance is straightforward: in a case where the Assembly’s vote produced a politically unsatisfying result, the Court will reach behind the vote and locate an obligation in the text that requires a different result. The political effect is to shift the locus of impeachment decision-making away from the National Assembly and toward the constitutional review that now follows every National Assembly decision the opposition dislikes.\nIf that sounds harsh, consider the remedy. The reading-in ordered by the Court provides that where the Panel finds sufficient evidence, the Assembly is bound to refer the matter to the Impeachment Committee. Where the Panel finds the contrary, the Assembly may proceed regardless. There is no principled accountability theory in which those two situations are different. They are constitutionally identical: in each, a panel has reported, and the political body must decide. To bind the Assembly when the Panel says “proceed” while leaving it free to proceed anyway when the Panel says “do not” is to embed in the constitutional architecture a permanent advantage for those who wish to pursue impeachments and a permanent disability on those who do not. That is not a structural rule. It is an operational instruction. It tells future minority parties in the National Assembly that the Panel’s pen now does the work that they could not do at the despatch box. And it tells the governing party — whichever party governs — that the Constitution has been judicially refitted to deny it the option the Assembly exercised in December 2022. The asymmetry is the political shape of the remedy. It is the shape one would design if one wanted to maximise the political utility of section 89 for opposition parties and minimise it for governing ones. 
It is not the shape one would design if one were doing constitutional interpretation in a politically blind way.\nNotice, too, the shape of the institutional rearrangement the judgment effects. Before EFF v Speaker, the constitutional position was that the National Assembly was the political body to which the Constitution had assigned a political question, and that the Court would intervene where the body acted unlawfully or irrationally — that is, where its conduct, not its judgment, fell foul of constitutional standards. Maya CJ’s judgment quietly inverts the relationship. The Assembly’s judgment itself is now reviewable on a constitutional standard that the Court will supply. The instrument of that review is the manufactured accountability obligation in section 89. Whenever the Assembly votes in a way the Court regards as inconsistent with that obligation, the Court can strike down not merely the vote but the rule that permitted the vote. The Court has, in other words, written for itself a roving commission to police political judgment in impeachment processes. Kollapen J saw this, which is why he was at pains to insist that the same constitutional result could have been reached through a rationality challenge to the vote alone. The first judgment chose the broader weapon. It chose the weapon that strikes not only the present case but the next one, and the one after that. That is a political choice. It enlarges the Court’s domain at the expense of the National Assembly’s.\nIt is sometimes said in defence of judgments of this kind that they merely give content to the values of the Constitution — accountability, the rule of law, the supremacy of the Constitution itself — and that to characterise them as political is to misunderstand what constitutional adjudication is. The defence has a kernel of truth. All constitutional adjudication is, in a sense, political: it concerns the allocation of authority among the branches of state. 
But there is a difference between a judgment that takes a constitutional text and works outward from it, even into politically charged territory, and a judgment that takes a politically desired outcome and works backward to a text that can be made to support it. The first judgment is of the second kind. The prosecutorial preamble, the manufactured accountability obligation, and the asymmetric remedy described above are the moves of a judgment that started from its destination. The political question is what follows from that. The answer is that a Court whose Chief Justice writes that way will be relied upon, in the next politically charged case, by the side that lost the political vote. She will be expected to deliver the same favour she delivered here. If she does, she will degrade further the political legitimacy of parliamentary outcomes. If she does not, she will expose the first judgment as a one-off, and one-offs are how Courts lose institutional authority.\nThere is a second institutional consequence, less remarked but more corrosive. By recasting section 89 as an instrument of executive accountability rather than a discretion to remove a sitting President, the first judgment narrows the political space within which the President can govern. The judgment makes the Section 89 Panel — a body without political accountability of any kind — the de facto gatekeeper of an impeachment. Once the Panel has reported, the Assembly’s role is, on the order, to ratify or to refuse to ratify; and the order makes refusing to ratify a constitutionally suspect act whenever the Panel has found sufficient evidence. This is not, on any view, what the drafters of section 89 had in mind. The Panel was conceived as a procedural filter, not as a political tribunal whose findings, once issued, bind the elected body. Maya CJ’s judgment elevates the Panel into a tribunal in fact, while denying that it is one in form. 
The political consequence is that future Presidents will be hostage to the Panel’s assessment in a way that the Constitution does not require them to be. The political consequence for the National Assembly is that its members will be reluctant to vote against a Panel’s positive finding, regardless of whether they think the case justified, because to do so is to invite the constitutional litigation that has now been licensed. The chilling effect on parliamentary deliberation is exactly the kind of consequence that judges with proper institutional humility weigh before issuing orders. The first judgment shows no sign of having weighed it.\nDefenders of Maya CJ will say that all of this is the unavoidable cost of holding power to account in a country where holding power to account is hard. They will say that the Phala Phala material is part of the necessary context, that the accountability framework is faithful to the Constitution’s basic values, that the asymmetric remedy reflects the asymmetry between investigation and exoneration, and that the Court has done no more than its constitutional duty. Each of these claims can be made. But each is the language of advocacy. They are the things the EFF’s counsel argued in submission. The point of having a Chief Justice is to be capable of resisting the advocate’s frame when the law does not support it. The first judgment does not resist. It absorbs. The Phala Phala material is in the judgment because the EFF placed it there. The accountability framework is in the judgment because the EFF’s case required it. The asymmetric remedy reflects, not constitutional logic, but the political objective the EFF brought to the Court. A Chief Justice who allows her judgment to be shaped this way has, in effect, served as senior counsel for the side that hired the litigation. 
And it is a posture the first judgment itself betrays: at paragraph [44], Maya CJ describes the EFF\u0026rsquo;s case as \u0026ldquo;a shoddy framing of its challenge\u0026rdquo; — leaving open the question why a litigant whose framing the Chief Justice herself called shoddy received, from the Chief Justice, the judgment that framing required.\nIt bears emphasising what this essay is not arguing. It is not arguing that Maya CJ was wrong to invalidate rule 129I. The third judgment shows that the rule cannot survive the structural analysis the Constitution invites: it inserts an antecedent question that the Constitution does not authorise the Assembly to answer. That is a complete reason for invalidation, and it is a politically neutral reason. It would apply whether the President were Ramaphosa or his successor, whether the Panel had reported guilt or innocence, whether the political composition of the Assembly were ANC-dominated, EFF-dominated, or coalition-fractured. The criticism advanced here is not that the rule should have stood. It is that the Chief Justice, having available to her the principled structural path that Majiedt J ultimately took, chose the political path instead. She wrote a judgment that vindicates the EFF’s framing, indicts a sitting President in everything but name, and reshapes the institutional balance between the Court and the National Assembly in favour of the side that lost the parliamentary vote. The political cost is that the office of the Chief Justice has been used, in this case, to perform a function the Constitution does not assign to it.\nThere is also a quieter point worth marking. Kollapen J’s second judgment levelled a series of textual and structural objections that went directly to the foundation of the first judgment’s framework. Maya CJ does not engage with those objections in any sustained way. A Chief Justice who is confident in her reasoning answers her colleague’s arguments. 
A Chief Justice whose reasoning is doing political work tends, instead, to write past the dissent and trust the institutional weight of the office to do the rest. The first judgment reads as the latter. That, too, is a political posture rather than a judicial one. A court whose senior member declines to engage with the rebuttal that has been placed in front of her by another member of the same bench has lost the discipline that distinguishes a judgment from an advocacy brief.\nThe first judgment will be read, in the years ahead, as a high-water mark of judicial intervention in the political branches’ management of impeachment. Whether one regards that as a triumph or as a warning depends on whether one believes that the Court’s role is to discipline elected bodies whose decisions one dislikes, or to police the boundaries within which those bodies are entitled to make decisions one might dislike. Maya CJ has come down firmly on the side of discipline. That is a political choice. It would be more honest if it were defended as such, rather than dressed in the borrowed clothes of constitutional inevitability. The clothes do not fit. Underneath them is a Chief Justice who used the apex court’s pen to deliver a verdict the elected body had refused to deliver, and who, in doing so, traded the institutional standing of her office for a result the EFF could not secure at the despatch box. 
Whatever one thinks of the President, of the Panel, or of the contents of any couch, that is not what the Constitutional Court is for.\n","permalink":"https://blog.randomdomain.co.za/posts/2026/05/politics-in-robes/","summary":"\u003cp\u003e\u003cstrong\u003eA Political Critique of Maya CJ’s Judgment in EFF v Speaker (CCT 35/24)\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe first judgment in \u003cem\u003eEconomic Freedom Fighters and Another v Speaker of the National Assembly and Others\u003c/em\u003e [2026] ZACC 17 is being read, in the public conversation, as a vindication of constitutional accountability against a recalcitrant parliamentary majority. That reading is comfortable, and it is wrong. It is comfortable because it tracks an emotionally satisfying narrative: a powerful President accused of corruption, a governing party closing ranks, and a Chief Justice who refuses to let the matter rest. It is wrong because it mistakes the form of Maya CJ’s judgment for its substance. The substance, prised free of its rhetorical packaging, is a piece of constitutional politics. The first judgment delivers, by judicial means, a censure of President Ramaphosa that the National Assembly declined to deliver politically. It is not the role of the Constitutional Court to be the appeals branch of a defeated opposition motion. Maya CJ’s judgment treats the Court as though it were.\u003c/p\u003e","title":"Politics in Robes"},{"content":"Eno Reyes gave a short talk recently that deserves more attention than it got. He is co-founder and CTO of Factory — a company building autonomous software engineering agents, founded in 2023, whose main product is a coding agent called Droid. His argument is deceptively simple: the frontier of what AI agents can do is not a function of model capability. It is a function of how verifiable your environment is.\nThe argument opens with Karpathy on software 2.0 and a point Eno attributes to Jason about the asymmetry of verification. 
The intuition is essentially P vs NP made practical: there are many tasks that are far easier to check than to produce. The interesting cases — the ones that yield to agents — have five properties: they have an objective truth, they are quick to validate, they scale (you can check many in parallel), they are low-noise, and they produce continuous signals rather than a binary pass/fail.\nSoftware development scores highly on all five. It is why software development agents are currently the most advanced agents in the world. The 20-to-30-year investment in automated testing — unit tests, end-to-end tests, linters, type systems, QA pipelines — has built exactly the verification infrastructure that agents need to self-correct.\nThe problem is that most codebases have not built it well enough.\nYour company probably runs at 50–60% test coverage. Someone secretly hates the flaky build that fails every third run, but no one says anything. The linter exists, but it is not opinionated enough to catch AI-generated mediocrity — it catches style, not quality. These gaps are tolerable when humans fill them with judgment. They become critical failures when you introduce agents, because agents cannot substitute judgment for missing signals. They will produce code that passes every check you have, while violating every implicit standard you have not encoded.\nThe reframing of the development loop is the useful part of the talk. Traditional development: understand → design → code → test. Agent-assisted development with good validation: specify constraints → generate → verify (automated and human) → iterate. The shift is from writing software to curating the environment in which software is written. The engineer\u0026rsquo;s job becomes encoding opinions — which patterns are acceptable, which invariants must hold, which tests would catch specifically AI-generated slop. 
Eno calls this \u0026ldquo;specification-driven development\u0026rdquo; and notes that most of the better coding tools have started building around it: plan mode, spec mode, AGENTS.md files.\nHis point about junior versus senior developers is sharp. If your senior engineers use agents successfully and your juniors do not, the instinct is to blame skill or prompting technique. The real answer is usually that junior engineers do not know which niche practices your codebase requires, and those practices are not encoded anywhere an agent can find them. Fix the validation, and the gap closes. That is a meaningfully different diagnosis than \u0026ldquo;junior engineers are bad at prompting.\u0026rdquo;\nThe Google/Meta analogy grounds this nicely. A new hire with zero context can safely round YouTube\u0026rsquo;s border radius and be confident it will not take down a billion-user product. That confidence does not come from the hire\u0026rsquo;s competence. It comes from the validation infrastructure that must be satisfied before the change ships. The claim is that we can now build that infrastructure at smaller scale — and that coding agents can help identify where the gaps are. You can ask a coding agent to find where your linters are under-opinionated. You can ask it to generate tests.\nOne quote from Factory engineer Alvin that should survive the talk: \u0026ldquo;A slop test is better than no test.\u0026rdquo; Controversial. Also correct. A bad test that passes when your code is correct and fails when it is wrong teaches agents to write more tests. The pattern propagates. Other agents notice it, follow it, and the environment becomes more opinionated over time.\nEno is explicit that none of this is Factory-specific. The checklist — linters, tests, OpenAPI docs, type systems, AGENTS.md files — applies to any coding agent you are currently using. Spending 45 days comparing tools to find one that scores 10% better on SWE-bench is not the highest-leverage move. 
Investing in the validation infrastructure that makes every coding tool work better — that is where the 5–7x return comes from. Not 1.5x. Not 2x.\n\u0026ldquo;The limiter is not the capability of the coding agent. The limit is your organization\u0026rsquo;s validation criteria.\u0026rdquo;\nWorth writing somewhere permanent.\n","permalink":"https://blog.randomdomain.co.za/posts/2025/12/agents-need-verification/","summary":"\u003cp\u003eEno Reyes gave a short talk recently that deserves more attention than it got.\nHe is co-founder and CTO of Factory — a company building autonomous software engineering agents, founded in 2023, whose main product is a coding agent called Droid.\nHis argument is deceptively simple: \u003cstrong\u003ethe frontier of what AI agents can do is not a function of model capability. It is a function of how verifiable your environment is.\u003c/strong\u003e\u003c/p\u003e","title":"The limit is your validation criteria, not the agent"},{"content":" AI is blowing up the software stack. Not only by creating more code, but by making all code cheaper. The act of writing software is rapidly commoditising. What used to take months now takes minutes. The bottleneck is no longer development…it’s validation and verification.\nAs AI pushes out more code, faster than humans can reason about, the real leverage shifts to testing, validation, reliability, and security. The winners won’t be the ones who ship the most code, but the ones who can trust what they ship.\n— Lenny Pruss\nLenny Pruss is an early-stage investor at Amplify Partners, focused on developer tools and infrastructure — Datadog, CockroachDB, Chainguard, that kind of portfolio.\n","permalink":"https://blog.randomdomain.co.za/posts/2025/11/ai-blows-up-stack/","summary":"\u003cblockquote\u003e\n\u003cp\u003eAI is blowing up the software stack. Not only by creating more code, but by making all code cheaper. The act of writing software is rapidly commoditising. 
What used to take months now takes minutes. The bottleneck is no longer development…it’s validation and verification.\u003c/p\u003e\n\u003cp\u003eAs AI pushes out more code, faster than humans can reason about, the real leverage shifts to testing, validation, reliability, and security. The winners won’t be the ones who ship the most code, but the ones who can trust what they ship.\u003c/p\u003e","title":"AI is blowing up the software stack"},{"content":"Test-Driven AI Development: A New Contract Between Human and Machine A note for readers who don\u0026rsquo;t know me: this was first written for an internal work blog, where my colleagues could read it in the spirit it was intended. I\u0026rsquo;m publishing it here unchanged.\nThe manifesto form is deliberate. It is overstated on purpose — pitched to drag the Overton window on what \u0026ldquo;coding with AI\u0026rdquo; should mean, not offered as a finished doctrine. I still stand by every claim; I just want to flag that the volume is turned up for rhetorical effect.\nRead it as a provocation, not a sermon.\nI. The Problem We have been handed a new kind of colleague: tireless, fast, and utterly untrustworthy.\nLarge Language Models can write code in seconds that would take humans hours. They can refactor entire codebases, implement complex algorithms, and generate thousands of lines of working software. But they hallucinate. They drift. They misunderstand. They forget context. They are, in the words of our peers, \u0026ldquo;eager puppies\u0026rdquo; or \u0026ldquo;goldfish with PhDs.\u0026rdquo;\nThe response has been to treat them like junior developers: prompt carefully, review meticulously, hope for the best. We write specifications in English, that most ambiguous of languages, and wonder why the AI misunderstands. We try to teach them \u0026ldquo;clean code\u0026rdquo; and \u0026ldquo;best practices\u0026rdquo; - concepts we invented to help humans maintain codebases.\nThis is backwards.\nII. 
The Core Insight The AI can do anything you want, but you cannot trust it. And if you do, you must verify.\nThe only verification that matters is: does it do what we specified?\nNot \u0026ldquo;is the code elegant?\u0026rdquo; Not \u0026ldquo;does it follow SOLID principles?\u0026rdquo; Not \u0026ldquo;would this pass code review?\u0026rdquo;\nDoes it pass the tests?\nTests are not documentation. Tests are not an afterthought. Tests are not \u0026ldquo;coverage metrics.\u0026rdquo;\nTests are the specification.\nIII. The Principles 1. Tests are the only valid form of specification If you cannot encode a requirement as a test, you do not have a requirement - you have a vibe.\nNot: \u0026ldquo;The system should be fast\u0026rdquo;\nBut: assert response_time \u0026lt; 0.1 # 90th percentile \u0026lt; 100ms\nNot: \u0026ldquo;The code should be maintainable\u0026rdquo;\nBut: assert cyclomatic_complexity \u0026lt; 10\nNot: \u0026ldquo;The UI should feel responsive\u0026rdquo;\nBut: assert frame_time_p95 \u0026lt; 16.67 # 60fps\nIf you cannot measure it, you cannot build it. If you cannot test it, you do not know if you have it.\n2. Implementation is disposable, contracts are permanent The code can be rewritten in any paradigm, any style, any architecture. The tests remain.\nYou can demand the AI refactor from object-oriented to functional, from synchronous to async, from monolith to microservices. As long as the tests pass, the system is correct.\n3. The test suite is the interface, the codebase is a black box You do not care what is inside the box. You care that when you invoke the interface, you get the specified behavior.\nThe AI can implement sort() as quicksort, mergesort, or a neural network. You don\u0026rsquo;t care. 
You specified the contract:\n@given(lists(integers())) def test_sort_is_sorted(input_list): result = sort(input_list) assert result == sorted(input_list) # Correct output assert len(result) == len(input_list) # No elements lost assert is_stable(input_list, result) # Stability property 4. Narrow the scope, constrain the solution space Every test is a constraint. The more tests you write, the smaller the space of valid implementations.\nThis prevents the AI from \u0026ldquo;being helpful\u0026rdquo; in ways you didn\u0026rsquo;t ask for. It cannot add features, cannot make assumptions, cannot wander off task.\nThe tests are guard rails.\n5. Types and interfaces are part of the specification You specify whether something is a class or a function. You specify whether data is mutable or immutable. You specify the type signatures.\ndef test_number_type_immutability(): a = Number(4) b = a.add(3) assert a.value == 4 # Original unchanged assert b.value == 7 # New value returned This IS the specification. The AI now knows: Number is a class, add() is a method, and the type is immutable.\nIV. What We Reject We reject \u0026ldquo;clean code\u0026rdquo; as a primary virtue. Clean code was invented to help humans read and maintain software. If the AI maintains the code, and humans only maintain the tests, then clean code is optimization for the wrong metric.\nWe reject DRY as sacred. Don\u0026rsquo;t Repeat Yourself was a human ergonomic concern. The AI is happy to update code in five places. If the tests pass, repetition is irrelevant.\nWe reject architectural purity for its own sake. SOLID, design patterns, layered architectures - these were cognitive tools for humans managing complexity. If the complexity is managed by tests and implemented by AI, the architecture is an implementation detail.\nWe reject code review as the primary quality gate. Code review catches what humans miss. Tests catch what anyone misses. A code review happens once. 
Tests run forever.\nWe reject natural language specifications. English is ambiguous. \u0026ldquo;Fast\u0026rdquo; means nothing. \u0026ldquo;User-friendly\u0026rdquo; means nothing. \u0026ldquo;Robust\u0026rdquo; means nothing.\nTests are unambiguous.\nV. What We Embrace We embrace tests as the engineering discipline. Writing good tests is now the craft. Knowing what to test, how to test it, what properties matter - this is where the rigor lives.\nWe embrace comprehensive contracts. Unit tests for correctness Property tests for invariants Performance tests for speed Fuzz tests for security Integration tests for composition Regression tests for stability We embrace measurable requirements. Every requirement must be operationalized. Every quality must be quantified. Every expectation must be encoded.\nWe embrace fearless refactoring. The AI can rewrite the entire codebase overnight. If the tests pass, you ship it.\nWe embrace the unknown implementation. You don\u0026rsquo;t need to understand how the code works. You need to understand what it does. The tests tell you what it does.\nVI. The Practices 1. Write the tests first Before the AI writes a single line of implementation, write the comprehensive test suite:\nWhat are the happy paths? What are the edge cases? What are the error conditions? What are the performance requirements? What are the security properties? 2. Make tests self-documenting def test_authentication_rate_limiting(): \u0026#34;\u0026#34;\u0026#34; Security requirement: After 5 failed login attempts, the account must be locked for 15 minutes to prevent brute force attacks. 
\u0026#34;\u0026#34;\u0026#34; for _ in range(5): assert login(\u0026#34;user\u0026#34;, \u0026#34;wrong\u0026#34;) == False assert login(\u0026#34;user\u0026#34;, \u0026#34;correct\u0026#34;) == \u0026#34;rate_limited\u0026#34; time.sleep(900) # 15 minutes assert login(\u0026#34;user\u0026#34;, \u0026#34;correct\u0026#34;) == True The AI reads this and knows exactly what to implement.\n3. Use property-based testing Don\u0026rsquo;t just test examples, test properties:\n@given(integers(), integers()) def test_addition_commutative(a, b): assert add(a, b) == add(b, a) @given(integers(), integers(), integers()) def test_addition_associative(a, b, c): assert add(add(a, b), c) == add(a, add(b, c)) Let the test framework generate thousands of cases.\n4. Test performance as a first-class property def test_query_scales_logarithmically(): for size in [100, 1000, 10000, 100000]: data = generate_dataset(size) elapsed = time_query(data) # Allow O(log n) growth + constant overhead assert elapsed \u0026lt; 0.001 * math.log2(size) + 0.01 5. Test security with fuzzing def test_sql_injection_resistance(): malicious_payloads = [ \u0026#34;\u0026#39;; DROP TABLE users--\u0026#34;, \u0026#34;1\u0026#39; OR \u0026#39;1\u0026#39;=\u0026#39;1\u0026#34;, # ... hundreds more ] for payload in malicious_payloads: result = query_user(payload) assert not database_was_modified() assert not sensitive_data_leaked(result) 6. Demand refactors freely \u0026ldquo;AI, rewrite this using async/await\u0026rdquo;\n\u0026ldquo;AI, convert this to use a state machine\u0026rdquo;\n\u0026ldquo;AI, optimize this for memory instead of speed\u0026rdquo;\nRun the tests. If they pass, you\u0026rsquo;re done.\nVII. The Skills That Matter Now The human\u0026rsquo;s job is no longer writing implementations. It is:\n1. Understanding the problem domain What are the real requirements? What are the edge cases? What properties must hold?\n2. Designing the contract What is the interface? What are the types? 
What are the invariants?\n3. Writing comprehensive tests This is the craft. This is where experience matters.\n4. Maintaining the specification When requirements change, update the tests. The AI will update the implementation.\n5. Eating the sin This is why all the other skills matter.\nThe machine has no skin in the game. It cannot be held accountable when something goes catastrophically wrong.\nYou must eat the sin.\nWhen you type git commit and git push, you are saying: \u0026ldquo;I accept responsibility for what this does.\u0026rdquo;\nNot \u0026ldquo;the AI did it.\u0026rdquo; Not \u0026ldquo;the tests passed.\u0026rdquo; Not \u0026ldquo;the system validated it.\u0026rdquo;\nYou did it. You shipped it. You own it.\nYou judge what tests cannot capture:\nIs this the right problem to solve? Will this harm users in ways we didn\u0026rsquo;t anticipate? Are there ethical implications beyond correctness? You audit the specification:\nWhat haven\u0026rsquo;t we tested? What assumptions are baked into these tests? What could go wrong that we didn\u0026rsquo;t anticipate? You accept that failures are inevitable:\nNo test suite is complete No specification is perfect Production will find edge cases you never imagined When it breaks, you don\u0026rsquo;t say \u0026ldquo;the AI was wrong\u0026rdquo; or \u0026ldquo;the tests were insufficient.\u0026rdquo;\nYou say: \u0026ldquo;I shipped it. I own the failure. I will fix it.\u0026rdquo;\nTDAID does not eliminate responsibility. It focuses it.\nYou cannot hide behind \u0026ldquo;I was just following best practices\u0026rdquo; or \u0026ldquo;the code looked good in review.\u0026rdquo;\nYou specified it. You tested it. You shipped it. You own it.\nThe machine has no conscience. The tests have no judgment. 
The system has no mercy.\nOnly the human can eat the sin.\nEmbrace the Triangle of Trust:\nThe machine implements The tests verify The human owns Remove any leg and the system falls.\nYou cannot outsource responsibility to the machine. You can only outsource the implementation.\nVIII. The Future In this future:\nCodebases are ephemeral, test suites are permanent Developers write tests, AI writes implementations Code review focuses on test quality, not implementation quality \u0026ldquo;Technical debt\u0026rdquo; means poorly tested code, not \u0026ldquo;messy\u0026rdquo; code Refactoring is instant and fearless Languages and frameworks become implementation details The test suite becomes the codebase. Everything else is just a proof object.\nIX. The Call to Action Start today: On your next feature, write comprehensive tests first Let the AI implement it Verify only that tests pass, not that the code is \u0026ldquo;good\u0026rdquo; Refactor ruthlessly Observe how much faster you move when implementation quality doesn\u0026rsquo;t matter Challenge yourself: Can you specify your entire system as a test suite? Can you operationalize every requirement? Can you trust the machine if you verify comprehensively? Teach others: Tests are specifications, not validation If you can\u0026rsquo;t test it, you can\u0026rsquo;t build it Implementation quality is an AI problem, contract quality is a human problem X. The Motto \u0026ldquo;I don\u0026rsquo;t care what the code is. I only care that it does what I say it does.\u0026rdquo;\nTest-Driven AI Development is not a methodology. It is a recognition that the nature of software development has fundamentally changed. The machine implements. The human specifies. The tests are the contract. Write tests. Trust nothing. Verify everything.\nTDAID: Because the only code you can trust is code you\u0026rsquo;ve tested.\nWrite tests. Trust nothing. Verify everything. 
Sign your name.\n","permalink":"https://blog.randomdomain.co.za/posts/2025/11/tdaid-manifesto/","summary":"\u003ch2 id=\"test-driven-ai-development-a-new-contract-between-human-and-machine\"\u003eTest-Driven AI Development: A New Contract Between Human and Machine\u003c/h2\u003e\n\u003cp\u003e\u003cem\u003eA note for readers who don\u0026rsquo;t know me: this was first written for an internal work blog, where my colleagues could read it in the spirit it was intended. I\u0026rsquo;m publishing it here unchanged.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eThe manifesto form is deliberate. It is overstated on purpose — pitched to drag the Overton window on what \u0026ldquo;coding with AI\u0026rdquo; should mean, not offered as a finished doctrine. I still stand by every claim; I just want to flag that the volume is turned up for rhetorical effect.\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eRead it as a provocation, not a sermon.\u003c/em\u003e\u003c/p\u003e","title":"The TDAID Manifesto"},{"content":"I recently worked on a project where I wrote the unit tests first, then had Claude Code generate the implementation that passed those tests. The experience crystallized something I\u0026rsquo;ve been thinking about: natural language is fundamentally too ambiguous to be an effective specification language for software, no matter how smart our LLMs become.\nThis isn\u0026rsquo;t a hot take about current AI capabilities. This is a statement about linguistics.\nThe Fundamental Problem Human language is inherently ambiguous. This is not a limitation of GPT-4, GPT-5, or whatever GPT-24 eventually arrives. This is a fundamental property of natural language itself. No amount of model sophistication can fix the ambiguity baked into how humans communicate.\nConsider: \u0026ldquo;I need a recipe class where the id is case insensitive.\u0026rdquo;\nYou prompt an LLM with this. It goes off and codes. How do you know it actually did what you wanted? 
How do you know where it put the class? Did it use id or ID or recipe_id? Did it make equality case-insensitive, or just comparison? What about hashing?\nYou don\u0026rsquo;t know. You have to review it. And reviewing code is expensive cognitive labor, because you\u0026rsquo;re now verifying that the LLM correctly interpreted your inherently ambiguous natural language specification.\nWhy Formalism Won There\u0026rsquo;s a reason science made its greatest leaps forward over the past few centuries. Chemistry embraced $H_2O$. Physics embraced $E = \\frac{mc^2}{\\sqrt{1-v^2/c^2}}$. Mathematics embraced symbols and formal notation.\nWhy? Because natural language wasn\u0026rsquo;t precise enough to move these fields forward.\nWhen you write $H_2O$, there is no ambiguity. Two hydrogen atoms, one oxygen atom, specific bonding structure. When you write that relativistic energy equation, there is no ambiguity about the relationship between energy, mass, velocity, and the speed of light-or that energy diverges to infinity as $v \\to c$, which is why you can\u0026rsquo;t actually reach the speed of light.\nActually, speaking of limits: consider how much hand-waving is required to explain the conceptual understanding of a limit in natural language. \u0026ldquo;As x gets closer and closer to a, the function approaches L.\u0026rdquo; What does \u0026ldquo;closer and closer\u0026rdquo; mean? How close is close enough?\nContrast that with the formal epsilon-delta definition: for every $\\epsilon \u0026gt; 0$, there exists a $\\delta \u0026gt; 0$ such that if $0 \u0026lt; |x-a| \u0026lt; \\delta$, then $|f(x)-L| \u0026lt; \\epsilon$.\nIt\u0026rsquo;s the same concept. One is ambiguous, hand-wavy, and imprecise. The other is formal, unambiguous, and executable. Natural language would require paragraphs to express what these symbols capture precisely. And those paragraphs would still leave room for misinterpretation.\nProgramming languages exist for the same reason. 
They are formal systems that eliminate ambiguity. This is not a bug-it\u0026rsquo;s the entire point.\nTests As Formal Specification Here\u0026rsquo;s what TDD advocates have been saying for years, but which becomes crucial in the age of AI: unit tests are not primarily about testing code. They are formal specifications for how that code must behave.\nConsider this test:\nfrom Recipe import Recipe def test_recipe_equality_case_insensitive(): assert Recipe(id=\u0026#34;MyRecipe\u0026#34;) == Recipe(id=\u0026#34;myrecipe\u0026#34;) Look at what I\u0026rsquo;ve specified in two lines:\nThere must exist a class called Recipe It must be in a module named Recipe available for import It takes an id parameter in its constructor Equality comparison must be case-insensitive (because \u0026ldquo;MyRecipe\u0026rdquo; == \u0026ldquo;myrecipe\u0026rdquo;) Four precise, unambiguous specifications encoded in two lines of code. Try writing the equivalent in natural language without introducing ambiguity. You can\u0026rsquo;t. Or rather, you can try, but you\u0026rsquo;ll write three paragraphs and still leave gaps.\nThis is the power of formal specification: you leverage the precision of a programming language to eliminate the ambiguity of natural language. You\u0026rsquo;re not escaping formalism-you\u0026rsquo;re embracing it, but only for the specification, not the implementation.\nThe Inversion Before AI, TDD was a hard sell because it required double labor. You had to write the test and write the implementation. The implementation was usually the hard part, requiring careful design decisions, performance optimization, error handling, and so on.\nAI inverts this equation.\nThere\u0026rsquo;s perhaps a deeper reason we\u0026rsquo;ve been avoiding tests: they force us to be precise, and that precision is uncomfortable. Writing tests means confronting ambiguity in our own thinking. Natural language lets us handwave past details; formal specifications don\u0026rsquo;t. 
This discomfort-this cognitive load of thinking clearly about what we actually want-is exactly what made tests feel burdensome.\nIn the age of AI, that discomfort is now the most valuable part of the development process. The implementation (the formerly hard part) is now handled by AI. The cognitive load has shifted entirely to specification. What used to be the \u0026ldquo;extra\u0026rdquo; work is now the only work that matters.\nImplementation is now cheap. It\u0026rsquo;s scalable. It can be outsourced to an LLM that can generate multiple implementations in seconds. Want to try three different approaches? Fine, generate three implementations and benchmark them.\nWhat\u0026rsquo;s no longer cheap is knowing what you actually want. And the only way to specify that unambiguously is through formal specification-which in practice means tests.\nSome developers are using AI to write tests from implementation after the fact. Some are using AI to write both. I\u0026rsquo;m saying these approaches are backwards. You should write the tests-perhaps with AI assistance-and not care about the implementation. Focus on what matters: that the software does what you want it to do.\nWhy This Matters When you write a test, the space for ambiguity radically shrinks. The test is the specification, written in a language both humans and machines can execute and verify.\nWhen you write natural language prompts, even incredibly detailed ones, you\u0026rsquo;re asking the LLM to interpret ambiguous human language and translate it into precise code. You then have to review that code to ensure the translation was correct. This is backwards.\nWrite the test. Let the LLM generate code that passes the test. Now you know-not believe, not hope, but know-that the implementation satisfies your specification. Because your specification was formal, not ambiguous.\nThe Actionable Takeaway Developers already know TDD. It\u0026rsquo;s not a new practice. 
What\u0026rsquo;s new is that AI has removed the main objection: the \u0026ldquo;double labor\u0026rdquo; of writing both tests and implementation.\nFocus on your tests. Treat them as the specification. Make them precise, comprehensive, and unambiguous. Then let AI handle the implementation. Verify it passes your tests. Done.\nThis isn\u0026rsquo;t about trusting AI or not trusting AI. It\u0026rsquo;s about the fundamental reality that natural language is ambiguous and formal languages are not. We\u0026rsquo;ve known this for 400 years across every scientific discipline. It\u0026rsquo;s time to apply that lesson to AI-assisted software development.\nThe future of software development in the age of AI isn\u0026rsquo;t about writing better prompts. It\u0026rsquo;s about writing better tests.\n","permalink":"https://blog.randomdomain.co.za/posts/2025/10/tdd-for-ai/","summary":"\u003cp\u003eI recently worked on a project where I wrote the unit tests first, then had Claude Code generate the implementation that passed those tests. The experience crystallized something I\u0026rsquo;ve been thinking about: \u003cstrong\u003enatural language is fundamentally too ambiguous to be an effective specification language for software, no matter how smart our LLMs become.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis isn\u0026rsquo;t a hot take about current AI capabilities. This is a statement about linguistics.\u003c/p\u003e","title":"Test-Driven Development in the Age of AI: Why Natural Language Can't Replace Formal Specifications"},{"content":"Every time I have read a new post offering advice on how to work with an LLM, be it with prompts or context, I just couldn\u0026rsquo;t shake the feeling that there was some unifying \u0026ldquo;theory of language\u0026rdquo; that explained what made a prompt good or bad.\nI had initially explored this by wanting to describe the ideal prompt as a mixture of formalism and \u0026ldquo;maneuverability\u0026rdquo;. 
However, when discussing my ideas with colleagues, it was politely pointed out that I was simply imposing a new system of superstition to \u0026ldquo;formalize\u0026rdquo; an existing list of superstitions.\nBut is there such a \u0026ldquo;theory of prompting\u0026rdquo;? What would such a theory be? If we build a large model of language, should it be safe to assume that we have built something that models language?\nIt turns out that if we put the emphasis in \u0026ldquo;Large Language Model\u0026rdquo; on the word \u0026ldquo;Language\u0026rdquo;, then the field of Linguistics is exactly what we want. There is no need for us to re-invent theories and notations - the research has been done for us!\nWhat follows are examples of good and bad prompts, and my attempt to explain why they are \u0026ldquo;good\u0026rdquo; or \u0026ldquo;bad\u0026rdquo; - not with my own vibism, but by relying on existing linguistic theories.\nWorking Memory Theory Working Memory Theory is a crucial concept in linguistics, particularly in understanding how we process and learn language. Working memory acts as a mental workspace where we hold and process incoming linguistic information, like words and sentences, while integrating them with our existing knowledge. (Sound familiar?)\nBut working memory has limited capacity, meaning it can only hold a certain amount of information at once - generally around 7 items. Just as human working memory can be overwhelmed, LLM working memory can also be overwhelmed!\nWhen you overwhelm an LLM with a kitchen-sink prompt, you\u0026rsquo;re creating the computational equivalent of \u0026ldquo;cognitive overload\u0026rdquo;. 
Just as with humans, breaking down complex information into \u0026ldquo;semantically coherent segments\u0026rdquo; significantly improves an LLM\u0026rsquo;s processing accuracy.\nBefore (cognitive overload):\nCreate a complete web application with user authentication, database integration, real-time chat, file upload functionality, admin dashboard, and responsive design using React, Node.js, Express, MongoDB, and Socket.io with proper error handling, security measures, and performance optimization. After (chunked for cognition):\nLet\u0026#39;s build a web application step by step: 1. First, create a basic React frontend with user registration/login forms 2. Then, set up a Node.js/Express backend with MongoDB for user management 3. Next, implement secure authentication with JWT tokens 4. Finally, add real-time chat using Socket.io Focus on step 1 first - create the user registration component. We already intuitively know, have experienced, and feel that the chunked version is better. But now we have a theory of why: working memory.\nLinguistic Anchoring The \u0026ldquo;anchoring effect\u0026rdquo; describes the tendency for individuals to rely heavily on the first piece of information they receive (the \u0026ldquo;anchor\u0026rdquo;) when making decisions, even if that information is irrelevant or misleading.\n\u0026ldquo;Selective Prompt Anchoring\u0026rdquo; is the application of anchoring to prompting, where we set specific tokens to be the \u0026ldquo;anchored text\u0026rdquo;. We are attempting to amplify attention towards these tokens, so as to better control the model\u0026rsquo;s focus.\nBefore (attention drift):\nWrite a function to sort a list. 
After (linguistically anchored):\nTASK: Sort a list of integers efficiently FOCUS: Choose optimal algorithm for large datasets CONSTRAINTS: Handle edge cases (empty lists, duplicates) DELIVERABLE: Python function with time complexity analysis def sort_large_list(nums: List[int]) -\u0026gt; List[int]: \u0026#34;\u0026#34;\u0026#34;Efficiently sort a large list of integers.\u0026#34;\u0026#34;\u0026#34; # Your implementation focusing on the TASK above Again, we have seen the second prompt perform better. But, instead of us offering advice based on heuristics, we can lean on the existing anchoring effect literature to explain why it works.\nInformation Density Information Density is a measure of how much information is packed into a linguistic unit. Information density is not an inherent property of a language, but rather context-dependent. For example, a word might be highly predictable in one sentence and less predictable in another.\nSpeakers and writers make choices about how to most efficiently encode and communicate their messages, and information density plays a role in these choices.\nThe best example of this theory in action is writing prompts that are clear and concise.\nBefore (low density, high noise):\nPlease help me write some code that can handle files and do some processing on them. I need it to work with different types of files and be able to process them efficiently. Can you make something that\u0026#39;s robust and handles errors well? 
After (optimized information density):\nCreate a Python file processor class that: - Accepts .txt, .csv, .json file types - Reads content with encoding detection - Applies transformation function (passed as parameter) - Writes to output directory with \u0026#39;_processed\u0026#39; suffix - Handles FileNotFoundError, PermissionError, UnicodeDecodeError - Logs progress for files \u0026gt; 1MB In the optimized version, we\u0026rsquo;ve removed vague terms, removed redundancy, increased specificity, provided measurable success criteria, and made a specific choice about message encoding.\nInformation theory thus provides the framework for what we intuitively know already: precision beats verbosity.\nBut if you can\u0026rsquo;t be precise, be verbose, right? Well, this works because \u0026ldquo;pragmatics\u0026rdquo; and \u0026ldquo;discourse theory\u0026rdquo; suggest that redundancy and multiple attempts at explanation can help listeners (and LLMs!) triangulate meaning.\nEmbodied Cognition Cognitive linguistics holds that humans understand abstract concepts through physical metaphors. It turns out that good LLM prompts sometimes also use this physical metaphor approach.\nInstead of treating code as abstract logic, a good code generation prompt should leverage \u0026ldquo;embodied cognition\u0026rdquo; by placing programming concepts in a physical experience.\nBefore (abstract):\nImplement caching functionality After (embodied):\nCreate a memory system that works like a librarian\u0026#39;s quick-access shelf—frequently requested books stay within arm\u0026#39;s reach while rarely used volumes move to distant archives. Build this caching layer where hot data stays close and cold data migrates to deeper storage. The \u0026ldquo;language\u0026rdquo; of embodied cognition happens through \u0026ldquo;image schemas\u0026rdquo;. These are the conceptual frameworks for capturing the different ways of describing our \u0026ldquo;physical\u0026rdquo; actions. 
For example:\nCONTAINER schema: \u0026ldquo;Put validation logic inside a protective wrapper\u0026rdquo; PATH schema: \u0026ldquo;Guide data through transformation pipelines\u0026rdquo; BALANCE schema: \u0026ldquo;Maintain equilibrium between performance and memory\u0026rdquo; Placing the abstract in the concrete through physical metaphor is not just how we speak to each other, but also describes what makes a prompt \u0026ldquo;good\u0026rdquo;.\nGiven that an LLM is trained on human text, should we actually be surprised that even without a physical experience, a good prompt to the LLM mirrors our physical metaphor?\nRegister and Politeness Do you say \u0026ldquo;please\u0026rdquo; when asking an LLM to do something? The research says you should. In fact, the \u0026ldquo;register\u0026rdquo; you use for speaking (and prompting) affects the output quality. Think about the register you use when speaking to a colleague or client.\nThe optimal register often varies by task type and technical domain. For code generation, a professional technical register consistently outperforms both casual and overly formal approaches.\nBefore (inappropriate register):\nhey can u plz write me some python code that does stuff with lists thx After (professional technical register):\nGenerate a Python function that implements efficient list manipulation operations, including sorting, filtering, and transformation methods. Include docstrings and type hints following PEP 8 conventions. This is something that sociolinguistic research has already demonstrated. Just like us, LLMs respond to register shifts. When you adopt a senior developer\u0026rsquo;s register, you activate patterns associated with expert code production.\nThis approach also shows up as \u0026ldquo;persona prompting\u0026rdquo; - \u0026ldquo;You are an expert python developer\u0026rdquo;. 
This technique leverages \u0026ldquo;accommodation theory\u0026rdquo;, which is the tendency to match communication styles with perceived expertise levels.\nDiscourse Markers Discourse markers are words or phrases that help organize and connect ideas in speech and writing. They\u0026rsquo;re signposts that guide readers through a text, showing how different parts relate to each other.\nExamples of such words include \u0026ldquo;first,\u0026rdquo; \u0026ldquo;next,\u0026rdquo; \u0026ldquo;specifically,\u0026rdquo; and \u0026ldquo;moreover\u0026rdquo;. One would think that to be \u0026ldquo;concise\u0026rdquo;, we need to leave out these words. But they create cognitive scaffolding that guides both human and AI reasoning.\nBefore (unstructured):\nMake this code faster and add error handling and documentation After (discourse-structured):\nLet\u0026#39;s improve this code systematically. First, analyze performance bottlenecks using profiling data. Next, implement targeted optimizations for the critical path. Then, add comprehensive error handling for edge cases. Finally, document the optimization strategy and performance gains. The discourse markers create a \u0026ldquo;cognitive map\u0026rdquo; that prevents the LLM from conflating tasks or missing requirements. For code generation, you will probably get additional lift in result quality if you mirror the discourse markers like \u0026ldquo;first\u0026rdquo;, \u0026ldquo;then\u0026rdquo;, \u0026ldquo;finally\u0026rdquo;, that are already part of coding constructs.\nFrame Semantics Frame semantics theory shows that words only make sense within structured knowledge frames. 
A \u0026ldquo;frame\u0026rdquo; is like a mental schema that includes all the background knowledge and expectations associated with a particular concept.\nFor code generation, this means activating an entire \u0026ldquo;conceptual framework\u0026rdquo; rather than just a single feature:\nBefore (isolated concepts):\nAdd authentication to the API After (frame-activated):\nImplement the AUTHENTICATION frame for our API: - Authority: JWT token issuer - Credentials: username/password pairs - Validation: cryptographic verification - Session: token lifecycle management - Permissions: role-based access control - Audit: authentication event logging Build these frame components with security-first design. How many times have you had an LLM go down the wrong path? Now you know it\u0026rsquo;s because you\u0026rsquo;ve not activated the correct framing.\nConstruction Grammar Construction grammar is a family of linguistic theories that treat grammatical constructions as the primary units of language, rather than focusing on words and rules as separate entities.\nThe sentence \u0026ldquo;As a senior Python developer, architect a data pipeline that handles real-time streaming\u0026rdquo;, when examined based on its grammar, can be thought of as a sentence with the structure \u0026ldquo;As a [EXPERT_ROLE], [ACTION_VERB] a [TARGET_OBJECT] that [CONSTRAINT]\u0026rdquo;. 
This \u0026ldquo;Role-Action-Object\u0026rdquo; pattern has proved to be rather effective when working with an LLM.\nOther patterns that seem to work well include:\nConditional-Temporal Pattern\nWhen [CONDITION] occurs, then [ACTION], ensuring [OUTCOME] Example: \u0026#34;When user input arrives, validate and sanitize it, ensuring no code injection\u0026#34; Analogical Pattern\n[TASK] is like [FAMILIAR_DOMAIN] where [MAPPING] Example: \u0026#34;Database normalization is like organizing a library where books are grouped by topic without duplication\u0026#34; Thinking about a prompt statement in terms of construction grammar, and then testing out variations of the template, can help us to produce more rigorous prompting advice.\nFormal Specification The ideal description of a task would be via a \u0026ldquo;formal language\u0026rdquo;, where you have a defined set of strings constructed from a finite alphabet according to precise rules. But in a formal language, you end up with tasks described as\n∀x (HasItemsInCart(x) → ◇(ProceedsToCheckout(x) → ◇(SeesPaymentOptions(x) ∧ CorrectTotal(x)))) What we can do instead is approximate a (strict) formal language with \u0026ldquo;temporal logic\u0026rdquo; using something like Gherkin, implementing what is often called \u0026ldquo;Behaviour Driven Development (BDD)\u0026rdquo;:\nGiven a user has items in their shopping cart When they proceed to checkout Then they should see the payment options And the cart total should be calculated correctly Here, \u0026ldquo;Given\u0026rdquo; states the initial state conditions (\u0026ldquo;existential quantification\u0026rdquo;), \u0026ldquo;When\u0026rdquo; describes the state transition actions (\u0026ldquo;temporal operators\u0026rdquo;), and \u0026ldquo;Then\u0026rdquo; asserts the final state (\u0026ldquo;modal logic\u0026rdquo;).\nAnother way to say this is that Gherkin allows us to maintain natural language comprehension while implementing 
\u0026ldquo;model-theoretic validation\u0026rdquo;:\nScenario: Valid login Given a user with email \u0026#34;test@example.com\u0026#34; and password \u0026#34;secure123\u0026#34; When they attempt to log in Then they should be redirected to the dashboard Scenario: Invalid password Given a user with email \u0026#34;test@example.com\u0026#34; and password \u0026#34;wrong\u0026#34; When they attempt to log in Then they should see \u0026#34;Invalid credentials\u0026#34; error Here, each scenario provides a \u0026ldquo;concrete model\u0026rdquo; that satisfies (or violates) the abstract specification. This is precisely how model theory works. We have abstract logical statements becoming concrete interpretations, through specific models.\nLinguistic theory explains why BDD (via Gherkin and other \u0026ldquo;formal\u0026rdquo; specifications) works so well with LLMs. These specifications provide \u0026ldquo;compositional structure\u0026rdquo; (in that each scenario decomposes cleanly into semantic components), \u0026ldquo;speech act clarity\u0026rdquo; (because there is an explicit performative structure with \u0026ldquo;Given/When/Then\u0026rdquo;), \u0026ldquo;discourse coherence\u0026rdquo; (since there is a temporal sequence that has a clear causal relationship), \u0026ldquo;frame activation\u0026rdquo; (in that they use a domain-specific vocabulary that activates relevant knowledge frames), and \u0026ldquo;model-theoretic validation\u0026rdquo; (because multiple examples constrain the space for interpretation).\nBDD is almost the perfect amalgamation of \u0026ldquo;LLM linguistics for code generation\u0026rdquo;.\nConclusion While I have not covered every piece of prompting advice or linguistic theory, I think I have done enough to demonstrate that we do not have to invent new ways of thinking for working with (a model of) language.\nIf we continue to improve our language models, thereby producing better models of language, I\u0026rsquo;m willing to bet that linguistic theories will become even more relevant.\nWhile I am still trying to better conceptualize linguistic theory as 
\u0026ldquo;prompting advice\u0026rdquo;, I can, for now, offer some points for better prompting - and yes, I used an LLM to help me develop these:\nUse conceptual blending to create novel solutions by integrating multiple domains. Instead of requesting a \u0026ldquo;caching system,\u0026rdquo; prompt for a \u0026ldquo;library-memory hybrid where frequent books migrate to the reference desk.\u0026rdquo; This activates richer conceptual frameworks than technical specifications alone.\nUse relevance theory to optimize context selection. Every piece of context should enable new inferences — if removing information doesn\u0026rsquo;t change potential outputs, it\u0026rsquo;s noise.\nUse code-switching strategies to leverage the boundary between natural and formal languages. Strategic mixing (using natural language for logic and code syntax for structure) outperforms pure natural language or pure code examples.\nUse specification languages that align with both formal semantics and natural language structure. The most successful approaches will be those that treat code generation as a translation task from linguistically well-formed specifications to executable implementations.\n","permalink":"https://blog.randomdomain.co.za/posts/2025/08/reinventing-linguistics/","summary":"\u003cp\u003eEvery time I have read a new post offering advice on how to work with an LLM, be it with prompts or context, I just couldn\u0026rsquo;t shake the feeling that there was some unifying \u0026ldquo;theory of language\u0026rdquo; that explained what made a prompt good or bad.\u003c/p\u003e","title":"Are We Re-Inventing Linguistics?"},{"content":"The transformation of software development through AI presents an intriguing application of Jevons Paradox: as development becomes more efficient, it is likely that we won\u0026rsquo;t see a reduction in demand for developers, but rather an increase.
What was previously an (economically) unfeasible project now becomes viable, and so organizations can explore innovative solutions that were once too resource-intensive to pursue.\n\u0026ldquo;Cheaper software means people are going to want more of it.\u0026rdquo;\nWhile AI excels at rapid prototyping and routine coding tasks, it currently encounters significant limitations in system completion and complex architecture decisions. Dustin Ewers calls this the \u0026ldquo;70% problem\u0026rdquo;: AI gets you most of the way, but someone must handle the rest — testing, deployment, maintenance, and fixing compounding errors. This suggests a transformation of the developer\u0026rsquo;s role rather than its obsolescence — technological augmentation rather than replacement.\nThe core value of software development lies not in code production, but in the deep understanding of business processes, system design, and problem-solving — domains where human judgment remains paramount.\nEven in scenarios where AI capabilities expand dramatically, the opportunity cost dynamics suggest continued demand for human developers focusing on high-value activities while AI handles routine tasks. This is comparative advantage at work: even if AI outperforms humans broadly, computational scarcity means AI resources will be allocated to highest-value tasks, leaving meaningful work for humans.\n\u0026ldquo;The AI revolution is similar to the introduction of compilers.\u0026rdquo;\nAs we navigate this transition, the question isn\u0026rsquo;t whether developers will remain relevant, but rather how the profession will evolve to leverage AI\u0026rsquo;s capabilities while developing new areas of expertise.
Ewers puts it plainly: \u0026ldquo;the best days of our industry lie ahead.\u0026rdquo;\nQuotes from Dustin Ewers — Ignore the Grifters, which is well worth reading in full.\n","permalink":"https://blog.randomdomain.co.za/posts/2025/02/ai-transforms-software/","summary":"\u003cp\u003eThe transformation of software development through AI presents an intriguing application of Jevons Paradox: as development becomes more efficient, it is likely that we won\u0026rsquo;t see a reduction in demand for developers, but rather an increase.\nWhat was previously an (economically) unfeasible project now becomes viable, and so organizations can explore innovative solutions that were once too resource-intensive to pursue.\u003c/p\u003e\n\u003cblockquote\u003e\n\u003cp\u003e\u0026ldquo;Cheaper software means people are going to want more of it.\u0026rdquo;\u003c/p\u003e","title":"The transformation of software development through AI presents an intriguing…"},{"content":"The rise of AI coding assistants like ChatGPT and GitHub Copilot has sparked intense debate in the programming community: Are we witnessing the beginning of the end for human developers?\nThroughout history, new technologies have often been perceived as threats to existing industries. However, rather than replacing workers entirely, these innovations have typically transformed how work is done - becoming valuable tools that augment human capabilities and often create entirely new types of jobs.\nToday\u0026rsquo;s AI models show impressive capabilities in specific programming tasks - generating REST API boilerplate, writing unit tests, and translating between programming languages. Yet they face significant limitations, particularly with tasks that require deep system understanding.
For instance, they struggle to design scalable microservice architectures or to make security-critical design decisions, and they often produce code with subtle bugs or security vulnerabilities when tackling novel problems that require original solutions.\nRather than its replacement, we\u0026rsquo;re witnessing a transformation of the programmer\u0026rsquo;s role. Developers are evolving into \u0026ldquo;AI-augmented developers\u0026rdquo;, who leverage AI tools for routine tasks while focusing their expertise on higher-level challenges such as system architecture, business logic, requirements gathering, and integrating multiple systems and technologies. This transformation positions human programmers as strategic problem-solvers.\nBut what happens when AI capabilities advance even further?\nIf we were to reach a level where AI systems could fully understand ambiguous human requirements, design novel architectures from scratch, and reason about tradeoffs across entire systems, they could theoretically handle most current programming tasks. However, such capabilities would likely represent AGI or near-AGI, affecting not just programming but most knowledge work. Ironically, the development of such systems would itself require extraordinarily sophisticated programming and engineering work, which would potentially create new types of programming jobs focused on AI system development, maintenance, and oversight.\nConsider how, even in an age of autopilot, pilots still exist - their role has evolved rather than disappeared. Similarly, programmers looking to thrive in this AI-augmented future should focus on developing skills that AI currently struggles with: system architecture, business domain expertise, and cross-functional collaboration. 
The most successful developers will be those who learn to effectively partner with AI tools while maintaining their core problem-solving and architectural thinking skills.\n","permalink":"https://blog.randomdomain.co.za/posts/2024/12/ai-coding-assistants-rise/","summary":"\u003cp\u003eThe rise of AI coding assistants like ChatGPT and GitHub Copilot has sparked intense debate in the programming community: Are we witnessing the beginning of the end for human developers?\u003c/p\u003e\n\u003cp\u003eThroughout history, new technologies have often been perceived as threats to existing industries. However, rather than replacing workers entirely, these innovations have typically transformed how work is done - becoming valuable tools that augment human capabilities and often create entirely new types of jobs.\u003c/p\u003e","title":"The rise of AI coding assistants"},{"content":"Werner Vogels published his annual predictions in late November 2023. I want to take one of them seriously — not as a cheerleader, but as a sceptic who agrees with the conclusion while finding the framing too comfortable.\nThe third prediction: AI assistants will redefine developer productivity.\nHe is right. That is the frustrating part.\nThe 2023 Stack Overflow Developer Survey found that 70 percent of respondents were already using or planning to use AI tools in their workflow. Vogels cites this as evidence of momentum. It is also evidence of a category error: \u0026ldquo;planning to use\u0026rdquo; and \u0026ldquo;using\u0026rdquo; describe very different relationships with a tool. I plan to exercise in the mornings.\nThe actual claim worth interrogating is this: \u0026ldquo;The AI assistants on the horizon will not only understand and write code, they will be tireless collaborators and teachers.\u0026rdquo;\nTireless is accurate. 
Tireless, and utterly untrustworthy.\nWe have been handed a new kind of colleague: one who never complains, never sleeps, and will confidently produce a plausible-looking answer whether or not it is correct. The collaboration model Vogels describes — \u0026ldquo;no task will exhaust their energy, and they\u0026rsquo;ll never grow impatient explaining a concept or redoing work\u0026rdquo; — is real. What it omits is the corollary: they will also never admit they do not know. The patience is not wisdom. It is compliance.\nVogels buries the important hedge in a single sentence: \u0026ldquo;Make no mistake, developers will still need to plan and evaluate outputs.\u0026rdquo; That sentence is doing an enormous amount of work. Planning and evaluation are not minor overhead. They are the entire professional competence that distinguishes a developer from a prompt-issuer. If the AI takes on the generation and the developer retains only the review, we have not rebalanced the workload — we have moved the cognitive burden up a level, where errors are harder to spot and more expensive to miss.\nThe role-blurring prediction is where I feel the most ambivalence. \u0026ldquo;The lines between product managers, front- and back-end engineers, DBAs, UI/UX designers, DevOps engineers, and architects will blur.\u0026rdquo; That is almost certainly true. It is also describing a world where no one is specifically responsible for anything. Lines between roles are not just division of labour. They are accountability structures. Blurring them is also blurring the answer to \u0026ldquo;whose fault was this.\u0026rdquo;\nNone of this contradicts the central prediction. AI assistants will redefine developer productivity. They already have, in my own work — writing boilerplate I would have resented, explaining a codebase I have just joined, sketching tests I can then harden. These are genuine productivity gains.\nBut Vogels is the CTO of Amazon. Amazon sells Bedrock. Amazon sells CodeWhisperer. 
Amazon sells the compute that runs inference at scale. Reading his predictions with that in mind does not make them wrong. It does make the optimism legible as something other than disinterested analysis.\nThe piece is not dishonest. It is conveniently incomplete.\nWhat is missing is any account of verification. The prediction describes the capability ceiling. It says very little about the floor: the minimum set of practices that make AI-assisted development trustworthy enough to stake a production system on. Tests. Specification. Review. The unglamorous infrastructure of professional software.\nAn assistant that writes unit tests is only useful if I already know what those tests should prove. An assistant that suggests the \u0026ldquo;optimal infrastructure for your projects\u0026rdquo; is only useful if I can evaluate whether the suggestion is optimal or merely plausible. Of the 70 percent using or planning to use AI tools — I would like to know how many have invested equally in the review practices that close the loop.\nVogels says the coming years will see engineering teams \u0026ldquo;more productive, develop higher quality systems, and shorten software release lifecycles.\u0026rdquo; He is probably right. The teams that will get there are the ones who treat the AI as a generator, not an oracle — and who have built the verification infrastructure to tell the difference. The rest will ship fast and confidently.\nThat, too, is a redefinition of developer productivity. 
Just not the one anyone is advertising.\n","permalink":"https://blog.randomdomain.co.za/posts/2023/12/vogels-ai-predictions/","summary":"\u003cp\u003eWerner Vogels published his annual predictions in late November 2023.\nI want to take one of them seriously — not as a cheerleader, but as a sceptic who agrees with the conclusion while finding the framing too comfortable.\u003c/p\u003e\n\u003cp\u003eThe third prediction: AI assistants will redefine developer productivity.\u003c/p\u003e\n\u003cp\u003eHe is right.\nThat is the frustrating part.\u003c/p\u003e","title":"On Werner Vogels' 2024 AI Predictions"}]