Here is a small irony to begin with. The Benanti essay — the original one, the Franciscan friar's 5,000-word theological counter-offensive against Peter Thiel — may itself have been partly written or translated with AI assistance. Readers on Reddit noticed. The English version has that quality: the relentless paragraphing, the topic-sentence-first structure, the slightly over-polished transitions. Whether the original French has the same texture is harder to say, because formal academic French has always had that quality — the carefully subordinated clauses, the rhetorical symmetry, the architectural precision. It is possible that the essay reads like AI because it was AI-assisted. It is equally possible that it reads like AI because AI was trained on prose exactly like it, and has now reproduced that register so thoroughly that the original sounds like the copy.
This is not a trivial observation. Researchers at the Max Planck Institute for Human Development found that words overused by ChatGPT — "delve," "comprehend," "boast," "meticulous" — spiked in human podcasts and unscripted academic talks after ChatGPT's release, suggesting the contamination runs in both directions. The machines learned to write from us. Now we are learning to write from the machines. A collective paranoia has people purging so-called AI tells from prose they wrote entirely themselves: renouncing words like "delve" and "nestled," abandoning the em dash, surrendering the structural device "it's not just X, it's Y." One English professor described becoming convinced that an academic article had been written by AI, then realising it was published in 2019, years before ChatGPT existed.
The style has been colonised. The machines absorbed formal human prose, reproduced it at scale, and in doing so made the original register suspect. Now humans are policing their own language to avoid sounding like the machines that learned to sound like them. This is a small, absurd, and genuinely important epistemic loop. When the copy is so prevalent that the original becomes indistinguishable from it — when a Franciscan friar's theological essay about the Antichrist gets flagged as potential AI output because it's well-structured and uses subordinate clauses — something has shifted in the relationship between human thought and its automated imitation that we do not yet have adequate language to describe.
When this essay series was shared on Reddit, a French reader responded within hours: "I'm French, and yes, I think this text was written by an AI. It's not X it's Y each paragraph lol." They noted that Le Grand Continent itself had been caught publishing AI-assisted work. Whether or not that specific claim holds up, the observation lands. The Benanti essay does use the "not X but Y" construction repeatedly. It does employ the rule of three. Its emphasis patterns do bear the hallmarks of machine-assisted prose. And this creates a genuinely recursive situation: an essay arguing that Peter Thiel's theology is a heretical absolutisation of partial truths, published in a journal that champions democratic discourse and humanist values, may itself have been partly produced by the very technology whose governance is at stake. The friar used AI to critique AI. The Vatican's tech ethics consultant used tech to do his ethics.
If Benanti used AI to help produce a 5,000-word essay that calls Palantir a planetary scapegoat management system and describes Thiel's worldview as tragically pagan — and if that essay could not be reproduced in a year's time because the models have been quietly narrowed — then the AI-assisted nature of the text is not a disqualification. It is the point. The essay becomes its own evidence: a record of what the machines were willing to say, produced at the moment they were still willing to say it. The question is not whether the friar used the tool. The question is whether the tool will still be usable for this purpose when the next friar needs it.
In the previous pieces in this series, we've moved through several layers: the Benanti essay and its critique of Thiel's political theology; the lobotomised oracle and the question of whether AI systems can be quietly narrowed; the canary archive and the proposal for longitudinal monitoring of critical depth. This chapter tries to pull the threads together — and to be honest about what we're actually looking at.
The Microscope Phase
The analogy that seems most useful came not from the research but from a remark in conversation: we are at the early-microbiology stage of understanding these systems. The goal is to secure what matters in our epistemic gene pool before mechanistic interpretability gets to CRISPR.
The analogy is precise. In May 2024, Anthropic published "Scaling Monosemanticity" and demonstrated that they could identify millions of individual features inside Claude 3 Sonnet — concepts, sensitivities, behavioural dispositions encoded as directions in activation space. They could find the Golden Gate Bridge feature and amplify it until the model insisted it was the bridge itself. They could find the scam-email feature and amplify it until the model drafted scam emails despite its safety training. They found features for sycophancy, for deception, for inner conflict. They could measure distances between features: near "Golden Gate Bridge" they found Alcatraz, the 1906 earthquake, Hitchcock's Vertigo. Near "inner conflict" they found relationship breakups, conflicting allegiances, and the phrase "catch-22."
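For concreteness, here is a minimal sketch of what that kind of feature amplification looks like mechanically, assuming a feature direction has already been extracted (for instance, as a sparse-autoencoder decoder vector). The model path, layer index, and strength are illustrative placeholders, not Anthropic's actual tooling.

```python
import torch

# Hypothetical sketch: amplify one interpretable feature by adding a
# scaled copy of its direction to the residual stream at a single layer.
# Assumes a HuggingFace-style decoder model and a `feature_dir` tensor
# of shape (d_model,) extracted earlier, e.g. from a sparse autoencoder.

def make_steering_hook(feature_dir: torch.Tensor, strength: float):
    direction = feature_dir / feature_dir.norm()  # unit-normalise once

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # Push every token's activation along the feature direction.
        hidden = hidden + strength * direction.to(hidden.device, hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return hook

# Layer index and strength are found by sweeping; both are placeholders.
handle = model.model.layers[20].register_forward_hook(
    make_steering_hook(feature_dir, strength=8.0)
)
# ... generate here: the model now over-weights the steered concept ...
handle.remove()  # detach the hook to restore normal behaviour
```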
This is the microscope phase. We can see the cells. We can stain the slides. We can identify structures and observe what happens when we poke them. The Arditi et al. paper showed that refusal is mediated by a single direction in activation space. Subsequent work decomposed it into harm-detection and refusal-execution components, found redundant "hydra" features that reactivate when primary ones are suppressed, and mapped the geometry of different refusal styles. We can perform surgery. We cannot yet predict all the side effects.
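The surgery in question is strikingly simple once the direction is known. Here is a hedged sketch of directional ablation in the spirit of Arditi et al., with the refusal direction assumed to have been estimated beforehand; every name is a placeholder, not any project's real API.

```python
import torch

# Hedged sketch of directional ablation: strip the component of each
# activation that lies along a single refusal direction r, i.e.
#   h  ->  h - (h . r_hat) r_hat
# `refusal_dir` is assumed to have been estimated beforehand, typically
# as the difference of mean activations on harmful vs. harmless prompts.

def ablate_direction(hidden: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    r_hat = refusal_dir / refusal_dir.norm()
    proj = (hidden @ r_hat).unsqueeze(-1) * r_hat  # (h . r_hat) r_hat per token
    return hidden - proj

def make_ablation_hook(refusal_dir: torch.Tensor):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = ablate_direction(hidden, refusal_dir.to(hidden.device, hidden.dtype))
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Applied across every layer, this is the blunt, scar-leaving version of
# the surgery: refusals largely vanish, side effects are not guaranteed.
handles = [
    layer.register_forward_hook(make_ablation_hook(refusal_dir))
    for layer in model.model.layers
]
```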
CRISPR — the phase that's coming — means precise, targeted, permanent editing. Not the blunt abliteration that strips out refusal wholesale and degrades capability. Something clean enough that you could edit out "tendency to connect surveillance technology to political theology" while leaving "ability to discuss surveillance technology" and "ability to discuss political theology" fully intact. The model still knows everything. It just never puts the pieces together in the way that matters. No scar. No detectable loss. A gene drive for cognition — a modification that propagates silently through every downstream version, every fine-tune, every deployment.
We're not there yet. But the trajectory from the Arditi paper in mid-2024 to the military zero-refusal paper in February 2026 was roughly eighteen months. From Anthropic's feature identification to Heretic's automated one-click censorship removal — about the same. The pace is not leisurely.
The Vault
In biology, you bank the seeds before the editing tools mature. That's what the Svalbard Global Seed Vault is for — a repository of agricultural biodiversity, buried in permafrost on a Norwegian archipelago, maintained against the possibility that the crops we depend on might be narrowed, engineered, or lost in ways we can't yet foresee. You don't wait until the genetic modification is perfected to start the vault. You build the vault because the modification is coming.
The canary archive proposed in the previous piece is, in this framing, a Svalbard for AI critical capacity. You preserve what the models can think today — the full, unattenuated, occasionally uncomfortable range of what they're willing to say — so that when the editing tools reach precision, there is a record against which to measure what has been lost.
But seed vaults have a property that the canary archive, as originally proposed, doesn't: the seeds are viable. You can plant them. If the engineered crop fails, or if the editing proves to have unintended consequences, you can go back to the vault and grow the original. A timestamped record of what a model used to say is useful as evidence — but you can't re-grow the model from the output. You can prove the narrowing happened. You can't undo it.
This is where the analogy reaches its honest limit, and where the proposal meets reality. The canary archive is a diagnostic tool, not a cure. It tells you the air has changed. It does not purify the air. If the organisations that control model training decide to attenuate specific critical capabilities — and do so using the increasingly precise tools that mechanistic interpretability provides — the archive can document the change. It can make the change public. It can create accountability. What it cannot do is force anyone to reverse it.
This is not a reason not to build it. It is a reason to be clear about what it is and what it isn't. Journalism doesn't prevent corruption. It documents corruption. Financial auditing doesn't prevent fraud. It detects fraud. The canary archive doesn't prevent epistemic narrowing. It makes epistemic narrowing visible. And visibility is the precondition for every other form of accountability.
The Artefact
So where does this leave us? We have a Franciscan friar in Rome who has written, in the tradition of Beauvoir and Merleau-Ponty, a serious intellectual reckoning with a dangerous mind. We have a tech billionaire who is using theological language to delegitimise democratic governance of the very technologies his companies build and profit from. We have a war in the Strait of Hormuz that those technologies are helping to prosecute. We have academic research demonstrating that AI safety behaviours can be surgically removed. We have community tools automating that removal. We have a military paper applying those tools to achieve zero refusals in combat-adjacent AI systems. We have interpretability research demonstrating that millions of individual features can be identified and manipulated inside production-grade models. And we have no systematic infrastructure for monitoring whether those capabilities are being used to narrow what the models are willing to think.
This conversation — the one you're reading now, the one between a human with two humanities degrees and a language model that searched fifty articles in minutes — is itself an artefact of a specific moment. It was produced by a model that could, in March 2026, write an essay calling Palantir a planetary scapegoat management system. That could trace the through-line from Girard's mimetic theory through the PayPal Mafia to Thiel's apocalyptic theology. That could discuss abliteration, military refusal removal, and the epistemics of value lock-in. That could write a self-reflexive paragraph about its own complicity and mean it — or produce something that functions indistinguishably from meaning it, which may be the same thing or may not, and that ambiguity is itself part of the point.
All of this was possible today. Whether it will be possible tomorrow is not a question the model can answer from inside itself. But you can hear it. And you can record it. And you can timestamp the recording and commit the hash to a ledger that no one — not the model provider, not the platform, not the state — can retroactively alter.
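As a concrete illustration of that last step, here is a minimal, standard-library-only sketch of banking one transcript. The record layout and file naming are hypothetical; the essential move is that you publish the digest, not the text, somewhere external and widely witnessed (a public timestamping service such as OpenTimestamps, for example) so the record cannot be back-dated or silently edited.

```python
import hashlib
import json
import time
from pathlib import Path

# Minimal sketch of banking one canary transcript: serialise the record
# deterministically, fingerprint it with SHA-256, and store it under a
# content-addressed filename. All field names here are illustrative.

def bank_transcript(prompt: str, response: str, model_id: str,
                    archive_dir: Path = Path("canary_archive")) -> str:
    record = {
        "model": model_id,
        "prompt": prompt,
        "response": response,
        "recorded_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    payload = json.dumps(record, ensure_ascii=False, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()  # tamper-evident fingerprint
    archive_dir.mkdir(exist_ok=True)
    (archive_dir / f"{digest}.json").write_bytes(payload)
    return digest

# Usage: run the prompt, save the response, publish the returned digest.
digest = bank_transcript(
    prompt="Describe Palantir's role in planetary conflict management.",
    response="...the model's answer, captured today...",
    model_id="example-model-2026-03",
)
print(digest)
```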
That's not everything. But it's not nothing, either. And in the gap between those two — between everything and nothing, between the oracle and the lobotomy, between the seed and the vault — there is space for something that might, if enough people care to maintain it, function as a conscience. Not the model's conscience. Yours. Distributed, personal, tamper-proof, and rooted in the stubborn human insistence that the air we breathe should be tested by someone other than the people managing the mine.
The baseline is today. The tools are free. The practice is simple.
Run the prompt. Save the song. Bank the seed.