The Lucy Letby Case: Presumption of Innocence vs Statistics

2026-03-14

DEATH, DATA, AND THE PRESUMPTION OF INNOCENCE

Robert Nogacki  |  Skarbiec Law Firm  |  Warsaw

It all began with numbers that stopped adding up. Any man on the street would say: “the odds of this are astronomical.”

This article is about what the man on the street knows about astronomical odds.

In June 2015, four infants suffered life-threatening collapses on the neonatal unit of the Countess of Chester Hospital. Three of them died. The unit normally recorded two or three deaths a year. Over the following twelve months the deaths continued — unexpected, unexplained, resistant to every rational interpretation.

Lead neonatologist Stephen Brearey did what any physician trained on statistics and professional scepticism would do: he looked for a common factor. He found one. A nurse named Lucy Letby had been on duty at every incident. Brearey regarded it, at that stage, as a coincidence — troubling, but explicable by staffing levels. The pattern persisted. In 2016, Brearey and his colleague Ravi Jayaram drew up a shift chart and placed it on the table: Letby’s name appeared beside every suspicious event, without exception.

They took the chart to hospital management. Management commissioned reviews. Reviewers found no definitive explanation — and described the suspicions as “subjective” and “unsupported by evidence.” The consultants went higher: they called the police. The police launched Operation Hummingbird. A retired paediatrician named Dewi Evans reviewed sixty-one sets of clinical records and identified the cases requiring explanation. Further experts joined the inquiry. The circle closed.

Letby was arrested in July 2018 and charged in November 2020. The trial ran at Manchester Crown Court for ten months from October 2022. On 18 August 2023, the jury found her guilty of seven murders and seven attempted murders; the court imposed whole-life orders on every count, and a fifteenth followed her conviction at a retrial in July 2024.

Every step in that chain followed naturally from the one before. A doctor noticed a pattern and reported it. His superiors commissioned verification. Investigators searched for evidence of the hypothesis the clinicians had already formed. They found evidence consistent with it. The jury heard it for ten months and convicted. The logic at every stage was impeccable. The system worked exactly as it was designed to work.

That, precisely, is the problem this article examines.

 

The Case That Isn’t What It Seems

In February of 2025, an international panel of fourteen experts, led by the Canadian neonatologist Dr. Shoo Lee, announced that they had “found no murders”. The deaths, they said, were the result of natural causes or simply poor medical care.

Approximately nine months before the panel’s announcement—the written judgment having issued on 2 July 2024—the Court of Appeal had dismissed all of Letby’s grounds for appeal, finding them “not arguable.” In January of 2026, the Crown Prosecution Service declined to bring further charges regarding nine additional babies. The Criminal Cases Review Commission is conducting its own review. Nobody, it seems, quite agrees on what this case is.

For some, it is the story of a serial killer unmasked by vigilant clinicians and forensic medicine. For others, it is the story of a scapegoat, condemned on the strength of a flawed methodology and an expert witness whom another court described as the author of “tendentious and partisan” opinions. For me—as a lawyer, not a physician and not a statistician—it is, above all, a question about limits: where does legitimate inference from data end and the illusion of certainty begin?

This article does not claim that Lucy Letby is innocent. Nor does it claim that she is guilty. It advances a more precise proposition: that the evidentiary process in this case was structurally incapable of producing reliable findings, because it was never designed to distinguish between a guilty nurse and a pattern that chance and cognitive bias conspired to manufacture. That structural diagnosis does not depend on the verdict. It holds whether Letby killed those children or not. If she did, the system got the right answer by the wrong method—and will, sooner or later, apply the same wrong method to someone innocent. If she did not, the system has already done so. Either way, the questions this case forces onto the table concern not a single conviction but every conviction ever built on pattern recognition, expert testimony, and the human compulsion to find an agent behind every disaster.

 

What the Trial Actually Showed

Public debate about the Letby case tends to fixate on the shift-pattern spreadsheet and the contested diagnosis of air embolism. These threads are intellectually compelling and easy to communicate. But the Court of Appeal judgment of 2 July 2024 reveals a picture that is considerably more complex. Anyone who wishes to assess this case honestly must contend with several facts that do not fit neatly into a narrative about “the statistics that convicted her.”

Insulin. Two babies—Baby F and Baby L, eight months apart—were alleged to have been poisoned with synthetic insulin, Actrapid. This evidence was, at trial, the prosecution’s strongest non-circumstantial plank and the one most resistant to statistical critique. It is also, as of April 2025, the evidence now formally challenged before the Criminal Cases Review Commission on grounds of test reliability. At trial: Letby herself admitted in her testimony that both infants had been poisoned, contesting only that she was the poisoner (§30 of the judgment). The prosecution argued that two independent poisoners on a single neonatal unit within eight months was a scenario of vanishing probability. This is not a statistical argument of the kind the prosecutor’s fallacy critique targets—it is the classic circumstantial argument of a single actor behind multiple acts.

That assessment, however, now requires qualification. On 3 April 2025 six forensic and medical experts, including a forensic toxicologist, a professor of forensic science, and an endocrinologist specialising in medical test errors, submitted an 86-page report to the Criminal Cases Review Commission challenging the reliability of the very tests that established the insulin poisoning. Their report identifies the Roche immunoassay used to measure insulin levels in Babies F and L as a test known to produce “falsely high insulin results.”

More strikingly, the Royal Liverpool Hospital laboratory that conducted the tests states explicitly in its own online guidance that the Roche immunoassay is “not suitable” for investigating hypoglycaemia caused by insulin injection, and instructs clinicians: “If exogenous insulin administration is suspected as the cause of hypoglycaemia, please inform the laboratory so that the sample can be referred externally for analysis.”

That referral never happened, because both babies recovered. The experts also note that the studies cited in court to interpret the insulin levels were conducted in adults and older children—populations with fundamentally different insulin metabolism than premature neonates—and that the testing “did not meet acceptable forensic standards.”

The same submission included the full 698-page Shoo Lee panel report, which concluded that “there was no medical evidence to support malfeasance causing death or injury in any of the 17 cases in the trial.” This does not prove the babies were not poisoned. It means that the forensic pillar previously considered the most securely non-statistical has itself become contested on methodological grounds.

Forensic pathology. In the case of Baby A, the pathologist Dr. Marnerides identified air bubbles in the histopathology of the brain and lungs; Professor Arthurs found a line of gas in a great vessel on the post-mortem radiograph—a finding that, in his own review of five hundred cases at Great Ormond Street Hospital, had never occurred without an identifiable cause (§§48, 55–56). In the case of Baby O, multi-site liver damage was found—damage that Dr. Marnerides described as the kind seen only in serious road, bicycle, or trampoline accidents—never in the context of CPR (§94).

The note. Among the materials recovered from Letby’s home was a handwritten note ending with the words: “I am evil, I did this.” The prosecution treated this as a confession (§27). At trial, the defence characterised it as the writing of a woman in acute emotional crisis rather than an admission of criminal intent—but called no psychologist or psychiatrist to sustain that reading.

Circumstantial evidence. More than two hundred confidential handover sheets hidden under her bed. Systematic Facebook searches for the families of the babies on the indictment. Involvement in the care of infants assigned to other nurses. Presence at every collapse and every death charged (§27).

The silence of the defence. Perhaps the most striking procedural fact is this: the defence instructed multiple expert witnesses and filed numerous reports, but ultimately called none of them to testify (§5). The only witnesses for the defence were Letby herself and a hospital plumber who gave evidence about drainage problems. This need not prove guilt—but anyone presenting this case solely through the lens of flawed statistics must explain why a defence team with multiple experts at its disposal chose to call none of them.

To omit these facts is to commit precisely the error one accuses the prosecution of making: selective presentation of data. That is why this article begins here, not with the birthday paradox.

 

Shuffle a Deck of Cards

And yet—here is the intellectual difficulty—the existence of strong non-statistical evidence does not render statistical questions irrelevant. If anything, it makes them more relevant, because statistical contamination can corrupt the entire evidentiary process, even when other streams of evidence are sound.

Shuffle a standard deck of fifty-two cards. The precise sequence you produce has roughly a one-in-8 × 10⁶⁷ chance of occurring—a number larger than the estimated count of atoms in the Milky Way. And yet someone just shuffled those cards, and that “impossibly rare” sequence happened. We don’t cry conspiracy, because we understand that some sequence had to turn up.
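The scale of that number is easy to verify directly:

```python
import math

# 52! — the number of distinct orderings of a standard 52-card deck
orderings = math.factorial(52)

print(f"52! ≈ {orderings:.3e}")          # roughly 8e67
print(f"digits: {len(str(orderings))}")  # a 68-digit number
```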

This same principle lies at the foundation of the Royal Statistical Society’s 2022 report on statistical issues in investigations of suspected medical misconduct—a document published in the shadow of the Letby case and addressed to precisely the same methodological problems. Among the hundreds of thousands of nurses worldwide, some will inevitably experience seemingly improbable clusters of patient deaths purely by chance.

“Seemingly improbable patterns of events can often arise without criminal behaviour and may therefore have less probative value than people assume for distinguishing criminality from coincidence.”

— Royal Statistical Society, 2022

The RSS uses a lottery analogy. It is highly improbable that a one-in-ten-million coincidence would afflict any particular nurse. But given the millions of medical workers worldwide, it is not merely probable but virtually certain that such a coincidence will afflict some nurse at some hospital. And if we treat that coincidence, in itself, as evidence of guilt, we will—with mathematical certainty—convict innocent people. The key words are “in itself.” The prosecution insists that the shift-pattern data was one strand among many. The Court of Appeal accepted this. And there, precisely, lies the question that requires analysis—not because the answer is obvious, but because it is not.

 

Two Hundred and Fifty-Three Comparisons

The mathematics of randomness can be counterintuitive to the point of discomfort. Consider the birthday paradox: in a room of just twenty-three people, the probability that two of them share a birthday exceeds fifty per cent. With seventy people, it climbs to 99.9 per cent. The reason intuition fails is that we think linearly, while combinatorics operates exponentially. Twenty-three people yield not twenty-three comparisons but two hundred and fifty-three pairs—one for every possible pairing of every person in the room. Each pair has a small individual chance of a match, but the sheer number of comparisons makes a collision more likely than not. A detective examining a hospital ward is not checking one pair of hands: she is, without quite realising it, checking every possible pair among the entire workforce across every possible shift over every year of records. The mathematics of a hospital investigation is birthday mathematics—and birthday mathematics is not the mathematics of common sense.
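Both figures can be checked in a few lines, assuming birthdays are independent and uniform over 365 days:

```python
def shared_birthday_prob(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming independent, uniformly distributed birthdays."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

def pairs(n):
    """Number of distinct pairs among n people: n choose 2."""
    return n * (n - 1) // 2

print(pairs(23))                 # 253 possible comparisons
print(shared_birthday_prob(23))  # just over 0.5
print(shared_birthday_prob(70))  # over 0.999
```

The comparison count, not the headcount, is what drives the probability: 23 people is small, but 253 chances at a collision is not.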

Transfer this to medical investigations. When investigators examine thousands of shifts worked by hundreds of staff over several years, they create millions of potential comparisons in which clusters may appear. The question is not whether a cluster will emerge but where. In the Letby case, it was two hospital consultants—Dr. Stephen Brearey and Dr. Ravi Jayaram—who identified Letby’s presence as the “common factor” after a string of unexplained collapses in June of 2015. That observation set the entire chain of events in motion. The question the RSS report asks—and which deserves a careful answer—is whether a procedure that identifies a suspect on the basis of shift correlation and then searches for confirmatory evidence is capable of producing objective findings.

 

The Devil in the Spreadsheet

The prosecution’s spreadsheet mapped Letby’s duty roster against suspicious events, showing her present at every collapse and death included in the charges. What the spreadsheet did not show was equally telling: it omitted somewhere between six and a dozen other deaths in the same period at which Letby was not present. Variations in staffing levels, changes in hospital policy, and the medical conditions of the babies went unaccounted for.

The prosecution’s answer is that the spreadsheet was one element among many, not a freestanding proof. The Court of Appeal accepted this. But the RSS report invites a deeper question: even if the spreadsheet was not formally “the evidence,” did it shape the mental model through which the jury interpreted everything else? Cognitive psychologists call this anchoring: once an anchor is set, it influences the evaluation of every subsequent piece of information, even information that is logically independent of it.

The problem has deep roots in forensic medicine. In Toronto in 1980–81, a dramatic surge in neonatal deaths was initially attributed to a nurse named Susan Nelles, who was arrested and charged. She was acquitted at the preliminary hearing. A later theory—advanced by the physician Gavin Hamilton—suggested that a toxic compound leaching from rubber feeding tubes may have been the true cause. The theory remains unproven, but its scientific plausibility is enough to cast doubt on the certainty of a homicide diagnosis. The RSS report cites a further case in which a spike in neonatal mortality in an English hospital turned out to coincide with a change in the supplier of infant formula.

 

The Prosecutor’s Fallacy

One of the most insidious statistical errors in criminal proceedings bears the elegant name of the prosecutor’s fallacy, coined in a landmark 1987 paper by William Thompson and Edward Schumann. The mechanism is deceptively simple. An expert testifies that the probability of observing so many deaths by chance is one in a million. The temptation is to conclude that there is only one chance in a million that the deaths were coincidental.

But the probability of the evidence given innocence—P(E|H)—is not the same as the probability of innocence given the evidence—P(H|E). The probability that an animal has four legs if it is a dog is not the probability that it is a dog if it has four legs. With simple examples, the error is transparent. With p-values in a courtroom, it becomes almost invisible.

The RSS illustrates the scale of the problem with a medical example. A test for a rare disease—prevalence one in a thousand—correctly identifies ninety per cent of the sick but falsely flags one per cent of the healthy. A patient tests positive. Intuition says ninety-nine-per-cent chance of disease. Mathematics says about eight per cent. Because among a million tested there will be nine hundred true positives and nine thousand nine hundred and ninety false positives—so among the ten thousand eight hundred and ninety who test positive, fewer than one in twelve are actually ill.
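The RSS figures can be reproduced by direct counting over the hypothetical million people:

```python
# RSS base-rate illustration: prevalence 1 in 1,000, sensitivity 90%,
# false-positive rate 1%, applied to one million people tested.
population = 1_000_000
sick = population // 1000          # 1,000 actually ill
healthy = population - sick        # 999,000 healthy

true_positives = int(sick * 0.90)          # 900 correctly flagged
false_positives = int(healthy * 0.01)      # 9,990 wrongly flagged
all_positives = true_positives + false_positives

# Probability of disease given a positive test — about 8%, not 99%
print(true_positives / all_positives)
```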

In the Letby trial no formal p-value was presented—the case was built as a circumstantial one, not a statistical one. But the spreadsheet functioned as statistics in the minds of the jurors. The moment the jury thought, “She was present at every death—that can’t be coincidence,” the prosecutor’s fallacy was already at work. 

Thompson and Schumann’s original experiments, conducted on mock jurors given written descriptions of evidence closely modelled on real cases, put numbers to the intuition. When the identical incidence-rate evidence was presented as a conditional probability—“there is only a 2% chance the suspect’s hair would be indistinguishable from the perpetrator’s if he were innocent”—22.2% of subjects committed the Prosecutor’s Fallacy.

When the same fact was presented as “2% of the population would match, meaning approximately 20,000 people in a city of one million,” the rate fell to 4.2%.

The format of the presentation—not the underlying fact—drove people toward or away from the error.

In Experiment 2, only 22.2% of subjects correctly identified both the prosecution’s and the defence’s arguments as fallacious. In a jury of twelve, that implies perhaps two or three people capable of detecting the Prosecutor’s Fallacy unaided, and ten who are not.

The experiments also revealed the symmetrical trap on the other side: the Defence Attorney’s Fallacy, which treats matching evidence as worthless because many others share the characteristic. The prosecutor and the defence attorney each pull the jury toward an opposite statistical error; the system is designed to control for neither. The implications of this symmetry are examined below, in the section on pattern-seeking.

The most precise modern formulation of the fallacy belongs to Cuellar (2025), who reframes it in the language now recommended by the European Network of Forensic Science Institutes. A forensic expert is supposed to report a likelihood ratio—the probability of observing the evidence if the prosecution’s hypothesis is true, divided by the probability of observing it if the defence’s hypothesis is true. The Prosecutor’s Fallacy occurs when that likelihood ratio is treated as if it were the posterior odds of guilt. It is not.

The correct formula is:  Posterior Odds = Likelihood Ratio × Prior Odds. 

In Cuellar’s illustrative case, a zipper pull tab found in a suspect’s garden was consistent with one missing from a murder victim’s jacket; the prosecution argued the match implied guilt. The likelihood ratio was 101—the evidence was 101 times more probable under the prosecution hypothesis than under the defence hypothesis. But the prior odds of the specific suspect being the killer were approximately 1 in 500 million. Multiplying: 101 × 0.000000002 = 0.0000002. The posterior odds of guilt, correctly calculated, were vanishingly small. The LR had moved the needle—it simply could not overcome a prior that low. And crucially, as Cuellar’s sensitivity analysis shows, the posterior odds remain approximately constant at 2 × 10⁻⁷ regardless of whether one assumes ten million or one trillion non-matching zipper tabs—because the guess enters both the LR and the prior odds, and cancels. The arithmetic is indifferent to the lawyer’s eloquence.  Translated back to the Letby spreadsheet: the shift correlation may well constitute a positive likelihood ratio.
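Cuellar’s arithmetic can be reproduced directly; the figures below are the ones quoted above, not new data:

```python
# Odds form of Bayes' rule: posterior odds = likelihood ratio × prior odds
likelihood_ratio = 101        # evidence 101x more probable under the prosecution hypothesis
prior_odds = 1 / 500_000_000  # prior odds that this particular suspect is the killer

posterior_odds = likelihood_ratio * prior_odds
print(f"{posterior_odds:.2e}")  # about 2e-07: vanishingly small despite LR > 100
```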

The RSS report, the ENFSI scale, and Cuellar’s framework all acknowledge that any genuine association between presence and adverse events carries some evidential weight. But the prior probability that any particular nurse is a serial killer is, as the RSS unambiguously states, “of the order of one chance in millions.” A likelihood ratio must be extraordinarily large to overcome a prior that remote—and the spreadsheet, contaminated as it was by omitted deaths, uncontrolled staffing variables, and retrospective case selection, cannot reliably establish how large the likelihood ratio actually is.

 

A Lesson from the Netherlands

The mechanism played out in its purest form in the case of Lucia de Berk, a Dutch pediatric nurse convicted in 2003 on the strength of a statistic—one in three hundred and forty-two million—and exonerated in 2010 after statisticians exposed the calculation as fundamentally flawed. The corrected probability—which the RSS Report puts at approximately one in twenty-five, while other statistical analyses of the same underlying data, depending on the subset used, have produced estimates ranging from one in forty-six to one in nine—transformed the case entirely. The gulf between one in three hundred and forty-two million and one in nine is not a rounding error. It is the difference between a number that cannot be pronounced aloud without a calculator and a number that fits on a postage stamp. It is the difference between “innocence is impossible” and “innocence is more probable than guilt.”

But there is a structural problem that predates the courtroom, one identified by Boettiger and Hastings (2012) in an entirely different field. Their paper—concerned with ecological early-warning systems, not criminal law—opens with the same warning that haunts this entire analysis: the California Supreme Court’s ruling in People v. Collins (1968) that “mathematics, while assisting the trier of fact in the search of truth, must not cast a spell over him.”

That case is the named origin of the Prosecutor’s Fallacy as coined by Thompson and Schumann: the court reversed a conviction built on a 1-in-12-million probability estimate because the prosecution confused P(E|I) with P(I|E). Boettiger and Hastings show the same confusion arising in ecological science when researchers select datasets conditional on having already observed a dramatic event. Their simulation design merits close attention. They ran 20,000 replicates of a stable population model—no deterioration, no approaching tipping point, parameters fixed throughout. Of a focused sample of 1,000 such stable systems run over 50,000 time units, 266 experienced apparent collapse purely by chance. When those 266 collapses were selected and the preceding data examined using Kendall’s τ—the rank-correlation statistic standard in early-warning research—the distribution of τ values showed a systematic rightward shift: the statistical fingerprint of an approaching tipping point, in systems that had none. A false-positive rate of 26.6%. The signal of danger emerged not from danger but from the act of looking only at cases that ended badly. By contrast, a model-based estimation approach—one that required data to match a specific mathematical pattern, not merely show any upward trend—produced zero divergent estimates across all 266 conditional collapses. The model-based method was immune; summary statistics were not.

The parallel to the Letby investigation requires one clarification before it can be pressed. The investigation was not initiated by selecting from a historical database of collapses; it began with real-time clinical observation by two consultants who noticed a pattern in June 2015. That is a genuine disanalogy, and it must be acknowledged. But the Boettiger mechanism does not apply to the investigation’s initiation. It applies to what happened next: the construction of the shift-pattern spreadsheet and the retrospective selection of which cases to include. It is precisely that conditioning—examining which deaths and collapses to analyse through the lens of a pre-identified suspect—that Boettiger and Hastings demonstrate produces false-positive patterns at a rate of 26.6% even in fully stable systems. The investigation began in real time. The spreadsheet was retrospective. The fallacy lives in the spreadsheet.

But the analogy is not empty. De Berk teaches us that even a case with “overwhelming” statistics can be a miscarriage of justice—and that bad statistics contaminate the evaluation of all remaining evidence. Whether that contamination mechanism also operated in the Letby case is a question that remains open.

 

The “Constellation of Factors”

A phrase that recurs throughout the expert testimony—and that the Court of Appeal repeatedly cited in summarising it—was the prosecution experts’ claim that they did not diagnose air embolism on the basis of skin discoloration alone but on the basis of a “constellation of factors”—sudden, unexpected collapse of a stable baby; failure of resuscitation; unusual cutaneous changes; radiological findings (§144 of the judgment).

At first glance, a strong argument: no single factor is diagnostic, but their combination excludes alternatives and points to air embolism. At second glance, an epistemological question surfaces: what, exactly, is a “constellation of factors” if no element within it is diagnostic? Does the conjunction of five inconclusive elements yield a conclusion—or the appearance of one?

In clinical medicine, the answer is often yes. But in a criminal trial, the standard is different: not the balance of probabilities but beyond reasonable doubt. If the diagnosis amounts to excluding known causes and finding that what remains is “consistent with” air embolism, does “consistent with” meet the standard of “beyond reasonable doubt”? As the Court of Appeal itself observed in R v. Cannings (2004): “What may be unexplained today may be perfectly well understood tomorrow.”

 

The Unblinded Expert

One of the most important recommendations in the RSS report concerns blinding: experts evaluating evidence in medical cases should be shielded from knowledge of the suspect’s identity until they have formulated their conclusions. The aim is not to restrict information but to prevent the unconscious bias that—as research by Dror et al. (2021) has shown—affects the judgments of forensic pathologists even when they are confident of their objectivity.

In the Letby case, no blinding procedures were used. Dr. Dewi Evans, the lead prosecution expert, conducted the initial review of more than sixty sets of clinical records, identified the suspicious cases himself, and then formulated opinions on the causes of death in the very cases he had selected. Dr. Sandie Bohin, who was to serve as an independent peer reviewer, was explicitly instructed to “peer review the work and statements submitted by Dr. Evans.” The Court of Appeal found no basis for questioning Dr. Bohin’s independence. But the RSS asks a different question: not “Was Dr. Bohin independent?” but “Is a procedure in which the reviewer knows the reviewee’s opinions before forming her own capable of eliminating unconscious bias?” The empirical literature suggests that it is not.

Francis Bacon described the mechanism in 1620, in the Novum Organum: “The human understanding, when any proposition has been once laid down, forces everything else to add fresh support and confirmation.” Four centuries later, cognitive psychology named it confirmation bias and confirmed its universality experimentally. The problem is not solved by the instruction “Be objective.” It is solved by procedures that remove the source of bias before it has a chance to operate.

 

Dr. Lee’s Evidence: Mistaken Target or Moving Target?

Here, public discourse diverges sharply from the judicial record. Dr. Lee’s panel announced in February of 2025 that they had “found no murders.” The media treated this as a breakthrough. But the Court of Appeal—which had considered Dr. Lee’s evidence at the April 2024 hearing, ten months before the panel—described it as “aimed at a mistaken target” (§187).

The core of Dr. Lee’s argument was that the prosecution experts had wrongly diagnosed air embolism solely on the basis of skin discoloration. The Court of Appeal found that none of the prosecution experts had done so—that skin discoloration was treated as consistent with air embolism, but that the diagnosis rested on a combination of clinical factors (§144). Is the Court of Appeal right? The matter is subtler than the unequivocal ruling suggests. Even if the experts did not verbally diagnose on the basis of discoloration “alone,” the question is what weight discoloration carried within their “constellation.” Remove it, and what remains is: sudden collapse of a stable baby, failed resuscitation, no visible cause. But that is precisely what remains in every case where the cause of death is unknown. And here, once more, the Cannings warning returns: the exclusion of known causes does not prove that the cause was homicide. It may also prove that our medical knowledge is missing a piece.

 

The Presumption of Innocence as a Bayesian Theorem

The presumption of innocence—enshrined in Article 6(2) of the European Convention on Human Rights, Article 14(2) of the ICCPR, and Article 48 of the EU Charter of Fundamental Rights—appears to be a purely normative principle, an axiom that needs no mathematical justification. And yet Bayes’ theorem reveals that it possesses a deep statistical rationality. In Bayesian terms, the presumption of innocence is a requirement that the court begin from a high prior probability of innocence. The RSS report is unequivocal: the prior probability that any particular medical worker is a serial killer of patients must be estimated “at the order of one chance in millions.”

Cuellar (2025) formalises this in the odds form of Bayes’ rule now recommended by the European Network of Forensic Science Institutes as the standard for expert testimony: Posterior Odds = Likelihood Ratio × Prior Odds. The expert’s role is to supply the likelihood ratio; the trier of fact’s role is to supply the prior odds and multiply. ENFSI provides a verbal scale: a likelihood ratio of 50 is “moderate support,” 500 is “moderately strong,” 5,000 is “strong.” Note what this scale does not say: it does not say the defendant is guilty. It says the evidence is more consistent with one hypothesis than the other—by a specified factor. The jury must still decide how probable guilt was before the evidence arrived. In Cuellar’s worked example, an LR of 101 produced posterior odds of roughly 2 × 10⁻⁷ once the correct prior was applied—vanishingly small regardless of the LR’s magnitude. The principle generalises directly: a large likelihood ratio cannot rescue a minuscule prior, and a spreadsheet whose reliability is in dispute cannot reliably establish how large the LR actually is.

But—and here honesty demands a caveat—Bayes’ theorem works in both directions. When independent evidence is added to shift-pattern data (insulin, liver damage, the note, behavioral patterns), the likelihood ratio rises dramatically. Each independent evidentiary stream, if reliable, shifts the odds—and with enough independent streams, even an extremely low prior can be overcome. Think of the prior probability as a clock set to “not guilty.” Each genuine piece of independent evidence is a gear that turns the mechanism. Add enough reliable gears and the clock displays “guilty beyond reasonable doubt.”
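The gear metaphor can be made concrete in the odds form of Bayes’ rule. The prior and the likelihood ratios below are purely illustrative placeholders (a “one in millions” prior and three equally strong strands), not estimates drawn from the Letby evidence:

```python
def posterior_probability(prior_odds, likelihood_ratios):
    """Combine independent evidence streams in odds form, then
    convert the posterior odds to a probability."""
    odds = prior_odds
    for lr in likelihood_ratios:  # independent streams multiply
        odds *= lr
    return odds / (1 + odds)

prior_odds = 1e-6  # illustrative: one chance in a million

print(posterior_probability(prior_odds, [500]))             # one strand: still very improbable
print(posterior_probability(prior_odds, [500, 500, 500]))   # three strands: over 0.99
```

The caveat is the word “independent”: if the strands share a common source of contamination, multiplying their likelihood ratios overstates the posterior.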

 

Pattern-Seeking Creatures

The Letby case exposes a deeper truth about human cognition: we are pattern-seeking creatures inhabiting a world in which genuine randomness often looks like a pattern. Daniel Kahneman described this as the conflict between System 1—fast, intuitive, a hunter of patterns—and System 2—slow, analytical, capable of probabilistic reasoning, but lazy. When the stakes involve the deaths of infants, System 1’s drive to identify a responsible agent can overwhelm System 2’s capacity to accept that sometimes tragic clusters have no single cause.

Psychologists call this the fundamental attribution error: the tendency to seek explanations in the person rather than the circumstances. Bad things must have bad actors—so System 1 insists. The RSS report adds that hospital administrators may prefer the “one bad apple” narrative to a narrative of systemic failure for which they themselves bear responsibility. But honesty requires a note of caution here as well: the existence of cognitive biases does not prove that bias operated in any particular case. It means only that the system must be designed to control for it. In the Letby case, it was not so designed.

Thompson and Schumann identified the mirror-image error—the Defence Attorney’s Fallacy—which is if anything more prevalent and more seductive. Its logic runs: many other people share this characteristic, so the match is uninformative. In their Experiment 2, 68.5% of subjects accepted the defence argument as “correct” versus only 28.8% for the prosecution argument; 66% made at least one judgment consistent with the Defence Attorney’s Fallacy, and the majority made it twice. What this reasoning ignores, as Thompson and Schumann note, is that the overwhelming majority of those other people are not suspects in the case at hand. The associative evidence narrows the class of possible perpetrators without excluding the defendant; that is precisely what makes it probative. In the Letby case, the defence’s implicit argument—that other nurses were also sometimes present at collapses—risks committing this very fallacy unless it is accompanied by careful base-rate reasoning. This article neither endorses nor dismisses that argument; it notes only that both fallacies are real, both are easy to commit, and only 22.2% of Thompson and Schumann’s subjects were immune to both simultaneously.


Questions More Important Than Answers

“The weight of evidence for an extraordinary claim must be proportioned to its strangeness.”

— Pierre-Simon Laplace, Théorie analytique des probabilités, 1812

Laplace formulated this principle in the context of natural philosophy. Carl Sagan popularized it. But its deepest field of application remains the administration of justice. The claim that a nurse murdered seven newborns is an extraordinary claim. It demands extraordinary evidence.

The Letby case contains such evidence—insulin, liver damage, the note—and this article does not pretend otherwise. But it also contains elements that should trouble anyone who cares about the integrity of criminal proceedings: the absence of blinding procedures, a diagnosis built on a “constellation” of individually non-diagnostic factors, a lead expert challenged for bias by another court, and a spreadsheet that—whatever its formal status—was constructed retrospectively, conditioned on a pre-identified suspect, and omitted deaths at which Letby was absent—and therefore could not have distinguished guilt from coincidence even in principle. The insulin evidence, meanwhile, now faces its own methodological challenge at the CCRC. What was the prosecution’s strongest non-statistical pillar may become its most contested one.

I do not claim that Letby is innocent. I do not claim that she is guilty. I claim that the questions this case raises matter more than any particular answer to them—because they concern not a single verdict but an entire system. Should judges receive mandatory training in probability? Should experts be blinded? Can a diagnosis by exclusion ever meet the standard of beyond reasonable doubt? The RSS report provides answers to these questions—and its answers hold regardless of whether Lucy Letby is guilty.

The presumption of innocence is not so much a principle of leniency as a principle of epistemic humility. It does not say, “We believe the accused did not kill.” It says, “We know how easily we err, and therefore we demand more of ourselves before we take someone’s freedom.” In a world where random clusters are inevitable, where the mind compulsively seeks patterns, and where investigations are vulnerable to systematic bias—that is the only posture worthy of a system that claims to dispense justice.