“This guy should never be let near a witness-box again”

The above photo is of Dr Robert Worden, who was an expert witness for the Post Office in the Horizon trial, which was part of the Bates v Post Office group litigation. You can read about my experience of watching Dr Worden be cross-examined here, and you can read Mr Justice Fraser’s thoughts on the value of Dr Worden’s work here.

There follows, reprinted with permission, an article published yesterday by Jeremy Dawson. Jeremy lives in Australia and holds a PhD in Pure Maths and a Diploma of Law. He has spent 5 years in research and practical statistics, 3 years in legal work (deciding and litigating disputes), and a combined 30 years in the Australian Dept of Defence and Australian National University (ANU) – working on “correctness of software, mostly using computer programs to prove mathematical theorems, including software correctness properties.”

Jeremy is now retired, and has taken an interest in the Post Office Horizon IT scandal, most particularly, the evidence of Dr Robert Worden. He sent me his thoughts and I suggested he publish them, which he did, on the ANU’s servers, here. To make it easier to find, I have republished his article below.

I have no reason to doubt that this is anything other than Dr Dawson’s honest opinion. He doesn’t hold back.

The statistical evidence in the Bates (Horizon Issues) trial

by Jeremy Dawson

The judgment in the Bates (Horizon Issues) trial is at https://www.bailii.org/ew/cases/EWHC/QB/2019/3408.html.

Dr Worden’s first report (07 December 2018) to which I refer is at https://www.postofficetrial.com/2019/06/horizon-trial-post-office-independent.html.

For Nick Wallis’s book, The Great Post Office Scandal, see https://bathpublishing.com/products/the-great-post-office-scandal-first.

Dr Robert Worden’s evidence

In the Horizon Issues trial, statistical analyses were provided by the Post Office’s expert witness, Dr Robert Worden. The judge, at para 805 of the judgment, quotes section 8.8.1 of Dr Worden’s report (paras 759 to 766), as follows:

760. Because Post Office has had an average of 13,560 branches over the lifetime of Horizon, the total number of monthly branch accounts has been about 3 million.

761. Therefore, if a bug like the Suspense Account bug has occurred 16 times in the lifetime of Horizon, the chance of it having occurred in any given branch in any given month is about 16 in 3 million. [omitted text refines this calculation]

762. I have considered a bug similar to the suspense account bug, which occurred about 10 times, and had a mean financial impact of about £1000 per occurrence. How many similar bugs would be needed, to give a one in ten chance of one such bug occurring, with an impact of £1000, on a particular Claimant’s branch in a particular month?

763. The answer, given by elementary arithmetic which I describe in section 8.5, is that there would need to be 50,000 of these distinct bugs.

764. So the Claimants cannot credibly assert that their shortfalls were caused by bugs in Horizon, unless there were something of the order of 50,000 such bugs.

This is complete nonsense. It is like arguing in a murder trial that, because the homicide rate in the UK is only 1 per 100,000 per year, it is overwhelmingly unlikely that the defendant is guilty.

This is so obviously nonsense that it may be superfluous to ask what the flaw in the argument is, but here it is. You can use that calculation to get the probability that a randomly chosen person committed the murder in question. But the person on trial is not a randomly chosen person.

Likewise you could calculate the probability that a randomly chosen subpostmaster has been affected by a Horizon bug, or at least make a conscientious attempt to do so, as Dr Worden has done. But the subpostmasters who claim to have suffered unexplained shortfalls, possibly Horizon related, are not a randomly chosen subset.
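A small simulation makes the selection effect concrete. This is a sketch under stated assumptions, not a model of Horizon. Only the figure of 13,560 branches comes from Dr Worden's report. Every other parameter is invented for illustration: a hypothetical 1% of branches affected, and made-up chances that affected and unaffected branches come forward. The hypothetical helper `affected_rates` compares the rate of affected branches across all branches with the rate among the branches that come forward.

```python
import random

def affected_rates(n_branches, p_affected, p_claim_if_affected, p_claim_if_not, seed=0):
    """Hypothetical helper: compare the share of bug-affected branches in the
    whole population with the share among branches that come forward."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    affected = [rng.random() < p_affected for _ in range(n_branches)]
    # Whether a branch comes forward depends on whether it was actually affected.
    came_forward = [
        rng.random() < (p_claim_if_affected if hit else p_claim_if_not)
        for hit in affected
    ]
    population_rate = sum(affected) / n_branches
    claimants = [hit for hit, claimed in zip(affected, came_forward) if claimed]
    claimant_rate = sum(claimants) / len(claimants)
    return population_rate, claimant_rate

# Invented parameters, for illustration only.
population_rate, claimant_rate = affected_rates(
    n_branches=13_560, p_affected=0.01, p_claim_if_affected=0.5, p_claim_if_not=0.001
)
print(f"affected, all branches:    {population_rate:.3f}")
print(f"affected, among claimants: {claimant_rate:.3f}")
```

With these made-up numbers, about 1% of all branches are affected, but most of the branches that come forward are. Treating the second group as if it were a random sample of the first is the error described above.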

I’m told this statistical fallacy is surprisingly common. I would hope it is not common among expert witnesses. I’m sure it’s not common among competent expert witnesses.

At this point there is not much more that needs to be said. But as I have spent so long reading the expert’s report, the transcripts and the judgment, I’m going to say some more anyway. Skip it if you like, but do jump ahead to my heading “The Suspense Account” – there are some different angles I discuss there.

Fortunately the judge gets this exactly right. I suspect not all judges would. At para 766 he says:

I deal with that point in further detail below at [821] and [822] below, but this amounts to an assumption by Dr Worden that a group of SPMs who specifically allege they have experienced the effects of bugs are to be treated, in statistical terms, as though they are a random group of SPMs of the same sample size drawn from the wider population of all SPMs. They plainly are not a randomly drawn sample of nearly 600 SPMs. They are a very specific group (or sample) of those who say their branch accounts have been impacted by, or have experienced, such incidents. In statistical terms, the correct term for the group is that they have a bias – they all allege that they have experienced the effects of bugs, errors and defects.

After recounting Dr Worden’s statistical evidence (para 805) he goes into a long digression, and then (para 821) comes to the key issue:

However, the claimants are not a random sample of SPMs … As a sample, they have already been filtered or selected in that these particular SPMs already complain of bugs, defects and errors in Horizon having affected their branch accounts. This means that they are not a random sample. The way this would be expressed in statistical terms is that the claimant SPMs do not accurately represent the population of SPMs as a whole (…). The claimants are essentially self-selected, from those who believe they have experienced shortfalls and discrepancies in their accounts from the impact of bugs, errors and defects … The group has a bias, in statistical terms. They plainly cannot be treated, in statistical terms, as though they are a random group of 587 SPMs.

Exactly. But I’d add that this is such an elementary blunder that this guy should never be let near a witness-box again.

Dr Worden’s report provides some other analyses, which essentially make the same error in different ways. In one, he reasons that if bugs in Horizon were spread equally across Post Office branches and over time, then, after adjusting for how busy branches were, the effect of bugs on all branches would have been 160 times their effect on the claimants.

In section 8.7.9 he analyses known problems with Horizon and produces an estimate of the total effect of all bugs on Post Office branches.

I assume for this discussion that it is possible to make a plausible estimate and that he has done so. Then his assumption that the effect is equally spread over branches leads him to estimate the total effect of bugs on the claimants’ branches. This assumption is obviously unsound.

Branches may well be equally likely to be affected, based on their characteristics such as size, location, etc, so if the claimants were a randomly chosen group of postmasters then his analysis would be fine. But the set of claimants consists of people who say they have been affected – they are not a randomly chosen subset.

In section 8.8.2 he does essentially the same thing, this time arguing that if bugs had affected all branches as much as the claimants say they affected theirs, then the Call Centre (which took calls about problems with Horizon) would have been inundated. His reasoning here has the same flaw.

In cross-examination, this issue was discussed several times.

On day 18, see the transcript at https://www.postofficetrial.com/2019/06/horizon-trial-day-18-transcript.html about 80% down, or search for “Penny Black”. They agreed to assume that one person in 500,000 in the UK is a lady called Penny Black, and discussed the probability of finding one or more such at a party of 50 people. They then discussed the scenario of a party to which only people called Penny Black had been invited. Dr Worden said

“I should say generally that probability theory is what one uses in the absence of specific knowledge like you have just put to me, and that specific knowledge changes the whole ball game.”
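The arithmetic in the Penny Black exchange can be sketched as follows. The sketch uses the one-in-500,000 figure agreed in cross-examination and assumes that guests at the random party are chosen independently. The function name is illustrative.

```python
def p_at_least_one(p, n):
    """Probability that at least one of n independently chosen people has a
    property that each person has with probability p."""
    return 1 - (1 - p) ** n

# A party of 50 people drawn at random from the UK population.
random_party = p_at_least_one(1 / 500_000, 50)

# A party to which only people called Penny Black were invited:
# selection, not chance, fixes the answer.
invited_party = 1.0

print(f"random party:  {random_party:.6f}")  # about 0.0001
print(f"invited party: {invited_party:.1f}")
```

For the random party the chance is about 1 in 10,000. For the invited party it is a certainty. The difference comes entirely from how the guests were selected.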

However, Dr Worden did not accept that this was relevant to his analyses of Horizon, essentially because he considered that he should not assume that the claimants are correct in their stated belief that they have been adversely affected by bugs in Horizon.

This doesn’t change the fact that the claimants are not a random set of postmasters, but a self-selected sample.

But here is an analogy, and since I use coin-tossing later, I’ll use it here too. Imagine that a number of coins have been tossed and are lying on the ground. Imagine then that someone, even somebody of doubtful honesty and worse eyesight, claims to have separated out those showing a “head” (leaving each showing the same side as it fell).

So, for a coin among those which he has selected, what is the probability that it shows a head? You may not accept it as being 100% but you’re damn sure that you shouldn’t treat it as 50%!
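Bayes' rule puts the coin analogy in numbers. The selector's accuracy here is purely hypothetical: suppose he picks up 9 in 10 of the heads, but also, wrongly, 1 in 5 of the tails. The function name is illustrative.

```python
from fractions import Fraction

def p_head_given_selected(p_pick_if_head, p_pick_if_tail, p_head=Fraction(1, 2)):
    """Chance that a coin in the selected pile shows a head (Bayes' rule)."""
    picked_heads = p_head * p_pick_if_head
    picked_tails = (1 - p_head) * p_pick_if_tail
    return picked_heads / (picked_heads + picked_tails)

# Hypothetical, fallible selector: about 82% of his pile are heads.
print(p_head_given_selected(Fraction(9, 10), Fraction(1, 5)))  # 9/11

# A selector whose choices are no better than chance brings it back to 50%.
print(p_head_given_selected(Fraction(1, 2), Fraction(1, 2)))   # 1/2
```

Only when the selection is treated as carrying no information at all does 50% become the right answer.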

Unless Dr Worden is saying that his opinion is based on the assumption that the claimants’ evidence is so unreliable as to be quite worthless. Now here is a legal, not a statistical, point, and one not noted in the judgment, so I may be wrong: an expert opinion, when based on a particular view of the primary facts, should say so, and if the court comes to a different view of those primary facts, then the expert opinion becomes irrelevant.

In any case, Dr Worden considers that he should disregard the claimants’ evidence, and uses this assumption to construct a statistical argument denying the validity of the claimants’ evidence, so that is a circular argument.

On day 19, see the transcript at https://www.postofficetrial.com/2019/06/horizon-trial-day-19-transcript_32.html about 40% down, or search for “tweeting”. The barrister introduced a scenario similar to the Penny Black party, and then said

“I’m going to put a point to you that I’d be happy to put to my 13-year-old daughter, which is that when you look at a statistical sample the first thing you should do is look at the nature of the sample and how they were selected?”

The ensuing discussion led to Dr Worden saying

“the claimants are a self-selected sample and they selected themselves long after they suffered their shortfalls. So the point you are putting to me effectively is these people selected themselves and that somehow caused Horizon several years previously to rain bugs on them. And so the causation is completely the wrong way round between Horizon affecting the claimants and the claimants self-selecting. It doesn’t make sense.”

and later

“it [the fact that a postmaster believes that he/she has suffered in the way which is the subject of the proceedings] is not a material factor in whether Horizon during your tenure caused bugs to you.”

After a bit more on this theme, the judge, perhaps seeing the issue clearly, says “I think [this sequence of cross-examination] has probably gone on long enough.”

So, is the direction of causation an issue? In a word, no. As in my murder trial analogy: the fact that the police and prosecutors have come to suspect a particular individual doesn’t cause him to commit a murder some time previously.

Or another coin-tossing example. Suppose two coins are tossed, and you are interested in the probability that both show a “head”.

A preliminary point here, on our intuitions about the notion of probability. If a coin is to be tossed in the future, then to say that there is a 50% probability that it will show a head has one meaning – most easily expressed by saying that if you were to do it repeatedly, then half of the trials would show a head.

If a coin has already been tossed, then its probability of being a head is either 100% or 0%, you just don’t know which. To say that the probability of it being a head is 50% is a description not of the facts, but of your estimate of the facts. And then, after looking at the coin, your assessment of the facts will change, you will now say that the probability of it being a head is 100%, or it is 0%, as the case may be. (And of course your looking at the coin doesn’t cause it to be a head or not).

So now consider two coins tossed. On your knowledge at this point, the probability of both being heads is 25% (50% squared). If someone looks at the first coin and tells you that it is a head, then the probability of both being heads is now 50%. (In the theory of probability, these are the prior and posterior probabilities of Bayes’ Theorem. In Dr Worden’s words, this new knowledge changes the whole ball game, but probability theory is nonetheless still relevant.) The probability changed, with no causation involved. Now consider a second scenario: you are told that the first coin tossed is in fact a double-headed coin. Again the probability of both being heads is now 50%. Here causation is involved: the first coin being double-headed causes it to be more likely that both show heads.

But the numbers are the same in each scenario, and for the same reason. Whether causation is involved or not is irrelevant.
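The two-coin reasoning can be checked by enumerating equally likely outcomes. This is a minimal sketch. The helper name is hypothetical, and it models a double-headed coin as a coin with two equally likely faces that are both heads.

```python
from fractions import Fraction
from itertools import product

def p_both_heads(first_faces, second_faces, given=lambda outcome: True):
    """P(both heads | given), enumerating equally likely (face, face) outcomes."""
    outcomes = [o for o in product(first_faces, second_faces) if given(o)]
    both = sum(1 for o in outcomes if o == ("H", "H"))
    return Fraction(both, len(outcomes))

FAIR = ("H", "T")
DOUBLE_HEADED = ("H", "H")

prior = p_both_heads(FAIR, FAIR)                                      # before any news
told_first_is_head = p_both_heads(FAIR, FAIR, lambda o: o[0] == "H")  # no causation
double_headed_first = p_both_heads(DOUBLE_HEADED, FAIR)               # causation
print(prior, told_first_is_head, double_headed_first)  # 1/4 1/2 1/2
```

Being told the first coin is a head (no causation) and the first coin being double-headed (causation) both give 1/2.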

The Suspense Account

So what of the judge’s “long digression” (paras 810-820), before he gets to the nub of why Dr Worden’s approach is nonsense? Well, it can be put into the context of my murder trial analogy, thus:

(a) you should take into account that the homicide rate varies between male and female, young and old, and adapt your numbers to the age and sex of the accused (etc)

(b) the homicide rate may be (a lot) higher than you are actually aware of

Both points are correct, but tinker around the edges of the issue: neither point changes the fact that Dr Worden’s approach is quite unsound. Which is why I call the passage a long digression.

But it is a really interesting digression. Because on point (b), the issue is that the Post Office ran a suspense account. This consisted of all the bits of money the Post Office had, but didn’t know why it had them (or, one must infer, whether it should have them).

This really made me think WTF??? The Post Office runs an accounting system which can’t tell where all their money has come from. So why does it think that its accounting system is good enough to tell it that missing amounts of money must be the fault of the postmasters?

Maybe I’m naive: maybe this is normal in such large organizations. Page 208 of Nick Wallis’s book suggests that it would be a “miracle of finance” not to require such a suspense account. But I stand by saying that if their accounting system can’t tell why they have the money they have, then it can’t possibly be adequate to tell them why they are missing the money they are missing.

And I won’t deny that the Post Office’s accounting is probably better than my own. For example, I often find myself wondering where all the money I took out of an ATM a week ago has gone. But I don’t go making accusations of theft against the visitors to my home during that week!

There is a further point here which I myself didn’t pick up until reading Paul Marshall’s submission to the Williams Inquiry: how can they be sure that none of the amounts in the suspense account are actually the very same amounts that are missing from the subpostmasters’ accounts? If these could be the same amounts, then the Post Office was prosecuting subpostmasters for missing money which was actually in the Post Office’s hands.

This is alluded to in para 810 of the judgment (quoting the claimants):

“38. The Defendant operated one or more suspense accounts in which it held unattributed surpluses including those generated from branch accounts. After a period of 3 years, such unattributed surpluses were credited to the Defendant’s profits and reflected in its profit and loss accounts.

39. The Defendant thereby stood to benefit and/or did benefit from apparent shortfalls wrongly attributed to the Claimants which did not represent real losses to the Defendant.”

and in Nick Wallis’s book at page 381:

‘The Post Office has improperly enriched itself through the decades,’ he [Second Sight’s Ron Warmington] thundered, ‘with funds that have passed through its own suspense accounts. Had its own staff more diligently investigated in order to establish who were the rightful owners of those funds, they would have been returned to them, whether they were Post Office’s customers or its Subpostmasters. …’

This is also mentioned in a submission by Paul Marshall to the Williams Inquiry, see link to Paul-Marshall in https://www.postofficehorizoninquiry.org.uk/key-documents/written-submissions-november-2021 and see pg 6 item c.

Second Sight identified the existence of unattributed/unallocated funds/receipts in Post Office suspense accounts. This raises the important, indeed troubling, question as to whether the Post Office had in fact received monies for which it variously prosecuted, or pursued civil claim against, postmasters. That is an issue/question that to my knowledge remains unresolved. See further Second Sight Final Report April 2015 [at https://www.jfsa.org.uk/uploads/5/4/3/1/54312921/report_9th_april_2016.pdf (sic)] paragraphs [2.15], [2.16].

Coincidentally, an illustration from my own experience

I read Nick’s book over four days. By quite a striking coincidence, during those very four days, I received a cheque for over £18000, paid to me in error. (It was for the redemption of a share fund investment – but the same amount had also been deposited into my bank account).

I am not making this up! Even though Dr Worden’s arguments would conclude that I am, as follows (paragraph numbers are references to the analogous paragraphs in his first report):

if the financial institution paid me double then it would most likely have paid everyone double, on average (see para 784.2)
if the financial institution paid everyone double, then it would fairly quickly notice the situation (see paras 785, 787-791) (this point I can accept)
(therefore, we infer) it didn’t happen, to me or to anyone else

or, putting his argument another way

after a lot of effort making an educated guess, say that the institution’s accounting systems suffer a glitch like this at a rate of one per 10,000 customers (or some other plausible number) and the amount of money involved (averaged over all customers) is (according to the average account size) say £3 (see paras 746-748)
this is a tiny fraction of £18000, the amount of error I claim to have seen (see para 751)
(therefore, we infer) it is most unlikely that this has happened to me and so I must be making this up
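The flaw in comparing an average with an individual case can be shown with one line of arithmetic, using the author's own illustrative figures. The sketch assumes that the £3 figure is averaged over all customers, with unaffected customers counted as losing nothing. If only one customer in 10,000 is affected, the average amount among those affected is then £30,000, so an error of £18,000 is not surprising at all. The function name is hypothetical.

```python
from fractions import Fraction

def mean_amount_per_affected_customer(mean_amount_all_customers, incident_rate):
    """If only a fraction `incident_rate` of customers is hit, and everyone else
    contributes zero, the average over all customers is rate x (average per hit)."""
    return mean_amount_all_customers / incident_rate

# The author's illustrative figures: £3 averaged over everyone, 1 in 10,000 affected.
per_affected = mean_amount_per_affected_customer(Fraction(3), Fraction(1, 10_000))
print(per_affected)  # 30000
```

Exact fractions are used so the result is not blurred by floating-point rounding.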

The bottom line of all this is obviously that it is possible for a system to make errors occasionally, and not to make them all the time.

So how do we evaluate the famous statement by Lord Hoffmann “It is notorious that one needs no expertise in electronics to be able to know whether a computer is working properly.” (DPP v. McKeown and Jones [1997] 1 WLR 295, 201 C-D, https://publications.parliament.uk/pa/ld199697/ldjudgmt/jd970220/mcke02.htm ) in the context of this incident?

Apart from the fact that this is “expert” evidence given from the bench, and so not subject to cross-examination, by a person unqualified to give such evidence, it’s just plain wrong.

Mostly the institution’s systems work fine. This will be the experience of experts in electronics and non-experts alike. Sometimes (rarely), as on this occasion, they don’t. Almost all experts and non-experts alike will be unaware of that. The tiny fraction of people affected, experts and non-experts alike, will be aware of it (at least when the amount involved is £18000).

Possibly the error was triggered by unusual or idiosyncratic human input. Or possibly human error, not caught by the computer-based accounting system. Who knows? And, in the context of prosecuting people on account of such errors, so what?

ENDS

4 responses to ““This guy should never be let near a witness-box again””

Leslie Green CEng MIEE

July 12, 2024 at 12:09 am

To be clear, I totally agree with Jeremy’s article.

————————————————————————–
I would like to clarify the response given by Double-Doc however.

You can add probabilities for RARE events. Consider two independent events, A and B.
Write “probability of event X” as P(X).

P(A) = a, P(B) = b
a < 0.003, b < 0.003

P(not A) = 1 - a
P(not B) = 1 - b
P( neither A nor B ) = (1 - a) x (1 - b) = 1 - (a + b) + ab
P( either A or B, or both ) = 1 - [ 1 - (a + b) + ab ] = (a + b) - ab, which is approximately equal to (a + b)

The point is that ab is sufficiently small that it can be neglected for these rare events.
Adding rare event probabilities is very reasonable (and usual).
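A quick numerical check of this approximation follows, at the upper bound a = b = 0.003 used above, assuming A and B are independent. The neglected term ab is the probability that both events occur.

```python
def p_either(a, b):
    """Exact P(A or B, or both) for independent events A and B."""
    return a + b - a * b

a = b = 0.003  # upper bound used in the comment above
exact = p_either(a, b)
approx = a + b  # the rare-event approximation
rel_error = (approx - exact) / exact
print(f"exact {exact:.6f}, approx {approx:.6f}, relative error {rel_error:.2%}")
```

At this bound the approximation is off by well under 1%.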
——————————————————————————
I have some general comments on the main topic:

Fujitsu had a “reconciliation” service running at night to fix up the errors introduced in the accounts. We haven’t heard too much about that. How many errors did it have to fix? This is evidence of poor quality transactions (along with the whole suspense account thing).

Rare events can’t be “systemic”. RUBBISH! Badly written software can have “collision” events with extremely low probability. You cannot reasonably test for such errors (except by writing test code to hammer the interface with thousands or millions of events to evaluate collision probability and check the collision handling code).

Typically hardware can have many inputs via “buttons” (soft or otherwise). These can be pressed in arbitrary orders and at various times. The same operator may well have a particular way of working which is not expected as far as the software is concerned. Thus a particular operator can create problems that others may not see. (I’m not “blaming” the operator. It is the job of the programmer to ensure correct activity.)

Operators can get frustrated with a slow response and press buttons many times to “get it to work”. If the software can’t handle that, then again that is poor software quality. Our old quality manager used to press buttons randomly on the interface, something no trained user would (hopefully) ever do. But he often “broke” the software, forcing the softies to code it more robustly!

Software engineers with no hardware understanding “test” their code by testing functions, then declare it is fully working. Recall the Therac-25 case where they swore it couldn’t possibly produce a “death ray”? Except they didn’t consider the hardware timings, and it killed or injured people.

I have been given “fully tested” code at work, which nevertheless didn’t actually work because the idiot (most senior programmer) never tested its speed. It wasn’t able to keep up with the hardware system, and therefore just dropped commands randomly. I had to put an oscilloscope on it, with physically testable outputs, to prove it wasn’t working fast enough.

A particular mechanical design software system at work was to be upgraded. It would take a week. It took 6 months! The software had several bolt-on (third party) components, so nobody knew how the whole system worked. Expert after expert tried and failed to upgrade the software version, and the core design repository. Total nightmare. Horizon had bolt-on bits like Riposte, and Tivoli, and the IBM banking app. I have yet to find a system diagram that maps out all the different third party software systems.

Ever encountered timing issues? Different hardware typically has different real time clocks. If they get out of sync, what happens? Messages can appear to be in the wrong order, and that breaks things! Unless you are a FOREX trader, selling something you don’t yet have can be problematic.

I was at Sainsbury’s the other week. One item failed to scan correctly (it compares the weight to the code). An assistant came over, used his magic code, and “fixed it”. Except it wasn’t fixed. The barcode was a code for a different (more expensive) product. He evidently overrode the weight limit, and ended up with two items at £1.80, rather than the single item at £1.30 which should have been reduced to £1.00 due to Nectar pricing. If I hadn’t spotted that duplicated (and wrong price) item I would have been out of pocket by £2.60. That is just a cash till, which is relatively easy compared to Horizon.

Chris Marsden

May 26, 2024 at 4:18 pm

My mind boggles, with thanks for widening my Horizons!

Surely there is a corollary: what is the statistical probability of 930+ SPMs, with ‘skin in the game’ knowing they’d have to make good the losses ‘it’s in the contract mate’, going ‘rogue’ after perhaps 20 years as a magistrate, or a retired police officer, or serving as a councillor, or having run a sub-post office for decades, and putting many tens of £k buying into a dream?

Double-Doc (Physics and Mathematics)

April 12, 2024 at 2:25 am

It is almost beyond belief that someone with a Ph.D. in particle physics, as Worden states he is, could possibly make the elementary statistical blunder of computing the probability of a compound event by -adding- together the individual probabilities of the *independent* events concerned.

Correct is instead to -multiply- them together, as observed by The Honourable Mr Justice Fraser at 830 in [2019] EWHC 3408 (QB), and expanded upon by Jeremy Dawson.

To add them instead is, put bluntly, nonsensical.

What follows is not covered by either Dawson or the judge in Alan Bates and Others (Claimants) v. Post Office Limited (Defendant) – Judgment (No. 6) “Horizon Issues”.

The first part is a simple mathematical excursion and the second is shocking.

Q. Could these operations (+, x) EVER yield the same result for a pair of probabilities?

A. Only when both probabilities are zero (i.e., both events impossible, which is inapplicable to the use to which Worden put them).

Proof:

Let us first find the solution set for x+y = xy

This equation is of a HYPERBOLA and (x,y) must lie on it.

I display it here – https://ibb.co/9yqBgkX and, like with all real-valued hyperbolae, it has two limbs; one I’ve coloured red, the other blue.

But there is a constraint – since x and y are probabilities, each must lie between 0 and 1, both inclusive. (x,y) must therefore lie somewhere within or on the edge of the yellow-shaded square.

The red limb of the hyperbola never gets even close to intersecting that square. The blue limb touches it at one point, the origin (i.e., (0,0)).

So x=0 and y=0 is the only solution for xy = x+y where the constraints 0<=x<=1 and 0<=y<=1 apply.

Proved.
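The same result can also be reached without a graph, by factorising. This is a sketch of an alternative route, not the commenter's own method:

```latex
\begin{aligned}
x + y = xy &\iff xy - x - y + 1 = 1 \\
           &\iff (1 - x)(1 - y) = 1.
\end{aligned}
```

For probabilities, 0 ≤ 1 - x ≤ 1 and 0 ≤ 1 - y ≤ 1, so the product can equal 1 only if both factors equal 1, that is, x = y = 0. The factorised form also shows why the curve is a hyperbola centred at (1, 1).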

This is elementary maths. I could have done this when I was seven, though less concisely.

But now we come to the most SHOCKING part of it all.

From Worden's own CV downloaded from https://is.gd/wordenCV we see the academic qualifications he possesses (or claims to):
1964 – 1965 Université de Grenoble
1965 – 1968 Cambridge University; MA (1st Class) in Natural Sciences
1968 – 1971 Cambridge University and Cal Tech; PhD in Theoretical Particle Physics

Here, "Natural Sciences" connotes Physics.

The divide between his Ph.D. specialism of Particle Physics and Quantum Mechanics is very blurry – uncertain, one could say.

And, Quantum Mechanics is all about the application of probability theory!

So Worden must at some point in his life have been competent with probabilities.

My jaw is on the floor.

This makes the blunder even more scandalous, and opens up all manner of speculation from which I will refrain.

Don't all the cases – or at least the recent ones – where he's appeared as an expert witness merit further examination?

I happen to be a bona fide expert at mathematics, physics, programming and accounting, with some relatively humble qualifications in law too, but not at creating time out of nothingness.

Shame, that….

1. Nick Wallis
  
  April 22, 2024 at 11:11 am
  
  Thanks for this!