ANALYSIS OF THE HO & SHAW PAPERS
By Mark Craddock
I have completed a preliminary analysis of the papers by Ho and Shaw
which appeared in Nature, January 12, 1995. My considered opinion is that
they are total rubbish. I seriously doubt that the two groups really have
any idea what they are doing when they construct their supposed models
of the interaction of the virus and the immune system. The models when
analysed properly do not do what they think they do.
To begin with, a few oddities and yet another remark about QC-PCR. In
the Shaw paper (Wei et al) they study 22 patients with CD4 counts between
18 and 251. They claimed that plasma viral RNA levels in the 22 subjects
at baseline ranged from 10^4.6 to 10^7.2, with geometric mean 10^5.5. (The
notation 10^4.6 means 10 to the power of 4.6, which you can work out on any
scientific calculator. 10^4.6 = 39,810, or 19,905 virions per ml of blood.
10^7.2 = 15,848,932, or 7,924,466 virions per ml. 10^5.5 = 316,228, or 158,114
virions per ml. Of course the accuracy given here is ludicrous; they can't
really mean that they can measure things this accurately.) Note that they
say geometric mean. What is the geometric mean I hear you ask? Well this
is obtained by multiplying the numbers together and taking the nth root.
So for two numbers you multiply them together and take the square root.
For 3 numbers you take the cube root of the product, and so on. The geometric
mean is always less than the arithmetic mean, or average, except when
the numbers are identical, in which case the two means are equal. Why have
they used the geometric mean here? The only reason a colleague and I could
think of is that the geometric mean smooths out ratio changes, which might
make their QC-PCR estimates of viral load look more consistent than they
really are. If they are estimating changes in viral load by taking ratios
of QC-PCR measurements at different times, then the geometric mean of these
variations will show less variability than the arithmetic mean.
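Since they publish no raw numbers, here is a small Python sketch with invented fold-change ratios showing how the two means behave on scattered data:

```python
import math

def geometric_mean(xs):
    # nth root of the product, computed via logs for numerical safety
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

# Hypothetical fold-change ratios from repeated measurements (made up).
ratios = [0.25, 4.0, 0.5, 2.0, 1.0]

print(arithmetic_mean(ratios))  # 1.55 -- pulled up by the large ratios
print(geometric_mean(ratios))   # 1.0  -- reciprocal scatter cancels out
```

Ratios that scatter symmetrically above and below 1 cancel out in the geometric mean while inflating the arithmetic mean, which is exactly the smoothing effect just described.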
The idea behind QC-PCR is to amplify (mass produce) target DNA together
with some DNA which acts as a control, and is used to estimate the size
of the unknown target. So if there is X amount of target present, which
you do not know, you add Y amount of the control, and amplify the two together.
After n PCR cycles, you end up with Xn target DNA, and Yn control DNA.
The assumption is that Xn/Yn = X/Y for all n. Because you can now measure
Xn/Yn, and you know what Y is already, you can work out X. The critical
assumption is that these 2 ratios are the same. But as Todd Miller has
explained, this assumption is not correct, and data obtained by this method
will be wrong. How wrong? Well the formulas for Xn and Yn give you the
way to work this out.
Xn = X(1+Ex)^n
Yn = Y(1+Ey)^n
Ex and Ey are the two efficiencies. Using QC-PCR assumes these two numbers
are EXACTLY equal. Close is not good enough. The error factor in the QC-PCR
estimate is ((1+Ex)/(1+Ey))^n, which moves further from 1 with every extra
cycle n you run.
In a paper by Piatak et al in Science in 1993, it was claimed that QC-PCR
was detecting millions of HIV RNA molecules per ml of blood plasma. They
used 45 PCR cycles. Well if in one experiment ((1+Ex)/(1+Ey)) = 1.25, say,
then after 45 cycles you overestimate the size of the unknown by a factor
of 23,000. You can also underestimate the size of X. As Luc Raeymaekers
pointed out, the published results on QC-PCR actually contain evidence
that Ex and Ey are not equal. I suspect that in Piatak et al's reconstruction
experiments (the senior author of this paper was none other than G. Shaw)
they fitted a bunch of straight lines to their data and got an answer that
was close to the true value and so assumed that the process would always
work. There are very good reasons to believe that it does not work at all.
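The size of the effect is easy to check. The sketch below compounds the per-cycle efficiency mismatch over 45 cycles; the individual efficiencies 0.90 and 0.52 are invented purely to produce the ratio of 1.25 quoted above:

```python
def qcpcr_error_factor(ex, ey, cycles):
    # Ratio of the two per-cycle amplification factors, compounded
    # over all cycles: ((1+Ex)/(1+Ey))^n
    return ((1 + ex) / (1 + ey)) ** cycles

# Illustrative: target amplifies at 90% efficiency, control at 52%,
# so the per-cycle ratio is 1.90/1.52 = 1.25.
factor = qcpcr_error_factor(0.90, 0.52, 45)
print(factor)  # roughly a 23,000-fold overestimate

# Swapping the efficiencies gives the same factor as an underestimate.
print(qcpcr_error_factor(0.52, 0.90, 45))
```

A per-cycle mismatch that looks harmless becomes a four-orders-of-magnitude error after 45 cycles, which is the whole point.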
Now to return to the Shaw paper in Nature. I suspect that their QC-PCR
estimates bounced around all over the place, and so they used the geometric
mean as a way of smoothing this variation out. I can't prove that since
they supply no data at all. But it would be good to see their actual numbers.
Another point to make before moving to the main theme is that they do
silly things with data, like drawing straight lines through clouds of data
points and pretending that their straight lines have some meaning. Have
they never heard of polynomial interpolation, or time series analysis?
Obviously not. The point is that there are advanced mathematical techniques
for handling this kind of data, but they seem wedded to the idea of sticking
a straight line through any old collection of points and calling it a
regression analysis.
Now to the new "mathematical understanding of the immune system
that their work provides" (to paraphrase Maddox in an English newspaper).
Well it's bollocks, pure and simple. Anybody who doesn't like mathematics
should be advised that there is some coming up. I will try to make this as
simple as possible, and I will concentrate on the Ho paper. Similar comments
can be made about the Wei paper.
Ho et al estimate the rate of viral clearance by studying the equation
dV/dt = P - cV.
P is the rate of viral production and c is the rate of viral clearance.
dV/dt is the rate at which V changes with time. It is a differential equation,
and by a happy coincidence my area of research is in differential equations.
They make the fundamental assumption that the virus is in a steady state,
meaning that dV/dt = 0. This means that P = cV. In other words they are
assuming that the rate of virus production (before drugs are introduced)
is exactly equal to the rate of viral clearance. A more correct approach
would be to model viral production by
dV/dt = (a-c)V
and treat c as a parameter. When they come to study the interaction
of the virus and T cells, you have a problem in what is called bifurcation
theory. This studies what happens when parameters used in equations are
changed. The interaction of HIV and T cells in this problem depends upon
the behaviour of c. c cannot be a constant, because it varies depending
upon antibody production and the state of the immune system. Clearly as
the immune system declines, the ability to fight the virus must weaken,
and so c must decrease. And in the 3 months or more before antibody production,
c would be very small, perhaps close to zero. What Ho et al have done is
assumed that c always matches the rate of viral production, which is impossible.
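The role of c as a bifurcation parameter is visible directly in the exact solution V(t) = V(0)exp((a-c)t). A minimal sketch, taking the production rate a from their claimed doubling time of 2 days and treating the three clearance values as purely illustrative:

```python
import math

def virus(v0, a, c, t):
    # Exact solution of dV/dt = (a - c) V
    return v0 * math.exp((a - c) * t)

a = math.log(2) / 2  # production rate: doubling every 2 days, their figure

# Clearance below, equal to, and above production: three different fates.
print(virus(1.0, a, 0.5 * a, 30.0))  # grows without bound
print(virus(1.0, a, a, 30.0))        # frozen at the starting value
print(virus(1.0, a, 2.0 * a, 30.0))  # cleared from the body
```

The qualitative outcome flips entirely as c crosses a, which is why assuming c always matches a assumes away the whole problem.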
Then they model the behaviour of the T cells by
dT/dt = P - mu T
P here is the rate of T cell production, and is not to be confused with
the P above. They should have used different notation; this is a very badly
written paper. mu (that's the Greek letter mu) represents the cell decay
rate. This is a very odd equation. It predicts that the T cell count declines
exponentially to the steady state P/mu. Presumably they are assuming that
this models T cell behaviour when drugs are being administered, but why is
not clear. There is also a major problem: where is HIV? If they are assuming
that the T cells are
declining because of the effects of HIV, then this equation must contain
some term involving the amount of virus present. It does not. So what the
hell is it supposed to mean? Well I would guess that they are assuming
that the amount of virus is constant, and so the effect of V is constant.
So presumably the term mu represents the constant effects of the virus
on the T cell population.
But this bears no resemblance to what actually happens in AIDS patients.
Here you have an exponential decline in T cell numbers to a steady state
P/mu, which could be quite high depending on mu. In AIDS we have a slow
decline over ten years or more to close to zero. So Ho's model does not
describe what actually happens in AIDS patients.
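The plateau is explicit in the exact solution T(t) = P/mu + (T(0) - P/mu)exp(-mu t). A minimal sketch, with parameter values that are mine and purely illustrative:

```python
import math

def t_cells(t0, p, mu, t):
    # Exact solution of dT/dt = P - mu*T: exponential approach to P/mu
    return p / mu + (t0 - p / mu) * math.exp(-mu * t)

t0, p, mu = 1000.0, 20.0, 0.05  # illustrative units; plateau is P/mu = 400
print(t_cells(t0, p, mu, 0.0))    # starts at 1000
print(t_cells(t0, p, mu, 200.0))  # has levelled off near 400, not near zero
```

Whatever positive values of P and mu you choose, T settles at P/mu; the equation has no way to produce a ten-year slide towards zero.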
However it is actually a good deal worse than this, because the parameter
c does not always match the viral production rate a. If c > a, the virus
is rapidly cleared from the body and the T cell count remains high. In
other words the patient recovers. If a > c, which would be the case
before antibodies appear, and we have 3 months or more in which this is
the case, remember, then we have
V(t) = V(0)exp(bt), where b = a - c > 0.
Now let us look at our equation for T. If we put the effect of V into
the equation, we must have
dT/dt = P - mu(V)T - f(V).
So the rate of cell decline mu depends on the amount of virus present.
This must be the case. You can think of this term as perhaps modelling
the effects of apoptosis. The term f(V) represents the decline caused by
direct killing of T cells by HIV. If V = V(0)exp(bt), then what does this
say about the behaviour of T over time? To model this you need an expression
for mu(V) and f(V). Choose f(V) = 0, and approximate mu(V) by a Taylor
series to first order:
mu(V) = mu(0) + mu'(0)V
Substituting into the equation for T, we get
dT/dt = P - (mu(0) + mu'(0)V(0)exp(bt))T
Solving this equation, with b = (log 2)/4 (Ho and Shaw estimate
that a is twice this value in their paper, so this is a conservative
estimate), V(0) = 1, so there is only one virion to start with,
and some other very conservative values, you find that T drops
to less than 5% of the original value in 20 days. In other words you should
have AIDS in 20 days with this rate of viral production if T cells are
dying from apoptosis.
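The "very conservative values" are not listed explicitly above, so here is a numerical sketch of the same equation with illustrative parameters entirely of my own choosing (P, mu(0), mu'(0)); the exact day of collapse depends on them, but the qualitative crash does not:

```python
import math

def t_cell_fraction(days, p=0.01, mu0=0.01, mu1=0.05,
                    b=math.log(2) / 4, v0=1.0, dt=0.001):
    # Euler integration of dT/dt = P - (mu(0) + mu'(0)*V(0)*exp(b*t)) * T,
    # with T normalised so that T(0) = 1. All parameters are illustrative.
    t_cells, t = 1.0, 0.0
    while t < days:
        kill_rate = mu0 + mu1 * v0 * math.exp(b * t)
        t_cells += (p - kill_rate * t_cells) * dt
        t += dt
    return t_cells

print(t_cell_fraction(20))  # well under 5% of the starting count by day 20
```

The exponentially growing kill term overwhelms any constant production rate P, so the collapse is robust to the particular numbers chosen.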
If we assume there is no apoptosis and that direct killing is responsible,
we get mu = constant and pick some form for f(V). I chose the simplest,
most conservative form f(V) = kV; that is, the killing is directly
proportional to the amount of virus present. With the same parameters as
above, and k very small (meaning that a lot of virus is needed to kill one
cell), AIDS develops within 60 days of infection. By AIDS I mean
it takes about 60 days for every single T cell in the body to be killed.
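A similar numerical sketch for the direct-killing model (again, k, mu and the rest are my own illustrative choices, not fitted values):

```python
import math

def days_until_depleted(k=1e-6, mu=0.01, p=0.0, b=math.log(2) / 4,
                        v0=1.0, dt=0.001, horizon=200.0):
    # Euler integration of dT/dt = P - mu*T - k*V(0)*exp(b*t), with T
    # normalised to 1. Returns the day the T cell count reaches zero.
    # All parameter values are illustrative.
    t_cells, t = 1.0, 0.0
    while t_cells > 0.0 and t < horizon:
        virus = v0 * math.exp(b * t)
        t_cells += (p - mu * t_cells - k * virus) * dt
        t += dt
    return t

print(days_until_depleted())  # total depletion in roughly two months
```

Even with k this tiny, exponential viral growth guarantees the cumulative kill term eventually removes every cell, and it does so within a couple of months, not a decade.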
Ho uses the analogy of a sink with water pouring into it but the drain
is open. He argues that the virus is killing slightly more cells than the
body can replace and so you get a slow decline in T cells. In terms of
his analogy, the water flows out of the sink slightly faster than it flows
in. A better analogy would be that as the water level drops, the drain
gets bigger, so the process speeds up. Ho and Shaw's data, if you read it
correctly, predicts that AIDS should develop in a matter of days after
infection, or at most a few months. This is what exponential growth is all
about. The virus grows exponentially, doubling in number every 2 days in
the absence of an immune response, they say. So when the immune response
is weakest, before antibody production, it should kill every T cell in the
body quickly. This does not happen. I wonder how they explain this?
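For the record, the arithmetic behind that last remark: at their claimed doubling time of 2 days, a single virion left unopposed for the roughly 90 days before antibodies appear becomes

```python
# 90 days at one doubling every 2 days (their figure) = 45 doublings
print(2 ** (90 // 2))  # 35184372088832 virions from a single starting one
```

some 35 trillion virions, which is the scale of growth their own numbers imply for the pre-antibody period.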
Mark Craddock PhD
School of Mathematics
University of New South Wales, Australia