Readme file

SERIES C  
Applied Statistics

Effectiveness of potent antiretroviral therapy on progression of human immunodeficiency virus: Bayesian modelling and model checking via counterfactual replicates, C. Berzuini and C. Allemani
Appl. Statist., 53 (2004), 633 - 650

-------------------------- INTRODUCTION -------------------------

The files contain the data set and the WinBUGS code used in the model
fitting process as part of the application described in the paper. For
further information please contact:

Carlo Berzuini,
MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 2SR, UK
E-mail: carlo.berzuini@mrc-bsu.cam.ac.uk
Tel: +44 1223 330 369
Fax: +44 1223 330 388

This document contains a section describing the data set, followed by
a section describing the input data format required by our WinBUGS
model fitting program.

--------------------- SECTION 1: THE DATA SET -------------------

The data set analyzed in the paper, kindly provided by P. Pezzotti
and G. Rezza, Istituto Superiore di Sanità, Roma, Italy,
originates from the multicenter Italian Seroconversion Study.
The following three references to the Study may be useful:

REFERENCE 1: Rezza, G., Lazzarin, A., Angarano, G. et al. (1989):
The natural history of HIV infection in intravenous drug users:
risk of disease progression in a cohort of seroconverters. AIDS,
3, 87-90.

REFERENCE 2: The Italian Seroconversion Study (1992):
Disease progression and early predictors of AIDS in HIV-seroconverted
injecting drug users. AIDS, 6, 421-426.

REFERENCE 3: Pezzotti, P., Pappagallo, M., Phillips, A.N., Boros,
S., Valdarchi, C., Sinicco, A., Zaccarelli, M. and Rezza, G. (2001):
Response to highly active antiretroviral therapy according to
duration of HIV infection. Journal of Acquired Immune Deficiency
Syndromes, 26, 473-479.

The data have been generated by the follow-up of 457 HIV-positive
Italian homosexual men.

The ISSDATA.TXT data file available on this site is an ASCII flat
file containing a rectangular data matrix, where each row corresponds
to a single CD4 measurement on a specific subject. Consequently,
the number of rows in the matrix equals the total number of CD4
measurements in the analysis. The columns are separated by blank
spaces and correspond, in order, to the following variables:

COLUMN 1: subject serial number (integer number)
COLUMN 2: serial identifier of the observation within the subject
(integer number)
COLUMN 3: total number of CD4 measurements within the current subject
(integer number)
COLUMN 4: date of seroconversion for the current subject, as an
alphanumeric string ("ddmmyyyy"). This date was taken to be the middle
point between the subject's last documented negative HIV test result
and his first documented positive result.
COLUMN 5: date of birth for the current subject, as an alphanumeric
string ("ddmmyyyy")
COLUMN 6: ethnic group for current subject (irrelevant to our analysis).
COLUMN 7: date of AIDS onset as an alphanumeric string ("ddmmyyyy").
This date was defined in a way which is independent of CD4 level.
Those AIDS onset dates which are unavailable due to censoring are
represented as "."
COLUMN 8: Censoring indicator, which takes value 1 if AIDS onset
has been observed in the current subject, and value 2 otherwise.
COLUMN 9: last date at which the subject was seen AIDS-free
(if censored) or AIDS onset date (if the subject is uncensored).
This column is needed because certain subjects were last seen
AIDS-free much later than their last CD4 measurement.
COLUMN 10: date of current CD4 measurement
COLUMN 11: observed CD4 count
COLUMN 12: date started on mono-therapy
COLUMN 13: date started on dual therapy
COLUMN 14: date started on HAART therapy.
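
The column layout above can be read with a short script. The following
Python sketch is illustrative only and is not part of the original
distribution. It assumes that every field is non-empty and contains no
embedded blanks, so that a whitespace split yields exactly 14 fields.
Only columns 4, 5 and 7 are parsed as dates, because those are the
columns whose "ddmmyyyy" format is stated above; the remaining date
columns are kept as raw strings. The names Cd4Row, parse_ddmmyyyy and
parse_row, and the sample line, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


def parse_ddmmyyyy(text: str) -> Optional[date]:
    """Parse a "ddmmyyyy" string; "." (unavailable) becomes None."""
    if text == ".":
        return None
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"not a ddmmyyyy date: {text!r}")
    return date(int(text[4:8]), int(text[2:4]), int(text[0:2]))


@dataclass
class Cd4Row:
    subject: int                   # column 1
    obs_index: int                 # column 2
    n_obs: int                     # column 3
    seroconversion: Optional[date] # column 4
    birth: Optional[date]          # column 5
    aids_onset: Optional[date]     # column 7, None if censored
    aids_observed: bool            # column 8: 1 = observed, 2 = censored
    last_seen_raw: str             # column 9, format not assumed
    cd4_date_raw: str              # column 10, format not assumed
    cd4_count: float               # column 11


def parse_row(line: str) -> Cd4Row:
    """Split one line of the flat file into typed fields."""
    f = line.split()
    if len(f) != 14:
        raise ValueError(f"expected 14 fields, found {len(f)}")
    return Cd4Row(
        subject=int(f[0]),
        obs_index=int(f[1]),
        n_obs=int(f[2]),
        seroconversion=parse_ddmmyyyy(f[3]),
        birth=parse_ddmmyyyy(f[4]),
        aids_onset=parse_ddmmyyyy(f[6]),
        aids_observed=(f[7] == "1"),
        last_seen_raw=f[8],
        cd4_date_raw=f[9],
        cd4_count=float(f[10]),
    )


# Made-up line for illustration; it is not taken from ISSDATA.TXT.
sample = "1 1 11 15061990 02031965 1 . 2 01011998 01091990 590 . . 01011997"
row = parse_row(sample)
```

If the real file departs from these assumptions (for example, blank
rather than "." for a missing therapy date), the field count check will
raise an error rather than silently misalign the columns.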

-------------- SECTION 2: WINBUGS INPUT DATA FORMAT --------------

The data contained in the ISSDATA.TXT file must be processed and
re-formatted for analysis by our WinBUGS program. In our analysis,
the processing was carried out by special-purpose SPLUS routines,
and involved a number of steps. First, we eliminated the small subset
of observations taken after AIDS onset (these observations are still
present in the file). We also eliminated from subsequent analysis the
very few subjects with a single CD4 observation. Finally, the data were
converted into suitable data structures for input to WinBUGS. These
structures are described below.
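
The two elimination steps can be sketched as follows. This is a Python
illustration, not the SPLUS routines used in our analysis. It assumes
that measurement and AIDS onset dates have already been parsed, that a
measurement taken on the onset date itself is retained, and that
subjects are counted after the post-AIDS observations have been
removed. The names Measurement and preprocess are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import List, NamedTuple, Optional


class Measurement(NamedTuple):
    subject: int
    cd4_date: date
    aids_onset: Optional[date]  # None if AIDS onset was not observed
    cd4_count: float


def preprocess(rows: List[Measurement]) -> List[Measurement]:
    """Apply the two elimination steps described in the text."""
    # Step 1: drop observations taken after AIDS onset.
    pre_aids = [
        r for r in rows
        if r.aids_onset is None or r.cd4_date <= r.aids_onset
    ]
    # Step 2: drop subjects left with a single CD4 observation.
    counts = Counter(r.subject for r in pre_aids)
    return [r for r in pre_aids if counts[r.subject] > 1]


# Made-up records for illustration only.
onset = date(1996, 5, 1)
records = [
    Measurement(1, date(1995, 1, 10), onset, 520.0),
    Measurement(1, date(1995, 7, 12), onset, 480.0),
    Measurement(1, date(1996, 9, 3), onset, 150.0),   # after AIDS onset
    Measurement(2, date(1995, 3, 4), None, 610.0),    # single observation
]
kept = preprocess(records)
```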

First we describe the format for CD4 measurement data, and then
the format for event history data.

CD4 measurement data are arranged as shown in the table below,
whose three columns contain subject identifiers (IDENT), log CD4
measurements and CD4 measurement times (CUM.TIME). The first line
in the table, for example, indicates that the observed log CD4 level
in subject 1 at time 2 was 6.38. Line 12 indicates that the observed
log CD4 level in subject 2 at time 27 was 6.2, and so on. Measurement
times are, in general, irregularly spaced.

 (1)      (2)        (3)
IDENT   log(CD4)   CUM.TIME
_____________________________
  1       6.38        2
  1       5.81        4
  1       6.31        6
  1       6.34        7
  1       5.62        8
  1       5.92       10
  1       5.77       11
  1       6.08       12
  1       5.82       14
  1       5.67       15
  1       5.78       16
  2       6.2        27
  2       6.1        28
  2       5.9        29
  2       5.68       32
  2       5.83       34
  2       5.68       36
  3       6.74       41
  3       6.13       42
 ...     ......     ......

In the above table, the third column represents time measured along
a single, linear, scale called CUMULATIVE time. This is described
by the diagram below, where the total (pre-AIDS) person time
of observation is subdivided into a sequence of disjoint segments,
each segment representing the pre-AIDS observation period of a subject:

|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-v-|-> cumulative time
|<--- subject 1 --->|<--- subject 2 --->|<--- ....
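
The mapping from a subject's own time scale to cumulative time can be
sketched as below. This Python fragment is an illustration under the
assumption that each subject's segment starts where the previous
subject's segment ends, so that a time since seroconversion is shifted
by the total length of all preceding segments. The function names are
hypothetical. The segment lengths 11 and 6 are those of subjects 1 and
2 in the event history table given later in this section; the length 5
for subject 3 is made up.

```python
from itertools import accumulate
from typing import Dict


def cumulative_offsets(lengths: Dict[int, int]) -> Dict[int, int]:
    """Return, for each subject, the cumulative time at which the
    subject's segment begins (0 for the first subject).

    `lengths` maps subject identifier to the length of that subject's
    pre-AIDS observation period, in the order the subjects are stacked.
    """
    subjects = list(lengths)
    ends = list(accumulate(lengths[s] for s in subjects))
    starts = [0] + ends[:-1]
    return dict(zip(subjects, starts))


def to_cumulative(subject: int, time_since_seroconversion: int,
                  offsets: Dict[int, int]) -> int:
    """Shift a subject-specific time onto the cumulative scale."""
    return offsets[subject] + time_since_seroconversion


offsets = cumulative_offsets({1: 11, 2: 6, 3: 5})
first_bin_subject_2 = to_cumulative(2, 1, offsets)
first_bin_subject_3 = to_cumulative(3, 1, offsets)
```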

Event history data include information about failure times and about
HAART treatment intervals. In our analysis the "failure" is defined to
be the time of onset of AIDS. These data are arranged as shown in the
table below. Every row of the table corresponds to a specific time
interval (indicated in column "TIME SINCE SEROCONVERSION") of
the pre-AIDS (post-seroconversion) history of a specific subject,
identified by column "IDENT". Column "CENS" indicates whether that
subject in that interval developed AIDS (CENS = 1) or not (CENS = 0).
Column "HAART" indicates whether that subject in that interval
received HAART (HAART = 1) or not (HAART = 0). For example, the first
11 rows of the table below tell us that subject 1 was observed for
11 time intervals post-seroconversion, during which he did not
develop AIDS. Subject 1 was started on HAART during interval 8.
Subject 2 was followed up for the full time to AIDS onset, which
occurred in his 6th time interval before HAART could be started.
Column "NEWSUBJECT" marks transitions from one subject to the next.

 (4)    (5)    (6)                      (7)            (8)
IDENT   CENS   HAART   CUMULATIVE   TIME SINCE       NEWSUBJECT
                       TIME         SEROCONVERSION
_______________________________________________________________
  1      0      0         1             1                1
  1      0      0         2             2                0
  1      0      0         3             3                0
  1      0      0         4             4                0
  1      0      0         5             5                0
  1      0      0         6             6                0
  1      0      0         7             7                0
  1      0      1         8             8                0
  1      0      1         9             9                0
  1      0      1        10            10                0
  1      0      1        11            11                0
  2      0      0        12             1                1
  2      0      0        13             2                0
  2      0      0        14             3                0
  2      0      0        15             4                0
  2      0      0        16             5                0
  2      1      1        17             6                0
  3      0      0        18             1                1
  3      0      0        19             2                0
 ...    ...    ...       ...           ...              ...
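
The construction of the rows of this table can be sketched as follows.
The Python fragment below is an illustration and not the code used in
our analysis. It assumes one row per unit time interval, HAART = 1 from
the starting interval onwards, and CENS = 1 only in a subject's last
interval when AIDS onset was observed. The function name build_bins and
the input tuples are hypothetical; the inputs were chosen so that the
output reproduces the rows shown above for subjects 1 and 2.

```python
from typing import List, Optional, Tuple

# (IDENT, CENS, HAART, CUMULATIVE TIME, TIME SINCE SEROCONVERSION, NEWSUBJECT)
BinRow = Tuple[int, int, int, int, int, int]

# (ident, number of intervals, first HAART interval or None, AIDS observed)
SubjectSummary = Tuple[int, int, Optional[int], bool]


def build_bins(subjects: List[SubjectSummary]) -> List[BinRow]:
    """Expand per-subject summaries into one row per time interval."""
    rows: List[BinRow] = []
    cumulative = 0
    for ident, n_intervals, haart_start, aids_observed in subjects:
        for t in range(1, n_intervals + 1):
            cumulative += 1
            # AIDS onset, if observed, falls in the subject's last interval.
            cens = 1 if (aids_observed and t == n_intervals) else 0
            haart = 1 if (haart_start is not None and t >= haart_start) else 0
            newsubject = 1 if t == 1 else 0
            rows.append((ident, cens, haart, cumulative, t, newsubject))
    return rows


bins = build_bins([(1, 11, 8, False), (2, 6, 6, True)])
```

Each column of the result can then be extracted as a separate vector,
for example [r[1] for r in bins] for the CENS column.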

Finally, the following data table reports, for each subject in the
study, the time at which HAART treatment was started (THAART), measured
from the subject's seroconversion, and the estimated log CD4 level
at that time (BASELINECD4):

            (9)        (10)
SUBJECT    THAART   BASELINECD4
________________________________
   1          8        5.78
   2          6        5.68
   3          0        5.04
   4         10        6.56
   5         12        6.0
  ...        ...       ...

Those columns of the above tables which have been numbered (1)-(10)
are represented in the BUGS program by the following linear vectors:

(1)  IDENT.MEAS
(2)  LOGCD4.MEAS
(3)  CUM.TIME
(4)  IDENT.BIN
(5)  CENS.BIN
(6)  HAART.BIN
(7)  TIME.BIN
(8)  NEWSUBJECT.BIN
(9)  THAART
(10) BASELINECD4

The following quantities must also be provided to BUGS as part of
the input data:

NCD4: number of rows of the first table illustrated above
BINCOUNT: number of rows of the second table illustrated above
NSUBJECTS: total number of subjects in the analysis
BINMAX: maximum length of an individual subject's observation period

  • Datasets (bugsprogram.zip, size - 64KB)