Readme file

SERIES C  
Applied Statistics

Analysis of functional status transitions by using a semi-Markov process model in the presence of left-censored spells, by L. Cai, N. Schenker and J. Lubitz
Journal of the Royal Statistical Society, Series C, Applied Statistics, Volume 55 (2006), 477 - 491

SAS Programs in text format

These programs are written in SAS and should be run in SAS 9.0 or later. Before using them, change the file extension from .txt to .sas.

1. SEMAG1A.txt and SEMAG1B.txt
These are the main programs for fitting the semi-Markov process (SMP) model using the stochastic EM algorithm. The two programs are identical in structure and purpose, except that SEMAG1A.txt produces estimates for the overall population of 65-year-olds, whereas SEMAG1B.txt produces estimates for male and female 65-year-olds separately.

Both programs start the algorithm from the same initial assumption of R=0 and run 45 iterations. The E-step and the M-step are performed at each iteration. Between iterations 26 and 45, the program uses the current estimates of the transition probabilities to simulate annual health status for 250000 65-year-olds and records the simulated data in separate data sets, from which summary estimates are derived. An example of such data is described in the Data section below.
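The iteration schedule described above can be sketched as a control-flow skeleton. This is an illustration in Python, not the SAS code in SEMAG1A.txt or SEMAG1B.txt. The function names (run_stochastic_em, e_step, m_step, simulate_cohort) are hypothetical placeholders supplied by the caller. The sketch assumes that "between iterations 26 and 45" is inclusive and that the cohort is simulated after the M-step of the same iteration; the README does not state either point explicitly.

```python
def run_stochastic_em(initial_params, e_step, m_step, simulate_cohort,
                      n_iter=45, first_sim_iter=26, cohort_size=250_000):
    """Run a fixed number of stochastic EM iterations.

    e_step, m_step and simulate_cohort are hypothetical callables:
      e_step(params)              -> completed data
      m_step(completed_data)      -> updated params
      simulate_cohort(params, n)  -> simulated data for n persons
    Returns the final params and the list of simulated cohorts.
    """
    params = initial_params
    cohorts = []
    for iteration in range(1, n_iter + 1):
        completed = e_step(params)   # E-step, performed at every iteration
        params = m_step(completed)   # M-step, performed at every iteration
        # Assumption: iterations 26..45 inclusive each yield one cohort.
        if iteration >= first_sim_iter:
            cohorts.append(simulate_cohort(params, cohort_size))
    return params, cohorts
```

Under these assumptions the default settings give 25 iterations without simulation followed by 20 simulated cohorts, each stored separately so that summary estimates can be derived from them afterwards.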

2. BS_SEMLEA.txt, BS_EMREPA.txt, BS_EMREPA_1.txt, BS_EMREPA_2.txt and BS_PREV.txt
These programs estimate the bootstrap standard errors for life expectancy at 65 for the entire population. BS_SEMLEA.txt is the main program that controls the resampling of 50 bootstrap samples. Two samples are generated and fitted simultaneously in a batch by BS_EMREPA_1.txt and BS_EMREPA_2.txt. BS_EMREPA.txt is the program that fits the SMP model to a bootstrap sample using the stochastic EM algorithm with the R=0 assumption. The first 20 iterations are the 'burn-in' period, and during each of the following five iterations the program simulates annual status for a cohort of 100000 65-year-olds. The average of the life expectancy estimates from these five cohorts is taken as the estimate for that bootstrap sample. BS_PREV.txt is used to calculate weighted disability prevalence estimates from each bootstrap sample.
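The arithmetic behind the bootstrap step can be sketched as follows. This is a Python illustration, not the SAS code in the BS_* programs, and all function names are hypothetical. The README does not give the standard error formula, so the sketch assumes the usual bootstrap estimator: the sample standard deviation (divisor n-1) of the per-sample estimates.

```python
import math


def sample_estimate(cohort_estimates):
    """Average the estimates from the post-burn-in cohorts of one
    bootstrap sample (five cohorts per sample in the README)."""
    return sum(cohort_estimates) / len(cohort_estimates)


def bootstrap_se(estimates):
    """Assumed estimator: sample standard deviation (divisor n-1)
    of the bootstrap estimates."""
    n = len(estimates)
    if n < 2:
        raise ValueError("need at least two bootstrap estimates")
    mean = sum(estimates) / n
    return math.sqrt(sum((x - mean) ** 2 for x in estimates) / (n - 1))


def batches(sample_ids, size=2):
    """Split bootstrap sample ids into batches fitted side by side
    (two per batch in the README)."""
    return [sample_ids[i:i + size] for i in range(0, len(sample_ids), size)]
```

With 50 bootstrap samples and two samples per batch, batches(list(range(1, 51))) yields 25 batches; bootstrap_se would then be applied to the 50 values returned by sample_estimate.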

3. BS_SEMLEB.txt, BS_EMREPB.txt, BS_EMREPB_1.txt, BS_EMREPB_2.txt and BS_PREV.txt
These programs are similar to those described in 2, except that they are used to derive standard errors for male and female 65-year-olds separately.

4. MSLE65.txt, BS_MSLE.txt, BS_MSREP.txt, BS_MSREP_1.txt, BS_MSREP_2.txt and BS_PREV.txt
MSLE65.txt estimates the multistate life-table transition probabilities and simulates 400000 65-year-olds separately for males, females and both genders combined. Estimates of life expectancy, disability incidence and recovery are summarized from these simulated data. The other programs estimate the bootstrap standard errors for life expectancy at 65. As with the bootstrap programs for the SMP model, 50 bootstrap samples are generated, and two are fitted simultaneously in a batch.

Data

Due to space constraints, C5808_DATA.txt is only a small subset of the full data set (250000 persons) that we use to derive summary estimates. It contains simulated health status from age 65 to death for the first 5000 persons who have no functional limitations at age 65. Each row of the data is an episode or spell of health states. The description of each column is given below.

Column 1: Personal identification number.
Column 2: Sex (Male=1, Female=2).
Column 3: Age at the beginning of the spell.
Column 4: Status at the beginning of the spell (1=no limitations, 2=1+ limitations in physical functioning, 3=1+ limitations in IADL, 4=1+ limitations in ADL).
Column 5: Status at the end of the spell (1=no limitations, 2=1+ limitations in physical functioning, 3=1+ limitations in IADL, 4=1+ limitations in ADL, 5=death).
Column 6: Age at the end of the spell.
Column 7: Duration of the spell, in years. For the last spell, which ends in death, death is placed at the middle of the annual age interval, so the duration is the difference between column 6 and column 3, minus 0.5.
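The column layout above can be read with a short script. The following is a minimal Python sketch, not part of the SAS programs. It assumes that C5808_DATA.txt is whitespace-delimited with exactly seven fields per row and no header line, and that the duration of a spell that does not end in death is simply column 6 minus column 3; neither assumption is stated in this README. The names Spell, parse_spell, expected_duration and years_by_status are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Spell:
    person_id: int
    sex: int            # 1 = male, 2 = female
    age_start: float
    status_start: int   # 1-4
    status_end: int     # 1-5, where 5 = death
    age_end: float
    duration: float     # years


def parse_spell(line):
    """Parse one row, assuming seven whitespace-separated fields."""
    f = line.split()
    if len(f) != 7:
        raise ValueError(f"expected 7 fields, got {len(f)}")
    return Spell(int(f[0]), int(f[1]), float(f[2]), int(f[3]),
                 int(f[4]), float(f[5]), float(f[6]))


def expected_duration(spell):
    """Duration implied by columns 3 and 6; death is placed mid-interval."""
    gap = spell.age_end - spell.age_start
    return gap - 0.5 if spell.status_end == 5 else gap


def years_by_status(spells):
    """Total years spent in each starting status across all spells."""
    totals = {}
    for s in spells:
        totals[s.status_start] = totals.get(s.status_start, 0.0) + s.duration
    return totals
```

Comparing expected_duration(spell) with spell.duration for every row is a quick check that the file was read with the right column order. Dividing the totals from years_by_status by the number of distinct persons gives average years lived in each status for the persons in the subset.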

Liming Cai
National Center for Health Statistics
3311 Toledo Road, Room 6330
Hyattsville
MD 20782
USA

E-mail: lcai@cdc.gov
Tel: +1 301-458-4133

Datasets (cai2.zip, size - 226KB)
