Readme file

SERIES A 
Statistics in Society

Improved estimation procedures for multilevel models with binary response: a case-study, by G. Rodríguez and N. Goldman
Journal of the Royal Statistical Society, Series A, Statistics in Society, Volume 164 (2001), Part 2, pages 339-355

The data made available with this paper include four archives of simulated data and one archive with actual data from Guatemala.

Simulated Data:

Files rgs3bb1.zip to rgs3bb4.zip contain zipped versions of 100 simulated datasets corresponding to a three-level structure with large family and community effects. Each of the four archives contains 25 datasets. The datasets are named s3bb1.dat to s3bb100.dat. Each dataset is in ASCII format and has 8 variables on 2449 cases.

The variables are:

1 child id (2449 kids, ids in 1..14684)
2 family id (1558 families, ids in 1..2782)
3 community id (161 communities, ids in 1..242)
4 binary dependent variable (0,1)
5 corresponding latent variable (logistic)
6 child-level covariate (mean .0955621, range -.452557 to .541957)
7 family-level covariate (mean -.083816, range -1.485043 to 2.284106)
8 community-level covariate (mean -.6857591, range -1.818267 to -.0097808)
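As a rough illustration of how the simulated files might be read outside R or S-Plus, the Python sketch below parses one dataset into records holding the eight variables listed above, and applies the same parser to every .dat member of an archive. It assumes the files are whitespace-delimited with no header line, which this readme does not state explicitly. The field labels and the helper names parse_simulated and read_archive are hypothetical, chosen here for illustration.

```python
import zipfile

# Hypothetical labels for the eight columns, in the order listed above.
FIELDS = ("child", "family", "community", "y", "latent",
          "x_child", "x_family", "x_community")


def parse_simulated(text):
    """Parse one simulated dataset supplied as text.

    Assumes whitespace-delimited values, one case per line, no header.
    """
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != len(FIELDS):
            raise ValueError(
                f"line {line_no}: expected {len(FIELDS)} values, got {len(parts)}")
        rec = dict(zip(FIELDS, parts))
        # The three ids and the binary response are integers; the rest are reals.
        for name in FIELDS[:4]:
            rec[name] = int(float(rec[name]))
        for name in FIELDS[4:]:
            rec[name] = float(rec[name])
        records.append(rec)
    return records


def read_archive(source):
    """Return {member name: records} for every .dat member of a zip archive.

    `source` may be a file path or a binary file-like object.
    """
    datasets = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".dat"):
                text = archive.read(name).decode("ascii")
                datasets[name] = parse_simulated(text)
    return datasets
```

For example, read_archive("rgs3bb1.zip") should return 25 datasets of 2449 records each if the assumptions above hold.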
 

All datasets were generated using true parameter values of 0.665267 for the constant and 1 for each of the three covariates, and using normal variates with variance 1 for the family and community effects. Note that only variables 4 and 5 vary across datasets.
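The data-generating process described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' simulation code: it assumes that the latent variable is the linear predictor plus a standard logistic error, and that the binary response equals 1 when the latent variable is positive. The function name simulate_responses and the record keys are hypothetical.

```python
import math
import random

CONSTANT = 0.665267  # true value of the constant, from the text above
SLOPE = 1.0          # true coefficient of each of the three covariates
SIGMA = 1.0          # s.d. of the family and community effects (variance 1)


def simulate_responses(cases, seed=None):
    """Simulate (binary response, latent variable) for each case.

    `cases` is a sequence of dicts with keys family, community,
    x_child, x_family and x_community (hypothetical names).
    Assumption: y = 1 exactly when the latent variable is positive.
    """
    rng = random.Random(seed)
    family_effect = {}
    community_effect = {}
    results = []
    for case in cases:
        fam, com = case["family"], case["community"]
        # One normal effect per family and per community, drawn once and reused.
        if fam not in family_effect:
            family_effect[fam] = rng.gauss(0.0, SIGMA)
        if com not in community_effect:
            community_effect[com] = rng.gauss(0.0, SIGMA)
        eta = (CONSTANT
               + SLOPE * (case["x_child"] + case["x_family"] + case["x_community"])
               + family_effect[fam] + community_effect[com])
        # Standard logistic error via the inverse CDF of a uniform draw.
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() can return exactly 0.0
            u = rng.random()
        latent = eta + math.log(u / (1.0 - u))
        results.append((1 if latent > 0 else 0, latent))
    return results
```

Because the covariates and ids are held fixed and only the random draws change, repeated calls with different seeds vary only the response and the latent variable, which mirrors the note that only variables 4 and 5 differ across datasets.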

Actual Data:

File rggudat.zip contains the two datasets analyzed in this paper.

guImmun.dat: The first dataset refers to complete immunization among children receiving any immunization. It has 2159 observations on 19 variables. The very first line is a header with variable names, so the file can be read into R or S-Plus using read.table(filename,header=T). The variables include child, family and community id numbers, the outcome coded 0-1, and a set of individual, family and community variables used as predictors. These appear in exactly the same order as Table 2 in the paper:

1 kid: child id (2159 kids)
2 mom: family id (1595 families)
3 cluster: cluster id (161 communities)
4 immun: whether fully immunized (1=yes, 0=no)
5 kid2p: child aged 2+ years
6 mom25p: mother aged 25+ years
7 order23: birth order 2-3
8 order46: birth order 4-6
9 order7p: birth order 7+
10 indNoSpa: indigenous, speaks no Spanish
11 indSpa: indigenous, speaks Spanish
12 momEdPri: mother's education primary
13 momEdSec: mother's education secondary+
14 husEdPri: husband's education primary
15 husEdSec: husband's education secondary+
16 husEdDK: husband's education missing
17 momWork: mother ever worked
18 rural: rural residence
19 pcInd81: proportion indigenous in 1981
 

The last predictor is a continuous variable. All others are 0-1 dummy variables, representing discrete factors coded using the reference cell method. The omitted categories are child aged 1 year, mother's age less than 25, birth order 1, ladino, mother with no education, husband with no education, mother never worked, and urban residence.
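To illustrate the reference cell coding, the Python sketch below reads a header-first, whitespace-delimited file (a rough analogue of the read.table call mentioned above) and collapses the three birth-order dummies back into a single factor. The whitespace delimiter and the stripping of quotes from header names are assumptions about the file layout. The helper names read_with_header and birth_order and the category labels are hypothetical.

```python
def read_with_header(text):
    """Parse whitespace-delimited text whose first line holds variable names.

    Returns a list of dicts mapping each variable name to a float.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header names may be quoted if the file was written from R or S-Plus.
    names = [name.strip('"') for name in lines[0].split()]
    rows = []
    for ln in lines[1:]:
        values = ln.split()
        if len(values) != len(names):
            raise ValueError("row length does not match header")
        rows.append({n: float(v) for n, v in zip(names, values)})
    return rows


def birth_order(row):
    """Recover the birth-order category from its 0-1 dummy variables.

    All dummies equal to 0 indicates the omitted (reference) category,
    birth order 1.
    """
    labels = {"order23": "2-3", "order46": "4-6", "order7p": "7+"}
    active = [label for name, label in labels.items() if row[name] == 1]
    if len(active) > 1:
        raise ValueError("more than one birth-order dummy is set")
    return active[0] if active else "1"
```

The same pattern applies to the other factors: a row with every dummy of a factor equal to 0 belongs to that factor's omitted category.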

guPrenat.dat: The second dataset refers to use of modern prenatal care among women using some form of prenatal care. It has 2449 observations on 25 variables. The first line is a header with variable names, so the file can be read into R or S-Plus using read.table(filename,header=T). The variables include level ids, the outcome, and individual, family and community-level predictors. These appear in the same order as Table 3 in the paper.

1 kid: child id (2449 kids)
2 mom: family id (1558 families)
3 cluster: cluster id (161 communities)
4 prenat: used modern prenatal care (1=yes, 0=no)
5 kid3p: child aged 3-4 years
6 mom25p: mother aged 25+ years
7 order23: birth order 2-3
8 order46: birth order 4-6
9 order7p: birth order 7+
10 indNoSpa: indigenous, speaks no Spanish
11 inSpa: indigenous, speaks Spanish
12 momEdPri: mother's education primary
13 momEdSec: mother's education secondary+
14 husEdPri: husband's education primary
15 husEdSec: husband's education secondary+
16 husEdDK: husband's education missing
17 husProf: husband professional, sales, clerical
18 husAgrSelf: husband agricultural self-employed
19 husAgrEmp: husband agricultural employee
20 husSkilled: husband skilled service
21 toilet: modern toilet in household
22 tvNotDaily: television not watched daily
23 tvDaily: television watched daily
24 pcInd81: proportion indigenous in 1981
25 ssDist: distance to nearest clinic
 

All predictors are either continuous variables (numbers 24 and 25) or 0-1 dummy variables (all others) representing discrete factors coded using the reference cell method. Omitted categories are child aged 0-2, mother aged <25, birth order 1, ladino, mother with no education, husband with no education, husband not working or in unskilled occupation, no modern toilet in household, and no television in the household.
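A quick consistency check on the three-level structure is to count the distinct ids at each level and compare them with the totals listed above (2449 children, 1558 families and 161 communities for guPrenat.dat). The Python sketch below does this for rows supplied as dictionaries keyed by the header names kid, mom and cluster. It also assumes that family ids are unique across communities, so that each family is nested in a single cluster. The function name level_counts is hypothetical.

```python
def level_counts(rows):
    """Count distinct children, families and communities in `rows`.

    Each row is a mapping with keys kid, mom and cluster. Raises
    ValueError if a family id appears in more than one cluster, since
    that would break the assumed nesting of families within communities.
    """
    kids = set()
    cluster_of_mom = {}
    for row in rows:
        kids.add(row["kid"])
        # Remember the first cluster seen for this family and compare later rows.
        previous = cluster_of_mom.setdefault(row["mom"], row["cluster"])
        if previous != row["cluster"]:
            raise ValueError(
                f"family {row['mom']} appears in clusters "
                f"{previous} and {row['cluster']}")
    return {
        "kid": len(kids),
        "mom": len(cluster_of_mom),
        "cluster": len(set(cluster_of_mom.values())),
    }
```

If the file matches the description, the result for guPrenat.dat would be 2449, 1558 and 161 for the three levels.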

For more information please visit the authors' website at http://data.princeton.edu/multilevel or email the author for correspondence:

German Rodriguez
Office of Population Research
Wallace Hall
Princeton University
Princeton
NJ 08544-2091

E-mail: grodri@princeton.edu

Dataset (rgs3bb1.zip, 916kb)
Dataset (rgs3bb2.zip, 917kb)
Dataset (rgs3bb3.zip, 917kb)
Dataset (rgs3bb4.zip, 917kb)
Dataset (rggudat.zip, 40kb)

 
