Title of paper -------------- Bayesian analysis of two-component mixture distributions applied to estimating malaria attributable fractions. By P. Vounatsou, T. Smith and A. F. M. Smith Applied Statistics, volume 47 (1998), part 4 Description of program (V1.0) ----------------------- The program performs the Gibbs sampling algorithm to estimate, non parametrically, the mixing proportion of a two-component mixture distribution, according to the approach discussed in the above mentioned paper. It is written in FORTRAN 77 and it uses the NAG libraries (MARK 17) to perform maximizations, to generate random numbers and for sorting elements of vectors. To run the program you will need to set up the parameters in the PARAMETER statement in LINE 5. The main parameters that should be specified are 1) The number of categories (ng) 2) The way of defining the categories (icutoff) in order to calculate the mm(i) (no of observations from the training sample in category i), n(i) (no of observations from the mixture in category i) and the boundaries of each category i. The m(i), n(i) as well as the boundaries of the categories can be either calculated in the program (in this case you need to read the data from a file) or can be calculated outside the program (in this case you need to provide them using DATA statements) 3) Parameters relevant to the gibbs sampling implementation (t,d,m1,isim,iconv,seed). These parameters can be left with their DEFAULT values. Details Input parameters/data: The ng parameter in the PARAMETER statement (LINE 5) specifies the number of ordered categories over the range of x values (parasite densities in the context of the paper) of the first component. The total number of categories is kkk=ng+2 (one category for x equal to 0 and another one for x outside the range of sample values from g1(.). The program has two options. Option 1 calculates the number of observations n(i) from the mixture that belong to category i, i=1,2,...,kkk and the number of observations mm(i) from the training sample that belong to category i, by using the grouping() subroutine. Option 2 allows for external specification of the n(i) and mm(i) with a DATA statement. Option 1: The data are read from a file named . The filename is specified in the PARAMETER statement (LINE 5), by giving any name of 8 characters long in . This file should have two columns. The first one consists of 0s and 1s and specifies whether observation, x, come from the training sample (0) or from the mixture (1). The second column consists of the observations, x. An example file, read.dat is included with this diskette as a template. You need also to specify the following parameters in the PARAMETER statement of the program (LINE 5): n0 : number of rows in the read.dat file icutoff : 0 or 1 according to the way the categories are defined. If icutoff = 1 you need to specify the cut-off values of the categories over the range of x in a DATA statement eg. data bounds /0.,1.,2000.,4000.,6000.,8000.,10000.,12500., 15000.,17500.,20000.,25000.,30000./ This statement defines the categories as follows : Category Range of x 1 0 2 1<=x<=2000 3 2000=2*kkk-1 Implementation details of the Gibbs sampling algorithm ------------------------------------------------------ The algorithm allows for replicate chains. Parameters that should be specified in the PARAMETER statement (LINES 5 - 6) relevant to the Gibbs sampling implementation: m : number of replicate chains t : number of burn-in iterarions seed: seed for random number generator d : number of iterations which are used to calculate ergodic averages m1 : size of the final sample drawn from the posterior. It should be a multiple of d isim: If isim=1 then a sample from the posterior is drawn after convergence is obtained. l : number of batches of iterations (of length d). For each l, the ergodic average of the parameters is obtained. The total number of iterations will be: t+l*m You can assess convergence by plotting for each parameter ergodic averages vs iteration number. The Gelman Rubin diagnostic can be also calculated for every d iterations by specifying iconv=1 in the PARAMETER statement of LINE 5. FILES GENERATED BY THE PROGRAM: bounds.dat : Cut-off x values of the categories i cases.dat : number of observations n(i) from the mixture in category i controls.dat: number of observations mm(i) from the training sample in category i. ergmean.dat : file with the ergodic means. The columns of the file have as follows: Col 1 -> Iteration sequence Col 2:(kkk-1) -> Ergodic averages of theta_{1},theta_{2} ... theta_{k-1} Col kkk:(2*kkk-2)-> Ergodic averages of z_{1},z_{2},...z_{k-1} Col 2*kkk-1 -> lambda ergvar.dat : file with estimates of the standard errors of the parameters. The columns of the file have as follows: Col 1 -> Iteration sequence Col 2:(kkk-1) -> Estimates of standard errors of theta_{1},theta_{2} ... theta_{k-1} Col kkk:(2*kkk-2)-> Estimates of standard errors of of z_{1},z_{2},...z_{k-1} Col 2*kkk-1 -> Estimate of standard errors of lambda samples.dat : file with samples from the posterior.The columns of the file have as follows: Col 1:(kkk-1) -> Samples from marginal posteriors of the parameters: theta_{1},theta_{2} ... theta_{k-1} Col kkk:(2*kkk-2)-> Samples from marginal posteriors of z_{1},z_{2},...z_{k-1} Col 2*kkk-1 -> Sample from the marginal posterior of lambda gelm.dat : file with convergence diagnostics for each parameter.The columns of the file have as follows: C1 -> Iteration sequence Col 2:(kkk-1) -> Convergence diagnostic("estimated potential scale reduction", R) for parameters theta_{1},theta_{2} ... theta_{k-1}, respectively. Col kkk:(2*kkk-2)-> Estimates of R for parameters z_{1},z_{2},...z_{k-1}, respectively. Col 2*kkk-1 -> Estimate of R for the parameter lambda Contact address --------------- Penelope Vounatsou Swiss Tropical Institute Socinstrasse 57 Basle 4002 Switzerland Tel: +41 61 284 81 09 Fax: +41 61 271 79 51 email: vounatsou@ubaclu.unibas.ch This software is under revision. We plan also to make available updated versions from our web site: www.wb.unibas.ch/sti Copyright notice ------------------ This program is public domain software but WITHOUT ANY WARRANTY. If this software is used in work presented for publication, kindly reference the related paper Acknowledgements ---------------- This work was supported by a Swiss National Foundation grant 32-43527.95