![]() |
|
Readme fileSERIES C
|
||||||||||||||||||||||||||||||||||||||
| n1 <- 73 n2 <- 73 p <- 231 | |
| seqfile1 <- 'p24_73C.fas' | # subtype C |
| seqfile2 <- 'p24_73B.fas' | # subtype B |
| seqfileref <- 'Referenceseq.txt' | # Reference sequence, the C/B consensus sequence |
| alpha <- 0.05 | |
Unweighted analysis:
Blosmat <- matrix(rep(1,21*21),ncol=21)
weightprior <- rep(1,p)
Delta <- matrix(rep(1,21*21),ncol=21)
fastAtobinary(n1,n2,p,seqfile1,seqfile2,seqfileref,Blosmat,Delta,weightprior,alpha)
fastAtobinary writes out the three input files inputexample.dat,
input1.dat, and input2.dat that are used by the Fortran program
genomescanTaroneFDR.f.
input1.dat contains the sequence data for group 1
and input2.dat contains the sequence data for group 2, in the binary
format described above. The file inputexample.dat contains the
following variables:
| n1 n2 p | # nk = No. of sequences in group k, k=1,2 # p = No. of positions in the sequences, i.e., the length of the aligned sequences |
| alpha | # The type I error rate (e.g., 0.05) |
| 1st row of Blosmat | # Blosmat is a 21 by 21 matrix that weights # amino acid substitutions; the 20 rows |
| 2nd row of Blosmat ... | # are for the 20 amino acids and the 21st row # is for a gap (insertion or deletion). # The simplest choice for Blosmat is simply # a matrix of all 1's. |
| 21st row of Blosmat | |
| 1st row of Delta | # Delta is a 21 by 21 matrix that weights # biostructural importance of amino acid |
| 2nd row of Delta ... | # substitutions for a different test statistic # than Fisher's exact test statistic. For the # current program Delta is set to the matrix # of all 1's. |
| 21st row of Delta | |
| weightprior | # a p-vector of prior weights on the p positions # 1, ... , p. These weights are not used for Fisher's # exact test; they are used for other test statistics # studied in other work. Thus, weightprior is set # to be a vector of 1's for the current application. |
| yyref | # the sequence data for the reference sequence |
| entry 1, 1, 21 entries | |
| entry 1, 2, 21 entries ... | |
| entry 1, p, 21 entries | |
| entry 2, 1, 21 entries |
and so on... (a large file of 1's and 0's)
After the Splus function fastAtobinary is run, the Fortran program can be run at a Unix or Linux command line. The Fortran function can be compiled with the command f77 -O3 genomescanTaroneFDR.f -o genomescanTaroneFDR.out, and run with the command: genomescanTaroneFDR.out
The file genomescanTaroneFDR.f contains documentation describing the output of the Fortran program.
Contact information:
Peter B. Gilbert
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue North
PO Box 19024
Seattle
WA 98109
USA
Phone: 206 667-7299
Fax: 206 667-4812
E-mail: pgilbert@scharp.prg
SERIES
A
Statistics
in Society
SERIES B
Statistical
Methodology
SERIES C
Applied
Statistics
SERIES D
The
Statistician