Readme file
SERIES
B
Statistical
Methodology
Local additive estimation, by J. Park and B. Seifert, pages 171–191
This file describes implementation of the local additive estimator proposed in the paper and Park and Seifert (2008), by using R functions.
1. Files included
There are five text files:
- ladd_sbf.R: R source code for the function ladd.sbf, main function to compute the estimator based on smooth backfitting estimator.
- band_cal.R: R source code for the functions band.cal and bandcmp
- sim_MISE_ladd.R: R source code for the function sim.MISE.ladd, which computes bias^2 and variance component for MISE approximation.
- res_var.R: R source code for the function res.var, which is used to estimate \sigma.
- Example_ladd.R: example saved in text file.
The required parameters are explained inside each file.
2. Required package
Additionally, it requires R package SBF2, which is also available to download SBF2_0.3-3.tar.gz. To install the package, type:
R CMD INSTALL SBF2_0.3-3.tar.gz
After starting R, type:
library(SBF2) % required
3. Estimation
Main function: ladd.sbf is the main function which takes arguments of x, y, bandll for h, bandadd for w, and xout for output points.
laddtmp=ladd.sbf(x, y, bandll, bandadd, xout)
The estimates are given by laddtmp$est. Additionally, laddtmp$num gives
number of observations used at each output point.
Creating an example data set in 2-dim:
Generate 400 observations from random uniform design on [-1,1]^2 and add normal
error with standard deviation (\sigma = 0.5) to the regression function.
## Data
# regression function
Reg.ex <-function(x,a){
return(x[,1]^2+x[,2]^2+a/(1-a)*x[,1]*x[,2])
}
D = 2
ndata = 400
# random uniform [-1,1]^d
x = matrix(runif(ndata*d, min=-1, max=1), ncol=d, byrow=T)
# [-1,1]^d
ytrue = Reg.ex(x,a=0.4)
sd = 0.5
y = ytrue + sd*rnorm(ndata)
Setting output grid points (xout):
To obtain estimates on [-1,1]^d at equally distant grid points,
# output points
NGRID=21
outgrid = seq(-1,1,l=ngrid)
xout=array(NA, c(ngrid^d, d))
xout[,1] = rep(outgrid, ngrid)
xout[,2] = rep(outgrid, each=ngrid)
ytrue.out = Reg.ex(xout,a=0.4)
nout = nrow(xout)
Setting bandwidths (bandll and bandadd):
The local bandwidth h (bandll) and the local region w (bandadd) can be set to any numerical number or vectors within the range (0<h<1 and 0<w<1 for this example).
If scalar values are given, the same bandwidth will be used for all dimensions. For
simulation purposes, one might consider several values for systematic search. For example, the local bandwidth h is set between 0.05 and 1, 21 values equally spaced on
log scale. The local region w is determined relative to h so the ratio w/h is given between 1 and
5, 21 values in linear on log scale and then converted to proper w. Alternatively w can be set directly.
# set bandwidths (for simulation)
NH = 21
Nr = 21
hmin = 0.05
hmax = 1
bandalls = hmin*(exp((log(hmax)-log(hmin))/(Nh-1)))^(0:(Nh-1))
# equally spaced on log scale
ratadds = 1.*(exp((log(5)-log(1.))/(Nr-1)))^(0:(Nr-1))
# equally spaced on log scale
For individual use,
iband = 15
irat = 3
bandll = bandalls[iband] # or 0.5
bandadds = bandll*ratadds
bandadd = bandadds[irat] # or 0.6
laddtmp=ladd.sbf(x,y, bandll, bandadd, xout)
Boundary correction (boundary):
The main function ladd.sbf can take an argument of boundary. When this is
set, the bandwidths at boundary points are increased accordingly. Interior points are not
affected.
LADDTMP.BOUND=LADD.SBF(X,Y, BANDLL, BANDADD, XOUT, BOUNDARY=BOUND)
# boundary correction for bandwidths
BOUND = REP(1, TIMES=NCOL(X))%O%C(-1,1)
# no boundary correction (default)
bound = rep(1, times=ncol(x))%o%c(-100,100)
Support of x (xbound):
The support of x is assumed to be [-1,1]^d by default. To change the support,
# support on [0,1]^d
xbound = rep(1, times=ncol(x))%o%c(0,1)
laddtmp.xbound=ladd.sbf(x,y, bandll, bandadd, xout, xbound=xbound)
3. Computation of MISE
The main function used to evaluate MISE is sim.MISE.ladd. This computes bias and
variance components separately.
## Computing MISE
# computing Bias^2
misetmp.b2 = sim.MISE.ladd(1, x, xout, ytrue, ytrue.out, bandalls=bandalls[15:16],
ratadds=ratadds[1:2])$MSE
# computing Var
nsim = 100
misetmp.var = sim.MISE.ladd(0, x, xout, bandalls=bandalls[15:16], ratadds=ratadds[1:2],
nsim=nsim)$MSE
mise = misetmp.b2$MSE + sd^2*misetmp.var$MSE # Nh x Nr matrix
4. Confidence intervals
For finite sample, the confidence intervals are computed based on the hat matrix H. The
estimators are $\hat y = H\times y$ and the variance of $a^\prime\hat y$ is given by
$tr(a^\prime H H^\prime a)\sigma^2$.
Using linear relation, the hat matrix can be computed using the estimator.
# Computing hat matrix
# at outgrid points: xout = xout
# at design points: xout = x; ndata = nrow(xout)
Hmat = array(NA, c(nout, ndata))
for (iy in 1:ndata)
{
ytmp = rep(0,ndata)
ytmp[iy] = 1
Hmat[,iy] = ladd.sbf(x,ytmp,bandll,bandadd,xout)$est
}
$\sigma^2$ is estimated using the function
res.var.
# Estimating sigma^2
sig2 = res.var(x,y)
The variance of the estimator is computed and thus the confidence intervals are obtained.
# 95% CI
yhat = Hmat%*%y
var.yhat = apply(Hmat^2, 1, sum)
std = sqrt(var.yhat*sig2)
CI.low = yhat - 1.96*std
CI.high = yhat + 1.96*std
5. References
Go to http://www.maths.lancs.ac.uk/~parkj1/software/locadd_computing.html.
Park, J. and Seifert, B. (2008) On properties of local additive estimation based on
the smooth backfitting estimator.
Supplemantary material to local additive estimation.
Juhyun Park
Department of Mathematics and Statistics
Fylde College
Lancaster University
Lancaster LA1 4YF UK
E-mail: juhyun.park@lancaster.ac.uk
Burkhardt Seifert
Biostatistics Unit
Institute for Social and Preventive Medicine
University of Zurich
Zurich 8001
Swizerland
E-mail: seifert@ifspm.uzh.ch
|