Readme file

SERIES B
Statistical Methodology

Local additive estimation, by J. Park and B. Seifert, pages 171–191


This file describes implementation of the local additive estimator proposed in the paper and Park and Seifert (2008), by using R functions.

1. Files included

There are five text files:

  • ladd_sbf.R: R source code for the function ladd.sbf, main function to compute the estimator based on smooth backfitting estimator.
  • band_cal.R: R source code for the functions band.cal and bandcmp
  • sim_MISE_ladd.R: R source code for the function sim.MISE.ladd, which computes bias^2 and variance component for MISE approximation.
  • res_var.R: R source code for the function res.var, which is used to estimate \sigma.
  • Example_ladd.R: example saved in text file.

The required parameters are explained inside each file.

2. Required package

Additionally, it requires R package SBF2, which is also available to download SBF2_0.3-3.tar.gz. To install the package, type:

R CMD INSTALL SBF2_0.3-3.tar.gz

After starting R, type:

library(SBF2) % required

3. Estimation

Main function: ladd.sbf is the main function which takes arguments of x, y, bandll for h, bandadd for w, and xout for output points.

laddtmp=ladd.sbf(x, y, bandll, bandadd, xout)

The estimates are given by laddtmp$est. Additionally, laddtmp$num gives number of observations used at each output point.

Creating an example data set in 2-dim:

Generate 400 observations from random uniform design on [-1,1]^2 and add normal error with standard deviation (\sigma = 0.5) to the regression function.

## Data

# regression function

Reg.ex <-function(x,a){
return(x[,1]^2+x[,2]^2+a/(1-a)*x[,1]*x[,2])
}

D = 2

ndata = 400

# random uniform [-1,1]^d

x = matrix(runif(ndata*d, min=-1, max=1), ncol=d, byrow=T)
# [-1,1]^d

ytrue = Reg.ex(x,a=0.4)

sd = 0.5

y = ytrue + sd*rnorm(ndata)

Setting output grid points (xout):

To obtain estimates on [-1,1]^d at equally distant grid points,

# output points

NGRID=21

outgrid = seq(-1,1,l=ngrid)

xout=array(NA, c(ngrid^d, d))

xout[,1] = rep(outgrid, ngrid)

xout[,2] = rep(outgrid, each=ngrid)

ytrue.out = Reg.ex(xout,a=0.4)

nout = nrow(xout)

Setting bandwidths (bandll and bandadd):

The local bandwidth h (bandll) and the local region w (bandadd) can be set to any numerical number or vectors within the range (0<h<1 and 0<w<1 for this example).

If scalar values are given, the same bandwidth will be used for all dimensions. For simulation purposes, one might consider several values for systematic search. For example, the local bandwidth h is set between 0.05 and 1, 21 values equally spaced on log scale. The local region w is determined relative to h so the ratio w/h is given between 1 and 5, 21 values in linear on log scale and then converted to proper w. Alternatively w can be set directly.

# set bandwidths (for simulation)

NH = 21

Nr = 21

hmin = 0.05

hmax = 1

bandalls = hmin*(exp((log(hmax)-log(hmin))/(Nh-1)))^(0:(Nh-1))

# equally spaced on log scale

ratadds = 1.*(exp((log(5)-log(1.))/(Nr-1)))^(0:(Nr-1))

# equally spaced on log scale

For individual use,

iband = 15

irat = 3

bandll = bandalls[iband] # or 0.5

bandadds = bandll*ratadds

bandadd = bandadds[irat] # or 0.6

laddtmp=ladd.sbf(x,y, bandll, bandadd, xout)

Boundary correction (boundary):

The main function ladd.sbf can take an argument of boundary. When this is set, the bandwidths at boundary points are increased accordingly. Interior points are not affected.

LADDTMP.BOUND=LADD.SBF(X,Y, BANDLL, BANDADD, XOUT, BOUNDARY=BOUND)

# boundary correction for bandwidths

BOUND = REP(1, TIMES=NCOL(X))%O%C(-1,1)

# no boundary correction (default)

bound = rep(1, times=ncol(x))%o%c(-100,100)

Support of x (xbound):

The support of x is assumed to be [-1,1]^d by default. To change the support,

# support on [0,1]^d

xbound = rep(1, times=ncol(x))%o%c(0,1)

laddtmp.xbound=ladd.sbf(x,y, bandll, bandadd, xout, xbound=xbound)

3. Computation of MISE

The main function used to evaluate MISE is sim.MISE.ladd. This computes bias and variance components separately.

## Computing MISE

# computing Bias^2

misetmp.b2 = sim.MISE.ladd(1, x, xout, ytrue, ytrue.out, bandalls=bandalls[15:16],

ratadds=ratadds[1:2])$MSE

# computing Var

nsim = 100

misetmp.var = sim.MISE.ladd(0, x, xout, bandalls=bandalls[15:16], ratadds=ratadds[1:2],

nsim=nsim)$MSE

mise = misetmp.b2$MSE + sd^2*misetmp.var$MSE # Nh x Nr matrix

4. Confidence intervals

For finite sample, the confidence intervals are computed based on the hat matrix H. The estimators are $\hat y = H\times y$ and the variance of $a^\prime\hat y$ is given by

$tr(a^\prime H H^\prime a)\sigma^2$.

Using linear relation, the hat matrix can be computed using the estimator.

# Computing hat matrix

# at outgrid points: xout = xout

# at design points: xout = x; ndata = nrow(xout)

Hmat = array(NA, c(nout, ndata))

for (iy in 1:ndata)
{
ytmp = rep(0,ndata)

ytmp[iy] = 1

Hmat[,iy] = ladd.sbf(x,ytmp,bandll,bandadd,xout)$est

}

$\sigma^2$ is estimated using the function
res.var.

# Estimating sigma^2

sig2 = res.var(x,y)

The variance of the estimator is computed and thus the confidence intervals are obtained.

# 95% CI

yhat = Hmat%*%y

var.yhat = apply(Hmat^2, 1, sum)

std = sqrt(var.yhat*sig2)

CI.low = yhat - 1.96*std

CI.high = yhat + 1.96*std

5. References

Go to http://www.maths.lancs.ac.uk/~parkj1/software/locadd_computing.html.

Park, J. and Seifert, B. (2008) On properties of local additive estimation based on the smooth backfitting estimator.

Supplemantary material to local additive estimation.


Juhyun Park
Department of Mathematics and Statistics
Fylde College
Lancaster University
Lancaster LA1 4YF UK

E-mail: juhyun.park@lancaster.ac.uk

Burkhardt Seifert
Biostatistics Unit
Institute for Social and Preventive Medicine
University of Zurich
Zurich 8001
Swizerland

E-mail: seifert@ifspm.uzh.ch

Journals

SERIES A
Statistics in Society

SERIES B
Statistical Methodology

SERIES C
Applied Statistics

SERIES D
The Statistician