Readme file

SERIES C  
Applied Statistics

Measuring firm performance by using linear and non-parametric quantile regressions, by M. Landajo, J. de Andrés and P. Lorca, pages 227–250.

CONTENTS

The following files are enclosed:

1. The file 'data.dat'. It is an ASCII text file containing the data set (the variable 'Total Assets' appears in the left column, and 'Return' is in the right column). Details on the data set appear in the paper (see Subsection 4.1).

2. A Matlab file named 'program', plus a number of Matlab m-functions (ASCII text files) containing modules which are executed when running 'program'.

INSTRUCTIONS

1. Copy 'data.dat' to the root directory C:\, and copy all the Matlab files to a directory where Matlab can find them (you may need to add the chosen directory to the Matlab path).

2. Open the file 'program' in the Matlab Editor.

3. Provide the input information (default values appear in the file).

4. Run 'program'. (The Spline Toolbox of Matlab is required to properly execute some functions.)
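
Steps 1 and 4 can be checked from the Matlab prompt before 'program' is run. The following is a minimal sketch, not part of the supplied files. The directory name 'C:\qrfiles' is a hypothetical placeholder for wherever the m-files were copied, and the check relies only on the standard Matlab functions 'addpath' and 'exist'.

```matlab
% Hypothetical directory holding 'program' and the m-functions (an assumption).
codedir = 'C:\qrfiles';
if exist(codedir, 'dir') == 7
    addpath(codedir);                 % make the m-files callable
end

% Check that the data file is where 'program' expects it by default.
datafound = (exist('c:\data.dat', 'file') == 2);

% 'spcol' belongs to the Spline Toolbox; exist returns 0 when it is not on the path.
splinefound = (exist('spcol', 'file') ~= 0);

fprintf('data file found: %d, spcol found: %d\n', datafound, splinefound);
```

If 'spcol found' is reported as 0, the functions that build the B-spline basis cannot be expected to run.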

THE INPUTS

The following arguments are required in order to run 'program' (default values are provided):

a) The data file:

database='c:\data.dat';

b) The conditional quantiles (percentiles) to be analysed:

percentiles=[10,25,50,75,90];

c) The minimum ('minm') and maximum ('maxm') values permitted for the complexity index ('m', in the B-spline form) of spline models. For instance:

minm=3;
maxm=6;

d) The number of sheets (folds) in F-fold cross-validation (the data base is first reordered at random and then split into F sheets):

sheets=10;

e) The chosen degree of splines:

degree=3;

f) The number of resamples (B) for design matrix bootstrap:

B=500;

g) The battery of permitted complexities ('K', in the piecewise polynomial form, with K = m - degree) for splines in the linearity and predictive ability tests:

valuesofK=[1,2,3,4,5,6];

h) The fraction (between 0 and 1) of the data base which is held out for prediction in the conditional predictive ability tests. (The data base is first reordered at random and then split into two independent sets, one for estimation and one for prediction.)

prediction=0.14;
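
Two of the inputs above involve simple arithmetic, which the following sketch makes explicit. It assumes the relation K = m - degree stated in item (g), and a sample size of n = 520, inferred from remark 3 under FURTHER REMARKS ('sheets=519' is described there as the sample size minus 1). How 'program' rounds the size of the prediction set is not stated in this file, so the use of 'round' is an assumption.

```matlab
degree     = 3;
valuesofK  = [1,2,3,4,5,6];
prediction = 0.14;
n          = 520;                 % assumed sample size (see the paragraph above)

% B-spline complexity m implied by each piecewise polynomial complexity K.
valuesofm = valuesofK + degree;   % gives 4 5 6 7 8 9

% Approximate sizes of the prediction and estimation sets.
npred = round(prediction * n);    % rounding rule is an assumption
nest  = n - npred;
```

Under these assumptions the prediction set holds about 73 observations, which is the small prediction sample mentioned in the remarks on the power of the out-of-sample tests.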

THE OUTPUT

When 'program' is run, the following results are obtained:

I. Results (chi-squared statistic, p-value) of Buchinsky's heteroskedasticity test (for details see Subsection 3.2 in the paper).

II. Results from linear and B-spline modelling for each percentile. These include, for each conditional quantile: (a) mean in-sample ALAD errors for the linear and (optimal cross-validated) B-spline models, as well as (b) cross-validated mean ALAD errors for each class of models and (c) the cross-validated complexity for spline estimates. (See Subsection 3.4 in the article for more information.)

III. Results (chi-squared statistics and p-values, for each conditional percentile and complexity index K) of the spline-based linearity tests. The technical details of the test are outlined in Subsection 3.5.

IV. Results (z-statistics and p-values, for each conditional percentile and each complexity index K) of conditional predictive ability ("binomial") tests. Subsection 3.6 in the paper provides details on these tests.

V. Results (z-statistics and p-values, for each conditional percentile and complexity index K) of conditional predictive ability ("normal") tests. Technical details also appear in Subsection 3.6.

VI. Two plots: (a) data vs. LQR estimates, and (b) data vs. spline estimates of conditional quantiles.
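
The mean ALAD (asymmetric least absolute deviation) errors reported in item II are averages of the quantile regression check loss. A minimal sketch of that quantity follows, assuming the standard check function rho_q(u) = u(q - 1{u<0}) with q equal to the percentile divided by 100. The name 'meanalad' is hypothetical, and the exact scaling used inside 'qregression' may differ.

```matlab
% Mean check loss for residuals u at quantile level q (0 < q < 1).
% 'meanalad' is a hypothetical name, not one of the supplied m-functions.
meanalad = @(u, q) mean(u .* (q - (u < 0)));

y    = [1; 2; 3; 4];        % toy responses
yhat = [2; 2; 2; 2];        % toy fitted conditional quantile
q    = 0.25;                % 25th percentile

err = meanalad(y - yhat, q);   % residuals -1, 0, 1, 2 give 0.375
```

Negative residuals are weighted by 1 - q and positive residuals by q, so at q = 0.25 an overprediction costs three times as much as an underprediction of the same size.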

THE MATLAB FILES

The following Matlab m-functions are provided:

1) 'rq_fnm', 'lp_fnm' and 'bound': these functions are required to compute quantile regressions.

2) 'qregression' computes quantile regressions and their in-sample ALAD mean errors, by using the above m-functions.

3) 'bsplinebasis' builds the B-spline basis. It requires the Spline Toolbox of Matlab to be installed.

4) 'bsplinequantiles' computes the linear and B-spline quantile regressions, including in-sample and cross-validated diagnostics.

5) 'bootstrap' carries out design matrix bootstrap.

6) 'crossvlinear' computes cross-validated mean l_q errors of linear models for the specified battery of quantiles.

7) 'crossvspline' provides cross-validated mean l_q errors for the set of conditional quantiles, under the specified B-spline complexity (m).

8) 'selectbspline' selects the B-spline complexity (m) which minimizes cross-validated mean error for the q-th quantile.

9) 'heterosktest' computes the heteroskedasticity test proposed by Buchinsky (1998), particularised for bivariate quantile regressions (see Subsection 3.2).

10) 'testoflinearity' and 'linearitytest' carry out the battery of linearity tests described in Subsection 3.5 of the paper.

11) 'binomialtest' computes a battery of conditional predictive ability ("binomial") tests (see Subsection 3.6 for details).

12) 'normaltest' provides the results of conditional predictive ability ("normal") tests (Subsection 3.6).
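
The cross-validation logic behind 'crossvspline' and 'selectbspline' can be outlined as follows. This is a sketch only. The fold assignment (random permutation, then F nearly equal sheets) follows the description of the 'sheets' input, while the variable names and the error values are hypothetical placeholders and are not taken from the supplied m-functions.

```matlab
n      = 20;                       % toy sample size
sheets = 5;                        % number of sheets (folds), F
order  = randperm(n);              % random reordering of the observations

% Assign the reordered observations to sheets 1..F in nearly equal blocks.
sheetid = zeros(n, 1);
sheetid(order) = ceil((1:n)' * sheets / n);

% Each sheet is held out once; the remaining sheets form the estimation set.
for f = 1:sheets
    testidx  = find(sheetid == f);
    trainidx = find(sheetid ~= f);
    % ... fit on trainidx, accumulate the check loss on testidx ...
end

% Select the complexity m with the smallest cross-validated mean error.
mgrid   = 3:6;                      % minm:maxm
cverror = [0.90 0.70 0.75 0.80];    % placeholder values, for illustration only
[~, best] = min(cverror);
mstar = mgrid(best);                % here mstar equals 4
```

Because 'order' is random, the composition of each sheet changes from run to run, which is the source of the run-to-run variability in the cross-validated diagnostics.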

NOTE (LICENCE)

The files 'rq_fnm', 'lp_fnm' and 'bound' were written by Roger Koenker, and we downloaded them from the web site http://www.econ.uiuc.edu/~roger/research/rq/rq.html.

All the other programs in the above list were written by Manuel Landajo (2007). They are free software. You can redistribute and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or any later version.

FURTHER REMARKS

1. Some of the above functions (in particular, those which use the bootstrap) are computationally intensive. For instance, with the default specifications in 'program', the whole set of computations takes about 25 minutes on a standard PC.

2. The m-file 'bsplinequantiles' requires the Spline Toolbox to be installed (more specifically, the m-function 'bsplinebasis' uses Matlab's 'spcol' function). The tests of heteroskedasticity, linearity and conditional predictive ability can be run independently, and do not require the Spline Toolbox.

3. Successive executions of 'program' should produce slightly different results for the bootstrap-based tests and the cross-validated diagnostics. For the bootstrap tests ('heterosktest', 'linearitytest'), this is the natural effect of random variability in bootstrap resampling. For the cross-validated diagnostics, the variability arises because the m-function 'bsplinequantiles' includes a step which randomly reorders the data before any analysis is carried out. This reordering usually has a small effect on F-fold cross-validated diagnostics, and none at all on leave-one-out cross-validation (which is obtained by setting 'sheets=519', i.e., the sample size in our data set minus 1). The out-of-sample (conditional predictive ability) tests ('binomialtest' and 'normaltest') can display considerable variability, as they depend on a random splitting of the data base which may generate fairly variable estimation/prediction sets. The small size of the available prediction sets in this paper may considerably reduce the power of out-of-sample tests.

4. Because of the special structure of the data set (with very sparse data in some areas of the scatter plot), the spline estimates may vary considerably in specific zones (basically, at the right extreme of the plot).

5. A related issue is that singular design matrices may arise when overly complex B-spline bases are chosen, because some columns of the design matrix may then consist entirely of zeros. When this happens the results may be inaccurate, and a number of messages indicating rank problems will appear. The simplest solution is to avoid overly complex spline structures. A more elaborate strategy would be to permit unevenly spaced spline knots. For this, a suitable mechanism would have to be included in the m-functions 'bsplinebasis', 'linearitytest', 'binomialtest' and 'normaltest'.
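
The rank problem described in remark 5 can be detected before estimation. The sketch below uses a hypothetical toy design matrix 'X' (not produced by 'bsplinebasis') in which one basis column is identically zero, as happens when no observation falls within the support of a basis function.

```matlab
% Toy design matrix: 5 observations, 4 basis columns; column 3 is all zeros.
X = [1.0 0.0 0 0.0;
     0.6 0.4 0 0.0;
     0.2 0.8 0 0.0;
     0.0 0.5 0 0.5;
     0.0 0.0 0 1.0];

zerocols  = find(all(X == 0, 1));       % indices of all-zero columns
deficient = rank(X) < size(X, 2);       % true when X lacks full column rank

if deficient
    fprintf('Rank %d < %d columns; zero columns: %s\n', ...
            rank(X), size(X, 2), mat2str(zerocols));
end
```

A check of this kind, applied to the actual basis matrix, would indicate which values of m or K are too large for the data at hand.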

CONTACT INFORMATION

Manuel Landajo
Departamento de Economía Aplicada
Universidad de Oviedo
Avenida del Cristo, s/nº
33006 Oviedo
Asturias
Spain

E-mail: landajo@uniovi.es
