Multivariate calibration is one of the major success stories of chemometrics, in particular owing to the large number of applications in near-infrared (NIR) spectroscopy. It should therefore not come as a surprise that the reliability of predictions obtained using multivariate calibration methods such as partial least squares (PLS) regression and principal component regression (PCR) has received considerable attention since the eighties.
A good discussion of this reliability aspect can be found in:
- H. Martens and M. Martens
Multivariate analysis of quality - an introduction, Wiley, Chichester (2000)
To download the data sets (ASCII, Matlab 5.2 and The Unscrambler 7.6), visit
Wiley's Chemometrics World.
Pascal Lemberge and Pierre Van Espen note in their excellent review (Journal of Chemometrics, 16 (2002) 633-634) that:
Chapter 10 is what the previous bestseller of Harald Martens (and Tormod Næs), Multivariate Calibration, lacked, also because at the time research on this topic was far from completed.
N.B. The methodology discussed below is intended to improve on the one proposed by Martens and Martens, which is currently implemented in certain commercial packages.
This page is organized as follows:
- Official literature
- Prediction intervals
- Regression coefficients
- Scores and loadings
- Validation of a multivariate model
- References & further information
Official literature
Guidelines have been issued by the International Union of Pure and Applied Chemistry (IUPAC) and the American Society for Testing and Materials (ASTM), see:
- K. Danzer, M. Otto and L.A. Currie
Guidelines for calibration in analytical chemistry
Part 2. Multispecies calibration
Pure & Applied Chemistry, 76 (2004) 1215-1225
Download (≈250 kB; © IUPAC 2004)
- Practice E1655-00. ASTM Annual Book of Standards
West Conshohocken, PA 19428-2959 USA, Vol. 03.06 (2001) 573-600
These guidelines contain expressions that have limited applicability because they do not account for all sources of prediction uncertainty, namely:
- the so-called equation error, which is the part of the true predictand variable (e.g. analyte concentration) that cannot be explained by the true model (i.e. regression vector) and true predictor variables (e.g. NIR spectrum),
- the measurement error in the predictor variables, which is often correlated and heteroscedastic, with possibly different levels during the training and prediction stages, and
- the measurement error in the predictand variable during the training stage.
We note that the ASTM expression has nevertheless been implemented in the Quant+ software (Version 4.51.02, PerkinElmer Inc., Wellesley, Massachusetts, USA).
Potential multivariate extensions of generally accepted univariate methodology are listed in:
- A. Olivieri, N.M. Faber, J. Ferré, R. Boqué, J.H. Kalivas and H. Mark
Guidelines for calibration in analytical chemistry
Part 3. Uncertainty estimation and figures of merit for multivariate calibration
Pure & Applied Chemistry, 78 (2006) 633-661
Download (≈645 kB; © IUPAC 2006)

Prediction intervals

Currently, it is common practice to assess the predictive ability of multivariate models by comparing predictions with reference values for a test set. From the squared deviations, a root mean squared error of prediction (RMSEP) is calculated as

RMSEP = √( (1/N) Σᵢ (ŷᵢ − yᵢ)² )

where N denotes the size of the test set, and ŷᵢ and yᵢ are the prediction and reference value for sample i, respectively.
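In code, the RMSEP computation amounts to a one-liner (a minimal sketch with made-up predictions and reference values):

```python
import numpy as np

def rmsep(y_pred, y_ref):
    """Root mean squared error of prediction over a test set."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_ref) ** 2)))

# hypothetical predictions and reference values for a 3-sample test set
print(rmsep([1.1, 2.0, 2.9], [1.0, 2.0, 3.0]))  # about 0.082
```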
This rather standard procedure, however, has serious weaknesses:
- The resulting RMSEP is a constant measure of prediction uncertainty that cannot lead to prediction intervals with correct coverage probabilities (say 95%).
- A crucial assumption is that the reference values are sufficiently precise; this is certainly not always true (octane rating, classical Kjeldahl); often the prediction is even better than the reference value.
- The intrinsically high variability of the RMSEP estimate requires large test sets, which is wasteful.
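The variability of the RMSEP estimate can be made concrete with a small simulation (a sketch with a made-up noise level; the spread of RMSEP over repeated test sets shrinks only as 1/√(2N)):

```python
import numpy as np

def rmsep_spread(n_test, sigma=0.10, reps=5000, seed=1):
    """Standard deviation of the RMSEP estimate over many simulated
    test sets of size n_test, with i.i.d. normal prediction errors."""
    rng = np.random.default_rng(seed)
    errs = rng.normal(0.0, sigma, size=(reps, n_test))
    rmseps = np.sqrt(np.mean(errs ** 2, axis=1))
    return float(rmseps.std())

print(rmsep_spread(10))    # roughly sigma / sqrt(2 * 10)
print(rmsep_spread(100))   # roughly sigma / sqrt(2 * 100)
```

Going from 10 to 100 test samples reduces the spread of the RMSEP estimate by only about a factor of three, which is why small test sets give unreliable RMSEP values.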
We have derived expressions for estimating prediction intervals for PCR and PLS, see:
- N.M. Faber and B.R. Kowalski
Prediction error in least squares regression: further critique on the deviation used in The Unscrambler
Chemometrics and Intelligent Laboratory Systems, 34 (1996) 283-292
- N.M. Faber and B.R. Kowalski
Propagation of measurement errors for the validation of predictions obtained by principal component regression and partial least squares
Journal of Chemometrics, 11 (1997) 181-238
These expressions are intended to generalize the formula that yields the prediction bands for the classical least-squares straight-line fit with intercept:

ŷ₀ ± t(α/2, N−2) · s · √( 1 + 1/N + (x₀ − x̄)² / Σᵢ(xᵢ − x̄)² )

where s is the residual standard deviation and x̄ the mean of the calibration x-values.
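For reference, the classical straight-line prediction band can be computed as follows (a minimal sketch using SciPy; the data are made up):

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x0, alpha=0.05):
    """Prediction interval for a new observation at x0 from a
    straight-line least-squares fit with intercept."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx      # slope
    b0 = y.mean() - b1 * x.mean()                           # intercept
    s = np.sqrt(np.sum((y - b0 - b1 * x) ** 2) / (n - 2))   # residual std
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    half = t * s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)
    yhat0 = b0 + b1 * x0
    return yhat0 - half, yhat0 + half

x = [0, 1, 2, 3, 4]
y = [0.1, 0.9, 2.1, 2.9, 4.0]
print(prediction_interval(x, y, 2.0))
print(prediction_interval(x, y, 4.0))   # wider: farther from the mean of x
```

Note how the band widens away from the mean of the calibration x-values; the sample-specific PCR/PLS expressions show the same qualitative behaviour.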
The PCR and PLS expressions have an interpretation in terms of multivariate analytical figures of merit. Moreover, they are consistent with expressions for other widely used multivariate quantities, e.g. the scores and loadings from a principal component analysis (PCA).
A particularly convenient expression has performed well for a number of NIR data sets, see:
- J.A. Fernández Pierna, L. Jin, F. Wahl, N.M. Faber and D.L. Massart
Estimation of partial least squares regression (PLSR) prediction uncertainty when the reference values carry a sizeable measurement error
Chemometrics and Intelligent Laboratory Systems, 65 (2003) 281-291
Illustrative is the excellent agreement with resampling, which should be quite accurate:


Figure MVC 2: Various estimates for sample-specific prediction uncertainty.

The Unscrambler approach, however, on average underestimates prediction uncertainty by a factor of two for this data set.
The current proposal has also been shown to improve on the one recommended in the ASTM guideline, see:
- N.M. Faber, X.H. Song and P.K. Hopke
Prediction intervals for partial least squares regression
Trends in Analytical Chemistry, 22 (2003) 330-334
When applied to the PLS calibration of a NIR data set taken from the literature, one obtains the following results:

Note that the interval for the outlying sample 1 is sufficiently large to provide a realistic estimate of prediction uncertainty.
A direct comparison of the sample-specific prediction uncertainties with the set-level criterion RMSEP is particularly revealing:


Figure MVC 4: For the large majority of test samples (21 out of 26), the prediction is better than the associated reference value.

From this plot it is clear that RMSEP (0.22%) severely underestimates the prediction uncertainty for extreme samples, while it overestimates the true average value. The reason for this pessimistic average view is a spurious contribution, namely the error in the reference values (0.2%). Strictly speaking, RMSEP is defined in terms of the 'truth' rather than the noisy reference values, i.e.

RMSEP = √( (1/N) Σᵢ (ŷᵢ − yᵢ,true)² )
Noisy reference values lead to a so-called apparent RMSEP, see:
- R. DiFoggio
Examination of some misconceptions about near-infrared analysis
Applied Spectroscopy, 49 (1995) 67-75
- N.M. Faber and B.R. Kowalski
Improved prediction error estimates for multivariate calibration by correcting for the measurement error in the reference values
Applied Spectroscopy, 51 (1997) 660-665
- L.K. Sørensen
True accuracy of near infrared spectroscopy and its dependence on precision of reference data
Journal of Near Infrared Spectroscopy, 10 (2002) 15-25
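Under the assumption of independent reference errors, the apparent RMSEP and the reference-error standard deviation add in quadrature, so the true RMSEP can be estimated by subtraction; a minimal sketch using the percentages quoted above:

```python
import math

def corrected_rmsep(rmsep_apparent, sigma_ref):
    """Estimate the true RMSEP by removing the (assumed independent)
    reference-error contribution in quadrature."""
    diff = rmsep_apparent ** 2 - sigma_ref ** 2
    if diff < 0:
        raise ValueError("reference error exceeds the apparent RMSEP")
    return math.sqrt(diff)

# apparent RMSEP 0.22% with reference error 0.2% (values from the text)
print(corrected_rmsep(0.22, 0.20))   # about 0.092
```

With these numbers, most of the apparent 0.22% is reference-value noise, consistent with the observation that most predictions are better than their reference values.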

Regression coefficients

This topic is treated in detail under Reliability of principal component analysis to demonstrate the internal consistency of the methodology presented on this site. By contrast, the currently popular jackknife gives misleading results.
It is further noted here that correct application of the jackknife requires the data to constitute a random sample from 'some' population, simply because the jackknife is a resampling method. In other words, it cannot (in a formal sense) be applied if the data originate from a design. This condition severely limits the applicability of the jackknife in general.
The following example, taken from the book by Martens and Martens (Appendix A16), is illustrative of what may go wrong when applying the jackknife to designed data:


Figure MVC 5: Estimated standard errors for regression coefficient estimates.

The 14-component PLS model is identical to ordinary least squares (OLS), for which an exact formula exists. It is observed that the jackknife severely overestimates the exact results. Moreover, most standard errors produced by the jackknife decrease when increasing the number of components from 6 to 14. This behaviour doesn't make sense.
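The exact OLS formula referred to above is the usual covariance expression s²(XᵀX)⁻¹. A sketch comparing it with leave-one-out jackknife standard errors on random (not designed) simulated data, where the jackknife is at least formally applicable:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=n)

def ols_se(X, y):
    """Exact standard errors of the OLS coefficients (model without intercept)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))

def jackknife_se(X, y):
    """Leave-one-out jackknife standard errors of the OLS coefficients."""
    n = len(y)
    B = np.array([np.linalg.lstsq(np.delete(X, i, axis=0),
                                  np.delete(y, i), rcond=None)[0]
                  for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((B - B.mean(axis=0)) ** 2, axis=0))

print(ols_se(X, y))
print(jackknife_se(X, y))   # comparable magnitude expected for random data
```

For random data the two estimates agree reasonably well; for designed data, as in the figure above, the jackknife has no formal justification and can fail badly.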

Scores and loadings

Likewise, this topic is treated in detail under Reliability of principal component analysis. Since the latter page is rather technical, it is easily overlooked that the jackknife cannot lead to uncertainty estimates for the scores.

Validation of a multivariate model

There is a growing awareness that multivariate models must be validated similarly to the straight-line fit, i.e. in terms of their precision, linearity, etc. A nice example of the successful validation of a NIR model (using software of ABB Bomem) is given in:
- M. Laasonen, T. Harmia-Pulkkinen, C. Simard, M. Räsänen and H. Vuorela
Development and validation of a near-infrared method for the quantitation of caffeine in intact single tablets
Analytical Chemistry, 75 (2003) 754-760
This work could easily be taken a step further by including samplespecific prediction intervals. It is stressed that the range of applications of the proposed methodology is not restricted to NIR spectroscopy.

References & further information

Open a list of references. This list highlights work that goes beyond routine test set validation, which should always be carried out.
For further information, please contact Sven Serneels:

