Reliability of multivariate calibration

Multivariate calibration is one of the major success stories of chemometrics, in particular owing to the large number of applications in near-infrared (NIR) spectroscopy. It should therefore not come as a surprise that the reliability of predictions obtained using multivariate calibration methods such as partial least squares (PLS) regression and principal component regression (PCR) has received considerable attention since the eighties.
A good discussion of this reliability aspect can be found in:

  • H. Martens and M. Martens
    Multivariate analysis of quality - an introduction, Wiley, Chichester (2000)

To download the data sets (ASCII, Matlab 5.2 and The Unscrambler 7.6), visit
Wiley's Chemometrics World.

Pascal Lemberge and Pierre Van Espen note in their excellent review (Journal of Chemometrics, 16 (2002) 633-634) that:

Chapter 10 is what the previous best-seller of Harald Martens (and Tormod Næs), Multivariate Calibration, lacked, also because at the time research on this topic was far from completed.

N.B. The methodology discussed below is intended to improve on the one proposed by Martens and Martens, which is currently implemented in certain commercial packages.

This page is organized as follows:

  • Official literature
  • Prediction intervals
  • Regression coefficients
  • Scores and loadings
  • Validation of a multivariate model
  • References & further information

Official literature

Guidelines have been issued by the International Union of Pure and Applied Chemistry (IUPAC) and the American Society for Testing and Materials (ASTM); see:

  • K. Danzer, M. Otto and L.A. Currie
    Guidelines for calibration in analytical chemistry
    Part 2. Multispecies calibration
    Pure & Applied Chemistry, 76 (2004) 1215-1225
Download (PDF, 250 kB; © IUPAC 2004)
  • Practice E1655-00. ASTM Annual Book of Standards
    West Conshohocken, PA 19428-2959 USA, Vol. 03.06 (2001) 573-600

These guidelines contain expressions that have limited applicability because they do not account for all sources of prediction uncertainty, namely:

  1. the so-called equation error, which is the part of the true predictand variable (e.g. analyte concentration) that cannot be explained by the true model (i.e. regression vector) and true predictor variables (e.g. NIR spectrum),
  2. the measurement error in the predictor variables, which is often correlated and heteroscedastic with possibly different levels during training and prediction stages, and
  3. the measurement error in the predictand variable during the training stage.

We note that the ASTM expression has nevertheless been implemented in the Quant+ software (Version 4.51.02, PerkinElmer Inc., Wellesley, Massachusetts, USA).

Potential multivariate extensions of generally accepted univariate methodology are listed in:

  • A. Olivieri, N.M. Faber, J. Ferré, R. Boqué, J.H. Kalivas and H. Mark
    Guidelines for calibration in analytical chemistry
    Part 3. Uncertainty estimation and figures of merit for multivariate calibration
    Pure & Applied Chemistry, 78 (2006) 633-661
Download (PDF, 645 kB; © IUPAC 2006)

Prediction intervals


Currently, it is common practice to assess the predictive ability of multivariate models by comparing predictions with reference values for a test set. From the squared deviations, a root mean squared error of prediction (RMSEP) is calculated as

$$\mathrm{RMSEP} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i^{\mathrm{ref}}\right)^2}$$

where $N$ denotes the size of the test set, and $\hat{y}_i$ and $y_i^{\mathrm{ref}}$ are the prediction and reference value for sample $i$, respectively.
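A minimal numpy sketch of this computation (names are illustrative, not taken from any particular package):

```python
import numpy as np

def rmsep(y_pred, y_ref):
    """Root mean squared error of prediction over a test set of size N."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_ref) ** 2)))
```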

This rather standard procedure, however, has serious weaknesses:

  1. The resulting RMSEP is a constant measure of prediction uncertainty and therefore cannot lead to prediction intervals with correct coverage probabilities (say 95%).
  2. A crucial assumption is that the reference values are sufficiently precise; this is certainly not always true (octane rating, classical Kjeldahl) - often the prediction is even better than the reference value.
  3. The intrinsically high variability of the RMSEP estimate requires large test sets, which is wasteful.

We have derived expressions for estimating prediction intervals for PCR and PLS, see:

  • N.M. Faber and B.R. Kowalski
    Prediction error in least squares regression: further critique on the deviation used in The Unscrambler
    Chemometrics and Intelligent Laboratory Systems, 34 (1996) 283-292
  • N.M. Faber and B.R. Kowalski
    Propagation of measurement errors for the validation of predictions obtained by principal component regression and partial least squares
    Journal of Chemometrics, 11 (1997) 181-238

These expressions are intended to generalize the formula that yields the prediction bands for the classical least-squares straight-line fit with intercept:


Figure MVC 1: Classical least-squares straight-line fit (yellow line) with bands (cyan lines) that yield the 95%-prediction intervals. Prediction is relatively precise close to the model center.
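For the straight-line case, the band at a new point x0 is the fitted value plus or minus a Student-t quantile times s·sqrt(1 + 1/n + (x0 − x̄)²/Sxx). A minimal sketch, with the t quantile passed in to keep the example dependency-free (all names are illustrative):

```python
import numpy as np

def prediction_interval(x, y, x0, t):
    """Prediction interval at x0 for a classical straight-line fit.

    t is the Student-t quantile for the desired coverage and n - 2
    degrees of freedom (e.g. about 2.2 for 95% coverage with n = 12).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx   # slope
    b0 = y.mean() - b1 * x.mean()                        # intercept
    resid = y - (b0 + b1 * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))            # residual std. dev.
    # The interval widens with distance from the model center x-bar
    half = t * s * np.sqrt(1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / sxx)
    y0 = b0 + b1 * x0
    return y0 - half, y0 + half
```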

The PCR and PLS expressions have an interpretation in terms of multivariate analytical figures of merit. Moreover, they are consistent with expressions for other widely used multivariate quantities, e.g. the scores and loadings from a principal component analysis (PCA).


A particularly convenient expression has performed well for a number of NIR data sets, see:

  • J.A. Fernández Pierna, L. Jin, F. Wahl, N.M. Faber and D.L. Massart
    Estimation of partial least squares regression (PLSR) prediction uncertainty when the reference values carry a sizeable measurement error
    Chemometrics and Intelligent Laboratory Systems, 65 (2003) 281-291
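Approximations of this type typically combine the prediction sample's leverage with the calibration error and subtract the reference-error variance so that it is not counted twice. A hedged sketch of such a leverage-based expression (the exact form should be taken from the paper; all names here are illustrative):

```python
import numpy as np

def prediction_uncertainty(h_i, msec, var_ref=0.0):
    """Sample-specific prediction standard error of the form

        s_i ~= sqrt((1 + h_i) * MSEC - var_ref)

    where h_i is the leverage of the prediction sample, MSEC the mean
    squared error of calibration, and var_ref the variance of the error
    in the reference values.
    """
    variance = (1.0 + h_i) * msec - var_ref
    return float(np.sqrt(max(variance, 0.0)))
```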

The excellent agreement with resampling, which should itself be quite accurate, is illustrative:


Figure MVC 2: Various estimates for sample-specific prediction uncertainty.

The Unscrambler approach, however, on average underestimates prediction uncertainty by a factor of two for this data set.

The current proposal has also been shown to improve on the one recommended in the ASTM guideline, see:

  • N.M. Faber, X.-H. Song and P.K. Hopke
    Prediction intervals for partial least squares regression
    Trends in Analytical Chemistry, 22 (2003) 330-334

When applied to the PLS calibration of a NIR data set taken from the literature, one obtains the following results:


Figure MVC 3: 95%-prediction intervals around the PLS results (red dots). The number of reference values (yellow asterisks) covered by these intervals is close to the nominal value.

Note that the interval for the outlying sample 1 is sufficiently large to provide a realistic estimate of prediction uncertainty.

A direct comparison of the sample-specific prediction uncertainties with the set-level criterion RMSEP is particularly revealing:


Figure MVC 4: For the large majority of test samples (21 out of 26), the prediction is better than the associated reference value.

From this plot it is clear that RMSEP (0.22%) severely underestimates the prediction uncertainty for extreme samples, while it overestimates the true average value. The reason for this pessimistic average view is a spurious contribution: the error in the reference values (0.2%). In a strict sense, RMSEP is defined in terms of the 'truth' rather than noisy reference values, i.e.

$$\mathrm{RMSEP} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i^{\mathrm{true}}\right)^2}$$

Noisy reference values lead to a so-called apparent RMSEP, see:

  • R. DiFoggio
    Examination of some misconceptions about near-infrared analysis
    Applied Spectroscopy, 49 (1995) 67-75
  • N.M. Faber and B.R. Kowalski
    Improved prediction error estimates for multivariate calibration by correcting for the measurement error in the reference values
    Applied Spectroscopy, 51 (1997) 660-665
  • L.K. Sørensen
    True accuracy of near infrared spectroscopy and its dependence on precision of reference data
    Journal of Near Infrared Spectroscopy, 10 (2002) 15-25
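The relation between apparent and true RMSEP follows from simple error propagation: assuming independent errors, apparent² ≈ true² + σ_ref², which can be inverted to correct an apparent value. A minimal sketch (names are illustrative):

```python
import numpy as np

def corrected_rmsep(apparent, sigma_ref):
    """Correct an apparent RMSEP for the error in the reference values,
    assuming independent errors: apparent^2 ~= true^2 + sigma_ref^2."""
    diff = apparent ** 2 - sigma_ref ** 2
    if diff < 0:
        raise ValueError("reference error exceeds apparent RMSEP")
    return float(np.sqrt(diff))
```

With the values quoted above (apparent RMSEP 0.22%, reference error 0.2%), this correction gives a true RMSEP of roughly 0.09%.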

Regression coefficients


This topic is treated in detail under Reliability of principal component analysis to demonstrate the internal consistency of the methodology presented on this site. By contrast, the currently popular jackknife gives misleading results.

It is further noted that, for correct application of the jackknife, the data must constitute a random sample from some population, simply because the jackknife is a resampling method. In other words, it cannot (in a formal sense) be applied if the data originate from a design. This condition severely limits the applicability of the jackknife in general.

The following example, taken from the book by Martens and Martens (Appendix A16), is illustrative of what may go wrong when applying the jackknife to designed data:


Figure MVC 5: Estimated standard errors for regression coefficient estimates.

The 14-component PLS model is identical to ordinary least squares (OLS), for which an exact formula exists. The jackknife severely overestimates the exact results. Moreover, most standard errors produced by the jackknife decrease when the number of components is increased from 6 to 14, which does not make sense.
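The exact formula in question is the standard OLS covariance result, se(b_j) = sqrt(s²·[(XᵀX)⁻¹]_jj). A minimal sketch (data shapes and names are illustrative):

```python
import numpy as np

def ols_coef_se(X, y):
    """Exact standard errors of OLS regression coefficients:
    se_j = sqrt(s^2 * [(X'X)^-1]_jj), with s^2 the residual variance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y                      # OLS coefficient estimates
    resid = y - X @ b
    s2 = resid @ resid / (n - p)               # residual variance
    return np.sqrt(s2 * np.diag(xtx_inv))
```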

Scores and loadings


Likewise, this topic is treated in detail under Reliability of principal component analysis. Since the latter page is rather technical, the fact that the jackknife cannot yield uncertainty estimates for the scores is easily overlooked.

Validation of a multivariate model


There is a growing awareness that multivariate models must be validated in the same way as the straight-line fit, i.e. in terms of their precision, linearity, etc. A nice example of the successful validation of a NIR model (using software from ABB Bomem) is given in:

  • M. Laasonen, T. Harmia-Pulkkinen, C. Simard, M. Räsänen and H. Vuorela
    Development and validation of a near-infrared method for the quantitation of caffeine in intact single tablets
    Analytical Chemistry, 75 (2003) 754-760

This work could easily be taken a step further by including sample-specific prediction intervals. It is stressed that the range of applications of the proposed methodology is not restricted to NIR spectroscopy.

References & further information


Open a list of references. This list pays attention to work that surpasses the routine test-set validation, which should always be carried out.

For further information, please contact Sven Serneels.