
Reliability of univariate calibration

Believe it or not: with few exceptions, univariate calibration is not straightforward at all! This is an important observation, since a thorough understanding of the intricacies of univariate calibration is key to investigating the properties of multivariate/multiway extensions. A closer examination of univariate calibration can therefore be seen as a logical step towards getting more out of data in general.

Giving a systematic, complete overview of the theory is clearly beyond the scope of this web site. Instead, we aim to make the relevant literature accessible by summarizing important contributions. We pay considerable attention to aspects that also play an important role when moving to higher-complexity predictor data. Moreover, we point out directions for further research.

This page is organized as follows:

  • Official literature
  • Noise in the predictand only: classical vs. inverse model
  • Noise in the predictor too: estimation vs. prediction
  • Extrapolation in prediction: limit of detection
  • References & further information

Official literature

Univariate calibration has extensive coverage in the official literature. Guidelines have, for example, been issued by the International Union of Pure and Applied Chemistry (IUPAC); see:

  • K. Danzer and L.A. Currie
    Guidelines for calibration in analytical chemistry
    Part 1. Fundamentals and single component calibration
    Pure & Applied Chemistry, 70 (1998) 993-1014
Download (PDF, 1,091 kB; © IUPAC 1998)

Potential multivariate and multiway extensions of generally accepted univariate methodology are listed in:

  • A. Olivieri, N.M. Faber, J. Ferré, R. Boqué, J.H. Kalivas and H. Mark
    Guidelines for calibration in analytical chemistry
    Part 3. Uncertainty estimation and figures of merit for multivariate calibration
    Pure & Applied Chemistry, 78 (2006) 633-661
Download (PDF, 645 kB; © IUPAC 2006)

Noise in the predictand only: classical vs. inverse model


Without loss of generality, we will assume in the remainder, unless otherwise mentioned, that the data pairs to be modelled consist of analyte content and instrumental signal for chemical samples. The basic statistics literature is mainly concerned with the following model:

y = a + b·x + e    [1]

where y is the noisy signal, x is the errorless content, a is the true intercept, b is the true slope and e consists of random error. For what follows, it is necessary to make the (standard) assumption that e is iid normal.

The classical least squares (CLS) fit, i.e., 'forward' calibration through the regression of y onto x, leads to estimates â and b̂ (for a and b) from which the content for an unknown sample is obtained by 'inverse' prediction as

x̂_u,cl = (y_u − â) / b̂    [2]

where the subscripts 'u' and 'cl' refer to the unknown sample and the classical model, respectively, and the 'hat' (ˆ) symbolizes prediction (of a random variable) or estimation (of a parameter).

This two-stage process is illustrated in:


Figure UVC 1: Classical least squares straight-line fit with bands that yield the 95%-prediction intervals. Prediction is relatively precise close to the model center.

With noise in the signal (y) only, the CLS fit is unbiased. Moreover, it yields parameter estimates with minimum variance. In statistics jargon, the CLS estimator is BLUE - the best linear unbiased estimator. However, it has been known for some time that the classical model is not efficient for prediction! Instead of regressing (noisy) signal (y) onto (errorless) content (x), one should regress content (x) onto signal (y). Least squares regression of x onto y is known as inverse least squares (ILS). It leads to parameter estimates â′ and b̂′ from which the 'forward' prediction follows as

x̂_u,inv = â′ + b̂′·y_u    [3]

where the subscript 'inv' refers to the inverse model.
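The two routes can be sketched in a few lines of code. This is a minimal illustration with hypothetical data: the training values, noise level and the unknown signal y_u are assumptions, not values from the text.

```python
import numpy as np

# Hypothetical training data: errorless contents x, noisy signals y.
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 0.5 + 2.0 * x + rng.normal(scale=0.2, size=x.size)

# Classical least squares: regress y onto x, then invert (Equation [2]).
b_cl, a_cl = np.polyfit(x, y, 1)     # returns slope, intercept
y_u = 6.4                            # signal measured for an unknown sample
x_cl = (y_u - a_cl) / b_cl           # 'inverse' prediction

# Inverse least squares: regress x onto y, predict 'forward' (Equation [3]).
b_inv, a_inv = np.polyfit(y, x, 1)
x_inv = a_inv + b_inv * y_u

print(x_cl, x_inv)                   # the two predictions differ slightly
```

For low-noise data such as these, the two predictions are nearly identical; the distinction only matters for noisy calibrations, as discussed below.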

An excellent overview of the relevant literature is given in:

  • J. Tellinghuisen
    Inverse vs. classical calibration for small data sets
    Fresenius Journal of Analytical Chemistry, 368 (2000) 585-588

The relative predictive ability of the classical and inverse model can be explained from the following approximate expressions for prediction bias and variance (square of the standard error):

Bias(x̂_u,cl) ≈ (x_u − x̄) / (n·A)    [4]

Bias(x̂_u,inv) ≈ −(x_u − x̄)·(1 − θ)    [5]

σ²(x̂_u,cl) ≈ (s²/b²)·[1/m + 1/n + (x_u − x̄)²/(n·s_x²)]    [6]

σ²(x̂_u,inv) ≈ θ·(s²/b²)·[1/m + 1/n + θ·(x_u − x̄)²/(n·s_x²)]    [7]

   Bias(·) and σ(·) denote the bias and standard error of the associated quantity,
   s² is the variance of e,
   x̄ is the mean x-value for the training set of n samples,
   A = b²·s_x²/s², in which s_x² is the variance of the x-values in the training set,
   θ = b²·s_x²/s_y², in which s_y² = b²·s_x² + s², and
   m is the number of replicates for the prediction sample.

Extensive Monte Carlo simulations have demonstrated the adequacy of approximations [4]-[7] for large as well as small n.
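Such a simulation is easy to reproduce in outline. The sketch below uses assumed settings (true line y = x, noise sd s = 1, a small n = 6 design and a prediction sample far from the training mean) chosen purely to make the bias-variance contrast visible:

```python
import numpy as np

rng = np.random.default_rng(1)
b_true, s = 1.0, 1.0
x = np.linspace(0.0, 10.0, 6)        # small training design, n = 6
x_u = 9.0                            # unknown sample, far from the centre

cl, inv = [], []
for _ in range(5000):
    y = b_true * x + rng.normal(scale=s, size=x.size)
    y_u = b_true * x_u + rng.normal(scale=s)
    b_cl, a_cl = np.polyfit(x, y, 1)
    cl.append((y_u - a_cl) / b_cl)   # classical route, Equation [2]
    b_in, a_in = np.polyfit(y, x, 1)
    inv.append(a_in + b_in * y_u)    # inverse route, Equation [3]

cl, inv = np.array(cl), np.array(inv)
for name, p in [("classical", cl), ("inverse", inv)]:
    bias = p.mean() - x_u
    print(f"{name}: bias={bias:.3f} var={p.var():.3f} mse={bias**2 + p.var():.3f}")
```

In this setting the inverse prediction shows a clear negative bias but a smaller variance, and the trade-off works out in its favor in terms of MSE.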

Prediction bias and variance can be combined into a mean squared error (MSE) of prediction using the well-known expression,

MSE(x̂_u) = Bias(x̂_u)² + σ(x̂_u)²    [8]

It is important to note that the MSE criterion is meaningful only if systematic (bias) and random (standard error) deviations are equally harmful. This is often the case. An important exception is legal work, where bias that overly incriminates the subject would violate the principle 'in dubio pro reo'.

The following detailed remarks are adapted from the paper by Tellinghuisen:

  • For finite n, the classical prediction x̂_u,cl has infinite variance, since the denominator b̂ in Equation [2] is normally distributed, hence a division by zero occurs with non-zero probability. However, the expectation and variance of x̂_u,cl can be defined in an asymptotic sense (through approximations [4] and [6]), which will normally be adequate and meaningful.
  • The inverse prediction x̂_u,inv has finite variance and mean squared error for n ≥ 4.
  • Although the classical prediction is unbiased in the limit n → ∞ (hence consistent), it is biased at finite n, with a bias magnitude comparable to that of the inverse prediction for small n (~5).
  • The biases in both predictions vanish when x_u = x̄.
  • The range of x over which the inverse prediction is more efficient than the classical prediction is greater for small n than for large n.
  • The distinction is relevant only for calibration data that are inherently very noisy, as the results for the two predictions differ insignificantly for sufficiently small s.

The improved predictive ability of the inverse model has the interpretation of a favorable bias-variance trade-off: the increase in bias is more than offset by the decrease in variance. Ignoring the final terms inside the brackets of approximations [6] and [7] yields an approximate decrease by a factor θ < 1. In fact, this decrease is mainly the result of the (favorable) negative (proportional) bias in the slope estimate:


Figure UVC 2: Linear calibration function without intercept - the simplest inverse model. The slope (b') is directly proportional to the amount of error propagation when predicting the true content (x) from the noisy signal (y). Consequently, from a variance perspective a small b' is preferable.

The motivation for inverse calibration is even stronger in the multivariate/multiway domain. The reason is that inverse (multivariate/multiway) calibration enables one to predict for individual analytes without explicitly accounting for all interfering species in the unknown mixture. Interfering species are adequately compensated for implicitly by the model if their contribution to the training set predictors (spectra) is representative for future unknown samples. This is especially important for applications in, for example, the food, environmental, petrochemical and life sciences, where the number and nature of interfering species are usually unknown. By contrast, the classical (multivariate/multiway) model requires the pure-component predictors (spectra) to be known for all species, which is often not practical.

A complication arises if excessive predictor noise leads to severe bias in the model parameter estimates (regression vector/matrix/array coefficients), hence severe prediction bias. An illustrative example is given in the classic textbook:

  • H. Martens and T. Næs
    Multivariate calibration, Wiley, Chichester (1989)

Martens and Næs performed Monte Carlo simulations, resulting in plots like:


Figure UVC 3: NIR prediction versus reference value for 109 test objects. Artificial noise was added to the predictors (spectra) of the calibration set (Xcal). The calibration formula was estimated by partial least squares (PLS) regression based on the data from 30 calibration objects. The solid diagonal indicates the ideal results. For the other simulation settings, see Figure 4.2 in Martens and Næs (pp. 242-243).

It is seen that the low-valued predictions are severely biased high (the lowest 18(!) reference values are predicted above the target line), while the converse is true for the high values. As detailed above for the single-predictor case, this prediction bias must be the effect of bias in the model parameter estimates. Consequently, prediction bias can be eliminated to some extent by applying a bias correction to the model. A straightforward bias correction is possible when an appropriate bias expression is available. Then, a bias-corrected model simply follows by subtracting this bias estimate.

Approximate bias expressions are derived for the ILS model (multiple predictors) in:

  • S.D. Hodges and P.G. Moore
    Data uncertainties and least squares regression
    Applied Statistics, 21 (1972) 185-195
  • R.B. Davies and B. Hutton
    The effect of errors in the independent variables in linear regression
    Biometrika, 62 (1975) 383-391

For the PLS model, approximations are derived under various distributional assumptions in:

  • C.H. Spiegelman, M.J. McShane, M.J. Goetz, M. Motamedi, Q.L. Yue and G.L. Coté
    Theoretical justification of wavelength selection in PLS calibration: development of a new algorithm
    Analytical Chemistry, 70 (1998) 35-44
  • A.J. Burnham, J.F. MacGregor and R. Viveros
    Interpretation of regression coefficients under a latent variable regression model
    Journal of Chemometrics, 15 (2001) 265-284
  • B. Nadler and R.R. Coifman
    The prediction error in CLS and PLS: the importance of feature selection prior to multivariate calibration
    Journal of Chemometrics, 19 (2005) 107-118

It is noted that the first two contributions deal with the one-factor model.

Here it is suggested that a distribution-free approach could lead to a more generally applicable result. This approach has worked well for a multiway calibration method, see:

  • N.M. Faber, J. Ferré and R. Boqué
    Iteratively reweighted generalized rank annihilation method. 1. Improved handling of prediction bias
    Chemometrics and Intelligent Laboratory Systems, 55 (2001) 67-90

Approximate bias as well as variance expressions have in common that they are usually obtained by working out a truncated Taylor expansion (truncated after the first- and second-order term to obtain the approximate variance and bias, respectively). Resampling methods are generally more accurate because they do not depend on such approximations. An appropriate resampling method for bias estimation is simulation extrapolation (SIMEX), see:

  • R.J. Carroll, D. Ruppert and L.A. Stefanski
    Measurement error in nonlinear models, Chapman and Hall, London (1995)
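The SIMEX idea can be sketched in a few lines. All settings below are assumptions for illustration: extra predictor noise is added at increasing levels lam, the naive (attenuated) slope is refitted each time, and the trend is extrapolated back to lam = -1, i.e. the error-free case.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setting: true slope 1, predictor observed with known noise sd.
n, sd_noise = 200, 0.5
x_true = rng.uniform(0.0, 5.0, n)
x_obs = x_true + rng.normal(scale=sd_noise, size=n)
y = x_true + rng.normal(scale=0.1, size=n)

def naive_slope(x, y):
    return np.polyfit(x, y, 1)[0]

b_naive = naive_slope(x_obs, y)      # attenuated (biased low) by the x-noise

# SIMEX: add extra noise at levels lam, refit, average, then extrapolate
# the averaged slopes back to lam = -1 (no measurement error at all).
lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
means = [np.mean([naive_slope(x_obs + rng.normal(scale=np.sqrt(lam) * sd_noise,
                                                 size=n), y)
                  for _ in range(200)])
         for lam in lams]

b_simex = np.polyval(np.polyfit(lams, means, 2), -1.0)  # quadratic extrapolation
print(b_naive, b_simex)
```

The extrapolated slope lands markedly closer to the true value of 1 than the naive estimate, at the price of some extra variance - the same trade-off discussed throughout this page.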

It is recommended to always test the adequacy of approximate bias and variance expressions using an appropriate resampling method.

A final caveat seems to be in order. Bias correction leads to larger model parameter estimates (in absolute value), hence to increased propagation of predictor noise in the prediction stage, cf. Figure UVC 2. Consequently, it seems best to work with two models when bias plays a dominant role, namely:

  1. the original model for samples close to the center, where bias is relatively unimportant, and
  2. the bias-corrected model for extreme samples for which bias is unacceptably large, e.g. near the limit of detection.

Noise in the predictor too: estimation vs. prediction


Non-negligible noise in the predictor variables is of course the common situation in (inverse) multivariate and multiway calibration - just think of instrument noise. It therefore makes sense to discuss this case in some detail for univariate calibration too. As explained in the preceding section, estimation and prediction are essentially distinct tasks with conflicting requirements owing to the associated uncertainty. These conflicting requirements were described as a bias-variance trade-off. Clearly, the same trade-off principle holds with 'errors in both axes': noise in the predictors leads to deflated slope estimates. This bias is an undesirable complication when model interpretation is a major goal of the analysis. After all, how should one interpret an estimate that is systematically too small? For a thorough discussion of this complication in connection with PLS, see:

  • A.J. Burnham, J.F. MacGregor and R. Viveros
    Interpretation of regression coefficients under a latent variable regression model
    Journal of Chemometrics, 15 (2001) 265-284

However, a negative slope bias leads to decreased propagation of predictor noise in the prediction phase, i.e., a decreased variance contribution to root mean squared error of prediction (RMSEP). The resulting decrease in prediction variance may actually more than outweigh the increase in prediction bias. The following illustrative example is adapted from:

  • C.D. Brown
    Discordance between net analyte signal theory and practical multivariate calibration
    Analytical Chemistry, 76 (2004) 4364-4373

Brown emphasizes that the optimum RMSEP is achieved when accepting a negative bias in the slope estimate:


Figure UVC 4: Left panel: scatterplot of noisy x and y measurements (both corrupted with measurement error with s = 50), where the true values are actually related by a slope b = 1. Right panel: RMSEP as a function of the slope used to predict the true y from the noisy x. Also depicted are the true slope b = 1 and the theoretically optimal RMSE prediction slope of 0.9 (dashed lines).

We have performed Monte Carlo simulations, resulting in 10,000 data sets generated according to the noise setting depicted in the left panel of Figure UVC 4. Models were estimated using ordinary least squares (OLS), total least squares (TLS) and corrected least squares (CLS; not to be confused with classical least squares). OLS simply regresses y onto x. This leads to biased slope estimates because OLS models plain variances. Since the predictor variance has a spurious noise contribution, x gets too much weight in the regression. This explains why the OLS slope estimate is biased low. TLS and CLS, on the other hand, are methods that yield slope estimates that are asymptotically unbiased. This is achieved by compensating for the spurious contribution of the predictor noise in the regression. Both methods require an estimate of the predictor noise variance to minimize the spurious contribution. Whereas TLS estimates the predictor noise from the data, CLS utilizes an independent noise estimate. For more details about TLS and CLS, see:

  • S. Van Huffel and J. Vandewalle
    The total least squares problem. Computational aspects and analysis, SIAM (1991)
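For the straight-line case, the three estimators admit closed-form sketches. The data below are hypothetical, mimicking the 'errors in both axes' setting of Figure UVC 4 on a smaller scale; the TLS formula assumes equal error variances on both axes, and CLS here denotes corrected least squares:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: true slope 1, noise of equal sd on both axes.
n, sd = 500, 0.8
x_true = rng.uniform(0.0, 10.0, n)
x = x_true + rng.normal(scale=sd, size=n)
y = x_true + rng.normal(scale=sd, size=n)

sxx = np.var(x)                      # plain variances, as modelled by OLS
syy = np.var(y)
sxy = np.cov(x, y, bias=True)[0, 1]

b_ols = sxy / sxx                    # biased low: sxx contains spurious noise

# TLS for equal noise variances on both axes (noise estimated from the data).
d = syy - sxx
b_tls = (d + np.sqrt(d**2 + 4 * sxy**2)) / (2 * sxy)

# CLS: subtract an independent estimate of the predictor noise variance.
b_cls = sxy / (sxx - sd**2)

print(b_ols, b_tls, b_cls)
```

The OLS slope comes out clearly below 1, while TLS and CLS recover the true slope up to sampling error - exactly the pattern summarized in Table UVC 1.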

The following table gives an overview of the uncertainties associated with estimation and prediction, where the root mean squared error (RMSE) is further broken down into the standard error (SE; square root of the variance) and bias:

Table UVC 1: Summary statistics obtained for the x-noise setting: s = 50 for both estimation and prediction.

UVC T1.gif

It is observed that the theoretical expectations are realized for OLS, CLS and TLS. OLS has an inferior RMSE for estimation in column 2 due to the relatively large bias in column 4, but yields a smaller RMSE for prediction in column 5 because the standard error is reduced. (Note that SE and bias are added in quadrature to obtain the RMSE.) CLS and TLS are almost unbiased for estimation and prediction. These methods are preferable if focus is on the model, e.g. for interpretation. It is noted that for this particular example the differences in RMSE are much smaller for prediction than for estimation. One might therefore opt for TLS or CLS to ensure a small bias at the expense of a slightly increased prediction RMSE. Finally, the standard error of estimation is slightly larger for CLS and TLS because a bias correction always introduces uncertainty.

The situation is further complicated if the predictor noise has a different magnitude during estimation and prediction. The following results are obtained by setting the predictor noise variance to zero during prediction. Although the opposite scenario is more typical in applied work, i.e., relatively noisy predictors in the prediction phase (see below), the results are nevertheless illustrative of the actions that can be taken. The estimation results in columns 2-5 are almost identical to the ones presented above - small differences can be observed that are caused by a different initialization of the pseudo-random number generator:

Table UVC 2: Summary statistics obtained for the x-noise setting: s = 50 for estimation, whereas s = 0 for prediction.

UVC T2.gif

Since the predictors are noise-free during prediction, there is obviously no bias-variance trade-off that might favor the use of OLS, hence TLS and CLS are to be preferred.

As mentioned above, it is more natural to have noisier predictor variables during prediction, especially in on-line applications, see:

  • C.M. Andersen, R. Bro and P.B. Brockhoff
    Quantifying and handling errors in instrumental measurements using the measurement error theory
    Journal of Chemometrics, 17 (2003) 621-629

With non-negligible predictor noise during prediction only, one obtains the following results:

Table UVC 3: Summary statistics obtained for the x-noise setting: s = 0 for estimation, whereas s = 50 for prediction.

UVC T3.gif

The OLS slope estimate is obviously unbiased because the predictors are error-free during estimation. TLS cannot be used because the predictor noise (during prediction) cannot be estimated from the training data. 'CLS' is the opposite of CLS in the sense that it introduces the bias that would be corrected by CLS if the training data were corrupted by the noise now encountered during prediction only. Some reflection shows that OLS should behave similarly to CLS and TLS in Table UVC 1, and likewise 'CLS' to OLS: OLS is clearly superior for estimation, hence interpretation, whereas 'CLS' is slightly better for prediction.

Extrapolation in prediction: limit of detection


The novice is usually taught that one should use calibration models to predict at interpolating positions only, because it is not safe to take an empirical model outside the calibrated range. However, some of the most interesting applications arise when extrapolating, e.g.:

  • prediction of future events;
  • development of a product with higher consumer appreciation using preference mapping;
  • search for a molecule with higher biological activity using a quantitative structure activity relationship (QSAR) model;
  • determination of analyte concentration using the method of standard additions; and
  • detection of lower analyte concentrations in trace analysis.

The remainder of this section is concerned with the latter application, namely limit of detection (LOD) estimation. The following example is taken from:

  • F.J. del Río Bocio, J. Riu, R. Boqué and F.X. Rius
    Limits of detection in linear regression with errors in the concentration
    Journal of Chemometrics, 17 (2003) 413-421

These detection limits are based on the prediction intervals developed in:

  • F.J. del Río Bocio, J. Riu and F.X. Rius
    Prediction intervals in linear regression taking into account errors on both axes
    Journal of Chemometrics, 15 (2001) 773-788

It is noted that this error analysis can be further refined using the results derived in:

  • M. Galea-Rojas, M.V. de Castilho, H. Bolfarine and M. de Castro
    Detection of analytical bias
    Analyst, 128 (2003) 1073-1081
  • M. de Castro, M. Galea-Rojas, H. Bolfarine and M.V. de Castilho
    Detection of analytical bias when comparing two or more measuring methods
    Journal of Chemometrics, 18 (2004) 431-440

The data sets under study were characterized by heteroscedastic errors in both axes. Three methods were considered to fit straight lines through the (x,y)-data points:

  1. ordinary least squares (OLS), which only accommodates homoscedastic errors in the y-axis;
  2. weighted least squares (WLS), which generalizes OLS to heteroscedastic errors in the y-axis; and
  3. bivariate least squares (BLS), which generalizes total least squares (TLS) to heteroscedastic errors in both axes.
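Of these three fits, OLS and WLS are easy to sketch; BLS additionally requires the x-error variances and an iterative solution, so it is omitted here. The data below are hypothetical, with the y-noise growing proportionally to the signal:

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed heteroscedastic calibration: y-noise proportional to the level.
x = np.linspace(1.0, 10.0, 10)
sd_y = 0.05 * x                      # known per-point y-error (x errorless here)
y = 2.0 * x + rng.normal(scale=sd_y)

b_ols, a_ols = np.polyfit(x, y, 1)                 # ignores the error structure
b_wls, a_wls = np.polyfit(x, y, 1, w=1.0 / sd_y)   # polyfit weights are ~ 1/sd

print((a_ols, b_ols), (a_wls, b_wls))
```

Note that NumPy's polyfit expects weights proportional to 1/sd rather than 1/variance; WLS downweights the noisy high-level points and consequently pins down the intercept (and hence the detection limit) more tightly.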

Results are briefly discussed for:

  1. X-ray fluorescence; and
  2. capillary electrophoresis.

1. X-ray fluorescence (XRF)

The calibration samples for the XRF determination are 15 geological certified reference materials (CRMs). The errors in the CRMs (x-axis error) were calculated from a worldwide interlaboratory certification trial while the error in the instrumental response (y-axis error) was obtained from 7 replicated measurements of each CRM on different days. Interferences were taken into account and possible matrix effects were corrected with the incoherent radiation (Compton) of the sample.

Owing to the relatively large error in the x-axis, the models are quite different, as shown here for Na2O:


Figure UVC 5: Data pairs with error bars and model estimates for Na2O. The data pairs are omitted in the blow-up to focus on the models.

Likewise for Sr:


Figure UVC 6: Data pairs with error bars and model estimates for Sr. The data pairs are omitted in the blow-up to focus on the models.

Not only do the model estimates themselves differ, but so do the uncertainties associated with these models. The combination of these two effects leads to large differences among the estimates for the limit of detection:

Table UVC 4: Detection limits for the nine analytes studied by XRF when the α and β errors are set to 5%. All results are expressed in ppm.

UVC T4.gif

1 See Figure UVC 5
2 See Figure UVC 6

With few exceptions, the detection limit estimates are lowest for BLS.

2. Capillary electrophoresis (CE)

Similar observations are made for this data set:

Table UVC 5: Detection limits for the three anions analyzed by CE when the α and β errors are set to 5%. All results are expressed in ppm.

UVC T5.gif


Conclusions

Whether to prefer the classical or inverse model depends on the goal of the analysis. Biased parameter estimates can be unacceptable if focus is on interpretability. Total least squares (TLS) and corrected least squares (CLS) are examples of methods that reduce the bias caused by random errors. By contrast, a biased calibration model can be optimal in terms of RMSEP when prediction is the goal. The situation around prediction is further complicated when (1) the predictor noise is different during estimation and prediction and (2) extreme extrapolation is attempted. The same considerations play a role when moving to higher-complexity predictors. For example, the generalization of bivariate least squares (BLS) is maximum likelihood calibration.

References & further information


Open a list of references. These references should supplement the ones that are well known from basic statistics.

For further information, please contact Jordi Riu.