This tutorial gives an example how to:

  • write an import function for a spectrometer manufacturer’s proprietary ASCII files,
  • add further data columns to the spectra, and
  • set up a linear calibration (inverse least squares).

The data set flu in package hyperSpec consists of 6 fluorescence emission spectra of quinine solutions. They were acquired during an student practicum and were kindly provided by M. Kammer.

The concentrations of the solutions range from 0.05 to 0.30 mg/l. Spectra were acquired with a Perkin Elmer LS50-B fluorescence spectrometer at 350 nm excitation.

1 Writing an Import Function

The raw spectra are in Perkin Elmer’s ASCII file format, one spectrum per file. The files are completely ASCII text, with the actual spectra starting at line 55.

The function to import these files, read.txt.PerkinElmer(), is discussed in the “fileIO” vignette, please refer to that document for details.

It needs to be source()d before use:

source("read.txt.PerkinElmer.R")

folder <- system.file("extdata/flu", package = "hyperSpec")
files <- Sys.glob(paste0(folder, "/flu?.txt"))
flu <- read.txt.PerkinElmer(files, skip = 54)

Now the spectra are in a hyperSpec object and can be examined, e.g., by:

flu
#> hyperSpec object
#>    6 spectra
#>    2 data columns
#>    181 data points / spectrum
plot(flu)
Six spectra of `flu` dataset.

Figure 1.1: Six spectra of flu dataset.

2 Adding Further Data Columns

The calibration model needs the quinine concentrations for the spectra. This information can be stored together with the spectra, and also gets an appropriate label:

flu$c <- seq(from = 0.05, to = 0.30, by = 0.05)
labels(flu, "c") <- "c, mg/l"

flu
#> hyperSpec object
#>    6 spectra
#>    3 data columns
#>    181 data points / spectrum
save(flu, file = "flu.rda")

Now the hyperSpec object flu contains two data columns, holding the actual spectra and the respective concentrations. The dollar operator returns such a data column:

flu$c
#> [1] 0.05 0.10 0.15 0.20 0.25 0.30

3 Dropping data columns

Function read.txt.PerkinElmer() added a column with the file names that we don’t need.

flu$..
#>                                                                  filename    c
#> 1 /tmp/RtmpwJcUHX/temp_libpath18f414eab874/hyperSpec/extdata/flu/flu1.txt 0.05
#> 2 /tmp/RtmpwJcUHX/temp_libpath18f414eab874/hyperSpec/extdata/flu/flu2.txt 0.10
#> 3 /tmp/RtmpwJcUHX/temp_libpath18f414eab874/hyperSpec/extdata/flu/flu3.txt 0.15
#> 4 /tmp/RtmpwJcUHX/temp_libpath18f414eab874/hyperSpec/extdata/flu/flu4.txt 0.20
#> 5 /tmp/RtmpwJcUHX/temp_libpath18f414eab874/hyperSpec/extdata/flu/flu5.txt 0.25
#> 6 /tmp/RtmpwJcUHX/temp_libpath18f414eab874/hyperSpec/extdata/flu/flu6.txt 0.30

It is therefore deleted:

flu$filename <- NULL

4 Linear Calibration

As R is developed for the purpose of statistical analysis, tools for a least squares calibration model are readily available.

The original spectra range from 405 to 495 nm. However, the intensities at 450 nm are perfect for a univariate calibration. Plotting them over the concentration is done by:

plot_c(flu[, , 450])
Spectra intensities at 450 nm.

Figure 4.1: Spectra intensities at 450 nm.

The square bracket operator extracts parts of a hyperSpec object. The first coordinate defines which spectra are to be used, the second which data columns, and the third gives the spectral range.

We discard all the wavelengths but 450 nm:

flu <- flu[, , 450]
labels(flu, "spc") <- expression(I["450 nm"] / a.u.)

The plot could be enhanced by annotating the ordinate with the emission wavelength. Also the axes should start at the origin, so that it is easier to see whether the calibration function will go through the origin:

plot_c(flu, xlim = range(0, flu$c), ylim = range(0, flu$spc))
Spectra intensities at 450 nm with updated labels.

Figure 4.2: Spectra intensities at 450 nm with updated labels.

The actual calibration is a linear model, which can be fitted by the R function lm(). The function lm() needs a formula that specifies which data columns are dependent and independent variables.

The normal calibration plot gives the emission intensity as a function of the concentration, and the calibration function thus models \(I = f (c)\), i. e. \(I = m c + b\) for a linear calibration. This is then solved for \(c\) when the calibration is used.

However, R’s linear model is a quite strict in predicting: a model set up as \(I = f (c)\) will predict the intensity as a function of the concentration but not the other way round. Thus we set up an inverse calibration model[^As we can safely assume that the error on the concentrations is far larger than the error on the instrument signal, it is actually the correct type of model from the least squares fit point of view.]: \(c = f (I)\). The corresponding formula is c ~ I, or in our case c ~ spc, as the intensities are stored in the data column $spc:

In addition, lm() (like most R model building functions) expects the data to be a data.frame.

There are three abbreviations that help to get the parts of the hyperSpec object that are frequently needed:

  • flu[[]] returns the spectra matrix. It takes the same indices as [].
  • flu\$. returns the data as a data.frame.
  • flu\$.. returns a data.frame that has all data columns but the spectra.
flu[[]]
#>         450
#> [1,] 106.95
#> [2,] 213.50
#> [3,] 333.78
#> [4,] 446.63
#> [5,] 556.52
#> [6,] 672.53
flu$.
#>      450    c
#> 1 106.95 0.05
#> 2 213.50 0.10
#> 3 333.78 0.15
#> 4 446.63 0.20
#> 5 556.52 0.25
#> 6 672.53 0.30
flu$..
#>      c
#> 1 0.05
#> 2 0.10
#> 3 0.15
#> 4 0.20
#> 5 0.25
#> 6 0.30

Putting this together, the calibration model is calculated:

calibration <- lm(c ~ spc, data = flu$.)

The summary() gives a good overview of our model:

summary(calibration)
#> 
#> Call:
#> lm(formula = c ~ spc, data = flu$.)
#> 
#> Residuals:
#>         1         2         3         4         5         6 
#> -0.000987  0.002052 -0.000960 -0.000701  0.000864 -0.000267 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 3.85e-03   1.25e-03    3.09    0.037 *  
#> spc         4.41e-04   2.87e-06  153.60  1.1e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.00136 on 4 degrees of freedom
#> Multiple R-squared:     1,   Adjusted R-squared:     1 
#> F-statistic: 2.36e+04 on 1 and 4 DF,  p-value: 1.08e-08

In order to get predictions for new measurements, a new data.frame with the same independent variables (in columns with the same names) as in the calibration data are needed. Then the function predict() can be used. It can also calculate the prediction interval. If we observe e.g. an intensity of 125 or 400 units, the corresponding concentrations and their 99% prediction intervals are:

I <- c(125, 400)
conc <- predict(calibration,
  newdata = list(spc = as.matrix(I)), interval = "prediction",
  level = .99
)
conc
#>        fit     lwr      upr
#> 1 0.058943 0.05133 0.066556
#> 2 0.180149 0.17338 0.186922

Finally, we can draw the calibration function and its 99% confidence interval (also via predict()) together with the prediction example. In order to draw the confidence interval into the calibration graph, we can either use a customized panel function:

int <- list(spc = as.matrix(seq(min(flu), max(flu), length.out = 25)))
ci <- predict(calibration, newdata = int, interval = "confidence", level = 0.99)

panel.ci <- function(x, y, ..., intensity, ci.lwr, ci.upr, ci.col = "#606060") {
  panel.xyplot(x, y, ...)
  panel.lmline(x, y, ...)
  panel.lines(ci.lwr, intensity, col = ci.col)
  panel.lines(ci.upr, intensity, col = ci.col)
}

plot_c(flu,
  panel = panel.ci,
  intensity = int$spc, ci.lwr = ci[, 2], ci.upr = ci[, 3]
)
Calibration function and its 99% confidence interval.

Figure 4.3: Calibration function and its 99% confidence interval.

Or, we can add the respective data to the hyperSpec object. The meaning of the data can be saved in a new extra data column that acts as grouping variable for the plot.

First, the spectral range of flu is cut to contain the fluorescence emission at 450 nm only, and the new column is introduced for the original data:

flu$type <- "data points"

Next, the calculated confidence intervals are appended:

tmp <- new("hyperSpec",
  spc = as.matrix(seq(min(flu), max(flu), length.out = 25)),
  wavelength = 450
)
ci <- predict(calibration, newdata = tmp$., interval = "confidence", level = 0.99)
tmp <- tmp[rep(seq(tmp, index = TRUE), 3)]
tmp$c <- as.numeric(ci)
tmp$type <- rep(colnames(ci), each = 25)

flu <- collapse(flu, tmp)

Finally, the resulting object is plotted. Our prediction example is handled by another customized panel function:

panel.predict <-
  function(x, y, ..., intensity, ci, pred.col = "red", pred.pch = 19, pred.cex = 1) {
    panel.xyplot(x, y, ...)
    mapply(
      function(i, lwr, upr, ...) {
        panel.lines(c(lwr, upr), rep(i, 2), ...)
      },
      intensity, ci[, 2], ci[, 3],
      MoreArgs = list(col = pred.col)
    )
    panel.xyplot(ci[, 1], intensity, col = pred.col, pch = pred.pch, cex = pred.cex, type = "p")
  }

plot_c(flu,
  groups = type, type = c("l", "p"),
  col = c("black", "black", "#606060", "#606060"),
  pch = c(19, NA, NA, NA),
  cex = 0.5,
  lty = c(0, 1, 1, 1),
  panel = panel.predict,
  intensity = I,
  ci = conc,
  pred.cex = 0.5
)
calibration function and its 99% confidence interval and predicted points.

Figure 4.4: calibration function and its 99% confidence interval and predicted points.

Session Info

sessioninfo::session_info("hyperSpec")
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       Ubuntu 22.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en
#>  collate  C.UTF-8
#>  ctype    C.UTF-8
#>  tz       UTC
#>  date     2024-03-07
#>  pandoc   3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────
#>  ! package        * version      date (UTC) lib source
#>    brio             1.1.4        2023-12-10 [2] RSPM
#>    callr            3.7.5        2024-02-19 [2] RSPM
#>    cli              3.6.2        2023-12-11 [2] RSPM
#>    colorspace       2.1-0        2023-01-23 [2] RSPM
#>    crayon           1.5.2        2022-09-29 [2] RSPM
#>    deldir           2.0-4        2024-02-28 [2] RSPM
#>    desc             1.4.3        2023-12-10 [2] RSPM
#>    diffobj          0.3.5        2021-10-05 [2] RSPM
#>    digest           0.6.34       2024-01-11 [2] RSPM
#>    dplyr            1.1.4        2023-11-17 [2] RSPM
#>    evaluate         0.23         2023-11-01 [2] RSPM
#>    fansi            1.0.6        2023-12-08 [2] RSPM
#>    farver           2.1.1        2022-07-06 [2] RSPM
#>    fs               1.6.3        2023-07-20 [2] RSPM
#>    generics         0.1.3        2022-07-05 [2] RSPM
#>    ggplot2        * 3.5.0        2024-02-23 [2] RSPM
#>    glue             1.7.0        2024-01-09 [2] RSPM
#>    gtable           0.3.4        2023-08-21 [2] RSPM
#>    hyperSpec      * 0.200.0.9000 2024-03-07 [1] local
#>    hySpc.testthat   0.2.1        2020-06-24 [2] RSPM
#>    interp           1.1-6        2024-01-26 [2] RSPM
#>    isoband          0.2.7        2022-12-20 [2] RSPM
#>    jpeg             0.1-10       2022-11-29 [2] RSPM
#>    jsonlite         1.8.8        2023-12-04 [2] RSPM
#>    labeling         0.4.3        2023-08-29 [2] RSPM
#>    lattice        * 0.22-5       2023-10-24 [4] CRAN (R 4.3.3)
#>    latticeExtra     0.6-30       2022-07-04 [2] RSPM
#>    lazyeval         0.2.2        2019-03-15 [2] RSPM
#>    lifecycle        1.0.4        2023-11-07 [2] RSPM
#>    magrittr         2.0.3        2022-03-30 [2] RSPM
#>    MASS             7.3-60.0.1   2024-01-13 [4] CRAN (R 4.3.3)
#>    Matrix           1.6-5        2024-01-11 [4] CRAN (R 4.3.3)
#>    mgcv             1.9-1        2023-12-21 [4] CRAN (R 4.3.3)
#>    munsell          0.5.0        2018-06-12 [2] RSPM
#>    nlme             3.1-164      2023-11-27 [4] CRAN (R 4.3.3)
#>    pillar           1.9.0        2023-03-22 [2] RSPM
#>    pkgbuild         1.4.3        2023-12-10 [2] RSPM
#>    pkgconfig        2.0.3        2019-09-22 [2] RSPM
#>    pkgload          1.3.4        2024-01-16 [2] RSPM
#>    png              0.1-8        2022-11-29 [2] RSPM
#>    praise           1.0.0        2015-08-11 [2] RSPM
#>    processx         3.8.3        2023-12-10 [2] RSPM
#>    ps               1.7.6        2024-01-18 [2] RSPM
#>    R6               2.5.1        2021-08-19 [2] RSPM
#>    RColorBrewer     1.1-3        2022-04-03 [2] RSPM
#>    Rcpp             1.0.12       2024-01-09 [2] RSPM
#>  R RcppEigen        <NA>         <NA>       [?] <NA>
#>    rematch2         2.1.2        2020-05-01 [2] RSPM
#>    rlang            1.1.3        2024-01-10 [2] RSPM
#>    rprojroot        2.0.4        2023-11-05 [2] RSPM
#>    scales           1.3.0        2023-11-28 [2] RSPM
#>    testthat         3.2.1        2023-12-02 [2] RSPM
#>    tibble           3.2.1        2023-03-20 [2] RSPM
#>    tidyselect       1.2.0        2022-10-10 [2] RSPM
#>    utf8             1.2.4        2023-10-22 [2] RSPM
#>    vctrs            0.6.5        2023-12-01 [2] RSPM
#>    viridisLite      0.4.2        2023-05-02 [2] RSPM
#>    waldo            0.5.2        2023-11-02 [2] RSPM
#>    withr            3.0.0        2024-01-16 [2] RSPM
#>    xml2             1.3.6        2023-12-04 [2] RSPM
#> 
#>  [1] /tmp/RtmpwJcUHX/temp_libpath18f414eab874
#>  [2] /home/runner/work/_temp/Library
#>  [3] /opt/R/4.3.3/lib/R/site-library
#>  [4] /opt/R/4.3.3/lib/R/library
#> 
#>  R ── Package was removed from disk.
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────

References