## 1 Introduction

This document discusses baseline correction methods that can be used with package hyperSpec. Package hyperSpec provides two fitting functions for polynomial baselines, spc_fit_poly() and spc_fit_poly_below(). Another possibility is spc_rubberband(), a “rubberband” method that determines support points by finding the convex hull of each spectrum. The baselines are then piecewise linear or (smoothing) splines through the support points.

Please note that a specialized package for baseline fitting, package baseline[1,2], exists that provides many more methods to fit baselines to spectroscopic data. Using package baseline with hyperSpec objects is demonstrated in vignette ("hyperspec").

## 2 Polynomial Baselines

In contrast to many other programs that provide baseline correction methods, package hyperSpec’s polynomial baseline functions do least squares fits. However, the baselines can be forced through particular points, if this behaviour is needed.

The main difference between the two functions is that spc_fit_poly() returns a least squares fit through the complete spectrum that is given in fit.to whereas spc_fit_poly_below() tries to find appropriate spectral regions to fit the baseline to.

### 2.1 Syntax and Parameters

spc_fit_poly(fit.to, apply.to = fit.to, poly.order = 1)

spc_fit_poly_below(fit.to, apply.to = fit.to, poly.order = 1, npts.min = NULL, noise = 0)
• fit.to: hyperSpec object with the spectra whose baselines are to be fitted.
• apply.to: hyperSpec object giving the spectral range, on which the baselines should be evaluated. If apply() is NULL, a hyperSpec object with the polynomial coefficients is returned instead of evaluated baselines.
• poly.order: polynomial order of the baselines.
• npts.min: minimal number of data points per spectrum to be used for the fit. Argument npts.min defaults to the larger of 3 times (poly.order + 1) or $$\frac{1}{20th}$$ of the number of data points per spectrum. If npts.min $$\leq$$ poly.order, a warning is issued and npts.min <- poly.order + 1 is used.
• noise: a vector giving the amount of noise, see below.

### 2.2 General Use

Both functions fit the polynomial to the spectral range given in hyperSpec object fit.to. If apply.to is not NULL, the polynomials are evaluated on the spectral range of apply.to. Otherwise, the polynomial coefficients are returned.

Subtracting the baseline is up to the user, it is easily done as hyperSpec provides the - (minus) operator.

### 2.3 Fitting Polynomial Baselines Using Least Squares

Commonly, baselines are fit using (single) support points that are specified by the user. Also, usually $$n + 1$$ support point is used for a polynomial of order $$n$$. This approach is appropriate for spectra with high signal to noise ratio.

Such a baseline can be obtained by restricting the spectra in fit.to to the respective points (see figure 2.1):

# Load Raman spectra
file <- system.file("extdata/chondro_mini.rds", package = "hyperSpec")
chondro_mini <- readRDS(file)
bl <- spc_fit_poly(chondro_mini[, , c(633, 1788)], chondro_mini)

plot(chondro_mini, plot.args = list(ylim = c(200, 600)), col = 1:2)
plot(chondro_mini[, , c(633, 1788)],
add = TRUE, col = 1:2,
lines.args = list(type = "p", pch = 20)
)
plot(bl, add = TRUE, col = 1:2)

However, if the signal to noise ratio is not ideal, a polynomial with $$n + 1$$ supporting points (i.e. with zero degrees of freedom) is subject to a considerable amount of noise. If on the other hand, more data points consisting of baseline only are available, the uncertainty on the polynomial can be reduced by a least squares fit.

Both spc_fit_poly() and spc_fit_poly_below() therefore provide least squares fitting for the polynomial.

Function spc_fit_poly() fits to the whole spectral region of fit.to. Thus, for baseline fitting the spectra need to be cut to appropriate wavelength ranges that do not contain any signal.

In order to speed up calculations, the least squares fit is done by using the Vandermonde matrix and solving the equation system by qr.solve().

This fit is not weighted. A spectral region with many data points therefore has greater influence on the resulting baseline than a region with just a few data points. It is up to the user to decide whether this should be corrected for by selecting appropriate numbers of data points (e.g. by using replicates of the shorter spectral region).

### 2.4 The Mechanism of Automatically Fitting the Baseline in spc_fit_poly_below()

Function spc_fit_poly_below() tries to automatically find appropriate spectral regions for baseline fitting. This is done by excluding spectral regions that contain signals from the baseline fitting. The idea is that all data points that lie above a fitted polynomial (initially through the whole spectrum, then through the remaining parts of the spectrum) will be treated as signal and thus be excluded from the baseline fitting.

The supporting points for the baseline polynomials are calculated iteratively:

1. A polynomial of the requested order is fit to the considered spectral range, initially to the whole spectrum given in fit.to
2. Only the parts of the spectrum that lie below this polynomial plus the noise are retained as supporting points for the next iteration.

These two steps are repeated until either:

• no further points are excluded, or
• the next polynomial would have less than npts.min supporting points.

The baselines and respective supporting points for each iteration of spc_fit_poly_below(chondro_mini[1], poly.order = 1) are shown in figure 2.2.

bl <- spc_fit_poly_below(chondro_mini[1], poly.order = 1, max.iter = 8, npts.min = 2, debuglevel = 3L)

### 2.5 Specifying the Spectral Range

plot(chondro_mini[1, , 1700 ~ 1750],
lines.args = list(type = "b"),
plot.args = list(ylim = range(chondro_mini[1, , 1700 ~ 1750]) + c(-50, 0))
)

bl <- spc_fit_poly_below(chondro_mini[1, , 1700 ~ 1750], NULL, poly.order = 1)
abline(bl[[]], col = "black")

plot(
chondro_mini[1, , 1720 ~ 1750],
lines.args = list(type = "p", pch = 19, cex = .6),
col = "blue",
)

bl <- spc_fit_poly_below(chondro_mini[1, , 1720 ~ 1750], NULL, poly.order = 1)
abline(bl[[]], col = "blue")

It is possible to exclude spectral regions that do not contribute to the baseline from the fitting, while the baseline is used for the whole spectrum. This selection of appropriate spectral regions is essential for spc_fit_poly(). But also spc_fit_poly_below() can benefit from narrower spectral ranges: the fitting gains speed. The default value for npts.min depends on the number of data points per spectrum. Thus one should also consider requiring more support points than the default value suggests.

system.time(spc_fit_poly_below(chondro_mini, NULL, npts.min = 20))
#>    user  system elapsed
#>   0.007   0.000   0.007
system.time(spc_fit_poly_below(chondro_mini[, , c(min ~ 700, 1700 ~ max)], NULL, npts.min = 20))
#>    user  system elapsed
#>   0.004   0.000   0.004

The choice of the spectral range in fit.to influences the resulting baselines to a certain extent, as becomes clear from figure 2.3.

### 2.6 Fitting Polynomials of Different Orders

bl <- spc_fit_poly_below(chondro_mini[1], poly.order = 0, debuglevel = 2, max.iter = 5)
#> Fitting with npts.min = 8
#> start: 150 support points
#> spectrum      1:    10 support points, noise = 0.0,   5 iterations
bl <- spc_fit_poly_below(chondro_mini[1], poly.order = 1, debuglevel = 2, max.iter = 5)
#> Fitting with npts.min = 8
#> start: 150 support points
#> Warning in spc_fit_poly_below(chondro_mini[1], poly.order = 1, debuglevel = 2, : Stopped after 5
#> iterations with 14 support points.
#> spectrum      1:    14 support points, noise = 0.0,   5 iterations
bl <- spc_fit_poly_below(chondro_mini[1], poly.order = 2, debuglevel = 2, max.iter = 5)
#> Fitting with npts.min = 9
#> start: 150 support points
#> spectrum      1:    11 support points, noise = 0.0,   5 iterations

Figures 2.4, 2.5, 2.6 show the resulting baseline polynomial of spc_fit_poly_below (chondro_mini [1], poly.order = order) with order $$=$$ 0 to 3 for the first spectrum of the chondro_mini data set.

### 2.7 The Noise Level

spc <- new("hyperSpec", spc = matrix(rnorm(100, mean = 100, sd = 2), ncol = 100))
noise <- 4
spc_fit_poly_below(spc, poly.order = 0, noise = noise, debuglevel = 2)
#> hyperSpec object
#>    1 spectra
#>    1 data columns
#>    100 data points / spectrum
plot(spc_fit_poly_below(spc, poly.order = 0), col = "black", add = T)
bl <- spc_fit_poly_below(chondro_mini[1], poly.order = 1, noise = 15, max.iter = 8, debuglevel = 3L)
#> Warning in spc_fit_poly_below(chondro_mini[1], poly.order = 1, noise = 15, : Stopped after 8
#> iterations with 13 support points.

Besides defining a minimal number of supporting points, a “noise level” may be given. Consider a spectral range consisting only of noise. The upper part of figures 2.7, 2.8 illustrate the problem. As the baseline fitting algorithm cannot distinguish between noise and real bands appearing above the fitted polynomial, the resulting baseline (black) is too low if the noise parameter is not given.

Setting the noise level to 4 (2 standard deviations), the fitting converges immediately with a much better result. The resulting baselines for spc_fit_poly_below (chondro_mini [1], poly.order = 1, noise = 12) of the whole spectrum are shown in the middle and lower part of figure 2.8.

The noise may be a single value for all spectra, or a vector with the noise level for each of the spectra separately.

## 3 Rubberband Method

Particularly Raman spectra often show increasing background towards $$\Delta\tilde\nu = 0$$. In this case, polynomial baselines often either need high order or residual background is left in the spectra.

In that case, smoothing splines fitted through the supporting points are a good alternative.

For the paracetamol spectrum (fig. 3.1, 3.2), a noise level of 300 counts and the equivalent of 20 degrees of freedom work well.

bl <- spc_rubberband(paracetamol[, , 175 ~ 1800], noise = 300, df = 20)
plot(paracetamol[, , 175 ~ 1800])
plot(bl, add = TRUE, col = 2)
plot(paracetamol[, , 175 ~ 1800] - bl)

However, there is possibly some background left between 1200 and 1750 cm-1 where the original spectrum is slightly concave. This can be corrected by bending the spectrum before applying the rubberband correction (fig. 3.3, 3.4):

bend <- 5e4 * wl_eval(paracetamol[, , 175 ~ 1800], function(x) x^2, normalize.wl = normalize01)
bl <- spc_rubberband(paracetamol[, , 175 ~ 1800] + bend) - bend
plot(paracetamol[, , 175 ~ 1800] + bend)
plot(bl + bend, add = T, col = 2)
plot(paracetamol[, , 175 ~ 1800] - bl)

## Session Info

sessioninfo::session_info("hyperSpec")
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.3 (2022-03-10)
#>  os       macOS Big Sur/Monterey 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language en
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       UTC
#>  date     2022-04-06
#>  pandoc   2.7.3 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────
#>  package        * version      date (UTC) lib source
#>  brio             1.1.3        2021-11-30 [1] CRAN (R 4.1.0)
#>  callr            3.7.0        2021-04-20 [1] CRAN (R 4.1.0)
#>  cli              3.2.0        2022-02-14 [1] CRAN (R 4.1.2)
#>  colorspace       2.0-3        2022-02-21 [1] CRAN (R 4.1.2)
#>  crayon           1.5.1        2022-03-26 [1] CRAN (R 4.1.2)
#>  desc             1.4.1        2022-03-06 [1] CRAN (R 4.1.2)
#>  diffobj          0.3.5        2021-10-05 [1] CRAN (R 4.1.0)
#>  digest           0.6.29       2021-12-01 [1] CRAN (R 4.1.0)
#>  dplyr            1.0.8        2022-02-08 [1] CRAN (R 4.1.2)
#>  ellipsis         0.3.2        2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate         0.15         2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi            1.0.3        2022-03-24 [1] CRAN (R 4.1.2)
#>  farver           2.1.0        2021-02-28 [1] CRAN (R 4.1.0)
#>  generics         0.1.2        2022-01-31 [1] CRAN (R 4.1.2)
#>  ggplot2        * 3.3.5        2021-06-25 [1] CRAN (R 4.1.0)
#>  glue             1.6.2        2022-02-24 [1] CRAN (R 4.1.2)
#>  gtable           0.3.0        2019-03-25 [1] CRAN (R 4.1.0)
#>  hyperSpec      * 0.200.0.9000 2022-04-06 [1] local
#>  hySpc.testthat   0.2.1.9000   2022-04-06 [1] local
#>  isoband          0.2.5        2021-07-13 [1] CRAN (R 4.1.0)
#>  jpeg             0.1-9        2021-07-24 [1] CRAN (R 4.1.0)
#>  jsonlite         1.8.0        2022-02-22 [1] CRAN (R 4.1.2)
#>  labeling         0.4.2        2020-10-20 [1] CRAN (R 4.1.0)
#>  lattice        * 0.20-45      2021-09-22 [2] CRAN (R 4.1.3)
#>  latticeExtra     0.6-29       2019-12-19 [1] CRAN (R 4.1.0)
#>  lazyeval         0.2.2        2019-03-15 [1] CRAN (R 4.1.0)
#>  lifecycle        1.0.1        2021-09-24 [1] CRAN (R 4.1.0)
#>  magrittr         2.0.3        2022-03-30 [1] CRAN (R 4.1.2)
#>  MASS             7.3-55       2022-01-16 [2] CRAN (R 4.1.3)
#>  Matrix           1.4-0        2021-12-08 [2] CRAN (R 4.1.3)
#>  mgcv             1.8-39       2022-02-24 [2] CRAN (R 4.1.3)
#>  munsell          0.5.0        2018-06-12 [1] CRAN (R 4.1.0)
#>  nlme             3.1-155      2022-01-16 [2] CRAN (R 4.1.3)
#>  pillar           1.7.0        2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig        2.0.3        2019-09-22 [1] CRAN (R 4.1.0)
#>  pkgload          1.2.4        2021-11-30 [1] CRAN (R 4.1.0)
#>  png              0.1-7        2013-12-03 [1] CRAN (R 4.1.0)
#>  praise           1.0.0        2015-08-11 [1] CRAN (R 4.1.0)
#>  processx         3.5.3        2022-03-25 [1] CRAN (R 4.1.2)
#>  ps               1.6.0        2021-02-28 [1] CRAN (R 4.1.0)
#>  purrr            0.3.4        2020-04-17 [1] CRAN (R 4.1.0)
#>  R6               2.5.1        2021-08-19 [1] CRAN (R 4.1.0)
#>  RColorBrewer     1.1-3        2022-04-03 [1] CRAN (R 4.1.2)
#>  rematch2         2.1.2        2020-05-01 [1] CRAN (R 4.1.0)
#>  rlang            1.0.2        2022-03-04 [1] CRAN (R 4.1.2)
#>  rprojroot        2.0.3        2022-04-02 [1] CRAN (R 4.1.2)
#>  rstudioapi       0.13         2020-11-12 [1] CRAN (R 4.1.0)
#>  scales           1.1.1        2020-05-11 [1] CRAN (R 4.1.0)
#>  testthat         3.1.3        2022-03-29 [1] CRAN (R 4.1.2)
#>  tibble           3.1.6        2021-11-07 [1] CRAN (R 4.1.0)
#>  tidyselect       1.1.2        2022-02-21 [1] CRAN (R 4.1.2)
#>  utf8             1.2.2        2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs            0.4.0        2022-03-30 [1] CRAN (R 4.1.2)
#>  viridisLite      0.4.0        2021-04-13 [1] CRAN (R 4.1.0)
#>  waldo            0.4.0        2022-03-16 [1] CRAN (R 4.1.2)
#>  withr            2.5.0        2022-03-03 [1] CRAN (R 4.1.2)
#>  xml2             1.3.3        2021-11-30 [1] CRAN (R 4.1.0)
#>
#>  [1] /Users/runner/work/_temp/Library
#>  [2] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────

## References

[1] K.H. Liland, B.-H. Mevik, Baseline: Baseline correction of spectra, 2020. https://CRAN.R-project.org/package=baseline.

[2] K.H. Liland, T. Almøy, B.-H. Mevik, Optimal choice of baseline correction for multivariate calibration of spectra, Applied Spectroscopy. 64 (2010) 1007–1016.