vignettes/fileio.Rmd
Consistent naming of file import functions in version >= 0.99
From now on, all import functions have names starting with
read.*()
. All functions previously named
scan.*()
have been renamed accordingly.
Supported File Formats
Package hyperSpec supports several file formats relevant for different types of spectroscopy. This list is naturally only a subset of the file formats produced by different spectroscopic equipment.
If you use package hyperSpec with data formats not mentioned in this document, please open a new issue in hyperSpec’s GitHub repository so that this document can be updated. The information should include:
If you need help finding out how to import your data, please search
and, if necessary, ask on Stackexchange with tags [r]
and
[spectroscopy]
.
Reproducing the Examples in this Manual
To run the code examples, create a folder named “fileio” in your
working directory and copy the contents of the directory https://github.com/r-hyperspec/hyperSpec/tree/r-hyperspec/Vignettes/fileio.
This online directory contains the required datasets (via
git-lfs
).
This document describes how to import files containing spectra into hyperSpec
objects, as well as exporting hyperSpec
objects as files.
The most basic function to create hyperSpec
objects is new("hyperSpec")
(section 2).
It makes a hyperSpec
object from data already in R’s workspace.
Thus, after spectra importation into R, conversion to hyperSpec
objects is straightforward.
Additionally, the hyperSpec package comes with predefined import functions for different data formats.
This document divides the discussion into dealing with ASCII files (section 5) and binary file formats (section 6).
If data export for the respective format is possible, we discuss it in the same sections.
As sometimes the actual data written by the spectrometer software exhibits peculiarities, package hyperSpec offers several specialized import functions.
In general, the naming convention is the data format followed by the manufacturer (e.g., read.ENVI.Nicolet
).
Overview lists of the directly supported file formats are in the appendix: sorted by file format (9.1), manufacturer (9.2), and by spectroscopy (9.3).
hyperSpec
object with new()
To create a hyperSpec
object from data in R’s workspace, use:
spc <- new("hyperSpec", spc, wavelength, data, labels)
With the arguments:
Argument | Description |
---|---|
spc | the spectra matrix (may also be given as a matrix inside column $spc of data) |
wavelength | the wavelength axis vector |
data | the extra data (possibly already including the spectra matrix in column spc) |
labels | a list with the proper labels. Do not forget the wavelength axis label in $.wavelength and the spectral intensity axis label in $spc. |
Thus, once the data is in R’s workspace, creating a hyperSpec
object is easy.
We suggest wrapping the code that imports the data and the line that joins it into a hyperSpec
object in a user-created import function.
Users are more than welcome to contribute such import code to package hyperSpec.
Section 8 discusses examples of custom import functions.
hyperSpec
Object from a Data Matrix (Spectra Matrix)
As spectra matrices are the internal format of hyperSpec
, the constructor can directly be used:
spc <- new("hyperSpec", spc, wavelength, data, labels)
hyperSpec
Object from a Data Cube (Spectra Array)
Roberto Moscetti asked how to convert a hyperspectral data cube into a hyperSpec
object:
The problem is that I have a hypercube with the following dimensions: 67 × 41 × 256 y = 67 x = 41 wavelengths = 256
I do not know the way to import the hypercube.
Data cubes (i.e., 3-dimensional arrays of spectral data) result from spectral imaging measurements, where spectra are supplied for each pixel of an \(px.x × px.y\) imaging area. They have three directions, usually \(x\), \(y\), and the spectral dimension.
The solution is to convert the array into a spectra matrix and have separate \(x\) and \(y\) coordinates.
Assume data
is the data cube, and x
, y
and wl
hold vectors with the proper \(x\) and \(y\) coordinates and the wavelengths:
data <- array(1:24, 4:2)
wl <- c(550, 630)
x <- c(1000, 1200, 1400)
y <- c(1800, 1600, 1400, 1200)
data
Such data can be converted into a hyperSpec
object by:
d <- dim(data)
dim(data) <- c(d[1] * d[2], d[3])
x <- rep(x, each = d[1])
y <- rep(y, d[2])
spectra <- new("hyperSpec",
spc = data,
data = data.frame(x, y), wavelength = wl
)
If no proper coordinates (vectors x
, y
and wl
) are available, they can be left out.
If \(x\) and \(y\) are missing, map plotting will be impossible; missing wavelengths will automatically be replaced by column indices counting from 1 to d[3].
Of course, such sequences (the row/column/pixel numbers) can be used instead of the original x
and y
as well:
Data cubes often come from spectral imaging systems that use an “image” coordinate system counting \(y\) from top to bottom.
Note that this should be accounted for in the decreasing order of the original y
vector.
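The unfolding of a data cube can be checked on a toy example in base R. The following is a minimal sketch, assuming pixel indices are used in place of physical coordinates; the names cube, px, and spc_mat are made up for illustration and are not part of hyperSpec.

```r
# Toy data cube: 4 rows (y) x 3 columns (x) x 2 wavelengths
cube <- array(1:24, dim = c(4, 3, 2))
d <- dim(cube)

# Pixel indices stand in for physical coordinates.
# expand.grid() varies its first argument fastest, which matches the
# column-major order in which R stores the cube.
px <- expand.grid(y = seq_len(d[1]), x = seq_len(d[2]))

# Unfold the cube: one row per pixel, one column per wavelength
spc_mat <- cube
dim(spc_mat) <- c(d[1] * d[2], d[3])

# Check: the spectrum in row i belongs to pixel (px$y[i], px$x[i])
i <- 7
same <- identical(spc_mat[i, ], cube[px$y[i], px$x[i], ])

# With package hyperSpec attached, the object could then be created by
# new("hyperSpec", spc = spc_mat, data = px)
```

The check in the last step is a cheap way to make sure that the coordinate vectors and the unfolded spectra matrix use the same pixel order before the object is created.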
hyperSpec
Object
Many of the functions described below work on one file, even though derived functions such as read.spc.KaiserMap()
(see section 7.5.2) may take care of measurements consisting of multiple files.
Usually, the most convenient way to import multiple files into one hyperSpec
object is reading all files into a list of hyperSpec
objects, and then collapse()
ing this list into a single hyperSpec
object:
files <- Sys.glob("fileio/spc.Kaisermap/*.spc")
files <- files[seq(1, length(files), by = 2)] # import low wavenumber region only
spc <- lapply(files, read.spc)
length(spc)
spc[[1]]
spc <- collapse(spc)
spc
Note that in this particular case, the spectra are more efficiently read by read.spc.KaiserMap()
(see section 7.5.2).
If one regularly imports huge maps or images, writing a customized import function is highly encouraged. Users may gain speed and memory by using the internal workhorse functions for the file import. In that case, please contact the package maintainer (hyperSpec’s GitHub repository) for advice. Contributions to package hyperSpec are welcome, and all authors are listed appropriately in the author section of the function’s help page.
Currently, hyperSpec
provides two functions for general ASCII data import:
read.txt.long()
imports long format ASCII files, i.e., one intensity value per row.
read.txt.wide()
imports wide format ASCII files, i.e., one spectrum per row.
The import functions immediately return a hyperSpec
object.
Internally, they use read.table()
, a very powerful ASCII import function.
R supplies another ASCII import function, scan()
.
Function scan()
imports numeric data matrices and is faster than read.table()
, but cannot import column names.
If the data does not contain a header, or the header is not important and can safely be skipped, you may want to import the data using scan()
.
Note that R allows the use of a variety of compressed file formats directly as ASCII files (for example, see section 7.6).
Also, both read.txt.long()
and read.txt.wide()
accept connections instead of file names.
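The difference between the two base R import functions can be seen on a small headerless file. This is a minimal sketch with made-up data written to a temporary file; only base R functions are used, and the column names are chosen for illustration.

```r
# Hypothetical long-format data: wavelength and intensity, no header
tmp <- tempfile(fileext = ".txt")
writeLines(c("500 1.5", "510 2.5", "520 3.5"), tmp)

# scan() returns a plain numeric vector, reshaped here into two columns
m <- matrix(scan(tmp, quiet = TRUE), ncol = 2, byrow = TRUE)

# read.table() returns a data.frame; here it reads from a connection
# (it opens the connection and closes it again when done)
df <- read.table(file(tmp), col.names = c("wavelength", "spc"))
```

Both calls yield the same numbers; scan() skips the overhead of building a data.frame, while read.table() keeps track of column names and types.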
Richard Pena asked about importing another ASCII file type:
File
Triazine5_31.txt
corresponds to X ray powder diffraction data (Bruker AXS). The native files data “ra” are read with EVA software then they are converted into .uxd
file with the File Exchange software (Bruker AXS). The .uxd
file are opened with Excel software and saved as .txt
file, .csv
file (ChemoSpec) or .xls
. The first and following columns corresponds to the angle diffraction and the intensity values of samples respectively.
Thus, this file differs from the ASCII formats discussed above in that the samples are actually in columns whereas hyperSpec
expects them to be in rows.
The header line gives the name of the sample.
Import is straightforward, and just the spectra matrix needs to be transposed to make a hyperSpec
object:
file <- read.table("fileio/txt.t/Triazine 5_31.txt", header = TRUE, dec = ",", sep = "\t")
triazine <- new("hyperSpec",
wavelength = file[, 1], spc = t(file[, -1]),
data = data.frame(sample = colnames(file[, -1])),
labels = list(
.wavelength = expression(2 * theta / degree),
spc = "I / a.u."
)
)
triazine
plot(triazine[1])
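The transposition step can be tried without the original file. The following is a minimal sketch in base R, assuming a made-up tab-separated table with decimal commas and one sample per column; the sample names and values are invented for illustration.

```r
# Hypothetical file content: first column is the x axis, each further
# column holds one sample; decimal comma, tab separated
txt <- c(
  "angle\tsampleA\tsampleB",
  "5,0\t10,5\t20,5",
  "5,1\t11,5\t21,5",
  "5,2\t12,5\t22,5"
)
raw <- read.table(text = txt, header = TRUE, dec = ",", sep = "\t")

wavelength <- raw[, 1]
spc_mat <- t(as.matrix(raw[, -1])) # samples become rows
samples <- colnames(raw)[-1]

dim(spc_mat) # number of spectra x number of data points
```

After the transposition, spc_mat has one row per sample, which is the layout expected for the spc argument of new("hyperSpec").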
Witec also saves ASCII data with spectra in columns (Export \(\rightarrow\) Table), see 7.9.
The NIST (National Institute of Standards and Technology) has published a data base of basic atomic emission spectra see http://physics.nist.gov/PhysRefData/Handbook/periodictable.htm with emission lines tabulated in ASCII (HTML) files.
Here’s an example how to extract the data of the Hg strong lines file:
file <- readLines("fileio/NIST/mercurytable2.htm")
# file <- readLines("http://physics.nist.gov/PhysRefData/Handbook/Tables/mercurytable2.htm")
file <- file[-(1:grep("Intensity.*Wavelength", file) - 1)]
file <- file[1:(grep("</pre>", file)[1] - 1)]
file <- gsub("<[^>]*>", "", file)
file <- file[!grepl("^[[:space:]]+$", file)]
colnames <- file[1]
colnames <- gsub("[[:space:]][[:space:]]+", "\t", file[1])
colnames <- strsplit(colnames, "\t")[[1]]
if (!all(colnames == c("Intensity", "Wavelength (Å)", "Spectrum", "Ref. "))) {
stop("file format changed!")
}
tablestart <- grep("^[[:blank:]]*[[:alpha:]]+$", file) + 1
tableend <- c(tablestart[-1] - 2, length(file))
tables <- list()
for (t in seq_along(tablestart)) {
tmp <- file[tablestart[t]:tableend[t]]
tables[[t]] <- read.fwf(textConnection(tmp), c(5, 8, 12, 15, 9))
colnames(tables[[t]]) <- c("Intensity", "persistent", "Wavelength", "Spectrum", "Ref. ")
tables[[t]]$type <- gsub("[[:space:]]", "", file[tablestart[t] - 1])
}
tables <- do.call(rbind, tables)
tables$Spectrum <- gsub(" ", "", tables$Spectrum) # works for character as well as factor columns
Hg.AES <- list()
for (s in levels(as.factor(tables$Spectrum))) {
Hg.AES[[s]] <- new("hyperSpec",
wavelength = tables$Wavelength[tables$Spectrum == s],
spc = tables$Intensity[tables$Spectrum == s],
data = data.frame(Spectrum = s),
label = list(
.wavelength = expression(lambda / ring(A)),
spc = "I"
)
)
}
Matlab files can be read and written using the package R.matlab, which is available at CRAN and can be installed by install.packages("R.matlab")
.
library("R.matlab")
spc.mat <- readMat("fileio/spectra.mat")
If the .mat
file was saved with compression, the additional package Rcompression is needed.
It can be installed from omegahat:
install.packages("Rcompression", repos = "http://www.omegahat.org/R")
See the documentation of package R.matlab for more details and possibly needed further packages.
Function readMat()
imports the .mat
file’s contents as a list.
The variables in the .mat
file are appropriately named elements of the list.
The hyperSpec
object can be created using new()
, see section 2.
Again, users probably want to wrap the import of their Matlab files into a function.
Package R.matlab’s function writeMat()
can be used to write R objects into .mat
files.
To save a hyperSpec
object x
for use in Matlab, you most likely want to save the wavelength axis wl(x)
, the spectra matrix x[[]]
, and the extra data x$..
, possibly together with the labels labels(x)
. Alternatively, x$.
yields the extra data together with the spectra matrix.
However, it may be convenient to transform the saved data according to how it is needed in Matlab.
The functions as.long.df()
and as.wide.df()
may prove useful for reshaping the data.
A custom import function for .mat
files written by Cytospec is available:
Note that Cytospec files can contain multiple versions of the data, the so-called blocks.
The block to be read can be specified with the blocks
argument.
With blocks = TRUE
, the function will read all blocks into a list:
read.mat.Cytospec("fileio/mat.cytospec/cytospec.mat", blocks = TRUE)
Otherwise, select a single block:
read.mat.Cytospec("fileio/mat.cytospec/cytospec.mat", blocks = 1)
Function read.cytomat()
is now defunct.
It has been renamed to read.mat.Cytospec()
to be more consistent with the general naming scheme of the file import
functions.
Please use read.mat.Cytospec()
instead.
ENVI files are binary data accompanied by an ASCII header file.
Package hyperSpec’s function read.ENVI()
can be used to import them.
Usually, the header file name is the same as the binary data file name, with the suffix replaced by .hdr
. Otherwise, the header file name can be given via parameter headerfile
.
As we experienced missing header files (Bruker’s Opus software frequently produced header files without any content), the data that would usually be read from the header file can also be handed to read.ENVI()
as a list in parameter header
.
Arguments given in header
replace the corresponding entries of the header file.
The help page gives details on what elements the list should contain, see also the discussion of ENVI files written by Bruker’s OPUS software (section 7.2).
Here is how to use read.ENVI()
:
spc <- read.ENVI("fileio/ENVI/example2.img")
spc
Please see also the manufacturer specific notes in section 7.1.
spc
Files
Thermo Galactic’s .spc
file format can be imported by read.spc()
.
Official File Format Documentation
The specification used to be available at Thermo Scientific. Anyone knowing where it moved please contact me (hyperSpec’s GitHub repository) — I’m looking for a reasonably official website (i.e. at Thermo) rather than some random site with a copy.
A variety of sub-formats exists. Package hyperSpec’s import function read.spc()
does not support the old file format that was used before 1996.
In addition, no test data with w planes was available — thus, the import of such files could not be tested.
If you come across such files, please contact the package maintainer (hyperSpec’s GitHub repository).
The header and subheader blocks of spc files store additional information of pre-defined types (see the file format specification[1]). Further information can be stored in the so-called log block at the end of the file and should be in a key-value format (although even the official example files do not always follow it). This information is often useful (Kaiser’s Hologram software, e.g., stores the stage position in the log block).
Function read.spc()
has four arguments that allow fine-grained control of storing such information in the hyperSpec
object:
Argument | Description |
---|---|
keys.hdr2data | parameters from the spc file and subfile headers that should become extra data columns. |
keys.log2data | parameters from the spc file log block that should become extra data columns. |
keys.*2log | deprecated, because the logbook itself is deprecated. |
The value of these arguments can either be logical (amounting to either use all or none of the information in the file) or a character vector giving the names of the parameters that should be used. Note that the header file field names are always lowercase.
Here is how to find out what extra information could be read from the header and log:
read.spc("fileio/spc.Kaisermap/ebroAVII.spc", keys.hdr2data = TRUE)
read.spc("fileio/spc.Kaisermap/ebroAVII.spc", keys.log2data = TRUE)
.spc
files may contain multiple spectra that do not share a common wavelength axis.
In this case, read.spc()
returns a list of hyperSpec
objects with one spectrum each.
Function collapse()
may be used to combine this list into one hyperSpec
object:
barbiturates <- read.spc("fileio/spc/BARBITUATES.SPC")
class(barbiturates)
length(barbiturates)
barbiturates <- collapse(barbiturates, collapse.equal = FALSE)
barbiturates
barbiturates[[, , 25 ~ 30]]
Many spectrometer manufacturers provide a function to export their spectra into ASCII files. The functions discussed above are written in a very general way and are highly customizable. We recommend wrapping these calls with the appropriate settings for the spectra format in an import function. Please consider contributing such import filters to package hyperSpec: send us the documented code (for details, see the box at the beginning of this document). If there is any format not mentioned in this document (even without the need of new converters), please let me know (details again in the box at the beginning of this document).
We use read.ENVI()
to import IR-Images collected with a Bruker Hyperion spectrometer with OPUS software.
As mentioned above, the header files are frequently empty.
We found the necessary information to be:
header <- list(
samples = 64 * no.images.in.row,
lines = 64 * no.images.in.column,
bands = no.data.points.per.spectrum,
`data type` = 4,
interleave = "bip"
)
No spatial information is given in the ENVI header (if written). The lateral coordinates can be set up by specifying origin and pixel size for \(x\) and \(y\) directions. For details, please see the help page.
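To make the header list concrete, here is a minimal sketch with made-up measurement dimensions. The three no.* values are assumptions chosen for illustration; the size estimate assumes that the binary file contains nothing but the data (no header offset).

```r
# Hypothetical measurement: 2 x 3 tiles of 64 x 64 pixels each,
# 1000 data points per spectrum
no.images.in.row <- 2
no.images.in.column <- 3
no.data.points.per.spectrum <- 1000

header <- list(
  samples = 64 * no.images.in.row,
  lines = 64 * no.images.in.column,
  bands = no.data.points.per.spectrum,
  `data type` = 4, # ENVI code for 32 bit floating point
  interleave = "bip" # band interleaved by pixel
)

# Expected size of the binary file in bytes (4 bytes per value),
# useful as a plausibility check against file.size()
n_bytes <- header$samples * header$lines * header$bands * 4
```

Comparing n_bytes with the actual file size is a quick way to notice a wrong guess for the number of tiles or data points before the import is attempted.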
The proprietary file format of the Opus software is not yet supported.
Also, Nicolet saves imaging data in ENVI files.
These files use some non-standard keywords in the header file that can be used to reconstruct the lateral coordinates and the wavelength axis, as well as the units for the wavelength and intensity axes.
Package hyperSpec has a specialized function read.ENVI.Nicolet()
that uses these header entries.
It seems that the position of the first spectrum is recorded in \(\mu m\), while the pixel size is in mm.
Thus a flag nicolet.correction
is provided that multiplies the pixel size by 1000.
Alternatively, the correct offset and pixel size values may be given as function arguments.
spc <- read.ENVI.Nicolet("fileio/ENVI/example2.img", nicolet.correction = TRUE)
spc ## dummy sample with all intensities zero
Spectra obtained using Kaiser’s Hologram software can be saved in their own .hol
format and imported into Matlab (from where the data may be written to a .mat
file readable by package R.matlab’s readMat()
).
Hologram can also write ASCII files and .spc
files.
We found working with .spc files the best option.
Hologram usually interpolates the spectra to an evenly spaced wavelength (or \(\Delta\tilde\nu\)) axis unless the spectra are saved in a by-pixel manner.
In this case, the full spectra consist of two files with consecutive file names: one for the low and one for the high wavenumber region.
See the example for .spc
import.
The ASCII files are long format that can be imported by read.txt.long()
(see section 5).
We experienced two different problems with these files:
2,
).
This may be a problem for certain conversion functions (read.table()
works fine, though).
Thus care must be taken:
## 1. import as character
tmp <- scan("fileio/txt.Kaiser/test-lo-4.txt", what = rep("character", 4), sep = ",")
tmp <- matrix(tmp, nrow = 4)
## 2. concatenate every two columns by a dot
wl <- apply(tmp[1:2, ], 2, paste, collapse = ".")
spc <- apply(tmp[3:4, ], 2, paste, collapse = ".")
## 3. convert to numeric and create hyperSpec object
spc <- new("hyperSpec", spc = as.numeric(spc), wavelength = as.numeric(wl))
spc
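The three steps can be followed on two made-up records. This is a minimal sketch in base R, assuming each record consists of a wavelength and an intensity that both use a decimal comma, so that splitting at commas yields four fields per record; the numbers are invented.

```r
# Hypothetical file content: "wavelength,intensity" with decimal commas
txt <- c("100,5,1234,25", "101,5,1300,75")

## 1. import as character, one column of the matrix per record
tmp <- scan(text = txt, what = "character", sep = ",", quiet = TRUE)
tmp <- matrix(tmp, nrow = 4)

## 2. join integer and fractional parts by a dot
wl <- apply(tmp[1:2, ], 2, paste, collapse = ".")
spc <- apply(tmp[3:4, ], 2, paste, collapse = ".")

## 3. convert to numeric
wl <- as.numeric(wl)
spc <- as.numeric(spc)
```

The approach relies on every value having exactly one decimal comma; a value written without a fractional part would shift the fields and should be caught by checking that the number of fields is a multiple of four.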
Package hyperSpec provides the function read.spc.KaiserMap()
to easily import spatial collections of
.spc
files written by Kaiser’s Hologram software.
The filenames of all .spc
files to be read into one hyperSpec
object can be provided either as a character vector or as a wildcard expression (e.g., "path/to/files/*.spc"
).
The data for the following example was saved with the wavelength axis being camera pixels rather than the Raman shift, so Hologram saved two files for each spectrum. A file name pattern is therefore difficult to give, and a vector of file names is used instead:
files <- Sys.glob("fileio/spc.Kaisermap/*.spc")
spc.low <- read.spc.KaiserMap(files[seq(1, length(files), by = 2)])
spc.high <- read.spc.KaiserMap(files[seq(2, length(files), by = 2)])
wl(spc.high) <- wl(spc.high) + 1340
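spc <- cbind(spc.low, spc.high) # join both spectral ranges (assumes the cbind() method for hyperSpec objects)
spc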
Renishaw’s Wire software comes with a file format converter.
This program can produce a long ASCII format, .spc
, or .jdx
files.
We experienced that the conversion to .spc
is not fully reliable: maps were saved as depth profiles, losing all spatial information.
Also, an evenly spaced wavelength axis was produced, although this was de-selected in the converter.
We, therefore, recommend using the ASCII format.
Otherwise the import using read.spc()
worked.
An optimized import function for the ASCII files is available: read.txt.Renishaw()
.
The file may be compressed via gzip, bzip2, xz or lzma.
Zip compressed files are read via read.zip.Renishaw()
.
The ASCII files can easily become very large, particularly with linefocus or streamline imaging.
Function read.txt.Renishaw()
provides two mechanisms to avoid running out of memory during data import.
The file may be imported in chunks of a given number of lines (see the last example).
Function read.txt.Renishaw()
can calculate the correct number of wavelengths (i.e., data points per spectrum) if the system command wc
is available on your computer.
Also, the processing of the long ASCII format into the spectra matrix is done by reshaping the vector of intensities into a matrix.
This process does not allow any missing values in the data.
Therefore it is not possible to import multi-spectra files with individually “zapped” spectra using read.txt.Renishaw()
.
The second argument to read.txt.Renishaw()
decides what type of experiment is imported.
Supported types are:
Type | Description |
---|---|
"xyspc" | maps, images, multiple spectra with \(x\) and \(y\) coordinates (default) |
"spc" | single spectrum |
"depth", "zspc" | depth series |
"ts" | time series |
Instead of a file name, read.txt.Renishaw()
also accepts a connection.
paracetamol <- read.txt.Renishaw("fileio/txt.Renishaw/paracetamol.txt", "spc")
paracetamol
read.txt.Renishaw("fileio/txt.Renishaw/laser.txt.gz", data = "ts")
Very large files can be read in chunks to save memory:
read.txt.Renishaw("fileio/txt.Renishaw/chondro.txt", nlines = 1e5, nspc = 875)
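The idea behind chunked import can be sketched in base R. The following is an illustration of the general technique only, not of how read.txt.Renishaw() is implemented; the file content and the chunk size of 4 lines are made up.

```r
# Hypothetical long-format file with 10 data lines
tmp <- tempfile(fileext = ".txt")
writeLines(sprintf("%d\t%d", 1:10, (1:10) * 2L), tmp)

con <- file(tmp, open = "r")
chunks <- list()
repeat {
  lines <- readLines(con, n = 4) # at most 4 lines per chunk
  if (length(lines) == 0) break # end of file reached
  # only the current chunk is parsed from text at any time
  chunks[[length(chunks) + 1]] <- read.table(text = lines, sep = "\t")
}
close(con)

result <- do.call(rbind, chunks)
```

Because the connection stays open between calls, each readLines() continues where the previous one stopped, so the whole file never has to be held in memory as text.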
R accepts a variety of compressed file formats for ASCII files:
read.txt.Renishaw("fileio/txt.Renishaw/chondro.gz")
read.txt.Renishaw("fileio/txt.Renishaw/chondro.xz")
read.txt.Renishaw("fileio/txt.Renishaw/chondro.lzma")
read.txt.Renishaw("fileio/txt.Renishaw/chondro.bz2")
read.zip.Renishaw("fileio/txt.Renishaw/chondro.zip")
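That base R reads compressed ASCII files transparently can be verified with a small gzip file. This is a minimal sketch with made-up data; the column names are chosen for illustration.

```r
# Write a small gzip-compressed ASCII file
tmp <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp, open = "w")
writeLines(c("500 1.5", "510 2.5"), con)
close(con)

# read.table() is given the plain file name; the compression is
# detected when the file is opened for reading
df <- read.table(tmp, col.names = c("wavelength", "spc"))
```

Zip archives are a different case, since they are containers rather than compressed streams; this is why a separate function is provided for them.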
Horiba’s Labspec software (e.g., LabRAM spectrometers) saves spectra in a wide ASCII format which is read by read.txt.Horiba()
, e.g.:
spc <- read.txt.Horiba("fileio/txt.HoribaJobinYvon/ts.txt",
cols = list(
t = "t / s", spc = "I / a.u.",
.wavelength = expression(Delta * tilde(nu) / cm^-1)
)
)
spc
Note that Labspec .txt
files can contain lots of spectra with zero intensity: Labspec saves a complete rectangular grid even if only part of a map was measured.
These spectra are removed if option file.remove.emptyspc
is TRUE
(the default).
For convenience, further wrapper functions to import maps (read.txt.Horiba.xy()
) and time series (read.txt.Horiba.t()
) are provided.
spc <- read.txt.Horiba.xy("fileio/txt.HoribaJobinYvon/map.txt")
if (any(dim(spc) != c(141, 4, 616)) ||
any(abs(spc) < .Machine$double.eps^.5) ||
is.null(spc$x) || any(is.na(spc$x)) ||
is.null(spc$y) || any(is.na(spc$y)) ||
length(setdiff(wl(spc), 1:616)) == 0L) {
stop("error in testing read.txt.Horiba.xy. Please contact ", maintainer("hyperSpec"))
}
spc
Andor Solis exports ASCII files that can be read with read.asc.Andor()
:
read.asc.Andor("fileio/asc.Andor/ASCII-Andor-Solis.asc")
The Witec project software supports exporting spectra as Thermo Galactic .spc
files.
.spc
is in general the recommended format for package hyperSpec import.
For imaging data, no spatial information for the set of spectra is provided (in version 2.10, this export option is not supported).
Imaging data (but also single spectra and time series) can be exported as ASCII X and Y files (Save ASCII X and Save ASCII Y, not supported in version 4).
These can be read by read.dat.Witec()
:
read.dat.Witec("fileio/txt.Witec/Witec-timeseries-x.dat")
read.dat.Witec(
filex = "fileio/txt.Witec/Witec-Map-x.dat",
points.per.line = 5, lines.per.image = 5, type = "map"
)
Note that the Y data files also contain wavelength information, but (at least Witec Project 2.10) this information is always wavelength in nm, not Raman shift in wavenumbers: this is provided by the X data file only.
Another option is Witec’s txt table ASCII export (Export \(\rightarrow\) Table), which produces ASCII files with each row corresponding to one wavelength.
The first column contains the wavelength axis; all further columns contain one spectrum each column.
Such files can be read with read.txt.Witec()
:
read.txt.Witec()
determines the number of wavelengths automatically.
Note that there are several Export Filter Options.
It is possible to determine which units should be used for the export (see XUnits tab).
It is also possible to export two additional header lines containing information about spectra labels and units.
Therefore parameters hdr.label
and hdr.units
have to be set properly.
Otherwise, either an error will be displayed like
or one or two wavelengths will be skipped.
Depending on the used export options the header files should look like:
For imaging data set parameter type
to “map”.
If the label header is exported, the spatial information can be extracted from this line.
Otherwise, at least one of points.per.line
and lines.per.image
has to be given manually; if neither is given, a warning will be shown.
For line scans and z-stacks use type = "single"
because the provided information looks the same as for time series, so no further information can be extracted from the header files.
Since version 4, WITec Project offers the Graph ASCII export (Export \(\rightarrow\) Graph ASCII), which produces three ASCII files: Header, containing additional information; X-Axis, containing the wavelength values; and Y-Axis, containing the spectra, one spectrum in each column.
Data exported in this way can be read with read.txt.Witec.Graph()
:
read.txt.Witec.Graph("fileio/txt.Witec/Witec-timeseries (Header).txt")
read.txt.Witec.Graph("fileio/txt.Witec/Witec-Map (Header).txt", type = "map")
read.txt.Witec.Graph("fileio/txt.Witec/nofilename (Header).txt", encoding = "latin1")
This function reads the spectra files automatically if they are appropriately named, and extracts additional information from the header file.
As for the other Witec functions, it is possible to read image data by selecting type = "map"
.
Line scans and z-stacks should be read as single spectra.
This section gives examples of how to write import functions. The first example implements an import filter for an ASCII file format, basically from scratch. The second example shows how to implement more details for an already existing import filter.
read.txt.PerkinElmer
The raw spectra of the flu
data set (see also the separate vignette) are in PerkinElmer’s ASCII file format, one spectrum per file.
We need a function that automatically reads all files specified by a pattern, such as *.txt
.
To gain speed, users should preallocate the spectra matrix after the first reading of the file.
A short examination of the files (flu*.txt
in directory txt.PerkinElmer
) reveals that the actual spectrum starts at line 55, after a line containing #DATA
.
For now, no other information about the files is to be extracted.
It is thus easier to skip the first 54 lines than to search for the line after #DATA
.
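For files whose header length varies, searching for the marker line is a possible alternative to a fixed skip. The following is a minimal sketch in base R with a made-up file; the header lines are placeholders and do not reproduce the real PerkinElmer header.

```r
# Hypothetical file: two header lines, the marker, then the data
txt <- c(
  "header line 1", "header line 2", "#DATA",
  "350 1.0", "351 2.0", "352 4.0"
)
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

# line number of the marker; everything up to and including it is skipped
start <- grep("^#DATA", readLines(tmp))
buffer <- matrix(scan(tmp, skip = start, quiet = TRUE), ncol = 2, byrow = TRUE)
```

The price for the flexibility is that the file is read twice, once as text and once as numbers, which is why a fixed skip is the cheaper choice when the header length is known.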
A fully-featured import function should support:
- Handing further arguments to scan(). This comes in handy in case the function is used later to import other data types.
- Returning an empty hyperSpec object if no files are found. This is a consistent result: There is no need to stop with an error, but it is polite to issue an additional warning.
- Optionally attaching the file name to the hyperSpec object (column filename) and deleting empty spectra. These options can be globally switched on or off by options.
# The contents of "read.txt.PerkinElmer.R"
read.txt.PerkinElmer <- function(files = stop("filenames needed"), ..., label = list()) {
## set default labels
label <- modifyList(
list(
.wavelength = expression(lambda / nm),
spc = expression(I[fl] / "a.u.")
),
label
)
if (length(files) == 0) {
warning("No files found.")
return(new("hyperSpec"))
}
## read the first file
buffer <- matrix(scan(files[1], ...), ncol = 2, byrow = TRUE)
## first column gives the wavelength vector
wavelength <- buffer[, 1]
## preallocate the spectra matrix:
## one row per file x as many columns as the first file has
spc <- matrix(ncol = nrow(buffer), nrow = length(files))
## the first file's data goes into the first row
spc[1, ] <- buffer[, 2]
## now read the remaining files
for (f in seq_along(files)[-1]) {
buffer <- matrix(scan(files[f], ...), ncol = 2, byrow = TRUE)
## check whether they have the same wavelength axis
## (all.equal() returns a character vector, not FALSE, on mismatch)
if (!isTRUE(all.equal(buffer[, 1], wavelength))) {
stop(paste(files[f], "has different wavelength axis."))
}
spc[f, ] <- buffer[, 2]
}
## make the hyperSpec object
spc <- new("hyperSpec", wavelength = wavelength, spc = spc, label = label)
## consistent file import behaviour across import functions
hyperSpec::.spc_io_postprocess_optional(spc, files)
}
Note how the labels are set.
The label with the special name .wavelength
corresponds to the wavelength axis; every data column should have a label of the same name as the column.
The spectra are always in a data column called spc
.
Thus,
source("read.txt.PerkinElmer.R")
read.txt.PerkinElmer(Sys.glob("fileio/txt.PerkinElmer/flu?.txt"), skip = 54)
imports the spectra.
The hyperSpec package does not export this function: while it is already useful for importing files, it is not general enough to work immediately with new data, e.g., completely ignoring the file header. Thus information like the excitation wavelength is lost.
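When an import function compares the wavelength axes of several files, the behaviour of all.equal() deserves attention. This minimal sketch in base R uses made-up wavelength vectors to show why the result is best wrapped in isTRUE().

```r
wl_ref <- c(350, 351, 352)
wl_same <- wl_ref + 1e-12 # differs only by floating point noise
wl_other <- c(350, 351, 353) # genuinely different axis

# all.equal() tolerates tiny numerical differences ...
ok_same <- isTRUE(all.equal(wl_same, wl_ref))

# ... but on a mismatch it returns a description instead of FALSE
res_other <- all.equal(wl_other, wl_ref)
ok_other <- isTRUE(res_other)
```

Here ok_same is TRUE and ok_other is FALSE, while res_other itself is a character vector; using it directly inside if () or negating it with ! would raise an error rather than signal the mismatch.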
read.ENVI.Nicolet()
The function read.ENVI.Nicolet()
is an excellent example of a more specific import filter derived from a generic filter for the particular file type.
Nicolet FT-IR Imaging software saves some non-standard keywords in the header file of the ENVI data.
This information can be used to reconstruct the \(x\) and \(y\) axes of the images.
The units of the spectra are saved as well.
Function read.ENVI.Nicolet()
thus first adjusts the parameters for read.ENVI()
.
Then read.ENVI()
does the main work of importing the file.
The resulting hyperSpec
object is post-processed according to the special header entries.
For using the function, see section 7.3.
# The contents of "read.ENVI.Nicolet.R"
read.ENVI.Nicolet <- function(
file = stop("read.ENVI: file name needed"),
headerfile = NULL,
header = list(),
..., # goes to read.ENVI
x = NA, y = NA, # NA means: use the specifications from the header file if possible
nicolet.correction = FALSE) {
## the additional keywords to interpret must be read from headerfile
headerfile <- .find.ENVI.header(file, headerfile)
keys <- readLines(headerfile)
keys <- .read.ENVI.split.header(keys)
keys <- keys[c("description", "z plot titles", "pixel size")]
header <- modifyList(keys, header)
## most work is done by read.ENVI
spc <- read.ENVI(
file = file, headerfile = headerfile, header = header, ...,
x = if (all(is.na(x))) 0:1 else x, # all(): a user-supplied x has length 2
y = if (all(is.na(y))) 0:1 else y
)
### From here on processing the additional keywords in Nicolet's ENVI header ****
## z plot titles ----------------------------------------------------------------
## default labels
label <- list(
x = expression(`/`(x, micro * m)),
y = expression(`/`(y, micro * m)),
spc = "I / a.u.",
.wavelength = expression(tilde(nu) / cm^-1)
)
## get labels from header information
if (!is.null(header$"z plot titles")) {
pattern <- "^[[:blank:]]*([[:print:]^,]+)[[:blank:]]*,.*$"
tmp <- sub(pattern, "\\1", header$"z plot titles")
if (grepl("Wavenumbers \\(cm-1\\)", tmp, ignore.case = TRUE)) { # parentheses escaped to match them literally
label$.wavelength <- expression(tilde(nu) / cm^(-1))
} else {
label$.wavelength <- tmp
}
pattern <- "^[[:blank:]]*[[:print:]^,]+,[[:blank:]]*([[:print:]^,]+).*$"
tmp <- sub(pattern, "\\1", header$"z plot titles")
if (grepl("Unknown", tmp, ignore.case = TRUE)) {
label$spc <- "I / a.u."
} else {
label$spc <- tmp
}
}
## modify the labels accordingly
spc@label <- modifyList(label, spc@label)
## set up spatial coordinates --------------------------------------------------
## look for x and y in the header only if x and y are NA
## they are in `description` and `pixel size`
## set up regular expressions to extract the values
p.description <- paste(
"^Spectrum position [[:digit:]]+ of [[:digit:]]+ positions,",
"X = ([[:digit:].-]+), Y = ([[:digit:].-]+)$"
)
p.pixel.size <- "^[[:blank:]]*([[:digit:].-]+),[[:blank:]]*([[:digit:].-]+).*$"
if (all(is.na(x)) && all(is.na(y)) &&
!is.null(header$description) && grepl(p.description, header$description) &&
!is.null(header$"pixel size") && grepl(p.pixel.size, header$"pixel size")) {
x[1] <- as.numeric(sub(p.description, "\\1", header$description))
y[1] <- as.numeric(sub(p.description, "\\2", header$description))
x[2] <- as.numeric(sub(p.pixel.size, "\\1", header$"pixel size"))
y[2] <- as.numeric(sub(p.pixel.size, "\\2", header$"pixel size"))
## it seems that the step size is given in mm while the offset is in micron
if (nicolet.correction) {
x[2] <- x[2] * 1000
y[2] <- y[2] * 1000
}
## now calculate and set the x and y coordinates
x <- x[2] * spc$x + x[1]
if (!any(is.na(x))) {
spc@data$x <- x
}
y <- y[2] * spc$y + y[1]
if (!any(is.na(y))) {
spc@data$y <- y
}
}
spc
}
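The coordinate handling at the end of the function can be tried out in isolation. The following is a minimal sketch in base R, not part of package hyperSpec: the two regular expressions are copied from the function above, while the header strings and the pixel indices `idx` are made up for illustration. It assumes that the pixel indices returned by `read.ENVI()` with offset 0 and step 1 start at 0.

```r
# Regular expressions copied from read.ENVI.Nicolet()
p.description <- paste(
  "^Spectrum position [[:digit:]]+ of [[:digit:]]+ positions,",
  "X = ([[:digit:].-]+), Y = ([[:digit:].-]+)$"
)
p.pixel.size <- "^[[:blank:]]*([[:digit:].-]+),[[:blank:]]*([[:digit:].-]+).*$"

# Hypothetical header entries (invented values, not taken from a real file)
header <- list(
  description = "Spectrum position 1 of 4 positions, X = -100.5, Y = 20",
  `pixel size` = "0.025, 0.025"
)

# Offsets (assumed to be in micron) come from `description`
offset <- as.numeric(c(
  sub(p.description, "\\1", header$description),
  sub(p.description, "\\2", header$description)
))

# Step sizes (assumed to be in mm) come from `pixel size`
step <- as.numeric(c(
  sub(p.pixel.size, "\\1", header$`pixel size`),
  sub(p.pixel.size, "\\2", header$`pixel size`)
))

# Equivalent of nicolet.correction = TRUE: convert the step size from mm to micron
step <- step * 1000

# Hypothetical pixel indices, assumed to start at 0
idx <- 0:3

# Same arithmetic as in the function: step size * index + offset
x <- step[1] * idx + offset[1]
y <- step[2] * idx + offset[2]
```

With these made-up values, `x` starts at the offset -100.5 and increases by 25 per pixel, and `y` starts at 20 with the same step. Without the correction factor the step would be 0.025, which shows why mixing the two units in the header matters.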
Type | Format | Manufacturer | Spectroscopy | Function | Link | Notes |
---|---|---|---|---|---|---|
Andor Solis ASCII | | | | | | |
ASCII | Andor Solis ASCII | Andor | Raman | read.asc.Andor() | 7.8 | |
array | | | | | | |
binary | array | | | | 2 | |
ASCII long | | | | | | |
ASCII | ASCII long | | | read.txt.long() | 5 | |
ASCII | ASCII long | Renishaw | Raman | read.txt.Renishaw() | 7.6 | |
ASCII | ASCII long | Kaiser | Raman | read.txt.long() | 7.5.1 | Not recommended, see discussion |
ASCII | ASCII long | Perkin Elmer | Fluorescence | read.txt.PerkinElmer() | 8.1 | Reads multiple files, needs to be sourced. |
ASCII long (zipped) | | | | | | |
ASCII | ASCII long (zipped) | Renishaw | Raman | read.zip.Renishaw() | 7.6 | |
ASCII wide | | | | | | |
ASCII | ASCII wide | | | read.txt.wide() | 5 | |
ASCII | ASCII wide | Horiba Jobin Yvon | Raman | read.txt.Horiba() | 7.7 | e.g., LabRAM spectrometers |
ASCII | ASCII wide | Horiba Jobin Yvon | Raman | read.txt.Horiba.xy() | 7.7 | e.g., LabRAM spectrometer maps |
ASCII | ASCII wide | Horiba Jobin Yvon | Raman | read.txt.Horiba.t() | 7.7 | e.g., LabRAM spectrometer time series |
ASCII wide transposed | | | | | | |
ASCII | ASCII wide transposed | Witec | Raman | read.txt.Witec() | 7.9 | Export Table |
ENVI | | | | | | |
binary | ENVI | | | read.ENVI() | 6.2 | |
binary | ENVI | Bruker | Infrared Imaging | read.ENVI() | 7.2 | |
binary | ENVI | Nicolet | Infrared Imaging | read.ENVI.Nicolet() | 7.3 | |
hol | | | | | | |
binary | hol | Kaiser | Raman | | 7.5 | via Matlab |
JCAMP-DX | | | | | | |
ASCII | JCAMP-DX | | | read.jdx() | ?? | |
ASCII | JCAMP-DX | Renishaw | Raman | read.jdx() | ?? | |
ASCII | JCAMP-DX | Shimadzu | GCxGC-MS | read.txt.Shimadzu() | | |
ASCII | JCAMP-DX | PerkinElmer | Infrared | read.jdx() | ?? | import for subset of the JCAMP-DX standard |
Matlab | | | | | | |
binary | Matlab | Matlab | | R.matlab::readMat() | 6.1 | |
binary | Matlab | Cytospec | | read.mat.Cytospec() | 6.1.2 | |
binary | Matlab | Witec | Raman | read.mat.Witec() | | |
matrix | | | | | | |
binary | matrix | | | | 2 | |
Opus | | | | | | |
binary | Opus | Bruker | Infrared Imaging | | 7.2 | |
other | | | | | | |
ASCII | other | Shimadzu | GC, GC-MS | read.jdx() | ?? | import for subset of the JCAMP-DX standard |
spc | | | | | | |
binary | spc | | | read.spc() | 6.3 | |
binary | spc | Kaiser | Raman Map | read.spc.KaiserMap() | 7.5.2 | Reads multiple files |
binary | spc | Kaiser | Raman | read.spc.Kaiser() | | Efficiently reads multiple files |
binary | spc | Kaiser | Raman | read.spc.KaiserLowHigh() | | Reads multiple pairs of low and high wavenumber region spcs |
binary | spc | Kaiser | Raman | read.spc() | 6.3 | |
binary | spc | Renishaw | Raman | read.spc() | 7.6 | Not recommended, see discussion of ASCII files. |
binary | spc | Witec | Raman | read.spc() | 6.3 | spc export not available for images |
binary | spc | Horiba | Raman | read.spc() | 6.3 | |
spe | | | | | | |
binary | spe | Princeton Instruments | Raman | read.spe() | ?? | WinSpec |
Witec ASCII | | | | | | |
ASCII | Witec ASCII | Witec | Raman | read.dat.Witec() | 7.9 | Save ASCII X, Save ASCII Y |
Witec Graph ASCII | | | | | | |
ASCII | Witec Graph ASCII | Witec | Raman | read.txt.Witec.Graph() | 7.9 | Export Table in 3 separate files (Header, X-Axis, Y-Axis) |

Type | Format | Manufacturer | Spectroscopy | Function | Link | Notes |
---|---|---|---|---|---|---|
Andor | | | | | | |
ASCII | Andor Solis ASCII | Andor | Raman | read.asc.Andor() | 7.8 | |
Bruker | | | | | | |
binary | ENVI | Bruker | Infrared Imaging | read.ENVI() | 7.2 | |
binary | Opus | Bruker | Infrared Imaging | | 7.2 | |
Cytospec | | | | | | |
binary | Matlab | Cytospec | | read.mat.Cytospec() | 6.1.2 | |
Horiba | | | | | | |
binary | spc | Horiba | Raman | read.spc() | 6.3 | |
Horiba Jobin Yvon | | | | | | |
ASCII | ASCII wide | Horiba Jobin Yvon | Raman | read.txt.Horiba() | 7.7 | e.g., LabRAM spectrometers |
ASCII | ASCII wide | Horiba Jobin Yvon | Raman | read.txt.Horiba.xy() | 7.7 | e.g., LabRAM spectrometer maps |
ASCII | ASCII wide | Horiba Jobin Yvon | Raman | read.txt.Horiba.t() | 7.7 | e.g., LabRAM spectrometer time series |
Kaiser | | | | | | |
ASCII | ASCII long | Kaiser | Raman | read.txt.long() | 7.5.1 | Not recommended, see discussion |
binary | hol | Kaiser | Raman | | 7.5 | via Matlab |
binary | spc | Kaiser | Raman Map | read.spc.KaiserMap() | 7.5.2 | Reads multiple files |
binary | spc | Kaiser | Raman | read.spc.Kaiser() | | Efficiently reads multiple files |
binary | spc | Kaiser | Raman | read.spc.KaiserLowHigh() | | Reads multiple pairs of low and high wavenumber region spcs |
binary | spc | Kaiser | Raman | read.spc() | 6.3 | |
Matlab | | | | | | |
binary | Matlab | Matlab | | R.matlab::readMat() | 6.1 | |
Nicolet | | | | | | |
binary | ENVI | Nicolet | Infrared Imaging | read.ENVI.Nicolet() | 7.3 | |
Perkin Elmer | | | | | | |
ASCII | ASCII long | Perkin Elmer | Fluorescence | read.txt.PerkinElmer() | 8.1 | Reads multiple files, needs to be sourced. |
PerkinElmer | | | | | | |
ASCII | JCAMP-DX | PerkinElmer | Infrared | read.jdx() | ?? | import for subset of the JCAMP-DX standard |
Princeton Instruments | | | | | | |
binary | spe | Princeton Instruments | Raman | read.spe() | ?? | WinSpec |
Renishaw | | | | | | |
ASCII | ASCII long | Renishaw | Raman | read.txt.Renishaw() | 7.6 | |
ASCII | ASCII long (zipped) | Renishaw | Raman | read.zip.Renishaw() | 7.6 | |
ASCII | JCAMP-DX | Renishaw | Raman | read.jdx() | ?? | |
binary | spc | Renishaw | Raman | read.spc() | 7.6 | Not recommended, see discussion of ASCII files. |
Shimadzu | | | | | | |
ASCII | other | Shimadzu | GC, GC-MS | read.jdx() | ?? | import for subset of the JCAMP-DX standard |
ASCII | JCAMP-DX | Shimadzu | GCxGC-MS | read.txt.Shimadzu() | | |
Witec | | | | | | |
ASCII | Witec ASCII | Witec | Raman | read.dat.Witec() | 7.9 | Save ASCII X, Save ASCII Y |
ASCII | ASCII wide transposed | Witec | Raman | read.txt.Witec() | 7.9 | Export Table |
ASCII | Witec Graph ASCII | Witec | Raman | read.txt.Witec.Graph() | 7.9 | Export Table in 3 separate files (Header, X-Axis, Y-Axis) |
binary | Matlab | Witec | Raman | read.mat.Witec() | | |
binary | spc | Witec | Raman | read.spc() | 6.3 | spc export not available for images |
ASCII | ASCII long | | | read.txt.long() | 5 | |
ASCII | ASCII wide | | | read.txt.wide() | 5 | |
ASCII | JCAMP-DX | | | read.jdx() | ?? | |
binary | ENVI | | | read.ENVI() | 6.2 | |
binary | spc | | | read.spc() | 6.3 | |
binary | array | | | | 2 | |
binary | matrix | | | | 2 | |

Type | Format | Manufacturer | Spectroscopy | Function | Link | Notes |
---|---|---|---|---|---|---|
Fluorescence | | | | | | |
ASCII | ASCII long | Perkin Elmer | Fluorescence | read.txt.PerkinElmer() | 8.1 | Reads multiple files, needs to be sourced. |
GC, GC-MS | | | | | | |
ASCII | other | Shimadzu | GC, GC-MS | read.jdx() | ?? | import for subset of the JCAMP-DX standard |
GCxGC-MS | | | | | | |
ASCII | JCAMP-DX | Shimadzu | GCxGC-MS | read.txt.Shimadzu() | | |
Infrared | | | | | | |
ASCII | JCAMP-DX | PerkinElmer | Infrared | read.jdx() | ?? | import for subset of the JCAMP-DX standard |
Infrared Imaging | | | | | | |
binary | ENVI | Bruker | Infrared Imaging | read.ENVI() | 7.2 | |
binary | Opus | Bruker | Infrared Imaging | | 7.2 | |
binary | ENVI | Nicolet | Infrared Imaging | read.ENVI.Nicolet() | 7.3 | |
Raman | | | | | | |
ASCII | ASCII wide | Horiba Jobin Yvon | Raman | read.txt.Horiba() | 7.7 | e.g., LabRAM spectrometers |
ASCII | ASCII wide | Horiba Jobin Yvon | Raman | read.txt.Horiba.xy() | 7.7 | e.g., LabRAM spectrometer maps |
ASCII | ASCII wide | Horiba Jobin Yvon | Raman | read.txt.Horiba.t() | 7.7 | e.g., LabRAM spectrometer time series |
ASCII | ASCII long | Renishaw | Raman | read.txt.Renishaw() | 7.6 | |
ASCII | ASCII long (zipped) | Renishaw | Raman | read.zip.Renishaw() | 7.6 | |
ASCII | ASCII long | Kaiser | Raman | read.txt.long() | 7.5.1 | Not recommended, see discussion |
ASCII | JCAMP-DX | Renishaw | Raman | read.jdx() | ?? | |
ASCII | Witec ASCII | Witec | Raman | read.dat.Witec() | 7.9 | Save ASCII X, Save ASCII Y |
ASCII | ASCII wide transposed | Witec | Raman | read.txt.Witec() | 7.9 | Export Table |
ASCII | Witec Graph ASCII | Witec | Raman | read.txt.Witec.Graph() | 7.9 | Export Table in 3 separate files (Header, X-Axis, Y-Axis) |
ASCII | Andor Solis ASCII | Andor | Raman | read.asc.Andor() | 7.8 | |
binary | Matlab | Witec | Raman | read.mat.Witec() | | |
binary | hol | Kaiser | Raman | | 7.5 | via Matlab |
binary | spc | Kaiser | Raman | read.spc.Kaiser() | | Efficiently reads multiple files |
binary | spc | Kaiser | Raman | read.spc.KaiserLowHigh() | | Reads multiple pairs of low and high wavenumber region spcs |
binary | spc | Kaiser | Raman | read.spc() | 6.3 | |
binary | spc | Renishaw | Raman | read.spc() | 7.6 | Not recommended, see discussion of ASCII files. |
binary | spc | Witec | Raman | read.spc() | 6.3 | spc export not available for images |
binary | spc | Horiba | Raman | read.spc() | 6.3 | |
binary | spe | Princeton Instruments | Raman | read.spe() | ?? | WinSpec |
Raman Map | | | | | | |
binary | spc | Kaiser | Raman Map | read.spc.KaiserMap() | 7.5.2 | Reads multiple files |
ASCII | ASCII long | | | read.txt.long() | 5 | |
ASCII | ASCII wide | | | read.txt.wide() | 5 | |
ASCII | JCAMP-DX | | | read.jdx() | ?? | |
binary | Matlab | Matlab | | R.matlab::readMat() | 6.1 | |
binary | Matlab | Cytospec | | read.mat.Cytospec() | 6.1.2 | |
binary | ENVI | | | read.ENVI() | 6.2 | |
binary | spc | | | read.spc() | 6.3 | |
binary | array | | | | 2 | |
binary | matrix | | | | 2 | |

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.0 (2024-04-24)
#> os Ubuntu 22.04.4 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en
#> collate C.UTF-8
#> ctype C.UTF-8
#> tz UTC
#> date 2024-05-27
#> pandoc 3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> bookdown 0.39 2024-04-15 [2] RSPM
#> brio 1.1.5 2024-04-24 [2] RSPM
#> bslib 0.7.0 2024-03-29 [2] RSPM
#> cachem 1.1.0 2024-05-16 [2] RSPM
#> cli 3.6.2 2023-12-11 [2] RSPM
#> colorspace 2.1-0 2023-01-23 [2] RSPM
#> deldir 2.0-4 2024-02-28 [2] RSPM
#> digest 0.6.35 2024-03-11 [2] RSPM
#> dplyr 1.1.4 2023-11-17 [2] RSPM
#> evaluate 0.23 2023-11-01 [2] RSPM
#> fansi 1.0.6 2023-12-08 [2] RSPM
#> fastmap 1.2.0 2024-05-15 [2] RSPM
#> generics 0.1.3 2022-07-05 [2] RSPM
#> ggplot2 * 3.5.1 2024-04-23 [2] RSPM
#> glue 1.7.0 2024-01-09 [2] RSPM
#> gtable 0.3.5 2024-04-22 [2] RSPM
#> highr 0.11 2024-05-26 [2] RSPM
#> htmltools 0.5.8.1 2024-04-04 [2] RSPM
#> htmlwidgets 1.6.4 2023-12-06 [2] RSPM
#> hyperSpec * 0.200.0.9000 2024-05-27 [1] local
#> hySpc.testthat 0.2.1 2020-06-24 [2] RSPM
#> interp 1.1-6 2024-01-26 [2] RSPM
#> jpeg 0.1-10 2022-11-29 [2] RSPM
#> jquerylib 0.1.4 2021-04-26 [2] RSPM
#> jsonlite 1.8.8 2023-12-04 [2] RSPM
#> kableExtra 1.4.0 2024-01-24 [2] RSPM
#> knitr 1.46 2024-04-06 [2] RSPM
#> lattice * 0.22-6 2024-03-20 [4] CRAN (R 4.4.0)
#> latticeExtra 0.6-30 2022-07-04 [2] RSPM
#> lazyeval 0.2.2 2019-03-15 [2] RSPM
#> lifecycle 1.0.4 2023-11-07 [2] RSPM
#> magrittr 2.0.3 2022-03-30 [2] RSPM
#> munsell 0.5.1 2024-04-01 [2] RSPM
#> pillar 1.9.0 2023-03-22 [2] RSPM
#> pkgconfig 2.0.3 2019-09-22 [2] RSPM
#> png 0.1-8 2022-11-29 [2] RSPM
#> purrr 1.0.2 2023-08-10 [2] RSPM
#> R.cache 0.16.0 2022-07-21 [2] RSPM
#> R.matlab * 3.7.0 2022-08-25 [2] RSPM
#> R.methodsS3 1.8.2 2022-06-13 [2] RSPM
#> R.oo 1.26.0 2024-01-24 [2] RSPM
#> R.utils 2.12.3 2023-11-18 [2] RSPM
#> R6 2.5.1 2021-08-19 [2] RSPM
#> RColorBrewer 1.1-3 2022-04-03 [2] RSPM
#> Rcpp 1.0.12 2024-01-09 [2] RSPM
#> rlang 1.1.3 2024-01-10 [2] RSPM
#> rmarkdown 2.27 2024-05-17 [2] RSPM
#> rstudioapi 0.16.0 2024-03-24 [2] RSPM
#> sass 0.4.9 2024-03-15 [2] RSPM
#> scales 1.3.0 2023-11-28 [2] RSPM
#> sessioninfo 1.2.2 2021-12-06 [2] RSPM
#> stringi 1.8.4 2024-05-06 [2] RSPM
#> stringr 1.5.1 2023-11-14 [2] RSPM
#> styler 1.10.3 2024-04-07 [2] RSPM
#> svglite 2.1.3 2023-12-08 [2] RSPM
#> systemfonts 1.1.0 2024-05-15 [2] RSPM
#> testthat 3.2.1.1 2024-04-14 [2] RSPM
#> tibble 3.2.1 2023-03-20 [2] RSPM
#> tidyselect 1.2.1 2024-03-11 [2] RSPM
#> utf8 1.2.4 2023-10-22 [2] RSPM
#> vctrs 0.6.5 2023-12-01 [2] RSPM
#> viridisLite 0.4.2 2023-05-02 [2] RSPM
#> withr 3.0.0 2024-01-16 [2] RSPM
#> xfun 0.44 2024-05-15 [2] RSPM
#> xml2 1.3.6 2023-12-04 [2] RSPM
#> yaml 2.3.8 2023-12-11 [2] RSPM
#>
#> [1] /tmp/RtmpxUEHmP/temp_libpath17ce4b3a89e0
#> [2] /home/runner/work/_temp/Library
#> [3] /opt/R/4.4.0/lib/R/site-library
#> [4] /opt/R/4.4.0/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────────────