Department of Food Science

Faculty of Science

University of Copenhagen

# Rank-deficient spectral FIA data

## Problem

Nørgaard & Ridder (1994) investigated the problem of measuring samples containing three different analytes on a flow injection analysis (FIA) system in which a pH gradient is imposed. The data are interesting from a data-analytical point of view, especially as an illustration of closure or rank deficiency and the use of constraints.

## Get the data

The data are available in zipped MATLAB 4.2 format. Download the data and type load data in MATLAB. If you use the data, we would appreciate it if you would report the results to us as a courtesy, given the work involved in producing and preparing the data. You may also want to cite the data by referring to

Nørgaard L, Ridder C, Rank annihilation factor analysis applied to flow injection analysis with photodiode-array detection. Chemometrics and Intelligent Laboratory Systems 23:107, 1994

The data have also been described in

- Bro, R, Multi-way Analysis in the Food Industry. Models, Algorithms, and Applications. 1998. Ph.D. Thesis, University of Amsterdam (NL) & Royal Veterinary and Agricultural University (DK).
- Bro R, Sidiropoulos ND, Least squares algorithms under unimodality and non-negativity constraints. Journal of Chemometrics 12:223, 1998
- Kiers HAL, Smilde AK, Constrained three-mode factor analysis as a tool for parameter estimation with second-order instrumental data. Journal of Chemometrics 12:125-147, 1998
- Bro R, Harshman RA, Sidiropoulos N, Rank-deficient models for multi-way data, Journal of Chemometrics, Submitted.

## Data

The basic setup of the FIA system is shown in Figure 1a. A carrier stream containing a Britton-Robinson buffer of pH 4.5 is continuously injected into the system at a flow rate of 0.375 mL/min. 77 µL of sample and 770 µL of reagent (Britton-Robinson buffer pH 11.4) are injected simultaneously into the system by a six-port valve, and the absorbance is detected by a diode-array detector (HP 8452A) from 250 to 450 nm in 2 nm intervals. The absorption spectrum is recorded once per second, 89 times during one injection. By using both a carrier and a reagent (Figure 1b), a pH gradient from pH 4.5 to 11.4 is induced over the sample plug.

The three analytes present in the samples are 2-, 3-, and 4-hydroxy-benzaldehyde (HBA). All three analytes have different absorption spectra depending on whether they are in their acidic or basic form. Twelve samples of different constitution (Table 1) are measured. Thus the data set is a 12 (samples) × 100 (wavelengths) × 89 (times) array. The time mode of dimension 89 is also a pH profile due to the pH-gradient.

*Table 1. The concentrations of the three analytes in the 12 samples.*

| Sample | 2HBA | 3HBA | 4HBA |
|---|---|---|---|
| 1 | 0.05 | 0.05 | 0.06 |
| 2 | 0.05 | 0.10 | 0.04 |
| 3 | 0.05 | 0.10 | 0.06 |
| 4 | 0.10 | 0.05 | 0.04 |
| 5 | 0.10 | 0.10 | 0.04 |
| 6 | 0.10 | 0.10 | 0.06 |
| 7 | 0 | 0.10 | 0.04 |
| 8 | 0 | 0.10 | 0.06 |
| 9 | 0.05 | 0 | 0.06 |
| 10 | 0.10 | 0 | 0.06 |
| 11 | 0.05 | 0.10 | 0 |
| 12 | 0.10 | 0.05 | 0 |
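As a quick check of the design in Table 1, the concentrations can be entered as a 12 × 3 array. The sketch below uses Python with NumPy (an assumption; the data themselves are distributed for MATLAB), and the variable names are only illustrative. It confirms that the three concentration columns are linearly independent and lists the samples in which exactly one analyte is absent.

```python
import numpy as np

# Concentrations from Table 1 (rows: samples 1-12; columns: 2HBA, 3HBA, 4HBA).
conc = np.array([
    [0.05, 0.05, 0.06],
    [0.05, 0.10, 0.04],
    [0.05, 0.10, 0.06],
    [0.10, 0.05, 0.04],
    [0.10, 0.10, 0.04],
    [0.10, 0.10, 0.06],
    [0.00, 0.10, 0.04],
    [0.00, 0.10, 0.06],
    [0.05, 0.00, 0.06],
    [0.10, 0.00, 0.06],
    [0.05, 0.10, 0.00],
    [0.10, 0.05, 0.00],
])

n_samples, n_analytes = conc.shape

# Rank of the concentration matrix: 3 means the analyte columns are independent.
design_rank = np.linalg.matrix_rank(conc)

# Samples (1-based numbering, as in Table 1) where exactly one analyte is absent.
one_absent = [i + 1 for i, row in enumerate(conc) if np.sum(row == 0) == 1]

print(n_samples, n_analytes, design_rank, one_absent)
```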

For each sample a landscape is obtained showing the spectra for all times, or conversely the time profiles for all wavelengths (see below).

It is characteristic of FIA that there is no physical separation of the sample. All analytes have the same dilution profile due to dispersion, i.e., all analytes will have identically shaped *total* time profiles. In the figure, this total profile is shown at the bottom left. The profile thus maintains its shape at all wavelengths, for all samples, and for all analytes. The total profile is the profile actually detected by the photometer (the manifest profile) and is the sum of the profiles of the protonated and deprotonated analytes. Due to the pH gradient, and depending on the pK_{a} of a given analyte, an analyte will show up with different amounts of its acidic and basic form at different times, and hence will have different acidic and basic profiles in the sample plug. In the figure, these profiles are shown for one analyte. The first part of the sample plug, i.e., the earliest measurements of a sample, is dominated by deprotonated analytes, while the end of the sample plug is dominated by protonated analytes.
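The closure described above can be illustrated with a small simulation. The following is a sketch only: the Gaussian dispersion profile, the linear pH gradient, and the three pK_{a} values are assumptions chosen for illustration and are not taken from the data. It shows that the acidic and basic profiles of each analyte sum to the shared total profile, and that the six resulting time profiles therefore span only four dimensions.

```python
import numpy as np

K = 89                                   # number of time points, as in the data
time = np.arange(K)

# Assumed dispersion (total) profile: a Gaussian-shaped peak, illustration only.
total = np.exp(-0.5 * ((time - 40.0) / 12.0) ** 2)

# Assumed linear pH gradient over the plug. It runs from basic to acidic so that
# early times are dominated by the deprotonated form, as described in the text.
pH = np.linspace(11.4, 4.5, K)

# Hypothetical pKa values (NOT measured values for the HBA isomers).
pKa = np.array([7.0, 8.0, 9.0])

# Henderson-Hasselbalch: fraction of each analyte in its basic form at each time.
frac_base = 1.0 / (1.0 + 10.0 ** (pKa[None, :] - pH[:, None]))   # K x 3
base = total[:, None] * frac_base
acid = total[:, None] * (1.0 - frac_base)

# Closure: acidic + basic profile equals the shared total profile for every analyte.
closure_ok = np.allclose(acid + base, total[:, None])

# Six profiles (acid and base for each analyte) stacked as columns of a K x 6 matrix.
C = np.column_stack([acid[:, 0], base[:, 0],
                     acid[:, 1], base[:, 1],
                     acid[:, 2], base[:, 2]])

# Closure imposes two independent linear relations, so the rank is 4 rather than 6.
rank_C = np.linalg.matrix_rank(C, tol=1e-8)

print(closure_ok, C.shape, rank_C)
```

The two lost dimensions come from the fact that the three acid-plus-base sums are all equal to the same total profile.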

**Structural model**

To specify a mathematical model for the array of FIA data, initially ignore the time domain and consider only one specific time, i.e., one specific pH. An *I* × *J* matrix called **X**_{k} is obtained, where *I* is the number of samples (12), *J* is the number of wavelengths (100), and *k* indicates the specific pH/time selected.

There are three analytes with three corresponding concentration profiles, and there are six spectra, an acidic and a basic one for each analyte. A standard bilinear model would be an obvious decomposition method for this matrix, but it is not very descriptive in this case. In the sample mode, a three-dimensional decomposition is preferable, as there are only three different analytes. However, each analyte exists in two forms (acid/base), so there are six different spectra to be resolved, requiring a six-dimensional decomposition in the spectral mode. To accommodate these seemingly conflicting requirements, a more general model can be used instead:

**X**_{k} = **AHB**^{T}, (1)

where **A** is an *I* × 3 matrix whose columns are vectors describing the variations in the sample domain (ideally the concentrations in Table 1), **B** is a *J* × 6 matrix describing the variations in the spectral domain (ideally the pure spectra), and **H** is a 3 × 6 matrix which defines the interactions between the columns of **A** and **B**. In this case it is known how the analyte concentrations relate to the spectra, as the acidic and basic spectra of, e.g., 2HBA only relate to the concentration of 2HBA. Therefore **H** reads

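$$
\mathbf{H} =
\begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix}
\qquad (2)
$$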

The matrix **H** ensures that the contribution of the first analyte to the model is given by the sum of **a**_{1}**b**_{1}^{T} and **a**_{1}**b**_{2}^{T}, and similarly for the other analytes. By using only ones and zeros, any information in **H** about the relative size of the interactions is removed; this information is represented in **B**. The **H** matrix is reserved for coding the interaction *structure* of the model.
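The role of **H** at a single time can be checked numerically. The sketch below uses random stand-ins for **A** and **B** (assumptions, not the real concentrations or spectra) and a 3 × 6 matrix of ones and zeros in which each analyte interacts only with its own acidic and basic spectrum. It verifies that **AHB**^{T} equals the sum of the per-analyte contributions and that the resulting matrix has rank three even though six spectra are involved.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J = 12, 100                      # samples and wavelengths, as in the data

A = rng.random((I, 3))              # stand-in concentrations (random, not Table 1)
B = rng.random((J, 6))              # stand-in spectra: one acid/base pair per analyte

# Interaction structure: analyte r interacts only with spectra 2r and 2r+1 (0-based).
H = np.array([[1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1]], dtype=float)

Xk = A @ H @ B.T                    # the model for one time/pH

# The same matrix as a sum of per-analyte terms a_r (b_{2r} + b_{2r+1})^T.
Xk_sum = sum(np.outer(A[:, r], B[:, 2 * r] + B[:, 2 * r + 1]) for r in range(3))

same = np.allclose(Xk, Xk_sum)
rank_Xk = np.linalg.matrix_rank(Xk)  # limited by the three analytes, not six spectra

print(same, rank_Xk)
```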

So far, only a single time/pH has been considered. To represent the entire data set, the model must be generalized into a multi-way form. For each time the data can be represented by the model above, except that it must be adjusted so that the changes in relative concentration (acidic and basic fraction) can be represented as well. The relative concentration of each of the six acidic and basic analytes can be represented by a 6 × 1 vector at each time. The relative concentrations at all *K* times are held in the *K* × 6 matrix **C**. To use the generic model at the *k*th time, it is thus necessary to scale the contribution from each analyte by its corresponding relative concentration. The six weights from the *k*th row of **C** are placed in a 6 × 6 diagonal matrix **D**_{k} so that the *s*th diagonal element gives the relative amount of the *s*th species. The model can then be written

(3)

or, in other words, as

**X**_{k} = **AHD**_{k}**B**^{T}, *k* = 1, ..., *K* (4)
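The slab-wise model can also be written element by element and assembled into the full three-way array in one step. The following sketch uses random stand-ins for **A**, **B**, and **C** (assumptions for illustration only) and checks that a frontal slab of the assembled array agrees with **AHD**_{k}**B**^{T}.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 12, 100, 89                      # samples, wavelengths, times

A = rng.random((I, 3))                     # stand-in sample-mode loadings
B = rng.random((J, 6))                     # stand-in spectral loadings
C = rng.random((K, 6))                     # stand-in time profiles
H = np.kron(np.eye(3), np.ones((1, 2)))    # 3 x 6 pattern of ones and zeros

# Element-wise form: X[i, j, k] = sum_r sum_s A[i, r] * H[r, s] * C[k, s] * B[j, s]
X = np.einsum('ir,rs,ks,js->ijk', A, H, C, B)

# Compare one frontal slab with the matrix form A H D_k B^T.
k = 10
Dk = np.diag(C[k])                         # k-th row of C on the diagonal
slab_ok = np.allclose(X[:, :, k], A @ H @ Dk @ B.T)

print(X.shape, slab_ok)
```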

Note how the use of distinct **H** and **C** (**D**_{k}) matrices allows the qualitative and quantitative relationships between **A** and **B** to be expressed separately. The interaction matrix **H**, which is globally defined, gives the interaction *structure*; it shows exactly which factors in **A** interact with which factors in **B**. In contrast, the **C** matrix gives the interaction *magnitudes*. For every *k*, the *k*th row of **C** (the diagonal of **D**_{k}) shows to what extent each interaction is present at the given *k*. The distinction between qualitative and quantitative aspects is especially important, since knowledge of the exact pattern of interactions is not always available. Leaving **H** free, rather than fixing it as is done here, allows the type of interaction to be explored. This can be helpful for rank-deficient problems in general. The matrix **C** also has a straightforward interpretation, as each column in **C** will be the estimated FIAgram or time profile of the given analyte in its acidic or basic form. Note that the model above bears some resemblance to the PARAFAC model

**X**_{k} = **AD**_{k}**B**^{T}, (6)

but differs mainly by the introduction of the matrix **H**, which enables interactions between factors in different modes. It also enables **A** and **B**/**C** to have different column dimensions.
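One way to see the connection with PARAFAC is to absorb **H** into **A**. The sketch below, again with random stand-in matrices (assumptions for illustration), shows that the model with **H** gives the same slab as a six-component PARAFAC-type model whose sample-mode loadings **AH** contain pairwise identical columns and therefore have rank three.

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 12, 100, 89

A = rng.random((I, 3))                     # stand-in sample-mode loadings
B = rng.random((J, 6))                     # stand-in spectral loadings
C = rng.random((K, 6))                     # stand-in time profiles
H = np.kron(np.eye(3), np.ones((1, 2)))    # 3 x 6 pattern of ones and zeros

# Absorb H into A: each column of A now appears twice.
A_expanded = A @ H                         # I x 6

k = 0
fia_slab = A @ H @ np.diag(C[k]) @ B.T
parafac_slab = A_expanded @ np.diag(C[k]) @ B.T   # PARAFAC form with six components

same_slab = np.allclose(fia_slab, parafac_slab)
rank_expanded = np.linalg.matrix_rank(A_expanded)  # three, although there are six columns

print(same_slab, A_expanded.shape, rank_expanded)
```

Keeping **H** explicit makes the three-column structure of **A** part of the model rather than something the fit has to discover.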

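The PARATUCK2 model, which has a separate diagonal weight matrix on each side of **H** (**D**_{k}^{A} for the columns of **A** and **D**_{k}^{B} for the columns of **B**), is given by

**X**_{k} = **AD**_{k}^{A}**HD**_{k}^{B}**B**^{T}, (7)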

while the FIA model is

**X**_{k} = **AHD**_{k}**B**^{T}. (8)

This FIA model can be fitted as what has been called a restricted PARATUCK2 model (which is now more generally referred to as a PARALIND model; see the references above). The matrix **H** remains fixed during the analysis to ensure that every analyte interacts with only two spectra/profiles, namely its acidic and its basic counterpart.
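To illustrate what keeping **H** fixed means in practice, the sketch below performs a single conditional least-squares update of **A** on noise-free synthetic data, with **H**, **B**, and **C** held at their true values. This is only an assumed, minimal example of the kind of step used in alternating least squares; it is not the algorithm of the cited papers, and it omits the constraints (e.g. non-negativity, unimodality) discussed there.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K = 12, 100, 89
H = np.kron(np.eye(3), np.ones((1, 2)))    # fixed 3 x 6 interaction structure

# Synthetic, noise-free data generated from random stand-in loadings.
A_true = rng.random((I, 3))
B = rng.random((J, 6))
C = rng.random((K, 6))
X = np.einsum('ir,rs,ks,js->ijk', A_true, H, C, B)

# Unfold the array: [X_1 X_2 ... X_K] = A H [D_1 B^T  D_2 B^T ... D_K B^T] = A H Z
X_unf = np.concatenate([X[:, :, k] for k in range(K)], axis=1)        # I x (J*K)
Z = np.concatenate([np.diag(C[k]) @ B.T for k in range(K)], axis=1)   # 6 x (J*K)

# Least-squares update of A with H, B and C held fixed.
A_hat = X_unf @ np.linalg.pinv(H @ Z)

recovered = np.allclose(A_hat, A_true)
print(A_hat.shape, recovered)
```

In a full fitting procedure, similar conditional updates for **B** and **C** would alternate with this one until convergence, with **H** never updated.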

*Copyright © 1996-2002 Rasmus Bro (rb@kvl.dk). All rights reserved*