Data used: brod.mat (courtesy of Magni Martens, KVL). Eight judges assessed ten breads with respect to 11 attributes. The data are in the matrix X. The samples are pair-wise replicates. The salt content of each bread is also known.
Purpose: Use N-PLS on a simple data set to learn about the method and to see the importance of using a model of appropriate structure.
Information: R. Bro, Multi-way calibration. Multi-linear PLS, J. Chemom., 1996, 10(1), 47-62.
Prerequisites: Be sure to understand the basics of handling multi-way arrays in MATLAB (Chapter 1). You should also know two-way PLS.
Unlike the fluorescence data described previously, there is no fundamental theory for how sensory data ideally behave. Thus, using a trilinear model in this case cannot be justified as a hard model, as it can for well-behaved spectral data. More significant here, however, is the idea of latent variables. If we assume that all assessors use the same latent or basic types of sensations, only in different proportions, then this is exactly what the trilinear model states.
If the data are unfolded such that the breads are the row-mode and the attributes and judges together form the column-mode (see figure above), then the data can be written as X = [X_{1} X_{2} ... X_{8}], where X_{k} is the 10 by 11 matrix of assessments of the kth judge. If we make an F-component two-way PCA model of this matrix we obtain the approximation X = TP^{T}, where T is the 10 by F score matrix pertaining to the breads and P is the 88 by F loading matrix pertaining to both assessors and attributes. The first eleven rows of P correspond to the first assessor, the next eleven to the second assessor, and so on. As seen from the figure, the loading elements for each assessor are not directly related to the loading elements of the other assessors. Thus, in unfolding we impose no relation between different assessors. Each assessor is assumed to have his or her own idiosyncratic perception.
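The unfolding described above can be sketched in a few lines. This is only an illustration in plain Python, not code from the tutorial: the array `X3` is a hypothetical stand-in filled with traceable dummy values (brod.mat is not loaded), and the function name `unfold` is our own.

```python
# Sizes taken from the text: 10 breads, 11 attributes, 8 judges.
I, J, K = 10, 11, 8

# Hypothetical stand-in for the data: X3[i][j][k] is the score that
# judge k gives bread i on attribute j. The dummy values encode their
# own position so that the rearrangement can be checked by eye.
X3 = [[[1000 * i + 10 * j + k for k in range(K)] for j in range(J)]
      for i in range(I)]


def unfold(X3):
    """Return the I x (J*K) matrix [X_1 X_2 ... X_K].

    Each judge contributes one block of J adjacent columns, so the
    column index of attribute j for judge k is k*J + j.
    """
    n_attr = len(X3[0])
    n_judge = len(X3[0][0])
    return [[bread[j][k] for k in range(n_judge) for j in range(n_attr)]
            for bread in X3]


X = unfold(X3)
print(len(X), len(X[0]))  # 10 88
```

Columns 0 to 10 of `X` hold the first judge, columns 11 to 21 the second, and so on, which mirrors the row blocks of P in the text. In MATLAB the same arrangement is what `reshape` returns for a 10 by 11 by 8 array reshaped to 10 by 88, because elements are stored column by column.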
By maintaining the structure of the three-way data we obtain instead the PARAFAC model X_{k} = AD_{k}B^{T}, where A is the 10 by F score matrix pertaining to the breads (similar to T above), B is the 11 by F loading matrix pertaining to the attributes, and D_{k} is a diagonal matrix holding on its diagonal the kth row of C, the 8 by F loading matrix for the judges. Writing the trilinear model in this way shows that, for sensory data, the model imposed on the data is that all assessors use the same basic types of sensations, given by B, but each assessor uses these latent variables in different proportions. For example, the kth assessor uses the first component with a relative magnitude of c_{k1}, which is the first diagonal element of D_{k}.
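To make the role of D_{k} concrete, the following sketch builds individual slabs X_{k} = AD_{k}B^{T} from given loadings. It is a toy illustration under our own assumptions: the sizes (2 breads, 3 attributes, 2 judges, F = 2) and all numbers are invented for readability, and `parafac_slab` is a hypothetical helper, not a toolbox function.

```python
def parafac_slab(A, B, C, k):
    """Return X_k = A * D_k * B^T with D_k = diag(k-th row of C).

    Element (i, j) is the sum over components f of
    A[i][f] * C[k][f] * B[j][f].
    """
    n_comp = len(C[k])
    return [[sum(A[i][f] * C[k][f] * B[j][f] for f in range(n_comp))
             for j in range(len(B))]
            for i in range(len(A))]


# Invented toy loadings (not estimated from any data).
A = [[1, 2], [3, 4]]          # breads x components
B = [[1, 0], [0, 1], [1, 1]]  # attributes x components
C = [[1, 1], [2, 0]]          # judges x components

print(parafac_slab(A, B, C, 0))  # [[1, 2, 3], [3, 4, 7]]
print(parafac_slab(A, B, C, 1))  # [[2, 0, 2], [6, 0, 6]]
```

Both judges share the same A and B. The second judge's row of C is (2, 0), so that judge's slab contains only the first component, at twice the magnitude used by the first judge. This is the sense in which assessors use the same latent variables in different proportions.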
The trilinear model underlying both PARAFAC and N-PLS is more restricted
than unfolding models. Therefore the fit of a trilinear model will by
definition be lower than (or at most equal to) the fit of the corresponding
bilinear model. However, the bilinear model is usually too flexible, and
the increased fit is to a large extent attributable to fitting the noise
of the data. In the trilinear model it is much more difficult to overfit,
since any variation incorporated in the model must be consistent over all
assessors (or whatever the third mode represents). In the
bilinear model above each component uses 98 (10+88) parameters while a
trilinear component uses only 29 (10+11+8) parameters. This clearly illustrates
that for the bilinear model to be suitable there must be a large deviation
from the trilinear model. Otherwise the increased number of parameters
will only fit noise. And to the degree that the trilinear model is only
approximately correct, incorporating an additional trilinear component
is still far more parsimonious than using a bilinear component.
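The parameter counts quoted above follow directly from the array sizes. The short sketch below reproduces them for one to three components; it counts raw parameters only, as the text does, and the function names are our own.

```python
I, J, K = 10, 11, 8  # breads, attributes, judges


def bilinear_params(n_comp):
    """Scores (I) plus unfolded loadings (J*K) per component."""
    return n_comp * (I + J * K)


def trilinear_params(n_comp):
    """One loading vector per mode (I + J + K) per component."""
    return n_comp * (I + J + K)


for n_comp in (1, 2, 3):
    print(n_comp, bilinear_params(n_comp), trilinear_params(n_comp))
# 1 98 29
# 2 196 58
# 3 294 87
```

Three trilinear components (87 parameters) still use fewer parameters than a single bilinear component (98), which is the parsimony argument made in the text.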
Task: Estimate a two-component two-way PCA model of the unfolded, centered data. Try to interpret the scores and loadings. Estimate a two-component PARAFAC model of the centered data and interpret it. For which model are the replicates located closest to each other in the score plot? Why?
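The task does not say how the data are to be centered. A common choice, assumed here, is to center across the bread mode, that is, to subtract from every attribute/judge combination its mean over the breads. For the unfolded matrix this is ordinary column centering, sketched below in plain Python with a tiny invented matrix; `center_columns` is a hypothetical helper name.

```python
def center_columns(X):
    """Subtract from each column of X its mean over the rows.

    Returns the centered matrix and the list of column means, which
    are needed again when new samples are to be predicted.
    """
    n_rows = len(X)
    n_cols = len(X[0])
    means = [sum(row[c] for row in X) / n_rows for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in X]
    return centered, means


# Invented 2 x 2 example, not the bread data.
Xc, col_means = center_columns([[1.0, 10.0], [3.0, 14.0]])
print(col_means)  # [2.0, 12.0]
print(Xc)         # [[-1.0, -2.0], [1.0, 2.0]]
```

Because each column of the unfolded 10 by 88 matrix corresponds to one attribute/judge pair, centering these columns and folding the result back gives the same array as centering the three-way data across the first mode.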
Task: Use unfold-PLS as well as three-way PLS (N-PLS) to build a regression model for salt using every second sample. Predict the salt contents of the remaining samples. You will need a MATLAB m-file for two-way PLS calibration in this exercise (for example from the PLS_Toolbox). Alternatively, you can use the trilinear PLS algorithm and specify the three-way array as 10 by 88 by 1, so that one variable mode holds all 88 variables and the other variable mode has dimension 1.
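The two data-handling steps in this task, taking every second sample and recasting the unfolded matrix as an array whose second variable mode has dimension 1, can be sketched as follows. The matrix `X` here is a hypothetical placeholder of the right size, the helper names are our own, and no PLS model is fitted.

```python
def split_every_second(rows):
    """Return (calibration, test): samples 1, 3, 5, ... and 2, 4, 6, ..."""
    return rows[0::2], rows[1::2]


def add_singleton_mode(X):
    """Turn an I x M matrix into an I x M x 1 nested list."""
    return [[[value] for value in row] for row in X]


# Placeholder 10 x 88 matrix of zeros standing in for the unfolded data.
X = [[0.0] * 88 for _ in range(10)]
sample_numbers = list(range(1, 11))

cal_ids, test_ids = split_every_second(sample_numbers)
print(cal_ids)   # [1, 3, 5, 7, 9]
print(test_ids)  # [2, 4, 6, 8, 10]

X_cal, X_test = split_every_second(X)
X_cal_3way = add_singleton_mode(X_cal)
print(len(X_cal_3way), len(X_cal_3way[0]), len(X_cal_3way[0][0]))  # 5 88 1
```

If the pair-wise replicates are stored in adjacent rows (an assumption; the text does not state the row order), this split puts one replicate of each bread in the calibration set and the other in the test set. With a singleton third mode the trilinear model has no second variable mode left to constrain, so it reduces to the bilinear, unfolded case.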
Use the plotting functions in MATLAB (or in the PLS_Toolbox) to plot the scores and loadings and investigate the models.
The N-way tutorial
Copyright © 1998
R. Bro