Preprocessing of higher-order arrays is more complicated than in the two-way case, though understandable in light of the multilinear variation presumed to be an acceptable model of the data. Centering serves the same purpose as in two-way analysis, namely to remove constant terms in the data that may otherwise at best require an extra component and at worst make modeling impossible.
All models described here implicitly assume that the data are ratio-scale (interval-scale with a natural origin), i.e., that there is a natural zero which really does correspond to zero (no presence means no signal) and that the measurements are otherwise proportional, such that doubling the amount of a phenomenon doubles its corresponding contribution to the data. If the data are not approximately ratio-scale, then centering the data is mandatory.
Centering is performed to make the data compatible with the structural model. Scaling, on the other hand, is a way of making the data compatible with the least squares loss function normally used. Scaling does not change the structural model of the data, but only the weight paid to errors of specific elements in the estimation. Scaling is dramatically simpler than using a weighted loss function and is therefore preferable whenever approximately homoscedastic data can be obtained by scaling. Centering and scaling will be described using three-way arrays in the following.
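The claim that scaling only reweights the residuals can be illustrated with a short derivation. This is a sketch under stated assumptions: a trilinear model with F components, and a single scalar s_i applied to every element of slab i in the first mode. The symbols s_i, a_if, b_jf and c_kf are notation introduced here for illustration only.

```latex
\sum_{i,j,k}\left(\frac{x_{ijk}}{s_i}-\sum_{f=1}^{F}\frac{a_{if}}{s_i}\,b_{jf}\,c_{kf}\right)^{2}
=\sum_{i,j,k}\frac{1}{s_i^{2}}\left(x_{ijk}-\sum_{f=1}^{F}a_{if}\,b_{jf}\,c_{kf}\right)^{2}
```

Under these assumptions the scaled data are still described by a trilinear model (the first-mode loadings simply become a_if / s_i), and an ordinary least squares fit to the scaled data corresponds to a weighted least squares fit to the original data with weights 1 / s_i^2.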
For instance, if it is known that the true model consists of one PARAFAC term (a trilinear component) and an overall level, it may seem feasible to estimate a PARAFAC model on the original data with the grand level subtracted. However, even though the mathematical structure might theoretically be true, subtracting the grand level introduces artifacts in the data that are not easily described by the PARAFAC model. In this case, even though the grand level has been subtracted, two components are still necessary to describe the data. This shows that the preprocessing has not achieved its goal of simplifying the subsequent model. If, on the other hand, the data are centered across one mode, they can be modeled by a one-component model. Another possibility is to estimate a two-component model while constraining one component to have constant loadings in each mode, thus reflecting the grand level. This provides a model with a unique estimate of the grand level (see box below).
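The difference between subtracting the grand level and centering across one mode can be checked numerically. The following is a minimal sketch, assuming NumPy and noise-free synthetic data; the array sizes, the offset of 10.0 and the helper `unfolding_rank` are hypothetical choices made for illustration. The rank of the unfolded array is used here as a simple proxy for the number of components needed.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 5, 4, 3
a = rng.uniform(1.0, 2.0, I)
b = rng.uniform(1.0, 2.0, J)
c = rng.uniform(1.0, 2.0, K)

# One trilinear (PARAFAC) term plus a constant overall level (10.0 is arbitrary).
X = np.einsum("i,j,k->ijk", a, b, c) + 10.0

def unfolding_rank(T, tol=1e-8):
    """Numerical rank of the I x (J*K) unfolding of a three-way array."""
    return np.linalg.matrix_rank(T.reshape(T.shape[0], -1), tol=tol)

# Option 1: subtract the grand level (the mean over all elements).
X_grand = X - X.mean()

# Option 2: center across the first mode (subtract the mean over i for each j, k).
X_centered = X - X.mean(axis=0, keepdims=True)

rank_grand = unfolding_rank(X_grand)        # a constant term remains
rank_centered = unfolding_rank(X_centered)  # only the trilinear term remains

print("rank after subtracting grand level:", rank_grand)
print("rank after centering across mode 1:", rank_centered)
```

In this sketch the grand mean equals the offset plus the product of the loading means, so subtracting it leaves a nonzero constant behind, whereas centering across the first mode replaces a_i by a_i minus its mean and removes the offset entirely.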
Scaling in multi-way analysis has to be done with the trilinear model taken into account. Unlike centering, scaling cannot appropriately be done column-wise on the unfolded array; instead, whole slabs or submatrices of the array should be scaled. If variable j of the second mode is to be scaled (compared to the rest of the variables in the second mode), it is necessary to scale all columns where variable j occurs by the same scalar. This means that whole matrices instead of columns have to be scaled. For a four-way array, three-way arrays would have to be scaled. Mathematically, scaling within the first mode can be described as dividing every element of the i-th slab by one scalar specific to that slab.
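Slab-wise scaling can be sketched in a few lines of NumPy. This is an illustrative sketch, not the tutorial's own implementation: the function name `scale_within_mode` is hypothetical, and the choice of scalar (the root of the slab's sum of squares, so that each scaled slab has a sum of squares of one) is one common option assumed here. For the first mode this corresponds to replacing x_ijk by x_ijk / s_i with s_i = sqrt(sum over j and k of x_ijk^2).

```python
import numpy as np

def scale_within_mode(X, mode=0):
    """Divide each slab of X along `mode` by the root of its sum of squares.

    Every element sharing the same index in `mode` is divided by the same
    scalar, so whole slabs (not single columns) are scaled.
    """
    other_axes = tuple(ax for ax in range(X.ndim) if ax != mode)
    s = np.sqrt((X ** 2).sum(axis=other_axes, keepdims=True))
    return X / s, s.reshape(-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3, 5))
X[:, 1, :] *= 100.0  # variable j = 1 of the second mode is on a much larger scale

# Scale within the second mode: one scalar per variable j.
X_scaled, s = scale_within_mode(X, mode=1)
slab_ss = (X_scaled ** 2).sum(axis=(0, 2))
print(slab_ss)
```

Because the function works on all axes other than `mode`, the same code applies to a four-way array, where each scalar then rescales a whole three-way sub-array.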
Another complicating issue is the interdependence of centering and scaling. Scaling within one mode disturbs prior centering across the same mode, but not across other modes. Centering across one mode disturbs scaling within all modes. Hence only centering across arbitrary modes or scaling within one mode is straightforward, and furthermore not all combinations of iterative scaling and centering will converge. In practice, though, the outcome need not be influenced much if an iterative approach is not used. Scaling to a sum-of-squares of one is arbitrary anyway, and it may be just as reasonable to scale once, e.g., by variances, within the modes of interest, thereby at least mostly equalizing any huge differences in scale. Centering can then be performed after scaling, which ensures that the modes to be centered are indeed centered.
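The interdependence described above can be verified on a small random array. The sketch below assumes NumPy and synthetic data; the helpers `center_across` and `scale_within` are hypothetical names, and scaling to a sum of squares of one per slab is the assumed scaling rule.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic three-way array with very different scales along the first mode.
row_scale = np.array([1.0, 5.0, 0.2, 3.0, 10.0, 0.5])[:, None, None]
X = rng.normal(size=(6, 5, 4)) * row_scale + 2.0

def center_across(X, mode):
    """Subtract the mean taken over `mode` from every element."""
    return X - X.mean(axis=mode, keepdims=True)

def scale_within(X, mode):
    """Scale each slab along `mode` to a sum of squares of one."""
    axes = tuple(ax for ax in range(X.ndim) if ax != mode)
    return X / np.sqrt((X ** 2).sum(axis=axes, keepdims=True))

# Center across mode 1, then scale within mode 1: the centering is disturbed.
A = scale_within(center_across(X, 0), 0)
same_mode_still_centered = np.allclose(A.mean(axis=0), 0.0)

# Center across mode 2, then scale within mode 1: the centering survives.
B = scale_within(center_across(X, 1), 0)
other_mode_still_centered = np.allclose(B.mean(axis=1), 0.0)

# Scale once, then center: centering holds exactly, scaling only approximately.
C = center_across(scale_within(X, 0), 0)
centered_after = np.allclose(C.mean(axis=0), 0.0)
slab_ss = (C ** 2).sum(axis=(1, 2))

print(same_mode_still_centered, other_mode_still_centered, centered_after)
print(slab_ss)
```

The last case mirrors the pragmatic order suggested in the text: the slab sums of squares are no longer exactly one after centering, but the large differences in scale have been evened out and the centered mode is guaranteed to be centered.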
The N-way tutorial
Copyright © 1998
R. Bro