Seventh workshop of the ERCIM Working Group on Matrix Computations and Statistics:
October 31, 2005, Limassol, Cyprus
Organized during the 3rd World Conference on Computational Statistics & Data Analysis
ERCIM: European Research Consortium for Informatics and Mathematics
data analysis – a key technology in modern society
Program  
FRIDAY April 1  |  SATURDAY April 2  |  SUNDAY April 3

Session 2 (Chair: Rasmus Bro): Surrogate models
Session 4 (Chair: Frans van den Berg): Data analysis in applications

09:00–09:30h Coffee and Danish  |  09:00–09:45h Coffee and Danish
09:30–10:30h Keynote speaker: Hans Bruun Nielsen, Technical University of Denmark, "Surrogate models: Kriging, radial basis functions, etc." Slides (PDF); 10:30–11:00h
09:45–10:30h Keynote speaker: Björn Persson, Foss Analytical, Denmark, "Problems not mentioned in the standard textbooks"; 10:30–11:00h
11:00–11:30h Coffee break  |  11:00–11:30h Coffee break
11:30–12:00h Rasmus Bro, The Royal Veterinary and Agricultural University (KVL), Denmark, "MILES: Maximum Likelihood Least Squares fitting" Slides (PDF); 12:00–12:30h
11:30–12:00h Berkant Savas, Department of Mathematics, Linköping University, Sweden, "Handwritten digit classification by use of higher order singular value decomposition"; 12:00–12:30h; 12:30h Closing of meeting by Lars Eldén
12:30–14:00h Lunch  |  13:00h Sandwiches to go

Session 1 (Chair: Hans Bruun Nielsen): Tensor models: algorithms and applications
Session 3 (Chair: Lars Eldén): Linear Algebra and Statistics

13:30–14:00h Lek-Heng Lim, Institute for Computational and Mathematical Engineering, Stanford University, USA, "Tensors for Chemists and Psychologists" Slides (PDF); 14:00–15:00h
14:00–15:00h Keynote speaker: Søren Holdt Jensen, Aalborg University, Denmark, "Noise Reduction via Truncated Matrix Decompositions"; 15:00–15:30h
15:00–15:30h Coffee break  |  15:30–16:00h Coffee break
15:30–16:00h Lars Eldén, Department of Mathematics, Linköping University, Sweden, "Multilinear mappings, SVD, HOSVD, and the numerical solution of ill-conditioned tensor least squares problems" Slides (PDF); 16:00–16:30h; 16:30–17:00h
16:00–16:30h Diana Sima and Sabine Van Huffel, ESAT-SISTA, K.U. Leuven, Belgium, "Choice of truncation levels for core linear systems" Slides (HTML); 16:30–17:00h

20:00h Conference dinner at Brasserie Chit Chat, Sankt Peders Stræde 24 A. Location/Map (Danish), Some info (Danish), Route/Map (English)
1. Tensor models: algorithms and applications
Higher-order tensors (also known as multiway arrays) occur frequently in many applications, ranging from psychometrics and chemometrics to image and signal processing and data mining. Methods exist for decomposing tensors and for extracting information, but further algorithmic development is necessary to deal with the large data sets that are common today. This section of the workshop aims at bringing together researchers in numerical linear algebra, statistics and application areas to discuss issues in tensor models.
Keynote speaker: M. Alex O. Vasilescu, New York University, "A Tensor Framework for Computer Vision and Graphics".
2. Surrogate models
Surrogate models are used in many areas, such as environment control, movie making, and optimizing car designs for strength. A surrogate model is essentially an approximation function in n-dimensional space that is cheap to evaluate. Special methods go under names like Kriging, response surfaces, neural nets and radial basis functions. This section of the workshop aims at bringing together researchers in numerical analysis, statistics and application areas to discuss methods, algorithms and applications.
Keynote speaker: Hans Bruun Nielsen, Technical University of Denmark, "Surrogate models: Kriging, radial basis functions, etc.".
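As an illustration of the basic idea, a radial basis function surrogate can be built in a few lines. The sketch below (plain NumPy, Gaussian kernel, hypothetical sample data; not code from any of the talks) interpolates an "expensive" function at a few sites and is thereafter cheap to evaluate anywhere:

```python
import numpy as np

def rbf_fit(X, y, eps=2.0):
    """Fit a Gaussian RBF interpolant through the samples (X, y)."""
    r2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-eps * r2)          # symmetric kernel matrix
    return np.linalg.solve(Phi, y)   # interpolation weights

def rbf_eval(centers, w, Xnew, eps=2.0):
    """Evaluate the surrogate at new points (cheap: one matrix-vector product)."""
    r2 = ((Xnew[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-eps * r2) @ w

# 'Expensive' function sampled on a small grid of 9 sites in 2D
g = np.linspace(0.0, 1.0, 3)
X = np.array([[a, b] for a in g for b in g])
y = np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])
w = rbf_fit(X, y)
print(np.allclose(rbf_eval(X, w, X), y, atol=1e-8))
```

The kernel matrix becomes ill-conditioned as sample sites cluster, which is one reason the specialized methods named above exist.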
3. Linear Algebra and Statistics
Linear algebra provides a natural setting for many algorithms in computational statistics, due to the availability of a variety of efficient and robust algorithms for matrix computations, such as QR factorizations, the SVD, etc. The talks in this theme will focus on the development of such algorithms and/or their application in statistical computations.
Keynote speaker: Søren Holdt Jensen, Aalborg University, "Noise Reduction via Truncated Matrix Decompositions".
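As a minimal example of the kind of algorithm this theme builds on, ordinary least squares regression can be computed via a QR factorization or via the SVD; the two agree to machine precision (synthetic data, plain NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.01 * rng.standard_normal(100)

# QR-based least squares: numerically preferable to forming X^T X
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# SVD-based solution (lstsq) handles rank deficiency as well
beta_svd = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_qr, beta_svd))
```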
4. Data analysis in applications
Keynote speaker: Björn Persson, Foss Analytical, "Problems not mentioned in the standard textbooks".
Program committee: Lars Eldén, Rasmus Bro, Per Christian Hansen, Hans Bruun Nielsen
Important dates
2nd announcement and call for papers: January 15, 2005. (TEXT)
Abstract submission deadline: March 10, 2005.
Papers presented at the ERCIM meeting that contain a strong computational statistics and/or data analytic component can be submitted for publication in the CSDA Journal (Computational Statistics and Data Analysis) through a fast peer-review procedure.
The CSDA is the official journal of the International Association for Statistical Computing.
Local organization
The workshop will be organized by the Spectroscopy and Chemometrics group of KVL, Denmark.
Full address:
The Royal Veterinary and Agricultural University / KVL
Department of Food Science
Quality and Technology
Spectroscopy and Chemometrics Group
Rolighedsvej 30
1958 Frederiksberg C
Denmark
(secretariat)
Phone +45 3528 3264
Fax +45 3528 3245
Email: gk@kvl.dk (Ms. Gilda Kischinovsky)
See the PDF file or Images 1 and 2 for details on the venue: Thorvaldsensvej 40, Frederiksberg C.
To register for the sixth workshop, send an email to Ms. Gilda Kischinovsky, gk@kvl.dk.
The price for attending will be 1500 DKK (~200 Euro).
A Payment Form can be found here (MS Word document); please fill out this form entirely, including date and signature,
and return it by regular mail (address above), by fax (+45 3528 3245), or as a scanned copy via email (gk@kvl.dk).
Please provide your name and affiliation, as well as the title and abstract of your contribution, if any.
General information about Copenhagen
Abstracts
Tensors for Chemists and Psychologists
Lek-Heng Lim
Institute for Computational and Mathematical Engineering, Stanford University, USA
We will introduce tensors in a modern way that is concrete, conceptually simple, and mathematically accurate. We take a view towards facilitating numerical computations of tensors and applications in multiway data analysis. Our primary focus will be the different notions of ranks and their associated decompositions and approximations. Time permitting, we will also discuss contravariance and covariance, contraction product, multilinear functions, and the tensorial generalizations of matrix determinant, eigenvalues, and singular values.
We will show that two equivalent definitions of tensors give rise to two distinct generalizations of matrix rank, which we propose calling the outer product rank and the multilinear rank. The former is associated with the CANDECOMP/PARAFAC model and the latter with the Tucker model. The outer product rank has the advantage of being a single integer, but its mathematical properties differ vastly from those of matrix rank. On the other hand, the multilinear rank of an order-k tensor is a k-tuple of integers, but it has the advantage of preserving most, if not all, common features of the matrix rank.
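The multilinear rank mentioned above is easy to compute for small dense tensors: it is the tuple of ordinary matrix ranks of the mode unfoldings. A NumPy sketch on a toy tensor (illustration only, not from the talk):

```python
import numpy as np

def multilinear_rank(T):
    """Multilinear rank of an order-k tensor: ranks of its k mode unfoldings."""
    return tuple(
        np.linalg.matrix_rank(np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1))
        for mode in range(T.ndim)
    )

# A rank-1 (outer product) tensor has multilinear rank (1, 1, 1)
a, b, c = np.arange(1, 4.0), np.arange(1, 5.0), np.arange(1, 6.0)
T = np.einsum('i,j,k->ijk', a, b, c)
print(multilinear_rank(T))  # (1, 1, 1)
```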
Multilinear mappings, SVD, HOSVD, and the numerical solution of ill-conditioned tensor least squares problems
Lars Eldén
Department of Mathematics
Linköping University, Sweden
Multilinear mappings occur in several applications, e.g. the restoration of blurred images. The restoration problem can be formulated as an ill-conditioned linear least squares problem involving a tensor. Any linear mapping on a finite-dimensional vector space can be expressed via a singular value decomposition (SVD), which can also be used to solve the restoration problem.
On the other hand, the tensor can be decomposed using the Higher Order SVD (HOSVD), which is equivalent to the Tucker model in psychometrics and chemometrics. We discuss what information about the tensor can be obtained from the HOSVD. We also present some results concerning the relation between the SVD and the HOSVD, using the restoration problem as an example.
For an "almost hypercube" dense tensor, the computation of the HOSVD requires one order of magnitude fewer flops than the computation of the SVD. We describe how an approximate truncated SVD solution of an ill-conditioned tensor least squares problem can be obtained via the HOSVD.
There are also applications in data analysis involving tensor models.
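A compact way to see the HOSVD in practice: the factor matrices are the left singular vectors of the mode unfoldings, and the core is obtained by multiplying the tensor by their transposes in every mode. A NumPy sketch on a random tensor (illustration only, not the restoration problem from the talk):

```python
import numpy as np

def hosvd(T):
    """Higher Order SVD: factor matrices from SVDs of the mode unfoldings."""
    U = []
    for mode in range(T.ndim):
        unf = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U.append(np.linalg.svd(unf, full_matrices=False)[0])
    # Core tensor: multiply T by U_k^T in every mode
    S = T
    for mode, Uk in enumerate(U):
        S = np.moveaxis(np.tensordot(Uk.T, np.moveaxis(S, mode, 0), axes=1), 0, mode)
    return S, U

rng = np.random.default_rng(1)
T = rng.standard_normal((4, 5, 6))
S, U = hosvd(T)
# Reconstruct: T = S x_1 U1 x_2 U2 x_3 U3
R = S
for mode, Uk in enumerate(U):
    R = np.moveaxis(np.tensordot(Uk, np.moveaxis(R, mode, 0), axes=1), 0, mode)
print(np.allclose(T, R))
```

Truncating the factor matrices (and the core accordingly) gives the low-multilinear-rank approximation used by Tucker-type methods.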
Globally convergent algorithms for CANDECOMP/PARAFAC with semidefinite programming
Speaker: Lek-Heng Lim
Institute for Computational and Mathematical Engineering, Stanford University, USA
The PARAFAC (aka CANDECOMP) model for the statistical analysis of k-way data may be reduced to the problem of finding an optimal rank-r approximation to an order-k tensor. It has been shown that an optimal solution need not exist when k > 2 and r > 1. When it is known a priori that an optimal solution exists, computing the solution is often difficult because of the nonconvexity of the objective function. The standard method, the Alternating Least Squares (ALS) algorithm, does not guarantee convergence to a global minimum.
We propose an algorithm based on semidefinite programming (SDP) that will frequently converge to a global minimum. The nonconvex problem associated with PARAFAC can be relaxed to a convex problem, which can in turn be recast as an SDP problem: a convex optimization problem whose global solution can be found in polynomial time and can in practice be computed with efficient interior point methods. The relaxed problem in question represents a first-level relaxation of the original PARAFAC problem, of which higher-level relaxations can be formulated. The sequence of optimal solutions to these higher-level relaxed problems is guaranteed to converge to the true global PARAFAC optimum in the limit.
This work draws heavily on recent breakthroughs in multivariate polynomial optimization and is joint with Kim-Chuan Toh, who is currently visiting the Department of Management Science and Engineering, Stanford University.
Performance of some common algorithms for fitting the PARAFAC model, and simple tools to implement them efficiently
Giorgio Tomasi
Royal Veterinary & Agricultural University, Denmark
In the past few years, several methods have been proposed to fit the PARAFAC model. Most of them are simple modifications of the Alternating Least Squares algorithm, which was introduced in the 1970s and now lends its name to the model itself. However, only a few attempts have been made to use standard nonlinear least squares solvers such as the Gauss-Newton method. The main reason seems to be that PARAFAC fitting gives rise to nonlinear problems that can be extremely large, far exceeding the memory and computational power of standard personal computers, and relatively ill-conditioned. Since PARAFAC-ALS breaks the fitting problem down into small linear ones, it is less affected by these issues. Moreover, PARAFAC-ALS is relatively simple to implement and performs well in most cases, which has greatly favoured its use in spite of some well-known shortcomings.
An overview of the performance of the currently available algorithms will be presented, with particular focus on the substantial simplifications that the inherent structure and redundancy of the Jacobian matrix of a PARAFAC model allow in some key steps of standard nonlinear least squares solvers. These simplifications follow straightforwardly from properties of the column-wise Khatri-Rao product and can dramatically improve the performance of some algorithms. Fast formulae to compute the cross-product of the Jacobian matrix J, the product of J with a vector, and the exact Hessian of a PARAFAC model will be illustrated, along with the effect of their implementation on some common nonlinear least squares solvers.
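The structural shortcut alluded to above can be stated concisely: the Gram matrix of a column-wise Khatri-Rao product reduces to a Hadamard product of small Gram matrices, so cross-products of the Jacobian never require forming the full tall matrix. A NumPy check of the identity on toy sizes (a sketch, not the speaker's implementation):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product: column r is kron(A[:, r], B[:, r])."""
    I, R = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((5, 3))
KR = khatri_rao(A, B)
# Key identity behind the fast normal-equation formulae:
# (A kr B)^T (A kr B) = (A^T A) * (B^T B), with * the Hadamard product
print(np.allclose(KR.T @ KR, (A.T @ A) * (B.T @ B)))
```

The right-hand side costs O(R^2) per factor instead of touching all I*J rows, which is where the dramatic speedups come from.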
Implementation issues related to variational data assimilation: some preliminary results and conclusions
Zahari Zlatev
National Environmental Research Institute, Denmark
The variational data assimilation approach can be viewed as an attempt to globally adjust the results of the model to the complete set of available observations (Talagrand and Courtier [3]). This approach has the theoretical advantage of providing exact consistency between the dynamics of the model and the final results of the assimilation. The idea was probably first applied in 1971 by Morel et al. [2], who defined and implemented a procedure in which the assimilating model is repeatedly integrated forward and backward in time, with the observations constantly introduced into the model. The heuristic idea behind that procedure, supported by a number of numerical results, was that repeated calculations and corrections of model results would converge to a solution compatible, at least to a certain degree of accuracy, with the observations.
Results from some preliminary investigations (based mainly on running simple tests studied in Lewis and Derber [1]) will be presented, and several conclusions drawn. Data assimilation techniques normally lead to huge computational tasks, which implies that one should try to determine, a priori, whether or not to apply data assimilation in a particular study. The important problem of formulating a set of conditions under which the data assimilation approach will probably not give good results (conditions that have to be checked carefully before applying the data assimilation procedure) will be discussed. Plans for future work in this direction will be outlined.
[1] J. M. Lewis and J. C. Derber: “The use of adjoint equations to solve a variational adjustment problem with advective constraints”. Tellus, 37A (1985), 309–322.
[2] P. Morel, G. Lefevre and G. Rabreau: “On initialization and non-synoptic data assimilation”. Tellus, 23 (1971), 197–206.
[3] O. Talagrand and Ph. Courtier: “Variational assimilation of meteorological observations with the adjoint vorticity equation. I: Theory”. Quarterly Journal of the Royal Meteorological Society, 113 (1987), 1311–1328.
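For a linear model under perfect-model assumptions, the variational adjustment of the initial state reduces to a least squares problem. The toy sketch below (hypothetical 2-state dynamics and synthetic observations, far simpler than any real assimilation system) recovers the initial condition from later observations:

```python
import numpy as np

rng = np.random.default_rng(6)
M = np.array([[0.9, 0.1], [-0.1, 0.95]])   # hypothetical linear dynamics x_{t+1} = M x_t
x0_true = np.array([1.0, -0.5])
obs_times = [1, 3, 5]
# Observing the full state at each observation time maps x0 linearly to the data
H = np.vstack([np.linalg.matrix_power(M, t) for t in obs_times])
y = H @ x0_true + 1e-3 * rng.standard_normal(H.shape[0])
# The variational cost ||H x0 - y||^2 is minimized by ordinary least squares
x0_est = np.linalg.lstsq(H, y, rcond=None)[0]
print(np.allclose(x0_est, x0_true, atol=0.01))
```

In real applications the model is nonlinear and high-dimensional, which is why adjoint-based iterative minimization (the forward-backward integrations described above) replaces the direct solve.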
Equivalent methods for the low rank matrix approximation problem
Mieke Schuermans, I. Markovsky, P.D. Wentzell and S. Van Huffel
Department of Electrical Engineering, Katholieke Universiteit Leuven, Belgium
Keywords: rank reduction, weighted norm, total least squares, numerical algorithms
In practice, measured real-life data are noisy. It is often assumed that the data can be written as a sum of true values, which satisfy a linear model, and a noise term. To find the best model approximation of the noisy data, the data are arranged in a data matrix C, and a rank-deficient matrix R has to be found that is as close as possible to C with respect to some norm. When the measurement noise is centered, normal, and independently and identically distributed, the optimal closeness norm is the Frobenius norm. This is used in the well-known Total Least Squares (TLS) and Principal Component Analysis (PCA) methods. However, when the measurement errors are not identically distributed, the Frobenius norm is no longer optimal and a weighted norm is needed instead. We will discuss Weighted Low Rank Approximation (WLRA), Maximum Likelihood PCA (MLPCA) and Element-Wise weighted TLS (EWTLS). The purpose of this talk is to explore the tight equivalences between MLPCA and EWTLS. Despite their seemingly different problem formulations, it is shown that both methods can be reduced to the same mathematical kernel problem: finding the closest (in a certain sense) weighted low rank matrix approximation, where the weight is derived from the distribution of the errors in the data. The different solution approaches used in MLPCA and EWTLS are compared on chemical data in terms of computational efficiency and convergence behavior.
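The unweighted TLS computation underlying these methods is itself a small SVD exercise; a sketch in the noise-free case (synthetic data, not from the talk), where TLS recovers the exact solution from the right singular vector of the augmented matrix:

```python
import numpy as np

def tls(A, b):
    """Total least squares solution via the SVD of the augmented matrix [A b]."""
    n = A.shape[1]
    M = np.column_stack([A, b])
    V = np.linalg.svd(M)[2].T          # right singular vectors as columns
    v = V[:, -1]                       # vector for the smallest singular value
    return -v[:n] / v[n]               # scale so the last component is -1

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3))
x_true = np.array([1.0, 2.0, -1.0])
b = A @ x_true                         # noise-free: TLS recovers x exactly
print(np.allclose(tls(A, b), x_true))
```

The weighted variants (WLRA, MLPCA, EWTLS) replace this closed-form SVD step with iterative schemes, since no SVD-type solution exists for general weights.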
Dimensionality reduction and volume minimization: generalization of the determinant minimization criterion for reduced rank regression problems
Berkant Savas
Department of Mathematics
Linköping University, Sweden
Reduced rank regression problems are often encountered in various scientific areas. For the problem to be well defined, it is usually assumed that the determinant of the objective matrix is never zero. To handle all possible scenarios, we propose a generalization of the determinant minimization criterion: maximal rank reduction followed by volume minimization of the objective matrix. The volume of an arbitrary matrix is defined as the product of its nonzero singular values. If the objective matrix is nonsingular, the generalized criterion is equivalent to the determinant criterion. To illustrate the ideas, we solve a generalized reduced rank regression problem from the system identification context.
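The volume criterion is straightforward to evaluate numerically; a NumPy sketch (with a hypothetical tolerance for deciding which singular values count as nonzero):

```python
import numpy as np

def volume(A, tol=1e-10):
    """Volume of a matrix: the product of its nonzero singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > tol]
    return float(np.prod(s)) if s.size else 0.0

# For a nonsingular square matrix the volume equals |det(A)|
A = np.array([[2.0, 1.0], [0.0, 3.0]])
print(np.isclose(volume(A), abs(np.linalg.det(A))))  # True: both equal 6
```

Unlike the determinant, the volume remains well defined (and nonzero) for rectangular and rank-deficient matrices, which is what the generalized criterion exploits.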
Choice of truncation levels for core linear systems
Diana Sima and Sabine Van Huffel
ESATSISTA K.U. Leuven, Belgium
In data analysis, one often faces mathematical formulations that belong to the class of ill-posed problems. For nearly compatible linear algebraic systems, for instance, classical methods such as least squares or total least squares (also known as linear regression and orthogonal regression, respectively) fail to provide relevant solutions in the ill-posed cases. Specialized methods that overcome these difficulties must be designed and used with care.
We focus here on truncation algorithms (see Fierro et al.) and on the delicate issue of choosing the most appropriate truncation level. We note that one of the most efficient algorithms for solving the Truncated Total Least Squares problem, a (Lanczos) bidiagonalization method (in Fierro et al.), also has the property that it readily provides a reduced core system in the sense introduced by Paige and Strakos.
Several methods for choosing an appropriate truncation level from given data, such as Generalized Cross-Validation, Akaike's Information Criterion, and the L-curve, can be specially adapted to this problem. We show how these methods can be made efficient by incorporating the truncation-level decision as a stopping criterion in a bidiagonalization algorithm. Moreover, we study the effect that the loss of orthogonality during bidiagonalization might have on the optimal selection of the truncation level and on the final solution.
[Fierro et al.] R.D. Fierro, G.H. Golub, P.C. Hansen, and D.P. O'Leary. "Regularization by truncated Total Least Squares." SIAM Journal on Scientific Computing, 18(1):1223–1241, 1997.
[Paige and Strakos] C.C. Paige and Z. Strakos. "Core problems in linear algebraic systems." Submitted to SIAM Journal on Matrix Analysis and Applications, 2004. Available from http://www.cs.cas.cz/~strakos/publications.html.
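The effect that truncation is meant to achieve can be demonstrated in a few lines: on an ill-conditioned system, discarding the small singular values yields a far better solution than the naive solve. A truncated-SVD sketch on synthetic data (illustrating the principle, not the bidiagonalization algorithm of the talk):

```python
import numpy as np

def tsvd_solve(A, b, k):
    """Truncated SVD solution: keep only the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ b) / s[:k])

# An ill-conditioned system where small singular values amplify the noise
rng = np.random.default_rng(3)
n = 12
Q1 = np.linalg.qr(rng.standard_normal((n, n)))[0]
Q2 = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = Q1 @ np.diag(10.0 ** -np.arange(n)) @ Q2.T   # rapidly decaying spectrum
x_true = np.ones(n)
b = A @ x_true + 1e-6 * rng.standard_normal(n)
err_full = np.linalg.norm(np.linalg.solve(A, b) - x_true)
err_trunc = np.linalg.norm(tsvd_solve(A, b, 6) - x_true)
print(err_trunc < err_full)   # truncation beats the naive solve
```

Choosing k (here fixed at 6 by hand) is exactly the truncation-level selection problem that GCV, AIC and the L-curve address.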
Probability for multiplicative ANOVA models
Frans van den Berg
The Royal Veterinary and Agricultural University (KVL), Denmark
Traditionally, Design Of Experiments (DOE) data is analyzed using additive regression models. More recently, multiplicative factor models have been used successfully to gain insight into the variance contributions of DOE data [1-4]. The main arguments for multiway multiplicative model analyses of design data are:
- The data might simply fit better in a multiplicative representation!
- The data collected during experimentation can remain in its ‘natural shape’ (table, cube or hypercube) and be analyzed as such.
- All the (powerful) graphically oriented data-analysis tools from chemometrics remain accessible during DOE analysis.
There is, however, one serious drawback:
- Statistical inference (e.g. significance testing) is not yet ‘implemented’ for multiplicative models, and this is certainly desired, if only to gain wider acceptance in the more statistically oriented community.
We will discuss, using datasets collected in our daily work, how model parameter uncertainty estimates can lead to parameter/effect probability values serving the same role as in ‘classical additive ANOVA’.
[1] J. Mandel, ‘A new analysis of variance model for non-additive data’, Technometrics 13 (1971) 1–18
[2] L. Nannerup, M. Jakobsen, F. van den Berg, J.K.S. Møller, J.S. Jensen and G. Bertelsen, ‘Colour preservation of MA-packed sliced meat products by control of critical packaging parameters’, Meat Science 68/4 (2004) 577–585
[3] R. Bro and M. Jakobsen, ‘Exploring complex interactions in designed data using GEMANOVA. Color changes in fresh beef during storage’, Journal of Chemometrics 16 (2002) 294–304
[4] R. Bro and H. Heimdal, ‘Enzymatic browning of vegetables. Calibration and analysis of variance by multiway methods’, Chemometrics and Intelligent Laboratory Systems 34 (1996) 85–102
A Robust Version of Principal Component Analysis: ROBPCA
S. Engelen and M. Hubert
Department of Mathematics,
Katholieke Universiteit Leuven, Belgium
Principal component analysis (PCA) is a very well-known dimension reduction technique, in which the principal components are uncorrelated linear combinations of the original variables. The principal components can be extracted in various ways: by maximizing the variance of the projected observations; equivalently, as the eigenvectors of the covariance matrix of the data; or, as a third approach, by minimizing the sum of the squared orthogonal distances of the observations to the PCA subspace. These three formulations are equivalent in classical PCA, but they are all inadequate when outlying samples are present in the data. Robust PCA aims at finding components that are not so sensitive to possibly contaminated cases.
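For the classical case, the equivalence of the covariance-eigenvector and SVD formulations is easy to verify numerically (synthetic data; this is plain PCA, not the ROBPCA method of the talk):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))
Xc = X - X.mean(axis=0)

# Formulation 1: eigenvectors of the covariance matrix
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc_cov = evecs[:, ::-1]                    # sort by decreasing variance

# Formulation 2: right singular vectors of the centered data
pc_svd = np.linalg.svd(Xc, full_matrices=False)[2].T

# The loadings agree up to sign: |<v_cov, v_svd>| = 1 for each component
print(np.allclose(np.abs(np.sum(pc_cov * pc_svd, axis=0)), 1.0))
```

A single gross outlier added to X would break this agreement with the "true" structure, which is the failure mode robust PCA addresses.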
Different methods have been proposed by robustifying each of the three approaches. This has led to projection pursuit algorithms, which maximize a robust variance of the projections (e.g. Hubert et al., 2002), to PCA based on robust covariance estimators (e.g. Campbell, 1980), and to methods that trim the orthogonal distances (Maronna, 2005). Here, we propose another robust alternative, ROBPCA (Hubert et al., 2005), which combines the three approaches (projection pursuit, robust covariance estimation and the trimming procedure) in one algorithm.
Next, we discuss how the number of principal components to retain is determined with ROBPCA. Different techniques exist to choose the optimal number; the scree plot (Jolliffe, 1986) and the prediction residual error sum of squares (PRESS) (see e.g. Wold, 1978) are widely used. We introduce a robust and fast way to compute the optimal dimension based on leave-one-out cross-validated PRESS values. As naive leave-one-out cross-validation is highly time-consuming for robust resampling algorithms, it is very important to construct procedures that yield good approximations in feasible time. We show that our proposed method is much faster than, but almost as accurate as, the naive approach. Finally, we show how ROBPCA can be used for multiway data and how it leads to a robust version of the PARAFAC algorithm (see e.g. Smilde et al., 2004). We illustrate its robustness with examples and several diagnostic displays.
N.A. Campbell. Robust procedures in multivariate analysis I: Robust covariance estimation. Applied Statistics, 29:231–237, 1980.
M. Hubert, P. J. Rousseeuw, and K. Vanden Branden. ROBPCA: a new approach to robust principal components analysis. Technometrics, 47:64–79, 2005.
M. Hubert, P. J. Rousseeuw, and S. Verboven. A fast robust method for principal components with applications to chemometrics. Chemometrics and Intelligent Laboratory Systems, 60: 101–111, 2002.
I.T. Jolliffe. Principal Component Analysis. Springer, New York, 1986.
R.A. Maronna. Principal components and orthogonal regression based on robust scales. Technometrics, 2005, to appear.
A. Smilde, R. Bro, and P. Geladi. Multiway Analysis with Applications in the Chemical Sciences. Wiley, England, 2004.
S. Wold. Cross-validatory estimation of the number of components in factor and principal components models. Technometrics, 20:397–405, 1978.
Handwritten digit classification by use of higher order singular value decomposition
Berkant Savas
Department of Mathematics
Linköping University, Sweden
Handwritten digit classification is one of the standard and thoroughly investigated classification problems in the literature. In this talk I will present two different classification algorithms based on the higher order singular value decomposition (HOSVD). The first algorithm uses the HOSVD to compute an orthonormal basis for each class. The second algorithm uses the HOSVD to compress the data in two modes of the data set. Good classification results are achieved even after a 99% data reduction.
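The core step of the first algorithm, an orthonormal basis per class and classification by smallest residual, can be sketched as follows. The example uses toy synthetic "classes" of vectors rather than digit images, and per-class SVDs in place of the full HOSVD machinery (a simplification, not the talk's exact method):

```python
import numpy as np

def class_bases(train, k):
    """Orthonormal basis (k leading left singular vectors) for each class."""
    return {c: np.linalg.svd(M, full_matrices=False)[0][:, :k]
            for c, M in train.items()}

def classify(x, bases):
    """Assign x to the class whose subspace leaves the smallest residual."""
    res = {c: np.linalg.norm(x - U @ (U.T @ x)) for c, U in bases.items()}
    return min(res, key=res.get)

# Toy data: two 'classes' of vectors living near different 3D subspaces of R^20
rng = np.random.default_rng(5)
B0 = np.linalg.qr(rng.standard_normal((20, 3)))[0]
B1 = np.linalg.qr(rng.standard_normal((20, 3)))[0]
train = {0: B0 @ rng.standard_normal((3, 40)),
         1: B1 @ rng.standard_normal((3, 40))}
bases = class_bases(train, 3)
x = B1 @ rng.standard_normal(3)          # a new sample from class 1
print(classify(x, bases))  # 1
```

For digits, each class's training images form a third-order tensor, and the HOSVD supplies both the class bases and the two-mode compression mentioned in the abstract.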