Prototypical Experimental Coupled Data Sets
Data from multiple sources often have shared and unshared underlying structures. For instance, when samples are measured using different analytical techniques, some chemicals are visible to several analytical techniques whereas some chemicals can be captured only using a specific analytical method. In order to have a prototypical example of such coupled data sets with shared/unshared underlying structures, we have measured a set of mixtures with known chemical composition using different analytical techniques, i.e., fluorescence spectroscopy, NMR (Nuclear Magnetic Resonance) spectroscopy and LC-MS (Liquid Chromatography - Mass Spectrometry).
Five chemicals with different relative sizes, and hence different diffusion behavior, were selected: two peptides, a single amino acid, a sugar and an alcohol, i.e., Valine-Tyrosine-Valine (Val-Tyr-Val), Tryptophan-Glycine (Trp-Gly), Phenylalanine (Phe), Maltoheptaose (Malto) and Propanol. Twenty-nine samples were prepared with varying concentrations according to a predetermined design (see Table 1) in a phosphate buffer (pH 7.4). The buffer was prepared with deuterated water according to a protocol for biological samples (O. Beckonert et al., Metabolic profiling, metabolomic and metabonomic procedures for NMR spectroscopy of urine, plasma, serum and tissue extracts. Nature Protocols 2, 2692-2703, 2007) but with a 10-fold increase in the concentration of TSP (sodium 3-(trimethylsilyl)-propionate-2,2,3,3-d4) in order to ensure sufficient signal intensity for reference deconvolution. The increased TSP concentration did not affect the pH of the buffer. All chemicals were purchased from Sigma Aldrich and used without further purification. Samples were stored at 5 °C until they were measured.
Table 1: Concentrations of the chemicals in millimolar (mM)
Measurements using Different Analytical Techniques:
Samples were measured using fluorescence spectroscopy, forming Excitation-Emission Matrices (EEM), Liquid Chromatography-Mass Spectrometry and diffusion Nuclear Magnetic Resonance spectroscopy. All five chemicals are visible to NMR, three (Val-Tyr-Val, Trp-Gly and Phe) are visible to fluorescence spectroscopy, and four (Val-Tyr-Val, Trp-Gly, Phe and Malto) show up in LC-MS.
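The shared/unshared structure described above can be written down explicitly. The following is a minimal Python sketch that only encodes the visibility statements from the text; the names `VISIBLE` and `techniques_seeing` are illustrative and are not part of the released data file.

```python
# Which chemicals each technique can see, as stated in the text above.
VISIBLE = {
    "NMR": {"Val-Tyr-Val", "Trp-Gly", "Phe", "Malto", "Propanol"},
    "EEM": {"Val-Tyr-Val", "Trp-Gly", "Phe"},
    "LC-MS": {"Val-Tyr-Val", "Trp-Gly", "Phe", "Malto"},
}


def techniques_seeing(chemical):
    """Return the sorted list of techniques to which a chemical is visible."""
    return sorted(name for name, chems in VISIBLE.items() if chemical in chems)


# Chemicals that contribute to all three data sets (fully shared structure).
shared_by_all = set.intersection(*VISIBLE.values())

for chem in sorted(VISIBLE["NMR"]):
    print(f"{chem:12s} -> {', '.join(techniques_seeing(chem))}")
```

Under this encoding, Val-Tyr-Val, Trp-Gly and Phe are shared by all three data sets, Malto is shared by NMR and LC-MS only, and Propanol is unshared (NMR only).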
NMR: NMR spectra of the samples were recorded on a Bruker DRX 500 spectrometer (Bruker Biospin GmbH, Rheinstetten, Germany) operating at a proton frequency of 500.13 MHz. For each spectrum, 32768 complex points were acquired in 64 scans with a recycle delay of 2 seconds at a nominal temperature of 298 K. The spectrometer was equipped with a 5 mm BBI probe and spectra were recorded using the Oneshot45 sequence with 8 gradient levels ranging from 3.4 to 26.9 G cm⁻¹ in nominal gradient amplitude, with equal steps in gradient squared. The diffusion time was 100 ms and the gradient encoding time was 1 ms. All processing of the data, including phase correction, apodization, Fourier transformation, baseline correction, referencing to the TSP signal, and reference deconvolution, was performed using the DOSY Toolbox. In order to correct for instrument instabilities, reference deconvolution was performed using the TSP methyl signal as a reference, with a target lineshape of 4.5 Hz. NMR measurements were then arranged as a third-order tensor with modes: mixtures, chemical shift and gradient levels.
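To illustrate what "equal steps in gradient squared" means for the third mode of the NMR tensor, the sketch below computes 8 nominal gradient amplitudes between 3.4 and 26.9 G/cm. The function name `gradient_levels` is hypothetical, and the values are derived only from the numbers quoted above; they are not read from the released file and may differ from the amplitudes actually stored there.

```python
import math


def gradient_levels(g_min, g_max, n_levels):
    """Gradient amplitudes equally spaced in g**2 between g_min and g_max."""
    step = (g_max**2 - g_min**2) / (n_levels - 1)
    return [math.sqrt(g_min**2 + k * step) for k in range(n_levels)]


# Nominal values from the text: 8 levels from 3.4 to 26.9 G/cm.
levels = gradient_levels(3.4, 26.9, 8)

for k, g in enumerate(levels, start=1):
    print(f"level {k}: {g:.2f} G/cm (g^2 = {g * g:.1f})")
```

Spacing the levels linearly in gradient squared makes the amplitudes themselves bunch up toward the high end of the range, which is why the steps are not uniform in G/cm.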
LC-MS: Prior to LC-MS measurements, the 29 samples were diluted to 10 ppm in water and subsequently analyzed with an ultra-performance liquid chromatography (UPLC) system coupled to a quadrupole time-of-flight (Premier QTOF) mass spectrometer (Waters Corporation, Manchester, UK). Each sample (10 µL) was injected into the UPLC equipped with a 1.7 µm C18 BEH column (Waters) operated with a 6-min linear gradient from 0.1% formic acid in water to 0.1% formic acid in 20% acetone:80% acetonitrile. The data were acquired in positive electrospray ionization (ESI) mode with the following settings: the capillary probe voltage was 2.8 kV, the desolvation gas temperature was 400 °C, the cone voltage was 40 V, and the Ar collision gas energy was 10 V. The centroided raw data were converted to an intermediate netCDF format with the DataBridge™ utility provided with the MassLynx software. Automatic peak detection and integration were performed using the XCMS package (C. A. Smith et al., XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical Chemistry 78, 779-787, 2006). Since individual chemical compounds give rise to more than one fragment ion upon ionization, these ion features, generated by XCMS, were grouped together using the CAMERA package (C. Kuhl et al., CAMERA: an integrated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Analytical Chemistry 84, 283-289, 2012). Within each group of isotopes, the one with the lowest m/z was kept. The final LC-MS data set is in the form of a mixtures by features matrix.
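The isotope filtering step can be sketched as follows. This is a plain Python illustration of "keep the lowest m/z per isotope group" and not the XCMS or CAMERA API (both are R packages); the `Feature` record, the function `keep_lowest_mz` and the m/z and intensity values are invented for illustration and do not come from the data set.

```python
from collections import namedtuple

# Hypothetical record for one ion feature after isotope annotation.
Feature = namedtuple("Feature", ["isotope_group", "mz", "intensity"])


def keep_lowest_mz(features):
    """Keep, for every isotope group, only the feature with the lowest m/z."""
    best = {}
    for feat in features:
        current = best.get(feat.isotope_group)
        if current is None or feat.mz < current.mz:
            best[feat.isotope_group] = feat
    return sorted(best.values(), key=lambda feat: feat.mz)


# Toy values (not measured data): two isotope groups with two peaks each.
toy_features = [
    Feature(isotope_group=1, mz=166.09, intensity=1000.0),
    Feature(isotope_group=1, mz=167.09, intensity=100.0),
    Feature(isotope_group=2, mz=263.12, intensity=70.0),
    Feature(isotope_group=2, mz=262.12, intensity=500.0),
]

kept = keep_lowest_mz(toy_features)
for feat in kept:
    print(feat)
```

After this step each isotope group contributes a single column to the mixtures by features matrix.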
EEM: Fluorescence spectroscopy data were acquired on an FS920 spectrophotometer (Edinburgh Instruments, Edinburgh, Scotland). Excitation-Emission Matrix data were measured with excitation wavelengths from 210 nm to 310 nm with an increment of 5 nm for each scan, and emission wavelengths from 250 nm to 500 nm with an increment of 1 nm. In order to avoid first-order Rayleigh scatter, emission data were not acquired in an interval of 10 nm from the excitation wavelength. Excitation and emission monochromator slit widths were set at 5 nm and the integration time was 0.1 s/nm. On each measurement day, a blank sample was measured and subtracted from the samples to eliminate interference from the water Raman signal. Because the quantum yields of the three aromatic amino acids differ greatly, the concentrations were adjusted for the fluorescence experiment: Val-Tyr-Val was diluted 70 times, Trp-Gly was diluted 1000 times, and Phe was doubled in concentration.
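The wavelength grid described above fixes the size of each excitation-emission matrix. The sketch below builds the two axes and a mask for the region left unmeasured around the Rayleigh scatter line. Treating "an interval of 10 nm from the excitation wavelength" as a symmetric window that includes its end points is an assumption, as are the names `is_acquired` and `SCATTER_HALF_WIDTH`; the released file may encode the missing region differently.

```python
# Wavelength axes as described in the text (in nm).
excitation = list(range(210, 311, 5))  # 210, 215, ..., 310
emission = list(range(250, 501, 1))    # 250, 251, ..., 500

# Assumption: emission is skipped when it lies within 10 nm of the
# excitation wavelength on either side, end points included.
SCATTER_HALF_WIDTH = 10


def is_acquired(ex_nm, em_nm, half_width=SCATTER_HALF_WIDTH):
    """True if the (excitation, emission) pair lies outside the scatter window."""
    return abs(em_nm - ex_nm) > half_width


# mask[i][j] is True where emission[i] was acquired at excitation[j].
mask = [[is_acquired(ex, em) for ex in excitation] for em in emission]

n_missing = sum(not cell for row in mask for cell in row)
print(f"{len(emission)} emission x {len(excitation)} excitation wavelengths")
print(f"{n_missing} grid points fall inside the assumed scatter window")
```

With these axes, each sample yields a 251 by 21 (emission by excitation) matrix, and stacking the samples gives a third-order tensor.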
Acknowledgements: We thank Mathias Nilsson for the NMR data, Gozde Gurdeniz for the LC-MS data, and Anders J. Lawaetz for EEM measurements. We also would like to acknowledge Daniela Rago for her help with the LC-MS measurements, and Parvaneh Ebrahimi and Abdelrhani Mourib for their help in sample preparation.
Availability: DOWNLOAD (Matlab file, 22.4 MB)
The measurements for all mixtures (except mixture 27) have been released. Mixture 27 was omitted due to an experimental error.
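Because mixture 27 is missing, row positions in the released arrays no longer coincide with the mixture numbers of the design. The sketch below shows one way to translate between the two. It assumes that the 28 released mixtures are stored in ascending order of their original numbering, which the text does not state; the helper names are hypothetical.

```python
N_PREPARED = 29  # mixtures prepared according to the design
OMITTED = {27}   # mixture left out because of an experimental error

# Original mixture numbers (1-based) of the released samples.
released = [m for m in range(1, N_PREPARED + 1) if m not in OMITTED]


def row_to_mixture(row):
    """Map a 0-based row index in the released data to a mixture number."""
    return released[row]


def mixture_to_row(mixture):
    """Map a mixture number to its 0-based row; raises ValueError if omitted."""
    return released.index(mixture)


print(len(released), "mixtures released")
print("row 26 holds mixture", row_to_mixture(26))
```

The same mapping applies to the first mode of all three data sets if they share the sample ordering, which should be checked against the variables in the downloaded file.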
If you have used the data, please cite the following paper:
- E. Acar, E. E. Papalexakis, G. Gurdeniz, M. A. Rasmussen, A. J. Lawaetz, M. Nilsson, R. Bro, Structure-Revealing Data Fusion, BMC Bioinformatics, 15: 239, 2014.