Thursday 9 January 2014

Chemometrics under Mnova 9 - PCA

(Note: This entry has been written by Dr. Silvia Mari, from R4R, who has helped us to design and implement this module.)

Background: spectroscopy and chemometrics

“For many years, there was the prevailing view that if one needed fancy data analyses, then the experiment was not planned correctly, but now it is recognized that most systems are multivariate in nature and univariate approaches are unlikely to result in optimum solutions.”
      Hopke, P. K. (2003). The evolution of chemometrics. Analytica Chimica Acta, 500(1-2), 365–377
  
Whether we apply analytical chemistry for quality control or attempt a more “systems biology” approach in our R&D, we need advanced methods to design experiments, calibrate instruments, and analyze the resulting data. Indeed, the “emergence of chemometrics thinking came from the realization that traditional univariate statistics is not sufficient to describe and model chemical experiments”
       Geladi, P. (2003). Chemometrics in spectroscopy. Part 1. Classical chemometrics. Spectrochimica Acta Part B, 58, 767–782


With this in mind, Mnova 9 now offers its users a module called PCA, which can be found under the main menu “Advanced”. It is the result of our first efforts to include chemometric tools in Mnova, and it is meant to give spectroscopists the possibility to work interactively on both the stacked spectra and their corresponding statistical plots.

Since the mid-1970s, when the first paper with chemometrics in the title appeared (1975) [1], chemometrics has matured and is now considered an established research area in the chemical sciences. It has expanded widely from its beginnings into a variety of other areas, including multivariate calibration, pattern recognition and mixture resolution, and today there are several applications of interest to NMR spectroscopists [2-5].


PCA module

Principal Component Analysis (PCA) is a procedure which uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables (named principal components) [6].
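
As a rough illustration of this definition (a minimal sketch in Python with NumPy, not the code used inside Mnova), the principal components of a small data matrix can be obtained from the singular value decomposition of the mean-centered matrix. The function name `pca`, the matrix `X`, its size and the random data are all made up for the example; rows play the role of spectra and columns the role of bins.

```python
import numpy as np

def pca(X, n_components=2):
    """Return scores, loadings and explained-variance ratios (rows of X = observations)."""
    Xc = X - X.mean(axis=0)                            # mean-center every variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # orthogonal decomposition of the data
    scores = U[:, :n_components] * s[:n_components]    # coordinates of each observation
    loadings = Vt[:n_components].T                     # weight of each variable per component
    explained = s**2 / np.sum(s**2)                    # fraction of variance per component
    return scores, loadings, explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: 20 "spectra" x 50 "bins", driven by two hidden factors plus a little noise
factors = rng.normal(size=(20, 2))
profiles = rng.normal(size=(2, 50))
X = factors @ profiles + 0.01 * rng.normal(size=(20, 50))

scores, loadings, explained = pca(X, n_components=2)
```

The columns of `scores` are uncorrelated with each other and the columns of `loadings` are orthonormal, which is what "orthogonal transformation" and "linearly uncorrelated" mean in practice; the score and loading plots are scatter plots of these two arrays.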

The PCA module under the Advanced menu works in two consecutive steps: (1) matrix generation and (2) principal component analysis. The overall workflow can be represented with the following illustration, where general steps available in Mnova are highlighted in blue whilst specific functionalities of this new PCA module are highlighted in yellow.


To help the spectroscopist refine and optimize the data matrix to be used for advanced analysis, PCA in Mnova makes it very easy to detect and remove outlier spectra and to reveal problems in spectral alignment, phase or baseline. Once the user has properly corrected the regions of interest, the PCA module allows the analysis to be re-run, either replacing the previous analysis or creating a new one for comparison.

Interaction with the stacked spectra

The main effort applied during the design and development can be summarized in one word: SYNCHRONIZATION. PCA plots, PCA tables and the stacked plot are always synchronized. As a result, selecting a point in the score plot selects the corresponding spectrum in the stacked plot.


In the same way, selecting a point in the loading plot (hence a variable of the matrix) generates a shaded region in the stacked plot according to the bin position and size.



Colors and graphics

When dealing with large datasets, color coding plays a very important, even essential, role. Even though PCA does not use class definitions in its algorithm, since it is an unsupervised method, the kind of pattern expected is generally known.
The driving concept here is that colors are assigned on the basis of class membership. Again, as in the previous section, colors are always synchronized from PCA tables to PCA plots and to the stacked spectra as well.


Moreover, in the loading plot, the user is allowed to select more than one bin (see the flag option in the loading plot table, or select multiple table entries using the Shift or Ctrl key). A bin region is visualized as a colored box superimposed on the stacked plot. The user can associate different colors with different bin regions.





Data filtering and scaling


The results of the analysis depend on the type of filtering and scaling that the user applies to the matrix, which must therefore be specified. Both factors can greatly affect the outcome of the data analysis and thus the ranking of the most important variables. The PCA module includes several options for data cleaning and scaling.


There is no general rule for selecting the type of scaling. For that purpose we recommend the paper by van den Berg et al. [7], which describes extensively how these transformations can improve the information content of the data matrix. Finally, bear in mind that visual inspection and assessment is ultimately one of the most important steps in chemometrics.
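
To make the effect of scaling more concrete, here is a minimal sketch in Python with NumPy of three pretreatments discussed in [7]: mean centering, autoscaling (unit variance) and Pareto scaling. Whether these correspond exactly to the options in the PCA module is an assumption; the function names and the small matrix `X` are invented for the illustration.

```python
import numpy as np

def center(X):
    """Mean centering: subtract the column mean from every variable."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Autoscaling: center, then divide by the standard deviation (all variables weigh the same)."""
    return center(X) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Pareto scaling: center, then divide by the square root of the standard deviation."""
    return center(X) / np.sqrt(X.std(axis=0, ddof=1))

# Hypothetical matrix: 4 spectra x 3 bins with very different intensity ranges
X = np.array([[100.0, 1.0, 0.10],
              [140.0, 2.0, 0.14],
              [ 80.0, 3.0, 0.12],
              [120.0, 4.0, 0.16]])

X_auto = autoscale(X)
X_pareto = pareto_scale(X)
```

After autoscaling every bin has the same variance, so small signals count as much as large ones (noise included); Pareto scaling is a compromise that reduces, but does not remove, the dominance of the most intense bins.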

Conclusion

We have introduced in Mnova 9 a chemometric module called PCA (Principal Component Analysis). PCA has been shown to be very effective in compressing large volumes of noisy, correlated data into a subspace of much lower dimension than the original data set. The data pretreatment method is crucial to the outcome of the data analysis. The resulting low-dimensional representation of the data set has been shown to be of great utility for analyzing or monitoring the system under study, as well as for selecting variables for control or markers of the expected pattern.
The possibility to interact with PCA plots and spectra at the same time, and the user-friendly interface provided by Mnova, will also be of great advantage to spectroscopists who are not familiar with multivariate analysis but would like to learn more and try it.
As has always been the case for the Mnova community, the future of this first step in chemometrics will be driven by user requirements. For that reason we look forward to receiving feedback, criticisms, suggestions, comments and lots of requests for future development. So, play with it and have fun looking at your own datasets from a different perspective!

References

[1] B.R. Kowalski, Chemometrics: views and propositions, J. Chem. Inf. Comp. Sci. 15 (1975) 201–203
[2] Chemometrics in bioreactor monitoring. Lourenço, N. D., Lopes, J. A., Almeida, C. F., Sarraguça, M. C., & Pinheiro, H. M. (2012). Bioreactor monitoring with spectroscopy and chemometrics: a review. Analytical and bioanalytical chemistry, 404(4), 1211–37. doi:10.1007/s00216-012-6073-9
[3] Metabonomics and chemometrics in food science and nutrition. Kuang, H., Li, Z., Peng, C., Liu, L., Xu, L., Zhu, Y., Wang, L., et al. (2012). Metabonomics approaches and the potential application in food safety evaluation. Critical reviews in food science and nutrition, 52(9), 761–74. doi:10.1080/10408398.2010.508345
[4] Pharmaco-metabonomic phenotyping and chemometrics. Robertson, D. G., Reily, M. D., & Baker, J. D. (2007). Metabonomics in Pharmaceutical Discovery and Development, 526–539.
[5] Metabonomics and chemometrics in drug safety and toxicology. Griffin, J. (2004). The potential of metabonomics in drug safety and toxicology. Drug Discovery Today Technologies, 1(3), 285–293. doi:10.1016/j.ddtec.2004.10.011
[6] Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1–3), 37–52.
[7] Van den Berg, R. A., Hoefsloot, H. C. J., Westerhuis, J. A., Smilde, A. K., & van der Werf, M. J. (2006). Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC genomics, 7(1), 142. doi:10.1186/1471-2164-7-142


Tuesday 7 January 2014

Chemical Shift, Absolutely!


(Note: This entry has been written by Dr. Mike Bernstein - Thank you, Mike!)

It’s a “given” that for NMR the chemical shift must be reported relative to a standard. The most widely used is the 1H signal of tetramethylsilane (TMS) in chloroform, which has an assigned value of exactly zero. This is convention, and we all adhere to it. Correctly referencing 1H NMR spectra is seldom a difficulty, whether we use co-dissolved TMS (or a water-soluble equivalent), or the residual proton signal from the deuterated solvent. Things can get more complex, but this works for the vast majority of us. The chemical shift, δ, is defined thus:

δ (ppm) = 10^6 × (ν_sample − ν_reference) / ν_reference


Venturing to the “dark side” of NMR (nuclei other than 1H) seldom stretches beyond 13C for most, and a residual solvent signal is very often present that can be used as a secondary chemical shift reference. But beautiful possibilities tempt many of us. Whether you are interested in biomolecular NMR and live and breathe 15N and possibly 31P or 2H, or are an organometallic chemist with an interest in far more exotic nuclei, each heteronucleus has its charms and challenges. One thing unites NMR of all nuclei: adherence to a convention for chemical shifts. This can be easier said than done, given that some reference materials are difficult to handle, expensive, etc. The chemical shift reference compound for 19F NMR, for example, is the banned substance Freon-11.



Absolute chemical shifts

We get help from a group working under the auspices of IUPAC [1], whose work lets us calculate the chemical shift scale for all NMR-active nuclei [2]. That is, they provide us with a standard way to get the chemical shift precisely correct for any and all heteronuclear NMR spectra. That’s amazing - and hugely useful!
So how does it work? Well, it’s quite simple, really. At the heart of the calculation is the absolute frequency of the 1H signal of TMS for your NMR spectrometer hardware (console, probe, etc.) and sample (solvent, temperature, etc.). You need to be able to determine the exact frequency of this reference signal to seven decimal figures, at least. The following equation applies (sometimes expressed as a percentage) and uses a ratio to describe a constant, Ξ (Greek capital Xi):

Ξ = ν_X / ν_TMS

where ν_X is the absolute frequency of the 0 ppm position for nucleus X and ν_TMS is the absolute 1H frequency of TMS, both in the same magnetic field.
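
As a rough sketch of how such a ratio is used (an illustration in Python, not Mnova's code), assume the usual convention in which Ξ is the X-nucleus zero frequency expressed as a percentage of the 1H frequency of TMS. The 0 ppm position of the X-nucleus axis then follows from a single multiplication. The function names, the spectrometer frequency and the test signal are hypothetical; the 13C value of Ξ is the one commonly quoted from the IUPAC recommendations and should be checked against the published table before being relied upon.

```python
def x_zero_frequency_mhz(nu_tms_1h_mhz, xi_percent):
    """Absolute frequency (MHz) of the 0 ppm position on the X-nucleus scale."""
    return nu_tms_1h_mhz * xi_percent / 100.0

def chemical_shift_ppm(nu_signal_mhz, nu_zero_mhz):
    """Chemical shift (ppm) of a signal relative to the 0 ppm frequency."""
    return (nu_signal_mhz - nu_zero_mhz) / nu_zero_mhz * 1e6

XI_13C = 25.145020        # %, commonly quoted value for 13C (verify against the IUPAC table)
nu_tms_1h = 500.1300000   # MHz, hypothetical absolute 1H frequency of TMS on some spectrometer

nu_zero_13c = x_zero_frequency_mhz(nu_tms_1h, XI_13C)

# Hypothetical 13C signal placed 77.16 ppm above the zero frequency
nu_peak = nu_zero_13c * (1 + 77.16e-6)
delta = chemical_shift_ppm(nu_peak, nu_zero_13c)
```

Because 1 ppm of a 125 MHz axis is only about 125 Hz, an error in the seventh decimal place of the 1H frequency is already visible on the X-nucleus scale, which is why the TMS frequency has to be known so precisely.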



Making it easy with Mnova

We make heavy use of absolute referencing (AR) in Mnova, with the following available:

  • Correctly reference an X-nucleus spectrum when the referenced 1H spectrum is available
  • Apply AR to heteronuclear axes in 2D experiments
  • Customise the Ξ values
  • Indirectly reference a 1H spectrum using νTMS for a specific hardware and solvent (locked)


Referencing heteronuclear spectra

Ensure that you have a document containing (a) a correctly referenced 1H NMR spectrum, and (b) one or more X-nucleus spectra.
Select Analysis → Reference → Absolute reference… and choose the X-nucleus spectrum/spectra to reference.



The table of Ξ values

Note that by clicking on the “Ξ values…” button you will be presented with the table of X-nuclei. In the case of 15N, for example, you can choose which reference standard you want to use. By clicking on the blue “+” button you can enter your own, customised value.



Referencing 2D spectra

When there are 2D spectra in the document then the Absolute reference… selection will reflect this, and allow you to choose which spectrum is used for referencing purposes, and the traces to which this should be applied. Note that you can adjust the referencing of 1H and X-nuclei.



Referencing a 1H spectrum

You can use saved νTMS values to reference another 1H spectrum from the same NMR spectrometer. Start with a correctly-referenced 1H NMR spectrum, and select Analysis → Reference → Edit saved references… From this dialogue you can add the value for the particular hardware and measurement conditions (solvent, temperature, etc.).

Now, when you select Analysis → Reference → Apply saved reference, the saved value will be used if the criteria are met.



Conclusions

Absolute referencing is a powerful way to ensure that data are correctly referenced. This is as important in open-access environments as it is under automation, where it helps processes such as Verify to be more robust.

References

[1] (a) Harris RK, Becker ED, Cabral de Menezes SM, Goodfellow R, Granger P. Pure Appl. Chem. 2001; 73: 1795
(b) Harris RK, Becker ED. J. Magn. Reson. 2003; 156: 323.





Monday 6 January 2014

Copy and Paste NMR spectra

This is just a short entry to illustrate one nice little tool in Mnova NMR that I believe can be very handy in many situations.
Let’s suppose you have an NMR spectrum in which you have spent some time trying to customize its visual aspect. For example, you have changed the default line color and width, hidden the vertical scale, modified the background color, customized the chemical shift scale, etc. As a result, you may have something like this:



   
Now you open a new spectrum and find that it is using the old default graphical properties, and you want this spectrum to have exactly the same visual aspect as the previous one.
There are several ways in Mnova to achieve that goal. For example, you can go to the first spectrum, go to spectral properties and save these properties to a file which can then be loaded in the target spectrum. 
However, in this post I want to show a simple shortcut that yields the same result. The procedure is as simple as this:
  
1) Go to the first spectrum and press Ctrl+C (Edit / Copy)
2) Move to the second spectrum and issue this command: Edit / Paste Properties / NMR Graphic Properties 

That is it: the second spectrum will now have the same visual aspect as the first one.
The same trick can be used to copy and paste integral regions, zoom & cuts regions, etc.

Sunday 5 January 2014

Reference Deconvolution

Introduction

In an idealized situation, according to quantum mechanics, NMR transitions in the liquid state (excluding dynamic effects such as chemical exchange) have a Lorentzian shape [1]. In practice, NMR lineshapes are never pure Lorentzians, for a number of reasons [2] ranging from magnetic field inhomogeneity and magnetic field noise to sample temperature gradients, sample spinning or FID weighting, to cite a few.
Another important property that might significantly affect the final observed lineshape is the following: even in molecules of modest size, the number of distinct peaks might be thousands of times smaller than the number of quantum transitions. As a simple example, the number of transitions of a molecule containing 15 coupled spin-1/2 nuclei would be 245760, whereas only a few hundred peaks would be observed in the spectrum. As a result, an NMR peak is actually an envelope of a distribution of myriads of Lorentzians and its shape is dominated by the coupling pattern of the spin system.
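
The figure of 245760 quoted above is consistent with the usual count of transitions in which exactly one of N coupled spin-1/2 nuclei flips, N × 2^(N−1) with N = 15. That this is the formula behind the number is an assumption made for this small check in Python; the function name is invented.

```python
def single_spin_flip_transitions(n_spins):
    """Number of transitions in which exactly one of n coupled spin-1/2 nuclei flips."""
    return n_spins * 2 ** (n_spins - 1)

print(single_spin_flip_transitions(15))   # 245760
```

For two spins the same formula gives 4, the four lines of a simple AX spectrum, which is a quick way to convince oneself that the count is the right one.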
Whilst this kind of line broadening affects signals differently across the spectrum and is very difficult (or even impossible) to resolve by post-processing operations, there are many other distortions that affect all resonances in the spectrum in the same way. These include lineshape distortions caused by poorly shimmed samples, and they can be removed by using a post-processing technique known as Reference Deconvolution [3].

Reference Deconvolution

This technique is used to remove the instrumental lineshape distortion by deconvolving the experimental NMR spectrum using a reference signal, usually one within the same spectrum (which should be an isolated singlet) known to be subject to the identical lineshape distortions. Finally, once the lineshape distortion is removed, the spectrum can be reconvoluted with a known lineshape, typically a Lorentzian, so that the result will be a corrected spectrum in which the instrumental distortion has been replaced by the ideal lineshape. 
Actually, the concept of deconvolution is very simple: if S(f) is the experimental frequency domain spectrum, it can be decomposed into two main components, the ideal spectrum I(f) and the instrumental distortion D(f). In other words, the observed experimental spectrum is an ideal spectrum contaminated by the instrument. Mathematically, this contamination is expressed with the concept of convolution, which is represented by the symbol *:

S(f) = I(f) * D(f)

The goal here is to find the function D(f) so that the ideal spectrum I(f) can be recovered:


I(f) = S(f) [*]^-1 D(f)

where [*]^-1 denotes deconvolution. That is to say, the ideal spectrum can be recovered by means of a deconvolution, which basically consists of reversing the effects of the convolution.

In practice, this process is more efficiently done in the time domain:

I(t) = S(t) / D(t)

This is possible thanks to the Convolution Theorem, which states that point-wise multiplication in one domain (e.g. the time domain) is equivalent to convolution in the other FT domain (e.g. the frequency domain).
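
The theorem is easy to check numerically. The sketch below (Python with NumPy, using random arrays as stand-ins for I(t) and D(t), so nothing here is real NMR data) verifies that the FT of a point-wise product equals the circular convolution of the two FTs, and that dividing by D(t) in the time domain recovers the ideal spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 64
i_t = rng.normal(size=n) + 1j * rng.normal(size=n)   # stand-in for the ideal signal I(t)
d_t = rng.normal(size=n) + 1j * rng.normal(size=n)   # stand-in for the distortion D(t)

I_f, D_f = np.fft.fft(i_t), np.fft.fft(d_t)

# Left-hand side: FT of the point-wise product in the time domain, S(t) = I(t) D(t)
lhs = np.fft.fft(i_t * d_t)

# Right-hand side: circular convolution of the two spectra
# (the 1/n factor comes from NumPy's unnormalized forward FFT)
rhs = np.array([sum(I_f[m] * D_f[(k - m) % n] for m in range(n))
                for k in range(n)]) / n

# Point-wise division in the time domain undoes the convolution in the frequency domain
recovered = np.fft.fft((i_t * d_t) / d_t)
```

The division only behaves well because `d_t` has no zeros here, which is the practical reason behind the restrictions on the reference signal discussed further on.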

The complete process of Reference Deconvolution will be illustrated with an example using Mnova NMR and a 300 MHz 1H-NMR spectrum in deuterated acetone (kindly provided by Gareth Morris) in which the homogeneity of the static field was deliberately perturbed. The spectrum corresponds to ODCB (ortho-dichlorobenzene) and has been folded several times in order to optimize digitization:



First, after issuing the command Process/Reference deconvolution in Mnova, the user needs to select a well-resolved reference signal in the frequency domain spectrum. In order to avoid numerical instabilities this signal should be a singlet. The reason is that if the reference signal has some multiplicity (e.g. a doublet), the inverse FT of this reference signal (remember that Reference Deconvolution takes place in the time domain) might result in an FID with zeroes at regular intervals. As this time domain signal will be used in the denominator of the reference deconvolution function, this would result in severe discontinuities.
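
The problem with a multiplet reference can be seen in a few lines of Python (a sketch with made-up numbers: a 10 Hz coupling, a 1 Hz linewidth and 1 ms sampling). An on-resonance doublet with splitting J is the sum of two lines at +J/2 and −J/2, so its FID is the singlet FID multiplied by cos(πJt), which vanishes at t = 1/(2J), 3/(2J), and so on.

```python
import numpy as np

J = 10.0                           # Hz, hypothetical coupling constant
dt = 0.001                         # s, sampling interval
t = np.arange(0, 0.5, dt)          # s, time axis
decay = np.exp(-np.pi * 1.0 * t)   # envelope of a 1 Hz wide Lorentzian line

singlet = decay * np.ones_like(t)  # on-resonance singlet: never reaches zero
doublet = 0.5 * decay * (np.exp(1j * np.pi * J * t) + np.exp(-1j * np.pi * J * t))

# First zero of the doublet FID, at t = 1/(2J) = 50 ms
idx = int(round(1.0 / (2.0 * J) / dt))
```

Dividing by `doublet` near `t[idx]` would mean dividing by (almost) zero, whereas `singlet` stays safely away from zero over the whole acquisition.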

In this particular example, as in many others, a convenient reference could be the TMS signal (0 ppm), which in principle should be a singlet (disregarding the 13C and 29Si satellites, more about this in a moment) but, as can be seen, it shows lineshape errors and spinning sidebands due to a combination of poor shimming and sample spinning.
In the figure below, the result of selecting the reference signal with Mnova is depicted. 



In the Reference Deconvolution dialog box, there are two check boxes, 29Si and 13C satellites and the explanation is this: The TMS reference signal comes with the presence of small 29Si and 13C satellites flanking the central peak at 3.3 and 59 Hz, respectively. Since the reference line is supposed to be representative of all signals in the spectrum, any fine structure which is unique to the reference should generally be removed before deconvolution is performed. The 13C satellites are not usually a concern as they are quite distant from the main TMS peak, but the 29Si satellites are more problematic owing to their close proximity to the central signal. 
So when, for example, the 29Si satellites option is ticked, the software will automatically synthesize the peaks corresponding to the 29Si satellites, which in turn will be part of the reference FID model.


 

Once the reference region is selected, the software calculates a reference FID, Sr(t), by zeroing the whole spectrum except the selected region, followed by an inverse FT. (Some additional processing steps are required to avoid the negative effects of the long tails of the imaginary components; the interested reader is referred to [2-3] for further details.)

Having calculated the reference FID, Sr(t), an ideal reference FID, Si(t), can be computed by simply simulating an FID using the set of frequencies and amplitudes for the parent signal (e.g. TMS) and any attendant satellites, and a decay rate which depends on the target lineshape, a value that can be specified in the reference deconvolution dialog box (in this example a value of 0.35 Hz has been used). With these two reference FIDs, it is possible to calculate the correction function c(t) by simply taking the (complex) ratio of both:

c(t) = Si(t) / Sr(t)

This correction function c(t) simply represents the (inverse of the) instrumental function responsible for the lineshape distortion. Multiplying the original experimental FID, Se(t), by this function yields a corrected FID, Sc(t), which may then be Fourier transformed to yield a corrected spectrum Fc having the specified ideal lineshape:

Sc(t) = c(t) x Se(t)
Fc = FT[Sc(t)]
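
The three equations above can be tried out on synthetic data. The following Python/NumPy sketch is deliberately idealized and is not the Mnova implementation: the two singlets sit exactly on the frequency grid and have no natural decay, and the only distortion is a periodic modulation that creates sidebands at ±5 Hz, so that the selected region captures the reference signal without truncation errors. It also leaves out the satellites and the additional processing steps described in [2-3]. All numerical values except the 0.35 Hz target width are invented.

```python
import numpy as np

n, dt = 1024, 1.0 / 256.0         # 1024 points, 256 Hz spectral width, 0.25 Hz per bin
t = np.arange(n) * dt
freqs = np.fft.fftfreq(n, dt)     # frequency (Hz) of every FFT bin

f_ref, f_sig, f_spin = 0.0, 40.0, 5.0   # reference line, second line, sideband spacing (Hz)

# Instrumental function: a periodic modulation giving sidebands at +/- 5 Hz on every line
distortion = 1.0 + 0.6 * np.cos(2 * np.pi * f_spin * t)

# Experimental FID Se(t): two singlets sharing the same distortion
lines = np.exp(2j * np.pi * f_ref * t) + 0.5 * np.exp(2j * np.pi * f_sig * t)
s_e = lines * distortion

# Reference FID Sr(t): zero everything outside the selected region, then inverse FT
spectrum = np.fft.fft(s_e)
region = np.abs(freqs - f_ref) <= 10.0            # reference region: f_ref +/- 10 Hz
s_r = np.fft.ifft(np.where(region, spectrum, 0.0))

# Ideal reference FID Si(t): same frequency and amplitude, Lorentzian of the target width
target_width = 0.35                               # Hz
s_i = np.exp(2j * np.pi * f_ref * t) * np.exp(-np.pi * target_width * t)

# Correction function, corrected FID and corrected spectrum
c = s_i / s_r                                     # c(t) = Si(t) / Sr(t)
s_c = c * s_e                                     # Sc(t) = c(t) x Se(t)
f_c = np.fft.fft(s_c)                             # Fc = FT[Sc(t)]
```

In this toy case the correction is exact: the sidebands disappear from both lines, not just from the reference, and each line acquires the 0.35 Hz Lorentzian shape. With real data the reference region is truncated and noisy, so the result is only as good as the reference signal allows.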

Overall, the result of applying reference deconvolution to the ODCB spectrum used in this example is shown in the figure below.



Conclusion 

Reference deconvolution is a powerful processing method to remove some distortions that affect all the peaks in a spectrum in the same way. In practice, this is done by extracting the distorted component from a reference signal and deconvolving the whole imperfect spectrum. 
The present implementation of the algorithm in Mnova 9.0 supports 1D spectra only, but extensions to 2D spectra are planned.

Bibliography

[1] Why are spectral lines Lorentzian? http://www.ebyte.it/stan/blog10b.html#10May16 

[2] Metz, K. R., Lam, M. M., & Webb, A. G. (2000). Reference deconvolution: A simple and effective method for resolution enhancement in nuclear magnetic resonance spectroscopy. Concepts in Magnetic Resonance, 12(1), 21–42.

[3] Morris, G. A., Barjat, H., & Horne, T. J. (1997). Reference deconvolution methods. Progress in Nuclear Magnetic Resonance Spectroscopy, 31(2), 197–257.