Sunday, 30 January 2011

Alignment of NMR spectra – Part II: Binning / Bucketing

In my last post, I wrote that spectra of biological samples are usually poorly aligned, because small variations in pH or in other sample conditions, such as ionic strength or temperature, can cause large changes in chemical shift.
The most widely used method of addressing this chemical shift variability across spectra is the so-called binning (or bucketing) procedure, which consists of segmenting a spectrum into small regions (bins or buckets) and taking the area under the spectrum within each segment. Ideally, the bins should be large enough that a given peak remains in its bin despite small shifts across the spectra, but not so large that a single bin includes peaks belonging to several compounds.
As a simple example of how binning works, let’s consider the spectrum of Taurine (Fig. 1).
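The procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration, not Mnova's implementation: it assumes the spectrum is available as two arrays (chemical shift in ppm and intensity), and the function name `uniform_binning` is hypothetical. Each bin's value is the sum of the intensities of the points falling inside it, which is proportional to the area under the spectrum in that bin when the points are equally spaced.

```python
import numpy as np


def uniform_binning(ppm, intensity, bin_width=0.02):
    """Sum the spectrum points that fall into consecutive bins of equal width.

    ppm and intensity are 1-D arrays of equal length (ppm may be in any order).
    Returns (bin_centres, bin_sums), ordered from low to high ppm.
    """
    ppm = np.asarray(ppm, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    lo, hi = ppm.min(), ppm.max()
    n_bins = int(np.ceil((hi - lo) / bin_width))
    edges = lo + bin_width * np.arange(n_bins + 1)
    # Guard against floating-point round-off leaving the last point outside the edges
    edges[-1] = max(edges[-1], hi)
    # A weighted histogram sums the intensities of the points inside each bin
    sums, _ = np.histogram(ppm, bins=edges, weights=intensity)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, sums
```

Because every point lands in exactly one bin, the total intensity is preserved; only its distribution along the ppm axis is coarsened.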

Fig. 1: 1H-NMR spectrum of Taurine synthesized with Mnova NMRPredict. Only the spectral region corresponding to the methylene protons is shown.

Starting from the spectrum shown in Fig. 1, which was predicted with Mnova NMRPredict, seven additional spectra were created by randomly changing the chemical shift of the CH2 protons, in order to simulate the chemical shift variability observed in real biofluid NMR spectra.
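A comparable synthetic data set can be generated outside Mnova, for example to experiment with binning parameters. The sketch below is only an approximation of the data in Fig. 2: the spectral width, number of points, spectrometer frequency and the 6.6 Hz coupling come from this post, while the Lorentzian line shape, the 1 Hz line width, the approximate triplet positions (3.25 and 3.42 ppm), the ±0.01 ppm random shift range and the 0 ppm start of the axis are all assumptions. The function names are hypothetical.

```python
import numpy as np

SF_MHZ = 500.13    # spectrometer frequency (from the post)
SW_HZ = 6001.6     # spectral width (from the post)
N_POINTS = 32768   # data points per spectrum (from the post)
J_HZ = 6.6         # coupling constant of the triplets (from the post)


def lorentzian(x_hz, centre_hz, fwhm_hz=1.0):
    """Unit-height Lorentzian line (line width is an assumption)."""
    hw = fwhm_hz / 2
    return hw**2 / ((x_hz - centre_hz) ** 2 + hw**2)


def triplet(x_hz, centre_hz, j_hz=J_HZ, fwhm_hz=1.0):
    """First-order 1:2:1 triplet centred at centre_hz."""
    return (lorentzian(x_hz, centre_hz - j_hz, fwhm_hz)
            + 2 * lorentzian(x_hz, centre_hz, fwhm_hz)
            + lorentzian(x_hz, centre_hz + j_hz, fwhm_hz))


def simulate_taurine_set(n_spectra=8, max_shift_ppm=0.01, seed=0):
    """Return (ppm axis, array of shape (n_spectra, N_POINTS))."""
    rng = np.random.default_rng(seed)
    ppm = np.linspace(0.0, SW_HZ / SF_MHZ, N_POINTS, endpoint=False)
    x_hz = ppm * SF_MHZ
    base = np.array([3.25, 3.42])  # approximate CH2 shifts (assumed)
    spectra = []
    for i in range(n_spectra):
        # The first spectrum is the unshifted "predicted" one; the others are shifted
        shifts = base if i == 0 else base + rng.uniform(-max_shift_ppm, max_shift_ppm, size=2)
        spectra.append(sum(triplet(x_hz, s * SF_MHZ) for s in shifts))
    return ppm, np.array(spectra)
```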

Fig. 2: Synthesized data set comprising 8 simulated spectra of Taurine with random chemical shifts for the CH2 protons, displayed in superimposed mode in Mnova.

These spectra were synthesized with 32768 data points and a spectral width of 6001.6 Hz at a spectrometer frequency of 500.13 MHz. With a bin size of 0.02 ppm (represented by the vertical grid lines in Fig. 2), binning yields 6001.6 / (0.02 × 500.13) ≈ 600 bins.
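The bin count is simply the spectral width divided by the bin width expressed in Hz (bin width in ppm times the spectrometer frequency in MHz). The short calculation below uses only the numbers given in the post and also reports how many original points end up, on average, in each bin.

```python
SW_HZ = 6001.6    # spectral width (Hz)
SF_MHZ = 500.13   # spectrometer frequency (MHz), so 1 ppm corresponds to 500.13 Hz
N_POINTS = 32768  # points in each original spectrum
BIN_PPM = 0.02    # bin width (ppm)

bin_width_hz = BIN_PPM * SF_MHZ       # width of one bin in Hz
n_bins = round(SW_HZ / bin_width_hz)  # number of bins covering the spectral width
points_per_bin = N_POINTS / n_bins    # average number of original points per bin

print(f"bin width = {bin_width_hz:.4f} Hz, bins = {n_bins}, points/bin = {points_per_bin:.1f}")
# prints: bin width = 10.0026 Hz, bins = 600, points/bin = 54.6
```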

When the binning command is issued in Mnova, a new spectrum with 600 data points is produced, in which every point is the sum of all the points of the original spectrum within the corresponding bin. Fig. 3 depicts the result of this binning (bucketing) operation applied to a single spectrum of the synthetic Taurine data set; the circles correspond to the area of each bucket in the original spectrum. Fig. 4 shows the result for all spectra in superimposed mode. The digital resolution of the binned spectrum is 10 Hz/point (6001.6 Hz / 600 points), compared with about 0.18 Hz/point in the original spectra (6001.6 Hz / 32768 points).
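To bin a whole data set at once, the spectra can be stacked into a matrix (one row per spectrum) and each row reduced into contiguous segments. This sketch uses NumPy's `np.add.reduceat`; the function name `bin_stack` is hypothetical and the random matrix is placeholder data. Because the points are equally spaced in ppm, segments with (nearly) equal numbers of points are also (nearly) equal in width; 32768 is not a multiple of 600, so segments alternate between 54 and 55 points.

```python
import numpy as np


def bin_stack(spectra, n_bins):
    """Sum each row of a (n_spectra, n_points) array over n_bins contiguous segments."""
    spectra = np.asarray(spectra, dtype=float)
    n_points = spectra.shape[1]
    # Start index of every segment; truncation spreads the remainder over the segments
    starts = np.linspace(0, n_points, n_bins, endpoint=False).astype(int)
    # reduceat sums spectra[:, starts[i]:starts[i + 1]], and the tail for the last one
    return np.add.reduceat(spectra, starts, axis=1)


SW_HZ = 6001.6
spectra = np.random.default_rng(1).random((8, 32768))  # placeholder data
binned = bin_stack(spectra, 600)
print(binned.shape)                # (8, 600)
print(SW_HZ / 32768, SW_HZ / 600)  # about 0.183 Hz/point before, about 10.0 Hz/point after
```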

Fig. 3: Methylene region of one synthetic 1H-NMR spectrum of Taurine after data reduction by uniform binning

Fig. 4: Result of applying data reduction by uniform binning to the 8 1H-NMR spectra of Taurine

Once the spectra have been binned, they are ready to be exported in a convenient format (e.g. ASCII) for further statistical analysis (e.g. PCA).
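The exact export layout depends on the statistics package used downstream; a common choice is a tab-separated ASCII table with one row per sample and one column per bucket. The sketch below builds such a table as a string; the function name, the column header format (bucket centres to three decimals) and the sample names are assumptions for illustration.

```python
def bucket_table(bin_centres_ppm, binned, sample_names):
    """Return a tab-separated table: header row of bucket centres (ppm),
    then one row per sample with its bucket values."""
    lines = ["sample\t" + "\t".join(f"{c:.3f}" for c in bin_centres_ppm)]
    for name, row in zip(sample_names, binned):
        lines.append(name + "\t" + "\t".join(f"{v:.6g}" for v in row))
    return "\n".join(lines) + "\n"
```

Writing the file is then a matter of `open("buckets.txt", "w").write(bucket_table(centres, binned, names))`, where the file name is arbitrary.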
Binning greatly reduces the effect of variations in peak position (in this case, all peaks end up perfectly aligned). It also reduces the data size for multivariate statistical analysis, although today’s computers and optimized linear algebra algorithms can handle large data volumes very efficiently.
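The alignment effect can be quantified with a toy example: two copies of the same peak, shifted by 0.004 ppm (2 Hz) but staying inside the same 0.02 ppm bin, are poorly correlated point by point and almost perfectly correlated after binning. The Lorentzian shape, 1 Hz line width, peak positions and 0 to 12 ppm axis are assumptions; the bin index is computed as `floor(ppm / bin_width)`, so the bins start at 0 ppm.

```python
import numpy as np

SF_MHZ = 500.13
ppm = np.linspace(0.0, 12.0, 32768, endpoint=False)  # uniform ppm axis (assumed range)


def lorentzian(centre_ppm, fwhm_hz=1.0):
    hw = fwhm_hz / 2 / SF_MHZ  # half width converted from Hz to ppm
    return hw**2 / ((ppm - centre_ppm) ** 2 + hw**2)


def bin_sum(y, bin_ppm=0.02):
    idx = np.floor(ppm / bin_ppm).astype(int)  # bin index of every point
    return np.bincount(idx, weights=y)         # sum of the points in each bin


a = lorentzian(3.251)
b = lorentzian(3.255)  # same peak moved by 0.004 ppm, still in the 3.24-3.26 ppm bin

raw_r = np.corrcoef(a, b)[0, 1]
binned_r = np.corrcoef(bin_sum(a), bin_sum(b))[0, 1]
print(f"raw r = {raw_r:.2f}, binned r = {binned_r:.3f}")  # low raw r, binned r close to 1
```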

The major drawback of this procedure is the loss of a considerable amount of the information contained in the original spectra. In this particular case, the fine structure of the two triplets is completely lost (the coupling constant is 6.6 Hz, whilst the digital resolution is 10 Hz), which precludes the direct interpretation of multivariate models. In addition, peaks lying close to the border between two bins can cause artifacts, because a small shift is enough to move much of their area from one bin into the next. Information is also lost, for example, when peaks belonging to several compounds fall within a single bin.
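The border artifact is easy to reproduce with the same kind of toy peak: moving it by 0.004 ppm across the 3.26 ppm bin border transfers most of its area from one bucket to the neighbouring one, even though nothing changed chemically. As before, the line shape, line width, peak positions and ppm axis are assumptions made only for illustration.

```python
import numpy as np

SF_MHZ = 500.13
BIN_PPM = 0.02
ppm = np.linspace(0.0, 12.0, 32768, endpoint=False)  # assumed axis; bins start at 0 ppm


def lorentzian(centre_ppm, fwhm_hz=1.0):
    hw = fwhm_hz / 2 / SF_MHZ  # half width in ppm
    return hw**2 / ((ppm - centre_ppm) ** 2 + hw**2)


def binned(y):
    return np.bincount(np.floor(ppm / BIN_PPM).astype(int), weights=y)


left = binned(lorentzian(3.258))   # 0.002 ppm below the 3.26 ppm border
right = binned(lorentzian(3.262))  # 0.002 ppm above it
k = int(np.floor(3.258 / BIN_PPM))  # index of the bin just below the border

print(left[k:k + 2] / left.sum())    # most of the area in bin k
print(right[k:k + 2] / right.sum())  # most of the area now in bin k + 1
```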

There exist several better alternatives to binning, typically involving some form of peak alignment without data reduction. But this will be the subject of my next post …

Thursday, 27 January 2011

Alignment of NMR spectra – The problem: Part I

The chemical shift is of great importance in NMR spectroscopy because it reflects the chemical environment of the nuclei under observation, providing detailed information about the structure of a molecule.
Although the chemical shift of a nucleus in a molecule is generally assumed to be fairly stable, a number of experimental factors (pH, ionic strength, solvent, field inhomogeneity due to poor shimming, temperature, etc.) can produce slight or even quite significant variations in chemical shifts.
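For reference, the standard definition of the chemical shift in ppm, together with the approximate conversion between a shift difference in ppm and a frequency difference in Hz (with $\nu_0$ the spectrometer frequency in MHz), is:

```latex
\delta\,(\mathrm{ppm}) = \frac{\nu - \nu_{\mathrm{ref}}}{\nu_{\mathrm{ref}}} \times 10^{6},
\qquad
\Delta\nu\,(\mathrm{Hz}) \approx \Delta\delta\,(\mathrm{ppm}) \times \nu_{0}\,(\mathrm{MHz})
```

At 500.13 MHz, for example, a variation of only 0.02 ppm already corresponds to about 10 Hz, which is larger than many coupling constants.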
This is particularly important in metabonomics/metabolomics where shifts of NMR peaks due to differences in pH and other physico-chemical interactions are quite common in NMR spectra of biological samples. For example, some important metabolites, such as citrate or taurine, have peaks whose chemical shifts fluctuate in an uncontrolled way from sample to sample. These variations can cause spurious grouping of samples in chemometric models.
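A toy simulation shows how this spurious grouping arises. Ten hypothetical samples contain the same amount of a compound; the only difference is that five of them have the peak 0.01 ppm further downfield (as might happen with a small pH difference). A PCA of the raw spectra, computed here via an SVD of the mean-centred data, nevertheless splits them into two groups on PC1. The single Lorentzian standing in for a citrate-like signal, the positions, line width and noise level are all assumptions.

```python
import numpy as np

SF_MHZ = 500.13
ppm = np.linspace(2.4, 2.7, 2000)  # assumed spectral window


def lorentzian(centre_ppm, fwhm_hz=1.0):
    hw = fwhm_hz / 2 / SF_MHZ  # half width in ppm
    return hw**2 / ((ppm - centre_ppm) ** 2 + hw**2)


rng = np.random.default_rng(0)
# Identical concentrations; only the peak position differs between the two groups
centres = [2.55] * 5 + [2.56] * 5
X = np.array([lorentzian(c) for c in centres])
X += rng.normal(0.0, 1e-3, X.shape)  # small measurement noise

# PCA via SVD of the mean-centred data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1_scores = U[:, 0] * S[0]
print(np.round(pc1_scores, 2))  # two clusters with opposite signs
```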

Example of peak position variation in the citrate region (simulated data)

Whilst it is critical to set up the experimental conditions so as to minimize these chemical shift fluctuations (for example, by using an appropriate buffer; BTW, there is a standard protocol for biofluid [urine, serum/plasma] and tissue sample collection and preparation, described by Beckonert et al. [1]), spectral misalignments may still occur, and special post-processing methods have to be employed.
Another example in which variation in the chemical shift is important occurs in the context of kinetics or reaction monitoring experiments by NMR. For example, consider the following reaction monitoring example [2]:

Reaction monitoring data set for the solution of phenylethylamine and 2-methoxyphenyl acetate in D2O, with every 35th spectrum from the first (bottom) to the last (top) shown (see [2])

During the course of the reaction, the chemical shifts of several signals change as a result of the change in pH (in this case, as the hydrolysis proceeds).
Although characterizing these chemical shift fluctuations can sometimes be important (pH- or drug-binding-induced chemical shifts, for example), in general they obscure pattern recognition (metabonomics) and impede data analysis (e.g. selecting the peaks whose intensities/heights need to be monitored becomes more difficult).
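The reaction monitoring problem can be illustrated with a peak of constant concentration drifting from 3.00 to 3.05 ppm over 20 spectra. Integrating a fixed window centred at the initial position makes the signal appear to vanish, while a window that follows the local maximum keeps the integral constant. The drift range, line shape, window width and the simple maximum-tracking window are assumptions for illustration only, not the approach that will be discussed in the following posts.

```python
import numpy as np

SF_MHZ = 500.13
ppm = np.linspace(2.9, 3.2, 3001)  # assumed spectral window (0.0001 ppm spacing)


def lorentzian(centre_ppm, fwhm_hz=1.0):
    hw = fwhm_hz / 2 / SF_MHZ  # half width in ppm
    return hw**2 / ((ppm - centre_ppm) ** 2 + hw**2)


# Hypothetical series: constant concentration, peak drifting from 3.00 to 3.05 ppm
drift = np.linspace(3.00, 3.05, 20)
spectra = [lorentzian(c) for c in drift]

HALF_WIDTH = 0.01  # integration half-width in ppm (assumed)
# Fixed window centred at the initial peak position
fixed = [y[np.abs(ppm - 3.00) <= HALF_WIDTH].sum() for y in spectra]
# Window re-centred on the maximum of each spectrum
tracked = [y[np.abs(ppm - ppm[np.argmax(y)]) <= HALF_WIDTH].sum() for y in spectra]

print(np.round(np.array(fixed) / fixed[0], 3))      # decays towards zero
print(np.round(np.array(tracked) / tracked[0], 3))  # stays at about 1
```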

In my next posts, I will cover different ways to deal with the peak misalignment problem, first in the field of metabonomics and then in reaction monitoring.

[1] O. Beckonert, H.C. Keun, T.M. Ebbels, J. Bundy, E. Holmes, J.C. Lindon, J.K. Nicholson, Metabolic profiling, metabolomic and metabonomic procedures for NMR spectroscopy of urine, plasma, serum and tissue extracts, Nat. Protoc. 2 (2007) 2692–2703

[2] M. Khajeh, M.A. Bernstein, G.A. Morris, Magn. Reson. Chem. 48 (2010) 516–522

Wednesday, 26 January 2011

Here I am again!

As you've undoubtedly noticed, there has been little activity on my blog lately, in contrast with the high activity in my real life, with a plethora of exciting (and challenging) new projects going on at my company, Mestrelab Research.
One area we have been working on quite intensively for the last few months is the broad subject of NMR spectra alignment. This appears to be a very important topic for scientists working in fields such as metabonomics/metabolomics and reaction monitoring by NMR. Starting today, I will blog about this issue, first covering some very basic concepts and then moving on to more advanced techniques for the efficient alignment of NMR spectra.

So stay tuned!