Sunday, 30 January 2011

Alignment of NMR spectra – Part II: Binning / Bucketing

In my last post, I wrote that spectra of biological samples are usually poorly aligned due to wide changes in chemical shift arising from small variations in pH or other sample conditions such as ionic strength or temperature.
The most widely used method of addressing this chemical shift variability across spectra is the so-called binning (or bucketing) procedure, which consists of segmenting a spectrum into small regions (bins / buckets) and taking the area under the spectrum for each segment. Preferably, the bins should be large enough that a given peak remains in its bin despite small shifts across the spectra, but not so large that peaks belonging to multiple compounds fall within a single bin.
As a simple example to illustrate how binning works, let’s consider the spectrum of Taurine (Fig. 1).

Fig. 1: 1H-NMR spectrum of Taurine synthesized with Mnova NMRPredict. Only the spectral region corresponding to the methylene protons is shown.

Starting from the spectrum shown in Fig. 1, which was predicted using Mnova NMRPredict, seven additional spectra were created by randomly changing the chemical shift of the CH2 protons in an effort to simulate the chemical shift variability observed in real-life biofluid NMR spectra.

Fig. 2: Synthesized data set comprising 8 simulated spectra of Taurine with random chemical shifts for the CH2 protons, displayed in superimposed mode in Mnova.

These spectra have been synthesized using 32768 data points and a spectral width of 6001.6 Hz with a spectrometer frequency of 500.13 MHz. If the size of each bin is set to 0.02 ppm (represented by the vertical grid lines in Fig. 2), which corresponds to about 10 Hz at 500.13 MHz, this will result in the generation of 6001.6 / (0.02 × 500.13) ≈ 600 bins.
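The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the function name `bin_count` and the variable names are my own for illustration and have nothing to do with how Mnova computes things internally.

```python
def bin_count(spectral_width_hz, spectrometer_mhz, bin_width_ppm):
    """Number of uniform bins covering the whole spectral width."""
    # 1 ppm corresponds to `spectrometer_mhz` Hz, so convert the bin width to Hz.
    bin_width_hz = bin_width_ppm * spectrometer_mhz
    # Round to the nearest integer number of bins.
    return round(spectral_width_hz / bin_width_hz)


# Values quoted in the text.
sw_hz = 6001.6      # spectral width in Hz
sf_mhz = 500.13     # spectrometer frequency in MHz
n_points = 32768    # data points in each synthesized spectrum

n_bins = bin_count(sw_hz, sf_mhz, 0.02)
bin_width_hz = 0.02 * sf_mhz
points_per_bin = n_points / n_bins

print(n_bins)                     # number of bins
print(round(bin_width_hz, 4))     # bin width in Hz
print(round(points_per_bin, 2))   # average number of original points per bin
```

Note that the number of original points per bin is not an integer here, so any real implementation has to decide how to distribute points that sit near bin edges.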

When the binning command is issued in Mnova, a new spectrum with 600 data points is produced, in which every point is the sum of all the points within the corresponding bin. The result of this binning or bucketing operation applied to one single spectrum of the synthetic Taurine data set is depicted in Fig. 3, where the circles correspond to the area of each bucket in the original spectrum. Fig. 4 shows the result applied to all spectra in superimposed mode. The digital resolution of the resulting binned spectrum is 10 Hz/point.
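As a rough illustration of the operation just described, here is a minimal uniform binning sketch in plain Python. It is not Mnova's implementation; in particular, the rule used to assign a point to a bin (integer division of the point index) is an assumption I make to keep the example short.

```python
def uniform_binning(intensities, n_bins):
    """Sum the points of a spectrum into n_bins equally wide bins."""
    n_points = len(intensities)
    binned = [0.0] * n_bins
    for i, value in enumerate(intensities):
        # Assumed rule: point i belongs to bin floor(i * n_bins / n_points).
        binned[i * n_bins // n_points] += value
    return binned


# Two toy "spectra" of 12 points with the same peak shifted by one point.
spectrum_a = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
spectrum_b = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

# With 3 bins of 4 points each, both peaks fall into the middle bin.
binned_a = uniform_binning(spectrum_a, 3)
binned_b = uniform_binning(spectrum_b, 3)
print(binned_a)
print(binned_b)
```

After binning, the two toy spectra become identical even though the peak positions differ, which is exactly the alignment effect visible in Fig. 4.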

Fig. 3: Methylene region of one synthetic 1H-NMR spectrum of Taurine after data reduction by uniform binning

Fig. 4: Result of applying data reduction by uniform binning to the 8 1H-NMR spectra of Taurine

Once the spectra have been binned, they are ready to be exported in a convenient format (e.g. ASCII) for further statistical analysis (e.g. PCA).
Note that binning greatly reduces the effects of variations in peak positions (in this case, all peaks become perfectly aligned). Additionally, binning reduces the data size for multivariate statistical analyses, although today’s computers and optimized linear algebra algorithms are able to handle large data volumes very efficiently.

The major drawback of this procedure is the loss of a considerable amount of the information contained in the original spectra. In this particular case, the fine structure of the two triplets is totally lost (the coupling constant is 6.6 Hz whilst the digital resolution is 10 Hz), precluding the direct interpretation of multivariate models. In addition, peaks moving across the borders between bins might cause artifacts. Information is also lost, for example, when peaks belonging to several compounds are included within a single bin.
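The border problem is easy to reproduce with toy data. The sketch below uses a hypothetical helper `bin_sum` with an assumed point-to-bin rule (integer division of the point index), and the 12-point spectra are made up purely for illustration.

```python
def bin_sum(intensities, n_bins):
    """Sum the points of a spectrum into n_bins equally wide bins."""
    n_points = len(intensities)
    binned = [0.0] * n_bins
    for i, value in enumerate(intensities):
        # Assumed rule: point i belongs to bin floor(i * n_bins / n_points).
        binned[i * n_bins // n_points] += value
    return binned


# 12 points, 3 bins: the border between bin 0 and bin 1 lies between
# point 3 and point 4.
peak_left = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]      # just left of the border
peak_right = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]     # shifted by one point
peak_split = [0, 0, 0, 0.5, 0.5, 0, 0, 0, 0, 0, 0, 0]  # straddling the border

binned_left = bin_sum(peak_left, 3)
binned_right = bin_sum(peak_right, 3)
binned_split = bin_sum(peak_split, 3)
print(binned_left)
print(binned_right)
print(binned_split)
```

A shift of a single point moves the whole intensity into the neighbouring bin, and a peak sitting on the border is split between two bins, so the same compound can look like different variables to a subsequent statistical analysis.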

There exist several better alternatives to binning, typically involving some form of peak alignment without data reduction. But this will be the subject of my next post …
