Monday 7 February 2011

Alignment of NMR spectra – Part IV: Advanced Alignment

Previous posts on this series:
  1. Alignment of NMR spectra – Part I: The problem
  2. Alignment of NMR spectra – Part II: Binning / Bucketing
  3. Alignment of NMR spectra – Part III: Global Alignment

As I mentioned in my previous post, simple alignment based on shifting or referencing the whole spectrum is not enough when different regions of the spectrum show different local chemical shift fluctuations.
Returning to the synthetic data set used in the previous posts, let me introduce a semi-automatic method designed specifically to align spectra with local chemical shift variations. In practice, the user selects the spectral regions to be aligned, and the program then aligns each region separately using the same technique shown in my last post, that is, maximization of the cross-correlation function. The picture below shows the spectrum before alignment with the two selected regions (top) and the result obtained after applying the alignment algorithm (bottom).
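A minimal sketch of the region-wise idea, in Python with NumPy, follows. It is not Mnova's implementation: `align_regions`, `best_local_shift` and the `max_shift` search limit are hypothetical names, regions are assumed to be given as (start, stop) point indices, and the cross-correlation is maximized by a brute-force search over a limited range of integer shifts. The synthetic data mimic the situation described above: two peaks that drift by different amounts, so no single global shift can fix both.

```python
import numpy as np

def best_local_shift(segment, target, max_shift):
    """Integer shift in [-max_shift, max_shift] that maximises the
    cross-correlation (dot product) between the shifted segment and target."""
    shifts = range(-max_shift, max_shift + 1)
    scores = [np.dot(np.roll(segment, s), target) for s in shifts]
    return shifts[int(np.argmax(scores))]

def align_regions(spectrum, reference, regions, max_shift=20):
    """Align each user-selected (start, stop) region of `spectrum` to the same
    region of `reference` independently; points outside the regions are untouched.
    np.roll wraps points around inside each region, which is only harmless
    when the region edges are baseline (an assumption of this sketch)."""
    aligned = np.array(spectrum, dtype=float)
    for start, stop in regions:
        s = best_local_shift(aligned[start:stop], reference[start:stop], max_shift)
        aligned[start:stop] = np.roll(aligned[start:stop], s)
    return aligned

# Synthetic example: Gaussian peaks on a 1000-point axis.
x = np.arange(1000)

def peak(center, width=4.0):
    return np.exp(-0.5 * ((x - center) / width) ** 2)

reference = peak(200) + peak(700)
sample = peak(205) + peak(692)  # one peak drifts by +5 points, the other by -8
aligned = align_regions(sample, reference, [(100, 300), (600, 800)])
print(np.allclose(aligned, reference))  # True
```

Limiting the search to `max_shift` points keeps the cost proportional to the number of candidate shifts and stops a region from locking onto an unrelated peak; the FFT route described later in this post evaluates every shift at once instead.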




Before going into the details of the automatic alignment algorithm, one point is worth mentioning: when several spectra are to be aligned, one spectrum must be designated as the reference (alignment target). Our implementation can use any spectrum in the data set, or the average spectrum, as the reference.
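A tiny sketch of the reference choice, assuming the spectra are stored as the rows of a NumPy array. `choose_reference` and its `index` argument are hypothetical names, not part of Mnova.

```python
import numpy as np

def choose_reference(spectra, index=None):
    """Alignment target for a stack of spectra (one spectrum per row).
    index=None returns the average spectrum; otherwise the spectrum in that row."""
    spectra = np.asarray(spectra, dtype=float)
    if index is None:
        return spectra.mean(axis=0)
    return spectra[index]

stack = [[0, 1, 0, 0],   # peak at point 1
         [0, 0, 1, 0]]   # same peak, drifted to point 2
avg_ref = choose_reference(stack)       # average: [0, 0.5, 0.5, 0]
row_ref = choose_reference(stack, 1)    # second spectrum: [0, 0, 1, 0]
```

One consequence worth keeping in mind: averaging misaligned spectra spreads each peak over the positions it occupies (here one peak becomes two half-height features), so the average spectrum is a smoother target than any single spectrum.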

Automatic Alignment: What is under the hood

Assuming that the spectral segments to be aligned are represented by two vectors g and h, a new vector f can then be generated by cross-correlation:

$$ f(n) = \sum_{m} g^{*}(m)\, h(m+n) $$

where * indicates the complex conjugate.
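To make the definition concrete, the sketch below evaluates f(n) term by term. It assumes circular (wrap-around) indexing, which is what the FFT-based computation described next produces; `cross_correlation_direct` is an illustrative name, and this O(N²) loop is only meant for checking the definition, not for real data sets.

```python
import numpy as np

def cross_correlation_direct(g, h):
    """Evaluate f(n) = sum_m conj(g[m]) * h[(m + n) % N] term by term, O(N^2).
    Indices wrap around (circular correlation); N = len(g) == len(h)."""
    g = np.asarray(g, dtype=complex)
    h = np.asarray(h, dtype=complex)
    n_pts = len(g)
    m = np.arange(n_pts)
    return np.array([np.sum(np.conj(g) * h[(m + n) % n_pts]) for n in range(n_pts)])

g = [0, 1, 0, 0, 0, 0]   # a single "peak" at index 1
h = [0, 0, 0, 1, 0, 0]   # the same peak, two points to the right
f = cross_correlation_direct(g, h)
print(np.argmax(f.real))  # 2: g must move two points to overlap h
```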
The cross-correlation implemented in Mnova is computed using the fast Fourier transform (FFT), which runs in O(N log N) time. Briefly, the strategy is to compute the FFT of each of the two vectors, take the complex conjugate of one of the two Fourier-domain representations (that is, invert the sign of its imaginary part), multiply the two Fourier-domain functions, and transform the result back with the inverse FFT. The index of the maximum of f(n) then gives the number of points by which vector g has to be shifted to obtain the highest cross-correlation with h.
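The FFT recipe above can be sketched in a few lines of NumPy. The function names are hypothetical, and the step that converts the argmax index into a signed shift is an addition of this sketch: because FFT-based correlation is circular, an index above N/2 corresponds to a negative shift.

```python
import numpy as np

def cross_correlation_fft(g, h):
    """f(n) = sum_m conj(g[m]) * h[(m + n) % N], computed in O(N log N)."""
    G = np.fft.fft(g)
    H = np.fft.fft(h)
    # Conjugating G (flipping the sign of its imaginary part) turns the
    # Fourier-domain product into a correlation instead of a convolution.
    return np.fft.ifft(np.conj(G) * H)

def best_shift(g, h):
    """Points by which g must be shifted (as with np.roll) to best match h."""
    f = cross_correlation_fft(g, h).real  # real inputs give a real f(n)
    lag = int(np.argmax(f))
    # Circular correlation: indices above N/2 stand for negative shifts.
    return lag - len(g) if lag > len(g) // 2 else lag

x = np.arange(128)
h = np.exp(-0.5 * ((x - 37) / 3.0) ** 2)  # reference peak at point 37
g = np.exp(-0.5 * ((x - 40) / 3.0) ** 2)  # same peak at point 40
shift = best_shift(g, h)                   # -3
aligned = np.roll(g, shift)                # peak now at point 37, like h
```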
This is not, of course, the first time that cross-correlation has been applied to the alignment of two (or more) vectors. In fact, it has been used extensively for alignment purposes in many different contexts, including:
  • Chromatography (Anal. Chem. 2005, 77, 5655-5661)
  • NMR (J. Magn. Reson. 2010, 202, 190-202)
  • DNA Sequence Alignment (J. Biomol. Tech. 2005, 16, 453–458)
The first article is the one that inspired me to include this method in Mnova; in fact, it was implemented several years ago as a method for the automatic alignment of 1D and 2D spectra.
Very recently, we have improved the traditional cross-correlation algorithm by working in the first-derivative domain, with the derivative calculated by an improved Savitzky-Golay routine in which the order of the smoothing polynomial is determined automatically. The idea is to minimize potential problems caused by baseline distortions or very broad peaks.
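A rough sketch of derivative-domain alignment, assuming NumPy only. It computes Savitzky-Golay first derivatives from least-squares polynomial coefficients and then applies the same FFT cross-correlation as before. Unlike the routine described above, the polynomial order here is a fixed, hand-picked parameter (`polyorder=3` over an 11-point window); automatic order selection is not reproduced, and all names are hypothetical. The example shifts a peak by 12 points and adds a sloping baseline to the sample only.

```python
import numpy as np

def savgol_derivative(y, half_window=5, polyorder=3):
    """First derivative of y by Savitzky-Golay least-squares filtering: fit a
    polynomial of degree `polyorder` over 2*half_window+1 points and take its
    slope at the window centre (unit point spacing assumed)."""
    z = np.arange(-half_window, half_window + 1)
    vander = np.vander(z, polyorder + 1, increasing=True)  # columns z^0 .. z^p
    coeffs = np.linalg.pinv(vander)[1]                     # row for the z^1 term
    # Edge replication avoids artificial steps at the ends of the spectrum.
    padded = np.pad(np.asarray(y, dtype=float), half_window, mode="edge")
    return np.correlate(padded, coeffs, mode="valid")

def derivative_shift(g, h, **sg_kwargs):
    """Shift (in points) that aligns g to h, maximising the FFT cross-correlation
    of their Savitzky-Golay first derivatives instead of the raw data."""
    dg = savgol_derivative(g, **sg_kwargs)
    dh = savgol_derivative(h, **sg_kwargs)
    f = np.fft.ifft(np.conj(np.fft.fft(dg)) * np.fft.fft(dh)).real
    lag = int(np.argmax(f))
    return lag - len(g) if lag > len(g) // 2 else lag

x = np.arange(512)
reference = np.exp(-0.5 * ((x - 250) / 5.0) ** 2)
# Same peak moved 12 points to the right and sitting on a sloping baseline.
sample = np.exp(-0.5 * ((x - 262) / 5.0) ** 2) + 0.002 * x
print(derivative_shift(sample, reference))  # -12
```

Differentiation turns the sloping baseline into a constant, and because the Savitzky-Golay derivative coefficients sum to zero, that constant contributes essentially nothing to the correlation with the reference derivative.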

We have found this method very useful not only in the context of metabonomics but also for aligning Reaction Monitoring data sets. However, I had better leave this topic for my next post …
