I have blogged quite a few times about qNMR, covering some very basic concepts (here and here) and tricks for integrating overlapped multiplets (here). In my last post I announced the release of a new qNMR module for Mnova aimed at automating the quantitative analysis of NMR spectra in an efficient and robust way. I’m now glad to report that a paper describing this functionality has just been published in Analytical Chemistry: Optimization and Automation of Quantitative NMR Data Extraction.
BTW, Mnova qNMR is free for academia or non-profit organizations.
It is well known that NMR is a very convenient technique for quantification, provided the amount of the material is within the limits of detection in NMR.
When it comes to the actual calculations, this is a very straightforward process that does not require any fancy mathematics. Typically you will select the most convenient signal(s) or multiplet(s) in your spectrum and calculate the integral which can then be mapped to the corresponding concentration units by using a scaling factor that was previously calculated using some internal or external references.
It is, however, a laborious process. You have to make sure that the signal or multiplet you have selected is isolated enough to avoid contamination from other signals (solvents, impurities, other compound resonances, etc.). Furthermore, you need to know the number of nuclides (i.e. protons) contributing to the integrated signal or multiplet. Again, this is not rocket science, but it is particularly labor-intensive: it could easily take a few minutes for a single data set.
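To make the arithmetic concrete, here is a minimal Python sketch of the usual internal-reference relation (integral per nucleus of the analyte compared with integral per nucleus of a reference of known concentration). It is only an illustration, not the code used in Mnova; the function name and all the numbers are hypothetical.

```python
def concentration_from_integral(integral, n_nuclei,
                                ref_integral, ref_n_nuclei, ref_concentration):
    """Convert an integral into a concentration using a reference signal.

    The scaling factor is the concentration represented by one unit of
    integral per nucleus, as measured on the reference signal.
    """
    if n_nuclei <= 0 or ref_n_nuclei <= 0:
        raise ValueError("number of nuclei must be positive")
    scaling_factor = ref_concentration / (ref_integral / ref_n_nuclei)
    return (integral / n_nuclei) * scaling_factor


# Hypothetical example: a 3H singlet of the analyte measured against
# a 9H reference signal whose concentration is known to be 10 mM.
conc = concentration_from_integral(
    integral=150.0, n_nuclei=3,
    ref_integral=900.0, ref_n_nuclei=9,
    ref_concentration=10.0,
)
print(f"{conc:.2f} mM")
```

The calculation itself is trivial; the laborious part is choosing a clean multiplet and knowing its number of nuclei.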
Now suppose that you have to do this with a library of several hundred or thousands of compounds with their corresponding NMR spectra. Clearly a manual analysis is completely impractical.
There are several approaches out there that can be used to automate this process in some way or another, but to the best of my knowledge they are not optimized to take into account a number of factors that could adversely affect the accuracy and precision of the quantification calculations. Features such as spectral quality variability from sample to sample, different degrees of peak overlap, amongst others, are very important in order to have a robust system.
That is precisely one of the main objectives that we’ve pursued with the development of the new qNMR module for Mnova that we have just released: the ability to process any number of data sets in the most robust way possible.
It has been designed in such a way that it can detect those multiplets in the spectrum that give the best results whilst discarding the problematic ones (for example, multiplets affected by overlap, impurities or artifacts). It also calculates the number of nuclides for each of those multiplets.
The bottom line is that this new module is aimed at streamlining qNMR analyses and reporting whilst eliminating tedious and repetitive manual steps.
We believe that it is a useful tool not only in the traditional exploitation of qNMR where the final result is required to the highest level of precision, and the utmost care must be taken in collecting the NMR data, but also in the world of high-throughput NMR where this very careful data acquisition is often not possible. Whilst the first thought may be that qNMR therefore is not possible, we know from collaborative studies that quite reasonable and useful quantitative data can be obtained in this scenario. The error expectation obviously has to be lowered and actual numbers will depend on factors such as the rate of data acquisition, SNR, and impurity levels, but concentration determination to within 5-10% of the actual value is not an unrealistic expectation.
This new module can be tried for free, just go here.
Figure: Photo taken during our bike climbing to Galibier (Lautaret) in the Tour de France 2011
One of the most exciting and complex challenges in NMR software for small molecules at present is the ability to verify a proposed chemical structure from its NMR spectrum automatically, a process commonly known as Automatic Structure Verification (ASV). Nowadays, it is possible to acquire spectra automatically on large numbers of compounds, but the interpretation of all of this data constitutes a key bottleneck. As John Hollerton wrote in Stan’s blog:
So now I come to the purpose of ASV. I don't know of many (any) companies employing people to look at 50 spectra a day (except for specific, one-off projects)
We at Mestrelab have been working for several years already to provide the most powerful and usable ASV software package.
It was not an easy job and, in some ways, it resembles the stages of the Tour de France that Santi, some friends and I rode a few days ago. We had to suffer, curve by curve, ramp by ramp, to reach the top of the mythical cols of the Alps (Galibier, Alpe d'Huez, Croix de Fer, Télégraphe, Les 2 Alpes, etc.), but we've made it :-)
Similarly, the road to ASV has also been very steep and tough, but we think now that we have a truly sophisticated and successful system that delivers very good results.
Of course, there's always room for improvement (as with the Alps, which we could have climbed much faster had we been riding road bikes instead of mountain bikes :-)), but either way, we are very satisfied with the current state of our ASV.
I would like to take this opportunity to thank Stan Sykora for his superb work.
References:  S.A. Richards, J.C. Hollerton, Essential Practical NMR for Organic Chemistry, John Wiley & Sons, Ltd, 2011.
Whilst this new version presents a number of significant improvements in the software, in this post I would like to focus on a new peak picking concept which, to the best of my knowledge, is novel and, in some ways, revolutionary. I will try to keep this as short and clear as possible, just to illustrate the very basic ideas that motivated this new approach to peak picking. In the next posts I will elaborate further on some of the new points introduced here.
Traditional Peak Picking
First up, to lay the groundwork for this article, let’s revisit the way in which peak picking algorithms usually work in all NMR software packages, including the former versions of Mnova. Essentially, in this procedure, all peak maxima (and/or minima) in a spectrum are determined and their values are either stored in tabular form (i.e. in a peak table) or displayed graphically over the spectrum (Figure 1).
Figure 1: Illustration of traditional peak picking
Each NMR application might offer different levels of sophistication in its peak picking algorithms. For example, some can resolve overlapped peaks better than others (http://nmr-analysis.blogspot.com/2009/06/fighting-against-peak-overlap.html), others operate more efficiently on spectra with low SNR, etc. The important element I would like to highlight here, however, is that the output of any peak picking algorithm is just a plain list of significant points in a spectrum.
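As a bare-bones illustration of what "traditional" peak picking amounts to, the following Python sketch simply collects the local maxima of a list of intensities. It is a deliberately naive version under my own assumptions (no noise handling, no interpolation, made-up data), not the algorithm of any particular NMR package.

```python
def pick_peaks(intensities):
    """Return the indices of local maxima (points higher than both neighbours)."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        if intensities[i - 1] < intensities[i] >= intensities[i + 1]:
            peaks.append(i)
    return peaks


# Made-up intensities: two large peaks and one small bump in between.
spectrum = [0.0, 0.1, 1.0, 2.0, 1.0, 0.2, 0.3, 0.2, 0.0, 0.5, 3.0, 0.5, 0.0]

# The "peak table" is just a plain list of (index, intensity) pairs.
peak_table = [(i, spectrum[i]) for i in pick_peaks(spectrum)]
print(peak_table)
```

Note that the small bump ends up in the table next to the two large peaks, with nothing to say whether it is a compound peak, an impurity or noise.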
So far, so good. The question is: What is the purpose of applying peak picking? Well, there is no definitive answer and it depends upon the particular application. Nonetheless, if we restrict the context to that of the simple analysis of small molecules for their characterization, peak picking is usually applied to the calculation of coupling constants and chemical shifts, (in other words determination of the spin system(s) in the spectrum). Whenever practicable, this should be done as automatically as possible by the NMR Software.
Historically, this automatic analysis was based on Quantum Mechanical (QM) methods. They are certainly the most rigorous, yet also the most complex, methods; interestingly, they were already the most popular approaches more than 40 years ago [2-4], when computer power was much more limited than today.
Another approach, which incidentally has attracted significant interest in recent years and is computationally much less demanding, is based on the same technique typically used by organic chemists, i.e. the popular first-order analysis rules [5-7]. Of course, this approach is only valid for weakly coupled systems and thus has a more limited scope than QM methods, but it is useful for rapid spectrum analysis.
In any event, regardless of the method employed, the main obstacle to achieving a successful automatic analysis of 1H NMR spectra with minimal user intervention lies in the fact that NMR spectra do not consist only of peaks arising from the transitions of the studied spin system, but also of many others, such as solvent and impurity peaks, spinning sidebands, reference peaks (e.g. TMS), satellite peaks, etc. Continuing the previous example, if the vertical intensities are expanded a little bit more, we can appreciate (see Figure 2) that peak picking finds not only the main transitions, but also many other peaks such as small impurities, 13C satellites, etc.
Figure 2: Expanded view of Figure 1. It can be seen that small peaks, including 13C satellites and impurities are detected, but they lack any kind of classification.
This is certainly good as those peaks are real and it is important to detect them. However, the problem is that, in general, those peaks are not labeled or marked according to their type (compound, solvent, 13C satellite, impurity, etc): They are just peaks, they do not have a semantic context and do not present any further characterization. This has some important consequences when automatic analysis is required. For example, solvent peaks should be marked appropriately so that they are not used during the actual analysis. Same applies to impurities, 13C peaks, etc.
Intelligent Peak Picking
By Intelligent I mean a Peak Picking algorithm equipped with the ability to classify and mark each peak according to its type. Identifying obvious impurity or solvent peaks is a task that an experienced chemist is very familiar with and can do very efficiently; for this reason, it is very important that any software gives the user the ability to classify a peak manually. Automation is also very important, but this kind of analysis is extremely difficult for a computer program: impurity or solvent peaks can overlap with compound resonances, making simple strategies based on the definition of ‘solvent’ regions ineffective.
Just to give you a first impression of what Intelligent Peak Picking looks like in Mnova, take a look at Figure 3:
This is the result of a fully automatic Peak Picking. Some points are worth noting:
As you can see, now the peak labels have different colors. They are color coded according to their classification (i.e. compound peak, impurity, solvent, etc) as per the legend shown in the figure. Once again, this classification has been done fully automatically by Mnova, but manual intervention is always possible.
Typically, peak picking algorithms include a so-called threshold parameter used to filter out small peaks in a spectrum. This is not very convenient, for many reasons, including:
a. It is very difficult to find a threshold value that works well under all spectral conditions. Even though this parameter can be set relative to the noise level in a spectrum, very often it has to be tuned manually in order to get good results.
b. As this threshold parameter usually works globally across the spectrum, if it is set high enough to filter out small noisy or impurity peaks, we run the risk of losing small compound peaks, for example, the small outer peaks on both sides of a heptet.
c. It is also very sensitive to baseline imperfections.
c. It is also very sensitive to baseline imperfections.
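Point (b) is easy to reproduce with a toy example. The sketch below applies a global threshold to the ideal first-order intensities of a heptet (1:6:15:20:15:6:1); the 10% threshold value and the function name are arbitrary choices of mine, made purely for illustration.

```python
from math import comb

# Ideal first-order heptet: relative line intensities 1:6:15:20:15:6:1.
heptet = [comb(6, k) for k in range(7)]


def apply_threshold(intensities, fraction_of_max):
    """Keep only the lines at or above a global threshold,
    expressed as a fraction of the tallest line."""
    cutoff = fraction_of_max * max(intensities)
    return [value for value in intensities if value >= cutoff]


# A 10% threshold that might look sensible for rejecting noise...
kept = apply_threshold(heptet, 0.10)
print(heptet)
print(kept)
```

With this threshold the two outer lines, which are only 5% of the central one, are discarded, so the multiplet would be reported with five lines instead of seven.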
All the drawbacks outlined above are strongly alleviated (if not fully resolved) by the intelligent peak picking included in Mnova 7.0. Why and how will be the subject of my future posts.
Unlike traditional peak picking algorithms, the intelligent version presented for the first time in Mnova 7.0 adds an extra dimension: every peak is automatically classified according to its type (compound peak, impurity, 13C satellite, solvent, etc.). The automation of this classification is possible thanks to a fuzzy logic expert system developed in Mnova, which I will describe in future posts.
In my opinion, this peak classification opens new avenues in the automatic analysis of 1H NMR spectra of small molecules. For example, multiplet analysis using first-order rules becomes much more efficient, especially in cases of severe signal overlap or of multiplets contaminated with solvent peaks. Figure 4 shows the result of analyzing the spectrum of Santonin fully automatically (i.e. with one button click).
In my upcoming posts I will describe in more detail this new approach to NMR peak picking and 1H NMR analysis, including automatic solvent detection, multiplet analysis and automatic determination of the number of protons in a spectrum. Stay tuned!
[1] P. Diehl, S. Sykora and J. Vogt, J. Magn. Reson. 19, 67 (1975).
[2] J.D. Swalen and C.A. Reilly, J. Chem. Phys. 37, 21 (1962).
[3] S. Castellano and A.A. Bothner-By, J. Chem. Phys. 41, 3863 (1964).
[4] D.S. Stephenson and G. Binsch, J. Magn. Reson. 37, 395 (1980).
[5] T.R. Hoye and H. Zhao, J. Org. Chem. 67, 4014–4016 (2002).
[6] C. Cobas, V. Constantino-Castillo, M. Martín-Pastor and F. del Río-Portilla, Magn. Reson. Chem. 43, 843–848 (2005).
[7] S. Bourg and J.-M. Nuzillard, J. Chim. Phys. 95, 18 (1998).
I’m sure that many of you are aware of the infamous controversy over the total synthesis of Hexacyclinol. There is a plethora of arguments, from both the purely synthetic chemistry point of view and the spectroscopist’s perspective, which cast doubt on the veracity of the aforementioned total synthesis. You can find, at the end of this post, a list of references that may be of interest on this subject. In this particular post, I would like to comment on several interesting aspects from an NMR standpoint.
In the original article’s “supporting information”, one can find the spectra of Hexacyclinol and derived compounds. I believe that the key to the controversy here lies in the fact that these spectra are found solely as plain images, that is, all the relevant spectral information that could have provided more conclusive proof over the authenticity or not of the total synthesis of this compound is lost.
Let’s consider for example, the image of the Hexacyclinol spectrum:
I have cut the central vertical area of the image in order to visualize the highest intensities and the smallest ones together. The objective is simply to appreciate clearly the level at which the 13C satellites of the CHCl3 signal should appear. That level is indicated by the red horizontal line, which corresponds approximately to an intensity of 32, that is, ~0.55% of the intensity of the CHCl3 signal.
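For readers wondering where the ~0.55% figure comes from, here is the back-of-the-envelope arithmetic as a small Python snippet. The 1.1% natural abundance of 13C is a rounded textbook value, and the intensity of the parent CHCl3 signal is not quoted in this post; the snippet merely back-calculates what it would have to be given the two numbers above.

```python
C13_NATURAL_ABUNDANCE = 0.011  # approximate fraction of carbon that is 13C

# Protons bound to 13C appear as a doublet (the two satellites), so each
# satellite carries roughly half of that fraction relative to the main line.
satellite_fraction = C13_NATURAL_ABUNDANCE / 2

satellite_level = 32  # intensity of the red line quoted in the text
# Implied intensity of the main CHCl3 line (back-calculated, not measured).
implied_parent_intensity = satellite_level / satellite_fraction

print(f"{satellite_fraction:.2%}")
print(round(implied_parent_intensity))
```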
At first glance, one cannot observe such signals from the 13C satellite. Is this a conclusive reason to assume that this spectrum is really a fake, that is, a spectrum that was created synthetically? Of course, I’d say it is not conclusive but it could be an indication.
In principle, I would say that the SNR of this spectrum seems good enough for the 13C satellites to be observed; yet they cannot be seen, not only in the CHCl3 region but in any part of the spectrum. I can only come up with two reasons that could justify the absence of those signals:
On the one hand, the author could have acquired the spectrum using a pulse sequence that removes the 13C signals (e.g. 13C GARP broadband decoupling). I’d say this is highly improbable given that, if this was so, it would seem reasonable that the author would have specified in the article the use of such decoupling technique. In any case, if the complete “raw data” acquired were available, then the pulse sequence could be analyzed. Of course, this would not be a conclusive proof either since the pulse sequence is generally found in a text file which could be manipulated easily by any editor (unless it is digitally signed).
On the other hand, the fact that the 13C signals are not observable may be due to the acquisition settings, more specifically the gain settings in the case of very strong spectra. The problem is that when there is not enough noise to fill at least ~6 ADC steps, you rapidly start losing small peaks and, of course, the noise starts looking weird, kind of binary, which can actually be perceived in this case. In any case, I don’t think this would really justify the absence of the 13C signals. If this is what happened during the acquisition, the spectrum would never, in my opinion, be so perfectly void of satellites or have such a perfect baseline and phase.
Therefore I insist: if the raw data were available, a more effective analysis could be carried out. For example, one could analyze the line widths of isolated signals. If all of them were the same, this would suggest that the spectrum was synthesized, given that in real life this is very unlikely to occur.
In summary, it is very difficult to reach conclusive results working with plain images. Once the spectra have been acquired, transforming them into images only results in an irreversible loss of important information. I believe that publishers should oblige authors to submit spectra/original FIDs (including all the files with additional metadata) to avoid any loss of relevant information.
Crossing over of peaks is a very common event in Reaction Monitoring (RM) experiments. When this happens, the automatic alignment algorithm discussed in previous posts (here and here) might not work properly. To illustrate this issue, as I did not have a real experiment at hand, I used Mnova to simulate a very simple data set comprising a triplet and a singlet, in which the chemical shift of the triplet moves from 1.4 ppm to 0.7 ppm while its intensity decays exponentially from spectrum to spectrum. This is depicted in the figure below, both as a stacked and a bitmap plot.
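If you would like to play with a similar data set outside Mnova, something along these lines can be generated with NumPy. This is only a sketch under my own assumptions: the number of spectra, the line width, the coupling constant, the decay rate and the position of the singlet are all arbitrary values, not the ones used for the figure.

```python
import numpy as np


def lorentzian(ppm, center, width, height):
    """Lorentzian line with full width at half maximum `width` (in ppm)."""
    half = width / 2.0
    return height * half**2 / ((ppm - center) ** 2 + half**2)


def simulate_series(n_spectra=10, n_points=2048, decay_rate=0.3):
    """Triplet drifting from 1.4 to 0.7 ppm with decaying intensity,
    plus a static singlet."""
    ppm = np.linspace(2.0, 0.0, n_points)            # descending ppm axis
    triplet_shifts = np.linspace(1.4, 0.7, n_spectra)
    j_ppm = 0.014                                    # assumed: 7 Hz at 500 MHz
    spectra = np.zeros((n_spectra, n_points))
    for k, shift in enumerate(triplet_shifts):
        amplitude = np.exp(-decay_rate * k)          # exponential decay
        for offset, weight in ((-j_ppm, 1.0), (0.0, 2.0), (j_ppm, 1.0)):
            spectra[k] += lorentzian(ppm, shift + offset, 0.004,
                                     amplitude * weight)
        # Static singlet at an assumed position of 1.0 ppm.
        spectra[k] += lorentzian(ppm, 1.0, 0.004, 1.0)
    return ppm, spectra


ppm, spectra = simulate_series()
print(spectra.shape)
```

Each row of `spectra` is one spectrum of the series, so the array can be shown directly as a bitmap plot or row by row as a stacked plot.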
Now let’s say you are interested in extracting the intensities of the triplet as the reaction progresses. There is actually no need to pre-align the spectra algorithmically; it is much simpler to have some kind of graphical tool to instruct the software which peaks (or multiplets) need to be used for the reaction monitoring analysis. Let me show you how this works in Mnova:
First of all, in the Data Analysis module you select the region to be analyzed. As a starting point, the region will have a rectangular shape (green rectangle in the figure below):
It can be noted that the graph shows an exponential decay, but the actual values must obviously be wrong as the values calculated, using the green rectangle as a boundary for the integration, include peaks from both the triplet and singlet, and we are interested in the analysis of the triplet resonances only. Now let’s change this…
The selection rectangle has a number of handlers (small green boxes). You can drag and move them freely so that you can adjust the selection feature to follow the triplet (BTW, the number of handlers can be adjusted. In this case, there are 6 handlers, but higher numbers are also permitted). In the figure below, the result of adjusting the handlers to follow the triplet is shown:
Now you can see that there is an outlier in the exponential curve which, obviously, is caused by the singlet overlapping with the triplet (spectrum number 6, which corresponds to data point #5, as the numbering in the graph starts from zero). The figure below shows that particular spectrum, with the singlet overlapping the triplet:
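The idea of an integration region that follows the peak can be mimicked in a few lines of NumPy. In this sketch (my own simplification, not how Mnova implements it) the handlers are reduced to (spectrum index, ppm) pairs, the centre of the window is linearly interpolated between them, and the data are a crude synthetic series with a static peak placed at 0.9 ppm so that the overlap falls on data point #5.

```python
import numpy as np


def integrate_along_path(ppm, spectra, handle_rows, handle_centers, half_width):
    """Sum each spectrum inside a window whose centre follows the handlers.

    handle_rows / handle_centers give the (spectrum index, ppm) position of
    each handler; the centre for the rows in between is interpolated linearly.
    """
    rows = np.arange(spectra.shape[0])
    centers = np.interp(rows, handle_rows, handle_centers)
    integrals = np.empty(rows.size)
    for row, center in zip(rows, centers):
        mask = np.abs(ppm - center) <= half_width
        integrals[row] = spectra[row, mask].sum()
    return integrals


# Crude synthetic series: one moving, decaying peak and one static peak.
ppm = np.linspace(2.0, 0.0, 201)
moving = np.linspace(1.4, 0.7, 8)
spectra = np.zeros((8, ppm.size))
for k, shift in enumerate(moving):
    spectra[k, np.argmin(np.abs(ppm - shift))] += np.exp(-0.3 * k)
    spectra[k, np.argmin(np.abs(ppm - 0.9))] += 1.0   # static peak (assumed)

# Rectangular region: both peaks are always inside the window.
rectangle = integrate_along_path(ppm, spectra, [0, 7], [1.05, 1.05], 0.5)
# Region following the moving peak: two handlers, one at each end.
tracked = integrate_along_path(ppm, spectra, [0, 7], [1.4, 0.7], 0.05)
print(np.round(tracked, 3))
```

The tracked integrals follow the exponential decay except at index 5, where the static peak sits inside the window, which is the same kind of outlier described above.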
At this stage, there are several approaches. The simplest one is to just discard that point for the analysis, for example, by right clicking on that point in the graph and disabling it:
As soon as that point is deleted / disabled, Mnova will update the graph automatically. This is the new result:
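Numerically, disabling a point simply means leaving it out of the fit. The sketch below does this with a log-linear least-squares fit in NumPy; the fitting method, the function name and the data are assumptions of mine for illustration only, and I am not claiming this is the fitting procedure used by Mnova.

```python
import numpy as np


def fit_exponential(x, y, enabled=None):
    """Fit y = a * exp(-k * x) by linear regression on log(y),
    using only the points flagged as enabled."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if enabled is None:
        enabled = np.ones(x.size, dtype=bool)
    slope, intercept = np.polyfit(x[enabled], np.log(y[enabled]), 1)
    return np.exp(intercept), -slope


x = np.arange(8)
y = 2.0 * np.exp(-0.3 * x)
y[5] += 1.0                     # data point #5 contaminated by the singlet

enabled = np.ones(x.size, dtype=bool)
enabled[5] = False              # equivalent of disabling the point in the graph

a_all, k_all = fit_exponential(x, y)            # outlier included
a_ok, k_ok = fit_exponential(x, y, enabled)     # outlier disabled
print(round(k_all, 3), round(k_ok, 3))
```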
Another approach would involve using GSD to eliminate the singlet from the triplet so that it would not be necessary to discard the information from that particular spectrum. However, this is something I will blog about in a future post.
Following the progression of chemical reactions by NMR is becoming more and more popular. Quoting Michael A. Bernstein et al. (Magn. Reson. Chem. 2007; 45: 564–571)
(…)The technique is rich in structural information, and can uniquely provide subtle information on speciation, protonation sites, and intermediate compound production. NMR measurements can be made under quantitative conditions, and one can be confident that all organic species will be observed. These factors combine to make NMR a very attractive tool for these analyses, and address many of the shortcomings in traditional spectroscopic measurements (…)
As a reaction proceeds, it’s very common to observe significant chemical shift fluctuations of a given resonance due to, for example, changes in pH or protonation of the starting material, just to mention a few. These changes in chemical shift can be so large that extracting relevant information from those spectra (e.g. intensities/integrals across the data set) can be difficult, so aligning those spectra can be helpful. Let me illustrate this with an example exhibiting clear nonlinear misalignments: the peaks at ca. 11.6 ppm do not move, whilst the peaks at higher field move very significantly:
Instead of displaying the data set as a stacked plot as above, it might be more convenient to display it as an intensity or bitmap plot because this plotting mode highlights more clearly the alignment /misalignment profiles:
It’s evident that correcting the data using a single reference peak (or a global shift) is not sufficient. In order to align this data set, we can follow two different strategies:
Starting with raw spectrum (1), it is possible to perform a full-spectrum correction (global alignment) before the single intervals are aligned:
It can be appreciated that after applying the global alignment, most of the peaks in (2) are now properly aligned, except the peaks at the left which were previously aligned but after this operation get misaligned. This problem will be covered in the next step. After the spectra have been aligned ‘globally’, the user just needs to select the interval which comprises the peaks left to be aligned as depicted in (1). (2) shows the final result once both the global and local alignment have been applied:
A different, although analogous strategy, would consist in aligning two different spectral intervals separately without resorting to a global alignment as shown in (1) below. Note that the peaks in the interval in the left are already well aligned (so selecting this region is optional; if there were some minor misalignment, the algorithm would optimize such residual misalignment).
(2) shows the final result after the two intervals have been aligned. It’s completely equivalent to the result obtained with Strategy 1.
Conclusion
In this post I showed how the automatic alignment algorithm can be used to align RM data sets prior to any further analysis. However, there is a better way to extract NMR descriptors from Reaction Monitoring experiments that does not require any prior alignment. In fact, I believe that this method, which I will present in my next post, has several advantages (in the context of reaction monitoring), especially in those cases where the chemical shift ordering of some peaks changes during the reaction, situations that automatic alignment algorithms usually have great trouble dealing with. An example of a reaction monitoring data set showing peaks crossing over is shown below, in bitmap mode (it’s a simulated data set).
Therefore, in my next post, I will show how to analyze RM data sets with significant peak fluctuations and crossing over.
As I mentioned in my previous post, simple alignment based on shifting or referencing the whole spectrum is not enough in cases where there are different local chemical shift fluctuations. Going back to the synthetic data set used in the previous posts, let me introduce a semi-automatic method designed specifically to align spectra having local chemical shift variations. From a practical point of view, the user needs to select the spectral regions to be aligned, and the program will then automatically align those regions separately by using the same technique shown in my last post, that is, maximization of the cross-correlation function. The picture below shows the spectrum before alignment with the two selected regions (top) and the result obtained after applying the alignment algorithm (bottom).
Before going into the details of the automatic alignment algorithm, there is a point I think is worth mentioning: when you have several spectra to be aligned, it is necessary to specify the spectrum which will act as the reference (alignment target). Our implementation provides the capability to use as a reference any spectrum in the data set or the average spectrum.
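To give a feel for how such a region-by-region alignment can be organized, here is a small NumPy sketch. It is a simplified illustration under my own assumptions (circular shifts inside each region, plain cross-correlation on the intensities, made-up function names and data), not the actual Mnova implementation; the cross-correlation itself is explained in the section that follows.

```python
import numpy as np


def circular_shift_to_match(segment, target):
    """Shift (in points) that maximizes the circular cross-correlation."""
    f = np.fft.ifft(np.conj(np.fft.fft(segment)) * np.fft.fft(target)).real
    n = int(np.argmax(f))
    return n - segment.size if n > segment.size // 2 else n


def align_regions(spectra, regions, reference=None):
    """Align each (start, stop) region of every spectrum separately.

    reference: row index of the target spectrum, or None to use the
    average spectrum as the alignment target.
    """
    target = spectra.mean(axis=0) if reference is None else spectra[reference]
    aligned = spectra.copy()
    for start, stop in regions:
        for row in range(spectra.shape[0]):
            segment = spectra[row, start:stop]
            shift = circular_shift_to_match(segment, target[start:stop])
            aligned[row, start:stop] = np.roll(segment, shift)
    return aligned


# Three synthetic spectra with two peaks that drift independently.
x = np.arange(400)
drift_a, drift_b = [0, 5, -3], [0, -4, 6]
spectra = np.array([
    np.exp(-0.5 * ((x - 100 - da) / 3.0) ** 2)
    + np.exp(-0.5 * ((x - 300 - db) / 3.0) ** 2)
    for da, db in zip(drift_a, drift_b)
])

aligned = align_regions(spectra, regions=[(0, 200), (200, 400)], reference=0)
print(np.abs(aligned - spectra[0]).max())
```

Passing `reference=None` instead would use the average spectrum as the target, which corresponds to the second option mentioned above.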
Automatic Alignment: What is under the hood
Assuming that the spectral segments to be aligned are represented by two vectors g and h, a new vector f can then be generated by cross-correlation:
$$f(n) = \sum_{m} g^{*}(m)\, h(m+n)$$
where * indicates the complex conjugate. The cross-correlation implemented in Mnova is computed using the fast Fourier transform (FFT), which is a fast O(N log2 N) process. Briefly, the strategy is to perform an FFT on each of the two vectors, invert the sign of the imaginary part of the Fourier domain representation of one of the vectors (i.e. take its complex conjugate), multiply the two Fourier domain functions, and transform the result back using the inverse FFT. By simply finding the index corresponding to the maximum of f(n), one obtains the number of points by which vector g has to be shifted in order to get the highest cross-correlation with respect to h. This is not, of course, the first time that cross-correlation has been applied to the alignment of two (or more) vectors. Actually, it has been extensively used for alignment purposes in many different contexts, including:
Chromatography (Anal. Chem. 2005, 77, 5655-5661)
NMR (J. Magn. Reson. 2010, 202, 190-202)
DNA Sequence Alignment (J. Biomol. Tech. 2005, 16, 453–458)
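The FFT recipe described above translates almost word for word into NumPy. The following is a minimal sketch rather than the Mnova code: the function name and the test data are mine, and the correlation is circular (the vectors are treated as periodic), so real data would normally be zero-padded or restricted to small shifts.

```python
import numpy as np


def best_shift(g, h):
    """Circular shift (in points) to apply to g so that it best matches h."""
    g = np.asarray(g)
    h = np.asarray(h)
    # FFT both vectors, conjugate one of them, multiply, inverse FFT:
    # this yields f(n) = sum_m conj(g[m]) * h[(m + n) mod N].
    f = np.fft.ifft(np.conj(np.fft.fft(g)) * np.fft.fft(h))
    n = int(np.argmax(f.real))
    # Indices above N/2 correspond to negative shifts (wrap-around).
    if n > g.size // 2:
        n -= g.size
    return n


x = np.arange(256)
h = np.exp(-0.5 * ((x - 120) / 3.0) ** 2)   # reference segment
g = np.exp(-0.5 * ((x - 100) / 3.0) ** 2)   # same peak, 20 points away

shift = best_shift(g, h)
aligned = np.roll(g, shift)
print(shift)
```

For two segments of N points this costs a handful of FFTs instead of the N² multiplications of the direct sum, which is what makes the approach practical on full-resolution spectra.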
The first article was the one that inspired me to include this method in Mnova and, in fact, it was implemented several years ago as a method for the automatic alignment of 1D and 2D spectra (see this). Very recently, we have improved the traditional cross-correlation algorithm by working in the first derivative domain, calculated using an improved Savitzky-Golay routine in which the order of the smoothing polynomial is determined automatically. The idea is to minimize potential problems caused by baseline distortions or very broad peaks.
We have found this method to be very useful not only in the context of metabonomics, but also in the alignment of Reaction Monitoring data sets. However, I had better leave this topic for my next post …
We have seen that binning helps to minimize, for example, the effect of pH-induced fluctuations in chemical shift, thereby ensuring, in NMR-based metabonomics studies, that the signals of a given metabolite appear at the same location in all spectra. One evident disadvantage of binning is that it greatly reduces the spectral resolution (e.g. on a 500 MHz instrument, a typical 64K-point NMR spectrum with SW = 12 ppm would be reduced to 300 points (bins) if a bin width of 0.04 ppm [20 Hz = ~218 points] is used). This loss of resolution is not desirable and, considering that today’s powerful computers can handle large data matrices, there is now an increasing tendency to perform multivariate analysis at the maximum spectral resolution possible. Alternatives to binning typically involve some form of peak alignment procedure, and in this post I will cover the simplest one, global alignment. The purpose of this post is simply to illustrate the concept of alignment, but it is important to note that this method is not generally applicable to the misalignment problems found in metabonomics NMR data sets, although it might be useful in many other contexts.
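The numbers quoted in the binning example are easy to verify. The snippet below redoes the arithmetic and applies a plain equal-width binning with NumPy; the `bin_spectrum` helper is a hypothetical name of mine and the flat test spectrum is just a placeholder.

```python
import numpy as np

n_points = 64 * 1024        # 64K-point spectrum
sweep_width_ppm = 12.0
field_mhz = 500.0           # 1 ppm corresponds to 500 Hz
bin_width_ppm = 0.04

n_bins = round(sweep_width_ppm / bin_width_ppm)
bin_width_hz = bin_width_ppm * field_mhz
points_per_bin = n_points / n_bins


def bin_spectrum(intensities, n_bins):
    """Sum the intensities falling in each of n_bins (nearly) equal bins."""
    chunks = np.array_split(np.asarray(intensities), n_bins)
    return np.array([chunk.sum() for chunk in chunks])


spectrum = np.ones(n_points)              # placeholder data
binned = bin_spectrum(spectrum, n_bins)
print(n_bins, bin_width_hz, round(points_per_bin), binned.size)
```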
The idea of global alignment is very simple and corresponds to the well-known chemical shift referencing method in which the user sets the internal reference peak (e.g. TMS, DSS, TSP, etc.) of each spectrum to e.g. 0 ppm. In order to cope with small fluctuations in chemical shifts, this method searches for the highest peak within a narrow interval (user-defined, with an auto-tuning option in Mnova), as depicted in the figure below:
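In code, this kind of referencing boils down to locating the tallest point inside the search interval and shifting the whole ppm axis accordingly. The NumPy sketch below illustrates that idea only; the function name, its parameters and the synthetic spectrum are my own assumptions and do not reflect Mnova's interface.

```python
import numpy as np


def reference_to_peak(ppm, intensities, expected_ppm, search_width_ppm,
                      target_ppm=0.0):
    """Shift the ppm axis so that the tallest point found within the search
    window around expected_ppm ends up at target_ppm."""
    window = np.abs(ppm - expected_ppm) <= search_width_ppm / 2
    if not window.any():
        raise ValueError("search window contains no data points")
    candidates = np.where(window)[0]
    top = candidates[np.argmax(intensities[candidates])]
    return ppm - (ppm[top] - target_ppm)


ppm = np.linspace(10.0, -1.0, 1101)                    # 0.01 ppm per point
# Reference peak that has drifted to 0.03 ppm, plus a taller peak elsewhere.
spectrum = np.exp(-0.5 * ((ppm - 0.03) / 0.01) ** 2)
spectrum += 5.0 * np.exp(-0.5 * ((ppm - 3.25) / 0.01) ** 2)

new_ppm = reference_to_peak(ppm, spectrum, expected_ppm=0.0,
                            search_width_ppm=0.2)
print(round(float(new_ppm[np.argmax(spectrum)]), 2))
```

Because the search is restricted to the window, the taller peak at 3.25 ppm does not interfere; it is simply carried along with the rest of the axis.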
Clearly, this method will not work properly in those data sets with local misalignments (that is, when the signals of one metabolite fluctuate in one direction whilst the peaks of a different metabolite move differently). As an example, let’s consider again the simulated data set of Taurine used in my previous post, which I copy below for convenience:
Remember that this data set was generated by randomly changing the chemical shifts of the two CH2 groups. Now, let’s apply the global alignment procedure using the triplet at 3.25 ppm as the chemical shift reference, as shown in the picture below:
As expected, all peaks corresponding to the triplet at 3.25 ppm get perfectly aligned, but the other multiplet remains misaligned (see below).
One could devise an extension to this global alignment procedure in which the same procedure is applied to different segments of the spectrum. In this particular case, one could select two different windows, one for each triplet, and apply the same algorithm locally to each segment. However, having to manually select the chemical shift reference for each segment is not very practical and, in addition, relying only on a simple search for the maximum peak within each segment is not a very robust method for automatic alignment.
In my next post, I will present a much more powerful automatic alignment method in which the user will not need to define the reference chemical shift value for each segment / window, but before that, and as an introduction to that post, let me show you another global automatic alignment method.
Let’s assume that we have several spectra which we want to align automatically in such a way that we first manually reference the chemical shift of one of these spectra (e.g. the first one in the series) and then ask the software (e.g. Mnova) to automatically align all the other spectra using this one as a reference spectrum. The idea behind such an algorithm is to figure out the optimal amount by which a spectrum has to be shifted (left or right) so that the difference between this spectrum and the reference one is minimal.
This alternative ‘global method’ was implemented in Mnova several years ago and is based on the maximization of the cross-correlation between the reference spectrum and the spectrum/spectra to be aligned. This procedure is the essential foundation for the advanced alignment method which I will present in my next post.
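The link between "minimal difference" and "maximal cross-correlation" can be checked with a brute-force search. The sketch below tries every circular shift within a given range and compares the two criteria; it is a didactic toy with names and data of my own choosing, not the FFT-based routine used in practice.

```python
import numpy as np


def shift_minimising_difference(spectrum, reference, max_shift):
    """Circular shift giving the smallest sum of squared differences."""
    best, best_cost = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        cost = np.sum((np.roll(spectrum, s) - reference) ** 2)
        if cost < best_cost:
            best, best_cost = s, cost
    return best


def shift_maximising_correlation(spectrum, reference, max_shift):
    """Circular shift giving the largest cross-correlation (dot product)."""
    shifts = range(-max_shift, max_shift + 1)
    scores = [np.dot(np.roll(spectrum, s), reference) for s in shifts]
    return shifts[int(np.argmax(scores))]


x = np.arange(512)
reference = np.exp(-0.5 * ((x - 200) / 4.0) ** 2)
spectrum = np.exp(-0.5 * ((x - 188) / 4.0) ** 2)   # displaced by 12 points

s_diff = shift_minimising_difference(spectrum, reference, max_shift=30)
s_corr = shift_maximising_correlation(spectrum, reference, max_shift=30)
print(s_diff, s_corr)
```

Both criteria pick the same shift because, for circular shifts, the sum of squared differences equals a constant minus twice the cross-correlation, so minimizing one is the same as maximizing the other.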