Setting the scene
Over the last 25 years, during my bachelor's degree, PhD and postdoc, and now as director of R&D at Mestrelab, I have had the opportunity to interact with many organic chemists. Most of them, although each has their own peculiarities, follow fairly similar procedures and workflows, each with its strengths and weaknesses. I have witnessed many advances in the way they conduct their research, but I must also say that some areas remain firmly rooted in the past.
One example of the latter, which I still see in many labs, is data loss. In academia in particular, research teams are typically made up of predoctoral students and postdoctoral researchers whose residence time is, roughly speaking, between 3 and 8 years.
During that period, they produce an enormous amount of spectroscopic data (NMR, GC/LC/MS, UV/IR, etc.) to characterize their molecules. While some groups have sophisticated IT infrastructures equipped with either in-house or third-party databases (including Mnova DB for analytical data), I think it is not unreasonable to say that most save their spectroscopic data on personal computers (e.g. laptops) or in shared folders of the research group (e.g. Dropbox). The result is that data is lost as students leave.
If you're a principal investigator, I'm sure you've found yourself in the following situation: one of your students synthesized a compound some time ago, but for some reason you now suspect that the proposed structure may not be the right one. To review it, you obviously need access to the original spectroscopic data; unfortunately, the student is no longer part of your research group and you have no way of locating the NMR spectra.
Along the same lines, some students keep only the spectroscopic data of the products they have successfully synthesized and discard the data from reactions that did not work as planned.
These are just two examples of what I consider to be a more general problem associated with the difficulty of efficiently managing analytical information in an organic chemistry laboratory.
Nowadays, many labs are moving from paper-based to electronic laboratory notebooks (ELNs), which offer significant benefits for long-term storage. However, most ELNs lack the capability to understand and handle spectroscopic data in an integrated manner. Some are just repositories of PDFs of analytical data generated by specialized software. In my opinion, this is a very limited, unproductive and inefficient solution, to the point that data stored in this form has been dubbed “dead data”: all the valuable spectroscopic information has been stripped out, reducing it to an unstructured set of images and text strings. Stored this way, analytical data is virtually unusable, and tasks like the ones listed below are simply impossible to perform:
- Reprocessing: NMR data may have been processed incorrectly, and without the raw data it cannot be reprocessed, making a comprehensive analysis of the data unfeasible.
- Reanalysis: only some parts of the spectrum may have been reported, or the resolution may be too low to characterize a compound unambiguously. For instance, accurate determination of coupling constants or inspection of possible impurities or side products in a reaction would not be possible.
- Spectroscopic data search: Do I have any spectrum that contains a triplet at 3.5 ppm? This question cannot be answered with dead data.
- Similarity search: Do I have any spectrum similar to this one?
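The contrast with dead data is easy to see once spectra are kept as structured records rather than pictures. The sketch below is purely illustrative: it assumes a hypothetical data model (`Multiplet`, `Spectrum`) with made-up sample IDs and shift values, and it is not the Mbook or Mnova DB API. It answers the triplet-at-3.5-ppm question with a simple tolerance window and, as a deliberately crude stand-in for real spectral similarity, ranks spectra by the cosine similarity of binned multiplet positions.

```python
from dataclasses import dataclass
import math

# Hypothetical, minimal data model for a processed 1H NMR spectrum.
# Field names are illustrative assumptions, not the Mbook or Mnova DB schema.
@dataclass
class Multiplet:
    shift_ppm: float   # chemical shift of the multiplet centre
    multiplicity: str  # "s", "d", "t", "q", "m", ...

@dataclass
class Spectrum:
    sample_id: str
    multiplets: list[Multiplet]

def find_multiplet(spectra, multiplicity, shift_ppm, tol_ppm=0.1):
    """IDs of spectra with a multiplet of the given type within tol_ppm of shift_ppm."""
    return [
        s.sample_id
        for s in spectra
        if any(
            m.multiplicity == multiplicity and abs(m.shift_ppm - shift_ppm) <= tol_ppm
            for m in s.multiplets
        )
    ]

def binned_vector(spectrum, start=0.0, end=12.0, width=0.5):
    """Count multiplets per fixed-width ppm bin (a crude spectral fingerprint)."""
    n_bins = int(round((end - start) / width))
    vec = [0.0] * n_bins
    for m in spectrum.multiplets:
        i = int((m.shift_ppm - start) // width)
        if 0 <= i < n_bins:
            vec[i] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query, spectra):
    """Rank the other spectra by cosine similarity to the query, best first."""
    q = binned_vector(query)
    scored = [
        (cosine(q, binned_vector(s)), s.sample_id)
        for s in spectra
        if s.sample_id != query.sample_id
    ]
    return sorted(scored, reverse=True)

# Made-up example records (illustrative shifts, not reference data).
SPECTRA = [
    Spectrum("JD-041", [Multiplet(1.21, "t"), Multiplet(3.48, "q")]),
    Spectrum("JD-057", [Multiplet(1.03, "t"), Multiplet(1.89, "m"), Multiplet(3.41, "t")]),
    Spectrum("MP-112", [Multiplet(0.93, "t"), Multiplet(1.58, "m"), Multiplet(3.52, "t")]),
]

if __name__ == "__main__":
    # "Do I have any spectrum that contains a triplet at 3.5 ppm?"
    print(find_multiplet(SPECTRA, "t", 3.5))  # → ['JD-057', 'MP-112']
    # "Do I have any spectrum similar to this one?"
    print(most_similar(SPECTRA[1], SPECTRA))
```

In this toy data the quartet at 3.48 ppm in JD-041 is correctly ignored by the triplet query, yet JD-041 still ranks as most similar to JD-057 because the binned fingerprint only looks at peak positions; a real similarity search would also need to weigh multiplicities, intensities and coupling constants.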
Some ELNs also store raw data in addition to PDFs or plain images, but they do not offer real spectroscopic intelligence within a searchable, homogeneous environment.
Mbook 2.0: A spectroscopy-aware ELN
This file can be accessed and viewed directly from within Mbook with a new spectral viewer that provides basic navigation tools such as zooming in and out.
At present, Mbook 2.0 does not include spectral search capabilities, but we expect to offer this feature shortly, once the integration of Mbook with Mnova DB is completed.