The CSD Python API – Helping the world’s structural chemists innovate

Stewart Adcock | Saturday 15:00 | Room B

The Cambridge Structural Database (CSD) is the world’s repository for small-molecule organic and metal-organic crystal structures, a valuable resource for structural chemistry research and education. A widely used suite of software tools enables the exploration and application of knowledge derived from these crystal structures.

This software consists of more than 4 million lines of C++, plus substantial amounts of code in other languages, developed over 25 years. Sophisticated GUIs allow scientists to perform very specific tasks effectively, but limit innovation beyond these pre-defined tasks. Furthermore, the learning curve for this technology is significant. As a consequence, valuable data, and the scientific tools built on that data, are inaccessible to many of the scientists who would benefit most.

To provide more flexible access to the CSD and the C++ toolkit upon which the associated software tools are built, we developed the CSD Python API. It offers elegant, simple-to-use Python modules for programmatic access to both the CSD data and the scientific functionality. Python is widely adopted in the scientific community and, despite its powerful features, is reasonably easy for non-programmers to learn and use. It also facilitates integration with other widely used scientific tools.

This case study will explain how we built the CSD Python API and share some of the lessons we learnt along the way. It will describe how the CSD Python API is revolutionising state-of-the-art structural chemistry research, and how it has opened scientific and development opportunities that were previously unavailable. Come to this presentation to learn how a large and daunting code-base can be tamed by layering on a friendly API.
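
As a rough illustration of what "layering on a friendly API" can mean in general, here is a minimal sketch. It is not the CSD Python API and does not use any of its modules: the class names (`_LegacyEngine`, `StructureReader`, `Entry`), the handle-based calls, and the two demo records are all hypothetical, made up for this example. The assumption is that a low-level toolkit exposes handle-based calls, and a thin Python layer wraps them in iterable, context-managed objects.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


class _LegacyEngine:
    """Hypothetical stand-in for low-level bindings to a large compiled toolkit."""

    def __init__(self) -> None:
        # Made-up demo records: identifier -> (formula, molecular weight).
        self._records = {
            "DEMO01": ("C6 H6", 78.11),
            "DEMO02": ("C2 H6 O1", 46.07),
        }

    def open_db(self) -> int:
        return 1  # opaque handle, as a handle-based library might return

    def close_db(self, handle: int) -> None:
        pass  # nothing to release in this toy engine

    def count(self, handle: int) -> int:
        return len(self._records)

    def id_at(self, handle: int, index: int) -> str:
        return sorted(self._records)[index]

    def fetch(self, handle: int, identifier: str) -> tuple:
        return self._records[identifier]


@dataclass(frozen=True)
class Entry:
    """Plain, immutable value object handed to the user."""

    identifier: str
    formula: str
    molecular_weight: float


class StructureReader:
    """Hypothetical friendly layer: hides handles behind Python protocols."""

    def __init__(self, engine: Optional[_LegacyEngine] = None) -> None:
        self._engine = engine or _LegacyEngine()
        self._handle = self._engine.open_db()

    def __enter__(self) -> "StructureReader":
        return self

    def __exit__(self, *exc_info) -> None:
        self._engine.close_db(self._handle)

    def __len__(self) -> int:
        return self._engine.count(self._handle)

    def entry(self, identifier: str) -> Entry:
        formula, weight = self._engine.fetch(self._handle, identifier)
        return Entry(identifier, formula, weight)

    def __iter__(self) -> Iterator[Entry]:
        for index in range(len(self)):
            yield self.entry(self._engine.id_at(self._handle, index))


if __name__ == "__main__":
    with StructureReader() as reader:
        for item in reader:
            print(item.identifier, item.formula, item.molecular_weight)
```

In this sketch the user never sees a handle: opening, closing, counting and iteration map onto `with`, `len()` and `for`, which is one common way to make a large code-base approachable for non-programmers.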