Carbohydrates are among the major constituents of living cells. They provide mechanical stability to the cell wall and play an important role in signal transduction, cell-cell recognition and the immunological properties of microorganisms. The importance of providing carbohydrate data to the scientific community for biomedical and immunological research can hardly be overestimated. However, unlike other disciplines that study the molecular basis of life, glycomics still lacks the benefits of a mature information-technology infrastructure. Universal integration standards and computer-assisted tools for glycomics are still in the making. Existing carbohydrate databases focus on particular properties, use proprietary formats and do not provide complete coverage, and most of them suffer from poor data quality.

The Bacterial, Plant and Fungal Carbohydrate Structure Databases (CSDB) aim to close this gap through curated content and cross-database integration, bringing glycomics to the level of integrity that already exists in genomics and proteomics. The project has been continuously developed and updated since 2005. It now provides data on bacterial, plant and fungal carbohydrates of known primary structure. It is currently the only free database that offers primary data on carbohydrate structures from these taxonomic domains published up to 2015.

The two key features of this project are coverage and data consistency. The database contains structures of ~19K carbohydrates and glycoconjugates (including glycoproteins and glycolipids) associated with ~8K microorganisms and described in ~7K publications. Coverage is nearly complete for prokaryotic structures reported up to 2015 and extends to about one third of published plant and fungal structures. On average, the database grows by ~600 structures per year.

CSDB stores structural, taxonomic, bibliographic and assigned NMR spectroscopic data on carbohydrates with published structures, along with other data such as elucidation methods, publication abstracts, and conformational, biochemical and genetic data. The data were obtained in three ways: by importing and manually re-annotating other databases (including CarbBank), by manual and semi-automated retrospective processing of publications, and from user submissions. Experts in carbohydrate biochemistry checked all data for consistency before upload and corrected them where necessary. This makes CSDB the only glycoinformatics project with fully moderated content. A comparison of the consistency of free carbohydrate databases showed outstanding data integrity in CSDB.

The CSDB interface includes a web-based user part, an administrator part and gateways for automated data interchange with other databases. CSDB is currently cross-linked with NCBI PubMed, NCBI Taxonomy and GlycomeDB. Users can search the database by structure fragments, bibliography, taxonomic annotations, NMR spectrum fragments, composition data, trivial names and other criteria. Integration with a number of other glycomics projects has been achieved through a programming interface and through bulk data export in Resource Description Framework (RDF) format. An unambiguous yet human-readable carbohydrate structure description language was developed for this project, and tools are provided to translate it to and from other known glycan representations.

CSDB serves as a glycoinformatics platform. In addition to the database itself, it hosts a number of services, such as:

The CSDB is freely available at

CSDB was developed within the framework of the International Science and Technology Center Partner Project (ISTC Partner), supported by the Cooperative Threat Reduction Program of the US Department of Defense. Further funding came from the Russian Foundation for Basic Research and the grant committee of the President of the Russian Federation. My personal role in this project covered:
- general research and development
- database concept and architecture
- data formats
- the structure encoding language
- programming of the engine and services
- web design
- cross-database interfaces
- coordination of literature annotation and database population

Selected publications:

