N. D. Zelinsky Institute of Organic Chemistry, Russian Academy of Sciences, Moscow, Russia
KEYWORDS: CSDB, carbohydrate, database, bacterial, plant, fungal
Glycoconjugate Journal, 2015, v.32(5), p.241-242
The Carbohydrate Structure Database (CSDB) is designed to provide access to published data on glycans and glycoconjugates of prokaryotic, plant, fungal and protist origin. It stores primary glycan structures, detailed bibliography, taxonomy (down to the strain level), NMR spectral assignments, analytical methods, cross-references, and other information (medical, biosynthetic, etc.) when available. In 2015, two parallel databases, the Bacterial (BCSDB) and the Plant and Fungal (PFCSDB) databases, were merged into CSDB. The main features of CSDB are: 1) near-complete coverage of prokaryotic glycans; 2) high data quality achieved by automated and manual expert verification; 3) manually verified bibliographic, taxonomic, and NMR spectroscopic annotations; 4) automated data exchange with other databases using dedicated formats and the GlycoRDF ontology within the Resource Description Framework (RDF); 5) free access via the Internet (http://csdb.glycoscience.ru/).
As of 2015, CSDB contains ca. 10900 bacterial glycan structures from ca. 5900 organisms, published in ca. 4400 articles, and ca. 5300 plant and fungal glycan structures from ca. 1400 organisms, published in ca. 1800 articles. Coverage exceeds 80% for bacterial and archaeal glycans published to date, and ca. 1000 new records are added annually. For plants and fungi, coverage is ca. 40% and is expanding rapidly. Most data come from manual curation based on retrospective literature analysis. Many structures published before 1996 were imported from CarbBank, supplemented with missing data, and manually verified against the original publications, with errors corrected.
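The coverage figures can be turned into rough estimates of how many glycan structures have been published in total. The sketch below assumes that "coverage" means the ratio of structures stored in CSDB to all published structures of that group, which the abstract does not define explicitly; the helper name `implied_total` is hypothetical. Because bacterial coverage is stated as "above 80%", the bacterial result is an upper bound rather than a point estimate.

```python
def implied_total(stored: int, coverage: float) -> float:
    """Estimate all published structures from a stored count and a coverage fraction.

    Assumption (not stated in the abstract): coverage = stored / published.
    """
    return stored / coverage


# Approximate ("ca.") values quoted in the abstract.
bacterial_upper_bound = implied_total(10900, 0.80)  # coverage is ">80%", so this is a maximum
plant_fungal_estimate = implied_total(5300, 0.40)

print(round(bacterial_upper_bound), round(plant_fungal_estimate))
```

Under this assumption the estimates come out at no more than about 13,600 published bacterial and archaeal structures and about 13,250 plant and fungal structures; given the "ca." inputs, they indicate orders of magnitude only.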
Merging the two databases simplified cross-project interactions and improved coverage-dependent services built on the CSDB platform: empirical and database-driven ¹³C and ¹H NMR spectrum simulation, NMR-based structure ranking, taxon coverage and fragment distribution analysis tools, and clustering of taxa based on similarities in their glycans.
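To illustrate what clustering taxa by glycan similarity can involve, the sketch below represents each taxon as a set of structural fragment labels, scores pairs of taxa with the Jaccard index, and groups them by single linkage above a similarity threshold. This is a generic technique chosen for illustration, not the algorithm used by CSDB; the function names, taxa, fragment labels and threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Fraction of fragments shared by two taxa (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_taxa(profiles: dict[str, frozenset[str]], threshold: float) -> list[set[str]]:
    """Single-linkage grouping: two taxa share a cluster if a chain of pairs,
    each with Jaccard similarity >= threshold, connects them."""
    parent = {taxon: taxon for taxon in profiles}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the clusters of every sufficiently similar pair of taxa.
    for t1, t2 in combinations(profiles, 2):
        if jaccard(profiles[t1], profiles[t2]) >= threshold:
            parent[find(t1)] = find(t2)

    # Collect taxa by their cluster root.
    groups: dict[str, set[str]] = {}
    for taxon in profiles:
        groups.setdefault(find(taxon), set()).add(taxon)
    return list(groups.values())


# Hypothetical taxa described by made-up fragment labels.
profiles = {
    "Taxon A": frozenset({"L-Rha", "D-Glc", "Kdo"}),
    "Taxon B": frozenset({"L-Rha", "D-Glc"}),
    "Taxon C": frozenset({"D-ManNAc", "D-GlcNAc"}),
}
print(cluster_taxa(profiles, threshold=0.5))
```

Here Taxon A and Taxon B share two of three fragments (similarity ≈ 0.67) and are grouped, while Taxon C shares none and stays alone. Single linkage keeps the sketch short; a real service would likely weight fragments and use a proper hierarchical clustering with a dendrogram.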