Carbohydrates are one of the major constituents of living cells. They provide mechanical stability to the cell wall and play an important role in signal transduction, cell-cell recognition and the immunological properties of microorganisms. The importance of providing carbohydrate data to the scientific community for biomedical and immunological research can hardly be overestimated. However, in contrast to other disciplines studying the molecular basis of life, glycomics lacks the advantages that information technology brings. Universal integration standards and computer-assisted tools in glycomics are still in the making. Existing carbohydrate databases focus on particular properties, use proprietary formats and do not provide complete coverage, and most of them suffer from poor data quality.
Bacterial, Plant and Fungal Carbohydrate Structure Databases (CSDB) aim to close this gap through their curated content and cross-database integration, thus bringing glycomics to the same level of integrity as exists in genomics and proteomics. The databases have been continuously developed and updated since 2005. They now provide data on bacterial, plant and fungal carbohydrates with known primary structure. Currently CSDB is the only free database with primary data on carbohydrate structures from these taxonomic domains published up to 2015.
Two key features of this project are coverage and data consistency. The database contains structures of ~19K carbohydrates and glycoconjugates (including glycoproteins and glycolipids) associated with ~8K microorganisms in ~7K publications. Coverage is close to complete for prokaryotic structures reported up to 2015 and is about one third for published plant and fungal structures. The database grows by ~600 structures per year on average.
CSDB stores structural, taxonomical, bibliographical, assigned NMR spectroscopic and other data (elucidation methods, publication abstracts, conformational, biochemical and genetic data, etc.) on carbohydrates with published structures. The data came from import and manual re-annotation of other databases (incl. “CarbBank”), manual and semi-automated retrospective processing of publications, and user data submissions. All data were checked for consistency by experts in carbohydrate biochemistry prior to upload and corrected when necessary, which makes CSDB the only glycoinformatic project with fully moderated content. A comparison of the consistency of free carbohydrate databases showed outstanding data integrity in CSDB.
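The kinds of data listed above can be pictured as one record per published structure. The following is only a minimal sketch under stated assumptions: the class name, the field names, the placeholder structure strings and the shift value are all hypothetical and do not reflect the actual CSDB schema or its structure encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CarbohydrateRecord:
    """Hypothetical record layout; names are illustrative, not the CSDB schema."""
    structure: str                       # encoded primary structure (placeholder text here)
    taxon: str                           # source organism (taxonomical annotation)
    pubmed_id: Optional[int] = None      # bibliographic reference, if known
    methods: List[str] = field(default_factory=list)            # elucidation methods
    nmr_shifts: Dict[str, float] = field(default_factory=dict)  # atom label -> shift, ppm


def find_by_taxon(records: List[CarbohydrateRecord], name: str) -> List[CarbohydrateRecord]:
    """Return records whose taxon contains `name`, ignoring case."""
    needle = name.lower()
    return [r for r in records if needle in r.taxon.lower()]


# Made-up example data, for illustration only.
records = [
    CarbohydrateRecord("structure-A", "Escherichia coli", methods=["NMR"],
                       nmr_shifts={"C1": 100.0}),
    CarbohydrateRecord("structure-B", "Aspergillus fumigatus"),
]
hits = find_by_taxon(records, "escherichia")
```

Keeping the optional annotations (NMR assignments, methods, reference) as defaulted fields mirrors the fact that not every published structure comes with every kind of data.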
The CSDB interface includes a web-based user part, an administrator part and gateways for automated data interchange with other databases. Currently it is cross-linked with NCBI PubMed, NCBI Taxonomy and GlycomeDB. Users can search the database by fragments of structure, bibliography, taxonomical annotations, fragments of NMR spectra, composition data, trivial names, etc. Integration with a number of projects in glycomics has been achieved at the level of the programming interface and by bulk data export in Resource Description Framework. An unambiguous but nevertheless human-readable carbohydrate structure description language has been developed for this project, and translation tools to and from other known glycan representations are provided.
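As an illustration of what cross-linking a record to NCBI PubMed and NCBI Taxonomy can amount to, here is a minimal sketch that builds outbound links from identifiers. It assumes the public URL patterns of the NCBI web sites; the function names are hypothetical, and how CSDB itself constructs its links is not stated here.

```python
from urllib.parse import urlencode

# Assumed public URL patterns of the NCBI sites (not taken from CSDB itself).
PUBMED_BASE = "https://pubmed.ncbi.nlm.nih.gov/"
TAXONOMY_BASE = "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi"


def pubmed_url(pmid: int) -> str:
    """Link to a PubMed entry by its numeric PubMed ID."""
    return f"{PUBMED_BASE}{pmid}/"


def taxonomy_url(tax_id: int) -> str:
    """Link to an NCBI Taxonomy entry by its numeric taxon ID."""
    return f"{TAXONOMY_BASE}?{urlencode({'id': tax_id})}"


# 562 is the NCBI taxon ID of Escherichia coli; 12345 is an arbitrary PubMed ID.
ecoli_link = taxonomy_url(562)
paper_link = pubmed_url(12345)
```

Storing only the numeric identifiers and deriving the links on demand keeps records independent of any change in the external sites' URL layout.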
CSDB serves as a glycoinformatic platform and, besides the database itself, hosts a number of services.
The CSDB is freely available at http://csdb.glycoscience.ru.
CSDB was developed within the framework of the International Science and Technology Center Partner Project, supported by the Cooperative Threat Reduction Program of the US Department of Defense (ISTC Partner). Further funding came from the Russian Foundation for Basic Research and the grant committee of the President of the Russian Federation. My personal role in this project covered general research and development, database concept and architecture, data formats, the structure encoding language, programming of the engine and services, web design, cross-database interfaces, and coordination of the literature annotation and database filling processes.
CSDB project website
Merged CSDB poster, 2015 (18th European Carbohydrate Symposium) (JPG, 566Kb)
Bacterial, plant and fungal CSDB poster, 2014 (6th Baltic Meeting on Bacterial Carbohydrates) (JPG, 637Kb)
Bacterial CSDB poster, 2009 (4th Baltic Meeting on Bacterial Carbohydrates) (JPG, 876Kb)
Carbohydrate databases: problems and solutions (lecture)
K.S. Egorova, Ph.V. Toukach
"CSDB_GT : a new curated database on glycosyltransferases"
(Glycobiology, 2017, ePub ahead of print)
Ph.V. Toukach, K.S. Egorova
"Carbohydrate Structure Database merged from bacterial, archaeal, plant and fungal parts"
(Nucleic Acids Research Database Issue, 2016, v. 44(D1), pp. D1229-D1236)
K.S. Egorova, A.N. Kondakova, Ph.V. Toukach
"Carbohydrate Structure Database: tools for statistical analysis of bacterial, plant and fungal glycomes"
(Database, 2015, ID bav073)
Ph. Toukach, K. Egorova
"Bacterial, Plant, and Fungal Carbohydrate Structure Databases: daily usage"
(in "Glycoinformatics", eds: T. Lütteke, M. Frank, series: Methods in Molecular Biology, v. 1273, Springer New York, 2015, ch. 5, pp. 55-85, ISBN 978-1-4939-2342-7)
R. Ranzinger, K.F. Aoki-Kinoshita, M.P. Campbell, S. Kawano, T. Lütteke, S. Okuda, D. Shinmachi, T. Shikanai, H. Sawaki, Ph.V. Toukach, M. Matsubara, I. Yamada, H. Narimatsu
"GlycoRDF: An ontology to standardize Glycomics data in RDF"
(Bioinformatics, 2015, v. 31(6), pp. 919-925)
R.R. Kapaev, K.S. Egorova, Ph.V. Toukach
"Carbohydrate structure generalization scheme for database-driven simulation of experimental observables, such as NMR chemical shifts"
(Journal of Chemical Information and Modeling, 2014, v. 54, pp. 2594-2611)
R.R. Kapaev, Ph.V. Toukach
"Improved carbohydrate structure generalization scheme for 1H and 13C NMR simulations"
(Analytical Chemistry, 2015, v. 87, pp. 7006-7010)
Ph. Toukach, K. Egorova
"Bacterial, Plant, and Fungal Carbohydrate Structure Database (CSDB)"
(in "Glycoscience: Biology and Medicine", eds: T. Endo, P.H. Seeberger, G.W. Hart, C-H. Wong, N. Taniguchi, Springer Japan, 2014, ch. 29, pp. 241-250, ISBN 978-4-431-54840-9)
K.S. Egorova, Ph.V. Toukach
"Expansion of coverage of Carbohydrate Structure Database (CSDB)"
(Carbohydrate Research, 2014, v. 389, pp. 112-114)
K.F. Aoki-Kinoshita, J. Bolleman, M.P. Campbell, S. Kawano, J. Kim, T. Lütteke, M. Matsubara, S. Okuda, R. Ranzinger, H. Sawaki, T. Shikanai, D. Shinmachi, Y. Suzuki, Ph.V. Toukach, I. Yamada, N.H. Packer, H. Narimatsu
"Introducing glycomics data into the Semantic Web"
(Journal of Biomedical Semantics, 2013, v. 4, id. 39)
"CSDB and other carbohydrate databases"
(Glycoconjugate Journal, 2013, v. 30, pp. 347-349)
K.S. Egorova, Ph.V. Toukach
"Critical analysis of CCSD data quality"
(Journal of Chemical Information and Modeling, 2012, v. 52(11), pp. 2812-2814)
"Bacterial Carbohydrate Structure Database 3: Principles and Realization"
(Journal of Chemical Information and Modeling, 2011, v. 51(1), pp. 159-170)
S. Herget, Ph.V. Toukach, R. Ranzinger, W.E. Hull, Y. Knirel, C.-W. von der Lieth
"Statistical analysis of the Bacterial Carbohydrate Structure Data Base (BCSDB): Characteristics and diversity of bacterial carbohydrates in comparison with mammalian glycans"
(BMC Structural Biology, 2008, v. 8, id. 35)
Ph. Toukach, H. Joshi, R. Ranzinger, Yu. Knirel, C.-W. von der Lieth
"Sharing of worldwide distributed carbohydrate-related digital resources: online connection of the Bacterial Carbohydrate Structure DataBase and GLYCOSCIENCES.de"
(Nucleic Acids Research Database Issue, 2007, v. 35, pp. D280-D286)