1 Department of Bioinformatics, Faculty of Engineering, Soka University, Tokyo, Japan
2 Swiss Institute of Bioinformatics, Geneva, Switzerland
3 Biomolecular Frontiers Research Centre, Macquarie University, Sydney, Australia
4 Database Center for Life Science, Research Organization of Information and Systems, Tokyo, Japan
5 Institute of Veterinary Physiology and Biochemistry, Justus-Liebig University Giessen, Giessen, Germany
6 Laboratory of Glycoorganic Chemistry, The Noguchi Institute, Tokyo, Japan
7 Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, Shiga, Japan
8 Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
9 Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, USA
10 Research Center for Medical Glycoscience, National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
11 NMR Laboratory, N.D. Zelinsky Institute of Organic Chemistry, Moscow, Russia
KEYWORDS: BioHackathon, Carbohydrate, Data integration, Glycan, Glycoconjugate, SPARQL, RDF standard, carbohydrate structure database
Journal of Biomedical Semantics, 2013, v.4, id. 39
Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans), which can, for example, serve as "switches" that toggle between different functions of a glycoprotein or glycolipid. Owing to advances in the glycomics technologies used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no links to other life science databases.
To implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). All of the participants then implemented this prototype standard and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as "proofs of concept" to illustrate the utility of the Semantic Web for cross-database queries that were previously difficult to implement.
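As a rough illustration of what representing a glycan record in RDF can look like, the following minimal sketch turns one record into subject-predicate-object triples and serializes them as N-Triples lines. The namespaces, the predicate names (`hasSequence`, `fromDatabase`), and the sample record are hypothetical placeholders, not the vocabulary agreed on by the database developers.

```python
# Hypothetical namespaces; the agreed standard's actual vocabulary is not shown here.
EX = "http://example.org/glycan/"
VOCAB = "http://example.org/vocab#"


def literal(text):
    """Quote a string as an N-Triples literal, escaping the basic special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def glycan_triples(accession, sequence, source_db):
    """Return (subject, predicate, object) triples for one glycan record."""
    subject = f"<{EX}{accession}>"
    return [
        (subject, f"<{VOCAB}hasSequence>", literal(sequence)),
        (subject, f"<{VOCAB}fromDatabase>", literal(source_db)),
    ]


def to_ntriples(triples):
    """Serialize triples, one statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Illustrative record only; the accession and database name are made up.
print(to_ntriples(glycan_triples("G0001", "Gal(b1-4)GlcNAc", "ExampleDB")))
```

Once every database exposes its records as triples of this kind, they can be loaded into the same store and queried together.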
We successfully retrieved our target information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query. We also tested queries linking UniProt with GlycoEpitope, as well as lectin data with GlycomeDB through PDB. As a result, we were able to link proteomics data with glycomics data through Semantic Web technologies, allowing for more flexible queries across these domains.
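The idea of following links across data sets within one query can be sketched in plain Python as a chain of triple lookups over a combined store. This is only a conceptual stand-in for a SPARQL join; the three small data sets, their identifiers, and the predicates (`hasGlycan`, `sameAs`, `hasAnnotation`) are invented for illustration and do not reflect the actual contents of the databases named above.

```python
# All identifiers and predicates below are hypothetical placeholders.
DB_A = [("a:protein1", "hasGlycan", "a:glycan7")]          # protein -> attached glycan
DB_B = [("a:glycan7", "sameAs", "b:structure42")]          # cross-reference between data sets
DB_C = [("b:structure42", "hasAnnotation", "epitope X")]   # annotation on the structure


def objects(triples, subject, predicate):
    """Return every object of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def annotations_for_protein(protein, store):
    """Follow protein -> glycan -> equivalent structure -> annotation."""
    results = []
    for glycan in objects(store, protein, "hasGlycan"):
        for structure in objects(store, glycan, "sameAs"):
            results.extend(objects(store, structure, "hasAnnotation"))
    return results


# Combining the lists mimics loading all data sets into a single triple store.
store = DB_A + DB_B + DB_C
print(annotations_for_protein("a:protein1", store))
```

In a real triple store the same traversal would be written as a single SPARQL graph pattern with shared variables, and the query engine would perform the joins.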