A Grid System for the Ingestion of Biological Data into a Relational DBMS

ALOISIO, Giovanni; CAFARO, Massimo
2007-01-01

Abstract

This paper presents a data Grid system, built on top of specific biological data sources in flat-file format, that ingests these data into a relational DBMS and integrates them. The prototype has been implemented for the UniProtKB (hosted at EBI, the European Bioinformatics Institute, UK) and UTRdb (hosted at ITB/CNR Bari, Italy) data banks, for two reasons. First, no publicly available relational schema exists for UniProtKB or UTRdb. Second, UniProtKB is the most complete repository of proteins, whereas UTRdb contains mRNA nucleotide sequences; although the relation between nucleotides and proteins could be important for several studies, this relationship (a cross-reference link) is not yet managed explicitly. The system also supports transparent, periodic updates of both the DBMS and the data banks involved. Each component is a GSI (Grid Security Infrastructure) enabled Web Service built with the gSOAP Toolkit; the system uses several grid nodes to speed up data ingestion while reducing the redundancy of the data present in the flat files.
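The ingestion step described in the abstract can be illustrated with a minimal, single-node sketch. It is not the authors' implementation: it leaves out the Grid, GSI, and gSOAP layers entirely and uses SQLite in place of a full relational DBMS. The table names (`protein`, `xref`), the function names, and the sample records are assumptions invented for illustration; the records are synthetic and do not correspond to real UniProtKB entries. The only detail taken from the UniProtKB flat-file format is its general layout: a two-letter line code (`ID`, `AC`, `DR`), data starting at column 6, and `//` terminating each entry.

```python
import sqlite3

# Synthetic entries laid out like UniProtKB flat-file records (not real data).
SAMPLE = """\
ID   TEST1_HUMAN             Reviewed;         120 AA.
AC   P00001; Q00001;
DR   EMBL; X00001; CAA00001.1; -; mRNA.
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-120.
//
ID   TEST2_MOUSE             Reviewed;          95 AA.
AC   P00002;
DR   EMBL; X00002; CAA00002.1; -; mRNA.
//
"""


def new_entry():
    return {"id": None, "accessions": [], "xrefs": []}


def parse_entries(text):
    """Yield one dict per flat-file entry (entries end with a '//' line)."""
    entry = new_entry()
    for line in text.splitlines():
        if line.startswith("//"):
            # Skip incomplete entries that lack an ID or an accession.
            if entry["id"] and entry["accessions"]:
                yield entry
            entry = new_entry()
            continue
        code, value = line[:2], line[5:].strip()
        if code == "ID" and value:
            entry["id"] = value.split()[0]
        elif code == "AC":
            entry["accessions"] += [a.strip() for a in value.split(";") if a.strip()]
        elif code == "DR":
            fields = [f.strip() for f in value.rstrip(".").split(";")]
            if len(fields) >= 2:
                # Keep only the database name and its primary identifier.
                entry["xrefs"].append((fields[0], fields[1]))


def ingest(text, conn):
    """Load parsed entries into a hypothetical two-table schema; return the entry count."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS protein (
            entry_name TEXT PRIMARY KEY,
            primary_ac TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS xref (
            entry_name TEXT NOT NULL REFERENCES protein(entry_name),
            db         TEXT NOT NULL,
            ref_id     TEXT NOT NULL,
            UNIQUE (entry_name, db, ref_id)
        );
    """)
    count = 0
    for e in parse_entries(text):
        conn.execute(
            "INSERT OR REPLACE INTO protein VALUES (?, ?)",
            (e["id"], e["accessions"][0]),
        )
        conn.executemany(
            "INSERT OR IGNORE INTO xref VALUES (?, ?, ?)",
            [(e["id"], db, ref) for db, ref in e["xrefs"]],
        )
        count += 1
    conn.commit()
    return count


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    ingest(SAMPLE, connection)
    for row in connection.execute("SELECT * FROM xref ORDER BY entry_name, db"):
        print(row)
```

Storing each cross-reference as its own row, with a uniqueness constraint, means that re-running the ingestion on an updated flat file does not duplicate links. This is one simple way to approximate the periodic update and the reduced redundancy mentioned in the abstract; the actual system may handle both differently.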
ISBN: 9780769528472
Use this identifier to cite or link to this document: https://hdl.handle.net/11587/301397