Development of a Semantic database for the MEDIRAD Image and Radiation Dose BioBank (IRDBB)

Members

Bernard Gibaud - Leader

Former members

Marine Brenet - Software Engineer

General purpose

This work addresses the development of a computer system called the Image and Radiation Dose BioBank (IRDBB), designed to manage image and dosimetric data in an integrated way. This development exemplifies the strategy promoted by our team for implementing imaging biobanks in the future. The strategy emphasizes adherence to the FAIR principles, i.e. making sure that shared data is Findable, Accessible, Interoperable and Reusable. It recommends using semantic web technologies (ontologies) to ensure the precise definition of shared information, together with standards such as the DICOM Standard.

Our contribution to the IRDBB system is a semantic database implemented as a Resource Description Framework (RDF) graph aligned with an application ontology called OntoMEDIRAD, which specifies the semantics of all information within this database.

Although the IRDBB system was designed to fulfil the needs of the researchers involved in the MEDIRAD project, both the IRDBB system and the OntoMEDIRAD ontology were developed with extensibility and reusability in mind, so that they can serve similar projects. The choice of an ontology-based approach ultimately aims to facilitate access to MEDIRAD research data for a wide community of researchers interested in low dose research, e.g. via federated systems.

Overall IRDBB architecture

The overall architecture of the IRDBB system is shown in Fig. 1.

The major components are:

a component called IRDBB_UI, which is a web server managing the user interface
a component called KHEOPS, developed in Geneva by Osman Ratib's group at ITMI, managing the DICOM data (based on the DCM4CHE software)
a component called FHIR repository, developed at the b<>com Technology Institute, managing all non-DICOM files
a component called Semantic Translator, providing a set of services to populate and query the semantic database
a Stardog triple store, supporting the semantic database
a component called Sparklis Portal, extending the IRDBB_UI to assist users in building SPARQL queries
a component called Keycloak, also provided by ITMI and b<>com, providing a Single Sign-On mechanism for access control.

OntoMEDIRAD ontology

This ontology was developed iteratively between 2017 and 2020. It aims to address the needs expressed by the MEDIRAD participants in their answers to the questionnaire sent in October 2017 (User needs and competency questions concerning the IRDBB repository). The ontology is organized as a set of files represented in OWL, the Web Ontology Language (Fig. 2).

The ontology was designed as an application ontology gathering all entities and relationships involved. The general modelling approach that we adopted was a realist one, i.e. it tries to refer to entities that exist in reality. We reused existing ontological resources as much as possible. We therefore adopted a modular organization, in which the root application ontology (called OntoMEDIRAD) imports several extracts of existing ontologies. These extracts, e.g. from the Foundational Model of Anatomy, the Units Ontology (UO) and the Phenotype and Trait Ontology (PATO), were generated using the OntoFox tool [11], based on the MIREOT model. The overall integration of these disparate ingredients relied on the common philosophical ground provided by the Basic Formal Ontology (BFO version 2).

Fig. 3 shows an extract of the OntoMEDIRAD ontology.

The OntoMEDIRAD ontology can be freely downloaded and reused: OntoMEDIRAD Files. The paper to cite is the AMIA 2020 paper (see below).

Semantic database

The Semantic database is an RDF graph containing RDF assertions that document the nature and provenance of all the data shared in the MEDIRAD IRDBB repository, i.e. DICOM or non-DICOM data. The DICOM data concern:

image data such as CT images, that correspond directly to irradiation events
image data such as SPECT or PET images, that may either correspond to irradiation events (due to the injection of a radiopharmaceutical) with some diagnostic goal, or correspond to a strategy to estimate the biodistribution of the radiopharmaceutical used in an internal radiotherapy (e.g. iodine-131 treatment of thyroid cancer)
structured reports such as CT Radiation Dose Structured Reports
other structured reports, e.g. those implementing electronic case report forms (e-CRFs) that document the internal radiotherapy treatments in WP3

The non-DICOM data concern e.g.:

voxelized dose maps calculated by Monte Carlo simulation
segmentation of organs and tissues of interest.

The Semantic database is populated by the Semantic Translator, when the data files are imported into the IRDBB system. It is supported by the Stardog Triple store.
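
As a rough illustration of what "RDF assertions describing the data" can look like, the following minimal Python sketch turns a few DICOM-like metadata values into N-Triples lines. It is not the IRDBB implementation: the `http://example.org/...` namespaces, the class names, the `has_...` property names and the function `dataset_to_ntriples` are all hypothetical, since the actual OntoMEDIRAD IRIs and the mapping rules are not given on this page.

```python
# Hypothetical namespaces: the real OntoMEDIRAD and IRDBB IRIs are not stated here.
ONTO = "http://example.org/ontomedirad#"
DATA = "http://example.org/irdbb/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed modality-to-class mapping, for illustration only.
MODALITY_CLASSES = {"CT": "CT_image_dataset", "PT": "PET_image_dataset"}


def iri(value):
    """Wrap an IRI in angle brackets, as required by N-Triples."""
    return f"<{value}>"


def literal(value):
    """Serialize a value as a plain N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def dataset_to_ntriples(meta):
    """Turn a small dict of DICOM-like metadata into N-Triples lines."""
    subject = iri(DATA + meta["SOPInstanceUID"])
    class_name = MODALITY_CLASSES.get(meta.get("Modality"), "image_dataset")
    lines = [f"{subject} {iri(RDF_TYPE)} {iri(ONTO + class_name)} ."]
    # Keep only the identifying attributes that are present in the input.
    for key in ("StudyInstanceUID", "SeriesInstanceUID", "StudyDate"):
        if key in meta:
            predicate = iri(ONTO + "has_" + key)
            lines.append(f"{subject} {predicate} {literal(meta[key])} .")
    return lines


if __name__ == "__main__":
    example = {"SOPInstanceUID": "1.2.3", "Modality": "CT", "StudyDate": "20200101"}
    print("\n".join(dataset_to_ntriples(example)))
```

In a real deployment the resulting triples would be loaded into the triple store, and the subject and predicate IRIs would come from the ontology rather than from string concatenation.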

The Semantic database can be queried through the IRDBB_UI web interface. Two approaches are offered:

use of predefined SPARQL queries
use of the Sparklis tool, which allows end users to navigate freely in the RDF graph and build their own SPARQL queries. This tool was provided by Sébastien Ferré (University of Rennes 1).
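
To make the idea of a predefined SPARQL query more concrete, here is a minimal Python sketch of a parameterized query template. It is only an assumption about how such queries could be stored and filled in: the query name `datasets_by_class`, the `om:` prefix with its `http://example.org/ontomedirad#` namespace, and the helper `build_query` are hypothetical and do not come from the IRDBB software.

```python
import re
from string import Template

# Hypothetical catalogue of predefined queries; prefix and terms are illustrative.
PREDEFINED_QUERIES = {
    "datasets_by_class": Template(
        "PREFIX om: <http://example.org/ontomedirad#>\n"
        "SELECT ?dataset WHERE {\n"
        "  ?dataset a om:$dataset_class .\n"
        "}\n"
        "LIMIT $limit"
    ),
}

# Only simple local names are accepted, so user input cannot alter the query shape.
_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_query(name, dataset_class, limit=100):
    """Fill a predefined query template with validated parameters."""
    if not _LOCAL_NAME.fullmatch(dataset_class):
        raise ValueError(f"invalid class name: {dataset_class!r}")
    template = PREDEFINED_QUERIES[name]
    return template.substitute(dataset_class=dataset_class, limit=int(limit))


if __name__ == "__main__":
    print(build_query("datasets_by_class", "CT_image_dataset", limit=5))
```

Keeping the query text fixed and validating the few variable parts is one way to let non-expert users run SPARQL safely; free-form exploration is left to a tool such as Sparklis.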

Semantic Translator

This software was designed by Marine Brenet. It is implemented as a set of services, called by IRDBB_UI or KHEOPS. The main services concern the creation of the RDF assertions describing the nature and provenance of the DICOM and non-DICOM data, and the management of predefined SPARQL queries. For DICOM data, the Semantic Translator translates the key DICOM metadata into RDF. For non-DICOM data, the relevant provenance metadata are provided in an XML file (non-DICOM File set descriptor) that is part of the non-DICOM File set to be imported. More detailed documentation of the Semantic Translator is available in xx. The domain covered by the predefined SPARQL queries is described in yy.
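
The role of the non-DICOM File set descriptor can be sketched as follows. This Python example reads provenance metadata from a small XML document using the standard library. The schema is entirely hypothetical: the element and attribute names (`NonDicomFileSet`, `File`, `DerivedFrom`, `Software`), the sample values and the function `read_descriptor` are invented for illustration, because the actual descriptor format is not described on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor; the real IRDBB descriptor schema is not shown here.
DESCRIPTOR = """\
<NonDicomFileSet>
  <File name="dose_map.nii" type="voxelized_dose_map">
    <DerivedFrom>1.2.3.4</DerivedFrom>
    <Software>hypothetical-mc-engine</Software>
  </File>
  <File name="liver_seg.nii" type="segmentation">
    <DerivedFrom>1.2.3.4</DerivedFrom>
  </File>
</NonDicomFileSet>
"""


def read_descriptor(xml_text):
    """Extract one provenance record per file listed in the descriptor."""
    root = ET.fromstring(xml_text)
    records = []
    for file_elem in root.findall("File"):
        records.append(
            {
                "name": file_elem.get("name"),
                "type": file_elem.get("type"),
                # UID of the DICOM object the file was computed from (assumed field).
                "derived_from": file_elem.findtext("DerivedFrom"),
                # Optional element: None when the descriptor omits it.
                "software": file_elem.findtext("Software"),
            }
        )
    return records


if __name__ == "__main__":
    for record in read_descriptor(DESCRIPTOR):
        print(record)
```

Each record could then be translated into RDF assertions in the same way as DICOM metadata, which is what lets DICOM and non-DICOM data be queried through a single graph.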

Collaborations

Direct collaborators in the development of IRDBB

Guillaume Pasquier, b<>com, Rennes
Joël Spaltenstein, Institute of Translational Molecular Imaging (ITMI), Genève
Nicolas Van Dooren, ITMI, Genève
Osman Ratib, ITMI, Genève

Other collaborators in the development of IRDBB

John Stratakis, University of Crete
John Damilakis, University of Crete
Manuel Bardiès, CRCT Inserm, Toulouse
Alex Vergara Gil, CRCT Inserm, Toulouse

Former collaborators in the development of IRDBB

Cédric Moubri-Tournes, b<>com, Rennes
Eric Guiffard, b<>com, Rennes

Technical reports

Documentation of the ontology of the IRDBB semantic repository, Version 1.3, MEDIRAD Technical report, MS8 Milestone, 2020.
Documentation of the Semantic Translator software, Version 1, MEDIRAD Technical report, 2020.
Documentation of predefined SPARQL queries, Version 1, MEDIRAD Technical report, 2020.

Publications

Spaltenstein J, Roduit N, van Dooren N, Pasquier G, Brenet M, Gibaud B, Pasquier G, Mildenberger P and Ratib O. A multicentric IT platform for storage and sharing of imaging-based radiation dosimetric data. CARS 2020 Munich (Germany).
Gibaud B, Brenet M, Pasquier G, Vergara Gil A, Bardiès M, Stratakis J, Damilakis J, van Dooren N, Spaltenstein J, and Ratib O. A semantic database for integrated management of image and dosimetric data in low radiation dose research in medical imaging. American Medical Informatics Association (AMIA) Conference, November 2020, Chicago (USA).

Funding

This work was supported by the European Commission as part of the MEDIRAD project (Number 755523) in the Horizon 2020 Program (EURATOM NFRP-2016-2017).
