FAIR Research - An interview with DPhil student Dominique Batista

Data Researcher at work

Image by ARMMYPICCA, Adobe Stock

We talked to Dominique Batista about his DPhil at the Oxford e-Research Centre, which focuses on the principles of FAIR research.

 

Why FAIR? What was previously 'unFAIR'?

The FAIR Principles initiative has only been around since 2016 and has already changed the way we do science, particularly in Europe where it has received support from the European Commission and the European Open Science Cloud.

In the past, data users had to navigate forests of available resources that were often discouraging, unappealing or known only to those within a certain discipline. Nothing was properly formalised, and the many initiatives trying to make sense of things led to much disagreement. Data was being unnecessarily reinvented or poorly reused.

“When enough data has been FAIRified, the way we do analysis will completely switch from a world of unstructured scientific data to a world where data is annotated, connected and properly structured. It will be easier and faster to train machine learning models, and machines will be able to exchange data without human intervention. I want to find the forest behind that tree.”

How does your DPhil research relate to the FAIR principles?

My DPhil will focus on the FAIRification process (the set of methods and tools that are used to make data FAIR). At the moment the verification of data is a very time-consuming and complex process; a great deal of human intervention and expertise is necessary. There are many challenges to overcome.

“I want to give back the science time to the scientist so they can focus on what matters.”

I first want to reduce the friction caused by the FAIRification process by automating as much as possible and providing a general working framework that can be reused by specific communities. This will support my second objective, which is to greatly increase the reusability of research data and uncover the full hidden value of what research can provide.

I’ll be investigating things such as domain-specific knowledge about standards and ontologies, and technical expertise in data management, web semantics and data formats. There are huge challenges with the principles themselves because they lack formalisation. For instance: what does it mean to have rich metadata annotation? Or how does one evaluate the persistence, uniqueness and stability of an identifier?

What was it that attracted you to the FAIR principles?

In 2015 I got my first software engineering job at the French Institute of Bioinformatics (IFB). I was responsible for handling metadata and training materials for more than 30 French bioinformatics institutions, and essentially I wanted to make my life easier. As IFB was part of the ELIXIR European project, I got involved with the Bioschemas Initiative, where I met Susanna Sansone and Philippe Rocca-Serra, who are my internal DPhil supervisors. The more I got involved with the FAIR principles, the more I realised their potential as an area for study.

“It’s been proven many times data is the new gold and I am convinced the value we currently extract is only the tip of the iceberg.”

How is your research helping the wider community?

Part of my DPhil will be community-driven. I will be working with the data producers of the Precision Toxicology Consortium, in particular Birmingham University, and with other data-producing partners, in order to understand their needs. I will use the Consortium data as my benchmark and proof of concept, while also reviewing the literature on existing tools and initiatives. I don’t want to reinvent the wheel if something exists that I can reuse.

It would be a risk, however, to focus only on the Life Sciences. I’ll also be looking outside the Life Sciences community for external partners. For instance, Wes Armour has suggested that I look at the SKA projects, so that I can extend the framework I build with the Consortium data through plugins and extensions that other communities can use. I would like to provide something general enough to give the whole scientific community common ground.