Prof Susanna-Assunta Sansone
PhD

Associate Director, Associate Professor - FAIR Data Science

susanna-assunta.sansone[at]oerc.ox.ac.uk
http://uk.linkedin.com/in/sasansone - twitter: @SusannaASansone

I am an Associate Director, Associate Professor and Principal Investigator at the Oxford e-Research Centre, part of the Department of Engineering Science of the University of Oxford. I am also one of the founding members of the Digital Research Cluster at Wolfson College, Oxford, a Consultant for Springer Nature, and the Founding Honorary Academic Editor of the Scientific Data journal.

I hold a PhD in Molecular Biology from Imperial College of Science, Technology and Medicine, London, UK. After a few years working on vaccine genetics in an Imperial spin-off (now known as Emergent BioSolutions, Inc.), I moved to the European Bioinformatics Institute (EBI, Cambridge), where I worked for nine years as a Project and Team Coordinator and Principal Investigator, before moving to the Oxford e-Research Centre in 2010.

PUBLICATIONS

All my publications are here and/or on my Google Scholar profile

INTERESTS AND ACTIVITIES

My interests and activities are in the areas of knowledge and information management, and interoperability of applications in the life, environmental and biomedical sciences, impacting on the reproducibility of research outputs and the evolution of scholarly publishing, which drive science and discoveries. Specifically, I strive to make digital research objects, including data, Findable, Accessible, Interoperable and Reusable (FAIR).

For the last 16 years, my group and I have collaborated with researchers in academic, governmental and commercial settings, informatics professionals, service providers, learned societies, pre-competitive informatics initiatives, library science experts, journal editors, funders and policymakers worldwide, to:

  • influence data policies by leading and promoting the collaborative creation and uptake of open, community-driven representation standards, e.g. ontology, semantic web methodologies and principles;

  • enable science by developing and implementing methods and standards-driven open source resources to improve the collection, curation, representation and publication of multi-dimensional data; and

  • prepare a new generation of scientists by creating and delivering educational lectures, training and teaching material, to address the glaring lack of courses in these specialized subjects, and by mentoring, tutoring and supervising high-achieving undergraduate and graduate students.

The following sections define key accomplishments, up to January 2017.

APPLIED RESEARCH

  • Delivered over 110 publications, of which 86 are peer-reviewed papers published in 29 different journals in the life, environmental, biomedical and computational areas; achieved an h-index of 45 and an i10-index of 85, with an average of almost 1,000 citations per year for the last few years - see my profile on Google Scholar;

  • Written 2 expert pieces for the UK Wellcome Trust ("Interoperability Standards") and the US National Institutes of Health (NIH) Big Data to Knowledge Initiative (BD2K) ("Making Data Broadly Usable"), to inform their policies and framework to support open data and interoperability standards;

  • Held a total of 22 grants, of which 15 were in Oxford (14 as the Oxford PI) worth £4.4 million, funded by the BBSRC, MRC and NERC, the European Commission (EC), pharmaceutical companies and the NIH BD2K;

  • Co-established the ELIXIR UK Node, a virtual entity - supported by the UK Research Councils and the Wellcome Trust - representing the UK bioinformatics capacity and key research infrastructure resources; the UK Node is part of ELIXIR, a European inter-governmental organization of 15 member nodes, including EMBL-EBI, in 14 countries;

  • Launched and maintained two long-standing, community-driven data resources of international standing (selected in 2016 as formal resources of the ELIXIR UK Node and the wider ELIXIR community), which have a growing user base of researchers, service providers, librarians, trainers, publishers and funding agencies: ISA and BioSharing are described in the sections below.

TEACHING AND TRAINING

  • Developed the curriculum and continuously improved the syllabus design for the data management component of “Data Management, Analysis and Statistics”, a 2-week foundation module (the first to include data management as a mandatory subject) of the Oxford BBSRC Interdisciplinary Bioscience Doctoral Training Programme and the EPSRC and BBSRC Synthetic Biology Centre for Doctoral Training;

  • Delivered over 140 invited or selected lectures at events worldwide, including educational seminars such as those for the NIH BD2K "Fundamental for Data Science" series ("Metadata Standards") and the Stanford Centre for Biomedical Informatics; my talks (most of which have been available on my slideshare site since 2010) have been viewed over 31,000 times;

  • Co-supervised a high-achieving Oxford DPhil graduate, also a BBSRC, EPSRC, ESRC and JISC funded Software Sustainability Institute Fellow, and five MSc students for their final-year projects;

  • Co-founded the nascent Data Carpentry for Life Science initiative under the ELIXIR UK Node in an international context, designed to tune and extend generic concepts and tools to the areas of data curation, representation and interoperability standards.

UNIVERSITY, POLICIES AND PUBLIC

  • Advised, chaired or co-chaired almost 20 boards of not-for-profit and commercial organizations, University boards and open working groups focused on research data management; served as a reviewer for nine journals and as an expert for BBSRC and NERC panels;

  • Co-established the Springer Nature data journal Scientific Data, which uses the ISA and BioSharing resources; since its launch in 2014, its content has been indexed in PubMed and Medline and its standing in the community continues to grow;

  • Contributed as an editor to Oxford University Press GigaScience, the other data journal supportive of ISA and BioSharing, which obtained its first impact factor (IF) of 7.463 in 2016;

  • Contributed to the design, delivery and publication of the community-developed FAIR principles, defining the characteristics that contemporary data resources, tools and infrastructures should exhibit to be Findable, Accessible, Interoperable, and Reusable by third parties. Published by 53 international leaders, the article includes exemplar implementations like ISA and BioSharing; the principles have rapidly been adopted by publishers (e.g. PLOS, Elsevier), funders (e.g. the “UK Concordat on Open Research Data” signed by Jo Johnson, UK MP and Minister of State for Universities and Science, NIH BD2K, EC Horizon 2020 future plans), and infrastructure programmes and societies, such as ELIXIR, EBI databases, the European Open Science Cloud initiative, and the Research Data Alliance (RDA).

KEY UNIVERSITY AND COMMUNITY ROLES

  • Research File Service Board - Chair

  • IT Architecture Advisory Group - Member

  • Research Data Management Support - Member

  • RDA; Technical Advisory Board Member

  • Dryad; Board of Directors Vice-Chair

  • ELIXIR-UK Node; Executive Board Member

  • Elsevier Research Data Management Board; Member

  • Force11 Community; Advisory Board Member

  • Springer Nature Scientific Data; Founding Honorary Academic Editor

  • OUP GigaScience, BioMedCentral SIGS and JBMS; Editorial Board Member

MY CURRENT GROUP MEMBERS

  1. Philippe Rocca-Serra, Senior Research Lecturer

  2. Alejandra Gonzalez-Beltran, Research Lecturer

  3. Milo Thurston, Research Software Engineer

  4. Peter McQuilton, Knowledge Engineer

  5. Allyson Lister, Knowledge Engineer

  6. David Johnson, Research Software Engineer

  7. Massimiliano Izzo, Research Software Engineer

  8. Eamonn Maguire, Contractor (former DPhil student)

  9. Delphine Dauga, Contractor Biocurator

  10. Melanie Adekale, Contractor Biocurator

INFRASTRUCTURE FOR RESEARCH

ISA infrastructure and ISA Commons

Embedded in several funded projects

An open source software suite underpinned by a community-driven representation format, to facilitate standards-compliant collection, curation, sharing and publication of experiments in the life, natural and biomedical sciences. Running since 2007, ISA has a user base ranging from hundreds to thousands of users from diverse domains: it is currently embedded in almost 30 public resources (institute-based, project/consortium-based or global repositories, including some based at EBI, in the USA, Japan, China and Australia), it supports two data-driven journals, and it complements 9 internal data platforms (including at the FDA National Center for Toxicological Research and Janssen R&D). The extension of the ISA metadata representation format for nanotechnology applications became a formal ASTM standard in 2013.

Resource approved by the ELIXIR UK Node, and an ELIXIR Service, part of the ELIXIR interoperability platform.

FAIRsharing - Standards, Databases and Policies

Embedded in several funded projects

A curated, informative and educational resource on inter-related data standards, databases, and policies in the life, environmental and biomedical sciences, working with and for researchers, standard/database developers, funders, journal editors, librarians and data managers looking to make informed decisions. Launched in 2011 as the BioSharing portal, FAIRsharing has had almost 60,000 users (as of Jan 2017), and it is endorsed by a community of 68 organizations, including publishers (embedded in the data policies of 600 Springer Nature journals, also PLOS, EMBO Press, BMJ, F1000Research, BioMedCentral, Oxford University Press, Wellcome Trust Open Research), standardization groups, and research data management support initiatives and libraries (such as those at JISC and at Stanford, Cambridge and Oxford Universities).

Resource approved by the ELIXIR UK Node, and an ELIXIR Service, part of the ELIXIR interoperability platform.

StatO and OBI - Ontologies for Statistics Results and BioMedical Investigation

Embedded in several funded projects

The Ontology for Biomedical Investigations (OBI) project is an international, collaborative effort to build an integrated ontology for the description of biological and clinical investigations.

Digital platforms for scholarly publishing - Consultancies

Partly embedded in funded projects

Collaborations with scientific, technical and medical publishers, including Springer Nature Scientific Data, which uses the ISAexplorer to enable discovery of its curated content, and OUP GigaScience, with which we work to explore novel ways to track and publish scholarly outputs using ISA and other representation models.

FUNDED PROJECTS

NIH BD2K CEDAR - Centre for Expanded Data Annotation and Retrieval

Funds and duration: NIH, 2014-2018

CEDAR works to facilitate the use of metadata in the analysis of Big Data sets, contributing to the implementation of the NIH Big Data to Knowledge (BD2K) initiative's vision. We work with colleagues at Stanford and Yale Universities to create a unified framework that researchers can use to create consistent, easily searchable, standards-compliant metadata. As a partner in the centre, I also sit on the Steering Committee, bringing in ISA, FAIRsharing and our ontology activities.

NIH BD2K BioCADDIE - Biomedical and healthCAre Data Discovery and Indexing Ecosystem

Funds and duration: NIH, 2014-2018

BioCADDIE engages a broad community of stakeholders to create the NIH Big Data to Knowledge (BD2K) Data Discovery Index (DDI). The DDI will do for data what PubMed (and PubMed Central) did for the literature. I sit on its Executive and Steering Committee and lead several working groups, bringing in our FAIRsharing activities on standards and metadata.

ELIXIR's UK Node and ELIXIR EXCELERATE

Funds and duration: BBSRC, MRC, NERC, 2014-2017 (phase 1); EC, 2015-2018

The UK Node contributes the country’s substantial bioinformatics expertise for researchers, computer scientists and data managers in the Life, Natural and Medical Sciences. We lead on the standards and curation areas. The UK Node is also funded as part of the larger ELIXIR EXCELERATE grant, set to better integrate activities across all nodes.

IMI eTRIKS - European Translational Information and Knowledge Management Services

Funds and duration: Roche, 2014-2017

eTRIKS develops the knowledge management platform and services to support data-intensive translational research for the Innovative Medicines Initiative (IMI), Europe’s largest public-private initiative. Funded by Roche, we bring to this project ISA, FAIRsharing and our expertise on community standards.

BioSchemas

Funds and duration: ELIXIR and EC, 2017-2018

An extension of the schema.org vocabulary, which is used by major search engines and web companies such as Google, Yahoo, Yandex, Microsoft and Pinterest, and whose aim is to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. Under the Bioschemas umbrella, we work on specifications to improve the description of generic types in the life sciences. We lead on the Dataset and Standards workstreams.

IMI IMPRiND - Inhibiting Misfolded PRotein propagation in Neurodegenerative Diseases

Funds and duration: EC and ESFRI, 2017-2021

An international consortium that aims to map and target critical steps in the propagation of misfolded tau and α-synuclein, considered the main culprits of neurodegeneration in Alzheimer's and Parkinson's disease respectively. We lead on the data representation and publication activities, bringing in elements of ISA and FAIRsharing, and our expertise on community standards.

COPO - Collaboratively Open Plant Omics

Funds and duration: BBSRC, 2015-2018

A collaboration with the Earlham Institute, Warwick and EMBL-EBI, COPO develops a framework to utilise existing services to facilitate the description, deposition and publication of datasets, and also to enable the identification and citation of datasets, thereby increasing transparency and reproducibility.

UK-China collaboration on omics data publication and curation

Funds and duration: BBSRC, 2012-2015 (phase 1), 2015-2018 (phase 2)

Collaboration with GigaScience, a joint BioMedCentral and BGI data journal with an associated database, to define common curation practices for omics-based datasets.

PhenoMeNal: Infrastructure for phenome and metabolome analysis

Funds and duration: EC H2020, 2015-2018

A collaboration with a variety of other European partners to develop a data processing and analysis infrastructure (and related services) for molecular phenotype data generated by metabolomics applications, set to improve the understanding of the causes and mechanisms underlying health, healthy ageing and diseases.

MultiMot: Infrastructure for cell migration data

Funds and duration: EC H2020, 2015-2018

A coordinated action with a variety of European partners, linked to international efforts to set and promote community standards and infrastructure to report and share cell migration data.

Metagenomics Data Infrastructure

Funds and duration: BBSRC, 2012-2015 (Completed)

Coordinated by EMBL-EBI, the Metagenomics service is being developed to be an automated pipeline for the curation, archiving and analysis of metagenomic data.

COSMOS - COordination Of Standards In MetabOlomicS

Funds and duration: EC FP7, 2012-2015 (Completed)

A collaboration with EBI and a variety of other European partners, COSMOS (Coordination of Standards in Metabolomics) has brought together European metabolomics data providers to set and promote community standards.