Publicly accessible results

This page offers links to publicly accessible outcomes of eTRANSAFE.

The source code and related information for many of the project assets have been placed in public repositories, where they are available for use by the wider scientific community.

Tool	Description
Flame	Flame is a flexible framework supporting predictive modeling and similarity search within the eTRANSAFE project. Flame makes it possible to (1) easily develop machine-learning models, for example QSAR-like models, starting from annotated collections of chemical compounds stored in standard formats (i.e. SDFiles), and (2) transfer new models into a production environment where they can be used by web services to predict the properties of new compounds. Flame can be used in command mode or through a web-based GUI. The final version of Flame is v1.2.2. Source code, Docker image, Windows installer, and training materials are available on GitHub, where Flame will remain updated and accessible: Flame v1.2.2; Training material. Related publication: PMID: 33875019
Models	As part of the modelling activities carried out within eTRANSAFE, a collection of 19 endocrine disruptor models has been made available to the scientific community as an example of model sharing. These models can be found on the Harvard Dataverse.
Code related to clinical data work	An important objective of eTRANSAFE was to merge publicly available sources of drug safety and pharmacology data with proprietary preclinical and clinical data donated by pharmaceutical organisations. The code written in the project to make use of public clinical data is available on GitHub: code used for matching drugs with adverse events from clinicaltrials.gov; code used for cleaning the FAERS data; code used for loading PubMed / Medline data. Code for parsing the DailyMed product labels is available from EMC upon request.
Rosetta Stone	The eTRANSAFE Rosetta Stone is a Spring Boot application that exposes API endpoints for translation between clinical and preclinical terminologies, normalization and lookup of terms, and hierarchical expansion of concepts. The manual mappings and all the code required to run the eTRANSAFE Rosetta Stone are available on GitHub: Mappings; Rosetta Stone code. A running version of the Rosetta Stone is also available on the public internet: Interface; API documentation. The OntoBrowser code as modified for eTRANSAFE is also available on GitHub.
PretoxTM (Preclinical Toxicology Text Mining)	The main objective of PretoxTM (Preclinical Toxicology Text Mining) is to retrieve treatment-related findings from toxicology reports using Natural Language Processing (NLP) techniques. The extracted findings are then presented in a well-defined user interface for validation by toxicology experts. PretoxTM uses Transformers, the state of the art in NLP, to detect relevant toxicological sentences and to recognise the relevant entities associated with a finding (named entity recognition, NER). In summary, PretoxTM can identify, capture and standardize findings related to drug treatment (i.e., safety findings) by mining legacy preclinical toxicology reports. Further information about PretoxTM, along with training materials, is available in the PretoxTM documentation. The PretoxTM Corpus is available on Zenodo.
Chemistry Service	The Chemistry Service is a Django-based web service wrapper for delivering compound identification functionalities. It is based on the ChEMBL database and the chembl_structure_pipeline. The Chemistry Service offers the following functionalities: compound name to compound structure conversion; compound structure to compound name conversion; compound structure standardisation; compound structure checking; parent compound retrieval. Source code of the Chemistry Service and the ChEMBL Structure Pipeline is available on GitHub. The ChEMBL database is available on the EMBL-EBI website.
Dataset of protein and genetic biomarkers	A dataset of protein and genetic biomarkers evaluated in clinical trials has been created as part of eTRANSAFE. It was built with the Clinical Biomarker App, a dedicated application developed in the project that uses a natural language processing approach to identify, extract, and classify proteomic and genomic biomarkers used in clinical trials. A database containing this dataset, complemented with information from the scientific literature, has been made available at: https://www.disgenet.org/biomarkers/ Related publication: PMID: 36968019