More than 30 million articles from approximately 7,000 journals are available as published literature across multiple databases, but they remain largely isolated from one another. This makes it difficult to study a system in an integrated manner and derive actionable insights. The issue can be overcome by collating and integrating the available scientific data across databases and scientific literature into a holistic map, represented in the form of a comprehensive atlas. We engage students, teachers, and researchers from the life sciences to curate and create an integrated human map by assimilating all macro- and micro-level information available in the public domain. The atlas will look at every major cell in the body to understand how molecular signaling pathways and sub-cellular functions result in the extraordinary ability of humans to function. To build the atlas progressively, we are currently focusing on generating a skin atlas as phase I of the project.
Recognizing the potential of this project, the Department of Biotechnology (DBT), India, and Persistent Systems have funded MANAV: The Human Atlas Initiative for a period of 3 years. The Indian Institute of Science Education and Research (IISER), National Centre for Cell Science (NCCS), and Persistent Systems are collaborating to create an open source annotation platform that will help students, teachers, and researchers collect, curate, and visualize data from research articles seamlessly.
Persistent Systems is the technology partner in this visionary national project and is involved in designing and developing the annotation platform as well as multiple methodologies for different modules required for data collection, curation, annotation, visualization, and management.
The key objectives of this project, which the platform will facilitate, are as follows:
A hybrid annotation approach to build the curated knowledge base
In MANAV, scientific articles are screened by a machine-learning natural language processing (NLP) system and transformed into bio-semantic triplets (two entities and the relationship between them). The data is then manually reviewed by expert curators. The output of this pipeline is a Knowledge Graph (KG) that represents the information in the annotated articles as a network. We are also developing a query engine to perform keyword- and phrase-based searches. Additionally, MANAV has a persona-specific dashboard for data visualization and exploration, which helps users understand the data flow and gain more insight into the data through the KG network. ArangoDB is used to store the curated KG in MANAV, and Python and R are used for backend development.
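To make the triplet and network idea concrete, here is a minimal Python sketch. It is not MANAV's implementation: the names `Triplet`, `build_graph`, and `search` are hypothetical, the example statements are illustrative rather than taken from the MANAV knowledge base, and an in-memory dictionary stands in for the ArangoDB store. It only shows how (entity, relationship, entity) triplets can be assembled into a graph and filtered by keyword.

```python
from collections import defaultdict
from typing import NamedTuple


class Triplet(NamedTuple):
    """Hypothetical bio-semantic triplet: two entities and their relationship."""
    subject: str
    relation: str
    obj: str


def build_graph(triplets):
    """Group triplets into an adjacency map: subject -> [(relation, object), ...]."""
    graph = defaultdict(list)
    for t in triplets:
        graph[t.subject].append((t.relation, t.obj))
    return dict(graph)


def search(triplets, keyword):
    """Return triplets in which any field contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [t for t in triplets if any(kw in field.lower() for field in t)]


# Illustrative triplets only; not data from the MANAV knowledge base.
triplets = [
    Triplet("TGF-beta", "activates", "SMAD3"),
    Triplet("SMAD3", "regulates", "collagen synthesis"),
    Triplet("keratinocyte", "expresses", "keratin 14"),
]

graph = build_graph(triplets)
hits = search(triplets, "smad3")

for hit in hits:
    print(f"{hit.subject} --{hit.relation}--> {hit.obj}")
```

In this sketch each subject entity becomes a node whose outgoing edges carry the relationship label, which is the general shape a graph database would also store.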
Crowdsourcing and upskilling of students
Another major objective of the MANAV project is to upskill students in the curation and annotation of scientific literature. Through a systematic process, research articles are identified and curated in the form of a KG (termed the gold standard KG). A computational methodology has been designed to compare student-curated articles with the gold standard KG. Here, Persistent, jointly with NCCS and IISER Pune, has developed multi-feature supervised classification methods that successfully categorize the articles into five major subject groups and three article complexity levels, so that each article can be mapped to a curator based on the curator's proficiency.
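One simple way to picture the comparison between a student's curation and the gold standard KG is set overlap on triplets. The Python sketch below is an assumption for illustration, not the methodology MANAV uses: the function names `normalize` and `score_against_gold` are hypothetical, the triplets are made up, and exact matching after lowercasing is a deliberately crude stand-in for whatever matching the project actually applies.

```python
def normalize(triplet):
    """Lowercase and trim each part so trivial formatting differences still match."""
    return tuple(part.strip().lower() for part in triplet)


def score_against_gold(student, gold):
    """Compare student triplets with gold standard triplets by exact set overlap.

    Assumption: a student triplet counts as correct only if all three parts
    match a gold triplet after normalization.
    """
    student_set = {normalize(t) for t in student}
    gold_set = {normalize(t) for t in gold}
    matched = student_set & gold_set

    precision = len(matched) / len(student_set) if student_set else 0.0
    recall = len(matched) / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative data only.
gold = [
    ("TGF-beta", "activates", "SMAD3"),
    ("SMAD3", "regulates", "collagen synthesis"),
    ("keratinocyte", "expresses", "keratin 14"),
]
student = [
    ("tgf-beta ", "activates", "SMAD3"),           # matches after normalization
    ("SMAD3", "inhibits", "collagen synthesis"),   # wrong relationship
]

scores = score_against_gold(student, gold)
print(scores)
```

Precision here reflects how much of the student's output is correct, and recall reflects how much of the gold standard the student recovered; a score of this kind could feed into matching articles of a given complexity to curators of a given proficiency.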
The knowledge collected using the MANAV annotation platform will be freely available to the scientific community across the globe, enabling a community to be built around this project.
Want to contribute?
If you are a Life Sciences student or researcher and would like to contribute to MANAV, please visit https://manav.gov.in/ to register your interest.
We are also actively looking for academic institutions and research organizations to contribute to MANAV data curation. To learn more about The Human Atlas Initiative, please contact email@example.com