How nference leverages Ignite for distributed analytics in the bioinformatics domain


At nference, we have many use cases that involve heavy real-time computation on large amounts of multi-omics data (genomics, proteomics, etc.). A single user request might require statistical computation over hundreds of gigabytes of numerical data. One example is finding similar proteins, which translates to computing the cosine similarity of one vector against 10 million other vectors. We needed a horizontally scalable framework that let us define different statistical analyses and execute them on terabytes of numerical data in real time, without moving the data. We built a lightweight framework on top of Ignite that lets developers quickly create their own functions (e.g., cosine similarity) to run on existing datasets, and also upload their own datasets. To do this, we leveraged Ignite colocated processing and Ignite thick clients. Internally, we built the framework to satisfy the use cases of one team, and we plan to migrate real-time distributed compute use cases from other teams to it.
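To make the "without moving the data" idea concrete, here is a minimal sketch in plain Python. It is not the nference framework and it does not call any Ignite API; the partition layout, the function names (`local_top_k`, `distributed_top_k`), and the toy protein vectors are all assumptions made up for illustration. Each partition stands in for the data held by one cluster node: the scoring function runs against the partition where the vectors live, and only a small top-k result travels back to the caller.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def local_top_k(partition, query, k):
    """Hypothetical node-side step: score every local vector, keep the best k."""
    scored = ((cosine_similarity(query, vec), key) for key, vec in partition.items())
    return heapq.nlargest(k, scored)


def distributed_top_k(partitions, query, k):
    """Hypothetical caller-side step: merge the small per-partition results."""
    partial = [hit for part in partitions for hit in local_top_k(part, query, k)]
    return heapq.nlargest(k, partial)


# Toy data (assumed): two partitions standing in for two cluster nodes.
partitions = [
    {"protein_a": [1.0, 0.0], "protein_b": [0.0, 1.0]},
    {"protein_c": [1.0, 1.0], "protein_d": [-1.0, 0.0]},
]
result = distributed_top_k(partitions, [1.0, 0.0], k=2)
```

In this sketch only the query vector and at most `k` (score, key) pairs per partition cross the boundary between caller and partition, which is the property that matters when each partition holds gigabytes of vectors.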

Rohit Jain, Engineering Lead, Molecular Analytics

My educational background is in NLP and engineering, and I have published three papers in top NLP conferences. I spent around 2.5 years working in a cutting-edge R&D lab in India before joining the company as one of its early employees.

Apache Ignite® and associated open source project names are trademarks of the Apache Software Foundation.

The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event.

Copyright © GridGain 2021

Community partner: Apache Software Foundation

Organized by: GridGain