Ignite Summit schedule
What We Learned Supporting 1M RPM with 10 MS Latency Using Apache Ignite at iFood
iFood is the largest food tech company in Latin America, with more than 300k stores in more than 1,700 cities in Brazil and Colombia. An end user logs into the app, selects a restaurant, chooses the food they want delivered, and presses send. The order request is then delivered to the restaurant, which prepares the order so it can eventually be delivered to the end user. Most of these requests are made asynchronously via events, and my team's responsibility is to deliver these events to the restaurants. Each restaurant session should receive each event at least once; currently, this is achieved with an acknowledgment request for each event for each session. The events and acknowledgments are processed by SQS consumers, and the queues are populated by SNS topics. Because of that, we can attach new queues to the topics to run new systems alongside existing ones and compare them.
The backend for this started as a polling system over a Postgres database. This database held up well until we reached around 10 million monthly orders for 50,000 stores, but it started getting expensive. It was also a single point of failure, which we wanted to avoid. With this in mind, four years ago our team built another polling solution, connection-order-events, using Apache Ignite. It is currently our primary polling system, while the old one has been kept as a fallback.
The platform now processes more than 65 million orders per month, which translates to more than 500M events delivered to external agents (user devices and integrations). The connection-order-events service is responsible for indexing order events and providing them to all stores through a public API via polling. Over the years, we have had some difficulties with this Apache Ignite solution, including issues with cache rebalancing, running on Kubernetes, node discovery, and scaling, among others.
In these four years, we learned how to optimize our use of Apache Ignite and make the platform more stable, which today allows us to support an average throughput of 1M RPM with p99 latency under 10 ms.
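The at-least-once rule in the abstract (a session keeps receiving an event on every poll until it acknowledges it) can be illustrated with a small sketch. This is only an in-memory illustration under stated assumptions: the class `PollingEventStore`, its method names, and the sample identifiers are hypothetical, and it does not use Apache Ignite, SQS, or SNS, nor does it reflect how connection-order-events is actually implemented.

```python
from collections import defaultdict


class PollingEventStore:
    """Hypothetical in-memory stand-in for an event index (not Apache Ignite)."""

    def __init__(self):
        # store_id -> list of (event_id, payload) in arrival order
        self._events = defaultdict(list)
        # (session_id, event_id) pairs that a session has acknowledged
        self._acked = set()

    def publish(self, store_id, event_id, payload):
        """Index a new order event for a store."""
        self._events[store_id].append((event_id, payload))

    def poll(self, store_id, session_id):
        """Return every event of the store that this session has not acknowledged."""
        return [
            (event_id, payload)
            for event_id, payload in self._events[store_id]
            if (session_id, event_id) not in self._acked
        ]

    def acknowledge(self, session_id, event_id):
        """Record that a session has received an event; it stops appearing in polls."""
        self._acked.add((session_id, event_id))


# Usage with made-up identifiers
store = PollingEventStore()
store.publish("store-1", "evt-1", {"order": 42, "status": "PLACED"})

first = store.poll("store-1", "session-a")
second = store.poll("store-1", "session-a")   # not acknowledged yet, so returned again
store.acknowledge("session-a", "evt-1")
after_ack = store.poll("store-1", "session-a")
other = store.poll("store-1", "session-b")    # acknowledgments are tracked per session
```

Tracking acknowledgments per (session, event) pair rather than per event is what lets two sessions of the same store each receive the event independently.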
Henrique Arroyo
Henrique graduated in computer science from the University of São Paulo in 2014. He has spent most of his career developing Java web applications and backend systems. After working with several companies and research institutes, he joined iFood in 2019, in an environment with a wide range of technologies, including Apache Ignite. He has since been working to adapt and improve the solution.