I would like to build a mobile app that can scale to around 1M users over 1 year. We are currently testing with 100 users without any real load issues. We use the MERN stack with React Native Expo, and Google Cloud Services for GCB. We also use Google Cloud Run. We use a microservices architecture that we manage ourselves but thought of using Kafka. However, I need advice on optimising the app in terms of:
- load balancing,
- caching,
- database optimisation,
- autoscaling,
- load testing, and
- continuous optimisation frameworks
Any help would be appreciated! Thanks:)
Honestly, it sounds like you are prematurely optimizing.
> I would like to build a mobile app that can scale to around 1M users over 1 year. We are currently testing with 100 users without any real load issues.
So would we all. I am building and running a mobile/web app that I intend to grow exponentially, too... but growth doesn't work like that. You're off by many orders of magnitude from 1MM DAU, and any effort you're spending on these questions is far better spent on business development, fundraising, user acquisition... you get the idea.
This isn't to be overcritical or flippant, but to help steer you away from spending engineering dollars or sweat equity (or both) on things you don't need. Granted, I have built my application in a way that I think will scale; however, I consciously steer away from solutions that are needlessly over-engineered. I would suggest keeping your options open by building your systems in such a way that you can easily grow and adapt. For instance, consider how your monolith (if you have one) can be separated out into different services that can scale independently. Consider also that there is a trend in cloud computing away from microservices madness back toward the monolith (the pendulum swings...), so don't get caught up in the hype.
Some of your current stack, e.g. Cloud Run, is also on my stack. I run an application that many run only with attached volume storage (Drupal, don't hate me for being a PHP guy) on Cloud Run with GCS as a persistence layer, along with Cloud SQL. Works great. When I need to scale... I will. I also run an instance of that application on GKE for easier shell access, cron jobs, etc.
TL;DR: Build your tech rapidly and avoid painting yourself into a corner... but realize that you're far more likely to have difficulties building a successful business than you are scaling when the traffic gets crazy.
To scale your mobile app from 100 users to potentially 1 million users over a year, you'll need to focus on several key areas. I'll provide advice on each of the points you've mentioned:
Load Balancing:
- Cloud Run already distributes requests across the instances of a service, so a separate load balancer is mainly useful for routing across regions or across services.
- For that, consider Google Cloud Load Balancing, which integrates well with your existing GCP setup.
- Use health checks to ensure traffic is only routed to healthy instances.
Caching:
- Implement a distributed caching system like Redis or Memcached.
- Use caching at multiple levels: application-level caching, database query caching, and HTTP caching.
- Consider using a Content Delivery Network (CDN) for static assets.
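To make the application-level caching point concrete, here is a minimal cache-aside sketch in Python. The `TTLCache` class is an in-memory stand-in for Redis or Memcached, and `get_user_profile` and `load_from_db` are hypothetical names of my own, not part of any library. In a real service the `get`/`set` calls would go to your cache server.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis/Memcached: key -> (value, expiry time)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (value, self.clock() + self.ttl)


def get_user_profile(user_id: int, cache: TTLCache, load_from_db: Callable[[int], Any]) -> Any:
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    profile = load_from_db(user_id)  # hypothetical database lookup
    cache.set(key, profile)
    return profile
```

The TTL bounds how stale a cached profile can get. Pick it per data type rather than globally.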
Database Optimization:
- Ensure proper indexing on frequently queried fields.
- Consider using read replicas for scaling read operations.
- Implement database sharding if you expect very high write loads.
- Use database connection pooling to manage connections efficiently.
Autoscaling:
- Utilize Cloud Run's built-in autoscaling capabilities.
- Tune the settings that drive scaling: maximum concurrent requests per instance, and minimum and maximum instance counts.
- Consider implementing predictive autoscaling based on historical usage patterns.
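For a rough feel of what the autoscaler has to provide, a back-of-envelope estimate helps. The sketch below applies Little's law (requests in flight = arrival rate × latency) and divides by the per-instance concurrency limit. The function name and the traffic figures are illustrative assumptions, not measurements from your app.

```python
import math


def estimate_instances(requests_per_second: float, avg_latency_ms: float,
                       concurrency_per_instance: int) -> int:
    """Rough instance count: requests in flight divided by per-instance concurrency."""
    in_flight = requests_per_second * avg_latency_ms / 1000  # Little's law
    return max(1, math.ceil(in_flight / concurrency_per_instance))


# Assumed figures: 2000 req/s at 200 ms average latency, 80 concurrent requests per instance.
print(estimate_instances(2000, 200, 80))
```

This ignores cold starts and uneven traffic, so treat the result as a floor when choosing a maximum instance count.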
Load Testing:
- Use tools like Apache JMeter, Gatling, or Locust for load testing.
- Simulate realistic user behavior in your tests.
- Gradually increase load to identify bottlenecks.
- Test different components of your system separately and then together.
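The "gradually increase load" idea can be sketched with nothing but the Python standard library. This is a toy, not a replacement for JMeter, Gatling, or Locust. `fake_endpoint` is a hypothetical stand-in for a real HTTP call to your API, and the step sizes are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def timed_call(target):
    """Run one request and return its latency in seconds."""
    start = time.perf_counter()
    target()
    return time.perf_counter() - start


def run_step(target, concurrent_users, requests_per_user):
    """Fire requests with at most `concurrent_users` in flight and report p95 latency."""
    total = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        latencies = list(pool.map(lambda _: timed_call(target), range(total)))
    p95 = quantiles(latencies, n=20, method="inclusive")[18]  # 19th of 19 cut points
    return {"users": concurrent_users, "requests": total, "p95_seconds": p95}


def ramp(target, user_steps, requests_per_user=5):
    """Run the steps in order so the first step where latency climbs stands out."""
    return [run_step(target, users, requests_per_user) for users in user_steps]


def fake_endpoint():
    time.sleep(0.001)  # hypothetical stand-in for a real request


for row in ramp(fake_endpoint, [1, 2, 4, 8]):
    print(row)
```

Watching p95 rather than the average makes it easier to see the step at which a bottleneck first appears.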
Continuous Optimization Frameworks:
- Implement application performance monitoring (APM) tools like New Relic or Datadog.
- Use Google Cloud Monitoring for infrastructure and application metrics.
- Set up automated alerts for performance degradation.
- Regularly review and optimize based on collected metrics.
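An automated alert for performance degradation usually boils down to a rule over a sliding window of recent samples. The sketch below shows that logic in plain Python. The `LatencyAlert` class, the window size, and the 300 ms threshold are assumptions for illustration. In practice you would express the same rule in your monitoring tool rather than in application code.

```python
from collections import deque
from statistics import quantiles


class LatencyAlert:
    """Fire when the p95 latency over the most recent samples exceeds a threshold."""

    def __init__(self, window_size: int, p95_threshold_ms: float):
        self.samples = deque(maxlen=window_size)  # old samples fall off automatically
        self.threshold = p95_threshold_ms

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def p95(self):
        if len(self.samples) < 2:
            return None  # not enough data to compute a percentile
        return quantiles(self.samples, n=20, method="inclusive")[18]

    def should_alert(self) -> bool:
        value = self.p95()
        return value is not None and value > self.threshold


alert = LatencyAlert(window_size=100, p95_threshold_ms=300)
for latency in [120, 140, 110, 950, 980, 990]:
    alert.record(latency)
print(alert.should_alert())
```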
Additional recommendations:
- API Gateway: Consider implementing an API Gateway to manage and optimize API requests.
- Microservices: Since you're already using a microservices architecture, ensure each service can scale independently.
- Kafka: If you decide to use Kafka, it can help with event-driven architectures and managing high throughput of events/messages.
- Database Choice: Evaluate if your current database choice (likely MongoDB) is the best fit for your scaling needs. Consider using a combination of databases for different use cases (polyglot persistence).
- Asynchronous Processing: Implement message queues for handling time-consuming tasks asynchronously.
- Code Optimization: Regularly profile your code to identify and fix performance bottlenecks.
- Mobile App Optimization: Optimize the React Native Expo app for performance, considering things like lazy loading, efficient state management, and minimizing network requests.
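The asynchronous processing item above can be sketched with Python's standard library. Here a `queue.Queue` stands in for a real broker such as RabbitMQ or Kafka, and `send_welcome_email` is a hypothetical slow task. The point is only the shape: the request handler enqueues work and returns, and a worker drains the queue in the background.

```python
import queue
import threading


def start_worker(tasks: queue.Queue, results: list) -> threading.Thread:
    """Start a background thread that runs (handler, payload) jobs until it sees None."""

    def loop():
        while True:
            job = tasks.get()
            if job is None:  # sentinel: shut the worker down
                tasks.task_done()
                break
            handler, payload = job
            try:
                results.append(("ok", handler(payload)))
            except Exception as exc:  # a failing job must not kill the worker
                results.append(("error", repr(exc)))
            finally:
                tasks.task_done()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def send_welcome_email(address: str) -> str:
    return f"sent to {address}"  # hypothetical stand-in for a slow task


tasks: queue.Queue = queue.Queue()
results: list = []
worker = start_worker(tasks, results)
tasks.put((send_welcome_email, "user@example.com"))  # the API handler would return right after this
tasks.put(None)
tasks.join()
print(results)
```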
Also, if you want to try out open-source options, here are my suggestions for each point:
Load Balancing:
- Open-source alternative: HAProxy or NGINX
- These can be deployed on your own infrastructure or cloud VMs
- Configure with round-robin, least connections, or IP hash algorithms
- Implement sticky sessions if needed for stateful applications
- Use health checks to route traffic only to healthy instances
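To show how the three balancing algorithms named above differ, here is a small Python sketch of each. These classes are my own illustration, not HAProxy or NGINX internals, and the backend names are placeholders.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Send each request to the backend with the fewest open connections."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def pick(self, client_ip=None):
        backend = min(self.active, key=self.active.get)  # ties go to the first listed
        self.active[backend] += 1
        return backend

    def release(self, backend):
        self.active[backend] -= 1  # call when the request finishes


class IPHash:
    """Map a client IP to the same backend every time."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]


balancer = IPHash(["app-1", "app-2", "app-3"])
print(balancer.pick("203.0.113.7") == balancer.pick("203.0.113.7"))
```

IP hash gives a crude form of sticky sessions for free, but the mapping shifts for most clients whenever the backend list changes.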
Caching:
- Open-source alternatives: Redis, Memcached
- Application-level caching: Use libraries like node-cache for Node.js
- Database query caching: MongoDB has no built-in query result cache (it caches query plans and keeps hot data in memory), so cache frequently read results in Redis or in the application
- HTTP caching: Use Varnish as a reverse proxy cache
- CDN alternative: Set up your own CDN using NGINX and geographically distributed servers
Database Optimization:
- Use MongoDB's built-in tools like explain() to analyze query performance
- Implement database indexing strategies (single field, compound, text indexes)
- Consider time-series collections for time-based data
- Use MongoDB Atlas (managed service) or set up your own MongoDB cluster
- Implement database sharding using MongoDB's native sharding capabilities
- Open-source connection pooling: Use MongoDB Driver's built-in connection pooling
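To see why a hashed shard key spreads writes evenly, here is a simplified Python sketch. It is not how MongoDB implements sharding (MongoDB assigns ranges of hashed values to chunks rather than taking a modulo). `shard_for` and the `user-N` keys are made up for illustration.

```python
import hashlib
from collections import Counter


def shard_for(key: str, shard_count: int) -> int:
    """Simplified hashed placement: hash the key, then map the hash onto a shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Sequential IDs would all hit the newest range under range-based placement;
# hashing scatters them across shards instead.
placement = Counter(shard_for(f"user-{n}", 4) for n in range(10_000))
print(sorted(placement.items()))
```

The trade-off is that range queries on the shard key then have to touch every shard.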
Autoscaling:
- Open-source alternative: Kubernetes with custom metrics
- Implement horizontal pod autoscaling in Kubernetes
- Use Prometheus for collecting custom metrics
- Set up Grafana for visualizing metrics and creating dashboards
- Implement event-driven autoscaling using KEDA (Kubernetes Event-driven Autoscaling), which scales on external signals such as queue length rather than on predicted load
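It helps to know what the Horizontal Pod Autoscaler actually computes. The Kubernetes docs give the core rule as desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The Python below is a simplified sketch of that rule. It leaves out stabilization windows, pod readiness handling, and multiple metrics, and the replica bounds are assumed values.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale in proportion to how far the metric is from its target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```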
Load Testing:
- Open-source tools: Apache JMeter, Gatling, Locust
  - JMeter: Java-based, feature-rich, supports various protocols
  - Gatling: Scala-based, good for testing REST APIs and websockets
  - Locust: Python-based, allows writing test scenarios in code
- Create realistic user scenarios (e.g., login, browse, purchase)
- Gradually ramp up concurrent users to identify system limits
- Monitor system resources during tests to identify bottlenecks
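A realistic scenario mix mostly means weighting actions the way real users perform them. The Python sketch below builds a weighted session plan. The 70/20/10 split and the function name are assumptions, so replace them with ratios from your own analytics. Load tools typically expose the same idea as per-task weights.

```python
import random
from collections import Counter

# Hypothetical traffic mix: most sessions only browse, few end in a purchase.
SCENARIO_WEIGHTS = {"browse": 70, "login": 20, "purchase": 10}


def build_session_plan(num_sessions: int, weights: dict, seed: int = 0) -> list:
    """Pick a scenario for each simulated session according to the weights."""
    rng = random.Random(seed)  # seeded so a test run can be repeated exactly
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=num_sessions)


plan = build_session_plan(1000, SCENARIO_WEIGHTS)
print(Counter(plan).most_common())
```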
Continuous Optimization Frameworks:
- Open-source APM: Elastic APM, SigNoz
  - Elastic APM: Part of the ELK stack, provides detailed performance insights
  - SigNoz: Full-stack APM solution with tracing and metrics
- Prometheus + Grafana: For metrics collection and visualization
- Set up alerting using Alertmanager (works with Prometheus)
- Implement distributed tracing using Jaeger or Zipkin
Additional elaborations:
API Gateway:
- Open-source alternative: Kong or Traefik
- Implement rate limiting, authentication, and request/response transformations
- Use plugins for additional functionality like caching or logging

Microservices:
- Use Docker for containerization
- Implement service discovery using Consul or etcd
- Use gRPC for efficient inter-service communication

Message Queues:
- Open-source alternatives: RabbitMQ, Apache Kafka
- RabbitMQ: Great for task queues and pub/sub messaging
- Kafka: Excellent for high-throughput event streaming

Database Choices:
- Consider using a combination of databases:
  - MongoDB for flexible document storage
  - PostgreSQL for complex relational data
  - Redis for caching and real-time features
- Implement database migrations using tools like Flyway or Liquibase

Code Optimization:
- Use Node.js profiling tools like clinic.js
- Implement code splitting in React Native for faster initial load times
- Use React Native's performance APIs to identify and fix UI bottlenecks

Mobile App Optimization:
- Implement efficient state management using Redux or MobX
- Use React Native's FlatList for rendering large lists efficiently
- Implement lazy loading for images and other heavy content
- Minimize bridge usage between native and JavaScript threads

Monitoring and Logging:
- Set up centralized logging using the ELK stack (Elasticsearch, Logstash, Kibana)
- Use Prometheus for metrics collection and Grafana for visualization
- Implement distributed tracing using Jaeger or Zipkin

Security:
- Implement OAuth 2.0 for authentication using open-source libraries like Passport.js
- Use Let's Encrypt for free SSL/TLS certificates
- Implement rate limiting and DDoS protection using fail2ban or custom solutions
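Rate limiting comes up under both API Gateway and Security above, so here is a minimal token-bucket sketch in Python to show the idea. The `TokenBucket` class and its parameters are my own illustration, not Kong's or Traefik's implementation. A gateway plugin would normally do this for you, per client key.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_second` tokens per second."""

    def __init__(self, rate_per_second: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self.clock = clock  # injectable so refill can be tested without sleeping
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        # Refill for the time that has passed, but never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would typically respond with HTTP 429


bucket = TokenBucket(rate_per_second=2, capacity=3)
print([bucket.allow() for _ in range(3)])
```

In a multi-instance deployment the bucket state has to live somewhere shared, such as Redis, or each instance will enforce its own separate limit.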