There are two use cases that we currently cover with Lambdas at TangoCode: RESTful APIs for microservices and one-time processing tasks for distributed batch processes.
The verdict is in: AWS Lambdas have done their job. They fit perfectly in these use cases and provide the scalability we need at the cost we expect for our clients. That said, though, there is no such thing as a perfect technology that has all the tools and does all the work Software Engineers need.
Because of that, I’m here to discuss an alternative to AWS Lambdas: using ECS and Docker containers to mitigate some limitations, such as cold start (this has improved drastically, but there may still be scenarios where you need close to real-time responses), capacity and timeout limits, and the development experience.
First, some context …
Elastic Container Service (ECS) is a container orchestration service that supports Docker. ECS comes with several components that make this orchestration happen. These include the following:
- Cluster. A Cluster is how AWS groups services or tasks inside ECS.
- Task Definition. The task definition contains a description of how a task should launch. It has information such as which container image to use, the capacity allocated for this container (CPU, Memory), and many other configurations that are available for a Docker container.
- ECS Task. A Task is a running container launched from the configuration in the Task Definition. A long-running task should belong to a Service; a software component that constantly fetches messages from a queue as they become available is a great example. A task can also be configured on its own to run a short job that exits and is not re-launched, such as a scheduled cron job.
- Service. A Service controls a group of long-running tasks and ensures everything is going according to the configuration. If a task becomes unhealthy, the service is in charge of stopping it and launching a new one to maintain the desired running task number.
- Fargate. Fargate is an AWS service for ECS that allows us to focus on designing and implementing our applications without managing any servers. A Task can run as either the EC2 or the Fargate launch type. With EC2 tasks, we have to take care of scaling the instances where the containers run; this is not the case with Fargate.
- Elastic Container Registry (ECR). ECR is the service that allows us to upload our Docker container images and keep them in our AWS account. It is like a Docker Hub by AWS.
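
To make these components a bit more concrete, here is a minimal sketch of what a Fargate Task Definition might look like when expressed as a Python dictionary. The keys follow the parameters of the ECS `RegisterTaskDefinition` API, but every value (family name, image URI, sizes) is a hypothetical placeholder, not our actual configuration. A real definition pulling from ECR would usually also need an execution role.

```python
# Hypothetical Task Definition for a Fargate task.
# All names and values below are placeholders for illustration only.
task_definition = {
    "family": "batch-file-processor",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",   # Fargate tasks use the awsvpc network mode
    "cpu": "1024",             # CPU units: 1024 units = 1 vCPU
    "memory": "2048",          # memory in MiB
    "containerDefinitions": [
        {
            "name": "processor",
            # Image stored in ECR (placeholder account, region, and repository)
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/processor:latest",
            "essential": True,
        }
    ],
}


def describe(task_def: dict) -> str:
    """Return a one-line, human-readable summary of a task definition."""
    vcpu = int(task_def["cpu"]) / 1024
    memory_gb = int(task_def["memory"]) / 1024
    containers = ", ".join(c["name"] for c in task_def["containerDefinitions"])
    return f"{task_def['family']}: {vcpu} vCPU, {memory_gb} GB, containers: {containers}"


print(describe(task_definition))
```

With `boto3` installed and credentials configured, a dictionary like this could be passed as keyword arguments to `register_task_definition` on an ECS client.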
Disk Capacity, Memory, and Timeout
The first limitation that pushed us to implement an alternative to Lambdas was the resource limits per invocation. Two scheduled Lambdas in our distributed batch process were about to hit the execution time limit of 5 minutes (this was before the most recent AWS update; Lambdas can now run for 15 minutes). They were also at the limit of their temporary disk space. The requirements for these Lambdas went from downloading and processing ~100MB files with real estate information for 5000 properties to files greater than 400MB with more than 15000 properties.
After doing some tests with the largest file we might need to process based on client requirements, we found the best configuration for our Fargate task was 2 vCPU and 2048 MB of memory.
In Fargate, the memory and CPU values must be assigned based on these combinations:
| CPU | Memory |
|---|---|
| 0.25 vCPU | 0.5GB, 1GB, and 2GB |
| 0.5 vCPU | Min. 1GB and Max. 4GB, in 1GB increments |
| 1 vCPU | Min. 2GB and Max. 8GB, in 1GB increments |
| 2 vCPU | Min. 4GB and Max. 16GB, in 1GB increments |
| 4 vCPU | Min. 8GB and Max. 30GB, in 1GB increments |
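
As a quick illustration, the combinations in the table can be encoded as a lookup that checks a candidate size before it goes into a Task Definition. This is only a sketch built from the values listed above; the function name is made up for the example.

```python
# Allowed memory sizes (in GB) for each CPU value, taken from the table above.
FARGATE_MEMORY_GB = {
    0.25: [0.5, 1, 2],
    0.5: list(range(1, 5)),    # 1GB to 4GB
    1: list(range(2, 9)),      # 2GB to 8GB
    2: list(range(4, 17)),     # 4GB to 16GB
    4: list(range(8, 31)),     # 8GB to 30GB
}


def is_valid_fargate_size(vcpu: float, memory_gb: float) -> bool:
    """Return True if the vCPU/memory pair appears in the table."""
    return memory_gb in FARGATE_MEMORY_GB.get(vcpu, [])


print(is_valid_fargate_size(1, 2))    # True
print(is_valid_fargate_size(2, 2))    # False: 2 vCPU requires at least 4GB
print(is_valid_fargate_size(4, 30))   # True
```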
For this use case, we scheduled a lambda to trigger an ECS Task running with Fargate configuration every day at 3:00 AM.
Currently, Fargate doesn’t support task scheduling. Therefore, we had to implement this ECS Worker Pattern. Now we are able to process large XML files with an approximate cost of $0.025 (5 minutes) per execution with no time or disk space constraints.
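
A minimal sketch of the triggering Lambda could look like the following. It assumes the Python Lambda runtime (where `boto3` is available) and uses the ECS `run_task` call; the cluster name, task definition name, and subnet ID are hypothetical placeholders, and the daily 3:00 AM schedule would live in the Lambda's trigger configuration rather than in the handler.

```python
# Hypothetical settings: replace with real values for an actual deployment.
CLUSTER = "batch-cluster"
TASK_DEFINITION = "batch-file-processor"
SUBNETS = ["subnet-00000000"]


def build_run_task_request(cluster, task_definition, subnets):
    """Build the keyword arguments for the ECS run_task call."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                # Assumes public subnets; private subnets with NAT would use DISABLED.
                "assignPublicIp": "ENABLED",
            }
        },
    }


def handler(event, context, ecs_client=None):
    """Lambda entry point: start one Fargate task and return its ARN(s)."""
    if ecs_client is None:
        import boto3  # imported lazily so the module loads without boto3 installed
        ecs_client = boto3.client("ecs")
    request = build_run_task_request(CLUSTER, TASK_DEFINITION, SUBNETS)
    response = ecs_client.run_task(**request)
    return [task["taskArn"] for task in response.get("tasks", [])]
```

The `ecs_client` parameter is there only so the handler can be exercised locally with a stand-in client; Lambda itself calls the handler with just `event` and `context`.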
With AWS Lambdas and Serverless framework, we deploy our applications following the micro-services architecture. However, the time that each lambda takes to be up and running (cold-start) has an impact on the waiting time per HTTP request.
With ECS we can deploy our code with the same microservice architecture mapping each microservice to an ECS service and deploy as many tasks or replicas as we want in the availability zones that we need. That way, there is no such cold-start limitation, since the containers are continuously running.
To deploy ECS microservices, it’s necessary to configure a load balancer on top. This distributes the HTTP traffic to each container through listeners that route the requests to target groups. The target groups then know how many tasks are running for a specific ECS Service or microservice, their availability zones, and whether they are healthy or not.
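
As a rough sketch of how a microservice maps to an ECS Service behind a load balancer, the request below follows the parameters of the ECS `CreateService` API. The function name and all values passed to it (cluster, service name, ARN, port, subnets) are hypothetical, and the target group is assumed to exist already.

```python
def build_service_request(cluster, name, task_definition, target_group_arn,
                          container_name, container_port, desired_count, subnets):
    """Build the keyword arguments for the ECS create_service call."""
    return {
        "cluster": cluster,
        "serviceName": name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,   # number of tasks (replicas) to keep running
        "launchType": "FARGATE",
        "loadBalancers": [
            {
                # The target group the load balancer listener routes requests to.
                "targetGroupArn": target_group_arn,
                "containerName": container_name,
                "containerPort": container_port,
            }
        ],
        "networkConfiguration": {
            # Subnets in different availability zones spread the tasks across zones.
            "awsvpcConfiguration": {"subnets": subnets}
        },
    }


# Placeholder values for illustration only.
request = build_service_request(
    cluster="api-cluster",
    name="properties-service",
    task_definition="properties-api",
    target_group_arn="arn:aws:elasticloadbalancing:region:account:targetgroup/placeholder",
    container_name="api",
    container_port=8080,
    desired_count=2,
    subnets=["subnet-aaaaaaaa", "subnet-bbbbbbbb"],
)
print(request["serviceName"], request["desiredCount"])
```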
As the above diagram points out, we are running the micro-services with Fargate because of the simplicity this technology provides. However, for long-running tasks, Fargate could be more expensive than EC2-based tasks.
The Development Experience
Deploying our applications as Docker containers provides several benefits. One of them is the ability to debug the code on our local machines using the actual configuration that the application will use once it is deployed to any environment (QA, UAT, PROD, etc.).
This practice can be a bit tedious when working with Lambda functions. In desperate cases, finding bugs means putting logs in our code, deploying our Lambdas to a testing environment in AWS, and going through the CloudWatch logs to figure out what could be wrong. Working with Docker instead saves us the deployment step and offers more tools we can use to run code locally, test it, and debug it.
Configuring our code to be deployed to ECS is not as easy as deploying Lambda functions with the Serverless Framework. Understanding the required steps can be confusing and frustrating.
Despite that, it’s a worthwhile process. Once the first task is deployed correctly, it’s simply a matter of replicating the steps to deploy other tasks, either manually or through a continuous integration service.
Please refer to this guide for the needed steps.
It remains very important to understand when to use Fargate vs EC2. We have to bear in mind that, for EC2, AWS charges for the instances for as long as they are running, whether or not tasks are using their full capacity. For Fargate, it charges only for the time the tasks are executing.
| Fargate resource | Price |
|---|---|
| Per vCPU per Hour | $0.0506 |
| Per GB per Hour | $0.0127 |

|EC2 – t2.small||Fargate|
Based on the numbers from the table above, running 1 task on Fargate for 730 hours (1 month) is 3.4 times more expensive than running it with the same specs on EC2. So if costs are a concern and you are willing to take on the EC2 maintenance and configuration to reduce the bill, EC2 is the better option for long-running tasks (tasks that need to be constantly running, waiting for an event to do their job). Fargate should be the way to go for short-running tasks (tasks that execute and stop once the job is done). Some examples of long-running tasks are RESTful microservices, queue consumers, and any other type of server.
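
The Fargate side of this comparison can be reproduced from the per-hour rates quoted above. The sketch below assumes a task sized like a t2.small (1 vCPU and 2 GB). Since the EC2 price itself is not listed in the text, the EC2 figure is only backed out from the 3.4x ratio and should be read as an implied estimate, not a quoted price.

```python
# Fargate rates quoted in the pricing table above (USD).
VCPU_PER_HOUR = 0.0506
GB_PER_HOUR = 0.0127
HOURS_PER_MONTH = 730


def fargate_cost(vcpu: float, memory_gb: float, hours: float) -> float:
    """Cost in USD of running one Fargate task of the given size for `hours`."""
    return (vcpu * VCPU_PER_HOUR + memory_gb * GB_PER_HOUR) * hours


# Assumption: 1 vCPU and 2 GB, matching the specs of a t2.small instance.
fargate_monthly = fargate_cost(vcpu=1, memory_gb=2, hours=HOURS_PER_MONTH)

# EC2 monthly cost implied by the 3.4x ratio stated in the text (not a quoted price).
implied_ec2_monthly = fargate_monthly / 3.4

print(round(fargate_monthly, 2))      # 55.48
print(round(implied_ec2_monthly, 2))  # 16.32
```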