Configure Application Auto Scaling to manage Lambda provisioned concurrency on a schedule

david medragh
Jan 31, 2022

AWS announced the Provisioned Concurrency feature for Lambda at re:Invent 2019.

Cold Start
- During a cold start, API response time can spike to XXX milliseconds, which makes for a bad user experience.

Before a newly created Lambda execution environment can serve your API request, Lambda has to download your code, start the runtime, and run any initialization code. This setup delay results in a slow first response and is known as a cold start.

Cold starts are a major concern for the applicability of serverless technologies to latency-sensitive workloads.

Cases when we hit a cold start:

  • The first API request after a Lambda code deployment.
  • The first request to each newly created Lambda instance during the auto scaling process.

Concurrency

Concurrency is the number of requests that a Lambda function is serving at any given time. If a Lambda function is invoked again while a request is still being processed, another instance is allocated, which increases the function’s concurrency.

When Lambda functions scale due to a spike in traffic, the portion of requests served by new instances has higher latency than the rest. To let a function scale without fluctuations in latency, we use provisioned concurrency. By allocating provisioned concurrency before an increase in invocations, we ensure that all requests are served by initialized instances with very low latency.

We can configure Application Auto Scaling to manage provisioned concurrency on a schedule or based on utilization. Use scheduled scaling to increase provisioned concurrency in anticipation of peak traffic. To increase provisioned concurrency automatically as needed, use the Application Auto Scaling API to register a target and create a scaling policy.
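Both approaches start by registering the function alias (or version) as a scalable target. Here is a minimal Terraform sketch, assuming a hypothetical function named my-function with an alias named live and illustrative capacity limits:

resource "aws_appautoscaling_target" "this" {
  service_namespace  = "lambda"
  # Resource ID format: function:<function name>:<alias or version>
  resource_id        = "function:my-function:live"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  min_capacity       = 1
  max_capacity       = 100
}

The scheduled actions and scaling policies shown later in this post attach to this scalable target.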

Reserved concurrency is a related but separate setting, configured on the function itself. Reserving concurrency has the following effects:

  • Other functions can’t prevent your function from scaling — All of your account’s functions in the same Region without reserved concurrency share the pool of unreserved concurrency. Without reserved concurrency, other functions can use up all of the available concurrency. This prevents your function from scaling up when needed.
  • Your function can’t scale out of control — Reserved concurrency also limits your function from using concurrency from the unreserved pool, which caps its maximum concurrency. You can reserve concurrency to prevent your function from using all the available concurrency in the Region, or from overloading downstream resources.
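As a sketch of how reserved concurrency is set in Terraform (the function name, role, and deployment package are hypothetical; the reserved_concurrent_executions argument is the point here):

variable "lambda_role_arn" {
  type = string
}

resource "aws_lambda_function" "reserved_example" {
  function_name = "my-function"
  role          = var.lambda_role_arn
  runtime       = "python3.9"
  handler       = "app.handler"
  filename      = "lambda.zip"

  # Caps this function at 50 concurrent executions, carved out of the
  # Region's unreserved concurrency pool
  reserved_concurrent_executions = 50
}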

BENEFITS OF AUTO SCALING

  • Set up scaling quickly
  • Maintain performance automatically
  • Pay only for what you need
  • Make smart scaling decisions

Types of Auto Scaling

  • Scheduled Auto Scaling
  • Workload Auto Scaling (based on utilization)

SCHEDULED AUTO SCALING

Scheduled Auto Scaling is used when spikes in request rates are mostly predictable. Take a simple example of a company whose employees work from 9 to 5: request rates will be higher during those hours, so provisioned concurrency can be raised just before 9 and lowered again after 5, as in the sketch below.
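Here is a minimal Terraform sketch of that schedule, attached to the scalable target registered earlier; the cron expressions, weekday window, and capacity of 100 are illustrative, and Application Auto Scaling evaluates these schedules in UTC by default:

resource "aws_appautoscaling_scheduled_action" "scale_up_weekday_mornings" {
  name               = "lambda-pc-scale-up"
  service_namespace  = aws_appautoscaling_target.this.service_namespace
  resource_id        = aws_appautoscaling_target.this.resource_id
  scalable_dimension = aws_appautoscaling_target.this.scalable_dimension
  # Shortly before the workday starts
  schedule           = "cron(45 8 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 100
    max_capacity = 100
  }
}

resource "aws_appautoscaling_scheduled_action" "scale_down_weekday_evenings" {
  name               = "lambda-pc-scale-down"
  service_namespace  = aws_appautoscaling_target.this.service_namespace
  resource_id        = aws_appautoscaling_target.this.resource_id
  scalable_dimension = aws_appautoscaling_target.this.scalable_dimension
  # After the workday ends
  schedule           = "cron(0 17 ? * MON-FRI *)"

  scalable_target_action {
    min_capacity = 1
    max_capacity = 1
  }
}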

Benefits of Scheduled Auto Scaling:

  • Cost-effective: Provisioned Concurrency is scheduled for a fixed window instead of being active all the time
  • Saves the time and effort of adjusting Provisioned Concurrency manually, since it is applied automatically on a schedule

WORKLOAD AUTO SCALING

Workload Auto Scaling is based on measured utilization and is used when request rates are not really predictable. Here, Provisioned Concurrency follows the changing workload (see the Terraform sketch after the list of benefits below).

Benefits of Workload Auto Scaling:

  • Saves the time spent on manual Provisioned Concurrency settings, since Provisioned Concurrency tracks the number of requests
  • Cost-effective: capacity automatically fits the workload when it is too unpredictable to set Provisioned Concurrency beforehand
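Workload Auto Scaling is implemented with a target tracking scaling policy on the same scalable target. A minimal Terraform sketch (the policy name and the 70% utilization target are illustrative):

resource "aws_appautoscaling_policy" "track_utilization" {
  name               = "lambda-pc-utilization"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.this.service_namespace
  resource_id        = aws_appautoscaling_target.this.resource_id
  scalable_dimension = aws_appautoscaling_target.this.scalable_dimension

  target_tracking_scaling_policy_configuration {
    # Add or remove provisioned concurrency to keep utilization around 70%
    target_value = 0.7
    predefined_metric_specification {
      predefined_metric_type = "LambdaProvisionedConcurrencyUtilization"
    }
  }
}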

HOW TO

PROVISIONED CONCURRENCY CONFIGURATION VIA TERRAFORM

Example Usage with Alias Name

resource "aws_lambda_provisioned_concurrency_config" "this" {
function_name = aws_lambda_function.this.function_name
provisioned_concurrent_executions = var.provisioned_concurrent_executions
qualifier = aws_lambda_alias.this.name
}

The following arguments are required:

  • function_name — (Required) Name or Amazon Resource Name (ARN) of the Lambda Function.
  • provisioned_concurrent_executions — (Required) Amount of capacity to allocate. Must be greater than or equal to 1.
  • qualifier — (Required) Lambda Function version or Lambda Alias name.

Provisioned concurrency can also be managed via the AWS CLI.
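For example, assuming the same hypothetical function name and alias as above, the allocation can be set and then checked like this:

aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier live \
  --provisioned-concurrent-executions 10

aws lambda get-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier live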

References:

https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html
https://aws.amazon.com/blogs/compute/scheduling-aws-lambda-provisioned-concurrency-for-recurring-peak-usage/
