Scale your Amazon Kendra index

Amazon Kendra is a fully managed, intelligent search service powered by machine learning. Amazon Kendra reimagines enterprise search for your websites and applications so your employees and customers can easily find the content they’re looking for. Using keyword or natural language queries, employees and customers can find the right content even when it’s scattered across multiple locations and content repositories within your organization.

Although Amazon Kendra is designed for large-scale search applications with millions of documents and thousands of queries per second, you can run smaller experiments to evaluate Amazon Kendra. You can run a proof of concept, or simply have a smaller workload and still use features that Amazon Kendra Enterprise Edition has to offer. On July 1, 2021, Amazon Kendra introduced new, smaller capacity units for smaller workloads. In addition, to promote experimentation, the price for Amazon Kendra Developer Edition was reduced by 55%.

Amazon Kendra Enterprise Edition capacity units

The base capacity for Amazon Kendra supports up to 100,000 documents and 8,000 searches per day, with adaptive bursting capability to better handle unpredictable query spikes. You can increase the query and the document capacity of your Amazon Kendra index through storage capacity units and query capacity units, and these can be updated independently from each other.

Storage capacity units offer scaling in increments of 100,000 documents (up to 30 GB storage), each. For example, if you need to index 1 million documents, you need nine storage capacity units (100,000 documents with base Amazon Kendra Enterprise Edition, and, 900,00 additional documents from the storage capacity units).

Query capacity units (QCUs) offer scaling increments of 8,000 searches for day, with built-in adaptive bursting. For example, if you need 16K queries per day (average QPS of 0.2) you can provision two units.

For more information about the maximum number of storage capacity units and query capacity units available for a single index, see Quotas for Amazon Kendra.

About capacity bursting

Amazon Kendra has a provisioned base capacity of one query capacity unit. You can use up to 8,000 queries per day with a minimum throughput of 0.1 queries per second (per query capacity unit).

An adaptive approach to handling unexpected traffic beyond the provisioned throughput is to use the built-in adaptive query bursting feature in Amazon Kendra. This allows you to apply unused query capacity to handle unexpected traffic. Amazon Kendra accumulates your unused queries at your provisioned queries per second rate, every second, up to the maximum number of queries you’ve provisioned for your Amazon Kendra index. These accumulated queries are automatically used to help handle unexpected traffic spikes above the currently allocated QPS capacity.

Optimal performance of adaptive query bursting can vary, depending on several factors such as your total index size, query complexity, accumulated unused queries, and overall load on your index. We recommend performing your own load tests to accurately measure bursting capacity.

Best practices

When dimensioning your Amazon Kendra index, you need to consider how many documents you’re indexing, how many queries you expect per day, how many queries per second you need to accommodate, and if you have usage patterns that require additional capacity due to sustained usage. You could also experience short peak times where you can accommodate brief periods of time for additional QPS requirements.

It’s therefore good practice to observe your query usage patterns for a few weeks, especially when the patterns are not easily predictable. This will allow you to define an optimal balance between using the built-in adaptive bursting capability for short, unsustained QPS peaks, and adding/removing capacity units to better handle longer, more sustained peaks and lows.

For information about visualizing and building a rough estimate of your usage patterns in Amazon Kendra, see Automatically scale Amazon Kendra query capacity units with Amazon EventBridge and AWS Lambda.

Amazon Kendra Enterprise Edition allows you to add document storage capacity in units of 100,000 documents with maximum storage of 30 GB. You can add and remove storage capacity at any time, but you can’t remove storage capacity beyond your used capacity (number of documents ingested or storage space used). We recommend estimating how often documents are added to your data sources in order to determine when to increase storage capacity in your Amazon Kendra through storage capacity units. You can monitor the document count with Amazon CloudWatch or on the Amazon Kendra console.

Queries per second represent the number of concurrent queries your Amazon Kendra index receives at a given time. If you’re replacing a search solution with Amazon Kendra, you should be able to retrieve this information from query logs. If you exceed your provisioned and bursting capacity, your request may receive a 400 HTTP status code (client error) with the message ThrottlingException. For example, using the AWS SDK for Python (Boto3), you may receive an exception like the following:

ThrottlingException: An error occurred (ThrottlingException) when calling the Query operation (reached max retries: 4)

For cases like this, Boto3 includes the retries feature, which retries the query call (in this case to Amazon Kendra) after obtaining an exception. If you aren’t using an AWS SDK, you may need to implement an error handling mechanism that, for example, could use exponential backoff to handle this error.

You can monitor your Amazon Kendra index queries with CloudWatch metrics. For example, you could follow the metric IndexQueryCount, which represents the number of index queries per minute. If you want to use the IndexQueryCount metric, you should divide that number by 60 to obtain the average queries per second. Additionally, you can get a report of the queries per second on the Amazon Kendra console, as shown in the following screenshot.

The preceding graph shows three patterns:

  • Peaks of ˜2.5 QPS during business hours, between 8 AM and 8 PM.
  • Sustained QPS usage over ˜0.5 QPS and below 1 QPS between 8 PM and 8 AM.
  • Less than 0.3 QPS usage on the weekend (Feb 7, 2021, was a Sunday and Feb 13,2021 was a Saturday)

Taking into account these capacity requirements, you could start defining your Amazon Kendra index additional capacity units as follows:

  • For the high usage times (between 8 AM and 8 PM Monday through Friday), your Amazon Kendra index adds 24 VQUs (each query capacity unit provides capacity for at least 0.1 QPS) which when added to the initial Amazon Kendra Enterprise Edition query capacity (0.1 QPS), can support 2.5 queries per second
  • For the second usage pattern (Monday through Friday from 8 PM until 8 AM), you add four VQUs, which when combined with your initial Amazon Kendra Enterprise Edition (0.1 QPS), provides capacity for 0.5 QPS.
  • For the weekends, you add two VQUs, provisioning capacity for 0.3 QPS.

The following table summarizes this configuration.

Period Additional VQUS Capacity (Without Bursting)
Mon – Fri 8 AM – 8 PM 24 2.5 QPS
Mon – Fri 8 PM – 8 AM 4 0.5 QPS
Sat – Sun 2 0.3 QPS

You can use this initial approach to define a baseline that needs to be reevaluated to ensure the right sizing of your Amazon Kendra resources.

It’s also important to keep in mind that query autocomplete capacity is defined by your query capacity. Query autocomplete capacity is calculated as five times the provisioned query capacity for an index with a base capacity of 2.5 calls per second. This means that if your Amazon Kendra index query capacity is below 0.6 QPS, you have 2.5 QPS for query autocomplete. If your Amazon Kendra index query capacity is above 0.6 QPS, your query autocomplete capacity is calculated as 2.5 times your current index query capacity.


In this blog post you learned how to estimate capacity and scale for your Amazon Kendra index.

Now it’s easier than ever to experience Amazon Kendra, with 750 hours of Free Tier and the new reduced price for Amazon Kendra Developer Edition. Get started, visit our workshop, or check out the AWS Machine Learning Blog.

About the Author

Dr. Andrew Kane is an AWS Principal Specialist Solutions Architect based out of London. He focuses on the AWS Language and Vision AI services, helping our customers architect multiple AI services into a single use-case driven solution. Before joining AWS at the beginning of 2015, Andrew spent two decades working in the fields of signal processing, financial payments systems, weapons tracking, and editorial and publishing systems. He is a keen karate enthusiast (just one belt away from Black Belt) and is also an avid home-brewer, using automated brewing hardware and other IoT sensors.


Tapodipta Ghosh is a Senior Architect. He leads the Content And Knowledge Engineering Machine Learning team that focuses on building models related to AWS Technical Content. He also helps our customers with AI/ML strategy and implementation using our AI Language services like Amazon Kendra.



Jean-Pierre Dodel leads product management for Amazon Kendra, a new ML-powered enterprise search service from AWS. He brings 15 years of Enterprise Search and ML solutions experience to the team, having worked at Autonomy, HP, and search startups for many years prior to joining Amazon four years ago. JP has led the Amazon Kendra team from its inception, defining vision, roadmaps, and delivering transformative semantic search capabilities to customers like Dow Jones, Liberty Mutual, 3M, and PwC.


Juan Bustos is an AI Services Specialist Solutions Architect at Amazon Web Services, based in Dallas, TX. Outside of work, he loves spending time writing and playing music as well as trying random restaurants with his family.