Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.
Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to pull together data across several structured and unstructured repositories to index and search on.
One such unstructured data repository is Confluence. Confluence is a team workspace that gives knowledge worker teams a place to create, capture, and collaborate on any project or idea. Team spaces help teams structure, organize, and share work, so every team member has visibility into institutional knowledge and access to the information they need.
There are two Confluence offerings:
We’re excited to announce that you can now use the new Amazon Kendra connector V2 for Confluence to search information stored in your Confluence account, both in the cloud and in your data center. In this post, we show how to index information stored in Confluence and use the Amazon Kendra intelligent search function. In addition, the ML-powered intelligent search can accurately find information in unstructured documents that contain natural language narrative content, for which keyword search is not very effective.
This version supports OAuth 2.0 authentication in addition to basic authentication for the Cloud edition. For the Data Center (on-premises) edition, we have added OAuth2 in addition to basic authentication and personal access tokens for showing search results based on user access rights. You can benefit from the following features:
With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index a Confluence repository using the Amazon Kendra connector for Confluence. The solution consists of the following steps:
To try out the Amazon Kendra connector for Confluence, you need the following:
Choose your preferred authentication method:
In this section, we show the steps to gather your authentication details depending on your authentication method.
For basic authentication with the Data Center edition, all you need is your login and password. Make sure your login has privileges to gather all content.
For Cloud edition, your user ID serves as your user login. For your password, you need to get a token. Complete the following steps:
This authentication method works for the on-premises (Data Center) edition only. Complete the following steps to acquire authentication details:
To configure Secrets Manager, we use the login URL and this value.
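For reference, the credentials gathered so far correspond to standard HTTP Authorization headers. The following is a minimal sketch, not the connector's own code: the function names are hypothetical, and it assumes that basic authentication uses the login with either the password (Data Center) or the token (Cloud), and that a Data Center personal access token is sent as a Bearer token.

```python
import base64


def basic_auth_header(login: str, secret: str) -> dict:
    """Build an HTTP Basic Authorization header.

    Assumption: `secret` is the password for Data Center,
    or the token you generated for the Cloud edition.
    """
    raw = f"{login}:{secret}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_auth_header(personal_access_token: str) -> dict:
    """Build a Bearer Authorization header for a personal access token.

    Assumption: applies to the Data Center edition only.
    """
    return {"Authorization": f"Bearer {personal_access_token}"}
```

You can pass either dictionary as request headers to a quick test call against your Confluence site to confirm that the credentials work before storing them.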
This authentication method follows the full OAuth 2.0 (3LO) documentation from Confluence. We first create and configure an app on Confluence and enable it for OAuth 2.0. The process is slightly different for the Cloud and Data Center editions. We then get an authorization code and exchange it for an access token. Finally, we get the client ID, client secret, and client code. Complete the following steps:
You should see three URLs listed.
The following is example code:
https://auth.atlassian.com/authorize?audience=api.atlassian.com&client_id=YOUR_CLIENT_ID&scope=REQUESTED_SCOPE%20REQUESTED_SCOPE_TWO&redirect_uri=https://YOUR_APP_CALLBACK_URL&state=YOUR_USER_BOUND_VALUE&response_type=code&prompt=consent
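If you prefer to assemble this URL programmatically rather than by hand, the following sketch shows one way to do it with the Python standard library. The function name and the sample argument values are hypothetical; the parameter names come from the example URL above.

```python
from urllib.parse import quote, urlencode

AUTHORIZE_ENDPOINT = "https://auth.atlassian.com/authorize"


def build_authorize_url(client_id, scopes, redirect_uri, state):
    """Assemble the authorization URL from its query parameters."""
    params = {
        "audience": "api.atlassian.com",
        "client_id": client_id,
        "scope": " ".join(scopes),  # scopes are space-separated
        "redirect_uri": redirect_uri,
        "state": state,
        "response_type": "code",
        "prompt": "consent",
    }
    # quote_via=quote encodes spaces as %20 (not '+'), as in the example URL.
    # It also percent-encodes ':' and '/' inside the parameter values.
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_authorize_url(
        "YOUR_CLIENT_ID",
        ["REQUESTED_SCOPE", "REQUESTED_SCOPE_TWO"],
        "https://YOUR_APP_CALLBACK_URL",
        "YOUR_USER_BOUND_VALUE",
    ))
```

Building the query string with urlencode avoids stray spaces or unescaped characters, which are easy to introduce when editing the URL manually.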
You’re redirected to your Confluence home page.
This is the authorization code that we exchange for the access token.
We need these values to make the call that exchanges the authorization code for the access token.
Next, we use the Postman utility to post the authorization code to get the access token. You can also use alternative tools such as curl.
The grant_type parameter is hard-coded. We collected the values for client_id and client_secret in a previous step. The value for code is the authorization code we collected earlier.
A successful response returns the access token. If you added the offline_access scope to the authorization URL earlier, you also get a refresh token.
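As an alternative to Postman or curl, the exchange can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the grant_type value is assumed to be the standard OAuth 2.0 value authorization_code, and a redirect_uri field matching the one registered for your app is assumed to be required in the body. The token endpoint is the same https://auth.atlassian.com/oauth/token URL used in this post for refreshing tokens.

```python
import json
import urllib.request

TOKEN_ENDPOINT = "https://auth.atlassian.com/oauth/token"


def build_token_exchange_request(client_id, client_secret, code, redirect_uri):
    """Prepare (but do not send) the POST that trades the code for tokens."""
    body = {
        "grant_type": "authorization_code",  # assumed standard OAuth 2.0 value
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,                        # the authorization code collected earlier
        "redirect_uri": redirect_uri,        # assumption: must match the app's callback URL
    }
    return urllib.request.Request(
        TOKEN_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_token_exchange_request(...)) with your own values performs the exchange. Separating request construction from sending makes it easy to inspect the body before any credentials leave your machine.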
The access token is valid for only 1 hour. When it expires, you can start the process over to get a new one. However, if you have a refresh token, you can instead use Postman, as before, to post to the following URL: https://auth.atlassian.com/oauth/token. Use the following JSON format for the body of the request:
{"grant_type": "refresh_token", "client_id": "YOUR_CLIENT_ID", "client_secret": "YOUR_CLIENT_SECRET", "refresh_token": "YOUR_REFRESH_TOKEN"}
The call returns a new access token.
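Because the access token lasts only 1 hour, it can help to track when a refresh is due. The following sketch builds the refresh body in the JSON format shown above and adds a small expiry check. The function names and the 5-minute safety margin are assumptions made for illustration.

```python
import json
import time

TOKEN_LIFETIME_SECONDS = 3600  # the access token is valid for 1 hour


def build_refresh_body(client_id, client_secret, refresh_token):
    """Return the JSON body for the refresh call."""
    return json.dumps({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    })


def token_needs_refresh(issued_at, margin_seconds=300, now=None):
    """Return True when the token is within `margin_seconds` of expiring.

    `issued_at` and `now` are Unix timestamps in seconds; the margin is a
    hypothetical buffer so a refresh happens before the token actually expires.
    """
    current = time.time() if now is None else now
    return current >= issued_at + TOKEN_LIFETIME_SECONDS - margin_seconds
```

For example, a token issued at time 0 is reported as due for refresh from 3,300 seconds onward with the default margin.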
If using the Data Center edition with OAuth2 authentication, complete the following steps:
Use the client ID you copied earlier, and https://httpbin.org for the redirect URI. For CODE_CHALLENGE, enter the code you copied earlier.
You’re redirected to httpbin.org.
Use the client ID, client secret, and authorization code you saved earlier. For CODE_VERIFIER, enter the value from when you generated the code challenge.
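The code challenge and code verifier are a matched pair: the challenge is derived from the verifier. If you want to generate them yourself, the following sketch follows the standard PKCE S256 method (RFC 7636). It assumes your Data Center app is configured for the S256 challenge method, which this post does not state; the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def generate_code_verifier(num_bytes=32):
    """Create a random URL-safe code verifier (43 characters for 32 bytes)."""
    random_bytes = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(random_bytes).rstrip(b"=").decode("ascii")


def code_challenge_s256(code_verifier):
    """Derive the code challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


if __name__ == "__main__":
    verifier = generate_code_verifier()
    print("CODE_VERIFIER: ", verifier)
    print("CODE_CHALLENGE:", code_challenge_s256(verifier))
```

Keep the verifier private until the token exchange; only the challenge goes into the authorization URL.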
The access token and refresh token are valid only for 1 hour. To refresh the token, post the following code to the same URL to get new values:
grant_type: refresh_token
client_id: YOUR_CLIENT_ID
client_secret: YOUR_CLIENT_SECRET
refresh_token: REFRESH_TOKEN
redirect_uri: YOUR_REDIRECT_URL
The new tokens are valid for 1 hour.
To store your Confluence credentials in Secrets Manager, complete the following steps:
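If you would rather script this than use the console, the following sketch creates the secret with the AWS SDK for Python (boto3). It covers basic authentication only. The key names username and password are an assumption about what the connector expects, so verify them against the Amazon Kendra Developer Guide; the function names and the secret name you pass in are hypothetical.

```python
import json


def build_basic_auth_secret(username, password):
    """Return the secret value as a JSON string.

    ASSUMPTION: the connector reads the keys "username" and "password".
    Check the Amazon Kendra Developer Guide for your authentication method.
    """
    return json.dumps({"username": username, "password": password})


def store_secret(secret_name, secret_string, region_name=None):
    """Create the secret in AWS Secrets Manager and return its ARN."""
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3

    client = boto3.client("secretsmanager", region_name=region_name)
    response = client.create_secret(Name=secret_name, SecretString=secret_string)
    return response["ARN"]
```

The returned ARN identifies the secret to select when you configure the data source.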
To configure the Amazon Kendra connector, complete the following steps:
This creates and propagates the IAM role and then creates the Amazon Kendra index, which can take up to 30 minutes.
Complete the following steps to create your data source:
For both the Confluence Data Center and Cloud editions, you can supply optional settings (not shown), such as the VPC. For the Data Center edition only, you can also supply web proxy details. Authentication with a personal access token is an additional option that is valid only for the Data Center edition, not the Cloud edition.
Mapping fields is a useful exercise in which you replace source field names with user-friendly names that fit your organization’s vocabulary.
A banner message appears when the sync is complete.
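Besides watching for the banner on the console, you can check the sync status programmatically. The following sketch uses the list_data_source_sync_jobs operation of the boto3 Amazon Kendra client; the function name and the index and data source IDs you pass in are placeholders.

```python
def latest_sync_status(kendra_client, index_id, data_source_id):
    """Return the status of the most recently started sync job, or None.

    `kendra_client` is expected to be boto3.client("kendra").
    Typical status values include SYNCING, SUCCEEDED, and FAILED.
    """
    response = kendra_client.list_data_source_sync_jobs(
        Id=data_source_id,
        IndexId=index_id,
    )
    history = response.get("History", [])
    if not history:
        return None  # no sync job has been started yet
    # Pick the job with the latest start time rather than relying on list order.
    latest = max(history, key=lambda job: job["StartTime"])
    return latest["Status"]
```

Taking the client as a parameter keeps the function easy to exercise with a stub before pointing it at a real index.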
Now that you have ingested the content from your Confluence account into your Amazon Kendra index, you can test some queries. For the purposes of our test, we have created a Confluence website with two teams: team1 with the member Analyst1 and team2 with the member Analyst2.
The Confluence connector also crawls local identity information from Confluence. You can use this feature to narrow down your query by user. Confluence offers comprehensive visibility options: users can make their content visible to individual users, to groups, or at the space level. When you filter your searches by user, the query returns only those documents that the user had access to at the time of ingestion.
Note that for Confluence Data Center edition, the user name is the email ID.
Rerun your search query.
This brings you a filtered set of results. Notice we bring back just 62 results.
We now go back and restrict Bob Straham to just be able to access his workspace and run the search again.
Notice that we get just a subset of the results because the search is restricted to just Bob’s content.
When fronting Amazon Kendra with an application, such as one built using Experience Builder, you can pass the user identity (in the form of the email ID for Cloud edition or user name for Data Center edition) to Amazon Kendra to ensure that each user sees only content specific to their user ID. Alternatively, you can use AWS IAM Identity Center (successor to AWS Single Sign-On) to control the user context passed to Amazon Kendra and limit queries by user.
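In a custom application, passing the user identity means adding a UserContext to the query call. The following is a minimal sketch using the query operation of the boto3 Amazon Kendra client; the function names are hypothetical, and the index ID and user ID are placeholders you supply.

```python
def build_query_args(index_id, query_text, user_id=None):
    """Assemble the keyword arguments for a Kendra query.

    When `user_id` is given, results are filtered to documents
    that user was allowed to see when the content was ingested.
    """
    args = {"IndexId": index_id, "QueryText": query_text}
    if user_id:
        args["UserContext"] = {"UserId": user_id}
    return args


def search_titles(kendra_client, index_id, query_text, user_id=None):
    """Run the query and return the titles of the matching documents.

    `kendra_client` is expected to be boto3.client("kendra").
    """
    response = kendra_client.query(**build_query_args(index_id, query_text, user_id))
    return [
        item.get("DocumentTitle", {}).get("Text", "")
        for item in response.get("ResultItems", [])
    ]
```

Running the same query with and without user_id is a quick way to confirm that access filtering behaves as expected for a given user.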
Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your Confluence account.
To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Confluence V2, delete that data source.
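The cleanup can also be scripted. The following sketch wraps the delete_data_source and delete_index operations of the boto3 Amazon Kendra client; the function names are hypothetical, and the IDs are placeholders for your own resources. Deleting an index is irreversible, so double-check the ID before running it.

```python
def delete_confluence_data_source(kendra_client, index_id, data_source_id):
    """Delete only the Confluence data source, keeping the index."""
    kendra_client.delete_data_source(Id=data_source_id, IndexId=index_id)


def delete_test_index(kendra_client, index_id):
    """Delete the entire index, including all of its data sources."""
    kendra_client.delete_index(Id=index_id)
```

Use the first function if you added the connector to an existing index, and the second if you created a new index just for this walkthrough. In both cases, kendra_client is expected to be boto3.client("kendra").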
With the new Confluence connector V2 for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.
To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Confluence, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.
Ashish Lagwankar is a Senior Enterprise Solutions Architect at AWS. His core interests include AI/ML, serverless, and container technologies. Ashish is based in the Boston, MA, area and enjoys reading, outdoors, and spending time with his family.