We are excited to announce that Amazon Personalize now supports incremental bulk dataset imports, a new option for updating your data and improving the quality of your recommendations. Keeping your datasets current is an important part of maintaining the relevance of your recommendations. Prior to this launch, Amazon Personalize offered two mechanisms for ingesting data: bulk imports with a dataset import job, which overwrote the existing data in a dataset, and real-time updates to individual records through the PutEvents, PutUsers, and PutItems streaming APIs.
With incremental bulk imports, Amazon Personalize simplifies the ingestion of historical records by enabling you to import incremental changes to your datasets with a DatasetImportJob. You can import up to 100 GB of data per FULL DatasetImportJob or up to 1 GB of data per INCREMENTAL DatasetImportJob. Data added to a dataset with an INCREMENTAL import is appended to your existing dataset. If your incremental import duplicates any records found in your existing dataset, Amazon Personalize updates those records with the newer version, further simplifying the data ingestion process. In the following sections, we describe the changes to the existing APIs that support incremental dataset imports.
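For example, suppose your Interactions dataset uses the minimal schema of USER_ID, ITEM_ID, and TIMESTAMP. An incremental import file like the following (hypothetical values) appends these interactions to the dataset; any row that duplicates an existing record replaces it with this newer version:

USER_ID,ITEM_ID,TIMESTAMP
user_1,item_74,1669315200
user_2,item_33,1669315260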
A new parameter called importMode has been added to the CreateDatasetImportJob API. This parameter is an enum type with two values: FULL and INCREMENTAL. The parameter is optional and is FULL by default to preserve backward compatibility. The CreateDatasetImportJob request is as follows:
{
  "datasetArn": "string",
  "dataSource": {
    "dataLocation": "string"
  },
  "jobName": "string",
  "roleArn": "string",
  "importMode": {INCREMENTAL, FULL}
}
The Boto3 API is create_dataset_import_job, and the AWS Command Line Interface (AWS CLI) command is create-dataset-import-job.
The response to DescribeDatasetImportJob has been extended to include whether the import was a full or incremental import. The type of import is indicated in a new importMode field, which is an enum type with two values: FULL and INCREMENTAL. The DescribeDatasetImportJob response is as follows:
{
  "datasetImportJob": {
    "creationDateTime": number,
    "datasetArn": "string",
    "datasetImportJobArn": "string",
    "dataSource": {
      "dataLocation": "string"
    },
    "failureReason": "string",
    "jobName": "string",
    "lastUpdatedDateTime": number,
    "roleArn": "string",
    "status": "string",
    "importMode": {INCREMENTAL, FULL}
  }
}
The Boto3 API is describe_dataset_import_job, and the AWS CLI command is describe-dataset-import-job.
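As a minimal sketch of how you might use the new field, the following Boto3 snippet (assuming a hypothetical job ARN) polls describe_dataset_import_job until the job completes and then prints its import mode:

import time
import boto3

personalize = boto3.client('personalize')

# Hypothetical ARN of the dataset import job to monitor
job_arn = 'arn:aws:personalize:us-east-1:111111111111:dataset-import-job/YourImportJob'

while True:
    job = personalize.describe_dataset_import_job(
        datasetImportJobArn=job_arn)['datasetImportJob']
    if job['status'] in ('ACTIVE', 'CREATE FAILED'):
        break
    time.sleep(60)  # wait a minute before polling again

print('Status: ' + job['status'])
print('Import mode: ' + job['importMode'])  # FULL or INCREMENTAL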
The response to ListDatasetImportJobs has been extended to include whether each import was a full or incremental import. The type of import is indicated in a new importMode field, which is an enum type with two values: FULL and INCREMENTAL. The ListDatasetImportJobs response is as follows:
{
  "datasetImportJobs": [
    {
      "creationDateTime": number,
      "datasetImportJobArn": "string",
      "failureReason": "string",
      "jobName": "string",
      "lastUpdatedDateTime": number,
      "status": "string",
      "importMode": {INCREMENTAL, FULL}
    }
  ],
  "nextToken": "string"
}
The Boto3 API is list_dataset_import_jobs, and the AWS CLI command is list-dataset-import-jobs.
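If you want to audit how each of your datasets was last updated, a minimal sketch with list_dataset_import_jobs pages through your jobs and prints the import mode of each:

import boto3

personalize = boto3.client('personalize')

# Page through all dataset import jobs, printing each job's import mode
next_token = None
while True:
    kwargs = {'maxResults': 25}
    if next_token:
        kwargs['nextToken'] = next_token
    response = personalize.list_dataset_import_jobs(**kwargs)
    for job in response['datasetImportJobs']:
        # importMode may be absent on jobs created before this feature
        print(job['jobName'], job['status'], job.get('importMode'))
    next_token = response.get('nextToken')
    if not next_token:
        break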
The following code shows how to create a dataset import job for incremental bulk import using the SDK for Python (Boto3):
import boto3

personalize = boto3.client('personalize')

# Create an incremental bulk dataset import job
response = personalize.create_dataset_import_job(
    jobName='YourImportJob',
    datasetArn='arn:aws:personalize:us-east-1:111111111111:dataset/AmazonPersonalizeExample/INTERACTIONS',
    dataSource={'dataLocation': 's3://bucket/file.csv'},
    roleArn='role_arn',
    importMode='INCREMENTAL'
)

dsij_arn = response['datasetImportJobArn']
print('Dataset Import Job arn: ' + dsij_arn)

# Check the status of the import job
description = personalize.describe_dataset_import_job(
    datasetImportJobArn=dsij_arn)['datasetImportJob']

print('Name: ' + description['jobName'])
print('ARN: ' + description['datasetImportJobArn'])
print('Status: ' + description['status'])
In this post, we described how you can use this new feature in Amazon Personalize to perform incremental updates to a dataset with bulk imports, keeping your data fresh and improving the relevance of Amazon Personalize recommendations. If you have delayed access to your data, incremental bulk imports allow you to ingest it more easily by appending it to your existing datasets.
Try out this new feature by accessing Amazon Personalize now.
Neelam Koshiya is an enterprise solution architect at AWS. Her current focus is to help enterprise customers with their cloud adoption journey for strategic business outcomes. In her spare time, she enjoys reading and being outdoors.
James Jory is a Principal Solutions Architect in Applied AI with AWS. He has a special interest in personalization and recommender systems and a background in ecommerce, marketing technology, and customer data analytics. In his spare time, he enjoys camping and auto racing simulations.
Daniel Foley is a Senior Product Manager for Amazon Personalize. He is focused on building applications that leverage artificial intelligence to solve our customers’ largest challenges. Outside of work, Dan is an avid skier and hiker.
Alex Berlingeri is a Software Development Engineer with Amazon Personalize working on a machine learning powered recommendations service. In his free time he enjoys reading, working out and watching soccer.