When we analyze images, we may want to incorporate other metadata related to the image, such as when and where the image was taken, who took it, and what it features. One way to represent this metadata is as JSON, a format well-suited for a document database such as Amazon DocumentDB (with MongoDB compatibility). Example use cases include:
In this post, we focus on the first use case of enabling image search and exploration of a generic photo collection. We look at the JSON output of image analysis generated from Amazon Rekognition, which we ingest into Amazon DocumentDB, and then explore using Amazon SageMaker.

SageMaker is a fully managed service that provides every developer and data scientist the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models.

Amazon Rekognition makes it easy to add image and video analysis to your applications. You just provide an image or video to the Amazon Rekognition API, and the service can identify objects, people, text, scenes, and activities. Amazon Rekognition has a simple, easy-to-use API that can quickly analyze any image or video file that’s stored in Amazon Simple Storage Service (Amazon S3). It requires no ML expertise to use.
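To give a feel for how little code this involves, the following is a minimal sketch of calling the detect_labels API with boto3 on an image already stored in Amazon S3. The bucket name and key are placeholders; the full walkthrough later in this post uses the same call.

import boto3

# Minimal sketch: label an image stored in Amazon S3 (bucket and key are placeholders)
rekognition = boto3.client('rekognition')
response = rekognition.detect_labels(
    Image={'S3Object': {'Bucket': 'my-example-bucket', 'Name': 'pics/example.jpg'}},
    MaxLabels=10,
    MinConfidence=50)

for label in response['Labels']:
    print(label['Name'], round(label['Confidence'], 1))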
Amazon DocumentDB is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. You can use the same MongoDB 3.6 or 4.0 application code, drivers, and tools to run, manage, and scale workloads on Amazon DocumentDB without having to worry about managing the underlying infrastructure. As a document database, Amazon DocumentDB makes it easy to store, query, and index JSON data.
In this post, we explore images taken from Unsplash. In the source code, we have kept the image file names in their original format.
Each image is analyzed using Amazon Rekognition. The output from the Amazon Rekognition API is a nested JSON object, which is a format well-suited for Amazon DocumentDB. For example, we can analyze the following image, Gardens by the Bay, Singapore, by Coleen Rivas.

Amazon Rekognition generates the following JSON output:
{'Labels': [
  {'Name': 'Outdoors', 'Confidence': 98.58585357666016, 'Instances': [], 'Parents': []},
  {'Name': 'Garden', 'Confidence': 96.23029327392578, 'Instances': [], 'Parents': [{'Name': 'Outdoors'}]},
  {'Name': 'Arbour', 'Confidence': 93.65332794189453, 'Instances': [], 'Parents': [{'Name': 'Garden'}, {'Name': 'Outdoors'}]},
  {'Name': 'Person', 'Confidence': 93.00440979003906, 'Instances': [
    {'BoundingBox': {'Width': 0.016103893518447876, 'Height': 0.03213529288768768, 'Left': 0.6525371670722961, 'Top': 0.9264869689941406}, 'Confidence': 93.00440979003906},
    {'BoundingBox': {'Width': 0.010800352320075035, 'Height': 0.020640190690755844, 'Left': 0.781416118144989, 'Top': 0.8592491149902344}, 'Confidence': 78.98234558105469},
    {'BoundingBox': {'Width': 0.017044249922037125, 'Height': 0.02785704843699932, 'Left': 0.7455113530158997, 'Top': 0.8547402620315552}, 'Confidence': 66.65809631347656}], 'Parents': []},
  {'Name': 'Human', 'Confidence': 93.00440979003906, 'Instances': [], 'Parents': []},
  {'Name': 'Amusement Park', 'Confidence': 82.81632232666016, 'Instances': [], 'Parents': []},
  {'Name': 'Theme Park', 'Confidence': 76.72222900390625, 'Instances': [], 'Parents': [{'Name': 'Amusement Park'}]},
  {'Name': 'Plant', 'Confidence': 73.67972564697266, 'Instances': [], 'Parents': []},
  {'Name': 'Potted Plant', 'Confidence': 68.09540557861328, 'Instances': [], 'Parents': [{'Name': 'Plant'}, {'Name': 'Vase'}, {'Name': 'Jar'}, {'Name': 'Pottery'}]},
  {'Name': 'Pottery', 'Confidence': 68.09540557861328, 'Instances': [], 'Parents': []},
  {'Name': 'Jar', 'Confidence': 68.09540557861328, 'Instances': [], 'Parents': []},
  {'Name': 'Vase', 'Confidence': 68.09540557861328, 'Instances': [], 'Parents': [{'Name': 'Jar'}, {'Name': 'Pottery'}]},
  {'Name': 'Ferris Wheel', 'Confidence': 64.03276824951172, 'Instances': [], 'Parents': [{'Name': 'Amusement Park'}]},
  {'Name': 'Nature', 'Confidence': 62.96412658691406, 'Instances': [], 'Parents': []},
  {'Name': 'Planter', 'Confidence': 58.99357604980469, 'Instances': [], 'Parents': [{'Name': 'Potted Plant'}, {'Name': 'Plant'}, {'Name': 'Vase'}, {'Name': 'Jar'}, {'Name': 'Pottery'}]},
  {'Name': 'Herbs', 'Confidence': 57.66265869140625, 'Instances': [], 'Parents': [{'Name': 'Planter'}, {'Name': 'Potted Plant'}, {'Name': 'Plant'}, {'Name': 'Vase'}, {'Name': 'Jar'}, {'Name': 'Pottery'}]},
  {'Name': 'Park', 'Confidence': 51.91413879394531, 'Instances': [], 'Parents': [{'Name': 'Lawn'}, {'Name': 'Outdoors'}, {'Name': 'Grass'}, {'Name': 'Plant'}]},
  {'Name': 'Grass', 'Confidence': 51.91413879394531, 'Instances': [], 'Parents': [{'Name': 'Plant'}]},
  {'Name': 'Lawn', 'Confidence': 51.91413879394531, 'Instances': [], 'Parents': [{'Name': 'Grass'}, {'Name': 'Plant'}]}],
 'LabelModelVersion': '2.0',
 'ResponseMetadata': {'RequestId': '8f0146c9-ff5e-4b7b-9469-346aa46b125f', 'HTTPStatusCode': 200, 'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1', 'date': 'Thu, 04 Mar 2021 05:54:59 GMT', 'x-amzn-requestid': '8f0146c9-ff5e-4b7b-9469-346aa46b125f', 'content-length': '2511', 'connection': 'keep-alive'}, 'RetryAttempts': 0}}

This output contains the confidence score of finding a variety of types of objects, called labels, in the image.
Those types of objects include Garden, Person, and even Ferris Wheel, among others. You can download the list of supported labels from our documentation page. The output from Amazon Rekognition includes all detected labels over a specified confidence level. In addition to the confidence of the label, it outputs an array of instances when multiple objects of that label have been identified. For example, in the preceding image, Amazon Rekognition identified three Person objects, along with the location in the picture of each identified object.
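For instance, a small helper like the following (hypothetical, and assuming the response shown above is held in a variable such as pic_result) pulls out the bounding boxes of the detected Person instances.

# Hypothetical helper: list the bounding boxes for a given label in a detect_labels response
def bounding_boxes(response, label_name):
    for label in response['Labels']:
        if label['Name'] == label_name:
            return [instance['BoundingBox'] for instance in label['Instances']]
    return []

# For the image above, this returns the three boxes where Rekognition found a Person
print(bounding_boxes(pic_result, 'Person'))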
Amazon DocumentDB stores each JSON output as a document. Multiple documents are stored in a collection, and multiple collections are stored in a database. Borrowing terminology from relational databases, documents are analogous to rows, and collections are analogous to tables. The following table summarizes these terms.
Document Database Concepts | SQL Concepts
Document | Row
Collection | Table
Database | Database
Field | Column
We now implement the following tasks:

To conduct these tasks, we use a SageMaker notebook, which is a Jupyter notebook app provided by a SageMaker notebook instance. Although you can use SageMaker notebooks to train and deploy ML models, they’re also useful for code commentary and data exploration, the latter being the focus of our post.

We have prepared an AWS CloudFormation template to create the required AWS resources for this post in our GitHub repository. For instructions on creating a CloudFormation stack, see the video Simplify your Infrastructure Management using AWS CloudFormation.

The CloudFormation stack provisions the following:
sudo -u ec2-user -i <<'EOF'
source /home/ec2-user/anaconda3/bin/activate python3
pip install --upgrade pymongo
pip install --upgrade ipyplot
source /home/ec2-user/anaconda3/bin/deactivate
cd /home/ec2-user/SageMaker
wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
wget https://github.com/aws-samples/documentdb-sagemaker-example/raw/main/rekognition/script.ipynb
mkdir pics
cd pics
wget https://github.com/aws-samples/documentdb-sagemaker-example/raw/main/rekognition/pics.zip
unzip pics.zip
rm pics.zip
EOF

Prior to creating the CloudFormation stack, you need to create a bucket in Amazon S3 to store the image files for analysis. For instructions, see Creating a bucket.
When creating the CloudFormation stack, you need to specify the following:

It should take about 15 minutes to create the CloudFormation stack. The following diagram shows the resource architecture.

This CloudFormation template incurs costs, and you should consult the relevant pricing pages before launching it.
All the subsequent code in this tutorial is in the Jupyter notebook in the SageMaker instance created in your CloudFormation stack.
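The cells below also assume that a handful of imports, clients, and variables have been set up near the top of the notebook. The following is a sketch of what that setup might look like; the bucket name, prefixes, and exact imports are assumptions, so adjust them to match the notebook you downloaded and your own environment.

# Sketch of the setup the later cells rely on (names here are assumptions; adjust to your environment)
import glob
import json
import os

import boto3
import ipyplot
import pandas as pd
import s3fs
from pymongo import MongoClient, ASCENDING

rekognition = boto3.client('rekognition')  # Rekognition client used by detect_labels below
s3_bucket = '<your-s3-bucket>'             # the S3 bucket you created for the images
s3_prefix = 'pics'                         # assumed key prefix for the uploaded images
local_prefix = 'pics'                      # local folder where the lifecycle script unzipped the images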
stack_name = "docdb-rekognition"  # name of CloudFormation stack

The stack_name refers to the name you specified for your CloudFormation stack upon its creation.
def get_secret(stack_name):
    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=session.region_name
    )
    secret_name = f'{stack_name}-DocDBSecret'
    get_secret_value_response = client.get_secret_value(SecretId=secret_name)
    secret = get_secret_value_response['SecretString']
    return json.loads(secret)
secret = get_secret(stack_name)
db_username = secret['username']
db_password = secret['password']
db_port = secret['port']
db_host = secret['host']
uri_str = f"mongodb://{db_username}:{db_password}@{db_host}:{db_port}/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false"
client = MongoClient(uri_str)
# Check the connection by listing the hosts in the cluster
client["admin"].command("ismaster")["hosts"]
db_name = "db"      # name the database
coll_name = "coll"  # name the collection

db = client[db_name]   # create a database object
coll = db[coll_name]   # create a collection object
We use the ipyplot library to preview the images that were downloaded onto our SageMaker instance using the following code:
# Get image paths
pic_local_paths = glob.glob(f"{local_prefix}/*.jpg")
pic_local_paths = sorted(pic_local_paths)

# Preview images
ipyplot.plot_images(
    images=pic_local_paths,
    max_images=10,
    img_width=180,
)
After you verify the images, upload the images to your S3 bucket for Amazon Rekognition to access and analyze:
for pic_local_path in pic_local_paths:
    pic_filename = os.path.basename(pic_local_path)
    boto3.Session().resource('s3').Bucket(s3_bucket).Object(
        os.path.join(s3_prefix, pic_filename)).upload_file(pic_local_path)

Then we get the Amazon S3 keys for the images, to tell the Amazon Rekognition API where the images are for analysis:
fs = s3fs.S3FileSystem()
pic_keylist = fs.ls(f's3://{s3_bucket}/{s3_prefix}/')[1:]
pic_keylist = [key.split('docdb-blog/')[1] for key in pic_keylist]
Next, we loop over every image, analyze each one using the Amazon Rekognition API, and ingest the analysis output into Amazon DocumentDB. The results of each image analysis are stored as a document, and all these documents are stored within a collection. Apart from ingesting the analysis results from Amazon Rekognition, we also store each image’s Amazon S3 key, which is used as a unique identifier. See the following code:
for pic_key in pic_keylist:
    # Analyze image with Rekognition
    pic_result = rekognition.detect_labels(
        Image={
            'S3Object': {
                'Bucket': s3_bucket,
                'Name': pic_key
            }},
        MinConfidence=50,
        MaxLabels=100)

    # Extract S3 key and image labels
    pic_label = pic_result['Labels']
    doc = {
        "img": pic_key.split('/')[-1],
        "Labels": pic_result['Labels']
    }

    # Ingest data into DocumentDB
    coll.insert_one(doc)
We can now explore the image labels using Amazon DocumentDB queries.

As is a common first step in data science, we want to explore the data to get some general descriptive statistics. We can use database operations to calculate some of these basic descriptive statistics.

To get a count of the number of images we ingested, we use the count_documents() command:
coll.count_documents({})
> 15

The count_documents() command gets the number of documents in a collection. The output from Amazon Rekognition for each image is recorded as a document, and coll is the name of the collection.
Across the 15 images, Amazon Rekognition detected multiple entities. To see the frequency of each entity label, we query the database using the aggregate command. The following query counts the number of times each label appears with a confidence score greater than 90% and then sorts the results in descending order of counts:
result = pd.DataFrame(coll.aggregate([
    {"$unwind": "$Labels"},
    {"$match": {"Labels.Confidence": {"$gte": 90.0}}},
    {"$group": {"_id": "$Labels.Name", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}}
]))

We wrap the output of the preceding query in pd.DataFrame() to convert the results to a DataFrame. This allows us to generate visualizations such as the following.
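One way to produce a bar chart of these counts from the DataFrame is a sketch along the following lines, assuming matplotlib is available in the notebook kernel (pandas uses it under the hood for plotting).

# Sketch: bar chart of label counts (assumes matplotlib is installed in the notebook kernel)
ax = result.set_index('_id').sort_values('count')['count'].plot.barh(figsize=(8, 6))
ax.set_xlabel('Number of images with label (confidence >= 90%)')
ax.set_ylabel('Label')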
Based on the plot, Person and Human labels were the most common, with six counts each.

Besides labels, Amazon Rekognition also outputs the confidence level with which those labels were applied. The following query identifies the images with a Book label applied with 90% or more confidence:
# Query images with a 'Book' label of 90% or more confidence
coll.find(
    {"Labels": {"$elemMatch": {"Name": "Book", "Confidence": {"$gte": 90.0}}}},
    {"_id": 0, "img": 1}
)
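To eyeball the matching images, one option (a sketch, assuming the same files are still in the local folder referenced by local_prefix) is to feed the returned img values back into ipyplot:

# Sketch: preview the images returned by the query, assuming they still exist locally
matched = [doc['img'] for doc in coll.find(
    {"Labels": {"$elemMatch": {"Name": "Book", "Confidence": {"$gte": 90.0}}}},
    {"_id": 0, "img": 1}
)]
ipyplot.plot_images(
    images=[os.path.join(local_prefix, img) for img in matched],
    img_width=180,
)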
We can also search for images containing multiple labels. The following query identifies images that contain the Book and Person labels, both with the minimum confidence level of 90%:
# Query images with a 'Book' label of 90% or more confidence
# and a 'Person' label of 90% or more confidence
coll.find(
    {"$and": [
        {"Labels": {"$elemMatch": {"Name": "Book", "Confidence": {"$gte": 90.0}}}},
        {"Labels": {"$elemMatch": {"Name": "Person", "Confidence": {"$gte": 90.0}}}}]
    },
    {"_id": 0, "img": 1}
)
We can use the explain() method in the MongoDB API to determine what query plan the Amazon DocumentDB query planner used to conduct these queries:
coll.find(
    {"Labels": {"$elemMatch": {"Name": "Book", "Confidence": {"$gte": 90.0}}}},
    {"_id": 0, "img": 1}
).explain()

> {'queryPlanner': {'plannerVersion': 1,
     'namespace': 'db.coll',
     'winningPlan': {'stage': 'COLLSCAN'}},
   'serverInfo': {'host': 'documentdbinstancethree-haw55aziqvyy',
     'port': 27017,
     'version': '3.6.0'},
   'ok': 1.0}

The winningPlan field shows the plan that the Amazon DocumentDB query planner used to run this query. It chose a COLLSCAN, which is a full collection scan, namely to scan each document and apply the predicate on each one.
Similarly, we can see the Amazon DocumentDB query planner also chose a full collection scan for the second query:
coll.find(
    {"$and": [
        {"Labels": {"$elemMatch": {"Name": "Book", "Confidence": {"$gte": 90.0}}}},
        {"Labels": {"$elemMatch": {"Name": "Person", "Confidence": {"$gte": 90.0}}}}]
    },
    {"_id": 0, "img": 1}
).explain()

> {'queryPlanner': {'plannerVersion': 1,
     'namespace': 'db.coll',
     'winningPlan': {'stage': 'COLLSCAN'}},
   'serverInfo': {'host': 'documentdbinstancethree-haw55aziqvyy',
     'port': 27017,
     'version': '3.6.0'},
   'ok': 1.0}
As with many database management systems, we can make queries perform better in Amazon DocumentDB by creating an index on commonly queried fields. In this case, we create an index on the label name and label confidence, because these are two fields we’re using in our predicate. After we create the index, we can modify our queries to use it.

To create the index, run the following:
coll.create_index([
    ("Labels.Name", ASCENDING),
    ("Labels.Confidence", ASCENDING)],
    name="idx_labels")

With the index created, we can use the following code block to implement the query to identify images containing books. We add some extra predicates that only find records that have the label Book and a label with a confidence level greater than or equal to 90.0, though not necessarily for the Book label. The query planner uses the index to filter the documents based on these first predicates and then applies the predicate asking for the Book label to have a confidence level greater than or equal to 90.0.
# Query for 'Book' label with 90% or more confidence
query_book = coll.find(
    {"$and": [
        {"Labels.Name": "Book"},
        {"Labels.Confidence": {"$gte": 90.0}},
        {"Labels": {"$elemMatch": {"Name": "Book", "Confidence": {"$gte": 90.0}}}}]
    },
    {"_id": 0, "img": 1}
)

Similarly, we can modify the query looking for both Book and Person labels as follows:
# Query for 'Book' label with 90% or more confidence and
# 'Person' label with 90% or more confidence
query_book_person = coll.find(
    {"$and": [
        {"Labels.Name": "Book"},
        {"Labels.Confidence": {"$gte": 90.0}},
        {"Labels.Name": "Person"},
        {"Labels.Confidence": {"$gte": 90.0}},  ## unnecessary, but adding for clarity
        {"Labels": {"$elemMatch": {"Name": "Book", "Confidence": {"$gte": 90.0}}}},
        {"Labels": {"$elemMatch": {"Name": "Person", "Confidence": {"$gte": 90.0}}}}]
    },
    {"_id": 0, "img": 1}
)

To validate that the Amazon DocumentDB query planner is, in fact, using the index we created, we can again use the explain() method. When we add this method to the query, we can observe the plan that Amazon DocumentDB chose, namely the winningPlan field. It used an IXSCAN stage, indicating that it used the index for this query. This is more efficient than scanning all documents in the collection and applying the predicates to each one.
query_book.explain()

> {'queryPlanner': {'plannerVersion': 1,
     'namespace': 'db.coll',
     'winningPlan': {'stage': 'FETCH',
       'inputStage': {'stage': 'IXSCAN', 'indexName': 'idx_labels'}}},
   'serverInfo': {'host': 'documentdbinstanceone-ba0lmvhl0dml',
     'port': 27017,
     'version': '3.6.0'},
   'ok': 1.0}

query_book_person.explain()

> {'queryPlanner': {'plannerVersion': 1,
     'namespace': 'db.coll',
     'winningPlan': {'stage': 'FETCH',
       'inputStage': {'stage': 'IXSCAN', 'indexName': 'idx_labels'}}},
   'serverInfo': {'host': 'documentdbinstancetwo-iulkk0vmfiln',
     'port': 27017,
     'version': '3.6.0'},
   'ok': 1.0}
Besides identifying images with a particular label, you can also specify the number of detected instances of that label. To find all images with at least four instances of Person, each with 90% or more confidence, use the following query:
coll.find(
    {"Labels": {"$elemMatch": {"Name": "Person",
                               "Confidence": {"$gte": 90.0},
                               "Instances.3": {"$exists": True}}}},
    {"_id": 0, "img": 1}
)

The query checks whether the fourth instance, Instances.3, exists; instance numbering starts from zero.
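If you would rather see the actual instance counts instead of probing with $exists, an aggregation along the following lines returns one count per image. This is a sketch that assumes the $size aggregation operator is supported in your Amazon DocumentDB cluster version.

# Sketch: count Person instances (>= 90% confidence) per image, assuming $size is supported
pd.DataFrame(coll.aggregate([
    {"$unwind": "$Labels"},
    {"$match": {"Labels.Name": "Person", "Labels.Confidence": {"$gte": 90.0}}},
    {"$project": {"_id": 0, "img": 1, "num_persons": {"$size": "$Labels.Instances"}}},
    {"$sort": {"num_persons": -1}}
]))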
You can also set a maximum limit for the number of instances. The following query selects all images with at least two but fewer than four instances of a Person label with 90% or more confidence:
coll.find(
    {"Labels": {"$elemMatch": {"Name": "Person",
                               "Confidence": {"$gte": 90.0},
                               "Instances.1": {"$exists": True},
                               "Instances.3": {"$exists": False}}}},
    {"_id": 0, "img": 1}
)
Looking closer, we can see that the first image actually contains many people. Possibly due to how small they appear, fewer than four were detected.

To perform the preceding analysis with your own album, you can replace the sample pictures in Amazon S3 with your own pictures.
To save cost, delete the CloudFormation stack you created. This removes all the resources you provisioned using the CloudFormation template, including the Amazon VPC, Amazon DocumentDB cluster, and SageMaker notebook instance. For instructions, see Deleting a stack on the AWS CloudFormation console. You should also delete the S3 bucket that you created, along with the images it contains.
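If you prefer to script the cleanup, a sketch along these lines does the same thing with boto3; the bucket and stack names are placeholders, so substitute your own before running it.

import boto3

s3_bucket = '<your-s3-bucket>'    # placeholder: the bucket you created for the images
stack_name = 'docdb-rekognition'  # the stack name you chose earlier

# Empty the bucket (it must be empty before it can be deleted), then delete it
bucket = boto3.resource('s3').Bucket(s3_bucket)
bucket.objects.all().delete()
bucket.delete()

# Delete the CloudFormation stack and everything it provisioned
boto3.client('cloudformation').delete_stack(StackName=stack_name)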
In this post, we analyzed images using Amazon Rekognition, ingested the output into Amazon DocumentDB, and explored the results using queries implemented in SageMaker. For another example of how to use SageMaker to analyze and store data in Amazon DocumentDB for an ML use case, see Analyzing data stored in Amazon DocumentDB (with MongoDB compatibility) using Amazon SageMaker.

Amazon DocumentDB provides you with several capabilities that help you back up and restore your data based on your use case. For more information, see Best Practices for Amazon DocumentDB. If you’re new to Amazon DocumentDB, see Get Started with Amazon DocumentDB. If you’re planning to migrate to Amazon DocumentDB, see Migrating to Amazon DocumentDB.
Annalyn Ng is a Senior Solutions Architect based in Singapore, where she designs and builds cloud solutions for public sector agencies. Annalyn graduated from the University of Cambridge, and blogs about machine learning at algobeans.com. Her book, Numsense! Data Science for the Layman, has been translated into multiple languages and is used in top universities as reference text.

Brian Hess is a Senior Analytics Platform Specialist at AWS. He has been in the data and analytics space for over 20 years and has extensive experience in roles including solutions architect, product management, and director of advanced analytics.