Advertising agencies can use generative AI and text-to-image foundation models to create innovative ad creatives and content. In this post, we demonstrate how you can generate new images from existing base images using Amazon SageMaker, a fully managed service to build, train, and deploy ML models at scale. With this solution, businesses large and small can develop new custom ad creative content faster and at lower cost.
Consider the following scenario: a global automotive company needs new marketing material for an upcoming car design and hires a creative agency that is known for providing advertising solutions for clients with strong brand equity. The car manufacturer is looking for low-cost ad creatives that display the model in diverse locations, colors, views, and perspectives while maintaining the brand identity of the car manufacturer. The creative agency can support their customer by using generative AI models within their secure AWS environment.
The solution is developed with generative AI and text-to-image models in Amazon SageMaker. SageMaker is a fully managed machine learning (ML) service that makes it straightforward to build, train, and deploy ML models for any use case with fully managed infrastructure, tools, and workflows. Stable Diffusion is a text-to-image foundation model from Stability AI that powers the image generation process. Diffusers is the Hugging Face library that provides pre-trained diffusion pipelines, and ControlNet is a pre-trained model that conditions Stable Diffusion on an existing image so that new images can be generated from it based on a prompt. Combining Stable Diffusion with ControlNet can take existing brand-specific content and develop new versions of it. Key benefits of developing the solution within AWS along with Amazon SageMaker are:
For this post, we use the following GitHub sample, which uses Amazon SageMaker Studio with foundation models (Stable Diffusion), prompts, computer vision techniques, and a SageMaker endpoint to generate new images from existing images. The following diagram illustrates the solution architecture.
The workflow contains the following steps:
To deploy the model to SageMaker endpoints, we must create a compressed file for each individual technique model artifact along with the Stable Diffusion weights, inference script, and NVIDIA Triton config file.
In the following code, we download the model weights for the different ControlNet techniques and Stable Diffusion 1.5 to the local directory as tar.gz files:
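from huggingface_hub import snapshot_download

# ids, model_tar_dir, and unwanted_files_sd are assumed to be defined by the surrounding code
if ids == "runwayml/stable-diffusion-v1-5":
    # Stable Diffusion 1.5: skip the files matched by unwanted_files_sd
    snapshot_download(ids, local_dir=str(model_tar_dir), local_dir_use_symlinks=False, ignore_patterns=unwanted_files_sd)
elif ids == "lllyasviel/sd-controlnet-canny":
    # ControlNet weights for the Canny technique
    snapshot_download(ids, local_dir=str(model_tar_dir), local_dir_use_symlinks=False)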
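As a rough illustration of the packaging step, the following sketch bundles a local model directory into a single tar.gz archive using Python's standard tarfile module. The function name package_model and the paths in the usage note are hypothetical and are not taken from the GitHub sample; the exact layout that the serving container expects inside the archive is an assumption to verify against the sample.

```python
import tarfile
from pathlib import Path


def package_model(model_dir: Path, archive_path: Path) -> list[str]:
    """Bundle everything under model_dir into a gzip-compressed tar archive.

    Each top-level entry is stored relative to model_dir, so it sits at the
    root of the archive. Returns the sorted top-level names that were added.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(model_dir.iterdir()):
            # arcname drops the local parent path from the stored name;
            # directories are added recursively by default
            tar.add(str(entry), arcname=entry.name)
            added.append(entry.name)
    return added
```

For example, package_model(Path("model"), Path("model.tar.gz")) would archive the contents of a local model folder. Write the archive to a location outside the folder being packaged so that it doesn't include itself.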
To create the model pipeline, we define an inference.py script that SageMaker real-time endpoints will use to load and host the Stable Diffusion and ControlNet tar.gz files. The following is a snippet from inference.py that shows how the models are loaded and how the Canny technique is called:
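import cv2
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the ControlNet weights; use half precision on GPU, full precision on CPU
controlnet = ControlNetModel.from_pretrained(
    f"{model_dir}/{control_net}",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32)

# Load Stable Diffusion 1.5 and attach the ControlNet model
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    f"{model_dir}/sd-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32)

# Define technique function for Canny
image = cv2.Canny(image, low_threshold, high_threshold)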
We deploy the SageMaker endpoint with the required instance size (GPU type) from the model URI:
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data=model_s3_uri,  # path to your trained SageMaker model
    role=role,  # IAM role with permissions to create an endpoint
    py_version="py39",  # Python version of the DLC
    image_uri=image_uri,
)

# Deploy model as SageMaker endpoint
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge",
)
Now that the model is deployed to a SageMaker endpoint, we can pass in our prompts and the original image we want to use as our baseline.
To define the prompt, we create a positive prompt, p_p, for what we’re looking for in the new image, and a negative prompt, n_p, for what is to be avoided:
p_p = "metal orange colored car, complete car, colour photo, outdoors in a pleasant landscape, realistic, high quality"
n_p = "cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, blurry, bad anatomy, bad proportions"
Finally, we invoke our endpoint with the prompt and source image to generate our new image:
request = {"prompt": p_p, "negative_prompt": n_p, "image_uri": 's3://
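The following is a minimal sketch of how such a request could be assembled and checked before it is sent. It uses the three keys shown in this post (prompt, negative_prompt, and image_uri). The helper name build_request is hypothetical, and the bucket and key in the example URI are placeholders, not the location used in the sample.

```python
import json


def build_request(prompt: str, negative_prompt: str, image_uri: str) -> dict:
    """Assemble the request body using the three keys shown in this post."""
    if not image_uri.startswith("s3://"):
        raise ValueError("image_uri is expected to be an S3 URI (s3://bucket/key)")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "image_uri": image_uri,
    }


p_p = "metal orange colored car, complete car, colour photo, outdoors in a pleasant landscape, realistic, high quality"
n_p = "cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, blurry, bad anatomy, bad proportions"

# Hypothetical placeholder; substitute the S3 location of your own base image
request = build_request(p_p, n_p, "s3://example-bucket/base-images/car.png")
print(json.dumps(request, indent=2))

# With a deployed endpoint, the request could then be sent with:
# response = predictor.predict(request)
```

The format of the response depends on how inference.py serializes the generated image, so this sketch makes no assumption about it.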
In this section, we compare the different ControlNet techniques and their effect on the resulting image. We use the following original image to generate new content using Stable Diffusion with ControlNet in Amazon SageMaker.
The following table shows how each technique's output determines which aspects of the original image the model focuses on.
Technique Name | Technique Output | Prompt
canny | A monochrome image with white edges on a black background. | metal orange colored car, complete car, colour photo, outdoors in a pleasant landscape, realistic, high quality
depth | A grayscale image with black representing deep areas and white representing shallow areas. | metal red colored car, complete car, colour photo, outdoors in pleasant landscape on beach, realistic, high quality
hed | A monochrome image with white soft edges on a black background. | metal white colored car, complete car, colour photo, in a city, at night, realistic, high quality
scribble | A hand-drawn monochrome image with white outlines on a black background. | metal blue colored car, similar to original car, complete car, colour photo, outdoors, breath-taking view, realistic, high quality, different viewpoint
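The canny, depth, and hed prompts in the table follow the same phrase pattern and differ only in the color and the setting. As an illustration, the following sketch composes them from those two parts, which is one way to produce many variants of an ad creative from a single template. The helper name build_prompt is hypothetical and isn't part of the GitHub sample.

```python
def build_prompt(color: str, setting: str) -> str:
    """Compose a positive prompt from the phrase pattern used in the table."""
    parts = [
        f"metal {color} colored car",
        "complete car",
        "colour photo",
        setting,
        "realistic",
        "high quality",
    ]
    return ", ".join(parts)


# Technique -> (color, setting), taken from the canny, depth, and hed rows
variants = {
    "canny": ("orange", "outdoors in a pleasant landscape"),
    "depth": ("red", "outdoors in pleasant landscape on beach"),
    "hed": ("white", "in a city, at night"),
}

prompts = {
    technique: build_prompt(color, setting)
    for technique, (color, setting) in variants.items()
}
for technique, prompt in prompts.items():
    print(f"{technique}: {prompt}")
```

The scribble prompt adds phrases such as "similar to original car" and "different viewpoint", so it would need a slightly extended template.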
After you generate new ad creatives with generative AI, clean up any resources that you no longer need. Delete the data in Amazon S3 and stop any SageMaker Studio notebook instances to avoid incurring further charges. If you used SageMaker JumpStart to deploy Stable Diffusion as a SageMaker real-time endpoint, delete the endpoint through either the SageMaker console or SageMaker Studio.
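If you prefer to clean up programmatically, the following sketch shows one possible approach using the SageMaker API operations exposed by the boto3 SageMaker client. The function name delete_endpoint_resources is hypothetical, and the sketch assumes the endpoint's configuration references its models by name. The client is passed in as an argument so that the function can be tried against a stub before it is pointed at a real account.

```python
def delete_endpoint_resources(sm_client, endpoint_name: str) -> dict:
    """Delete an endpoint, its endpoint configuration, and the models it references.

    sm_client is expected to behave like boto3.client("sagemaker").
    Returns the names of the resources that were deleted.
    """
    # Look up the endpoint configuration and models before deleting anything
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [
        variant["ModelName"]
        for variant in config["ProductionVariants"]
        if "ModelName" in variant
    ]

    # Delete the endpoint first, because it is the resource that incurs charges
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)

    return {
        "endpoint": endpoint_name,
        "endpoint_config": config_name,
        "models": model_names,
    }


# Example usage (requires AWS credentials and an existing endpoint):
# import boto3
# delete_endpoint_resources(boto3.client("sagemaker"), predictor.endpoint_name)
```

This removes only the hosting resources. The data in Amazon S3 and any running Studio resources still need to be cleaned up separately.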
In this post, we used foundation models on SageMaker to create new images from existing images stored in Amazon S3. With these techniques, marketing, advertising, and other creative agencies can use generative AI tools to augment their ad creative process. To dive deeper into the solution and code shown in this demo, check out the GitHub repo.
Also, refer to Amazon Bedrock for use cases on generative AI, foundation models, and text-to-image models.
Sovik Kumar Nath is an AI/ML solution architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. Sovik has published articles and holds a patent in ML model monitoring. He holds master's degrees from the University of South Florida and the University of Fribourg, Switzerland, and a bachelor's degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.
Sandeep Verma is a Sr. Prototyping Architect with AWS. He enjoys diving deep into customer challenges and building prototypes for customers to accelerate innovation. He has a background in AI/ML, is the founder of New Knowledge, and is generally passionate about tech. In his free time, he loves traveling and skiing with his family.
Uchenna Egbe is an Associate Solutions Architect at AWS. He spends his free time researching herbs, teas, superfoods, and how to incorporate them into his daily diet.
Mani Khanuja is an Artificial Intelligence and Machine Learning Specialist SA at Amazon Web Services (AWS). She helps customers use machine learning to solve their business challenges on AWS. She spends most of her time diving deep and teaching customers on AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. She is passionate about ML at the edge, so she has created her own lab with a self-driving kit and a prototype manufacturing production line, where she spends a lot of her free time.