In a previous post, we talked about analyzing and tagging assets stored in Veeva Vault PromoMats using Amazon AI services and the Veeva Vault Platform’s APIs. In this post, we explore how to use Amazon AppFlow, a fully managed integration service that enables you to securely transfer data from software as a service (SaaS) applications like Veeva Vault to AWS. The Amazon AppFlow Veeva connector allows you to connect your AWS environment to the Veeva ecosystem quickly, reliably, and cost-effectively in order to analyze the rich content stored in Veeva Vault at scale.
The Amazon AppFlow Veeva connector is the first Amazon AppFlow connector supporting automatic transfer of Veeva documents. It allows you to choose between the latest version (the Steady State version in Veeva terms) and all versions of documents. Moreover, you can import document metadata.
With a few clicks, you can easily set up a managed connection and choose the Veeva Vault documents and metadata to import. You can further adjust the import behavior by mapping source fields to destination fields. You can also add filters based on document type and subtype, classification, products, country, site, and more. Lastly, you can add validation and manage on-demand and scheduled flow triggers.
You can use the Amazon AppFlow Veeva connector for various use cases, ranging from Veeva Vault PromoMats to other Veeva Vault solutions such as QualityDocs, eTMF, or Regulatory Information Management (RIM). The following are some of the use cases where you can use the connector:
This post focuses on how you can use Amazon AI services in combination with Amazon AppFlow to analyze content stored in Veeva Vault PromoMats, automatically extract tag information, and ultimately feed this information back into the Veeva Vault system. The post discusses the overall architecture, the steps to deploy a solution and dashboard, and a use case of asset metadata tagging. For more information about the proof of concept code base for this use case, see the GitHub repository.
The following diagram illustrates the updated solution architecture.
Previously, in order to import assets from Veeva Vault, you had to write your own custom code logic using the Veeva Vault APIs to poll for changes and import the data into Amazon Simple Storage Service (Amazon S3). This could be a manual, time-consuming process, in which you had to account for API limitations, failures, and retries, as well as scalability to accommodate an unlimited amount of assets. The updated solution uses Amazon AppFlow to abstract away the complexity of maintaining a custom Veeva to Amazon S3 data import pipeline.
As mentioned in the introduction, Amazon AppFlow is an easy-to-use, no-code self-service tool that uses point-and-click configurations to move data easily and securely between various SaaS applications and AWS services. AppFlow allows you to pull data (objects and documents) from supported sources and write that data to various supported destinations. The source or destination could be a SaaS application or an AWS service such as Amazon S3, Amazon Redshift, or Lookout for Metrics. In addition to the no-code interface, Amazon AppFlow supports configuration via API, AWS CLI, and AWS CloudFormation interfaces.
A flow in Amazon AppFlow describes how data is to be moved, including source details, destination details, flow trigger conditions (on demand, on event, or scheduled), and data processing tasks such as checkpointing, field validation, or masking. When triggered, Amazon AppFlow runs a flow that fetches the source data (generally through the source application’s public APIs), runs data processing tasks, and transfers processed data to the destination.
In this example, you deploy a preconfigured flow using a CloudFormation template. The following screenshot shows the preconfigured veeva-aws-connector flow that is created automatically by the solution template on the Amazon AppFlow console.
The flow uses Veeva as a source and is configured to import Veeva Vault component objects. Both the metadata and source files are necessary in order to keep track of the assets that have been processed and push tags back on the correct corresponding asset in the source system. In this situation, only the latest version is being imported, and renditions aren’t included.
The flow’s destination also needs to be configured. In the following screenshot, we define a file format and folder structure for the S3 bucket that was created as part of the CloudFormation template.
Finally, the flow is triggered on demand for demonstration purposes. This can be modified so that the flow runs on a schedule, with a maximum granularity of 1 minute. When triggered on a schedule, the transfer mode changes automatically from a full transfer to an incremental transfer mode. You specify a source timestamp field for tracking the changes. For the tagging use case, we have found that the Last Modified Date setting is the most suitable.
Amazon AppFlow is then integrated with Amazon EventBridge to publish events whenever a flow run is complete.
For better resiliency, the AVAIAppFlowListener AWS Lambda function is wired into EventBridge. When an Amazon AppFlow event is triggered, it verifies that the specific flow run has completed successfully, reads the metadata information of all imported assets from that specific flow run, and pushes individual document metadata into an Amazon Simple Queue Service (Amazon SQS) queue. Using Amazon SQS provides a loose coupling between the producer and processor sections of the architecture and also allows you to deploy changes to the processor section without stopping the incoming updates.
A second poller function (AVAIQueuePoller) reads the SQS queue at frequent intervals (every minute) and processes the incoming assets. For an even better reaction time from the Lambda function, you can replace the CloudWatch rule by configuring Amazon SQS as a trigger for the function.
Depending on the incoming message type, the solution uses various AWS AI services to derive insights from your data. Some examples include:
A DynamoDB table stores all the processed data. The solution uses DynamoDB Streams and Lambda triggers (AVAIPopulateES) to populate data into an OpenSearch Kibana cluster. The AVAIPopulateES function runs for every update, insert, and delete operation that happens in the DynamoDB table, and inserts one corresponding record in the OpenSearch index. You can visualize these records using Kibana.
To close the feedback loop, the AVAICustomFieldPopulator Lambda function has been created. It’s triggered by events in the DynamoDB stream of the metadata DynamoDB table. For every DocumentID in the DynamoDB records, the function tries to upsert tag information into a predefined custom field property of the asset with the corresponding ID in Veeva, using the Veeva API. To avoid inserting noise into the custom field, the Lambda function filters any tags that have been identified with a confidence score lower than 0.9. Failed requests are forwarded to a dead-letter queue (DLQ) for manual inspection or automatic retry.
This solution offers a serverless, pay-as-you-go approach to process, tag, and enable comprehensive searches on your digital assets. Additionally, each managed component has high availability built in by automatic deployment across multiple Availability Zones. For Amazon OpenSearch Service (successor to Amazon Elasticsearch Service), you can choose the three-AZ option to provide better availability for your domains.
For this walkthrough, you should have the following prerequisites:
You use a CloudFormation stack to deploy the solution. The stack creates all the necessary resources, including:
To get started, complete the following steps:
This is the ARN of the IAM role that the AVAIPopulateES function assumes. You use it later to configure access to the Amazon OpenSearch Service domain.
This section walks you through securing your Amazon OpenSearch Service cluster and installing a local proxy to access Kibana securely.
This creates an access policy that grants access to the AVAIPopulateES function and Kibana access from your IP address. For more information about scoping down your access policy, see Configuring access policies.
In 5–8 minutes, the data should start flowing in and entities are created in the Amazon OpenSearch Service cluster. You can now visualize these entities in Kibana. To do this, you use an open-source proxy called aws-es-kibana. To install the proxy on your computer, enter the following code:
aws-es-kibana your_OpenSearch_domain_endpoint
You can find the domain endpoint on the Outputs tab of the CloudFormation stack under ESDomainEndPoint. You should see the following output:
Please refer to the original blogpost.
To avoid incurring future charges, delete the resources when not in use. You can easily delete all resources by deleting the associated CloudFormation stack. Note that you need to empty the created S3 buckets of content in order for the deletion of the stack to be successful.
In this post, we demonstrated how you can use Amazon AI services in combination with Amazon AppFlow to extend the functionality of Veeva Vault PromoMats and extract valuable information quickly and easily. The built-in loop back mechanism allows you to update the tags back into Veeva Vault and enable auto-tagging of your assets. This makes it easier for your team to find and locate assets quickly.
Although no ML output is perfect, it can come very close to human performance and help offset a substantial portion of your team’s efforts. You can use this additional capacity towards value-added tasks, while dedicating a small capacity to check the output of the ML solution. This solution can also help optimize costs, achieve tagging consistency, and enable quick discovery of existing assets.
Finally, you can maintain ownership of your data and choose which AWS services can process, store, and host the content. AWS doesn’t access or use your content for any purpose without your consent, and never uses customer data to derive information for marketing or advertising. For more information, see Data Privacy FAQ.
You can also extend the functionality of this solution further with additional enhancements. For example, in addition to the AI and ML services in this post, you can easily add any of your custom ML models built using Amazon SageMaker to the architecture.
If you’re interested in exploring additional use cases for Veeva and AWS, please reach out to your AWS account team.
Veeva Systems has reviewed and approved this content. For additional Veeva Vault-related questions, please contact Veeva support.
Mayank Thakkar is Head of AI/ML Business Development, Global Healthcare and Life Sciences at AWS. He has more than 18 years of experience in varied industries like healthcare, life sciences, insurance, and retail, specializing in building serverless, artificial intelligence, and machine learning-based solutions to solve real-world industry problems. At AWS, he works closely with big pharma companies around the world to build cutting-edge solutions and help them along their cloud journey. Apart from work, Mayank, along with his wife, is busy raising two energetic and mischievous boys, Aaryan (6) and Kiaan (4), while trying to keep the house from burning down or getting flooded!
Anamaria Todor is a Senior Solutions Architect based in Copenhagen, Denmark. She saw her first computer when she was 4 years old and never let computer science and engineering go ever since. She has worked in various technical roles from full-stack developer, to data engineer, technical lead, and CTO at various Danish companies. Anamaria has a bachelor’s degree in Applied Engineering and Computer Science, a master’s degree in Computer Science, and over 10 years of hands-on AWS experience. At AWS, she works closely with healthcare and life sciences companies in the enterprise segment. When she’s not working or playing video games, she’s coaching girls and female professionals in understanding and finding their path through technology.