Implementing an effective fraud prevention system is one of the top priorities for businesses that operate online web or mobile platforms. Businesses report millions of dollars of lost revenue each year due to fraud. Platform abuse and fraud prevention remain largely reactive, relying on studying a user’s profile behavior and transaction history after they sign up. This approach is often manual, time-consuming, and expensive. Early detection and prevention of fraudulent account sign-ups on online platforms using artificial intelligence (AI) is an effective defense mechanism for combating fraud and abuse.
This post shows how you can use Amazon Fraud Detector in real time along with Amazon Cognito custom authentication workflows to prevent fake account sign-ups. Amazon Fraud Detector is a fully managed service that can identify potentially fraudulent online activities, such as creation of fake accounts or online payment fraud. Plus, you can use it without the need for any prior machine learning (ML) expertise. Unlike general-purpose ML packages, Amazon Fraud Detector is designed specifically to detect fraud.
Amazon Cognito lets you add user sign-up, sign-in, and access control to your web and mobile applications quickly and easily. It’s serverless, and can scale up to millions of users. I also discuss how you can use Amazon Pinpoint to track user sign-up flow events via user journeys and categorize users into segments. This is useful for user profiles and activity analysis in order to run effective marketing or promotional campaigns while maintaining a quality user experience.
In its general design, the solution uses an Amazon Fraud Detector supervised ML model along with a customized Amazon Cognito sign-up workflow to implement a real-time new user fraud prevention mechanism for online web and mobile applications. It also uses Amazon DynamoDB and AWS Lambda to customize the Amazon Cognito sign-up workflow. The following diagram illustrates the high-level architecture.
Amazon Fraud Detector Online Fraud Insights is a supervised ML model designed to detect a variety of online fraud. You can use Online Fraud Insights to detect fraudulent accounts during the sign-up process. The model generates a model score between 0 and 1,000. The higher the score, the higher the risk of the new account being fraudulent.
Because it’s a supervised ML model, your model accuracy may vary depending on the quality and maturity of the labeled training data. The training dataset must contain at least two features in addition to two mandatory fields: EVENT_TIMESTAMP and EVENT_LABEL. Using more features may help achieve higher model accuracy and lower false positive rates. Amazon Fraud Detector provides information on the importance of the features used in training the model, which is useful for addressing model overfitting or underfitting. The training dataset can be prepared with data from an existing fraud prevention system by following the data preparation guidance. In this case, the Amazon Fraud Detector model is trained with a labeled dataset with the following features.
| Feature | Description |
| --- | --- |
| ip_address | User’s public IP address |
| email_address | User’s email address |
| user_agent | The User-Agent request header value |
| billing_state | State of the user’s billing address |
| billing_postal | User’s zip or postal code |
| billing_address | User’s billing address |
| phone_number | User’s phone number |
| EVENT_TIMESTAMP | Timestamp of the event (required) |
| EVENT_LABEL | Label of the event, fraud or legitimate (required) |
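Before uploading a training file, it can help to sanity-check its header against the requirements described above. The following is a minimal sketch, assuming the training data is a CSV file with a header row; the function name `check_training_header` is hypothetical and not part of any AWS SDK.

```python
import csv
import io

# The two mandatory fields named in the text.
REQUIRED_COLUMNS = {"EVENT_TIMESTAMP", "EVENT_LABEL"}
# The text asks for at least two features besides the mandatory fields.
MIN_VARIABLES = 2


def check_training_header(csv_text):
    """Return a list of problems found in the CSV header (empty if none)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    columns = [name.strip() for name in header]

    problems = []
    missing = REQUIRED_COLUMNS - set(columns)
    if missing:
        problems.append("missing required columns: " + ", ".join(sorted(missing)))

    variables = [c for c in columns if c and c not in REQUIRED_COLUMNS]
    if len(variables) < MIN_VARIABLES:
        problems.append(
            f"need at least {MIN_VARIABLES} variables, found {len(variables)}"
        )
    return problems
```

An empty list means the header meets the minimum column requirements; it says nothing about the quality of the rows themselves.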
Amazon Fraud Detector also provides a way to define rules that tell the detector how to interpret the inference outcome. These rules can be defined using the rule language. A set of three specific rules is defined for this solution:
You can define fewer or additional rules depending on the use case and the overall model accuracy. For the purposes of this solution, I defined three distinct user sign-up flows depending on which rule the model score outcome conforms to:
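As a rough sketch of the idea, the following Python maps a model score to one of three outcomes and then to a sign-up flow. The thresholds (700 and 900) and the outcome and flow names are illustrative assumptions, not the values configured in this solution. In Amazon Fraud Detector itself, rules are written in the rule language rather than in Python.

```python
# Hypothetical thresholds on the 0-1,000 model score; tune for your own model.
HIGH_RISK_THRESHOLD = 900
LOW_RISK_THRESHOLD = 700

# Hypothetical mapping from rule outcome to sign-up flow.
SIGN_UP_FLOWS = {
    "low_risk": "proceed",
    "medium_risk": "add_friction",
    "high_risk": "deny",
}


def classify_score(score):
    """Map a model score (0-1,000, higher means riskier) to an outcome name."""
    if not 0 <= score <= 1000:
        raise ValueError("model score must be between 0 and 1000")
    if score > HIGH_RISK_THRESHOLD:
        return "high_risk"
    if score < LOW_RISK_THRESHOLD:
        return "low_risk"
    return "medium_risk"
```

For example, `SIGN_UP_FLOWS[classify_score(800)]` selects the flow that adds friction, because 800 falls between the two assumed thresholds.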
A fraud attack vector is a mechanism by which bad actors obtain fraudulent access to an application in order to exploit the system. The most common fraud attack vector is sign-up attempts that use synthetic identities, for example through disposable emails or email tumbling. These methods involve using a unique email address for every sign-up attempt. Fraudulent sign-up attempts are carried out by individual users, groups of users, or automated systems (bots). A more sophisticated fraud attack vector involves collusive behavior, also known as collusion fraud. In this scenario, a group of users gains access to the system and performs transactions in coordination with each other to game the system to their advantage.
Disposable email address domains can be identified by maintaining a list of known disposable email address domains in a DynamoDB table, and validating the email address against that list. Fraud graphs with Amazon Neptune provide a way to identify email tumbling and collusion fraud. Neptune is a fast, reliable, and fully managed graph database that can store fraud graphs and find relationships between the new user and existing users. With fraud graphs, you can use commonalities between user profiles such as the same postal address, phone numbers, and IP addresses to detect email tumbling or collusion fraud attempts. The following diagram shows an example of this process.
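The two checks described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: a Python set stands in for the DynamoDB table of disposable domains, and a list of dictionaries stands in for the Neptune fraud graph. The function names and the `user_id` key are hypothetical.

```python
# Attributes whose shared values suggest a link between two profiles,
# taken from the commonalities mentioned in the text.
LINK_ATTRIBUTES = ("billing_address", "phone_number", "ip_address")


def email_domain(email):
    """Return the lowercased domain part of an email address."""
    return email.rsplit("@", 1)[-1].strip().lower()


def is_disposable(email, disposable_domains):
    """Check the email's domain against a set of known disposable domains."""
    return email_domain(email) in disposable_domains


def find_linked_users(new_user, existing_users):
    """Return {user_id: [shared attribute names]} for profiles that share
    at least one non-empty attribute value with the new user."""
    links = {}
    for user in existing_users:
        shared = [
            name
            for name in LINK_ATTRIBUTES
            if new_user.get(name) and new_user.get(name) == user.get(name)
        ]
        if shared:
            links[user["user_id"]] = shared
    return links
```

A new profile that shares a phone number or IP address with several existing accounts registered under different email addresses is a candidate for email tumbling or collusion review. A graph database makes the same lookup practical across many hops and large user bases.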
Amazon Cognito manages user sign-up and sign-in through a user directory known as a user pool. User pools let you customize authentication workflows using Lambda triggers. To customize a user pool workflow, you can create Lambda functions that are invoked by Amazon Cognito during various phases of the workflow. These functions can implement functionalities such as introducing authentication challenges, validating emails, sending confirmation messages, and other custom logic.
This solution uses the Amazon Cognito pre sign-up Lambda trigger to implement a real-time fraud detection system. The Lambda trigger is invoked before Amazon Cognito performs a new user sign-up, which lets us run validations and store the user information and Amazon Fraud Detector rule outcome in a DynamoDB table. Because the function lets us run custom logic, we can also include validation of disposable emails or tumbling email addresses and subsequently assess the risk level of the user based on the rule outcome. The pre sign-up Lambda trigger lets us determine if the sign-up process should proceed normally, if additional validation steps (friction) should be introduced, or if the sign-up request should be denied.
The following diagram illustrates the logical flow of this function.
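The logical flow of such a trigger can be sketched roughly as follows. This is not the code from the GitHub repository. The detector, event type, entity type, and outcome names are hypothetical, `fraud_client` is assumed to be a `boto3.client("frauddetector")` (or any object with a compatible `get_event_prediction` method), and `save_outcome` is a hypothetical callable standing in for the DynamoDB write.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical resource and outcome names; replace with your own.
DETECTOR_ID = "new_user_detector"
EVENT_TYPE = "new_user_sign_up"
ENTITY_TYPE = "customer"
DENY_OUTCOME = "high_risk"


class SignUpDenied(Exception):
    """Raising any exception from the trigger makes Cognito reject the sign-up."""


def build_prediction_request(attributes, now=None):
    """Build keyword arguments for the GetEventPrediction API."""
    now = now or datetime.now(timezone.utc)
    variables = {
        "email_address": attributes.get("email"),
        "phone_number": attributes.get("phone_number"),
    }
    return {
        "detectorId": DETECTOR_ID,
        "eventId": str(uuid.uuid4()),
        "eventTypeName": EVENT_TYPE,
        "eventTimestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "entities": [{"entityType": ENTITY_TYPE, "entityId": "unknown"}],
        # Only send variables that are actually present.
        "eventVariables": {k: v for k, v in variables.items() if v},
    }


def handle_pre_sign_up(event, fraud_client, save_outcome, disposable_domains):
    """Validate the email, get a prediction, record it, then allow or deny."""
    attributes = event["request"]["userAttributes"]
    email = attributes.get("email", "")

    # Step 1: reject known disposable email domains outright.
    if email.rsplit("@", 1)[-1].lower() in disposable_domains:
        raise SignUpDenied("Sign-up denied")

    # Step 2: ask Amazon Fraud Detector for the rule outcomes.
    result = fraud_client.get_event_prediction(**build_prediction_request(attributes))
    outcomes = [
        outcome
        for rule in result.get("ruleResults", [])
        for outcome in rule.get("outcomes", [])
    ]

    # Step 3: record the outcome so that later steps can consult it.
    save_outcome(email, outcomes)

    # Step 4: deny high-risk sign-ups; otherwise return the event unchanged.
    if DENY_OUTCOME in outcomes:
        raise SignUpDenied("Sign-up denied")
    return event
```

An actual Lambda handler would take `(event, context)`, create the clients once outside the handler, and delegate to a function like this one. Passing the client and the storage callable in as arguments keeps the decision logic testable without AWS access.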
Amazon Pinpoint enables businesses to communicate with their customers using popular channels like email, SMS, voice, and push notifications. With Amazon Pinpoint, you can also create segments of marketing campaign audiences. Without early fraud prevention for sign-ups, businesses must analyze all user profiles with the same lens. Findings of such analyses are then used to create appropriate audience segments for new user marketing or promotional campaigns. This approach often introduces overhead that takes time away from effectively engaging with customers, especially when dealing with large volumes of user data. For example, businesses may want to run marketing and promotional campaigns for new users with low sign-up risk scores.
Events within the Amazon Cognito sign-up flow can also be sent to Amazon Pinpoint so businesses can create customer journeys. An Amazon Pinpoint journey, as illustrated in the following diagram, is a multi-step engagement experience that can be tailored to fit the overall marketing strategy of the business.
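One way to send a sign-up event is the Amazon Pinpoint PutEvents API. The sketch below only builds the request payload. The event type name, the `risk_outcome` attribute, and the choice of the email channel are illustrative assumptions, not details taken from this solution.

```python
import uuid
from datetime import datetime, timezone


def build_sign_up_event(endpoint_id, email, event_type, risk_outcome, now=None):
    """Build an EventsRequest payload for the Amazon Pinpoint PutEvents API."""
    now = now or datetime.now(timezone.utc)
    return {
        "BatchItem": {
            endpoint_id: {
                # The endpoint identifies who the event is about.
                "Endpoint": {"ChannelType": "EMAIL", "Address": email},
                "Events": {
                    # Each event in the batch needs its own unique key.
                    str(uuid.uuid4()): {
                        "EventType": event_type,
                        "Timestamp": now.isoformat(),
                        # Hypothetical custom attribute, usable for segmentation.
                        "Attributes": {"risk_outcome": risk_outcome},
                    }
                },
            }
        }
    }
```

With boto3, the payload would be passed as `boto3.client("pinpoint").put_events(ApplicationId=..., EventsRequest=payload)`, where the application ID is that of your own Amazon Pinpoint project.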
Online web and mobile platforms may evolve based on changing business needs. Businesses may expand to new geographic locations, letting users sign up from uniquely different email domains and IP addresses. The online platform may start letting users sign up using their phone numbers. In such cases, it becomes important that the Online Fraud Insights model is retrained with a more recent dataset in order to minimize biased prediction outcomes.
You can retrain a new version of the Amazon Fraud Detector model by using the data captured in DynamoDB. Data from the DynamoDB table can be exported to Amazon Simple Storage Service (Amazon S3) using DynamoDB table export. The data in Amazon S3 can then be formatted using the data preparation guidance for Amazon Fraud Detector training data. When the retraining data is ready, a new Amazon Fraud Detector model version can be trained.
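The formatting step can be sketched as follows, assuming the table was exported in DynamoDB JSON format (one `{"Item": {...}}` object per line, after decompressing the exported files). It also assumes the table stores scalar attributes under the same names as the training columns. Both are assumptions about your table design rather than details from this solution.

```python
import csv
import io
import json

# Column order for the training file, matching the feature table above.
COLUMNS = [
    "ip_address", "email_address", "user_agent", "billing_state",
    "billing_postal", "billing_address", "phone_number",
    "EVENT_TIMESTAMP", "EVENT_LABEL",
]


def unwrap(attribute):
    """Unwrap a scalar DynamoDB JSON attribute such as {"S": "x"} or {"N": "1"}."""
    (type_tag, value), = attribute.items()
    return "" if type_tag == "NULL" else str(value)


def export_lines_to_csv(lines):
    """Convert DynamoDB JSON export lines into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)["Item"]
        # Keep only the training columns; missing ones are written as empty.
        writer.writerow({c: unwrap(item[c]) for c in COLUMNS if c in item})
    return out.getvalue()
```

Rows still need correct labels and timestamps in the format expected by the data preparation guidance before they are useful for retraining; this sketch only reshapes the export.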
To demonstrate the solution, we trained an Amazon Fraud Detector model using a fictitious, synthetically generated sample dataset. We used an Amazon Cognito user pool custom authentication workflow to define the three different flows based on each of the Amazon Fraud Detector rule outcomes.
The following diagram shows the sign-up flow events. The Amazon Fraud Detector Online Fraud Insights ML model evaluates either a low risk or high risk outcome for the new user.
Let’s walk through the flow:
The following diagram shows the sign-up flow events where the Online Fraud Insights ML model evaluates a medium risk outcome for the new user. In this case, friction is introduced in the sign-up flow by means of additional identity verification.
To do a walkthrough of this flow, let’s assume that the new user sign-up has passed the disposable and tumbling email validation checks in the pre sign-up Lambda trigger.
The starter code for setting up this real-time sign-up flow using Amazon Cognito and the Amazon Fraud Detector GetEventPrediction API is available on GitHub. For this walkthrough, you must have the following prerequisites:
To get started with setting up and testing Amazon Fraud Detector, complete the following steps:
Detailed step-by-step instructions on how to deploy the custom sign-up workflow are available in the GitHub repository. The repository consists of an AWS Cloud Development Kit (AWS CDK) application that deploys all the necessary AWS resources. The high-level steps are as follows:
You can use Amazon Cognito APIs via the AWS SDK (available for JavaScript, Java, .NET, and other languages) and use API Gateway endpoints as REST endpoints to configure the sign-up or registration flow in your web or mobile app. Alternatively, you can use the AWS Amplify SDK Auth, API, and Analytics modules to integrate Amazon Cognito, API Gateway, and Amazon Pinpoint with your application.
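As an illustration of calling the Amazon Cognito SignUp API from the SDK, the sketch below builds the request arguments in Python. It assumes the user pool accepts the email address as the username. Passing the IP address and user agent through `ClientMetadata` is one possible design for making them available to the pre sign-up trigger, not necessarily the one used in the repository.

```python
def build_sign_up_request(client_id, email, password, phone_number=None,
                          ip_address=None, user_agent=None):
    """Build keyword arguments for the Amazon Cognito SignUp API."""
    attributes = [{"Name": "email", "Value": email}]
    if phone_number:
        attributes.append({"Name": "phone_number", "Value": phone_number})

    # ClientMetadata is forwarded to the pre sign-up Lambda trigger.
    # The key names here are hypothetical.
    metadata = {}
    if ip_address:
        metadata["ip_address"] = ip_address
    if user_agent:
        metadata["user_agent"] = user_agent

    request = {
        "ClientId": client_id,
        "Username": email,  # assumes email is allowed as the username
        "Password": password,
        "UserAttributes": attributes,
    }
    if metadata:
        request["ClientMetadata"] = metadata
    return request
```

With boto3, the call would be `boto3.client("cognito-idp").sign_up(**request)`. A denial raised by the pre sign-up trigger surfaces to the caller as an error from this call.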
To avoid incurring future charges, delete the resources created for the solution.
This post demonstrated how you can implement a real-time fraud prevention system by preventing fake account creation with AI using Amazon Fraud Detector. I discussed how to mitigate different fraud attack vectors by customizing authentication workflows in Amazon Cognito using Lambda functions. This solution helps businesses take steps towards building an AI-powered fraud prevention system for their web and mobile platforms. Fully managed AWS services such as Amazon Fraud Detector, Amazon Cognito, and Amazon Pinpoint help make the solution cost-effective by reducing operational overhead. This solution is also customizable to support mitigation of emerging fraud attack vectors. Early fraud prevention helps reduce the time businesses spend analyzing user behavior to identify fraud in their platforms, so they can focus more on driving business value. To learn more about how Amazon Fraud Detector can help your business, visit the webpage!
Anjan Biswas is a Senior Solutions Architect with a focus on AI/ML, Data Analytics, and enterprise applications. Anjan works with enterprise customers and is passionate about developing, deploying, and explaining AI/ML, Data Analytics, and Big Data solutions. Anjan has over 14 years of experience working with global supply chain, manufacturing, and retail organizations and is actively helping customers get started and scale on AWS.