The Amazon International Seller Growth (ISG) team runs the Customer Service by Amazon (CSBA) program, which supports over 200,000 third-party Merchant Fulfilled Network (MFN) sellers. Amazon call centers facilitate hundreds of thousands of phone calls, chats, and emails between consumers and Amazon MFN sellers. This large volume of contacts makes it challenging for CSBA to extract the key information from transcripts that helps sellers promptly address customer needs and improve the customer experience. Therefore, it's critical to automatically discover insights from these transcripts, perform theme detection across multiple customer conversations, and surface the themes that represent the top reasons for customer contact, so that customer problems are addressed in the right way and as soon as possible.
This post presents a solution that uses a workflow of AWS AI and machine learning (ML) services to provide actionable insights based on those transcripts. We combine multiple AWS AI/ML services, such as Contact Lens for Amazon Connect and Amazon SageMaker, in a single architecture. This solution was tested with ISG using a small volume of data samples. In this post, we discuss our thought process, how we built this solution, and the outcome of the test. We believe the lessons learned and our journey presented here may help you on your own journey.
The following figure shows the recommended operational landscape, with stakeholders and the business workflow for ISG, so that sellers can stay close to their customers anytime, anywhere. The consumer contacts Customer Service through a contact center platform and engages with a Customer Service Associate (CSA). The transcripts of those contacts then become available to CSBA, which extracts actionable insights for sellers from millions of customer contacts, and the data is stored in the Seller Data Lake. Sellers use the Amazon Seller Central portal to access the analytics outcomes and take action to quickly and effectively address customer problems.
The following diagram shows the architecture reflecting the workflow operations into AI/ML and ETL (extract, transform, and load) services.
The workflow steps are as follows:
In the following sections, we dive deeper into the AI/ML solution and its components.
In this section, we describe our approach for data labeling to identify the contact reason and resolution, and our methodology for keywords extraction for the sellers to perform root cause analysis.
To detect the contact reason from transcripts by ML, we utilized seven Standardized Issue Codes (SICs) as the data labels from the sample data provided by ISG team:
The contact reason labels can be further extended to cover issues previously unknown to the seller; however, those issues were not defined in the SIC. Unlike the contact reason, the contact resolution doesn't have a label associated with the transcripts. The resolution categories were specified by the ISG team, and the resolutions needed to be labeled based on these categories. Therefore, we utilized Amazon SageMaker Ground Truth to create or update labels for each contact.
Ground Truth provides a data labeling service that makes it easy to label data, and gives you the option to use human annotators through Amazon Mechanical Turk, third-party vendors, or your own private workforce. For this solution, the ISG team defined four categories for contact resolution across over 140 transcript documents, which were labeled by Amazon Mechanical Turk contractors:
It took only a couple of hours for the contractors to complete the multi-label text classification of contact resolution for the 140 documents and have the labels reviewed by the customer. In the next step, we build the multi-class classification models, then predict the contact reason and resolution from new call and chat transcripts coming from customer service.
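Ground Truth text classification jobs read their input from a JSON Lines manifest in Amazon S3, where each line holds one document under the `source` key. The following minimal sketch (the file name and sample transcripts are hypothetical) shows how transcripts could be prepared for such a labeling job:

```python
import json

# Hypothetical transcript snippets to be labeled for contact resolution
transcripts = [
    "Customer reported a faulty piece; a refund was issued to the original payment method.",
    "Customer asked about delivery status; the item arrives in 5 business days.",
]

# Each line of the manifest is one JSON object with a "source" key,
# the input format Ground Truth expects for text classification.
with open("resolution-manifest.jsonl", "w") as f:
    for text in transcripts:
        f.write(json.dumps({"source": text}) + "\n")
```

The resulting file would then be uploaded to S3 and referenced when creating the labeling job, with the resolution categories supplied as the label set.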
Another challenge is to extract keywords from the transcripts that can guide MFN sellers toward specific actions. In this example, the seller needs to capture key information such as product information, critical timeline, problem details, and the refund offered by the CSA, which may not be clearly stated in the transcript. Here we built a custom key phrase extraction model in SageMaker using the RAKE (Rapid Automatic Keyword Extraction) algorithm, following the process shown in the following figure. RAKE is a domain-independent keyword extraction algorithm that determines key phrases by analyzing the frequency of word appearance and its co-occurrence with other words in the text.
After the standard document preprocessing, RAKE detects the most relevant key words and phrases from the transcript documents. The output is listed as follows:
```
[('im amazons chat helper .. im', 0.08224299065420558),
 ('jun 23 .. could', 0.041588785046728964),               <== timeline
 ('original payment method please', 0.04112149532710279),  <== resolution: refund
 ('amazon gift card balance', 0.04112149532710279),        <== resolution: refund
 ('previous conversation .. let', 0.04018691588785046),
 ('faulty pieces would like', 0.036448598130841114),       <== call reason: faulty piece
 ('nice day !:)..', 0.025233644859813078),
 ('dual fuel gas', 0.025233644859813078),                  <== call reason: product info
 ('customer service hub', 0.025233644859813078),
 ('5 business days', 0.025233644859813078),                <== timeline
 ('try .. got', 0.02476635514018691),
 ('right place ..', 0.023364485981308407),
 ('item .. let', 0.023364485981308407),
 ('youd like help', 0.02242990654205607),
 ('think would help', 0.02242990654205607),
 ('search help pages', 0.02242990654205607),
 ('gbc1793w ). order', 0.02242990654205607),               <== call reason: product info
 ('moment .. ok', 0.021962616822429903),
 ('charcoal combo grill', 0.021028037383177565),           <== call reason: product info
 ('call back ..', 0.021028037383177565),
 ('yes please anything', 0.020093457943925228),
 ('chat ..', 0.014953271028037382),
 ('would love', 0.014018691588785043),
 ('looks like', 0.014018691588785043),
 ('bent pieces', 0.013084112149532708)]                    <== call reason: faulty details
```
This method captured key phrases with high relevance scores on the critical information such as timeline (“June 23”), refund resolution (“Amazon gift card,” “in 5 business days”), product information (“charcoal combo grill,” “dual fuel gas,” “gbc1793w”) and problem details (“faulty piece,” “bent pieces”). These insights not only tell the seller that this customer has been taken care of by getting a refund, but also guide the seller to further investigate the gas grill product defect and avoid having similar issues for other customers.
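To illustrate how RAKE produces scored phrases like those above, the following self-contained sketch implements the core idea in plain Python: candidate phrases are maximal runs of non-stopwords, each word is scored by its degree (co-occurrence within phrases) divided by its frequency, and a phrase's score is the sum of its word scores. The stopword list here is a small illustrative subset; a real setup would use a full list and also split candidates at punctuation.

```python
import re
from collections import defaultdict

# Minimal illustrative stopword list (a real setup uses a much fuller one)
STOPWORDS = {"i", "a", "an", "the", "and", "or", "of", "to", "in", "is",
             "it", "you", "my", "me", "for", "on", "with", "that", "this"}

def rake(text, top_n=5):
    """Minimal RAKE sketch: split into candidate phrases at stopwords,
    then score each phrase by summed word degree/frequency ratios."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Candidate phrases: maximal runs of consecutive non-stopwords
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # Word scores: degree (length of phrases containing the word) / frequency
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for w in phrase:
            freq[w] += 1
            degree[w] += len(phrase)
    scores = {w: degree[w] / freq[w] for w in freq}
    ranked = sorted(
        ((" ".join(p), sum(scores[w] for w in p)) for p in phrases),
        key=lambda item: -item[1],
    )
    return ranked[:top_n]
```

For example, `rake("the charcoal combo grill arrived with bent pieces")` ranks the longer product phrase above the shorter defect phrase, mirroring how longer co-occurring runs score higher in the output shown earlier.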
Contact Lens generated transcripts, contact summaries, and sentiments for the call and chat samples collected from ISG Customer Service. Throughout the testing, the transcription and sentiment scores were as accurate as expected. Along with known issues, the ISG team also looks to detect unknown issues from transcripts to meet seller-specific needs, such as delivery problems, product defects, the resolutions provided by the contact, and issues or key phrases leading to a return or refund.
To address this challenge, we extended our tests through custom models on SageMaker. Given the size of the dataset and the samples, our experience pointed to "bag-of-words" based, more conventional (non-deep learning) models on SageMaker.
We performed the contact reason classification modeling following the three steps on SageMaker as shown in the following figure.
The steps are as follows:
We tested three algorithms aiming to obtain the best-performing model:
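To make the bag-of-words representation concrete, the following pure-Python sketch builds fixed-length count vectors from transcripts. The sample documents and vocabulary size are hypothetical; in our experiments we used 750 and 1,500 features, and in practice a library vectorizer would be used instead.

```python
import re
from collections import Counter

def build_vocab(docs, max_features):
    """Keep the most frequent tokens across the corpus as the feature set."""
    counts = Counter(t for d in docs for t in re.findall(r"[a-z0-9]+", d.lower()))
    return [w for w, _ in counts.most_common(max_features)]

def vectorize(doc, vocab):
    """Map one document to a fixed-length vector of token counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", doc.lower()))
    return [counts[w] for w in vocab]

# Hypothetical transcript snippets
docs = [
    "refund issued to the original payment method",
    "faulty piece refund requested",
]
vocab = build_vocab(docs, max_features=6)
X = [vectorize(d, vocab) for d in docs]  # feature matrix fed to the classifier
```

Each row of `X` then serves as the feature vector for one transcript when training the classifiers above.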
For our model training through AutoGluon, we used the MultilabelPredictor method from the AutoGluon library. This predictor performs multi-label prediction for tabular data. We used the sample notebook from AWS samples on GitHub, starting with importing the AutoGluon libraries and defining the class for MultilabelPredictor(). To save space, we don't show those lines in the following code snippet; you can copy that part from the sample notebook. We performed training on the file train.csv in our S3 bucket (your_path_to_s3/train.csv), specified the column used for the label, and ran model training through MultilabelPredictor.
```python
train_data = TabularDataset('your_path_to_s3/train.csv')
subsample_size = 106  # the sample size for training
train_data = train_data.sample(n=subsample_size, random_state=0)
labels = ['label']  # column to predict based on the others
problem_types = ['multiclass']  # type of each prediction problem
save_path = 'your_save_path_to_results'  # the path to your S3 bucket for storing results
time_limit = 60  # number of seconds to train the TabularPredictor for each label

multi_predictor = MultilabelPredictor(labels=labels, problem_types=problem_types, path=save_path)
multi_predictor.fit(train_data, time_limit=time_limit)
```
The following table lists the AI/ML services and models, and summarizes the accuracy.
| Dataset | Transcripts | Features | Linear Learner | XGBoost with HPO | AutoGluon |
| --- | --- | --- | --- | --- | --- |
| Validation set | 11 | 750 | 0.91 | 0.82 | 0.82 |
| Validation set | 11 | 1500 | 0.82 | 0.82 | 0.91 |
| Testing set | 34 | 750 | 0.71 | 0.71 | 0.74 |
| Testing set | 34 | 1500 | 0.65 | 0.65 | 0.82 |
The following charts summarize the accuracy for the sample set based on the number of features.
In the following charts, we observed that tree-based ensemble models, such as LightGBM, XGBoost, and Random Forest, were better choices for this type of problem for both the 750-feature and 1,500-feature models. The neural network model ranked lower among the 13 models, which confirmed our expectation that deep learning might not be suitable for our case.
With AWS AI/ML services, we can provide accurate and efficient contact reason and contact resolution detection and other actionable insights for Amazon International Seller Growth. MFN sellers can use these insights to better understand consumer problems, and take effective actions to resolve Amazon consumers’ issues, while also optimizing their process and costs.
You can tailor the solution for your contact center by developing your own custom model in SageMaker, and feeding the call and chat transcripts for training and inference. You could also apply this solution for general theme detection to analyze customer conversations in your contact center.
Yunfei Bai is a Senior Solutions Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business results. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei holds a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.
Burak Gozluklu is a Principal ML Specialist Solutions Architect located in Boston, MA. Burak has over 15 years of industry experience in simulation modeling, data science, and ML technology. He helps global customers adopt AWS technologies, especially AI/ML solutions, to achieve their business objectives. Burak holds a PhD in Aerospace Engineering from METU and an MS in Systems Engineering, and completed a postdoc in system dynamics at MIT in Cambridge, MA. Burak is passionate about yoga and meditation.
Chelsea Cai is a Senior Product Manager at Amazon's International Seller Growth (ISG) organization, where she works on the Customer Service by Amazon (CSBA) service, helping 3P sellers improve their customer service and CX through Amazon CS technology and worldwide organizations. In her spare time, she likes philosophy, psychology, swimming, hiking, good food, and spending time with her family and friends.
Abhishek Kumar is a Senior Product Manager at Amazon’s International Seller Growth (ISG) organization, where he develops software platforms and applications to help global 3P sellers manage their Amazon business. In his free time, Abhishek enjoys traveling, learning Italian, and exploring European cultures and cuisines with his extended Italian family.