Unlock patient data insights using Amazon HealthLake

AWS just announced the General Availability of Amazon HealthLake, a HIPAA-eligible service for healthcare providers, health insurance companies, and pharmaceutical companies to securely store, transform, query, analyze, and share health data in the cloud at petabyte scale. We believe that the combination of innovation trends in healthcare (such as reimbursement models around data-driven evidence), standardization around interoperability (such as federal and global incentives and mandates in adopting the Fast Healthcare Interoperability Resources standard, or FHIR), and advances in scientific methods (such as deep learning) enables our healthcare and life sciences (HCLS) customers to improve clinical and research efforts.

Health data is complex

Over the past decade, we’ve witnessed a digital transformation, with healthcare organizations capturing huge volumes of patient information in electronic medical records (EMRs) every day. This makes the medical record a source of big data containing information about sociodemographics, medical conditions, genetics, and treatments. Making sense of all this data provides the biggest opportunity to transform care by tailoring disease treatment and prevention to individuals and populations. This so-called precision medicine takes into account each person’s variability in genes, environment, and lifestyle.

However, health data is complex. Our HCLS customers want to organize, analyze, manage, and use this information to improve the overall quality of care while ensuring they run sustainable business operations. But the human ability to process this data without effective decision support is finite. Much of this data is semi-structured or unstructured (such as clinical notes), so it needs to be extracted and transformed before it can be searched and analyzed.


Although all this data is in digital format, it’s locked into thousands of different incompatible data formats. It doesn’t get exposed through modern APIs or microservices. It’s often siloed for business reasons. This lack of interoperability inhibits the sharing of data across providers, so pieces of the puzzle can go unseen and potentially affect patient health. Health organizations are compelled to adopt the FHIR data specification, as mandated in the US by the 21st Century Cures Act.

In 2020, the US government through the Centers for Medicare and Medicaid Services (CMS) proposed new regulations that require the use of specific interoperable data standards and API frameworks based on the HL7 FHIR specification as the mandated standard for all healthcare data (fully effective by end of 2022).

FHIR provides a set of standardized specifications and consistent technical requirements that serve as a foundation for data sharing and interoperability. Partners like Diameter Health, HealthLX, Redox, and InterSystems have built connectors that enable organizations to translate legacy clinical data (HL7, CSV, CCA) for ingestion into Amazon HealthLake as standardized FHIR R4 records.

Amazon HealthLake seamlessly transforms semi-structured or unstructured content, which constitutes the majority of medical data, with integrated medical natural language processing (NLP) using machine learning (ML) models that have been trained to understand and extract meaningful information. For example, you can extract medications and conditions from patient progress notes, then map these medical concepts to the appropriate ontology codes (such as ICD-10 or RxNorm) with high accuracy and low latency. Amazon HealthLake then structures this data as a FHIR extension, thereby enriching the Amazon HealthLake record.
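To make the idea of enriching a record with a FHIR extension concrete, the following is a minimal sketch in plain Python. It assumes NLP output is already available as a list of dictionaries; the entity layout, the `enrich` and `to_extension` helpers, and the extension URL are hypothetical and chosen only for illustration. They are not the structure that Amazon HealthLake itself writes. The code-system URIs are the standard FHIR identifiers for ICD-10-CM and RxNorm, and the sample codes are illustrative.

```python
# Hypothetical extension URL: a placeholder for illustration only, not the
# URL that Amazon HealthLake uses for its own NLP extensions.
NLP_EXTENSION_URL = "http://example.org/fhir/StructureDefinition/nlp-inferred-code"

# Standard FHIR code-system URIs for the two ontologies named in the text.
CODE_SYSTEMS = {
    "ICD-10-CM": "http://hl7.org/fhir/sid/icd-10-cm",
    "RxNorm": "http://www.nlm.nih.gov/research/umls/rxnorm",
}


def to_extension(entity):
    """Wrap one extracted entity as a FHIR extension holding a CodeableConcept."""
    return {
        "url": NLP_EXTENSION_URL,
        "valueCodeableConcept": {
            "coding": [{
                "system": CODE_SYSTEMS[entity["ontology"]],
                "code": entity["code"],
                "display": entity["display"],
            }],
            "text": entity["text"],  # the phrase as it appeared in the note
        },
    }


def enrich(resource, entities):
    """Return a copy of the resource with one extension per extracted entity."""
    enriched = dict(resource)
    existing = list(resource.get("extension", []))
    enriched["extension"] = existing + [to_extension(e) for e in entities]
    return enriched


# Made-up NLP output for a progress note (assumed input shape).
entities = [
    {"text": "congestive heart failure", "ontology": "ICD-10-CM",
     "code": "I50.9", "display": "Heart failure, unspecified"},
    {"text": "furosemide 40 mg", "ontology": "RxNorm",
     "code": "4603", "display": "furosemide"},
]

note = {"resourceType": "DocumentReference", "id": "note-1"}
enriched_note = enrich(note, entities)
```

The original resource is left untouched and the codes travel with the record, so downstream queries can filter on `coding.system` and `coding.code` without re-running extraction.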

The highly nested design of FHIR captures data across table hierarchies, including custom extensions, which makes traditional query methods (such as SQL) impractical to apply directly to FHIR data. Querying it typically requires complex transforms, which can impede business outcomes. Amazon HealthLake addresses this and enables you to apply advanced analytics using Amazon QuickSight or build ML models with Amazon SageMaker.
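To illustrate the kind of transform that nested FHIR data needs before tabular tools can work with it, here is a small sketch that flattens a resource into a single row with dotted column names. The `flatten` helper and the sample `Condition` resource are assumptions made for this example; this is not how Amazon HealthLake performs the transformation internally.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Leaf value: record it under the path built so far.
        flat[prefix] = value
    return flat


# A trimmed, made-up FHIR R4 Condition resource.
condition = {
    "resourceType": "Condition",
    "id": "cond-1",
    "subject": {"reference": "Patient/123"},
    "code": {
        "coding": [
            {"system": "http://hl7.org/fhir/sid/icd-10-cm", "code": "I50.9"}
        ]
    },
}

row = flatten(condition)
for column, cell in row.items():
    print(column, "=", cell)
```

Even this tiny resource produces a column such as `code.coding[0].code`. Repeating elements and extensions multiply those columns quickly, which is why hand-written flattening logic tends to become hard to maintain. Note that empty lists and objects are simply dropped by this sketch.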

Making sense of the data

Patient outcome predictions can be used by healthcare providers, payers, and pharmaceutical companies interested in making recommendations for early intervention to improve outreach communication, improve the patient care experience, or reduce overall cost. Let’s walk through some examples.

Create a complete view of a patient’s medical history

Amazon HealthLake creates a complete view of each patient’s medical history and structures it in the FHIR standard format to facilitate the exchange of information, consequently improving overall quality of care. You can use SageMaker to build patient outcome prediction applications with MIMIC-III data stored in Amazon HealthLake, coupled with a lightweight application for visualization and interpretability. The prediction target for this example is mortality within 90 days after ICU discharge, but you can easily modify the target variable to suit your use case.

The referenced blog post provides examples of how to process, analyze, and create clinical dashboards using either a preloaded dataset from Synthea (supported in FHIR version R4) or the MIMIC-III (Medical Information Mart for Intensive Care III) data. MIMIC-III is a large, freely available database comprising de-identified health-related data associated with over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.

Create better predictions

Amazon HealthLake presents medical events in chronological order so that you can look at trends like disease progression over time. Healthcare organizations can build on this by creating ML models with SageMaker to unlock novel insights, find patterns, and identify anomalies in their data. Amazon HealthLake helps healthcare organizations analyze population health trends, outcomes, and costs with ML and analytics tools like QuickSight, Jupyter notebooks in SageMaker, and knowledge graph models in Amazon Neptune. The post Building predictive disease models using Amazon SageMaker with Amazon HealthLake normalized data shows how additional patient attributes embedded in unstructured medical notes affect the accuracy of these predictions. The authors started with structured data resources extracted from the MIMIC-III dataset (demographics, vital signs, and medications), then augmented them with unstructured data extracted and normalized from clinical notes to test and compare the performance of two disease prediction models.

In the first method, a binary disease classification model is used to demonstrate patient cohort clustering for predicting congestive heart failure (CHF). CHF is a major healthcare concern, with nearly 1 million new cases in the US each year and direct and indirect healthcare costs exceeding $30 billion annually in the US alone.

In the second method, k-means clustering and Principal Component Analysis (PCA) are used to identify six cohort groups of patients diagnosed with sepsis. Patient assessment in the context of clinically similar peers informs timely treatment decisions, which lead to optimized healthcare and an overall improvement in patient care. This level of accuracy and precision can support practitioners’ efforts to implement risk factor reduction strategies and can help researchers systematically evaluate interventions to potentially delay or avert development of diseases with high mortality, morbidity, and significant costs.
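The general shape of a PCA-then-k-means pipeline can be sketched with the Python standard library alone. This is an illustrative toy under stated assumptions, not the implementation from the referenced post: the feature vectors are made up, PCA is computed by power iteration with deflation rather than a library routine, and the example asks for two clusters instead of six to keep the data small. The `pca_project` and `kmeans` names are hypothetical.

```python
import random


def pca_project(rows, n_components=2, iters=200):
    """Project rows onto their top principal components (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [1.0 / (i + 1) for i in range(d)]  # fixed, deterministic start
        start_norm = sum(x * x for x in v) ** 0.5
        v = [x / start_norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:
                break  # no variance left to capture
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up, already-scaled feature vectors for eight patients.
patients = [
    [0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0],
    [10, 10, 10, 10], [11, 10, 11, 10], [10, 11, 10, 11], [11, 11, 10, 10],
]
reduced = pca_project(patients, n_components=2)
cohorts = kmeans(reduced, k=2)
```

Reducing dimensionality first keeps the distance calculations in k-means from being dominated by many weakly informative features. In practice you would scale real features and choose `k` with a diagnostic such as an elbow plot rather than fixing it up front.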

Manage population health

Healthcare providers or caregivers often use analytical dashboards of Amazon HealthLake data to create holistic patient views over time, in comparison to a population. The post Population health applications with Amazon HealthLake – Part 1: Analytics and monitoring using Amazon QuickSight provides a guide to creating a population health dashboard (using QuickSight) from Amazon HealthLake enriched data. Specifically, the authors demonstrate how to parse clinical narratives in the FHIR DocumentReference resource to extract, tag, and structure the medical entities, including ICD-10-CM codes. This transformed data is then added to the patient’s record, providing a complete view of all the patient’s attributes (such as medications, tests, procedures, and diagnoses).
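As a sketch of the first step in that pipeline, the snippet below pulls the clinical narrative out of a FHIR DocumentReference. In FHIR R4, inline attachment content is carried as base64 in `content[].attachment.data`. The `note_text` helper and the sample document are assumptions for illustration; the entity extraction and ICD-10-CM tagging described in the post are not reproduced here.

```python
import base64


def note_text(document_reference):
    """Return the decoded text of each inline attachment in a DocumentReference."""
    texts = []
    for content in document_reference.get("content", []):
        data = content.get("attachment", {}).get("data")
        if data:
            # Attachments that only carry a URL are skipped in this sketch.
            texts.append(base64.b64decode(data).decode("utf-8"))
    return texts


# A trimmed, made-up DocumentReference with one plain-text note.
doc = {
    "resourceType": "DocumentReference",
    "status": "current",
    "content": [{
        "attachment": {
            "contentType": "text/plain",
            "data": base64.b64encode(
                b"Patient reports shortness of breath."
            ).decode("ascii"),
        }
    }],
}

for text in note_text(doc):
    print(text)
```

The decoded text is what an NLP step would then parse to extract and code medical entities.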

As a companion to this post, we also provide the Amazon HealthLake Workshop to walk you through each step of creating an Amazon HealthLake data store, importing synthetic data, and generating a QuickSight dashboard.

Build a smarter search for better decision support and novel discoveries

Lastly, we suggest the post Build a cognitive search and a health knowledge graph using AWS AI services if you’re interested in modeling drug/target discovery, identifying novel patient cohorts, and a host of other use cases demanding relational models of health-related entities. The authors show how to index EMRs into Amazon Kendra for a semantic and accurate representation of a patient’s entire medical history and rank content by using Neptune knowledge graphs.

Semantic search, which introduces context to the query, provides more accurate results than lexical search methods. Better search results translate to improved efficiency when physicians need to compare patient notes, using natural language queries in the form of questions, to identify shared clinical characteristics. Building knowledge graph applications with Neptune lets users view metadata associated with patient notes in a simpler, normalized view that highlights important relational characteristics stemming from patient information.


Conclusion

In this post, we discussed how healthcare providers, payers, and pharmaceutical companies can use Amazon HealthLake with other AWS services to improve outreach communication, improve the patient care experience, or reduce overall cost.

The use cases in this post are just some of the ways you can use Amazon HealthLake. For example, you can also collect and catalog data from Amazon HealthLake along with additional data sources by using services like AWS Lake Formation.

In addition to published resources, we have teams of Solutions Architect health specialists, ProServe Consultants, and ML Solutions Lab Scientists ready to work with you to accelerate your project and achieve your desired outcomes faster. Let our account teams help you connect to the AWS resources that you need to succeed!

About the Authors

Dr. Taha Kass-Hout is Director of Machine Learning and Chief Medical Officer at Amazon Web Services, and leads our Health AI strategy and efforts, including Amazon Comprehend Medical and Amazon HealthLake. He works with teams at Amazon responsible for developing the science, technology, and scale for COVID-19 lab testing, including Amazon’s first FDA authorization for testing our associates, now offered to the public for at-home testing. A physician and bioinformatician, Taha served two terms under President Obama, including as the first Chief Health Informatics Officer at the FDA. During this time as a public servant, he pioneered the use of emerging technologies and the cloud (the CDC’s electronic disease surveillance), and established widely accessible global data sharing platforms: the openFDA, which enabled researchers and the public to search and analyze adverse event data, and precisionFDA (part of the Presidential Precision Medicine initiative).


Dr. Mia Champion is a Senior Technical Program Manager for Health AI Product Strategy at AWS. She is passionate about transformative technologies that accelerate customer business outcomes in the areas of healthcare, life sciences, machine learning, and cloud computing. Prior to AWS, she was a Principal Investigator of Bioinformatics and Medical Informatics and started her own company dedicated to performing comparative analysis for customers using workflows optimized for the AWS Cloud. She is an active Angel Investor supporting a portfolio of startups in the healthcare and technology sectors and serves as an advisory board member. Mia also loves kayaking, hiking, and building sandcastles at the lake with her daughter.