DataVault

Introduction

Data Pipeline

Medicaid data transformation journey

From raw Medicaid files to governed, AI-ready datasets

The DataVault AI Platform pipeline moves data in one direction through nine stages, turning unstructured Medicaid feeds into a refined, secure, AI-powered asset.

Blue → Teal → Green → Gold → Purple = increasing trust

Structure

Intelligence

Trust


01

Data sources

Represents raw Medicaid data originating from external systems (eligibility, claims, providers, pharmacy, encounters). This is the chaotic input zone before the platform applies any control.

AWS services

Non-AWS feeds (SFTP, portals, partner systems)

02

Bronze ingestion

Secure storage of raw, unchanged data as it arrives in the cloud. Data is preserved exactly as received, but now lives inside a hardened landing zone.

AWS services

Amazon S3 • AWS Transfer Family • AWS Lambda
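
One way to picture the bronze contract is the minimal sketch below. It assumes a hypothetical object-key layout of `bronze/<source>/ingest_date=<date>/<sha256>-<filename>`. The file bytes are never parsed or altered; a content hash in the key plus a small manifest record lets later stages verify that what they read is exactly what arrived. `BronzeManifest` and `bronze_key` are illustrative names, not part of the Amazon S3 or AWS Transfer Family APIs.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BronzeManifest:
    """Hypothetical metadata recorded alongside each raw file."""
    source: str
    original_name: str
    sha256: str
    size_bytes: int
    object_key: str


def bronze_key(source: str, filename: str, payload: bytes, ingest_date: date) -> BronzeManifest:
    # Hash the untouched bytes; the payload itself is never modified.
    digest = hashlib.sha256(payload).hexdigest()
    # Hive-style partition (ingest_date=...) keeps raw files browsable by arrival day.
    key = f"bronze/{source}/ingest_date={ingest_date.isoformat()}/{digest}-{filename}"
    return BronzeManifest(source, filename, digest, len(payload), key)


if __name__ == "__main__":
    raw = b"MEMBER_ID|DOB|GENDER\n123|1980-02-01|F\n"
    manifest = bronze_key("eligibility", "elig_feed.txt", raw, date(2024, 5, 1))
    print(manifest.object_key)
```

Because the hash is part of the key, a byte-identical file re-sent by the same source on the same day maps to the same key, which makes duplicate deliveries easy to spot.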

03

Catalog

Automated metadata discovery and ML-driven quality checks build a living catalog of Medicaid data.

AWS services

AWS Glue Data Catalog • AWS Glue Data Quality
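
To make "catalog plus quality checks" concrete, here is a plain-Python stand-in; it does not use the AWS Glue API. It profiles sample rows into a tiny catalog entry (inferred type and completeness per column) and flags columns below an assumed 95% completeness threshold. The checks are simple rules; the ML-driven checks this stage describes are not reproduced here.

```python
from collections.abc import Iterable


def infer_type(values: list[str]) -> str:
    """Guess a column type from non-empty string values."""
    def all_parse(cast) -> bool:
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "string"


def profile(rows: Iterable[dict[str, str]], min_completeness: float = 0.95) -> dict[str, dict]:
    """Build a per-column catalog entry with a completeness rule (threshold is an assumption)."""
    rows = list(rows)
    columns = sorted({col for row in rows for col in row})
    catalog = {}
    for col in columns:
        # Non-blank values only; missing keys and empty strings count as incomplete.
        present = [r[col].strip() for r in rows if r.get(col, "").strip()]
        completeness = len(present) / len(rows) if rows else 0.0
        catalog[col] = {
            "type": infer_type(present),
            "completeness": round(completeness, 3),
            "passes": completeness >= min_completeness,
        }
    return catalog


if __name__ == "__main__":
    sample = [{"member_id": "1", "paid": "10.5"}, {"member_id": "2", "paid": ""}]
    print(profile(sample))
```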

04

Silver layer

Cleansing, validation, and standardization of data across feeds, codes, and identities, tuned for Medicaid complexity.

AWS services

AWS Glue DataBrew • Amazon SageMaker • Amazon Comprehend
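
A small sketch of silver-layer standardization, assuming records arrive as string dictionaries with hypothetical `first_name`, `last_name`, `dob`, and `gender` fields. It maps every date layout the feeds are assumed to use onto ISO 8601 and collapses feed-specific gender codes into one vocabulary. The code tables are illustrative, not an official Medicaid code set, and this is not how AWS Glue DataBrew expresses its recipes.

```python
from datetime import datetime
from typing import Optional

# Hypothetical mapping from feed-specific gender codes to one controlled vocabulary.
GENDER_MAP = {"M": "M", "MALE": "M", "1": "M", "F": "F", "FEMALE": "F", "2": "F"}
# Date layouts assumed to appear across feeds; extend as new feeds are onboarded.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def to_iso_date(raw: str) -> Optional[str]:
    """Return YYYY-MM-DD, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # Unparseable dates are left for a review step.


def standardize(record: dict[str, str]) -> dict[str, Optional[str]]:
    """Trim and uppercase names, normalize dates, and map unknown gender codes to 'U'."""
    return {
        "first_name": record.get("first_name", "").strip().upper(),
        "last_name": record.get("last_name", "").strip().upper(),
        "dob": to_iso_date(record.get("dob", "")),
        "gender": GENDER_MAP.get(record.get("gender", "").strip().upper(), "U"),
    }


if __name__ == "__main__":
    print(standardize({"first_name": " maria ", "last_name": "lopez", "dob": "02/01/1980", "gender": "female"}))
```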

05

ETL + entity res…

Transformations and identity resolution create a unified Medicaid Master Patient Index (MPI) across plans, providers, and time.

AWS services

AWS Glue ETL • AWS Entity Resolution
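
AWS Entity Resolution supports rule-based and ML-based matching. The sketch below is a small deterministic stand-in, not its API. It assumes standardized records with hypothetical `medicaid_id`, `last_name`, `first_name`, and `dob` fields and applies two illustrative rules: records match if they share a Medicaid ID, or if they share last name, first name, and date of birth. A union-find structure merges matches transitively, and each resulting cluster receives a hypothetical MPI identifier.

```python
from collections import defaultdict


def build_mpi(records: list[dict[str, str]]) -> dict[int, str]:
    """Map each record index to a hypothetical MPI identifier."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Group record indexes by match key. Rule 1: same Medicaid ID.
    # Rule 2: same last name, first name, and date of birth.
    keys: dict[tuple, list[int]] = defaultdict(list)
    for idx, r in enumerate(records):
        if r.get("medicaid_id"):
            keys[("id", r["medicaid_id"])].append(idx)
        if r.get("last_name") and r.get("dob"):
            keys[("demo", r["last_name"], r.get("first_name", ""), r["dob"])].append(idx)

    for members in keys.values():
        for other in members[1:]:
            union(members[0], other)

    # Number clusters in order of first appearance so IDs are deterministic.
    labels: dict[int, str] = {}
    mpi_ids: dict[int, str] = {}
    for idx in range(len(records)):
        root = find(idx)
        if root not in labels:
            labels[root] = f"MPI-{len(labels) + 1:06d}"
        mpi_ids[idx] = labels[root]
    return mpi_ids
```

Transitive merging is what links a member whose name changed (for example a new hyphenated surname) through a shared ID, but it can also over-merge if a rule is too loose, which is why production matching usually adds review thresholds.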

06

Gold layer

Creation of business-ready, analytics-optimized datasets tuned for Medicaid performance, cost, and usability.

AWS services

Amazon Redshift • Amazon S3 (Parquet/ORC) • AWS Glue schema management
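
As an example of a business-ready cost measure, consider per-member-per-month (PMPM) paid amount: total paid divided by member months. The sketch below computes it per plan in plain Python. The input tuples and plan IDs are hypothetical; in this stage the result would more likely be precomputed into a Redshift table or Parquet files.

```python
from collections import defaultdict
from collections.abc import Iterable


def pmpm_by_plan(
    claims: Iterable[tuple[str, float]],
    enrollment: Iterable[tuple[str, float]],
) -> dict[str, float]:
    """PMPM = total paid / member months, per plan.

    claims: (plan_id, paid_amount) pairs
    enrollment: (plan_id, member_months) pairs
    """
    paid: dict[str, float] = defaultdict(float)
    months: dict[str, float] = defaultdict(float)
    for plan, amount in claims:
        paid[plan] += amount
    for plan, member_months in enrollment:
        months[plan] += member_months
    # Plans with no enrollment are skipped to avoid dividing by zero.
    return {plan: round(paid[plan] / mm, 2) for plan, mm in months.items() if mm > 0}


if __name__ == "__main__":
    claims = [("P1", 300.0), ("P1", 150.0), ("P2", 90.0)]
    enrollment = [("P1", 3), ("P2", 2), ("P2", 1)]
    print(pmpm_by_plan(claims, enrollment))
```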

07

Analytics

Dashboards, ad-hoc queries, and self-service analytics for Medicaid performance, quality, and operational reporting.

AWS services

Amazon QuickSight • Amazon Athena • Amazon Redshift
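
The kind of ad-hoc query this stage serves can be sketched with Python's built-in `sqlite3` module standing in for Athena or Redshift. The `gold_claims` table and its columns are hypothetical. The aggregate shape (group a gold table by month) carries over, although date functions differ between engines.

```python
import sqlite3


def monthly_paid(rows: list[tuple[str, str, float]]) -> list[tuple[str, float]]:
    """Total paid amount per service month over a hypothetical gold claims table."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE gold_claims (claim_id TEXT, service_date TEXT, paid_amount REAL)")
        con.executemany("INSERT INTO gold_claims VALUES (?, ?, ?)", rows)
        query = """
            SELECT substr(service_date, 1, 7) AS service_month,
                   SUM(paid_amount) AS total_paid
            FROM gold_claims
            GROUP BY service_month
            ORDER BY service_month
        """
        return con.execute(query).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    sample = [("c1", "2024-01-15", 100.0), ("c2", "2024-01-20", 50.0), ("c3", "2024-02-03", 25.0)]
    for month, total in monthly_paid(sample):
        print(month, total)
```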

08

ML + generative AI

Predictive and generative AI capabilities power risk scores, care management insights, and narrative explanations for Medicaid programs.

AWS services

Amazon SageMaker • Amazon Bedrock
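
To illustrate how a risk score and its explanation fit together, here is a minimal logistic-scoring sketch. The feature names and coefficients are hypothetical placeholders; a real model would be trained and validated (for example in SageMaker). The explanation uses a fixed text template rather than a call to Amazon Bedrock, so it only shows which inputs a generated narrative might draw on.

```python
import math

# Hypothetical coefficients for illustration only; not a trained or validated model.
INTERCEPT = -3.0
WEIGHTS = {"er_visits_12m": 0.4, "chronic_conditions": 0.6, "inpatient_days_12m": 0.1}


def risk_score(features: dict[str, float]) -> float:
    """Logistic score: sigmoid(intercept + sum of weight * feature)."""
    z = INTERCEPT + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def top_drivers(features: dict[str, float], k: int = 2) -> list[str]:
    """Features ranked by their contribution (weight * value) to the score."""
    contributions = {name: w * features.get(name, 0.0) for name, w in WEIGHTS.items()}
    return sorted(contributions, key=contributions.get, reverse=True)[:k]


def explain(features: dict[str, float]) -> str:
    """Template narrative standing in for a generative summary."""
    drivers = ", ".join(top_drivers(features))
    return f"Estimated risk {risk_score(features):.0%}; largest contributors: {drivers}."


if __name__ == "__main__":
    member = {"er_visits_12m": 3, "chronic_conditions": 4, "inpatient_days_12m": 5}
    print(explain(member))
```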

09

Governance

Enforces data governance, security controls, compliance, and continuous monitoring across every layer of the Medicaid data journey.

AWS services

AWS CloudTrail • Amazon GuardDuty • AWS Security Hub • Amazon CloudWatch • AWS IAM
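
In this stage, enforcement itself comes from services such as IAM, with CloudTrail and CloudWatch providing the audit and monitoring trail. The plain-Python sketch below only illustrates two of the underlying ideas: deny-by-default masking of protected health information (PHI) by role, and an audit record for every access. The roles, column names, and log format are hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical policy: which roles may see which PHI columns unmasked.
PHI_COLUMNS = {"member_name", "dob", "medicaid_id"}
ROLE_ALLOWED_PHI = {
    "care_manager": {"member_name", "dob", "medicaid_id"},
    "analyst": set(),
}


def apply_policy(role: str, row: dict[str, str], audit_log: list[str]) -> dict[str, str]:
    """Mask PHI the role may not see and append a JSON audit entry."""
    allowed = ROLE_ALLOWED_PHI.get(role, set())  # unknown roles see no PHI (deny by default)
    hidden = sorted(c for c in row if c in PHI_COLUMNS and c not in allowed)
    masked = {col: ("***" if col in hidden else val) for col, val in row.items()}
    audit_log.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "masked_columns": hidden,
    }))
    return masked


if __name__ == "__main__":
    log: list[str] = []
    record = {"member_name": "MARIA LOPEZ", "dob": "1980-02-01", "plan_id": "P1"}
    print(apply_policy("analyst", record, log))
    print(log[-1])
```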
