DataVault
Medicaid data transformation journey
From raw Medicaid files to governed, AI-ready datasets
Step through the one-directional, nine-stage DataVault AI Platform pipeline to watch unstructured Medicaid feeds become a refined, secure, AI-powered asset.
Blue → Teal → Green → Gold → Purple = increasing trust

Structure
Intelligence
Trust
01
Data sources
Represents raw Medicaid data originating from external systems (eligibility, claims, providers, pharmacy, encounters). This is the chaotic input zone before the platform applies any control.
AWS services
Non-AWS feeds (SFTP, portals, partner systems)
02
Bronze ingestion
Secure storage of raw, unchanged data as it arrives in the cloud. Data is preserved exactly as received, but now lives inside a hardened landing zone.
AWS services
Amazon S3 • AWS Transfer Family • AWS Lambda
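The bronze principle of storing bytes exactly as received, in a controlled location, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the key layout `bronze/<source>/ingest_date=<YYYY-MM-DD>/<filename>` is hypothetical, and an in-memory dict stands in for an Amazon S3 bucket (this is not the S3 API). It records a SHA-256 checksum so later stages can confirm the raw file was not altered, and it refuses to overwrite an object that has already landed.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LandedObject:
    key: str
    sha256: str
    size_bytes: int


class BronzeLandingZone:
    """Hypothetical in-memory stand-in for a raw landing bucket (not the S3 API)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def land(self, source: str, filename: str, payload: bytes, ingest_date: date) -> LandedObject:
        # Hypothetical partitioned key layout: one prefix per source and ingest day.
        key = f"bronze/{source}/ingest_date={ingest_date.isoformat()}/{filename}"
        if key in self._objects:
            # Write-once: raw data is never overwritten in place.
            raise FileExistsError(f"object already landed: {key}")
        # Store the bytes untouched; no parsing or cleansing happens in bronze.
        self._objects[key] = payload
        return LandedObject(key, hashlib.sha256(payload).hexdigest(), len(payload))

    def verify(self, obj: LandedObject) -> bool:
        # Recompute the checksum to confirm the stored bytes match what arrived.
        stored = self._objects.get(obj.key)
        return stored is not None and hashlib.sha256(stored).hexdigest() == obj.sha256


zone = BronzeLandingZone()
obj = zone.land("state_mmis", "eligibility_834.txt", b"raw eligibility bytes", date(2024, 1, 15))
print(obj.key)  # bronze/state_mmis/ingest_date=2024-01-15/eligibility_834.txt
```

Making the landing write-once keeps bronze an immutable record of what each partner actually sent, which is what lets downstream layers be rebuilt or audited later.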
03
Catalog
Automated discovery of metadata and application of ML-driven quality checks, building a living catalog for Medicaid data.
AWS services
AWS Glue Data Catalog • AWS Glue Data Quality
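Metadata discovery and quality checks can be illustrated with a small pure-Python sketch: infer a column-to-type schema from sample records (roughly the kind of metadata a catalog records) and score rule-based checks such as completeness. This is an assumption-laden stand-in, not the AWS Glue Data Catalog or AWS Glue Data Quality APIs, and it uses fixed thresholds rather than ML; the column names `member_id`, `dob`, and `age` are hypothetical.

```python
from collections import Counter


def infer_schema(records: list[dict]) -> dict[str, str]:
    """Pick the most common Python type name per column, ignoring missing values."""
    seen: dict[str, Counter] = {}
    for rec in records:
        for col, value in rec.items():
            counts = seen.setdefault(col, Counter())
            if value not in (None, ""):
                counts[type(value).__name__] += 1
    # Columns that were never populated are reported as "unknown".
    return {col: (c.most_common(1)[0][0] if c else "unknown") for col, c in seen.items()}


def completeness(records: list[dict], column: str) -> float:
    """Fraction of records where the column is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(column) not in (None, ""))
    return filled / len(records)


def run_checks(records: list[dict], required: dict[str, float]) -> dict[str, bool]:
    """Pass/fail per column against a minimum completeness threshold."""
    return {col: completeness(records, col) >= threshold for col, threshold in required.items()}


rows = [
    {"member_id": "A1", "dob": "1990-01-01", "age": 34},
    {"member_id": "A2", "dob": "", "age": 40},
    {"member_id": "A3", "dob": None, "age": None},
]
print(infer_schema(rows))  # {'member_id': 'str', 'dob': 'str', 'age': 'int'}
print(run_checks(rows, {"member_id": 1.0, "dob": 0.95}))  # {'member_id': True, 'dob': False}
```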
04
Silver layer
Cleansing, validation, and standardization of data across feeds, codes, and identities, tuned for Medicaid complexity.
AWS services
AWS Glue DataBrew • Amazon SageMaker • Amazon Comprehend
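Silver-layer standardization means mapping many source conventions onto one. A minimal pure-Python sketch, assuming hypothetical field names, a small set of accepted date formats, and a hypothetical gender-code mapping: it normalizes dates to ISO 8601, maps assorted gender codes to one vocabulary, and validates provider NPIs with the CMS check-digit rule (a Luhn check over the NPI prefixed with `80840`). It does not reflect how AWS Glue DataBrew, Amazon SageMaker, or Amazon Comprehend implement cleansing.

```python
from datetime import datetime
from typing import Optional

# Hypothetical source date formats seen across feeds.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")
# Hypothetical mapping of source gender codes to one target vocabulary.
GENDER_MAP = {"M": "male", "MALE": "male", "1": "male",
              "F": "female", "FEMALE": "female", "2": "female"}


def to_iso_date(raw: str) -> Optional[str]:
    """Try each known format; return YYYY-MM-DD, or None if nothing parses."""
    value = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def npi_is_valid(npi: str) -> bool:
    """Luhn check over '80840' + NPI, per the CMS NPI check-digit rule."""
    if len(npi) != 10 or not npi.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed("80840" + npi)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def standardize(record: dict) -> dict:
    """Return a cleaned copy of one raw record; unusable values become None/'unknown'."""
    gender = str(record.get("gender", "")).strip().upper()
    npi = str(record.get("provider_npi", "")).strip()
    return {
        "member_id": str(record.get("member_id", "")).strip().upper(),
        "last_name": str(record.get("last_name", "")).strip().title(),
        "dob": to_iso_date(str(record.get("dob", ""))),
        "gender": GENDER_MAP.get(gender, "unknown"),
        "provider_npi": npi if npi_is_valid(npi) else None,
    }


raw = {"member_id": " ab123 ", "last_name": "SMITH ", "dob": "01/02/1990",
       "gender": "f", "provider_npi": "1234567893"}
print(standardize(raw))
```

Keeping invalid values as `None` rather than dropping the record lets a downstream quality check count and report them.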
05
ETL + entity res…
Transformations and identity resolution to create a unified Medicaid Master Patient Index (MPI) across plans, providers, and time.
AWS services
AWS Glue ETL • AWS Entity Resolution
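Identity resolution for an MPI can be illustrated with deterministic matching plus union-find: records that share a match key are merged into one cluster, and each cluster receives one MPI identifier. This is a simplified sketch under stated assumptions: exact-match rules on a Medicaid ID, or on normalized last name plus date of birth, and hypothetical `MPI-0001`-style identifiers. AWS Entity Resolution provides its own rule-based and ML-based matching workflows, which this code does not reproduce.

```python
def build_mpi(records: list[dict]) -> dict[str, str]:
    """Map each record_id to an MPI id by clustering records that share a match key."""
    parent = {r["record_id"]: r["record_id"] for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller id becomes the root, so each root is its cluster's minimum id.
            parent[max(ra, rb)] = min(ra, rb)

    first_seen: dict[tuple, str] = {}
    for r in records:
        # Hypothetical match rules: same Medicaid ID, or same last name + DOB.
        keys = []
        if r.get("medicaid_id"):
            keys.append(("mid", r["medicaid_id"]))
        if r.get("last_name") and r.get("dob"):
            keys.append(("name_dob", r["last_name"].strip().lower(), r["dob"]))
        for key in keys:
            if key in first_seen:
                union(first_seen[key], r["record_id"])
            else:
                first_seen[key] = r["record_id"]

    # Number clusters in sorted order of their root ids so output is deterministic.
    roots = sorted({find(rid) for rid in parent})
    label = {root: f"MPI-{i + 1:04d}" for i, root in enumerate(roots)}
    return {rid: label[find(rid)] for rid in parent}


recs = [
    {"record_id": "planA-1", "medicaid_id": "M100", "last_name": "Smith", "dob": "1990-01-02"},
    {"record_id": "planB-7", "medicaid_id": "", "last_name": "SMITH ", "dob": "1990-01-02"},
    {"record_id": "planC-3", "medicaid_id": "M100", "last_name": "Smyth", "dob": "1990-01-02"},
    {"record_id": "planA-2", "medicaid_id": "M200", "last_name": "Jones", "dob": "1985-05-05"},
]
print(build_mpi(recs))
```

Union-find makes matching transitive: `planC-3` shares only a Medicaid ID with `planA-1`, and `planB-7` shares only name and DOB, yet all three end up under one MPI id. Real matching would also need fuzzy comparisons and safeguards against over-merging.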
06
Gold layer
Creation of business-ready, analytics-optimized datasets tuned for Medicaid performance, cost, and usability.
AWS services
Amazon Redshift • Amazon S3 (Parquet/ORC) • AWS Glue schema management
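A common business-ready Medicaid cost metric is per member per month (PMPM): total paid amount divided by member months. The sketch below aggregates hypothetical silver-layer claim and enrollment rows into one gold row per plan and month. In the pipeline described here such a table would live in Amazon Redshift or as Parquet/ORC on Amazon S3; this example uses plain Python structures, and the column names are assumptions.

```python
from collections import defaultdict


def pmpm_by_plan(claims: list[dict], enrollment: list[dict]) -> list[dict]:
    """Gold rows per (plan_id, month): total paid, member months, and PMPM."""
    paid: dict[tuple, float] = defaultdict(float)
    members: dict[tuple, set] = defaultdict(set)
    for c in claims:
        paid[(c["plan_id"], c["month"])] += c["paid_amount"]
    for e in enrollment:
        # A member counts once per plan-month, even if the feed repeats the row.
        members[(e["plan_id"], e["month"])].add(e["member_id"])
    rows = []
    # Plan-months with claims but no enrollment are skipped (no denominator).
    for key in sorted(members):
        plan_id, month = key
        member_months = len(members[key])
        total = round(paid.get(key, 0.0), 2)
        rows.append({
            "plan_id": plan_id,
            "month": month,
            "total_paid": total,
            "member_months": member_months,
            "pmpm": round(total / member_months, 2),
        })
    return rows


claims = [
    {"plan_id": "P1", "month": "2024-01", "paid_amount": 300.0},
    {"plan_id": "P1", "month": "2024-01", "paid_amount": 150.5},
    {"plan_id": "P2", "month": "2024-01", "paid_amount": 90.0},
]
enrollment = [
    {"plan_id": "P1", "month": "2024-01", "member_id": "A"},
    {"plan_id": "P1", "month": "2024-01", "member_id": "B"},
    {"plan_id": "P2", "month": "2024-01", "member_id": "C"},
    {"plan_id": "P2", "month": "2024-01", "member_id": "C"},
]
for row in pmpm_by_plan(claims, enrollment):
    print(row)
```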
07
Analytics
Dashboards, ad-hoc queries, and self-service analytics for Medicaid performance, quality, and operational reporting.
AWS services
Amazon QuickSight • Amazon Athena • Amazon Redshift
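Ad-hoc analytics over the gold layer is usually plain SQL. As a local, self-contained stand-in for Amazon Athena or Amazon Redshift, the sketch below loads a few hypothetical gold rows into an in-memory SQLite database and runs a reporting query. The table name `gold_plan_month` and its columns are assumptions, and SQL dialect details differ between SQLite, Athena, and Redshift.

```python
import sqlite3


def top_plans_by_pmpm(rows: list[tuple], limit: int = 3) -> list[tuple]:
    """Rank plans by member-month-weighted PMPM using an ad-hoc SQL query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE gold_plan_month (plan_id TEXT, month TEXT, "
            "total_paid REAL, member_months INTEGER)"
        )
        conn.executemany("INSERT INTO gold_plan_month VALUES (?, ?, ?, ?)", rows)
        # Weighted PMPM: sum of paid over sum of member months, not an average of monthly PMPMs.
        query = """
            SELECT plan_id,
                   ROUND(SUM(total_paid) / SUM(member_months), 2) AS pmpm
            FROM gold_plan_month
            GROUP BY plan_id
            ORDER BY pmpm DESC
            LIMIT ?
        """
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()


gold_rows = [
    ("P1", "2024-01", 450.5, 2),
    ("P1", "2024-02", 349.5, 2),
    ("P2", "2024-01", 90.0, 1),
]
print(top_plans_by_pmpm(gold_rows))  # [('P1', 200.0), ('P2', 90.0)]
```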
08
ML + generative AI
Predictive and generative AI capabilities power risk scores, care management insights, and narrative explanations for Medicaid programs.
AWS services
Amazon SageMaker • Amazon Bedrock
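Pairing a predictive score with a plain-language explanation can be sketched without any service calls: a logistic model turns member features into a risk probability, and the largest per-feature contributions drive a templated narrative. The weights, bias, and feature names below are hypothetical and purely illustrative (not a trained or validated model). In the platform described, the model would be trained in Amazon SageMaker and the narrative could come from a foundation model in Amazon Bedrock; here a fixed string template stands in for generation.

```python
import math

# Hypothetical, hand-set weights for illustration only; not clinically validated.
WEIGHTS = {"er_visits_12m": 0.45, "chronic_conditions": 0.35, "missed_refills": 0.25}
BIAS = -3.0


def risk_score(features: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Logistic probability plus each feature's additive contribution to the log-odds."""
    contributions = {name: w * features.get(name, 0.0) for name, w in WEIGHTS.items()}
    logit = BIAS + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-logit)), contributions


def explain(member_id: str, features: dict[str, float], top_n: int = 2) -> str:
    """Templated narrative naming the features that raised the score the most."""
    prob, contributions = risk_score(features)
    top = sorted(contributions, key=contributions.get, reverse=True)[:top_n]
    drivers = ", ".join(f"{name.replace('_', ' ')} = {features.get(name, 0):g}" for name in top)
    return f"Member {member_id}: estimated risk {prob:.0%}; main drivers: {drivers}."


member = {"er_visits_12m": 4, "chronic_conditions": 3, "missed_refills": 1}
print(explain("M1", member))
```

Because a logistic model's log-odds is a sum of weighted terms, each term is a direct, inspectable reason for the score, which makes the narrative traceable to the model rather than free-form.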
09
Governance
Enforces data governance, security controls, compliance, and continuous monitoring across every layer of the Medicaid data journey.
AWS services
AWS CloudTrail • Amazon GuardDuty • AWS Security Hub • Amazon CloudWatch • AWS IAM
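Governance controls are configured in the AWS services listed, but the underlying idea of least-privilege access with an audit trail can be sketched in a few lines. In this hypothetical example, the roles, PHI column names, and audit record shape are all assumptions; it is not IAM policy syntax or the CloudTrail event format. It masks PHI columns for roles not cleared to see them, denies unknown roles, and appends an audit entry for every access attempt.

```python
import json
from datetime import datetime, timezone

# Hypothetical PHI columns and role clearances; unknown roles are denied.
PHI_COLUMNS = {"member_name", "dob", "medicaid_id"}
ROLE_CAN_SEE_PHI = {"care_manager": True, "analyst": False}


class GovernedReader:
    def __init__(self) -> None:
        self.audit_log: list[str] = []  # append-only, one JSON line per access attempt

    def read(self, user: str, role: str, rows: list[dict]) -> list[dict]:
        if role not in ROLE_CAN_SEE_PHI:
            self._audit(user, role, allowed=False, row_count=0)
            raise PermissionError(f"unknown role: {role}")
        show_phi = ROLE_CAN_SEE_PHI[role]
        # Mask PHI values for roles without clearance; other columns pass through.
        masked = [
            {k: (v if show_phi or k not in PHI_COLUMNS else "***") for k, v in row.items()}
            for row in rows
        ]
        self._audit(user, role, allowed=True, row_count=len(masked))
        return masked

    def _audit(self, user: str, role: str, allowed: bool, row_count: int) -> None:
        entry = {"ts": datetime.now(timezone.utc).isoformat(), "user": user,
                 "role": role, "allowed": allowed, "rows": row_count}
        self.audit_log.append(json.dumps(entry))


reader = GovernedReader()
data = [{"medicaid_id": "M100", "dob": "1990-01-02", "plan_id": "P1"}]
print(reader.read("ana", "analyst", data))  # [{'medicaid_id': '***', 'dob': '***', 'plan_id': 'P1'}]
```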