DataVault

Data Pipeline

Medicaid data transformation journey

From raw Medicaid files to governed, AI-ready datasets

The DataVault AI Platform pipeline runs in one direction through nine stages, turning unstructured Medicaid feeds into a refined, secure, AI-powered asset.

Blue → Teal → Green → Gold → Purple = increasing trust

Structure

Intelligence

Trust

01

Data sources

Represents raw Medicaid data originating from external systems (eligibility, claims, providers, pharmacy, encounters). This is the chaotic input zone before the platform applies any control.

AWS services

Non-AWS feeds (SFTP, portals, partner systems)

02

Bronze ingestion

Secure storage of raw, unchanged data as it arrives in the cloud. Data is preserved exactly as received, but now lives inside a hardened landing zone.

AWS services

Amazon S3 · AWS Transfer Family · AWS Lambda
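As a rough illustration of "preserved exactly as received", the sketch below derives a date-partitioned landing key and a SHA-256 checksum for an incoming file without touching its bytes. The key layout, the `bronze_key` and `checksum` helpers, and the sample feed name are assumptions made for illustration; they are not taken from the DataVault platform.

```python
import hashlib
from datetime import datetime, timezone


def bronze_key(feed: str, filename: str, received_at: datetime) -> str:
    """Build a hypothetical date-partitioned object key for a raw file."""
    day = received_at.astimezone(timezone.utc).strftime("%Y/%m/%d")
    return f"bronze/{feed}/{day}/{filename}"


def checksum(payload: bytes) -> str:
    """Return the SHA-256 hex digest of the raw bytes, computed before storage."""
    return hashlib.sha256(payload).hexdigest()


# Sample payload and arrival time (made up for this sketch).
raw = b"member_id|claim_id|amount\nM001|C100|125.50\n"
received = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)

key = bronze_key("claims", "claims_20240305.txt", received)
digest = checksum(raw)
```

Storing the digest next to the object would let a later stage confirm that the Bronze copy still matches what the source system sent.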

03

Catalog

Automated discovery of metadata and application of ML-driven quality checks, building a living catalog for Medicaid data.

AWS services

AWS Glue Data Catalog • AWS Glue Data Quality
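To make the idea of a quality check concrete, here is a minimal sketch of one rule, column completeness, evaluated against a threshold. It is a plain rule-based check, not the ML-driven checks the stage describes, and the `completeness` helper, the sample rows, and the 0.9 threshold are all assumptions.

```python
def completeness(rows: list[dict], column: str) -> float:
    """Fraction of rows whose value for `column` is present and non-blank."""
    if not rows:
        return 0.0
    filled = 0
    for row in rows:
        value = row.get(column)
        if value is not None and str(value).strip() != "":
            filled += 1
    return filled / len(rows)


# Hypothetical eligibility rows with some missing values.
rows = [
    {"member_id": "M001", "dob": "1980-01-15"},
    {"member_id": "M002", "dob": ""},
    {"member_id": "", "dob": "1975-07-04"},
    {"member_id": "M004", "dob": None},
]

THRESHOLD = 0.9  # assumed minimum acceptable completeness
report = {col: completeness(rows, col) for col in ("member_id", "dob")}
failed = [col for col, score in report.items() if score < THRESHOLD]
```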

04

Silver Layer

Cleansing, validation, and standardization of data across feeds, codes, and identities, tuned for Medicaid complexity.

AWS services

AWS Glue DataBrew • Amazon SageMaker • Amazon Comprehend
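A small sketch of what standardization can mean for a single record: trimming whitespace, upper-casing a state code, and converting a US-style date to ISO 8601. The field names and the input date format are assumptions; real Medicaid feeds vary by source.

```python
from datetime import datetime


def standardize(record: dict) -> dict:
    """Trim strings, upper-case the state code, and convert the date to ISO 8601."""
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}
    cleaned["state"] = cleaned["state"].upper()
    # Assumes the source sends MM/DD/YYYY; strptime raises ValueError otherwise.
    parsed = datetime.strptime(cleaned["service_date"], "%m/%d/%Y")
    cleaned["service_date"] = parsed.date().isoformat()
    return cleaned


# Hypothetical raw claim record.
raw_record = {"member_id": " M001 ", "state": "tx", "service_date": "03/05/2024"}
clean_record = standardize(raw_record)
```

Letting a malformed date raise an error, rather than passing it through silently, keeps the validation part of this stage explicit.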

05

ETL + entity res…

Transformations and identity resolution to create a unified Medicaid Master Patient Index (MPI) across plans, providers, and time.

AWS services

AWS Glue ETL • AWS Entity Resolution
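The identity-resolution step can be pictured with a simple deterministic match key that groups source records believed to describe the same person. This is only a sketch of exact-key matching, not the algorithm used by AWS Entity Resolution, and the field names, `match_key` rule, and sample records are assumptions.

```python
from collections import defaultdict


def match_key(rec: dict) -> tuple:
    """Normalized (last name, first initial, date of birth) used to group records."""
    last = rec["last_name"].strip().lower()
    first_initial = rec["first_name"].strip().lower()[:1]
    return (last, first_initial, rec["dob"])


def build_index(records: list[dict]) -> dict:
    """Map each match key to the list of source record IDs that share it."""
    index = defaultdict(list)
    for rec in records:
        index[match_key(rec)].append(rec["source_id"])
    return dict(index)


# Hypothetical member records from two plans.
records = [
    {"source_id": "planA-17", "first_name": "Maria", "last_name": "Lopez", "dob": "1980-01-15"},
    {"source_id": "planB-92", "first_name": "maria ", "last_name": " LOPEZ", "dob": "1980-01-15"},
    {"source_id": "planA-40", "first_name": "John", "last_name": "Smith", "dob": "1975-07-04"},
]
mpi = build_index(records)
```

Exact keys like this one miss typos and name changes, which is why production matching usually adds fuzzy or probabilistic rules on top.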

06

Gold Layer

Creation of business-ready, analytics-optimized datasets tuned for Medicaid performance, cost, and usability.

AWS services

Amazon Redshift • Amazon S3 (Parquet/ORC) • AWS Glue schema management
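One way to picture a business-ready dataset is a pre-aggregated table, for example paid amount per member per month. The sketch below builds such a summary in plain Python; the field names and sample claims are assumptions, and a real Gold table would more likely be produced by a warehouse or ETL job.

```python
from collections import defaultdict


def monthly_paid(claims: list[dict]) -> dict:
    """Sum paid amounts per (member_id, YYYY-MM) from ISO-dated claims."""
    totals = defaultdict(float)
    for claim in claims:
        month = claim["service_date"][:7]  # "2024-03-05" -> "2024-03"
        totals[(claim["member_id"], month)] += claim["paid_amount"]
    return dict(totals)


# Hypothetical cleansed claims.
claims = [
    {"member_id": "M001", "service_date": "2024-03-05", "paid_amount": 125.50},
    {"member_id": "M001", "service_date": "2024-03-19", "paid_amount": 74.50},
    {"member_id": "M002", "service_date": "2024-04-02", "paid_amount": 300.00},
]
gold = monthly_paid(claims)
```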

07

Analytics

Dashboards, ad-hoc queries, and self-service analytics for Medicaid performance, quality, and operational reporting.

AWS services

Amazon QuickSight • Amazon Athena • Amazon Redshift

08

ML + generative AI

Predictive and generative AI capabilities power risk scores, care management insights, and narrative explanations for Medicaid programs.

AWS services

Amazon SageMaker • Amazon Bedrock

09

Governance

Enforces data governance, security controls, compliance, and continuous monitoring across every layer of the Medicaid data journey.

AWS services

AWS CloudTrail • Amazon GuardDuty • AWS Security Hub • Amazon CloudWatch • AWS IAM
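As one example of a security control that could apply at every layer, the sketch below builds an S3 bucket policy that denies any request not sent over TLS, using the standard `aws:SecureTransport` condition key. The bucket name and the `deny_insecure_transport` helper are hypothetical, and the source does not say which policies the platform actually uses.

```python
import json


def deny_insecure_transport(bucket: str) -> dict:
    """Return a bucket policy document that rejects non-TLS requests."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [arn, f"{arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-datavault-bronze" is a placeholder bucket name.
policy = deny_insecure_transport("example-datavault-bronze")
policy_json = json.dumps(policy, indent=2)
```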
