Advanced

Data Lake Schema Drift Detection

When new files arrive in a dataset path, the flow infers their schema, compares it to the registered schema, and on drift (new/removed/changed columns) blocks promotion, opens a ticket, and alerts data engineering with the diff. Prevents schema drift from breaking downstream consumers.

Azure Data LakeMicrosoft DataverseMicrosoft Teams

Unique name

FlowLibsDataLakeSchemaDriftDetection

Publisher

FlowLibs (flowlibs)

Version

1.0.0.0

Components

6 env vars + 1 cloud flow

Request access

Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.

Report an issue with this flow

What it does

Overview

This flow guards Azure Data Lake datasets against silent schema drift. When a new file arrives in a monitored dataset path, the flow infers its schema from the CSV header, compares it to the registered baseline schema stored in Dataverse, and classifies any change as additive (new columns) or breaking (removed columns). On drift it sets the dataset's promotion status to Blocked, opens a drift ticket in Dataverse, and alerts the data-engineering channel in Teams with the exact column diff. On first sight of a dataset it registers the baseline automatically. Why it matters: silent schema changes break downstream pipelines and reports. Detecting drift at ingestion stops bad data from propagating and gives data engineering a precise, actionable diff.

Why you'd use it

Use Case

A data engineering team ingests curated CSV extracts into a Data Lake zone. They want automatic protection against unexpected schema changes from upstream producers: new columns should be allowed (configurable), but removed or renamed columns must block promotion and raise an alert before consumers break.

Step-by-step

Flow Architecture

When a File Is Added or Modified in the Data Lake

Azure Data Lake - Trigger_ListFiles (metadata polling, splitOn)

Fires per new or modified file in the monitored dataset path; splitOn fans each file into its own run.

Initialize Trace & Config

Initialize variable

Mints a correlation guid for traceability, captures the file name (pathSuffix) and full read path, loads the additive policy flag and registry table name, and sets up working state for the comparison.

Check If File Is CSV

Condition

Only .csv files of type FILE are processed.

Read & Infer Schema

Azure Data Lake - ReadFile + Compose

Reads the file content and extracts/trims the CSV header into the current column array.

Get Registered Schema

Microsoft Dataverse - ListRecords

Looks up the baseline schema for this dataset path and branches on whether a baseline exists.

Classify Drift

Filter Array + Set Variable

Filters added columns (in new, not baseline) and removed columns (in baseline, not new), then sets severity (None / First Registration / Additive / Breaking) and promotion status (Blocked on breaking or disallowed additive).

Gate & Alert

Microsoft Dataverse - CreateRecord + Microsoft Teams - PostMessageToConversation

On drift, opens a Dataverse drift ticket with the diff and posts the column diff to the data-engineering Teams channel; on first sight, registers the baseline schema.

Solution config

Environment Variables

Schema name	Type	Default	Description
flowlibs_DataLakeAccount	String	REPLACE_WITH_DATALAKE_ACCOUNT	Data Lake storage account name.
flowlibs_DataLakeDatasetPath	String	curated/sales	Folder path (no leading slash) monitored for arrivals.
flowlibs_SchemaRegistryTable	String	flowlibs_schemaregistries	Dataverse entity-set holding baseline schemas.
flowlibs_AllowAdditive	String	true	When true, new columns are allowed (additive).
flowlibs_TeamsGroupId	String	<your-team-id>	Teams group/team id for the alert channel.
flowlibs_TeamsChannelId	String	<your-channel-id>	Teams channel id for the alert.

Auth dependencies

Connectors & Connections

Connector	API name	Actions used
Azure Data Lake	shared_azuredatalake	Trigger_ListFiles (trigger) ReadFile
Microsoft Dataverse	shared_commondataserviceforapps	ListRecords CreateRecord
Microsoft Teams	shared_teams	PostMessageToConversation

Note — All connections are referenced as solution connection references; the flow is portable between environments as long as a connection is mapped at import time.

Tweaks & variations

Customization Guide

Almost every realistic variant of this flow can be implemented by changing environment variable values. A few cases require small edits inside the flow definition — those are called out explicitly below.

Event-driven vs scheduled: Trigger_ListFiles polls metadata hourly by default. Lower the recurrence interval for faster detection, or pair with an Event Grid blob-created subscription for true push.
Allow / block additive: Flip flowlibs_AllowAdditive to false to enforce a strict data contract where any new column also blocks promotion.
File formats: The header-inference logic targets CSV. For Parquet/JSON, replace Infer Header Row with a footer/metadata parse.
Auto-evolve: Extend the additive-allowed branch to update the registry (increment flowlibs_schemaversion) so the baseline tracks approved additions automatically.
Ticketing: Swap or supplement Create Drift Ticket with an Azure DevOps / Jira work item if drift should open a tracked engineering ticket.

Helpers & literals

Key Expressions

The flow is intentionally light on Power Fx / WDL gymnastics — the heaviest expressions are the branch-name concatenation and the approval outcome check. They are listed below in the order they appear in the flow.

EXPR.01Infer current schema (CSV header)

Extracts and trims the CSV header row as the current schema.

workflow definition language

trim(replace(first(split(string(body('Read_New_File_From_Data_Lake')), decodeUriComponent('%0A'))), decodeUriComponent('%0D'), ''))

EXPR.02Added columns (Filter Array where)

Columns present in the new file but not in the baseline.

workflow definition language

@not(contains(variables('varRegisteredColumns'), item()))

EXPR.03Removed columns (Filter Array where)

Columns present in the baseline but not in the new file (breaking).

workflow definition language

@not(contains(variables('varNewColumns'), item()))

EXPR.04Drift detected

Baseline exists AND columns were added or removed.

workflow definition language

or(length(Filter_Added) > 0, length(Filter_Removed) > 0)

EXPR.05Registry lookup filter

OData filter to find the baseline for this dataset path.

odata

flowlibs_datasetpath eq '<dataset path>'

Make it yours

Customize & download

Generate a ready-to-import copy of this solution with your environment-variable values baked in — available on Base, Pro, or Team.

Upgrade to customize

Overview

Use Case

Flow Architecture

When a File Is Added or Modified in the Data Lake

Initialize Trace & Config

Check If File Is CSV

Read & Infer Schema

Get Registered Schema

Classify Drift

Gate & Alert

Environment Variables

Connectors & Connections

Customization Guide

Key Expressions

EXPR.01Infer current schema (CSV header)

EXPR.02Added columns (Filter Array where)

EXPR.03Removed columns (Filter Array where)

EXPR.04Drift detected

EXPR.05Registry lookup filter

Customize & download

Comments