Advanced

Data Lake Raw-Zone Ingestion and Cataloging

When files land in the Data Lake raw zone, the flow validates and registers them in a Dataverse catalog (schema, partition, source, row count), promotes valid files to a date-partitioned curated zone, quarantines bad ones, and notifies data engineering. Operationalizes raw-zone intake for a lakehouse.

Azure Data Lake, Microsoft Dataverse, Microsoft Teams

Unique name

FlowLibsDataLakeRawZoneIngestionAndCataloging

Publisher

FlowLibs (flowlibs)

Version

1.0.0.0

Components

8 env vars + 1 cloud flow

Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.

What it does

Overview

This flow operationalizes raw-zone intake for a lakehouse on Azure Data Lake (Gen1). On a schedule it scans the raw/landing zone, reads each file, validates it, registers it in a Dataverse data catalog (file name, extension/schema, date partition, source system, row count, status), then promotes valid files to a date-partitioned curated zone and moves invalid ones to a quarantine zone. It finishes by posting a run summary to the data-engineering Teams channel.

Why it matters: ungoverned raw-zone dumps become data swamps. Cataloging every file and enforcing zone promotion keeps the lake organized, auditable, and trustworthy, with a single correlation id tracing each ingestion batch end to end.

Ships in the Off state (demo). All-connector reference implementation with no HTTP fallbacks.

Why you'd use it

Use Case

A data-engineering team lands files into a Data Lake raw zone from upstream systems (ERP exports, partner feeds, IoT batches) and needs them cataloged, validated, and promoted reliably without manual babysitting. The flow provides a Dataverse catalog of everything that arrived, automatic promotion of conformant files, automatic quarantine of bad ones, and a per-run Teams summary.

Step-by-step

Flow Architecture

Scan Raw Zone (Recurrence)

Recurrence

Polls the raw zone on a schedule (default every 15 min); swap for the Data Lake list-files trigger for event-style firing.

Initialize Config & Counters

Initialize variable

Mints a correlation id; binds raw/curated/quarantine paths, allowed extensions, source system, account name, Teams ids; seeds cataloged/quarantined counters.

List Raw Zone Files

Azure Data Lake - ListFiles

Lists the raw-zone path (the real data source).

For Each File

Apply to each (concurrency 1)

For each file (skip directories): reads content, computes extension + CSV row count, validates (non-empty AND extension on the allow-list).
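
The validation rule above (non-empty AND extension on the allow-list) can be modeled outside the flow as a minimal Python sketch. The function name `is_valid_file` and the sample file names are hypothetical; only the rule itself and the default allow-list come from this page.

```python
from typing import Set


def is_valid_file(name: str, content: bytes, allowed_extensions: Set[str]) -> bool:
    """Mirror the flow's check: content is non-empty AND the extension is allowed."""
    # Lowercased text after the last dot, the same idea as EXPR.02.
    extension = name.split(".")[-1].lower()
    return len(content) > 0 and extension in allowed_extensions


# Default value of flowlibs_DataLakeAllowedExtensions, already split into a set.
allowed = {"csv", "json", "tsv", "xml"}

print(is_valid_file("orders.CSV", b"id,total\n1,9.50\n", allowed))  # True
print(is_valid_file("orders.csv", b"", allowed))                    # False: empty file
print(is_valid_file("dump.parquet", b"PAR1", allowed))              # False: extension not allowed
```

Both conditions must hold, so an empty file with an allowed extension is still quarantined.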

Promote or Quarantine

Dataverse CreateRecord + ADL UploadFile/DeleteFile

Valid files: catalog as Cataloged, upload to the date-partitioned curated path, delete from raw (move). Invalid files: catalog as Quarantined, upload to quarantine, delete from raw. Each branch increments its counter.
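
The two branches can be pictured with the following Python sketch, which only records the planned steps instead of calling any connector. `RunState`, `route_file`, and the example paths are illustrative names, not part of the shipped flow.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RunState:
    """Counters plus a log of planned operations for one run (hypothetical)."""
    cataloged: int = 0
    quarantined: int = 0
    operations: List[Tuple[str, str]] = field(default_factory=list)


def route_file(state: RunState, raw_path: str, curated_path: str,
               quarantine_path: str, valid: bool) -> str:
    """Plan the three steps for one file and bump the matching counter."""
    status = "Cataloged" if valid else "Quarantined"
    destination = curated_path if valid else quarantine_path
    state.operations.append(("catalog", status))      # stands in for Dataverse CreateRecord
    state.operations.append(("upload", destination))  # stands in for ADL UploadFile
    state.operations.append(("delete", raw_path))     # stands in for ADL DeleteFile
    if valid:
        state.cataloged += 1
    else:
        state.quarantined += 1
    return status


state = RunState()
route_file(state, "/raw/inbound/a.csv", "/curated/2024/03/07/a.csv", "/quarantine/a.csv", True)
route_file(state, "/raw/inbound/b.exe", "/curated/2024/03/07/b.exe", "/quarantine/b.exe", False)
print(state.cataloged, state.quarantined)  # 1 1
```

In this sketch the delete is planned last, so a failed upload would leave the original file in the raw zone for the next run.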

Notify Data Engineering

Compose + Teams

Builds an HTML run summary (correlation id, source, counts) and posts it to the data-engineering channel.
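
As a rough illustration of what the Compose step assembles, the Python sketch below renders the same fields (correlation id, source, counts) as an HTML fragment. The exact markup used by the flow is not shown on this page, so the layout and the function name `build_run_summary` are assumptions.

```python
from html import escape


def build_run_summary(correlation_id: str, source_system: str,
                      cataloged: int, quarantined: int) -> str:
    """Render the run summary as a small HTML fragment (assumed layout)."""
    rows = [
        ("Correlation id", correlation_id),
        ("Source", source_system),
        ("Cataloged", str(cataloged)),
        ("Quarantined", str(quarantined)),
    ]
    # Escape values so a stray '<' in a label or id cannot break the message.
    items = "".join(
        f"<li><b>{escape(label)}:</b> {escape(value)}</li>" for label, value in rows
    )
    return f"<p>Raw-zone ingestion run summary</p><ul>{items}</ul>"


print(build_run_summary("run-0001", "ERP-Export", 12, 3))
```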

Solution config

Environment Variables

Schema name	Type	Default	Description
flowlibs_RawZonePath	String	/raw/inbound	Raw/landing zone folder scanned each run.
flowlibs_CuratedZonePath	String	/curated	Curated zone root (date-partitioned subfolders appended).
flowlibs_QuarantineZonePath	String	/quarantine	Destination for files that fail validation.
flowlibs_DataLakeAllowedExtensions	String	csv,json,tsv,xml	Comma-separated allow-list of accepted extensions.
flowlibs_DataLakeSourceSystem	String	ERP-Export	Source-system label stamped on each catalog row.
flowlibs_DataLakeAccountName	String	your-adls-gen1-account	ADLS Gen1 account name targeted by every file op.
flowlibs_TeamsGroupId	String	<your-team-id>	Teams team (group) id for the notification.
flowlibs_TeamsChannelId	String	<your-channel-id>	Teams channel id for the notification.
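
Because flowlibs_DataLakeAllowedExtensions is a single comma-separated string, it has to be split before it can serve as an allow-list. The Python sketch below shows one tolerant way to do that; the normalization (trimming spaces, dropping leading dots, lowercasing) is an assumption about sensible handling, not a description of what the flow does internally.

```python
from typing import Set


def parse_allowed_extensions(raw: str) -> Set[str]:
    """Turn 'csv,json,tsv,xml' into a normalized set of extensions."""
    extensions = set()
    for part in raw.split(","):
        cleaned = part.strip().lower().lstrip(".")
        if cleaned:  # ignore empty entries such as a trailing comma
            extensions.add(cleaned)
    return extensions


print(sorted(parse_allowed_extensions("csv,json,tsv,xml")))  # ['csv', 'json', 'tsv', 'xml']
print(sorted(parse_allowed_extensions(" CSV , .json,,")))    # ['csv', 'json']
```

If you edit the variable by hand, keeping the value lowercase with no spaces avoids depending on any such normalization.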

Auth dependencies

Connectors & Connections

Connector	API name	Actions used
Azure Data Lake	shared_azuredatalake	ListFiles, ReadFile, UploadFile, DeleteFile
Microsoft Dataverse	shared_commondataserviceforapps	CreateRecord
Microsoft Teams	shared_teams	PostMessageToConversation

Note — All connections are referenced as solution connection references; the flow is portable between environments as long as a connection is mapped at import time.

Tweaks & variations

Customization Guide

Almost every realistic variant of this flow can be implemented by changing environment variable values. A few cases require small edits inside the flow definition — those are called out explicitly below.

Trigger style: Replace the Recurrence with the Data Lake list-files trigger (or Event Grid blob-created) for near-real-time intake.
Validation depth: Extend the valid check with schema-conformance (header match, column count, JSON/XML parse) beyond extension + non-empty.
Row counting: Compose Row Count handles CSV; add branches for JSON/TSV/XML to populate the row count per format.
Partitioning: The curated path uses yyyy/MM/dd; switch to source- or entity-based partition folders by editing Compose Curated Path.
Heavy transforms: For large files, trigger Azure Data Factory/Databricks from the valid branch instead of processing content in the flow.
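
For the validation-depth variation, a header match is the simplest schema-conformance check. The Python sketch below shows the logic you would reproduce in the flow; `header_matches` and the expected column names are hypothetical, and the comparison ignores case and surrounding spaces by choice.

```python
import csv
import io
from typing import List


def header_matches(content: str, expected_columns: List[str]) -> bool:
    """Return True when the first CSV row equals the expected column list."""
    reader = csv.reader(io.StringIO(content))
    header = next(reader, None)  # None when the content has no rows at all
    if header is None:
        return False
    actual = [column.strip().lower() for column in header]
    return actual == [column.lower() for column in expected_columns]


print(header_matches("Id,Total\r\n1,9.50\r\n", ["id", "total"]))  # True
print(header_matches("Id,Amount\r\n1,9.50\r\n", ["id", "total"]))  # False
```

A failed header check would route the file to the quarantine branch alongside the existing extension and non-empty checks.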

Helpers & literals

Key Expressions

The flow is intentionally light on Workflow Definition Language (WDL) gymnastics; the heaviest expressions are the CSV row count and the date-partitioned curated path. They are listed below in the order they appear in the flow.

EXPR.01 Iterate listing

The ADLS Gen1 file listing.

workflow definition language

body('List_Raw_Zone_Files')?['FileStatuses']?['FileStatus']
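
To make the shape behind this expression concrete, here is a Python sketch that walks a listing with the same nesting and skips directories, as the For Each File step does. The sample payload is invented for illustration; the `type` and `length` fields are assumed to follow the WebHDFS-style listing format and should be checked against a real run.

```python
from typing import Any, Dict, List

# Illustrative payload only; field values are made up.
sample_body: Dict[str, Any] = {
    "FileStatuses": {
        "FileStatus": [
            {"pathSuffix": "orders.csv", "type": "FILE", "length": 2048},
            {"pathSuffix": "archive", "type": "DIRECTORY", "length": 0},
            {"pathSuffix": "partners.json", "type": "FILE", "length": 512},
        ]
    }
}


def files_only(body: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return listing entries that are not directories; tolerate missing keys."""
    statuses = (body.get("FileStatuses") or {}).get("FileStatus") or []
    return [entry for entry in statuses if entry.get("type") != "DIRECTORY"]


print([entry["pathSuffix"] for entry in files_only(sample_body)])
# ['orders.csv', 'partners.json']
```

The `or {}` and `or []` fallbacks play the role of the `?` safe-navigation operators in the WDL expression.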

EXPR.02 Extension

Lowercase extension for the allow-list check.

workflow definition language

toLower(last(split(items('Apply_To_Each_File')?['pathSuffix'],'.')))
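
The same logic in Python, as a sketch for reasoning about edge cases (the function name `file_extension` is hypothetical):

```python
def file_extension(path_suffix: str) -> str:
    """Lowercased text after the last dot, mirroring toLower(last(split(name, '.')))."""
    return path_suffix.split(".")[-1].lower()


print(file_extension("Orders.2024.CSV"))  # csv
print(file_extension("README"))           # readme
```

Note the second case: a name with no dot yields the whole name, so a file literally named `csv` would pass the allow-list check. If that matters in your lake, add a check that the name contains a dot.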

EXPR.03 CSV row count

Data lines minus header for CSV.

workflow definition language

if(equals(outputs('Compose_File_Extension'),'csv'),sub(length(split(replace(string(body('Read_File_Content')),decodeUriComponent('%0D'),''),decodeUriComponent('%0A'))),1),0)
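
A Python sketch of the same arithmetic helps show how the count behaves. It assumes the WDL `split` keeps a trailing empty element the way Python's `str.split` does; the function names are hypothetical, and `csv_row_count_trimmed` is a suggested variation rather than part of the shipped flow.

```python
def csv_row_count(extension: str, content: str) -> int:
    """Strip CR, split on LF, subtract one for the header; 0 for non-CSV."""
    if extension != "csv":
        return 0
    lines = content.replace("\r", "").split("\n")
    return len(lines) - 1


def csv_row_count_trimmed(extension: str, content: str) -> int:
    """Variation: ignore blank lines so a trailing newline is not counted."""
    if extension != "csv":
        return 0
    lines = [line for line in content.replace("\r", "").split("\n") if line.strip()]
    return max(len(lines) - 1, 0)


no_trailing = "id,total\n1,9\n2,8"
with_trailing = "id,total\r\n1,9\r\n2,8\r\n"

print(csv_row_count("csv", no_trailing))            # 2
print(csv_row_count("csv", with_trailing))          # 3
print(csv_row_count_trimmed("csv", with_trailing))  # 2
print(csv_row_count("json", '{"a": 1}'))            # 0
```

Under that assumption, a file ending with a newline is reported one row high, and quoted fields containing line breaks also inflate the count. Treat the cataloged value as approximate unless you add a trimming step.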

EXPR.04 Curated path

Date-partitioned destination.

workflow definition language

concat(variables('varCuratedZonePath'),'/',formatDateTime(utcNow(),'yyyy/MM/dd'),'/',items('Apply_To_Each_File')?['pathSuffix'])
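
An equivalent Python sketch, useful for checking what a given run would produce. The function name `curated_path` and the optional `now` parameter (added so the example is repeatable) are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def curated_path(curated_root: str, path_suffix: str,
                 now: Optional[datetime] = None) -> str:
    """Join the curated root, a yyyy/MM/dd partition from UTC now, and the file name."""
    now = now or datetime.now(timezone.utc)
    return f"{curated_root}/{now.strftime('%Y/%m/%d')}/{path_suffix}"


fixed = datetime(2024, 3, 7, 23, 59, tzinfo=timezone.utc)
print(curated_path("/curated", "orders.csv", fixed))  # /curated/2024/03/07/orders.csv
```

The partition reflects the UTC time at which the flow processes the file, not the time the file landed, so a file that arrives shortly before midnight UTC can end up in the next day's folder.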
