Data Lake Cost and Storage Growth Report
On a schedule, the flow measures Data Lake storage by zone/folder, computes growth trends and top consumers, estimates cost, flags abnormal growth, and reports to data and finance teams via Teams and Power BI. It gives those teams visibility into, and control over, lake storage cost.
Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.
Overview
This flow gives data and finance teams ongoing visibility into Azure Data Lake storage cost and growth. On a weekly schedule it measures the size and file count of a Data Lake zone, converts the total to GB, estimates monthly cost, compares against the previous snapshot to compute week-over-week growth, writes the result to a Dataverse trend table, refreshes a Power BI dataset, and posts either a routine scorecard or an anomaly alert to a Teams channel.
Why it matters: lake storage grows quietly and cost accumulates without anyone watching. A weekly snapshot plus an automatic anomaly flag keeps spend visible and catches abnormal growth early.
The flow ships Off (demo); going live requires only connection authorization and environment-variable configuration, with no logic changes.
Use Case
Data platform and finance teams want to track and forecast Azure Data Lake storage cost without standing up a separate monitoring stack. This flow turns a folder listing into a trend table, a refreshed dashboard, and a Teams scorecard, with a threshold-based alert when a zone grows abnormally week-over-week.
Flow Architecture
Weekly_Mon_0700
Recurrence: Runs every Monday at 07:00 Eastern. Built-in schedule, no connector.
Init Trace & Config
Initialize variable: Mints a correlation id (guid) and loads nine env vars (Data Lake account, scan path, price per GB, growth alert %, storage table, Power BI workspace/dataset ids, Teams group/channel) into working variables; sets the TotalBytes and FileCount accumulators to 0.
List Data Lake Files
Azure Data Lake - ListFiles: Lists files over account + path (the scanned zone). Returns FileStatuses/FileStatus[] with length (bytes) and type (FILE/DIRECTORY).
Apply to each File
Apply to each + Condition: Iterates the listing. When the Check Is File condition holds (type == 'FILE'), the flow increments TotalBytes by int(length) and FileCount by 1. Subdirectories are skipped.
Compose TotalGB
Compose: Converts TotalBytes to GB (bytes / 1073741824).
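The listing, file-only guard, and conversion steps above can be sketched outside the flow as follows. This is a minimal Python illustration, not the flow's own expressions; the function names and the sample listing are made up for the example, and only the `type`/`length` fields and the 1073741824 divisor come from the step descriptions.

```python
from typing import Iterable, Mapping, Tuple

BYTES_PER_GB = 1073741824  # 1024 ** 3, the divisor stated in Compose TotalGB


def measure_zone(file_statuses: Iterable[Mapping[str, object]]) -> Tuple[int, int]:
    """Return (total_bytes, file_count), counting only entries whose type is FILE."""
    total_bytes = 0
    file_count = 0
    for status in file_statuses:
        if status.get("type") == "FILE":
            # length may arrive as a number or a numeric string, so coerce it
            total_bytes += int(status["length"])
            file_count += 1
    return total_bytes, file_count


def bytes_to_gb(total_bytes: int) -> float:
    """Convert a byte count to GB using the 1024-based divisor."""
    return total_bytes / BYTES_PER_GB


# Made-up sample listing: two files and one subdirectory
sample_listing = [
    {"type": "FILE", "length": 1073741824},
    {"type": "DIRECTORY", "length": 0},
    {"type": "FILE", "length": "536870912"},
]

total_bytes, file_count = measure_zone(sample_listing)
total_gb = bytes_to_gb(total_bytes)
```

Note that, as in the flow, only the files directly in the listing are counted; the contents of skipped subdirectories do not contribute to the total.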
Get Previous Snapshot
Microsoft Dataverse - ListRecords: Reads flowlibs_lakestorages filtered to this zone, ordered by snapshot date descending, top 1, to form the comparison baseline.
Compose PrevGB / GrowthPct / EstCost
Compose: Computes previous total GB (0 if first run), week-over-week growth percent, and estimated monthly cost (TotalGB x price per GB).
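As a rough illustration of these three values, the arithmetic might look like the Python sketch below. The function names and the `total_gb` key on the snapshot record are hypothetical; the only rules taken from the description are "0 if first run", growth as a week-over-week percent, and cost as total GB times price per GB. Returning 0 growth when there is no positive baseline is an assumption that also avoids dividing by zero.

```python
from typing import Mapping, Optional


def previous_gb(previous_snapshot: Optional[Mapping[str, object]]) -> float:
    """Previous total GB, or 0.0 when no earlier snapshot exists (first run)."""
    if not previous_snapshot:
        return 0.0
    return float(previous_snapshot["total_gb"])  # hypothetical key name


def growth_pct(total_gb: float, prev_gb: float) -> float:
    """Week-over-week growth in percent; 0.0 when there is no positive baseline."""
    if prev_gb <= 0:
        return 0.0
    return (total_gb - prev_gb) / prev_gb * 100.0


def estimated_monthly_cost(total_gb: float, price_per_gb: float) -> float:
    """Estimated monthly cost: total GB multiplied by the price per GB."""
    return total_gb * price_per_gb
```

For example, a zone that grew from 100 GB to 125 GB shows 25% growth, which matches the default alert threshold.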
Create Snapshot
Microsoft Dataverse - CreateRecord: Writes a new flowlibs_lakestorages row: name, zone, file count, total GB, estimated cost, growth %, snapshot date, correlation id.
Refresh PowerBI Dataset
Environment Variables
| Schema name | Type | Default | Description |
|---|---|---|---|
| flowlibs_DataLakeAccount | String | <configure> | Azure Data Lake (Gen1) account name to measure. |
| flowlibs_LakeScanPath | String | curated | Zone/folder path to measure (no leading slash). |
| flowlibs_PricePerGb | String | 0.018 | Estimated storage cost per GB/month (USD). |
| flowlibs_GrowthAlertPct | String | 25 | Week-over-week growth % that triggers an alert. |
| flowlibs_StorageTable | String | flowlibs_lakestorages | Dataverse entity set for the trend store. |
| flowlibs_PowerBIWorkspaceId | String | <configure> | Power BI workspace (group) id. |
| flowlibs_PowerBIDatasetId | String | <configure> | Power BI dataset id to refresh. |
| flowlibs_TeamsGroupId | String | <your-team-id> | Teams team (group) id for the report channel. |
| flowlibs_TeamsChannelId | String | <your-channel-id> | Teams channel id for the report. |
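
Because every variable in the table is typed String, the numeric settings have to be parsed before they can be compared or multiplied. The Python sketch below shows one way to do that and to choose between the routine scorecard and the anomaly alert. The helper names are hypothetical, and treating growth equal to the threshold as an alert (`>=`) is an assumption; the source only says the threshold "triggers an alert".

```python
from typing import Dict, Mapping


def parse_settings(env: Mapping[str, str]) -> Dict[str, object]:
    """Parse the string-typed environment variables into working values."""
    return {
        "price_per_gb": float(env["flowlibs_PricePerGb"]),
        "growth_alert_pct": float(env["flowlibs_GrowthAlertPct"]),
        # The table asks for no leading slash; stripping one here is a defensive assumption
        "scan_path": env["flowlibs_LakeScanPath"].lstrip("/"),
    }


def is_anomaly(growth_pct: float, growth_alert_pct: float) -> bool:
    """True when week-over-week growth reaches the alert threshold (assumed inclusive)."""
    return growth_pct >= growth_alert_pct


# Defaults taken from the table above
settings = parse_settings(
    {
        "flowlibs_PricePerGb": "0.018",
        "flowlibs_GrowthAlertPct": "25",
        "flowlibs_LakeScanPath": "curated",
    }
)
message_kind = "alert" if is_anomaly(30.0, settings["growth_alert_pct"]) else "scorecard"
```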
Connectors & Connections
| Connector | API name | Actions used |
|---|---|---|
| Azure Data Lake | shared_azuredatalake | ListFiles |
| Microsoft Dataverse | shared_commondataserviceforapps | ListRecords, CreateRecord |
| Power BI | shared_powerbi | RefreshDataset |
| Microsoft Teams | shared_teams | PostMessageToConversation |
Note — All connections are referenced as solution connection references; the flow is portable between environments as long as a connection is mapped at import time.
Customization Guide
Almost every realistic variant of this flow can be implemented by changing environment variable values. A few cases require small edits inside the flow definition — those are called out explicitly below.
- Multiple zones: Wrap the measure/snapshot block in an outer loop over a comma-separated list of zones (split flowlibs_LakeScanPath), writing one snapshot row per zone.
- Chargeback: Group file sizes by top-level subfolder and write a row per team/folder to attribute cost.
- Forecast: Read the last N snapshots from Dataverse and project next-quarter spend using the average weekly growth rate.
- Cleanup tie-in: Flag folders with the largest growth as tiering/archival candidates and route them to a Box/Blob retention flow.
- Cost accuracy: For large accounts, swap the per-file enumeration for Azure Storage metrics/inventory and estimate by access tier.
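The Forecast variant could be prototyped along these lines before building it into the flow. This Python sketch is one possible interpretation, not the template's method: it assumes snapshots are supplied oldest first as total-GB values, that a quarter is 13 weeks, and that growth compounds weekly at the average observed rate. All function names are hypothetical.

```python
from typing import Sequence

WEEKS_PER_QUARTER = 13  # assumption: one quarter is roughly 13 weekly snapshots


def average_weekly_growth(totals_gb: Sequence[float]) -> float:
    """Mean fractional week-over-week growth across consecutive snapshots."""
    rates = [
        (curr - prev) / prev
        for prev, curr in zip(totals_gb, totals_gb[1:])
        if prev > 0  # skip pairs without a usable baseline
    ]
    return sum(rates) / len(rates) if rates else 0.0


def project_monthly_cost(
    totals_gb: Sequence[float],
    price_per_gb: float,
    weeks: int = WEEKS_PER_QUARTER,
) -> float:
    """Estimated monthly cost after `weeks` more weeks of compounded average growth."""
    if not totals_gb:
        return 0.0
    rate = average_weekly_growth(totals_gb)
    projected_gb = totals_gb[-1] * (1 + rate) ** weeks
    return projected_gb * price_per_gb
```

Compounding is a design choice; with only a handful of snapshots a simple linear extrapolation may be less sensitive to a single unusual week.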
Key Expressions
The flow is intentionally light on Power Fx / WDL gymnastics. The expressions it relies on are the unit conversion, the growth and cost calculations, the previous-snapshot filter, and the file-only guard. They are listed below.
EXPR.01 Bytes to GB
Convert accumulated bytes to GB.
EXPR.02 Growth %
Week-over-week growth percent (0 when no prior baseline).
EXPR.03 Estimated cost
Total GB times price per GB.
EXPR.04 Previous-snapshot filter
OData filter selecting prior snapshots for this zone.
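The exact filter text is not shown here, so the Python sketch below only illustrates the shape such a query could take: filter to the zone, order by snapshot date descending, take the top row. The column names `flowlibs_zone` and `flowlibs_snapshotdate` are hypothetical placeholders, not confirmed schema names. Doubling single quotes is the standard way to escape them inside an OData string literal.

```python
from typing import Dict


def previous_snapshot_query(zone: str) -> Dict[str, object]:
    """Build OData query options that select the latest snapshot for one zone."""
    escaped_zone = zone.replace("'", "''")  # OData escapes ' inside literals as ''
    return {
        "$filter": f"flowlibs_zone eq '{escaped_zone}'",  # hypothetical column name
        "$orderby": "flowlibs_snapshotdate desc",  # hypothetical column name
        "$top": 1,
    }


query = previous_snapshot_query("curated")
```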
EXPR.05 File-only guard
Count only files, skipping subdirectories.