Cosmos DB Multi-Region Consistency Check
On a schedule, the flow reads a sample of documents from multiple Cosmos DB read regions, compares versions to detect replication lag or divergence, logs lag metrics, and alerts engineering if lag exceeds an SLA. It monitors geo-replication health for a globally distributed Cosmos DB account.
Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.
Overview
This flow monitors geo-replication health for a globally distributed Azure Cosmos DB account. On a schedule it reads a sample of documents from two Cosmos read-region accounts, compares each document's version stamp (_ts, the last-modified time in epoch seconds), measures replication lag, flags documents missing in the secondary region, logs a per-document metric to a Dataverse table, and posts an alert to a Teams channel when any document exceeds the lag SLA.
Why it matters: replication lag affects user experience and data correctness. Active monitoring catches regional drift before users do, and the per-document Dataverse log builds a history you can trend in Power BI.
The flow ships turned off (demo).
Use Case
A team running multi-region Cosmos DB wants continuous assurance that their read regions stay in sync. Instead of waiting for a customer to report stale data, this flow samples documents from both regions every hour, computes the replication lag per document, and raises a Teams alert the moment lag breaches the agreed SLA.
Flow Architecture
Run Consistency Check On Schedule
Recurrence (hourly): Kicks off the consistency sweep on a configurable interval.
Initialize Trace & Config
Initialize variable: Mints a correlation id and binds the SLA seconds, region names, Teams ids, and breach accumulators from env vars.
Query Primary Region
Azure Cosmos DB - QueryDocuments_V5: Reads a sample of documents (id + _ts) from the primary region account.
Query Secondary Region
Azure Cosmos DB - QueryDocuments_V5: Reads the same sample from the secondary region account.
For Each Primary Document
Apply to each: Filters the matching secondary doc, composes version stamps (0 if missing), computes absolute lag, derives the SLA verdict, and logs a snapshot record to Dataverse; on breach, increments the counter and appends a summary line.
Alert On Any Breach
Microsoft Teams - PostMessageToConversation: If any document breached the SLA, posts a single consolidated alert to the engineering channel.
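The compare-and-alert steps above can be pictured outside Power Automate as a short Python sketch. This is not the flow's definition, only an illustration of the same logic under stated assumptions: the names `LagRecord`, `compare_regions`, and `build_alert` are hypothetical, each query result is assumed to be a list of dicts carrying `id` and `_ts`, and the alert wording is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LagRecord:
    """One snapshot row, roughly what the flow logs per document (hypothetical shape)."""
    doc_id: str
    primary_ts: int
    secondary_ts: int  # 0 when the document is missing in the secondary region
    lag_seconds: int
    breached: bool


def compare_regions(primary_docs, secondary_docs, sla_seconds):
    """Match documents by id and return (snapshot records, breach summary lines)."""
    secondary_by_id = {doc["id"]: doc for doc in secondary_docs}
    records = []
    breach_lines = []
    for doc in primary_docs:
        match = secondary_by_id.get(doc["id"])
        secondary_ts = match["_ts"] if match is not None else 0
        lag = abs(doc["_ts"] - secondary_ts)
        # Breached when the document is missing or the lag exceeds the SLA.
        breached = match is None or lag > sla_seconds
        records.append(LagRecord(doc["id"], doc["_ts"], secondary_ts, lag, breached))
        if breached:
            reason = "missing in secondary" if match is None else f"lag {lag}s"
            breach_lines.append(f"{doc['id']}: {reason}")
    return records, breach_lines


def build_alert(breach_lines, sla_seconds) -> Optional[str]:
    """Return one consolidated message, or None when nothing breached."""
    if not breach_lines:
        return None
    header = f"{len(breach_lines)} document(s) breached the {sla_seconds}s replication SLA"
    return "\n".join([header, *breach_lines])
```

Accumulating summary lines inside the loop and posting once afterwards mirrors the single consolidated alert described above, so one bad sweep produces one message instead of one per document.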
Environment Variables
| Schema name | Type | Default | Description |
|---|---|---|---|
| flowlibs_CosmosPrimaryAccount | String | cosmos-primary-eastus | Cosmos account name for the primary read region. |
| flowlibs_CosmosSecondaryAccount | String | cosmos-secondary-westeurope | Cosmos account name for the secondary read region. |
| flowlibs_CosmosDatabaseId | String | ReferenceDb | Cosmos database id. |
| flowlibs_CosmosContainerId | String | Documents | Cosmos container id. |
| flowlibs_CosmosConsistencyQuery | String | SELECT TOP 100 c.id, c._ts FROM c ORDER BY c._ts DESC | Sample query (id + version). |
| flowlibs_CosmosPrimaryRegionName | String | East US | Friendly name of the primary region. |
| flowlibs_CosmosSecondaryRegionName | String | West Europe | Friendly name of the secondary region. |
| flowlibs_ConsistencySLASeconds | String | 60 | Max acceptable replication lag (seconds). |
| flowlibs_TeamsGroupId | String | <your-team-id> | Teams team (group) id for alerts. |
Connectors & Connections
| Connector | API name | Actions used |
|---|---|---|
| Azure Cosmos DB | shared_documentdb | QueryDocuments_V5 |
| Microsoft Dataverse | shared_commondataserviceforapps | CreateRecord |
| Microsoft Teams | shared_teams | PostMessageToConversation |
Note — All connections are referenced as solution connection references; the flow is portable between environments as long as a connection is mapped at import time.
Customization Guide
Almost every realistic variant of this flow can be implemented by changing environment variable values. A few cases require small edits inside the flow definition — those are called out explicitly below.
- Cadence: Change the Recurrence interval (default hourly) to match your replication-lag tolerance.
- Sample size / scope: Edit the consistency query to target hot partitions or canary documents instead of TOP 100.
- More regions: Duplicate the secondary query and the compare block for a third or fourth region account.
- Canary writer: Pair with a companion flow that writes heartbeat documents to one region so lag is always measurable.
- Power BI: Point a report at the snapshot table to trend lag over time and per region.
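For the "More regions" variant, the duplicated compare block amounts to running the same comparison once per additional region. A minimal Python sketch of that idea follows. It is an illustration only: `lag_by_region` is a hypothetical helper, the input shapes are assumptions, and "Southeast Asia" in the usage below is just an example region name.

```python
def lag_by_region(primary_docs, secondary_regions, sla_seconds):
    """Summarize worst-case lag and missing documents for each secondary region.

    secondary_regions maps a friendly region name to that region's query result
    (a list of dicts with "id" and "_ts"); both shapes are assumptions.
    """
    summary = {}
    for region, docs in secondary_regions.items():
        ts_by_id = {doc["id"]: doc["_ts"] for doc in docs}
        missing = 0
        max_lag = 0
        for doc in primary_docs:
            if doc["id"] not in ts_by_id:
                missing += 1
                continue
            max_lag = max(max_lag, abs(doc["_ts"] - ts_by_id[doc["id"]]))
        summary[region] = {
            "max_lag_seconds": max_lag,
            "missing": missing,
            "breached": missing > 0 or max_lag > sla_seconds,
        }
    return summary


primary = [{"id": "a", "_ts": 100}, {"id": "b", "_ts": 200}]
regions = {
    "West Europe": [{"id": "a", "_ts": 95}, {"id": "b", "_ts": 200}],
    "Southeast Asia": [{"id": "a", "_ts": 100}],
}
result = lag_by_region(primary, regions, sla_seconds=60)
```

A per-region summary like this is also the kind of shape a Power BI report could trend, although the snapshot table in this flow is logged per document.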
Key Expressions
The flow is intentionally light on Power Fx / WDL gymnastics: the heaviest expressions are the missing-document fallback and the absolute-lag calculation. They are listed below in the order they appear in the flow.
EXPR.01 Secondary version (0 if missing)
Falls back to 0 when the document is missing in the secondary region.
EXPR.02 Raw diff
Difference of the two version stamps.
EXPR.03 Absolute lag (no abs() in Logic Apps)
Absolute value via negate since Logic Apps has no abs().
EXPR.04 SLA verdict
Breached when missing or lag exceeds the SLA.
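The four expressions can be mirrored one-for-one in Python to make their behavior concrete. This is a sketch of the logic as described, not the flow's actual expression text: the function names are hypothetical, and the conditional negate stands in for what a workflow expression would plausibly do with something along the lines of if(less(diff, 0), mul(diff, -1), diff).

```python
def secondary_version(secondary_doc):
    """EXPR.01: the secondary _ts, or 0 when the document is missing."""
    return secondary_doc["_ts"] if secondary_doc is not None else 0


def raw_diff(primary_ts, secondary_ts):
    """EXPR.02: signed difference of the two version stamps."""
    return primary_ts - secondary_ts


def absolute_lag(diff):
    """EXPR.03: absolute value by negating negatives, avoiding abs() on purpose."""
    return diff * -1 if diff < 0 else diff


def sla_breached(secondary_doc, lag_seconds, sla_seconds):
    """EXPR.04: breached when the document is missing or the lag exceeds the SLA."""
    return secondary_doc is None or lag_seconds > sla_seconds


# The SLA environment variable is typed String, so convert before comparing.
sla_seconds = int("60")
secondary = {"id": "doc-1", "_ts": 1700000050}
diff = raw_diff(1700000000, secondary_version(secondary))
lag = absolute_lag(diff)
verdict = sla_breached(secondary, lag, sla_seconds)
```

Because a missing document falls back to a version of 0, its computed lag equals the full primary timestamp, so the explicit missing check in the verdict is what keeps the reason for the breach unambiguous.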