Data Sovereignty and Residency#

Data sovereignty is the principle that data is subject to the laws of the country where it is stored or processed. Data residency is the requirement to keep data within a specific geographic boundary. These are not abstract legal concepts — they dictate where you deploy infrastructure, how you replicate data, and what services you can use.

Get this wrong and the consequences are regulatory fines, contract violations, and loss of customer trust. Cumulative GDPR fines alone have reached billions of euros since enforcement began in 2018.

Key Regulations#

GDPR (EU/EEA)#

The General Data Protection Regulation applies to any organization processing personal data of EU/EEA residents, regardless of where the organization is headquartered.

Core requirements for infrastructure:

  • Lawful basis for processing. You must have a legal reason to process personal data (consent, contract, legitimate interest, etc.).
  • Data minimization. Collect and store only what is necessary.
  • Storage limitation. Do not keep data longer than needed. Implement retention policies.
  • Data subject rights. Users can request access, deletion, portability, and correction of their data. Your architecture must support these operations.
  • Transfer restrictions. Personal data cannot leave the EU/EEA unless the destination country has an adequacy decision or appropriate safeguards are in place (Standard Contractual Clauses, Binding Corporate Rules).
  • Data Protection Impact Assessment (DPIA). Required for high-risk processing activities.
  • 72-hour breach notification. You must notify the supervisory authority within 72 hours of becoming aware of a personal data breach. Meeting that deadline in practice requires continuous monitoring and a rehearsed incident response process.

GDPR does not require data to stay in the EU. It requires that transfers outside the EU have adequate protections. However, many organizations choose EU-only storage to simplify compliance.
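The transfer rule can be read as a small decision procedure: a destination is acceptable if it is inside the EEA, covered by an adequacy decision, or protected by an appropriate safeguard. The sketch below illustrates that logic only. The country sets are deliberately incomplete examples, and `check_transfer` and `TransferDecision` are hypothetical names, so treat it as a starting point and check the current adequacy list before relying on anything like it.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subsets only (assumption): the real lists are longer and change over time.
EEA_EXAMPLES = {"DE", "FR", "IE", "NL", "NO"}
ADEQUACY_EXAMPLES = {"JP", "CH", "NZ"}
ACCEPTED_SAFEGUARDS = {"SCC", "BCR"}  # Standard Contractual Clauses, Binding Corporate Rules


@dataclass(frozen=True)
class TransferDecision:
    allowed: bool
    reason: str


def check_transfer(destination_country: str, safeguard: Optional[str] = None) -> TransferDecision:
    """Decide whether personal data may be sent to destination_country."""
    if destination_country in EEA_EXAMPLES:
        return TransferDecision(True, "within EEA")
    if destination_country in ADEQUACY_EXAMPLES:
        return TransferDecision(True, "adequacy decision")
    if safeguard in ACCEPTED_SAFEGUARDS:
        return TransferDecision(True, f"safeguard: {safeguard}")
    # Fail closed: no recognised basis for the transfer
    return TransferDecision(False, "no adequacy decision or safeguard")
```

Returning a reason alongside the boolean makes it easier to log why a transfer was permitted, which helps when documenting compliance decisions.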

CCPA / CPRA (California)#

The California Consumer Privacy Act and its amendment (CPRA) apply to businesses handling personal information of California residents.

Key differences from GDPR:

  • Focused on consumer rights: right to know, right to delete, right to opt out of sale/sharing.
  • No explicit data localization requirement.
  • Applies based on business thresholds (revenue, data volume).
  • Broader definition of “sale” — sharing data with third parties for cross-context behavioral advertising counts.

Infrastructure impact: CCPA requires the ability to identify and delete all data about a specific consumer across all systems. Your data architecture must support lookup-by-identity and cascading deletion.
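One way to make cascading deletion tractable is to have every system that holds consumer data register a deletion handler in one place. The following is a minimal sketch of that idea with in-memory stand-ins for real stores. `DeletionRegistry` and the store names are hypothetical, and a production version would also need retries, audit persistence, and handling for backups.

```python
from typing import Callable, Dict

# Each store registers a function that deletes one consumer's data
# and returns the number of records it removed.
DeleteFn = Callable[[str], int]


class DeletionRegistry:
    def __init__(self) -> None:
        self._stores: Dict[str, DeleteFn] = {}

    def register(self, store_name: str, delete_fn: DeleteFn) -> None:
        self._stores[store_name] = delete_fn

    def delete_consumer(self, consumer_id: str) -> Dict[str, int]:
        """Run deletion in every registered store and return a per-store report."""
        return {name: fn(consumer_id) for name, fn in self._stores.items()}


# Usage with two in-memory stand-ins for real systems (hypothetical data)
profiles = {"c-1": {"email": "a@example.com"}, "c-2": {"email": "b@example.com"}}
tickets = [{"consumer_id": "c-1"}, {"consumer_id": "c-1"}, {"consumer_id": "c-2"}]


def delete_profile(consumer_id: str) -> int:
    return 1 if profiles.pop(consumer_id, None) is not None else 0


def delete_tickets(consumer_id: str) -> int:
    before = len(tickets)
    tickets[:] = [t for t in tickets if t["consumer_id"] != consumer_id]
    return before - len(tickets)


registry = DeletionRegistry()
registry.register("profiles", delete_profile)
registry.register("support_tickets", delete_tickets)
report = registry.delete_consumer("c-1")
```

The per-store report doubles as evidence that a deletion request was carried out everywhere the consumer's data was known to live.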

PIPEDA (Canada)#

Canada’s Personal Information Protection and Electronic Documents Act requires organizations to protect personal information and obtain consent for collection, use, and disclosure.

Transfer rules: PIPEDA allows international transfers if the organization ensures comparable protection through contractual means. Quebec’s Law 25 is stricter: it requires a privacy impact assessment before personal information is communicated outside Quebec.

National Data Localization Laws#

Some countries mandate that data physically stays within their borders:

| Country | Requirement | Scope |
|---|---|---|
| Russia | Personal data of Russian citizens must be stored on servers in Russia | All personal data |
| China | Critical information infrastructure operators must store data locally; cross-border transfer requires security assessment | Personal data and “important data” |
| India | RBI mandates payment data stored in India; the DPDP Act (enacted 2023) may add broader transfer restrictions | Financial transaction data; expanding |
| Indonesia | Public sector data must be stored domestically; private sector encouraged | Government data; expanding |
| Vietnam | Data localization for important data; local storage required in many cases | Broad scope under Decree 13 |
| Saudi Arabia | Government and health data must remain in Saudi Arabia | Sector-specific |
| Turkey | Personal data transfers abroad require explicit consent or board approval | All personal data |
| Brazil (LGPD) | No strict localization but transfer restrictions similar to GDPR | Personal data |

This landscape changes frequently. New laws are proposed and enacted regularly. Treat this as a starting point for research, not a complete reference.

Designing for Data Residency#

Multi-Region Architecture Patterns#

Pattern 1: Regional Isolation (Strictest)

Each region operates independently with no data replication across borders:

EU Region (eu-west-1)              US Region (us-east-1)
├── Application instances          ├── Application instances
├── Database (EU data only)        ├── Database (US data only)
├── Cache (EU data only)           ├── Cache (US data only)
├── Object storage (EU)            ├── Object storage (US)
└── Logs/monitoring (EU)           └── Logs/monitoring (US)

Pros: Simplest compliance story. Data never crosses borders.

Cons: No cross-region failover. Operational complexity of managing independent deployments. Increased cost.

Pattern 2: Regional Data with Global Control Plane

Operational metadata (config, service discovery, deployment state) is global. Customer data stays regional:

Global Control Plane
├── Service mesh config
├── Deployment orchestration
├── Aggregated metrics (anonymized)
└── No personal data

EU Data Plane                    US Data Plane
├── Customer data                ├── Customer data
├── Application logs             ├── Application logs
└── Full database                └── Full database

Pros: Centralized operations, compliant data handling.

Cons: Must ensure control plane never receives personal data. Aggregated metrics must be truly anonymized.
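A guard at the boundary of the data plane is one way to enforce the "no personal data in the control plane" rule. The sketch below walks a payload and rejects it if any key matches a denylist. The field names in `PERSONAL_FIELDS` and both function names are assumptions for illustration, and matching on key names alone will not catch personal data hidden inside free-text values.

```python
from typing import Any, List

# Hypothetical field names treated as personal data in this sketch
PERSONAL_FIELDS = frozenset({"email", "name", "user_id", "client_ip"})


def find_personal_fields(payload: Any, path: str = "") -> List[str]:
    """Return the paths of keys in payload whose names are on the denylist."""
    hits: List[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else str(key)
            if key in PERSONAL_FIELDS:
                hits.append(child)
            hits.extend(find_personal_fields(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_personal_fields(item, f"{path}[{index}]"))
    return hits


def publish_to_control_plane(payload: dict) -> dict:
    """Refuse to forward payloads that carry denylisted fields."""
    hits = find_personal_fields(payload)
    if hits:
        raise ValueError(f"personal data in control-plane payload: {hits}")
    return payload  # a real system would send this to the control plane here
```

Raising instead of silently stripping fields makes leaks visible during development, which is usually preferable for a compliance boundary.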

Pattern 3: Data Residency with Cross-Region Metadata

Customer data stays regional. Anonymized or pseudonymized metadata can flow globally for analytics and operations:

EU Region                        US Region
├── Customer PII                 ├── Customer PII
├── Transactions                 ├── Transactions
├── Pseudonymized ID → EU user   ├── Pseudonymized ID → US user
│                                │
└──→ Global Analytics (anonymized/pseudonymized data only)

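Pros: Enables global analytics and ML while respecting residency.

Cons: Pseudonymization must be robust, and re-identification risk must be assessed. Under GDPR, pseudonymized data is still personal data, so exporting it still needs a valid transfer mechanism; only truly anonymized data falls outside the transfer rules.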
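A common way to produce pseudonymous identifiers is a keyed hash (HMAC) whose key never leaves the region. The sketch below assumes that approach; it is not taken from the patterns above, and the key shown is a placeholder that would normally come from a regional KMS or secret store.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, regional_key: bytes) -> str:
    """Derive a stable token from user_id using a key that stays in-region."""
    digest = hmac.new(regional_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Placeholder key (assumption): load the real key from a regional secret store
eu_key = b"example-eu-key-do-not-use"
token = pseudonymize("u-12345", eu_key)
```

Because the token is deterministic for a given key, global analytics can join events for the same user without seeing the underlying ID, while re-identification requires access to the regional key. A plain unkeyed hash would not give that property, since anyone could recompute it from a guessed ID.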

Routing Users to the Correct Region#

Users must be directed to infrastructure in their jurisdiction:

User request
  → DNS (GeoDNS or Cloudflare geo-routing)
  → Edge proxy determines user's jurisdiction
    → Route to EU infrastructure (EU users)
    → Route to US infrastructure (US users)
    → Route to AP infrastructure (APAC users)

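Jurisdiction should follow where the user resides, not where a given request originates. An EU resident traveling in the US still has an EU account, and the data held for that account remains covered by GDPR. Assign jurisdiction at the account level rather than inferring it from IP geolocation on each request.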

# Example: user record includes jurisdiction
{
  "user_id": "u-12345",
  "jurisdiction": "EU",
  "data_region": "eu-west-1",
  "created_at": "2026-01-15T10:00:00Z"
}

Route all data operations for this user to eu-west-1, regardless of where the request originates.
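In code, this rule amounts to reading the region from the account record and ignoring the request's apparent location. The following sketch assumes the user record shape shown above; `resolve_data_region` and the `JURISDICTION_REGIONS` mapping are hypothetical names.

```python
from typing import Mapping, Optional

# Hypothetical mapping from account jurisdiction to data region
JURISDICTION_REGIONS = {"EU": "eu-west-1", "US": "us-east-1", "AP": "ap-southeast-1"}


def resolve_data_region(user: Mapping[str, str], request_geo: Optional[str] = None) -> str:
    """Pick the data region from the account; request_geo is deliberately ignored."""
    jurisdiction = user["jurisdiction"]
    expected = JURISDICTION_REGIONS.get(jurisdiction)
    if expected is None:
        raise ValueError(f"no region configured for jurisdiction {jurisdiction!r}")
    pinned = user.get("data_region", expected)
    if pinned != expected:
        # The record disagrees with policy: stop rather than guess
        raise ValueError(f"data_region {pinned!r} does not match jurisdiction {jurisdiction!r}")
    return pinned
```

For example, calling it with the record above and `request_geo="US"` still returns `eu-west-1`. Keeping `request_geo` in the signature makes it explicit at call sites that the request location is available but unused.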

Database Design for Residency#

Option 1: Separate databases per region

Each region has its own database. No replication between regions. Simplest for compliance, hardest for global features.

-- EU database
CREATE TABLE users (
    id UUID PRIMARY KEY,
    email TEXT NOT NULL,
    name TEXT NOT NULL,
    jurisdiction TEXT NOT NULL DEFAULT 'EU'
    -- All personal data stays here
);

-- US database (same schema, different data)

Option 2: Sharded database with region-aware routing

Single logical database sharded by jurisdiction. The application layer routes queries to the correct shard.

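import psycopg2  # assumption: PostgreSQL accessed through psycopg2

REGION_DSNS = {
    "EU": "postgres://eu-west-1.db.internal:5432/app",
    "US": "postgres://us-east-1.db.internal:5432/app",
    "AP": "postgres://ap-southeast-1.db.internal:5432/app",
}

def get_db_connection(user_jurisdiction: str):
    """Connect to the database that holds this jurisdiction's data."""
    if user_jurisdiction not in REGION_DSNS:
        raise ValueError(f"unknown jurisdiction: {user_jurisdiction!r}")  # fail closed
    return psycopg2.connect(REGION_DSNS[user_jurisdiction])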

Option 3: CockroachDB or YugabyteDB with locality-aware partitioning

Distributed databases that support geo-partitioning at the row level:

-- CockroachDB: partition by region
-- Note: the partitioning column (jurisdiction) must be a prefix of the primary key.
ALTER TABLE users PARTITION BY LIST (jurisdiction) (
    PARTITION eu VALUES IN ('EU'),
    PARTITION us VALUES IN ('US'),
    PARTITION ap VALUES IN ('AP')
);

-- Pin each partition to a specific region
ALTER PARTITION eu OF TABLE users CONFIGURE ZONE USING
    constraints = '[+region=eu-west-1]';
ALTER PARTITION us OF TABLE users CONFIGURE ZONE USING
    constraints = '[+region=us-east-1]';
ALTER PARTITION ap OF TABLE users CONFIGURE ZONE USING
    constraints = '[+region=ap-southeast-1]';

Data physically resides in the specified region. Queries are automatically routed to the correct partition.

Object Storage and Backups#

Object storage (S3, GCS, Azure Blob) must also respect residency:

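# Terraform: EU-only S3 bucket
# The bucket's region comes from the provider, so pin a provider to an EU region.
# The "eu" alias is an illustrative name.
provider "aws" {
  alias  = "eu"
  region = "eu-west-1"
}

resource "aws_s3_bucket" "eu_data" {
  provider = aws.eu
  bucket   = "myapp-eu-customer-data"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "eu_data" {
  provider = aws.eu
  bucket   = aws_s3_bucket.eu_data.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.eu_key.arn  # EU-region KMS key, defined elsewhere
    }
  }
}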

Backups are data too. If customer data must stay in the EU, backups of that data must also stay in the EU. Cross-region backup replication for disaster recovery can violate residency requirements unless the backup destination is in an approved jurisdiction.
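A backup plan can be checked against residency policy before it is applied. The sketch below assumes a simple policy table mapping each jurisdiction to its approved regions; `ALLOWED_REGIONS`, `BackupPlan`, and the specific region lists are hypothetical and would come from your own compliance requirements.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical policy: regions where each jurisdiction's data, including backups, may live
ALLOWED_REGIONS: Dict[str, Set[str]] = {
    "EU": {"eu-west-1", "eu-central-1"},
    "US": {"us-east-1", "us-west-2"},
}


@dataclass(frozen=True)
class BackupPlan:
    name: str
    jurisdiction: str
    source_region: str
    copy_regions: Tuple[str, ...] = ()


def residency_violations(plan: BackupPlan) -> List[str]:
    """Return every region in the plan that is not approved for its jurisdiction."""
    allowed = ALLOWED_REGIONS.get(plan.jurisdiction, set())
    regions = (plan.source_region, *plan.copy_regions)
    return [region for region in regions if region not in allowed]


compliant = BackupPlan("eu-db-nightly", "EU", "eu-west-1", ("eu-central-1",))
risky = BackupPlan("eu-db-dr", "EU", "eu-west-1", ("us-east-1",))
```

Here `residency_violations(compliant)` is empty while `residency_violations(risky)` flags `us-east-1`. An unknown jurisdiction yields an empty allowed set, so every region in such a plan is reported rather than silently accepted.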

Logging and Monitoring#

Logs often contain personal data — IP addresses, user IDs, email addresses, request parameters. If logs are shipped to a centralized monitoring service in another region, that is a data transfer.

Options:

  1. Regional log aggregation — each region has its own logging stack.
  2. Log sanitization before export — strip PII from logs before sending to a global aggregator.
  3. Pseudonymization in transit — replace identifiers with pseudonymous tokens before cross-border transfer.
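
# Fluentd: strip PII before forwarding
# Note: an unsalted hash of an IP address is pseudonymization, not anonymization.
<filter application.**>
  @type record_transformer
  enable_ruby true  # required for the Ruby expression in <record>
  remove_keys user_email,client_ip,authorization_header
  <record>
    client_ip_hash ${record["client_ip"] ? OpenSSL::Digest::SHA256.hexdigest(record["client_ip"]) : ""}
  </record>
</filter>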
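Sanitization can also happen inside the application, before a log line ever reaches a shipper. The following is a rough sketch using Python's standard `logging` module. The regular expressions are simplistic assumptions that only catch email addresses and IPv4 addresses, and `PiiScrubFilter` is a hypothetical name.

```python
import logging
import re

# Simplistic patterns (assumption): real PII detection needs broader coverage
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def scrub(text: str) -> str:
    """Replace email addresses and IPv4 addresses with fixed placeholders."""
    text = EMAIL_RE.sub("[email]", text)
    return IPV4_RE.sub("[ip]", text)


class PiiScrubFilter(logging.Filter):
    """Logging filter that scrubs the fully formatted message of each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = scrub(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


logger = logging.getLogger("app")
logger.addFilter(PiiScrubFilter())
```

Scrubbing the formatted message, rather than the format string, ensures that values passed as logging arguments are covered too. Pattern-based scrubbing is best treated as a safety net alongside structured logging that avoids recording personal fields in the first place.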

Cloud Provider Residency Controls#

AWS#

  • Region selection per service — data does not leave the selected region unless you configure replication.
  • AWS Control Tower with region deny guardrails — prevent resource creation in unapproved regions.
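  • Bucket-level region pinning plus IAM or SCP restrictions on replication (for example, denying s3:PutReplicationConfiguration): prevent accidental cross-region copies. Note that S3 Object Lock enforces retention, not location.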

Azure#

  • Azure Policy with allowed locations — restrict deployments to specific Azure regions.
  • Azure sovereign clouds: physically separate cloud instances such as Azure Government (US) and Azure operated by 21Vianet (China). The original Azure Germany sovereign cloud has been retired.
  • Data residency commitments documented per service.

GCP#

  • Organization policies with location constraints — restrict resource creation to approved regions.
  • Assured Workloads — automatically enforces data residency for regulated workloads.
  • VPC Service Controls: define service perimeters that block data exfiltration from protected projects and services.

Compliance Verification#

Build automated checks that verify data residency:

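# Check that no S3 buckets exist outside approved regions
# --output text separates bucket names with tabs, so split them onto lines first
aws s3api list-buckets --query 'Buckets[].Name' --output text | tr '\t' '\n' | \
  while read -r bucket; do
    region=$(aws s3api get-bucket-location --bucket "$bucket" --query LocationConstraint --output text)
    # us-east-1 is reported as "None"; some legacy eu-west-1 buckets are reported as "EU"
    [[ "$region" == "None" ]] && region="us-east-1"
    [[ "$region" == "EU" ]] && region="eu-west-1"
    if [[ "$region" != "eu-west-1" && "$region" != "eu-central-1" ]]; then
      echo "VIOLATION: Bucket $bucket is in region $region"
    fi
  done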

Run these checks in CI/CD and as scheduled scans. Compliance is not a one-time assessment — it is continuous verification.

Common Mistakes#

  1. Assuming data residency means data sovereignty. Residency is about location. Sovereignty is about legal jurisdiction. Data stored in the EU on a US provider’s infrastructure may still be subject to US law (CLOUD Act). Understand both dimensions.
  2. Using IP geolocation to determine jurisdiction. A user’s legal jurisdiction is based on their residence, not where they are browsing from. Store jurisdiction as an account attribute set during registration.
  3. Forgetting that logs, backups, and caches are data. Every copy of personal data must comply with residency requirements. CDN caches, Redis replicas, log aggregators, and database backups all count.
  4. Not accounting for third-party data processors. If you use a SaaS tool that processes customer data (analytics, support, email), that tool’s data location matters too. Map all data processors and their regions.
  5. Treating regulation as static. Data sovereignty laws change frequently. India, Indonesia, and the EU are all actively expanding requirements. Build architecture that can adapt to new jurisdictions without redesigning the system.