Building Sovereign Data Lineage: A Decentralized Storage Architecture with DID and Proxy Re-Encryption

Preface

There is a fundamental contradiction in centralized data governance: to use data, we must entrust it to a platform, and once it is entrusted, control no longer belongs to the Knowledge Contributor (the data owner).

The architecture described here combines three technologies (decentralized storage, decentralized identifiers (DIDs), and proxy re-encryption) to answer one question: can data circulate without relying on platform trust while its owner retains control?

Core Design Philosophy: From Platform Trust to Cryptographic Constraints

Four engineering principles guide the architecture:

Minimal On-Chain Data - Only contribution fingerprints, version relationships, and identity declarations remain on-chain; raw data stays off-chain.
Data Sovereignty - “Who holds the keys” determines control. The platform cannot access plaintext data or decryption keys without authorization.
Embedded Permission - Permissions are cryptographic facts bound to data versions via Verifiable Credentials, not database records.
Auditability First - Every version evolution and authorization creates verifiable traces for third-party verification.

System Architecture Overview

graph TB
    subgraph On-Chain
        L[Lineage Registry]
        I[Identity Registry]
    end
    subgraph Off-Chain Storage
        F[Encrypted Data - IPFS/Arweave/S3]
    end
    subgraph Key Layer
        PRE[Proxy Re-Encryption Node]
        VC[Verifiable Credentials]
    end
    Owner -->|publish fingerprint| L
    Owner -->|encrypt & upload| F
    Owner -->|issue VC| VC
    Owner -->|generate re-enc key| PRE
    User -->|present VC| PRE
    PRE -->|re-encrypted key| User
    User -->|fetch ciphertext & decrypt locally| F

On-Chain Data Lineage Layer

This layer stores metadata only, using a LineageRecord structure:

{
  "contributionFingerprint": "Hash",
  "version": "string",
  "previousHash": "Hash",
  "operatorDID": "string",
  "dataUri": "string"
}
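As a minimal sketch of how such a record could be built and linked, the Python below mirrors the fields above. Several details are assumptions of this sketch rather than part of the design: SHA-256 as the fingerprint function, an all-zero string as the "no previous version" marker, previousHash holding the parent's contributionFingerprint, and the placeholder DID and URIs.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

ZERO_HASH = "0" * 64  # assumed marker for "no previous version"


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a version's content (hash choice is an assumption)."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    contributionFingerprint: str
    version: str
    previousHash: str
    operatorDID: str
    dataUri: str

    def follows(self, parent: "LineageRecord") -> bool:
        """True if this record names `parent` as its direct predecessor."""
        return self.previousHash == parent.contributionFingerprint


def make_record(content: bytes, version: str, parent: Optional[LineageRecord],
                operator_did: str, data_uri: str) -> LineageRecord:
    """Build the metadata-only record; the content itself never goes on-chain."""
    previous = parent.contributionFingerprint if parent else ZERO_HASH
    return LineageRecord(fingerprint(content), version, previous, operator_did, data_uri)


# Hypothetical identifiers, for illustration only.
v1 = make_record(b"raw photos", "1", None, "did:example:owner", "ipfs://example-v1")
v2 = make_record(b"annotated photos", "2", v1, "did:example:owner", "ipfs://example-v2")
print(v2.follows(v1))
```

Only the fixed-size record is published; a verifier who later obtains the content can recompute the fingerprint and compare it with the registry entry.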

Off-Chain Storage Layer

Stores encrypted data on IPFS, Filecoin, Arweave, S3, or a hybrid of these. The storage layer is assumed to be publicly readable, so data security relies entirely on encryption and key distribution, not on storage access controls.

Key and Permission Layer

Verifiable Credentials answer authorization questions; Proxy Re-Encryption handles secure key delivery.

Access Layer

The platform verifies credentials and executes re-encryption but never gains decryption capability.

Data Evolution and Version Control

Two distinct concepts organize data management:

DataEntityID - Maintains constant identity across modifications
DataVersion - Represents snapshots at specific timepoints; authorization targets versions, not entities

Food Science Dataset Example (v1-v3 Evolution)

Version 1 (Raw Collection): 10,000 food photos with basic metadata

On-chain: ContributionFingerprint, with previousHash set to zero because there is no earlier version
Permission: Data cleaning companies and model trainers

Version 2 (Expert Annotation): Added structured annotations (calories, ingredients, allergens)

On-chain: New ContributionFingerprint, previousHash points to v1
Permission: Commercial users like nutrition apps; v1 access does not automatically grant v2 access

Version 3 (Correction and Compliance): Fixed nutrition labels and applied face blurring

On-chain: New ContributionFingerprint, previousHash points to v2
Forward Secrecy: v3 is encrypted under a fresh key, so users excluded from v3 cannot decrypt it even if they held keys for v1 or v2
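The split between a stable entity and per-version authorization can be sketched as follows. The class names, the in-memory grant sets, and the DIDs are hypothetical; in the architecture itself, grants would be Verifiable Credentials rather than set entries.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataVersion:
    version: str
    fingerprint: str
    authorized: Set[str] = field(default_factory=set)  # DIDs granted this version only


@dataclass
class DataEntity:
    entity_id: str                                      # constant across all versions
    versions: List[DataVersion] = field(default_factory=list)

    def publish(self, content: bytes) -> DataVersion:
        """Append a new snapshot; it starts with no grants of its own."""
        snapshot = DataVersion(str(len(self.versions) + 1),
                               hashlib.sha256(content).hexdigest())
        self.versions.append(snapshot)
        return snapshot

    def can_access(self, did: str, version: str) -> bool:
        """Authorization is looked up per version, never per entity."""
        return any(v.version == version and did in v.authorized for v in self.versions)


dataset = DataEntity("food-dataset")
v1 = dataset.publish(b"10,000 raw photos")
v1.authorized.update({"did:example:cleaner", "did:example:trainer"})
v2 = dataset.publish(b"photos + expert annotations")
v2.authorized.add("did:example:nutrition-app")
v3 = dataset.publish(b"corrected labels, blurred faces")
v3.authorized.add("did:example:nutrition-app")

print(dataset.can_access("did:example:cleaner", "1"))  # granted on v1
print(dataset.can_access("did:example:cleaner", "2"))  # v1 grant does not carry over
```

Because each publish starts from an empty grant set, access to an earlier version never implies access to a later one, which matches the v1 to v2 rule in the example above.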

Core Mechanism: Secure Key Delivery via Proxy Re-Encryption

The architecture employs hybrid encryption:

Data encryption: A symmetric cipher (AES-256); an independent symmetric key is generated for each version
Key protection: Keys never leave the Knowledge Contributor’s local environment in plaintext

Full Authorization and Access Flow

Phase 1: Data Publishing

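Owner generates the new version and a fresh symmetric key locally
Owner encrypts the raw data and uploads the ciphertext to off-chain storage (e.g. IPFS, OSS, or S3)
Owner encapsulates the symmetric key under their own public key and stores the encapsulated key in the version metadata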

Phase 2: Decentralized Authorization

User requests access; Owner issues Verifiable Credential
Owner generates proxy re-encryption key locally
Owner sends the credential to the User and the re-encryption key to the platform

Phase 3: Proxy Access

User presents VC to platform
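Platform verifies the VC and re-encrypts the encapsulated symmetric key for the User; the data ciphertext itself is untouched
User decrypts the re-encrypted key with their private key, then decrypts the data locally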

This achieves “encrypt once, authorize many”: only the small encapsulated key is re-encrypted per user, never the large data files.
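The three phases can be illustrated with a toy version of the ElGamal-based BBS98 proxy re-encryption scheme, which is a stand-in chosen for this sketch and not necessarily the scheme the architecture uses. Two caveats apply. The group parameters are tiny and completely insecure. BBS98 is also bidirectional and needs both secret keys to form the re-encryption key, whereas a production system would more likely use a unidirectional elliptic-curve scheme in which the owner needs only the recipient's public key. The XOR key wrap stands in for a real key-wrapping step.

```python
import hashlib
import secrets
from typing import Tuple

# Toy group: P = 2Q + 1 with Q prime; G = 4 generates the subgroup of order Q.
# INSECURE illustration parameters, chosen only so the arithmetic is easy to follow.
P, Q, G = 2039, 1019, 4


def keygen() -> Tuple[int, int]:
    sk = secrets.randbelow(Q - 1) + 1            # secret exponent in [1, Q-1]
    return sk, pow(G, sk, P)


def kdf(shared: int) -> bytes:
    return hashlib.sha256(shared.to_bytes(2, "big")).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encapsulate(data_key: bytes, pk: int) -> Tuple[int, bytes]:
    """Phase 1: wrap a 32-byte data key for the holder of pk."""
    r = secrets.randbelow(Q - 1) + 1
    capsule = pow(pk, r, P)                      # g^(sk*r)
    return capsule, xor(data_key, kdf(pow(G, r, P)))


def rekey(sk_from: int, sk_to: int) -> int:
    """Phase 2: BBS98 re-encryption key sk_to / sk_from mod Q (toy: needs both secrets)."""
    return sk_to * pow(sk_from, -1, Q) % Q


def reencrypt(capsule: int, rk: int) -> int:
    """Phase 3, run by the proxy: maps g^(a*r) to g^(b*r) without computing g^r."""
    return pow(capsule, rk, P)


def decapsulate(capsule: int, wrapped: bytes, sk: int) -> bytes:
    shared = pow(capsule, pow(sk, -1, Q), P)     # recovers g^r
    return xor(wrapped, kdf(shared))


owner_sk, owner_pk = keygen()
user_sk, user_pk = keygen()
data_key = secrets.token_bytes(32)               # per-version symmetric key

capsule, wrapped = encapsulate(data_key, owner_pk)
user_capsule = reencrypt(capsule, rekey(owner_sk, user_sk))
print(decapsulate(user_capsule, wrapped, user_sk) == data_key)
```

The proxy only ever handles the capsule and the re-encryption key; the wrapped data key and the data ciphertext are never rewritten, which is what keeps per-user authorization cheap.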

Permission Revocation and Forward Secrecy

Three-level revocation approach:

Platform Layer: Owner instructs platform to delete re-encryption key
Verification Layer: Owner uses VC Revocation List or expiration mechanisms
Version Layer: Owner generates new version with fresh key; old authorizations naturally fail for new data
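A sketch of how a platform-side gate might combine the three levels before executing a re-encryption is shown below. The Credential fields, the AccessGate class, and the reason strings are hypothetical, and verification of the owner's signature on the credential is omitted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Credential:
    credential_id: str
    holder_did: str
    version: str              # the single data version this credential covers
    expires_at: datetime


class AccessGate:
    """Hypothetical platform-side check run before any re-encryption."""

    def __init__(self) -> None:
        self.rekeys: Dict[Tuple[str, str], int] = {}   # (holder DID, version) -> re-enc key
        self.revoked: Set[str] = set()                 # revocation list of credential IDs

    def decide(self, vc: Credential, requested_version: str,
               now: datetime) -> Tuple[bool, str]:
        # Verification layer: revocation list and expiry.
        if vc.credential_id in self.revoked:
            return False, "credential revoked"
        if now >= vc.expires_at:
            return False, "credential expired"
        # Version layer: an old credential says nothing about a new version.
        if vc.version != requested_version:
            return False, "credential does not cover this version"
        # Platform layer: without a stored re-encryption key nothing can be delivered.
        if (vc.holder_did, requested_version) not in self.rekeys:
            return False, "no re-encryption key"
        return True, "ok"


gate = AccessGate()
gate.rekeys[("did:example:nutrition-app", "2")] = 123   # placeholder key material
vc = Credential("vc-001", "did:example:nutrition-app", "2",
                datetime(2025, 6, 1, tzinfo=timezone.utc))
now = datetime(2025, 1, 1, tzinfo=timezone.utc)

print(gate.decide(vc, "2", now))
print(gate.decide(vc, "3", now))
```

None of these checks can recall a symmetric key that a user has already decrypted; they only stop further key deliveries, which is why the version layer with a fresh key is the strongest of the three.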

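Because each version is encrypted under an independent key, a leaked or revoked key for an older version reveals nothing about later versions. The effect resembles the recovery property of Signal’s ratchet mechanism (often called future secrecy or post-compromise security), although keys here are generated independently rather than derived by ratcheting.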

System Availability When Platform Fails

Scenario	Impact	Resolution
Platform unavailable, storage exists	Users with obtained keys unaffected	Owner can redeploy PRE logic
Platform acts maliciously	Service denial only, not fact tampering	Platform cannot forge owner-signed VCs
Platform and partial storage fail	Old lineage remains verifiable	Owner re-uploads to new storage network

Platform failure reduces automation, not data sovereignty.

Auditability

The goal of auditability is to maximize traceability when data does leak. Verification proceeds in four steps:

Confirm Data Identity: Recalculate hash fingerprint; compare with on-chain record
Trace Complete Access Path: Review lineage records, VC authorizations, and platform audit logs
Narrow Responsibility: Reduce the set of possible leak sources to an investigable scope
Provide Legal Evidence: Create reproducible, immutable evidence package

The system cannot prevent downloads but can narrow an infinite responsibility space into a manageable, investigable, litigable scope.
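The first three steps can be sketched as a small function over hypothetical inputs: a list of lineage records, a list of issued grants, and the platform's log of delivered keys. The dictionary field names other than contributionFingerprint and version are assumptions, as is the use of SHA-256. An exact hash match only works when the leaked copy is byte-identical to the published version.

```python
import hashlib
from typing import Dict, List


def audit_leak(leaked: bytes, lineage: List[Dict[str, str]],
               grants: List[Dict[str, str]],
               access_log: List[Dict[str, str]]) -> Dict[str, object]:
    """Match a leaked file to a version and list who could have held its key."""
    # Step 1: recompute the fingerprint and look it up in the lineage registry.
    fp = hashlib.sha256(leaked).hexdigest()
    record = next((r for r in lineage if r["contributionFingerprint"] == fp), None)
    if record is None:
        return {"matched": False, "fingerprint": fp}

    # Step 2: collect everyone authorized for exactly this version,
    # and everyone the platform actually delivered a key to.
    version = record["version"]
    authorized = {g["holderDID"] for g in grants if g["version"] == version}
    obtained = {e["holderDID"] for e in access_log if e["version"] == version}

    # Step 3: the investigable scope is the parties who were authorized and received a key.
    return {
        "matched": True,
        "fingerprint": fp,
        "version": version,
        "authorized": sorted(authorized),
        "obtained_key": sorted(authorized & obtained),
    }


lineage = [
    {"contributionFingerprint": hashlib.sha256(b"v1 content").hexdigest(), "version": "1"},
    {"contributionFingerprint": hashlib.sha256(b"v2 content").hexdigest(), "version": "2"},
]
grants = [
    {"holderDID": "did:example:cleaner", "version": "1"},
    {"holderDID": "did:example:app", "version": "2"},
    {"holderDID": "did:example:lab", "version": "2"},
]
access_log = [{"holderDID": "did:example:app", "version": "2"}]

report = audit_leak(b"v2 content", lineage, grants, access_log)
print(report["version"], report["obtained_key"])
```

The resulting report, together with the lineage records and signed credentials it was derived from, is the kind of reproducible material the fourth step would package as evidence.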

Summary

The architecture balances performance, security, and decentralization by returning permission control to data owners through keys and credentials rather than relying on platform-managed Access Control Lists. It is more practical to engineer than Fully Homomorphic Encryption, at the cost of introducing a semi-trusted proxy node.

Extremely high-confidentiality scenarios might benefit from Trusted Execution Environments as stronger trust anchors.