Preface
There is a fundamental contradiction in centralized data governance: to use data, we must entrust it to a platform. Once entrusted, control no longer belongs to the Knowledge Contributor.
The architecture described here combines three technologies (an on-chain lineage registry, Verifiable Credentials, and Proxy Re-Encryption) to test whether data can circulate without relying on platform trust while the owner retains control.
Core Design Philosophy: From Platform Trust to Cryptographic Constraints
Four engineering principles guide the architecture:
- Minimal On-Chain Data - Only contribution fingerprints, version relationships, and identity declarations remain on-chain; raw data stays off-chain.
- Data Sovereignty - “Who holds the keys” determines control. The platform never holds plaintext data or decryption keys; only parties the owner authorizes can decrypt.
- Embedded Permission - Permissions are cryptographic facts bound to data versions via Verifiable Credentials, not database records.
- Auditability First - Every version evolution and authorization creates verifiable traces for third-party verification.
System Architecture Overview
graph TB
subgraph On-Chain
L[Lineage Registry]
I[Identity Registry]
end
subgraph Off-Chain Storage
F[Encrypted Data - IPFS/Arweave/S3]
end
subgraph Key Layer
PRE[Proxy Re-Encryption Node]
VC[Verifiable Credentials]
end
Owner -->|publish fingerprint| L
Owner -->|encrypt & upload| F
Owner -->|issue VC| VC
Owner -->|generate re-enc key| PRE
User -->|present VC| PRE
PRE -->|re-encrypt key| User
User -->|decrypt locally| F
On-Chain Data Lineage Layer
This layer stores metadata only, using a LineageRecord structure:
{
"contributionFingerprint": "Hash",
"version": "string",
"previousHash": "Hash",
"operatorDID": "string",
"dataUri": "string"
}
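How such records chain together can be sketched in a few lines of Python. This is a minimal illustration, not the registry's actual logic: it assumes SHA-256 as the hash, assumes `previousHash` holds the hash of the previous record's canonical JSON (the text only says it "points to" the prior version), and uses a hypothetical all-zero sentinel for the first version. The helper names `fingerprint`, `append_version`, and `verify_chain` are invented for this example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

ZERO_HASH = "0" * 64  # assumed sentinel meaning "no previous version"


@dataclass(frozen=True)
class LineageRecord:
    contributionFingerprint: str
    version: str
    previousHash: str
    operatorDID: str
    dataUri: str

    def record_hash(self) -> str:
        # Canonical JSON (sorted keys, compact separators) so every verifier hashes the same bytes.
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fingerprint(payload: bytes) -> str:
    """Fingerprint of the published bytes (assumption: SHA-256)."""
    return hashlib.sha256(payload).hexdigest()


def append_version(chain, payload, version, operator_did, data_uri):
    """Return a new chain with one more record linked to the current tip."""
    previous = chain[-1].record_hash() if chain else ZERO_HASH
    record = LineageRecord(fingerprint(payload), version, previous, operator_did, data_uri)
    return chain + [record]


def verify_chain(chain) -> bool:
    """Check that each record's previousHash matches the record before it."""
    expected = ZERO_HASH
    for record in chain:
        if record.previousHash != expected:
            return False
        expected = record.record_hash()
    return True


# Hypothetical DIDs and URIs, for illustration only.
chain = append_version([], b"ciphertext of v1", "1", "did:example:owner", "ipfs://example-v1")
chain = append_version(chain, b"ciphertext of v2", "2", "did:example:owner", "ipfs://example-v2")
print(verify_chain(chain))
```

Only the fingerprint and links go on-chain; the payload bytes never leave off-chain storage, which is what keeps the on-chain footprint minimal.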
Off-Chain Storage Layer
Stores encrypted data on Filecoin, Arweave, S3, or a hybrid of these. The storage layer is assumed to be publicly readable, so confidentiality rests entirely on encryption and key distribution.
Key and Permission Layer
Verifiable Credentials answer authorization questions; Proxy Re-Encryption handles secure key delivery.
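As a rough illustration of what a credential bound to a data version might contain, the sketch below builds an unsigned payload shaped after the W3C Verifiable Credentials data model (v1). The `DataAccessCredential` type and the `dataEntityId` and `dataVersion` claim names are hypothetical; the text does not define a credential schema. The cryptographic `proof` that makes a credential verifiable is deliberately left out, since signing requires an asymmetric-signature library.

```python
import json
from datetime import datetime, timedelta, timezone


def build_access_credential(issuer_did, holder_did, entity_id, version, valid_days=30):
    """Build an unsigned credential payload scoped to one data version."""
    now = datetime.now(timezone.utc).replace(microsecond=0)
    return {
        "@context": ["https://www.w3.org/2018/credentials/v1"],
        # "DataAccessCredential" is a hypothetical type name for this sketch.
        "type": ["VerifiableCredential", "DataAccessCredential"],
        "issuer": issuer_did,
        "issuanceDate": now.isoformat(),
        "expirationDate": (now + timedelta(days=valid_days)).isoformat(),
        "credentialSubject": {
            "id": holder_did,
            "dataEntityId": entity_id,  # hypothetical claim name
            "dataVersion": version,     # hypothetical claim name
        },
        # A real credential also carries a "proof" signed by the issuer; omitted here.
    }


vc = build_access_credential(
    "did:example:owner", "did:example:nutrition-app", "food-photos", "v2"
)
print(json.dumps(vc, indent=2))
```

Because the version is part of the signed claims, the authorization travels with the credential instead of living in a platform database.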
Access Layer
The platform verifies credentials and executes re-encryption but never gains decryption capability.
Data Evolution and Version Control
Two distinct concepts organize data management:
- DataEntityID - Maintains constant identity across modifications
- DataVersion - Represents snapshots at specific timepoints; authorization targets versions, not entities
Food Science Dataset Example (v1-v3 Evolution)
Version 1 (Raw Collection): 10,000 food photos with basic metadata
- On-chain: ContributionFingerprint, zero previousHash
- Permission: Data cleaning companies and model trainers
Version 2 (Expert Annotation): Added structured annotations (calories, ingredients, allergens)
- On-chain: New ContributionFingerprint, previousHash points to v1
- Permission: Commercial users like nutrition apps; v1 access does not automatically grant v2 access
Version 3 (Correction and Compliance): Fixed nutrition labels and applied face blurring
- On-chain: ContributionFingerprint, previousHash points to v2
- Forward Secrecy: v3 is encrypted under a fresh key, so users excluded from v3 cannot decrypt it even if they hold keys for v1 or v2
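The v1 to v3 evolution can be modeled with a stable entity ID and independent per-version keys. The sketch below is the owner's local bookkeeping only, under stated assumptions: in the architecture itself grants are expressed as Verifiable Credentials and keys are delivered through proxy re-encryption, not handed out from a dictionary. The class `DataEntity`, its methods, and the example DIDs are hypothetical.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class DataEntity:
    entity_id: str                                    # stays constant across modifications
    version_keys: dict = field(default_factory=dict)  # version -> independent 256-bit key
    grants: dict = field(default_factory=dict)        # version -> set of grantee DIDs

    def publish(self, version, grantees):
        # Every version gets a fresh random key; nothing is derived from earlier keys.
        self.version_keys[version] = secrets.token_bytes(32)
        self.grants[version] = set(grantees)

    def key_for(self, grantee, version):
        # Authorization targets a specific version, never the entity as a whole.
        if grantee in self.grants.get(version, set()):
            return self.version_keys[version]
        return None


food = DataEntity("food-photos")
food.publish("v1", {"did:example:cleaner", "did:example:trainer"})
food.publish("v2", {"did:example:nutrition-app"})
food.publish("v3", {"did:example:nutrition-app"})

print(food.key_for("did:example:cleaner", "v1") is not None)  # granted on v1
print(food.key_for("did:example:cleaner", "v2") is None)      # v1 access does not carry over
```

Since the v3 key is unrelated to the v1 and v2 keys, holding an older key gives no help in decrypting the corrected dataset.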
Core Mechanism: Secure Key Delivery via Proxy Re-Encryption
The architecture employs hybrid encryption:
- Data encryption: Symmetric algorithms (AES-256); each version generates an independent symmetric key
- Key protection: Keys never leave the Knowledge Contributor’s local environment in plaintext
Full Authorization and Access Flow
Phase 1: Data Publishing
- Owner generates version and symmetric key locally
- Encrypts raw data; uploads ciphertext to off-chain storage (IPFS, OSS, or AWS S3)
- Self-encapsulates the symmetric key by encrypting it with their own public key; stores the resulting capsule in metadata
Phase 2: Decentralized Authorization
- User requests access; Owner issues Verifiable Credential
- Owner generates proxy re-encryption key locally
- Owner distributes credentials and re-encryption key to platform
Phase 3: Proxy Access
- User presents VC to platform
- Platform verifies VC and executes re-encryption
- User decrypts locally using their private key
This achieves “encrypt once, authorize many”: only the small encapsulated key is re-encrypted for each user, never the large data file.
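The three phases can be traced end to end with a toy proxy re-encryption scheme. The sketch below uses the BBS98 ElGamal-style construction over a deliberately tiny group (p = 2879), chosen only because it fits in standard-library Python; the text does not say which PRE scheme the architecture uses, and these parameters offer no security. Two further assumptions: the integer `version_key` stands in for an AES-256 key, and BBS98 derives the re-encryption key from both parties' secret keys, whereas the flow above has the owner generate it locally, which in practice calls for a unidirectional scheme that needs only the user's public key.

```python
import secrets

# Toy group: P = 2Q + 1 with P and Q prime; G = 4 generates the subgroup of order Q.
# Far too small for real use; illustration only.
P, Q, G = 2879, 1439, 4


def keygen():
    sk = secrets.randbelow(Q - 1) + 1  # secret exponent in [1, Q-1]
    return sk, pow(G, sk, P)


def encapsulate(m, pk):
    """Phase 1: encrypt the symmetric key m under a public key."""
    r = secrets.randbelow(Q - 1) + 1
    return (m * pow(G, r, P)) % P, pow(pk, r, P)


def rekey(sk_from, sk_to):
    """Phase 2: re-encryption key (BBS98 needs both secrets; see caveat above)."""
    return (sk_to * pow(sk_from, -1, Q)) % Q


def reencrypt(capsule, rk):
    """Phase 3, proxy side: transform the capsule without learning m."""
    c1, c2 = capsule
    return c1, pow(c2, rk, P)


def decapsulate(capsule, sk):
    """Phase 3, user side: recover m with the matching secret key."""
    c1, c2 = capsule
    g_r = pow(c2, pow(sk, -1, Q), P)  # strip the secret exponent to get G^r
    return (c1 * pow(g_r, -1, P)) % P


owner_sk, owner_pk = keygen()
user_sk, user_pk = keygen()

version_key = 1234                              # stand-in for a symmetric key
capsule = encapsulate(version_key, owner_pk)    # stored in metadata
rk = rekey(owner_sk, user_sk)                   # handed to the platform
recovered = decapsulate(reencrypt(capsule, rk), user_sk)
print(recovered == version_key)
```

The proxy handles only `capsule` and `rk`, neither of which reveals `version_key` on its own, and the bulk ciphertext in off-chain storage is never touched during authorization.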
Permission Revocation and Forward Secrecy
Three-level revocation approach:
- Platform Layer: Owner instructs platform to delete re-encryption key
- Verification Layer: Owner uses VC Revocation List or expiration mechanisms
- Version Layer: Owner generates new version with fresh key; old authorizations naturally fail for new data
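Because every version is encrypted under a fresh key, a compromised or revoked key for an earlier version does not expose later versions. The effect is loosely comparable to key rotation in Signal’s ratchet mechanism, although protecting future data after a past key leaks is usually called post-compromise (or future) secrecy rather than forward secrecy in the strict sense.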
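The three revocation levels can be read as three independent checks the platform runs before re-encrypting anything. The sketch below is a hypothetical gate, not the platform's actual interface: `Credential`, `ProxyGate`, and the returned status strings are invented for illustration, and signature verification of the credential is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    credential_id: str
    holder_did: str
    version: str
    expires_at: float  # Unix timestamp


class ProxyGate:
    def __init__(self):
        self.rekeys = {}      # (holder_did, version) -> re-encryption key
        self.revoked = set()  # credential IDs on the owner's revocation list

    def check(self, vc, requested_version, now):
        # Verification layer: revocation list and expiration.
        if vc.credential_id in self.revoked or now >= vc.expires_at:
            return "denied: credential revoked or expired"
        # Version layer: an old authorization does not cover a new version.
        if vc.version != requested_version:
            return "denied: credential does not cover this version"
        # Platform layer: the owner may have had the re-encryption key deleted.
        if (vc.holder_did, requested_version) not in self.rekeys:
            return "denied: no re-encryption key"
        return "ok"


gate = ProxyGate()
vc = Credential("vc-1", "did:example:nutrition-app", "v2", expires_at=2_000.0)
gate.rekeys[(vc.holder_did, "v2")] = 777  # placeholder re-encryption key

print(gate.check(vc, "v2", now=1_000.0))
print(gate.check(vc, "v3", now=1_000.0))
```

Any single level is enough to block access, so the owner can revoke even when one channel (for example, a platform that ignores delete requests) is unreliable; publishing a new version remains the option that does not depend on the platform at all.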
System Availability When Platform Fails
| Scenario | Impact | Resolution |
|---|---|---|
| Platform unavailable, storage exists | Users with obtained keys unaffected | Owner can redeploy PRE logic |
| Platform acts maliciously | Service denial only, not fact tampering | Platform cannot forge owner-signed VCs |
| Platform and partial storage fail | Old lineage remains verifiable | Owner re-uploads to new storage network |
Platform failure reduces automation, not data sovereignty.
Auditability
Auditability aims to maximize traceability for when data leaks, which the design treats as inevitable. Verification takes four steps:
- Confirm Data Identity: Recalculate hash fingerprint; compare with on-chain record
- Trace Complete Access Path: Review lineage records, VC authorizations, and platform audit logs
- Narrow Responsibility: Compress uncertainty into investigable scope
- Provide Legal Evidence: Create reproducible, immutable evidence package
The system cannot prevent downloads but can narrow an infinite responsibility space into a manageable, investigable, litigable scope.
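A minimal sketch of the four steps, assuming the on-chain fingerprint is a SHA-256 digest over the same bytes the auditor holds and that the access log is a list of entries naming a holder and a version. The function names, the log shape, and the package fields are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional


def confirm_data_identity(leaked: bytes, onchain_fingerprints: dict) -> Optional[str]:
    """Step 1: recompute the fingerprint and find the matching on-chain version."""
    digest = hashlib.sha256(leaked).hexdigest()
    for version, recorded in onchain_fingerprints.items():
        if hmac.compare_digest(digest, recorded):
            return version
    return None


def narrow_responsibility(version, access_log):
    """Steps 2 and 3: keep only holders who were granted access to that version."""
    return sorted({entry["holder"] for entry in access_log if entry["version"] == version})


def evidence_package(leaked, onchain_fingerprints, access_log):
    """Step 4: bundle the findings with a hash so the package is reproducible."""
    version = confirm_data_identity(leaked, onchain_fingerprints)
    body = {
        "matchedVersion": version,
        "fingerprint": hashlib.sha256(leaked).hexdigest(),
        "candidateHolders": narrow_responsibility(version, access_log) if version else [],
    }
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return {"body": body, "packageHash": hashlib.sha256(canonical.encode("utf-8")).hexdigest()}


onchain = {
    "v1": hashlib.sha256(b"dataset v1").hexdigest(),
    "v2": hashlib.sha256(b"dataset v2").hexdigest(),
}
log = [
    {"holder": "did:example:cleaner", "version": "v1"},
    {"holder": "did:example:nutrition-app", "version": "v2"},
]
package = evidence_package(b"dataset v2", onchain, log)
print(package["body"]["candidateHolders"])
```

Anyone holding the same leaked bytes, on-chain records, and log can rebuild the package and compare `packageHash`, which is what makes the evidence reproducible rather than a matter of the auditor's word.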
Summary
The architecture balances performance, security, and decentralization by returning permission control to data owners rather than relying on traditional Access Control Lists. It is more practical to engineer than Fully Homomorphic Encryption, at the cost of introducing a semi-trusted proxy node.
Extremely high-confidentiality scenarios might benefit from Trusted Execution Environments as stronger trust anchors.