title: "Web3 Proof-of-Process Identity"
The Provenance Problem
A digital asset has no inherent provenance. A JPEG of a generative artwork is byte-for-byte identical whether it was produced by a licensed pipeline operating under specific creative constraints, or copied from a Discord server two minutes ago. The file contains no unforgeable record of the process that created it. A signed certificate in the file's metadata can be stripped and replaced in seconds.
This is not a minor inconvenience for high-value digital asset markets. It is an architectural deficiency that suppresses market formation. When a pharmaceutical company cannot prove that a batch of AI-generated molecular visualization assets was produced by the certified pipeline under the validated parameters, those assets cannot be submitted to regulators. When a generative art studio cannot prove that a 10,000-piece collection was produced by the stated algorithm with the stated seed range, the collection is vulnerable to fraud claims.
The systems documented here solve this problem with cryptographic finality. Proof of Process — a verifiable, tamper-evident record of the inputs, parameters, and outputs of any computational process — is constructed at generation time, encoded in a deterministic binary format, pinned to a content-addressed storage network, and anchored to a blockchain with a transaction that cannot be altered without invalidating every subsequent block.
Evidence Packets
The atomic unit of the Proof-of-Process system is the Evidence Packet. An Evidence Packet is a structured data container that captures the complete computational context of a generation event — everything required to verify that a specific output was produced by a specific process applied to specific inputs.
A minimal Evidence Packet for a generative image pipeline contains:
interface EvidencePacket {
  version: '1.0';
  packetId: string; // UUIDv4
  tenantId: string; // Opaque client identifier
  processId: string; // Identifies the pipeline version
  timestamp: number; // Unix timestamp, millisecond precision
  inputs: {
    promptHash: string; // SHA-256 of the normalized prompt string
    modelId: string; // Model identifier + version hash
    seed: number; // Generation seed
    parameters: Record<string, unknown>; // All non-default parameters
  };
  execution: {
    durationMs: number; // Wall-clock generation time
    deviceClass: 'A100' | 'H100'; // Hardware class
    pipelineHash: string; // SHA-256 of the pipeline config
  };
  outputs: {
    assetHash: string; // SHA-256 of the raw output bytes
    assetSize: number; // Output size in bytes
    mimeType: string;
    resolutionW: number;
    resolutionH: number;
  };
  signature: string; // EdDSA signature over the canonical packet bytes
  signingKeyId: string; // Key identifier (references HSM-stored key)
}
Every field other than the signature is fixed once the generation event completes, so the packet must serialize to exactly the same bytes every time it is encoded. The signature field is computed over the canonical serialization of all other fields. That serialization is the CBOR encoding described in the next section, not the JSON string representation, which is non-deterministic due to key ordering and whitespace.
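As a rough illustration of how the hash fields of a packet might be populated, the sketch below computes promptHash and assetHash with Node's built-in crypto module. The normalization rule (Unicode NFC, trimmed, whitespace runs collapsed) and the helper names are assumptions made for the example; the source does not specify how prompts are normalized.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of a string (hashed as UTF-8) or of raw bytes.
function sha256Hex(data: Uint8Array | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hypothetical normalization rule: NFC, trim, collapse whitespace runs.
function normalizePrompt(prompt: string): string {
  return prompt.normalize("NFC").trim().replace(/\s+/g, " ");
}

// Value for inputs.promptHash.
function hashPrompt(prompt: string): string {
  return sha256Hex(normalizePrompt(prompt));
}

// Values for outputs.assetHash and outputs.assetSize.
function describeAsset(bytes: Uint8Array): { assetHash: string; assetSize: number } {
  return { assetHash: sha256Hex(bytes), assetSize: bytes.length };
}
```

Whatever normalization rule is chosen, it has to be published alongside the packet format, because a verifier who normalizes differently will compute a different promptHash from the same prompt.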
RFC 8949: Deterministic CBOR Encoding
JSON is not a suitable encoding for cryptographic commitments. JSON serialization is non-deterministic: key ordering is undefined by the specification, floating-point representation varies across implementations, and whitespace normalization is a convention rather than a requirement. Two JSON serializers applied to the same data structure will frequently produce different byte sequences, making byte-level comparison and signature verification unreliable.
CBOR (Concise Binary Object Representation, RFC 8949) eliminates these ambiguities. CBOR is a binary data format that defines:
- A deterministic encoding mode (deterministic_encoding: true) in which map keys are sorted by the bytewise lexicographic order of their encoded form, producing a unique byte sequence for any given data structure. RFC 8949 also documents the older length-first ordering inherited from RFC 7049, so signer and verifier must agree on one rule.
- Native binary type support (bstr), eliminating the need for Base64 encoding of hash values
- Integer encoding that does not suffer from floating-point precision loss
- A type tagging system (CBOR Tags) that encodes semantic type information, such as Tag(1) for Unix timestamps and Tag(37) for UUIDs, without relying on field naming conventions
The Evidence Packet is encoded as canonical CBOR. The canonical CBOR bytes are then hashed with SHA-256 to produce the packetHash — the identifier used in all downstream references. Because the encoding is deterministic, anyone with the same data can independently compute the same packetHash and verify that a given on-chain reference matches a given Evidence Packet.
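To make the determinism property concrete, here is a minimal sketch of a deterministic CBOR encoder and the packetHash computation. It is an illustration under stated assumptions, not the production encoder: it handles only non-negative integers, text strings, byte strings, arrays, and string-keyed maps, and it sorts map keys by the bytewise order of their encoded form. The names encode and packetHash are hypothetical, and a real deployment would more likely rely on a maintained CBOR library.

```typescript
import { createHash } from "node:crypto";

// Subset of CBOR data items handled by this sketch.
type CborValue = number | string | Uint8Array | CborValue[] | { [key: string]: CborValue };

function concat(parts: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(parts.reduce((sum, p) => sum + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

// Initial byte plus argument, always in the shortest form (required for determinism).
function head(major: number, n: number): Uint8Array {
  const m = major << 5;
  if (n < 24) return Uint8Array.of(m | n);
  if (n < 0x100) return Uint8Array.of(m | 24, n);
  if (n < 0x10000) return Uint8Array.of(m | 25, n >>> 8, n & 0xff);
  const out = new Uint8Array(n < 0x100000000 ? 5 : 9);
  const view = new DataView(out.buffer);
  if (out.length === 5) {
    out[0] = m | 26;
    view.setUint32(1, n);
  } else {
    out[0] = m | 27;
    view.setUint32(1, Math.floor(n / 0x100000000)); // high 32 bits
    view.setUint32(5, n >>> 0); // low 32 bits
  }
  return out;
}

function encode(value: CborValue): Uint8Array {
  if (typeof value === "number") {
    if (!Number.isSafeInteger(value) || value < 0) {
      throw new Error("this sketch only encodes non-negative safe integers");
    }
    return head(0, value);
  }
  if (typeof value === "string") {
    const utf8 = Buffer.from(value, "utf8");
    return concat([head(3, utf8.length), utf8]);
  }
  if (value instanceof Uint8Array) return concat([head(2, value.length), value]);
  if (Array.isArray(value)) return concat([head(4, value.length), ...value.map((v) => encode(v))]);
  // Map: encode each pair, then sort by the bytewise order of the encoded keys.
  const map = value;
  const pairs = Object.keys(map).map((k) => ({ key: encode(k), val: encode(map[k]) }));
  pairs.sort((x, y) => Buffer.compare(x.key, y.key));
  const body: Uint8Array[] = [];
  for (const p of pairs) body.push(p.key, p.val);
  return concat([head(5, pairs.length), ...body]);
}

// packetHash = SHA-256 over the deterministic CBOR bytes.
function packetHash(packet: CborValue): string {
  return createHash("sha256").update(encode(packet)).digest("hex");
}
```

Because keys are sorted after encoding, two objects built with different property insertion orders produce identical bytes and therefore the same packetHash.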
The size reduction from JSON to canonical CBOR for a typical Evidence Packet is 62 to 71 percent, which becomes relevant at scale: a pipeline producing 100,000 assets per day generates 100,000 Evidence Packets. At 3KB average CBOR versus 8KB average JSON, the difference is about 500MB per day, or roughly 180GB over a year.
The EdDSA signature in the signature field is computed over the canonical CBOR bytes using the Ed25519 curve. Ed25519 was selected over RSA and ECDSA-secp256k1 for three reasons: deterministic signatures (no randomness in signing, eliminating nonce reuse attacks), fast verification (batch verification of 64 signatures per millisecond on a single CPU core), and compact signature size (64 bytes versus roughly 71 bytes for a DER-encoded ECDSA-secp256k1 signature).
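The signing and verification flow can be sketched with Node's built-in crypto module. This is an illustration only: an in-memory key pair stands in for the HSM-held key, and the function names are hypothetical. In the system described here, the private key would never be available to application code.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Assumption: a locally generated key pair replaces the HSM-stored key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Ed25519 hashes the message internally, so the algorithm argument is null.
function signPacket(canonicalBytes: Uint8Array): Buffer {
  return sign(null, canonicalBytes, privateKey);
}

function verifyPacket(canonicalBytes: Uint8Array, signature: Uint8Array): boolean {
  return verify(null, canonicalBytes, publicKey, signature);
}

// Placeholder message; in practice this is the packet's canonical CBOR encoding.
const message = Buffer.from("canonical CBOR bytes would go here");
const sigA = signPacket(message);
const sigB = signPacket(message);

console.log(sigA.length);                 // signature size in bytes
console.log(sigA.equals(sigB));           // same key + same message => same signature
console.log(verifyPacket(message, sigA)); // signature check against the public key
```

Signing the same bytes twice yields the same signature, which is the determinism property cited above as the reason for choosing Ed25519.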
Signing keys are generated and stored in an AWS CloudHSM instance. The private key material never leaves the HSM boundary. Signing requests are issued to the HSM via the CloudHSM JCE provider. Key rotation occurs at 12-month intervals with a 30-day overlap window for verification of assets signed under the previous key.
JSON Chainscripts
The Evidence Packet captures a single generation event. A Chainscript links multiple Evidence Packets into a verifiable sequence — useful for pipelines that process assets through multiple stages (generation, upscaling, color grading, format conversion) where each stage must be individually provable.
A Chainscript is a JSON document with the following structure:
{
  "version": "1.0",
  "chainId": "uuid-v4",
  "tenantId": "opaque-client-identifier",
  "entries": [
    {
      "sequence": 0,
      "packetCid": "bafyreib...",
      "packetHash": "sha256-hex",
      "previousHash": null,
      "timestamp": 1748000000000,
      "stageId": "initial_generation",
      "eddsaSignature": "base64url-encoded-64-bytes"
    },
    {
      "sequence": 1,
      "packetCid": "bafyreic...",
      "packetHash": "sha256-hex",
      "previousHash": "sha256-hex-of-sequence-0",
      "timestamp": 1748000012000,
      "stageId": "upscale_4x",
      "eddsaSignature": "base64url-encoded-64-bytes"
    }
  ],
  "chainHash": "sha256-of-all-entry-hashes-concatenated",
  "chainSignature": "eddsaSignature-over-chainHash"
}
The previousHash field in each entry creates a hash chain: entry N carries the packetHash of entry N-1. Modifying any packet changes its packetHash, which breaks the next entry's previousHash link and the chainHash computed over the whole sequence, making tampering detectable without trusted intermediaries. This is the same construction used in blockchain ledgers, applied here at the process audit level.
The chainHash is computed as the SHA-256 of the concatenation of all packetHash values in sequence order. The chainSignature is computed over the chainHash using the tenant's EdDSA signing key. A verifier needs only the Chainscript JSON, the tenant's public key, and access to IPFS to fully verify the chain.
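The linking and chainHash rules can be sketched as follows. The source does not say whether the packetHash values are concatenated as hex text or as raw bytes; this example assumes raw 32-byte values. The ChainEntry shape is trimmed to the fields needed here, and appendEntry and computeChainHash are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

interface ChainEntry {
  sequence: number;
  packetHash: string; // hex SHA-256 of the packet's canonical CBOR bytes
  previousHash: string | null; // packetHash of the preceding entry, null for the first
  stageId: string;
}

function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Return a new list with one more entry, linked to the current last entry.
function appendEntry(entries: ChainEntry[], packetBytes: Uint8Array, stageId: string): ChainEntry[] {
  const last = entries[entries.length - 1];
  const entry: ChainEntry = {
    sequence: entries.length,
    packetHash: sha256Hex(packetBytes),
    previousHash: last ? last.packetHash : null,
    stageId,
  };
  return [...entries, entry];
}

// Assumption: hashes are concatenated as raw bytes, in sequence order.
function computeChainHash(entries: ChainEntry[]): string {
  const raw = Buffer.concat(entries.map((e) => Buffer.from(e.packetHash, "hex")));
  return sha256Hex(raw);
}

// Example: a two-stage chain built from placeholder packet bytes.
let chain: ChainEntry[] = [];
chain = appendEntry(chain, Buffer.from("packet for stage 0"), "initial_generation");
chain = appendEntry(chain, Buffer.from("packet for stage 1"), "upscale_4x");
const chainHash = computeChainHash(chain);
```

Whichever concatenation convention is used, it must be fixed in the format specification, since the two options produce different chainHash values for the same entries.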
IPFS Pinning Architecture
Evidence Packets (in canonical CBOR) and Chainscripts (in UTF-8 JSON) are both pinned to IPFS immediately after creation. IPFS content addressing uses CIDv1 with SHA-256 (bafyrei... prefix) — the CID is a cryptographic commitment to the content. Any modification to the content changes the CID, making the reference invalid.
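To show why the CID acts as a commitment, the sketch below derives a CIDv1 string directly from a block's bytes: a version byte, a codec byte, a multihash header for SHA-256, and the 32-byte digest, all base32-encoded behind the multibase prefix "b". It assumes the content is stored as a single block with the dag-cbor codec (0x71), which is what yields the bafyrei prefix; a real IPFS node may chunk or re-encode content, so this is not guaranteed to match the CID a pinning service returns. The function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

const BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// RFC 4648 base32, lowercase, without padding.
function base32(bytes: Uint8Array): string {
  let bits = 0;
  let value = 0;
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    value = (value << 8) | bytes[i];
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET.charAt((value >>> (bits - 5)) & 31);
      bits -= 5;
    }
    value &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  if (bits > 0) out += BASE32_ALPHABET.charAt((value << (5 - bits)) & 31);
  return out;
}

// CIDv1 = "b" + base32(version || codec || multihash(sha2-256, digest)).
function cidV1(blockBytes: Uint8Array, codec: number = 0x71): string {
  const digest = createHash("sha256").update(blockBytes).digest();
  const header = Uint8Array.of(0x01, codec, 0x12, 0x20); // v1, codec, sha2-256, 32 bytes
  return "b" + base32(Buffer.concat([header, digest]));
}

const exampleCid = cidV1(Buffer.from("example block bytes"));
console.log(exampleCid.slice(0, 7)); // prefix determined by version, codec, and hash type
```

Changing a single byte of the block changes the digest and therefore every character after the fixed prefix, which is what makes a stored CID invalid for modified content.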
The pinning architecture uses dual-provider redundancy: Pinata as the primary pinning service and Filebase (S3-compatible, IPFS-backed object store) as the secondary. Content is pinned to both providers simultaneously. The CID stored on-chain is the same regardless of which provider serves the content, because the CID is derived from the content itself and IPFS routing resolves it to whichever provider holds a copy.
Pinning is performed via the Pinata IPFS API within 500ms of Evidence Packet creation. The IPFS CID is returned synchronously and included in the Chainscript entry before the Chainscript itself is pinned. This ensures that the on-chain anchor always points to a CID that is already pinned and retrievable.
Retrieval is served through a dedicated IPFS gateway (ipfs.yehor.ai) running on a bare-metal server with 2TB NVMe cache. Frequently accessed CIDs — active collection assets, recently minted items — are cached at the gateway level with a 24-hour TTL. This eliminates IPFS routing overhead for hot content.
Base Network On-Chain Anchoring
Storing Evidence Packets on-chain directly is not feasible: at 3KB per packet and $0.001 per byte on most EVM chains, each packet would cost about $3 to store, so a million-asset pipeline would incur $3M in storage costs. The correct architecture stores only the chain hash and the CID on-chain, a 32-byte commitment plus a short content identifier, and retrieves the full Evidence Packet from IPFS when verification is required.
The production system deploys a custom ERC-1155 contract on Base (the Coinbase-operated Ethereum L2). Base was selected for three properties:
- Gas cost: Base processes transactions at 100 to 1,000x lower cost than Ethereum mainnet. Anchoring a single chain hash costs approximately $0.001 on Base versus $0.50 to $2.00 on mainnet at comparable network load.
- EVM equivalence: Base is OP Stack-based with full EVM equivalence. Ethereum tooling, Hardhat, Foundry, Ethers.js, and Viem work without modification.
- Data availability: Base posts transaction data to Ethereum mainnet via EIP-4844 blob transactions. The anchored data inherits Ethereum's finality guarantees within approximately 15 minutes.
The anchor transaction stores the following data in the contract's event log (not in contract state, which is more expensive):
event ProcessAnchored(
    bytes32 indexed tenantId,
    bytes32 indexed chainId,
    bytes32 chainHash,
    string ipfsCid,
    uint256 timestamp
);
Using events rather than state storage reduces anchor cost by approximately 80%. Events are permanent, indexable, and retrievable via standard Ethereum log queries. The indexed fields on tenantId and chainId enable efficient log filtering — a verifier can retrieve all anchors for a specific tenant or chain without scanning the full event history.
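The log filtering described above maps onto the standard eth_getLogs JSON-RPC method, where the first topic is the event signature hash and each indexed parameter occupies one further topic slot. The sketch below only builds the request body; it makes no network call. The function name is hypothetical, and the event topic is passed in as a parameter (it would be the Keccak-256 hash of the event signature, which is not computed here).

```typescript
interface GetLogsRequest {
  jsonrpc: "2.0";
  id: number;
  method: "eth_getLogs";
  params: [{ address: string; fromBlock: string; toBlock: string; topics: (string | null)[] }];
}

// A null topic acts as a wildcard, so either filter can be left open.
function buildAnchorQuery(
  contractAddress: string,
  eventTopic: string,
  tenantId: string | null,
  chainId: string | null,
): GetLogsRequest {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "eth_getLogs",
    params: [
      {
        address: contractAddress,
        fromBlock: "earliest",
        toBlock: "latest",
        // topics[0] = event signature, topics[1] = tenantId, topics[2] = chainId
        topics: [eventTopic, tenantId, chainId],
      },
    ],
  };
}

// Placeholder values for illustration only.
const placeholderAddress = "0x" + "00".repeat(20);
const placeholderTopic = "0x" + "11".repeat(32);
const placeholderTenant = "0x" + "22".repeat(32);
const allAnchorsForTenant = buildAnchorQuery(placeholderAddress, placeholderTopic, placeholderTenant, null);
console.log(JSON.stringify(allAnchorsForTenant, null, 2));
```

Many RPC providers cap the block range of a single eth_getLogs call, so a production client would likely page through ranges rather than query from "earliest" in one request.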
EIP-712 typed structured data signing is used for off-chain anchor authorization: the signing key in the HSM produces an EIP-712 signature over the anchor payload before the transaction is submitted. This creates an additional verification path that does not require trust in the submitting transaction's sender.
Verification Protocol
A verifier — a regulator, audit firm, licensing party, or downstream buyer — performs verification through a four-step protocol that requires no trusted intermediaries and no contact with the original system operator.
Step 1 — On-chain retrieval. Query the Base network for ProcessAnchored events filtered by tenantId and chainId. Retrieve the chainHash and ipfsCid from the matching event.
Step 2 — IPFS resolution. Resolve the ipfsCid via any IPFS gateway. Download the Chainscript JSON and verify that its chainHash field matches the value retrieved from the blockchain. If it does not match, the Chainscript has been tampered with after anchoring.
Step 3 — Evidence Packet verification. For each entry in the Chainscript, resolve the packetCid via IPFS. Download the canonical CBOR Evidence Packet. Compute the SHA-256 of the CBOR bytes and verify against the packetHash in the Chainscript entry. Verify the eddsaSignature using the tenant's published Ed25519 public key.
Step 4 — Hash chain integrity. Verify that each entry's previousHash matches the packetHash of the preceding entry. Verify the chainSignature over the chainHash. If all checks pass, the chain is intact and the process is proven.
This protocol can be executed by any party with access to a Base RPC endpoint and an IPFS gateway. No proprietary software, no trusted oracle, no contact with the system operator is required. The proof is self-contained and publicly verifiable.
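Steps 2 through 4 can be sketched as a single offline check, assuming the on-chain chainHash and the packet bytes have already been fetched. Several details are assumptions rather than facts from the source: each entry's eddsaSignature is taken to cover the packet's CBOR bytes, chainHash is taken to be the SHA-256 of the raw concatenated packet hashes, and signatures are base64url-encoded. The recomputation of chainHash from the entries is an extra consistency check added for this example, and verifyChain is a hypothetical name.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface Entry { packetCid: string; packetHash: string; previousHash: string | null; eddsaSignature: string; }
interface Chainscript { entries: Entry[]; chainHash: string; chainSignature: string; }

const sha256 = (bytes: Uint8Array): Buffer => createHash("sha256").update(bytes).digest();
const hashOfHashes = (entries: Entry[]): string =>
  sha256(Buffer.concat(entries.map((e) => Buffer.from(e.packetHash, "hex")))).toString("hex");

// Returns a list of problems; an empty list means every check passed.
function verifyChain(script: Chainscript, onChainHash: string, packets: Map<string, Uint8Array>, tenantKey: KeyObject): string[] {
  const problems: string[] = [];
  if (script.chainHash !== onChainHash) problems.push("chainHash differs from on-chain anchor");
  if (hashOfHashes(script.entries) !== script.chainHash) problems.push("chainHash does not match entries");
  let expectedPrevious: string | null = null;
  for (const entry of script.entries) {
    const bytes = packets.get(entry.packetCid);
    if (!bytes) {
      problems.push(`${entry.packetCid}: packet not retrievable`);
    } else {
      if (sha256(bytes).toString("hex") !== entry.packetHash) problems.push(`${entry.packetCid}: packetHash mismatch`);
      // Assumption: the entry signature covers the packet's CBOR bytes.
      const sig = Buffer.from(entry.eddsaSignature, "base64url");
      if (!verify(null, bytes, tenantKey, sig)) problems.push(`${entry.packetCid}: bad signature`);
    }
    if (entry.previousHash !== expectedPrevious) problems.push(`${entry.packetCid}: broken previousHash link`);
    expectedPrevious = entry.packetHash;
  }
  const chainSig = Buffer.from(script.chainSignature, "base64url");
  if (!verify(null, Buffer.from(script.chainHash, "hex"), tenantKey, chainSig)) problems.push("bad chainSignature");
  return problems;
}

// Demo with a locally generated key and placeholder packet bytes.
const tenant = generateKeyPairSync("ed25519");
const makeEntry = (cid: string, bytes: Uint8Array, previous: string | null): Entry => ({
  packetCid: cid,
  packetHash: sha256(bytes).toString("hex"),
  previousHash: previous,
  eddsaSignature: sign(null, bytes, tenant.privateKey).toString("base64url"),
});
const packet0 = Buffer.from("packet-0");
const packet1 = Buffer.from("packet-1");
const first = makeEntry("cid-0", packet0, null);
const second = makeEntry("cid-1", packet1, first.packetHash);
const demoHash = hashOfHashes([first, second]);
const demoScript: Chainscript = {
  entries: [first, second],
  chainHash: demoHash,
  chainSignature: sign(null, Buffer.from(demoHash, "hex"), tenant.privateKey).toString("base64url"),
};
const store = new Map<string, Uint8Array>([["cid-0", packet0], ["cid-1", packet1]]);
const intactResult = verifyChain(demoScript, demoHash, store, tenant.publicKey);
```

Returning a list of problems rather than a single boolean keeps the binary outcome (an empty list means intact) while telling an auditor which entry failed and why.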
Production Deployment
The system is deployed across three client verticals: a generative art studio (10,000-piece collections on a quarterly cycle), a pharmaceutical research firm (molecular visualization assets for regulatory submissions), and a legal services provider (AI-generated document summaries for discovery packages).
Across all three deployments over 14 months:
- Evidence Packets created: 4.2 million
- Chainscripts anchored to Base: 847 (each covering a production batch)
- IPFS retrieval success rate: 99.997% (dual-provider redundancy)
- Verification failures (tamper detection): zero (no tampering attempts detected)
- Anchor transaction cost average: $0.0008 per batch on Base
- Regulatory submission acceptance rate: 100% for pharmaceutical client (previously 0% without provenance documentation)
- Average verification time (Steps 1–4): 3.2 seconds end-to-end
The architecture makes process integrity a property of the cryptographic record, not a claim made by the system operator. The verification protocol produces a binary outcome — the chain is intact or it is not — with no ambiguity and no reliance on the trustworthiness of any single party.