Why Your KYC Vendor Is Your Biggest Data Breach Risk (And What to Do About It)

Centralised KYC vendors are honeypots. The Sumsub breach went 18 months undetected. Here's how to assess your vendor's architecture and the alternative.

Key highlights

  • 18 months undetected. Millions of records exposed. One centralised KYC provider. This is the architectural pattern that produced the largest identity breaches of the last three years, not an operational accident at any single vendor.
  • The threat model of a centralised KYC platform is structurally a honeypot. Aggregating identity documents from millions of customers across hundreds of regulated firms in a single operator's infrastructure creates a target whose value compounds with every new customer.
  • SOC 2, ISO 27001, and the broader compliance-certification stack reduce the probability of a breach but do not eliminate the architectural target. A breached SOC 2-certified vendor is still a breached vendor.
  • The architectural alternative is decentralised KYC: sharded storage, threshold cryptography, customer-held keys, no central database that a breach can exfiltrate. The principle is to remove the target rather than fortify it.
  • The procurement questions to ask are not whether the vendor has compliance certifications. They are whether the vendor has a centralised PII database, who holds the encryption keys, what an attacker would extract from a successful compromise, and what the post-breach customer notification surface looks like.
  • The case studies (Sumsub, IDmerit, Coinbase, and several others) are not vendor-quality failures. They are architecture failures that no vendor with the same architecture would have avoided.

Definition

Centralised KYC vendors are software platforms that aggregate identity documents and personal data from millions of customers into a single operator's infrastructure. The architectural consequence is a high-value attack target that grows in value with every additional customer. Decentralised KYC is the alternative architecture that removes the target rather than fortifying it.

On this page

  1. Why are centralised KYC vendors structurally a honeypot?
  2. What do the Sumsub, IDmerit, and Coinbase incidents have in common?
  3. Why do SOC 2 and ISO 27001 not eliminate the architectural risk?
  4. What does privacy-by-design actually mean in identity verification?
  5. How does the decentralised KYC alternative work structurally?
  6. What questions should you ask your current KYC vendor?
  7. What does a free architectural security review involve?
  8. When is switching KYC vendors the wrong move to make right now?
  9. FAQ

TL;DR

The KYC software market spent 2020-2024 selling fortified central databases. The 2023-2025 breach cycle proved that fortification has limits. Sumsub went 18 months undetected. IDmerit exposed over a billion records. Coinbase's identity-verification-related incident cost USD 400 million in customer remediation. The pattern is not that these vendors were uniquely careless. The pattern is that aggregating identity data into a central operator's infrastructure is architecturally a target, and a target that compounds in value with scale. The procurement framing has shifted. Operators no longer ask "can we trust this vendor's security?" They ask "what does the architecture look like if the vendor is breached?" This piece walks through why the architectural question is the right one, what the alternative looks like, and the specific questions to put to your current vendor.

[Figure: side-by-side architecture diagram showing centralised KYC vendor honeypot exposure versus decentralised KYC sharded storage with no central PII target]

Reading time: ~7 minutes · Last updated: May 7, 2026

Why are centralised KYC vendors structurally a honeypot?

A centralised KYC vendor's business model is to aggregate verified identity records from a large number of customer firms into a single operator-controlled database. The economics of the model push toward scale: more customer firms produce more verified records, which in turn drive better fraud detection (more triangulation data), better pricing (volume), and better network effects (a customer already verified at one regulated firm can be reverified faster at another).

The architectural consequence is what cryptographers and security architects have called the honeypot problem for two decades. As the database grows, two things compound simultaneously, but not at the same rate. The defensive cost grows with the data held (more storage, more monitoring, more access control), while the attacker's incentive grows with the resale value of the whole dataset, and in practice defensive budgets do not keep pace with it. To illustrate the order of magnitude: a vendor with one million verified identities is worth attacking with a USD 100,000 budget; a vendor with one billion verified identities is worth attacking with a USD 50 million budget. Few vendors scale their security spending by the same factor.

Three structural properties of the centralised model amplify the risk further:

Single-point persistence. Identity documents are inherently long-lived data. A passport scan or government-ID image is useful to an attacker for years, not days. A breach today produces actionable intelligence for the rest of the decade.

Concentrated economic value. Unlike most breach surfaces (payment cards, email passwords, even biometric templates), identity documents are difficult to revoke or rotate. Once exposed, the value to a fraudster compounds with every place the document is used to verify identity.

Operator-key custody. In nearly every centralised KYC architecture, the vendor holds the master encryption keys to the database. A successful compromise of the operator's key infrastructure exposes the entire dataset, regardless of how strong any individual record's protection looked from outside.

This is not a critique of any specific vendor's security posture. It is a description of what the architectural model produces under sufficient attacker pressure. The 2023-2025 breach cycle is what happens when the attacker pressure rises to match the model's exposure.

What do the Sumsub, IDmerit, and Coinbase incidents have in common?

The three most-cited cases of the 2023-2025 cycle share the same architectural signature even though their operational specifics differ.

Sumsub: 18 months undetected, millions of records. Public reporting of the Sumsub incident traced the access window to approximately 18 months between initial compromise and detection. The exfiltrated dataset included verified identity documents, biometric data, and the customer-firm metadata that links each identity to the regulated firm that submitted it. The detection lag, not the initial compromise, is the structural signature. A decentralised architecture cannot produce an 18-month undetected exposure of this kind because there is no single dataset to exfiltrate over 18 months.

IDmerit: over one billion records exposed. Reported as one of the largest identity-data exposures in history when the disclosure became public. The technical mechanism was reportedly a misconfigured cloud storage layer holding identity documents that were intended to be encrypted at rest. The architectural signature is that the data existed in a single accessible operator-controlled storage layer in the first place. A decentralised architecture distributes the storage so that no single misconfiguration exposes the full record.

Coinbase: USD 400 million in customer remediation costs. The Coinbase incident involved third-party customer-support contractors who were socially engineered into providing access to customer identity data and account history. The architectural signature is that the data was accessible to a category of operator-controlled actors at all; a decentralised architecture with customer-held keys structurally cannot grant access to a contractor regardless of the social-engineering attack.

The common architectural pattern is the same in each case: data aggregated into a single operator's infrastructure, accessible to operator-controlled actors, protected by operator-held keys. The breach mechanics are different. The architectural exposure is the same.

Why do SOC 2 and ISO 27001 not eliminate the architectural risk?

The standard procurement-team response to KYC vendor breaches is to require stronger compliance certifications: SOC 2 Type II, ISO 27001, ISO 27018, PCI-DSS in some cases, and increasingly, jurisdiction-specific frameworks like the EU AMLA pre-supervisory assessments.

These certifications matter and operators should require them. They are not, however, a substitute for the architectural question.

The reason is that compliance certifications evaluate operational controls within the architectural model the vendor has chosen. SOC 2 Type II asks whether the vendor has implemented access controls, monitoring, change management, and incident response within their centralised database architecture. The certification does not ask whether the centralised database architecture is itself the right model.

A vendor with a SOC 2 Type II certification and a billion-record centralised database is doing well at controlling access within that architecture. The certification does not address whether the architecture itself produces a billion-record target. Every one of the breached vendors in the 2023-2025 cycle held current SOC 2 or equivalent certifications at the time of the breach. The certifications were not falsified; they were beside the architectural point.

The procurement implication: require the certifications, but evaluate the architecture separately. The right question to a SOC 2-certified centralised vendor is not "are your controls strong?" but "what is the impact of a successful compromise of your operator-controlled infrastructure?" Vendors that have done the architectural work can answer this with concrete bounding. Vendors that have not will speak in generalities.

What does privacy-by-design actually mean in identity verification?

Privacy-by-design is one of the most-overused phrases in the KYC vendor marketing repertoire. The phrase has a specific meaning in regulatory text (Article 25 of the EU General Data Protection Regulation), and most vendors using the phrase in their marketing materials do not meet the regulatory definition.

Article 25 ("data protection by design and by default") requires controllers to build the data-protection principles of Article 5 into the system itself. Three of those principles matter most for identity verification. First, data minimisation: the system collects only the personal data strictly necessary for the stated purpose, not more. Second, purpose limitation: the data collected is used only for the stated purpose and not repurposed. Third, storage limitation: the data is retained only as long as the stated purpose requires.

The architectural extension of privacy-by-design, which is increasingly the language regulators use in 2026, adds a fourth property: structural minimisation of exposure surface. Personal data should be stored in a way where compromise of the operator's infrastructure does not produce mass exposure of the data subjects. This is the property a centralised database cannot satisfy by design, regardless of how good the operational controls are.

What this means in practice for identity verification specifically: the system should not aggregate identity documents into a single operator-controlled database. The system should not give the vendor the encryption keys to the customer's data. The system should not produce a single point of compromise whose breach yields more than a small fraction of any customer's identity record.

Centralised KYC vendors that describe themselves as privacy-by-design are using the phrase in its weak operational sense (data minimisation, purpose limitation, storage limitation within their existing architecture). They are not using it in its strong architectural sense. The 2026 regulatory and procurement landscape increasingly distinguishes between the two.

How does the decentralised KYC alternative work structurally?

Decentralised KYC is the architectural alternative that removes the central target rather than fortifying it. The structural properties are:

Sharded storage with threshold cryptography. Identity documents are split into shards using a cryptographic threshold scheme. Zyphe uses a 29-of-100 scheme, in which a document can be reconstructed only when at least 29 of its 100 storage shards are recombined. The shards are distributed across 60,000+ geographically and operationally separated storage nodes. A breach of any single storage node yields one shard, which is useless without 28 other shards from independently controlled nodes.
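A minimal sketch of the threshold idea, assuming Shamir's secret sharing over a prime field. The article does not say which threshold scheme Zyphe uses, so the scheme, the field prime, and the function names `split_secret` and `recover_secret` are illustrative assumptions. A production system would shard encrypted document bytes, not a single integer.

```python
import secrets

# Assumption: Shamir's secret sharing over GF(PRIME); illustrative only.
PRIME = 2**127 - 1  # a Mersenne prime, large enough for a 126-bit secret


def split_secret(secret: int, threshold: int, total: int) -> list[tuple[int, int]]:
    """Split `secret` into `total` shares; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    if not 1 <= threshold <= total < PRIME:
        raise ValueError("need 1 <= threshold <= total")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    # One share per storage shard: the point (x, f(x)) for x = 1..total.
    return [(x, evaluate(x)) for x in range(1, total + 1)]


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    result = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        result = (result + yi * num * pow(den, -1, PRIME)) % PRIME
    return result


if __name__ == "__main__":
    key = secrets.randbelow(PRIME)
    shares = split_secret(key, threshold=29, total=100)
    print(recover_secret(shares[:29]) == key)    # any 29 shares recover the key
    print(recover_secret(shares[40:68]) == key)  # 28 shares reveal nothing useful
```

With 28 or fewer shares, interpolation returns a value unrelated to the secret, which is the property the 29-of-100 description relies on.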

Customer-held encryption keys. The encryption key for the document is held by the customer firm, not by the vendor. Zyphe specifically has no master key and no ability to decrypt customer data unilaterally. A compromise of Zyphe's infrastructure does not yield decryptable data; it yields shards that are useless without the customer's key.

Per-region data residency. Source documents and the shards that hold them stay within the data subject's jurisdiction. Swiss data stays in Switzerland, Singapore data in Singapore, EU data in EU member states. The customer firm's regulator can therefore verify that the data architecture it audits operates within its own jurisdiction.
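The residency rule lends itself to a mechanical check. The sketch below is one way an auditor might verify shard placement. The zone names, the country lists, and the `ShardPlacement` record are hypothetical and do not describe a real Zyphe API.

```python
from dataclasses import dataclass

# Hypothetical residency zones mapped to the node jurisdictions allowed to
# hold their shards. Zone names and country lists are illustrative only.
ALLOWED_NODE_JURISDICTIONS = {
    "CH": {"CH"},
    "SG": {"SG"},
    "EU": {"DE", "FR", "IE", "NL"},  # sample of member states, not exhaustive
}


@dataclass(frozen=True)
class ShardPlacement:
    shard_id: int
    node_id: str
    node_jurisdiction: str  # where the storage node physically operates


def residency_violations(zone: str, placements: list[ShardPlacement]) -> list[ShardPlacement]:
    """Return every placement stored outside the subject's residency zone."""
    allowed = ALLOWED_NODE_JURISDICTIONS[zone]
    return [p for p in placements if p.node_jurisdiction not in allowed]
```

An empty result means every shard for that data subject sits on a node inside the permitted jurisdiction; any returned placement is a finding to raise with the vendor.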

No operator-controlled access path. Operator personnel, support contractors, and third-party engineering teams cannot access customer identity data through a centralised access layer because no such layer exists. The architectural property that produced the Coinbase social-engineering exposure is structurally absent.

Regulator-defensible audit trail. Every verification, every credential issuance, every revocation, and every alert lands in an immutable case file. AMLA per-decision defensibility, FCA SMCR personal accountability, the FinCEN "reasonably designed" standard, and the corresponding documentation requirements all sit on top of this primitive without compromising the storage architecture underneath.
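One common way to make a case file tamper-evident is a hash chain, in which each entry commits to the one before it. The article does not describe how Zyphe implements immutability, so the sketch below is an assumption; `append_event`, `verify_log`, and the event fields are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash,
             "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry


def verify_log(log: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects after-the-fact edits but does not prevent them; anchoring the latest hash with a party outside the operator's control is what would make the record defensible to a regulator.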

The decentralised model is not a security upgrade to the centralised model. It is a different architectural model. The threat profile, the certification surface, and the procurement evaluation are structurally different. The decentralised KYC primer covers the architecture in deeper technical detail.
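The effect of the threshold on breach impact can be estimated with a short calculation. The sketch below assumes each document's 100 shards sit on 100 distinct nodes chosen uniformly at random from 60,000, which the article does not state; under that assumption the number of shards an attacker holds follows a hypergeometric distribution. The function name and the compromised-node counts are hypothetical.

```python
from math import comb


def reconstruction_probability(total_nodes: int, compromised_nodes: int,
                               shards: int = 100, threshold: int = 29) -> float:
    """P(attacker holds >= `threshold` shards of one document).

    Assumes the document's shards sit on `shards` distinct nodes chosen
    uniformly at random, so the count on compromised nodes is hypergeometric.
    """
    if not 0 <= compromised_nodes <= total_nodes:
        raise ValueError("compromised_nodes must be between 0 and total_nodes")
    if not 1 <= threshold <= shards <= total_nodes:
        raise ValueError("need 1 <= threshold <= shards <= total_nodes")
    # Count the placements that put at least `threshold` shards on bad nodes.
    favourable = sum(
        comb(compromised_nodes, k) * comb(total_nodes - compromised_nodes, shards - k)
        for k in range(threshold, shards + 1)
    )
    return favourable / comb(total_nodes, shards)


if __name__ == "__main__":
    for bad in (600, 6_000, 30_000):  # 1%, 10%, 50% of nodes compromised
        print(bad, reconstruction_probability(60_000, bad))
```

Even when this probability is non-zero, the attacker would still need the customer-held key to decrypt a reconstructed document, so the figure is an upper bound on exposure under these assumptions.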

What questions should you ask your current KYC vendor?

Use these as the architectural evaluation questions for your current or prospective KYC vendor. Vendors that operate on the centralised model will struggle to answer them in concrete terms; vendors that operate on the decentralised model will give specific architectural answers.

  1. Do you hold a centralised database of customer identity documents? A yes is informative. Some vendors will reframe this as "we hold the data on behalf of our customers", which is still a centralised database.
  2. Who holds the encryption keys to that data? Vendor-held keys produce a concentrated exposure surface; customer-held keys do not.
  3. What is the impact of a successful compromise of your operator-controlled infrastructure? Concrete answers describe specific data accessible per breach scenario; vague answers describe controls.
  4. What is your data residency model? Where exactly does my customer data live, and can you guarantee it stays in jurisdiction X?
  5. How would a malicious or socially-engineered operator employee access customer identity data? What architectural property prevents it?
  6. How long would an undetected attacker have access before discovery? What architectural property limits the access window?
  7. What is the breach notification surface? If you are breached, how many of my customers' records are exposed, and what data?
  8. What is your post-breach remediation cost to my firm? The Coinbase case study is instructive: USD 400 million in customer remediation cost. A vendor that cannot describe their remediation contract terms in concrete numbers is exposing the customer firm to unbounded liability.
  9. Are you SOC 2 Type II and ISO 27001 certified? Required but insufficient.
  10. What architectural property of your platform would have prevented the Sumsub, IDmerit, and Coinbase incidents specifically? Vendors that operate on the same architecture as the breached firms will struggle to answer this honestly.

If the answers to questions 1, 2, 5, and 6 are unsatisfying, the vendor's architecture is the same as the breached firms' and your firm is exposed to the same outcome.
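The four decisive questions can be recorded as a simple checklist so that the result is comparable across vendors. The sketch below is one possible encoding; the `VendorAnswers` field names and the flag wording are hypothetical, and the question numbers refer to the list above.

```python
from dataclasses import dataclass


@dataclass
class VendorAnswers:
    holds_central_database: bool       # question 1
    vendor_holds_keys: bool            # question 2
    operator_access_path_exists: bool  # question 5
    access_window_unbounded: bool      # question 6


def architectural_red_flags(answers: VendorAnswers) -> list[str]:
    """List which of questions 1, 2, 5, and 6 received an unsatisfying answer."""
    flags = []
    if answers.holds_central_database:
        flags.append("Q1: centralised database of identity documents")
    if answers.vendor_holds_keys:
        flags.append("Q2: vendor holds the encryption keys")
    if answers.operator_access_path_exists:
        flags.append("Q5: operator staff can reach customer identity data")
    if answers.access_window_unbounded:
        flags.append("Q6: nothing architectural limits an attacker's access window")
    return flags


def matches_breached_architecture(answers: VendorAnswers) -> bool:
    """True when all four answers are unsatisfying, as described above."""
    return len(architectural_red_flags(answers)) == 4
```

The remaining six questions are better kept as free-text evidence, since their answers are descriptive, not yes or no.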

What does a free architectural security review involve?

Zyphe offers a free architectural security review for compliance and security teams evaluating their current KYC vendor's exposure. The review is not a sales call. It is a structured walkthrough of the architectural evaluation framework above, applied to the firm's current vendor and current data exposure.

The review covers: the current vendor's architectural model (centralised, hybrid, decentralised), the firm's data exposure surface (how many customer records, what data types, what residency), the vendor's certification and contractual position (SOC 2, ISO, BAA, post-breach remediation terms), the procurement-level architectural questions answered or not answered, and a quantified bounding of the firm's exposure in the event of a vendor compromise.

The output is a written report the firm's CISO and CCO can take to the board. We have run this review with operators in fintech, crypto, regulated marketplaces, government identity, and DAO infrastructure. The result is not always "switch to Zyphe"; sometimes the result is "stay with your current vendor and add the contractual remediation clauses you are missing". Sometimes it is "switch vendors but the right alternative is X". We provide the analysis honestly; the firm decides the action.

If your team has not done this exercise on your current KYC vendor, it is worth doing once before the next breach lands in the trade press and your CISO has to answer the board's questions in a less-controlled context.

When is switching KYC vendors the wrong move to make right now?

The architectural argument above does not mean every firm should switch vendors today. There are three scenarios in which the right move is to remediate the contract and keep the vendor.

The firm is mid-contract on a multi-year enterprise commit with high switching costs. Switching the KYC vendor mid-contract typically costs 6-12 months of dual-vendor running plus engineering time plus operational change management. For firms with 18-24 months remaining on a contract, the right move is often to negotiate stronger contractual remediation clauses with the existing vendor (concrete breach-remediation cost ceiling, breach-notification timeline, audit rights, key-custody changes where possible) rather than execute the switch. Document the architectural-exposure decision for the board.

The firm operates at a scale where the breach exposure is bounded and the alternative cost is not. For a small or mid-stage firm processing modest verification volumes, the realised breach exposure of a centralised vendor is meaningfully smaller than the realised cost of a 12-month migration. The board's decision calculus is real even when the architectural posture is uncomfortable. Document the risk acceptance explicitly so it survives leadership change.

The vendor has committed to a concrete decentralisation roadmap with months attached. A vendor that publicly commits to migrating its storage architecture to a customer-key model or to a sharded distributed scheme with specific milestones may close the architectural gap on a timeline that beats a switch. Verify the commitment is concrete (months, not "exploring") and write the milestones into the contract.

If none of these three scenarios applies, the architectural posture is the procurement question and a switch is the right consideration. The free architectural security review covers the analysis.

The bottom line

The KYC vendor question for regulated firms in 2026 is not whether the vendor's controls are strong. It is whether the vendor's architecture is structurally a target. The Sumsub, IDmerit, and Coinbase cases are not vendor-quality failures. They are architecture failures that no vendor with the same architecture would have avoided. Procurement teams that update the evaluation framework to ask the architectural questions now will not have to update it after the next breach lands in the trade press. Procurement teams that do not update it will.

Evaluate your current KYC architecture: get a free security review from Zyphe, and our team will walk through the architectural exposure your firm currently carries.

  1. Decentralised KYC primer: what it is, how it works
  2. KYC for Crypto Exchanges 2026: building a compliant onboarding flow
  3. FATF Travel Rule for VASPs 2026: practical compliance guide
  4. KYC API integration: 15-minute integration guide
  5. Identity verification software comparison 2026: 7 platforms evaluated
  6. Zyphe MCP launch: talk to your compliance stack

Michelangelo Frigo (Co-Founder at Zyphe). Michelangelo Frigo is a privacy and identity infrastructure expert and co-founder of Zyphe.

Frequently Asked Questions

SOC 2 certification does not remove the architectural risk. SOC 2 Type II evaluates operational controls within the vendor's chosen architecture. It does not evaluate whether the architecture itself is structurally exposed. Every breached KYC vendor in the 2023-2025 cycle held current SOC 2 or equivalent certifications at the time of the breach. Require the certifications but evaluate the architecture separately.

Centralised KYC aggregates identity data from many customer firms into a single vendor-controlled database. Decentralised KYC distributes the data across many independently-controlled storage nodes using cryptographic threshold schemes, with the customer firm holding the encryption keys. The threat surface and breach impact differ structurally between the two models.

Centralised KYC vendors are honeypots because aggregating identity documents from millions of customers into a single operator's infrastructure creates a target whose value grows with scale, while defensive cost does not scale at the same rate. The defensive ratio worsens as the database grows, eventually crossing the threshold where attacker investment becomes economically rational. The 2023-2025 breach cycle is the evidence.

The Sumsub, IDmerit, and Coinbase incidents had different operational mechanisms but the same architectural pattern. Sumsub had 18-month undetected access to a central database. IDmerit had over a billion records exposed through a misconfigured cloud layer. Coinbase had third-party contractor access exploited through social engineering. In each case, the architectural property that enabled the breach is the centralised data layer itself.

Zyphe cannot decrypt customer data unilaterally because it holds no master key. The customer firm holds the encryption key. A successful compromise of Zyphe's infrastructure yields shards from a 29-of-100 threshold scheme that are useless without the customer's key.

The ten architectural questions above cover, in summary: whether the vendor holds a central database, who holds the keys, what the breach impact is, what the residency model is, how a malicious operator employee would access data, how long an undetected attacker would have access, and what the post-breach remediation cost is. Centralised vendors struggle to answer these in concrete terms.

Decentralised KYC is already in production use. Zyphe operates in production across the EU, UK, US, Singapore, and UAE with regulated firms in banking, fintech, crypto, marketplaces, government identity, and DAO infrastructure. Decentralised KYC satisfies the same regulatory frameworks (MiCA, FCA, MLR 2017, FinCEN, MAS, VARA, AMLA) as centralised KYC and produces a different breach exposure surface.

The free architectural security review walks through the architectural evaluation framework against the firm's current KYC vendor and current data exposure. The output is a written report the firm's CISO and CCO can take to the board. It is not a sales call; the result is sometimes "stay with your current vendor and add specific contractual remediation clauses", not "switch to Zyphe".