The EMV consortium released several standards detailing how “network” tokenization should be handled. There is now a general consensus within the consortium that tokenization could be the next major task for EMV payments. The PCI Council also issued several standards and guidelines indicating how merchants can reduce their PCI DSS scope by using tokenization, and which methods and technologies should be used.
The clear trend for defeating data breaches is to substitute a transaction’s original data with a ‘useless’ token. For example, tokenization is becoming a de facto condition for securely conducting credit card transactions in a mobile environment and has gained the confidence of users.
Here we will discuss the technologies behind a successful tokenization implementation.
Generating a Token
Generating a token involves several techniques:
- A one-way cryptographic hash function, such as SHA-256 or MD5;
- A random number generator;
- A ciphering function.
A token is either reversible or irreversible. The distinction matters because some uses require clear proof that the token cannot be reverted to its original value.
In the case of a one-way hash, irreversibility rests on the mathematical properties of the hash function and on the entropy of the hashed data, which must make it impossible to build “reversed” lookup tables, such as rainbow tables, that could be searched.
For instance, in order to check that a token T corresponds to a PAN P, it is enough to compare H(P) and T, where H is the hashing function. In this scheme the token is never stored in a dictionary or table alongside its PAN, so there is no PAN <-> token mapping that could be looked up to reverse it.
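The comparison of H(P) and T can be sketched in a few lines of Python. This is only an illustration, assuming SHA-256 as H and the hex digest as the token; the helper names `hash_token` and `matches` are hypothetical, and the PAN is a widely used test number rather than a real account.

```python
import hashlib
import hmac


def hash_token(pan: str) -> str:
    """Return H(P): the hex SHA-256 digest of the PAN digits."""
    return hashlib.sha256(pan.encode("ascii")).hexdigest()


def matches(pan: str, token: str) -> bool:
    """Check whether T == H(P), using a constant-time comparison."""
    return hmac.compare_digest(hash_token(pan), token)


pan = "4111111111111111"  # test card number, for illustration only
token = hash_token(pan)

print(matches(pan, token))                 # True
print(matches("4111111111111112", token))  # False
```

No table is consulted here: verification only recomputes the hash from a candidate PAN.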
With randomly generated tokens, irreversibility cannot be guaranteed. This is because the tokens must be stored in a table alongside the original values so that they can be matched later on.
The same is true for tokens generated with encryption functions. Even if the tokens do not have to be stored in a table, knowledge of the encryption key will allow decryption of the token to its original value.
Schemes based on irreversible hashes must guarantee that the hashes cannot be reversed. In practice, however, a nominally irreversible cryptographic hash can be reversed by exhaustive search when the space of possible inputs is small.
For instance, suppose we “simply” want to generate tokens from PANs using a hash function, say SHA-256. There are about 10^15 possible PAN values, so the token space also contains about 10^15 values, which means 1,000 tera-hashes to compute.
Non-ASIC hardware such as the A10-5800K can compute around 26.25 Mhash/second/core. A single core would therefore need about 38,095,238 seconds to compute the entire dictionary. With 100 cores, that comes to about 106 hours (roughly 4.4 days).
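The estimate can be reproduced directly from the figures quoted in the text (10^15 candidate PANs, 26.25 Mhash/second/core, 100 cores). The short calculation below is only a back-of-the-envelope check; the variable names are illustrative.

```python
PAN_SPACE = 10**15              # candidate PAN values, figure from the text
HASH_RATE_PER_CORE = 26.25e6    # hashes per second per core, figure from the text
CORES = 100

# Time to hash every candidate PAN once.
seconds_one_core = PAN_SPACE / HASH_RATE_PER_CORE
seconds_all_cores = seconds_one_core / CORES
hours = seconds_all_cores / 3600
days = hours / 24

print(f"{seconds_one_core:,.0f} seconds on one core")
print(f"{hours:.1f} hours, or {days:.1f} days, on {CORES} cores")
```

Whatever the exact rounding, the result is between four and five days, which is well within reach of a modest cluster.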
If we use specialized ASIC hardware similar to what is used for Bitcoin mining and is adapted for SHA-256, we should get computation rates of the order of magnitude of Tera-Hashes/second.
Therefore, tokenization using such algorithms would be quite useless. An attacker could easily reverse the tokens using a giant, exhaustive dictionary of thousands of terabytes. Presently, this scenario is not out of the realm of possibility.
Additional techniques then need to be sought, such as using a seed parameter or computing an HMAC. However, generating truly non-reversible tokens for the 16 digits of a PAN is not a simple task. It must be impossible to guess the tokens, and no bias should exist.
When a key is used to generate a token, changing the key must change the generated token in a way that follows the avalanche effect: even a one-bit change in the key should alter about half of the output bits.
With a randomly created token, you must make sure that the random generator is truly random and unpredictable, and that it at least passes the Diehard or Dieharder test suites for randomness.
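As an illustration of the HMAC approach, the sketch below derives a token from a PAN with HMAC-SHA-256 using Python's standard library. It assumes a 32-byte secret key generated in software purely for the demonstration (in a real deployment the key would live in a secure element); the function name `hmac_token` is hypothetical.

```python
import hashlib
import hmac
import secrets


def hmac_token(key: bytes, pan: str) -> str:
    """Derive a keyed token from the PAN with HMAC-SHA-256."""
    return hmac.new(key, pan.encode("ascii"), hashlib.sha256).hexdigest()


# Assumption: a demo key created in software; not how a production key is held.
key = secrets.token_bytes(32)
pan = "4111111111111111"  # test card number, for illustration only

token = hmac_token(key, pan)
plain_hash = hashlib.sha256(pan.encode("ascii")).hexdigest()

print(token != plain_hash)            # the keyed token differs from the bare hash
print(hmac_token(key, pan) == token)  # same key and PAN give the same token
```

Without the key, an attacker can no longer precompute a dictionary over the PAN space, because every candidate hash depends on a secret they do not hold.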
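The avalanche effect can be observed experimentally. The sketch below, which assumes HMAC-SHA-256 as the keyed token function and a fixed demonstration key, flips a single bit of the key and counts how many of the 256 output bits change; the helper names are hypothetical.

```python
import hashlib
import hmac


def hmac_token(key: bytes, pan: str) -> bytes:
    """Keyed token as raw bytes (HMAC-SHA-256 of the PAN)."""
    return hmac.new(key, pan.encode("ascii"), hashlib.sha256).digest()


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


key = bytes(range(32))                      # fixed demo key, not a real secret
flipped = bytes([key[0] ^ 0x01]) + key[1:]  # same key with one bit flipped
pan = "4111111111111111"                    # test card number

changed = bit_difference(hmac_token(key, pan), hmac_token(flipped, pan))
print(f"{changed} of 256 output bits changed")
```

For a well-behaved function the count should be close to 128, that is, about half of the output bits.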
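A minimal sketch of random token generation is shown below. It assumes Python's `secrets` module (which draws from the operating system's cryptographic random source) and uses an in-memory dictionary as a stand-in for the token table; the class `TokenVault` and its methods are hypothetical. Passing statistical test batteries is a separate validation step that this sketch does not perform.

```python
import secrets


class TokenVault:
    """Toy in-memory token table; a real vault is an encrypted, access-controlled store."""

    def __init__(self) -> None:
        self._token_to_pan: dict[str, str] = {}
        self._pan_to_token: dict[str, str] = {}

    def tokenize(self, pan: str) -> str:
        """Return the existing token for a PAN, or create a random 16-digit one."""
        if pan in self._pan_to_token:
            return self._pan_to_token[pan]
        while True:
            token = f"{secrets.randbelow(10**16):016d}"
            # Retry on the (unlikely) collision with another token or the PAN itself.
            if token not in self._token_to_pan and token != pan:
                break
        self._token_to_pan[token] = pan
        self._pan_to_token[pan] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look the original PAN up in the table; raises KeyError if unknown."""
        return self._token_to_pan[token]


vault = TokenVault()
pan = "4111111111111111"  # test card number
token = vault.tokenize(pan)
print(vault.detokenize(token) == pan)  # True
```

The token itself carries no information about the PAN; the table is the only link between the two, which is why it has to be protected.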
For instance, the Cryptomathic CSG token generation platform uses a parametric data tokenization system which is based on a hybrid approach to generate tokens. The parametric data tokenization system combines format preserving encryption (FPE) and data from database storage.
With an encryption-based token, the keys used for encryption must be stored on a secure element, such as protected memory, HSM, smartcard, or a trusted module. When keys are used, they should follow the minimum key length requirements and a key lifecycle policy.
HSMs and Tokenization
A merchant implementing tokenization technology will have two choices:
- Using a Tokenization Service Provider (TSP);
- Implementing its own tokenization system.
Many merchants may be tempted to use the second option. This eliminates the need for third-party tokenization services that will increase transaction costs. A merchant choosing the second option will usually have to select a cloud-based HSM to avoid the burden of installing hardware HSMs at various geographical sites where data retention laws may be different. Encryption keys will need to be managed separately from tokens or any other encrypted data, thus providing an added level of security against cyber-attacks.
Access to the tokenization service should be provided through a modern, cloud-friendly RESTful API, such as those defined under the Payment Services Directive (PSD2).
The software token platform should be run in a trusted environment. This means that the token generator data cannot be accessed by other software components or that there is a parallel process permanently checking the integrity of the software token generator platform.
The implementation of a completely secure token service should be left to TSPs since it requires a lot of skill and know-how. Additionally, formal security models should be used with the token service, as defined by the PCI Council Guidelines. These include the Bell-LaPadula Confidentiality Model and the Brewer-Nash (Chinese Wall) security model. The PCI Council also recommends modelling token systems in a formal language and then producing a security proof.
Usually, a single PAN will be linked to multiple tokens, which will belong to different domains. But it is not easy to map the right token to the right PAN in a multi-domain token architecture.
The tokenization ecosystem consists of issuer banks, clients, merchants, token requestors, and token service providers, all of which interact and overlap with each other.
For instance, someone might be a client of several merchants that are all using the same token provider. In such a configuration, the token provider must perform a secure and rigorous matching, identification, and verification of the token requestor to deliver the right PAN when receiving a token request.
The combination of token requests between interacting actors is clearly a technological challenge since it requires a highly trusted and efficient token platform. Domain control consists of limiting the token usage to a specific merchant, channel, or spending limit by applying a set of parameters. Using an intercepted token outside of its domain parameters would flag the token as fraudulent and make it useless.
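Domain control can be pictured as a simple check of each attempted use against the parameters bound to the token. The sketch below is only an illustration: the field names (`merchant_id`, `channel`, `max_amount_cents`) and the function `is_within_domain` are assumptions, not part of any EMV or PCI specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenDomain:
    """Restrictions bound to a token when it is issued (illustrative fields)."""
    merchant_id: str
    channel: str
    max_amount_cents: int


@dataclass(frozen=True)
class TokenUse:
    """One attempted use of the token."""
    merchant_id: str
    channel: str
    amount_cents: int


def is_within_domain(domain: TokenDomain, use: TokenUse) -> bool:
    """Accept the use only if merchant, channel, and amount all fit the domain."""
    return (
        use.merchant_id == domain.merchant_id
        and use.channel == domain.channel
        and use.amount_cents <= domain.max_amount_cents
    )


domain = TokenDomain(merchant_id="M-001", channel="mobile", max_amount_cents=50_000)

legitimate = TokenUse(merchant_id="M-001", channel="mobile", amount_cents=1_999)
intercepted = TokenUse(merchant_id="M-999", channel="web", amount_cents=1_999)

print(is_within_domain(domain, legitimate))   # True
print(is_within_domain(domain, intercepted))  # False
```

A use that fails the check would be rejected and could be reported as suspected fraud, which is what makes an intercepted token useless outside its domain.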
The Token Data Vault
The token vault or card data vault (CDV) is where tokens and original values are stored. This must be a highly-secure vault with strict access controls in place.
It can be an “isolated” server with no access to the internet and only reachable from the outside world by a second server acting as a “gateway” or “fortress”. The connection between them may be a secure data cable.
The PAN must be encrypted inside the card vault. Role-based access control (RBAC) should be used. All backups or mirrored versions should be protected in an equivalent way.
Typically, the card vault will be the most attractive target for an attacker because it contains all the original values. It evidently falls within the PCI DSS scope. Therefore, in the words of the PCI Council, “additional security controls above and beyond those required in PCI DSS may be warranted” [for the CDV]. Incident response procedures should also be in place in case an intrusion occurs.
The Complex Technological Challenges of Tokenization
As described above, tokenization presents several technological challenges, including:
- Generating random tokens in a secure way;
- Securely generating tokens with formal security models and integrity checks;
- Managing tokens in a multi-domain environment within a complex ecosystem;
- Securing tokens into a card data vault.
Furthermore, encryption must be used in addition to tokenization. Tokenization does not replace encryption. Instead, it creates protection against data leaks. Besides the mandatory encryption and authentication mechanisms described by the EMV and PCI DSS standards, encryption should be used for the tokens themselves. For example, when a token requestor receives a token from the card vault, the token should travel over a secure, encrypted channel.
Additional strategies may involve recursive tokenization. The tokenization architecture is obviously incredibly rich in choices and possibilities. The end-goal remains the non-divulgence of the data in transactions.
References and Further Reading
- More articles on tokenization (2018 - today), by Martin Rupp, Dawn M. Turner, and more.
- More articles on Crypto Service Gateway (2018 - today), by Chris Allen, Jo Lintzen, Terry Allen, Rob Stubbs, Stefan Hansen, Martin Rupp, and more.
- EMV Payment Tokenisation Frequently Asked Questions (FAQ) – General FAQ (2017), by the EMV Consortium
- PCI DSS Applicability in an EMV Environment, A Guidance Document, Version 1 (5 October 2010), by the PCI Security Standards Council
- For instance, Tokenization Product Security Guidelines, PCI Security Standards Council, Irreversible Tokens, Summary of Tokenization Guidelines/Best Practices, “IT 1A The process/mechanism/algorithm used to create the token provably is not reversible.”
- The last digit is always a check digit computed with the Luhn algorithm. There are in fact even fewer than 10^15 combinations because of many “fixed” banking formats, such as the BINs, with Mastercard PANs starting with ‘5’ and Visa PANs starting with ‘4’.
- Payment services (PSD2) - Directive (EU) 2015/2366 (2015), by the European Commission