Digital fingerprint technology is designed to protect large documents whose contents do not change or change only slightly. The digital fingerprint technique used by InfoWatch Traffic Monitor automatically identifies, in an analyzed text, phrases taken from sample documents that contain confidential information.
During system setup, a collection of confidential sample documents is assembled and used to create the digital fingerprints. The digital fingerprints of the sample documents make up the reference base. The parameters used to build the reference base are chosen to give the best possible detection with the smallest possible reference base. The source text of the sample documents cannot be restored from the reference base, so the confidential documents registered in the system remain protected: even if someone with malicious intent gains access to the InfoWatch Traffic Monitor digital fingerprint reference base, no data will leak.
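The general idea of a reference base that stores fingerprints rather than text can be sketched as follows. This is only an illustrative sketch, not the algorithm used by InfoWatch Traffic Monitor, whose internals are not described here. The names `fingerprint`, `build_reference_base` and `match_ratio`, the three-word window and the use of truncated SHA-256 hashes are all assumptions made for the example.

```python
import hashlib


def shingles(text, k=3):
    """Split text into overlapping runs of k consecutive words."""
    words = text.lower().split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def fingerprint(text, k=3):
    """Return a set of hashed shingles; only hashes are kept, not the text."""
    return {
        hashlib.sha256(s.encode("utf-8")).hexdigest()[:16]
        for s in shingles(text, k)
    }


def build_reference_base(samples, k=3):
    """Map each sample document name to its fingerprint (hypothetical layout)."""
    return {name: fingerprint(text, k) for name, text in samples.items()}


def match_ratio(analyzed_text, reference, k=3):
    """Fraction of the analyzed text's shingles that occur in the reference."""
    fp = fingerprint(analyzed_text, k)
    if not fp:
        return 0.0
    return len(fp & reference) / len(fp)


samples = {
    "contract": "the quarterly revenue forecast is strictly confidential "
                "and must not leave the company",
}
base = build_reference_base(samples)
message = "please note the quarterly revenue forecast is strictly confidential thanks"
print(match_ratio(message, base["contract"]))  # 0.625
```

In this sketch a text that quotes part of a sample document shares some of its hashed shingles with the reference base, so a partial quote still produces a non-zero match ratio. A production system would choose the window size and hash parameters far more carefully.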
The key difference between the digital fingerprint technology used by InfoWatch Traffic Monitor and the fingerprints used in other companies’ products is the comprehensive processing of the text being analyzed, including linguistic support.
This approach substantially increases the quality of detection of confidential information and ensures that the technique is reliable not only in cases of simple changes to a text (such as changes in formatting, littering the text with unnecessary spaces or punctuation, etc.) but also in more complex instances, for example:
- different spellings of the same word (for example, differences in letter case, the letters е/ё in Russian, variations in the use of ligatures in German, etc.);
- different forms of the same word, different spellings of compound words, typos and errors, transliteration, etc.
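Some of the simpler cases above (formatting changes, extra spaces and punctuation, letter case, е/ё, ligatures) can be handled by normalizing text before it is fingerprinted. The sketch below is an assumption about how such a step might look, not the product's actual processing. The function name `normalize` is hypothetical, and morphology, typos and transliteration are deliberately left out because they need real linguistic resources.

```python
import re
import unicodedata


def normalize(text):
    """Reduce superficial spelling and formatting differences (illustrative only)."""
    # NFKC expands typographic ligatures such as U+FB01 ("fi") into plain letters.
    text = unicodedata.normalize("NFKC", text)
    # casefold() lowercases and also maps the German sharp s to "ss".
    text = text.casefold()
    # Treat the Russian letters ё and е as the same letter.
    text = text.replace("ё", "е")
    # Replace punctuation with spaces, then collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


print(normalize("Совершенно   СЕКРЕТНЫЙ, отчёт!"))  # совершенно секретный отчет
print(normalize("Straße") == normalize("STRASSE"))   # True
```

Applying the same normalization to both the sample documents and the analyzed text means that two spellings of the same phrase yield the same fingerprint input.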
Advantages of the Technology:
- protects static documents, or documents that change rarely;
- detects not only exact matches, but also modified fragments of text;
- supports linguistic processing of the analyzed text, including morphological analysis and multilingual documents;
- automatically recognizes similarities between documents and identifies quotes from sample documents.