Document-based databases – why they are so helpful

Classical relational databases are built around keys, the most important of which is the primary key. It uniquely identifies a record and lets that record be linked to others. A typical example is the customer number, which is assigned exclusively to each client of a business. But the real world is sometimes too messy to be managed with relational databases. That is why there is also software that takes a document-oriented approach. Here we explain how it works and why it matters.

Who was where with whom and when?

In 2016, countless internal digital documents from Mossack Fonseca, an offshore service provider based in Panama, were exposed in what became known as the Panama Papers. The company had helped wealthy clients launder money and hide undeclared funds. Reporter networks from all over the world worked tirelessly to sift through and evaluate the material by hand. This was done with the help of document-oriented databases. You have to imagine it like this:

Reporter A finds a paper proving that on May 20, 2012, Mr. XYZ met with Mr. ABC in Rio.

Reporter B finds a document proving that on July 1, 2013, Mr. XYZ transferred money to Mr. FGH in Rome.

Each of these documents contains several important details: a date, names and a location. Anyone who has had the pleasure of working for an employer with HCL Notes licenses knows how easily such details can be attached to a document and connected with others.

All documents are scanned. Passages containing important information can be linked to each other or tagged with an attribute. If, for example, there are further findings about May 20, 2012, they can be identified by the date. If Mr. XYZ appears again somewhere else, his name can act as a link between all the documents concerned.

In the end, you get a database in which countless papers can be evaluated by specific criteria. For example, you can retrieve all documents that mention Mr. ABC. You can also check whether there were any other transactions on May 20, 2012, or whether Mr. FGH crosses paths with Mr. ABC at some point. These tags build up a directory of who had contact with whom, when and where.
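The tagging and lookup described above can be sketched in a few lines of Python. This is only an illustration, not how the journalists' tools actually worked: the field names (date, people, place, kind), the document ids and the helper functions build_index and contacts_of are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical documents modelled on the two findings above.
# Field names and ids are illustrative assumptions.
documents = [
    {"id": "doc-A", "date": "2012-05-20", "people": ["XYZ", "ABC"],
     "place": "Rio", "kind": "meeting"},
    {"id": "doc-B", "date": "2013-07-01", "people": ["XYZ", "FGH"],
     "place": "Rome", "kind": "transfer"},
]


def build_index(docs):
    """Map every (field, value) pair to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            if field == "id":
                continue
            # A field may hold one value or a list of values.
            values = value if isinstance(value, list) else [value]
            for v in values:
                index[(field, v)].add(doc["id"])
    return index


def contacts_of(person, docs):
    """Return everyone who appears in a document together with `person`."""
    found = set()
    for doc in docs:
        if person in doc.get("people", []):
            found.update(p for p in doc["people"] if p != person)
    return found


index = build_index(documents)
print(sorted(index[("people", "XYZ")]))       # all documents mentioning Mr. XYZ
print(sorted(index[("date", "2012-05-20")]))  # everything tagged with that date
print(sorted(contacts_of("XYZ", documents)))  # who shared a document with Mr. XYZ
```

The name or the date acts as the link here: any document carrying the same tag ends up in the same index entry, regardless of what else it contains.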

With this method, the journalists were later able to assign every piece of evidence precisely to the persons involved. They could trace the crisscrossing connections between the key players. It is not without reason that the story made such waves and led to the conviction of numerous tax evaders.

This type of database is structured somewhat differently from a relational one. There is no fixed schema, so a record is neither incomplete nor invalid just because it lacks certain fields. You simply take the documents that are available to you and link them together.
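A minimal Python sketch shows what this looks like in practice. The records, their ids and the helper with_field are hypothetical and exist only for this example. Each record carries whatever fields were found, and a query simply skips records that lack the field in question.

```python
# Hypothetical records with differing fields; nothing enforces a common schema.
records = [
    {"id": "mail-1", "sender": "XYZ", "date": "2012-05-20"},
    {"id": "scan-7", "people": ["ABC"], "place": "Rio"},
    {"id": "note-3", "text": "undated memo"},
]


def with_field(docs, field):
    """Return the ids of the documents that happen to carry the given field."""
    return [doc["id"] for doc in docs if field in doc]


print(with_field(records, "date"))   # only records that have a date
print(with_field(records, "place"))  # only records that have a place
```

In a relational table, the undated memo would need empty or placeholder columns. Here it is stored as it is.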

Such analyses are especially easy with e-mails, which already exist in digital form and can be scanned for keywords immediately. Photographed papers are a bit more difficult. Either way, such tools are very powerful and can be useful in many fields, especially for police investigators, historians and journalists.
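Scanning e-mails for a keyword could look roughly like the following sketch. The sample messages and the function find_keyword are invented for illustration. Photographed papers would first need text recognition, which is not shown here.

```python
import re

# Hypothetical e-mail bodies keyed by an invented message id.
emails = {
    "mail-1": "Transfer to Mr. FGH confirmed in Rome.",
    "mail-2": "Lunch next week?",
    "mail-3": "Mr. FGH will call you on Monday.",
}


def find_keyword(texts, keyword):
    """Return the ids of all texts containing `keyword` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return sorted(doc_id for doc_id, text in texts.items() if pattern.search(text))


print(find_keyword(emails, "rome"))  # messages mentioning Rome
print(find_keyword(emails, "FGH"))   # messages mentioning Mr. FGH
```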