Following up on the Tezos DigiSign article, I want to elaborate on the technical concept of hashing algorithms, which play an important role in authenticating digital files and documents.
As explained in the previous article, digitally authenticating documents can be an important, and sometimes necessary, step to prove that a document or file is authentic and still in its original, unaltered state.
How does it work?
The key is to fix any document or file in such a way that it is immutably registered and simple to verify. Take a legal document, for example. You want to be able to prove that the content today is the same, unaltered version as it was when it was signed and agreed on. To do this, you need to reduce the document to something simple, like a short series of numbers and letters, and you need to be able to trace this back to the moment of signing. Hashing algorithms are the solution.
What is a hashing algorithm?
A hashing algorithm, or hash function, is a mathematical formula that reduces an amount of data to a smaller amount of data with a set amount of characters. No matter how large the input is, the outcome is always the same size. So whether your input is a digital movie, an mp3, or a few lines of text, the output will always have the same amount of characters. What that amount is, depends on the hashing function that is used. Here are some examples, using the hashing algorithm “SHA256”:
If the input is “Blockchain”
The output is: 625da44e4eaf58d61cf048d168aa6f5e492dea166d8bb54ec06c30de07db57e1
If the input is the paragraph above (from “A hashing […] algorithm “SHA256”:”)
The output is: f81037a9d1654450c63c1f2a8bd1cb3532e4d91112d0c6848652acb4dfeaab87
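To try this yourself, here is a minimal sketch using Python's standard hashlib module. The helper name sha256_hex is my own, and I assume the text is encoded as UTF-8. A hash depends on the exact bytes of the input, so a stray space, a line break or a different encoding will give a different result than the values quoted above.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as hexadecimal characters.

    Assumption: the text is encoded as UTF-8 before hashing.
    """
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# A short input and a much longer input (about 110,000 characters).
short_digest = sha256_hex("Blockchain")
long_digest = sha256_hex("Blockchain " * 10_000)

# Both outputs have the same length, regardless of the input size.
print(short_digest, len(short_digest))
print(long_digest, len(long_digest))
```

Both printed lengths are 64, even though the second input is thousands of times larger than the first.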
Even a small change to the input completely changes the output. Changing a single character, such as the capitalization of one letter, yields an entirely different hash.
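A rough way to see this effect is to hash two inputs that differ in one letter and count how many of the 256 output bits differ. This is only a sketch: the helper name sha256_as_int and the two example inputs are my own choices, and UTF-8 encoding is assumed.

```python
import hashlib


def sha256_as_int(text: str) -> int:
    """Return the SHA-256 hash of a UTF-8 string as one 256-bit integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


# Two inputs that differ only in the case of the first letter.
a = sha256_as_int("Blockchain")
b = sha256_as_int("blockchain")

# XOR leaves a 1 in every bit position where the two hashes differ.
differing_bits = bin(a ^ b).count("1")
print(f"{differing_bits} of 256 bits differ")
```

For a well-designed hash function you would expect roughly half of the bits to change, so the two hashes look unrelated.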
So no matter the input, the output always has a fixed number of characters (for SHA-256, this is always 64 hexadecimal characters, which together represent 256 bits). You can play around with this concept here: https://www.xorbin.com/tools/sha256-hash-calculator
Hashing algorithms work for anything in digital form: movies, music, pictures, text, anything. No matter the size, the output is always 64 characters (for SHA256).
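For large files such as movies, you usually do not want to load the whole file into memory at once. The sketch below reads a file in chunks and feeds them to the hash one by one. The function name sha256_file and the chunk size of 1 MiB are my own assumptions, not something prescribed by SHA256.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file of any size and return 64 hexadecimal characters.

    Assumption: chunk_size (1 MiB by default) is an arbitrary choice;
    it affects memory use, not the resulting hash.
    """
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_file(Path("contract.pdf")), where contract.pdf is a hypothetical file name, gives the same hash as hashing all bytes of the file in one go.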
SHA256 is collision resistant. This means that it should be close to impossible to find two different inputs that produce the same output. Close to impossible means in this case: collisions must exist, because there are more possible inputs than possible outputs, but the chances of running into one are so slim that in practice it can be referred to as impossible.
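To get a feeling for how slim those chances are, here is a back-of-the-envelope sketch. It assumes SHA256 behaves like an ideal random function, which is a modelling assumption and not a proven property, and it uses the standard upper bound of (number of pairs) / 2^256. The function name and the figure of 10^18 documents are hypothetical.

```python
from fractions import Fraction


def collision_probability_upper_bound(num_hashes: int, bits: int = 256) -> Fraction:
    """Upper bound on the chance that any two of num_hashes hashes collide.

    Assumption: the hash behaves like an ideal random function, so each
    pair of inputs collides with probability 1 / 2**bits.
    """
    pairs = num_hashes * (num_hashes - 1) // 2
    return Fraction(pairs, 2**bits)


# Hypothetical scenario: a billion billion (10**18) documents are hashed.
p = collision_probability_upper_bound(10**18)
print(float(p))
```

Even with that many documents, the bound stays far below one in 10^40.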
The documents stay private
You can store these hashes on a public blockchain without the risk that people can find out what the original content of the document is. SHA256 is a one-way hash function. This means that it should be close to impossible to derive the input from the output, even if you ran the calculations on a supercomputer. Close to impossible means here that it would take far longer than a human lifetime to do so. It is theoretically possible, but it takes so long that it just doesn't matter.
Storing them on a blockchain, and the advantages this has, are explained in my previous article.
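As a rough order-of-magnitude illustration of "far longer than a human lifetime", the sketch below estimates how long a generic brute-force search would take. Everything in it is an assumption: the attacker has no way of guessing the input, needs on the order of 2^256 attempts, and can compute 10^18 hashes per second (a hypothetical figure).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_years(bits: int = 256, hashes_per_second: float = 1e18) -> float:
    """Estimate the years needed to try on the order of 2**bits inputs.

    Assumptions: generic brute-force search with no shortcut, and a
    hypothetical attacker speed of hashes_per_second.
    """
    expected_trials = 2**bits
    return expected_trials / hashes_per_second / SECONDS_PER_YEAR


print(f"{brute_force_years():.2e} years")
```

This estimate only applies when the input itself cannot be guessed. If a document is very short or predictable, someone could simply hash likely candidates and compare the results.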
Authentication of the documents
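If you want to verify whether a digital file is authentic and unaltered, you calculate the hash of the file with the same hashing algorithm and compare the outcome with the registered hash, the signature that represents the original. If the two are identical, the file is unaltered. If they differ, the file is not the same as the one that was registered.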
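The verification step can be sketched in a few lines. The function name is_unaltered and the parameter names are hypothetical, and I assume the registered hash is available as a hexadecimal string (how it is retrieved from the blockchain is outside the scope of this sketch).

```python
import hashlib
import hmac


def is_unaltered(document: bytes, registered_hash_hex: str) -> bool:
    """Return True if the document's SHA-256 hash matches the registered hash.

    Assumption: registered_hash_hex is the 64-character hexadecimal hash
    that was stored when the document was signed.
    """
    actual = hashlib.sha256(document).hexdigest()
    # Normalise case and whitespace; compare_digest compares the two strings
    # in a way that does not leak where they first differ.
    return hmac.compare_digest(actual, registered_hash_hex.strip().lower())


# Example with made-up content: register the original, then verify.
original = b"I agree to the terms of this contract."
registered = hashlib.sha256(original).hexdigest()

print(is_unaltered(original, registered))         # unchanged document
print(is_unaltered(original + b" ", registered))  # one extra space added
```

The first check succeeds and the second fails, because even one added space changes the hash.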
The beauty is in the simplicity.
If these types of concepts interest you, I'd like to suggest my series about blockchain and how quantum computers could pose a serious future threat to blockchain security. I discuss hashing algorithms, explain how private-public key cryptography works, and how the latter should eventually be replaced with quantum resistant cryptography. I discuss the difference between centralized and decentralized systems and why blockchain will face extra challenges. I also explain why hashing public keys does not work as a protection against quantum hacks, and how the time needed to switch to a completely quantum resistant cryptocurrency is extremely underestimated. Here's part one: Quantum resistant blockchain and cryptocurrency, the full analysis in seven parts.