Notes on Cryptography for Dummies (Part 2: Hashes)

By Great White Snark | Return to the Source | 20 Dec 2022

In part one, I covered some terminology and the different categories of ciphers. This part picks up where that left off.

The Basics (Continued)

Part one got quite lengthy (as this part no doubt will) without completely covering the basics of cryptography.

Disclaimer: I write my more lengthy posts (especially tutorials) in MarkDown, since I tend to think in it. Converting it to use the formatting that Publish0x/TinyMCE supports is a manual process, since the editor has no MarkDown view option. Consequently, the odd formatting character or two might remain, despite my edits. However, this shouldn't have a noticeable impact on the information presented here.

Making a Hash of Things

The important idea behind hashes is that they are used for verification. If I calculate a hash (sometimes called a digest) of a block of bytes (such as a file), you should get the same result when you calculate the hash using the same algorithm. If the hash is different, then the data you have is not the same as what I have.
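Here is a minimal sketch of that idea in Python, assuming SHA-256 from the standard `hashlib` module as the algorithm (my choice for illustration; any hash function would show the same behaviour). The helper name `sha256_hex` and the sample messages are made up for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"The quick brown fox jumps over the lazy dog"
received_ok = b"The quick brown fox jumps over the lazy dog"
received_bad = b"The quick brown fox jumps over the lazy cog"  # one character changed

# Same bytes, same algorithm: the digests match.
print(sha256_hex(original) == sha256_hex(received_ok))   # True
# A single changed character gives a different digest.
print(sha256_hex(original) == sha256_hex(received_bad))  # False
```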

This is useful for doing integrity/parity checks on data sent across networks and is often used to confirm that application executable binary files have not been tampered with (such as having malicious code inserted) in transit or storage. It is also useful for secure storage of passwords. A system doesn't need to know/store your password when the hash will suffice, since using the same algorithm on a correctly-entered password will always result in the same hash. The system can simply hash an entered password and store the result on registration. Every time a user enters a password on login, the system hashes the password and compares it to the stored hash. That way, only the user knows what the plaintext is; it's not stored in an online database that could be breached.
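The register/login flow described above could look roughly like this. It is only a sketch: the `users` dictionary stands in for a real database, the function names are hypothetical, and SHA-256 is my assumption for the algorithm. It deliberately leaves out salting, which is covered next.

```python
import hashlib
import hmac

# Hypothetical in-memory "database": username -> stored hash (hex string).
users: dict[str, str] = {}

def hash_password(password: str) -> str:
    """Hash the password; only this digest is ever stored."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    """Store the hash of the password, never the plaintext."""
    users[username] = hash_password(password)

def login(username: str, password: str) -> bool:
    """Hash the entered password and compare it to the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, hash_password(password))

register("alice", "correct horse battery staple")
print(login("alice", "correct horse battery staple"))  # True
print(login("alice", "wrong guess"))                   # False
```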

Hashes come in two varieties:

An unsalted hash algorithm has only one input: the plaintext.
A salted hash algorithm has two inputs: a salt (a random value that is stored alongside the hash) and the plaintext.

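Salted hashes tend to be more secure than unsalted hashes, provided that both the password and salt are of significant length (12 or more characters for the password and 16 or more bytes for the salt, depending on the algorithm used). With a unique salt per password, two users with the same password end up with different hashes, and an attacker can't reuse a precomputed table of hashes against the whole database.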

When generating a salt, one should always use a cryptographically secure random number generator (RNG). Using something such as a username or account creation timestamp (or part thereof) is bad practice, as such values are predictable and offer little security advantage. There are also a number of ways to store the salt. One is to have it in a separate column from the hash, but it is also common to append/prepend it to the hash, often with a separator of some sort (typically a period or colon).
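As a rough illustration of salt generation and storage, here is a sketch using Python's `secrets` module as the cryptographically secure RNG and a colon-separated `salt:hash` string as the storage format. The function names, the 16-byte salt and the use of plain SHA-256 are assumptions made for the example; for real systems, a dedicated password-hashing function (such as `hashlib.scrypt` or `hashlib.pbkdf2_hmac`) is generally the better choice.

```python
import hashlib
import hmac
import secrets

def make_record(password: str) -> str:
    """Return 'salt_hex:hash_hex' for storage (the format is an assumption)."""
    salt = secrets.token_bytes(16)  # 16 random bytes from a secure RNG
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return f"{salt.hex()}:{digest}"

def check_password(password: str, record: str) -> bool:
    """Recompute the hash with the stored salt and compare."""
    salt_hex, stored_digest = record.split(":")
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_digest)

record_a = make_record("hunter2")
record_b = make_record("hunter2")
print(record_a != record_b)                 # same password, different salts
print(check_password("hunter2", record_a))  # True
print(check_password("hunter3", record_a))  # False
```

Because each record carries its own salt, the same password produces a different stored value every time, yet verification still works.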

That's the theory, anyway. In practice, systems that use old and weak hashing algorithms (such as MD5 and SHA1, typically used without a salt) are vulnerable to breaches. I have run lists of hashed passwords through hashcat and found the plaintext on more than one occasion (~5% of entries on average), presenting my findings to my bosses as motivation for modifying the relevant code to use salted SHA2/3 hashes.

The passwords found tend to lack sufficient complexity/entropy to begin with, such as containing part of the user's name, initials or date of birth. This compounds the problem. (The zxcvbn library, available for a number of languages, is a handy tool for analysing password strength. It can be fed a few additional data inputs, such as username and date of birth, to see if these appear in a supplied password.) The databases in question have been small, too (between 100 and 500 entries). Some databases (such as LinkedIn's, Yahoo's and Tumblr's) contain millions of entries. (I'm sure you've seen at least one news post about a large hash/password database being breached, with users being urged to change their passwords.)

Watch the first part, about breaches. Those hashes weren't salted.

Exclusive Or (`XOR`/`^`)

The Exclusive-Or (XOR) operation is not an algorithm, but a mathematical function. Some people try to pass it off as a way to do encryption or hashing, but it is insufficiently secure as one, being very easy to crack/reverse if done only once (precisely because it is a simple operation).

The XOR operation is a bitwise operator: it works on the individual bits (ones and zeros) of two values. (It is also a binary operator, in the sense that it takes two inputs, but that is a different meaning of "binary".) What XOR does is compare the bits at each position in the two values. If the bits in a position are the same in both values, the result for that position is zero (0). If the bits are not the same, the result is one (1). For example:

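plaintext = b"baby"  # = 01100010 01100001 01100010 01111001
key = b"data"  # = 01100100 01100001 01110100 01100001
# Python's ^ works on integers, not on bytes objects, so XOR one byte at a time.
ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))  # = 00000110 00000000 00010110 00011000
print(repr(ciphertext))  # b'\x06\x00\x16\x18'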

Don't worry if you're not familiar with binary logic and bit manipulation; it still confuses me. (I'm not going to go into an explanation of binary numbers and how characters are represented in binary, or number bases other than decimal; you can look that up if you're interested.)
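To show why a single XOR pass is so easy to reverse, here is a short sketch using the same "baby" and "data" values as above. The helper `xor_bytes` is a name I made up for the example; it repeats the key if the data is longer.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with key, repeating the key if data is longer."""
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

plaintext = b"baby"
key = b"data"
ciphertext = xor_bytes(plaintext, key)

# XOR-ing the ciphertext with the key again gives back the plaintext.
print(xor_bytes(ciphertext, key))        # b'baby'
# Worse, anyone who knows (or guesses) the plaintext recovers the key.
print(xor_bytes(ciphertext, plaintext))  # b'data'
```

The same operation both "encrypts" and "decrypts", and a known plaintext gives away the key, which is why XOR on its own is neither encryption nor hashing.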

Who cares?

That's all well and good, but why should you care? That's easy to answer: Without hashing, there would be no cryptocurrencies or mining. If you didn't choose your wallet addresses, they are most likely hashes. (Your Ethereum address is definitely derived from a hash of your public key.) Transaction IDs are hashes. The miner software you use instructs your computer's CPU and/or GPUs to calculate hashes for blocks. Hashes are a big part of cryptocurrency and blockchain.
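To give a feel for what "calculating hashes for blocks" means, here is a toy proof-of-work sketch. It is a big simplification and an assumption on my part: the block contents, the `mine` function and the "leading zeros in the hex digest" rule are made up for illustration, and real networks use their own block formats and difficulty rules.

```python
import hashlib

def mine(block_data: bytes, difficulty: int) -> tuple[int, str]:
    """Find a nonce whose SHA-256 hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode("ascii")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1  # no luck, try the next number

nonce, digest = mine(b"hypothetical block contents", difficulty=4)
print(nonce, digest)
```

Finding the nonce takes many attempts, but anyone can check the result with a single hash, which is the property mining relies on.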

Since you're reading this on a blogging platform that pays out in crypto, I'm assuming that this information is interesting to you.

Post thumbnail: Photo by Polina Tankilevitch on Pexels


