Hash(ing)


Hashing is the process of converting information of any size into a fixed-length sequence of letters and numbers. Cryptographic hash functions are a basic building block of cryptography.

A hash function is the function that performs this conversion, from information input to string output. Because it is a well-defined algorithm, a hash function is computationally cheap to run in the forward direction, even on large inputs.

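A hash is the output of a hash function. The 256-bit hash functions common in blockchains, such as SHA-256 in Bitcoin and Keccak-256 in Ethereum, convert information into an effectively unique 64 character hexadecimal string. In other words, the hash is a string of text. Ethereum conventionally writes the hash with a “0x” prefix to mark it as hexadecimal (making it 66 characters). The prefix is a notation and not part of the hash function's output.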
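As a minimal sketch of what a hash looks like, the Python snippet below hashes the text “hello world”. It assumes SHA-256 from the standard library as a stand-in: Ethereum uses Keccak-256, which is not the same algorithm as Python's hashlib.sha3_256 and is not in the standard library. The helper name hash_hex is hypothetical and chosen only for illustration.

```python
import hashlib


def hash_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


digest = hash_hex(b"hello world")

# The "0x" prefix is only a notation that marks the string as hexadecimal.
prefixed = "0x" + digest

print(digest, len(digest))      # 64 hexadecimal characters
print(prefixed, len(prefixed))  # 66 characters including the prefix
```

The 64 characters come from the 256-bit output: each hexadecimal character encodes 4 bits, and 256 / 4 = 64.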

Hashing accomplishes several important goals.

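First, hashing keeps the input information private. The process of hashing cannot practically be reversed to produce the information. Hashing is one-way only. The input determines the output, but the output cannot be used to compute the input. For instance, trying to recover the input by adjusting a guess piece by piece and watching how the output changes will not work: even a one-character change in the input produces a completely different hash, so there is no way of “getting close” in the guessing. The only approach left is to hash candidate inputs until one matches, which is infeasible unless the input is short or easy to guess. In this way, the input information is kept secret.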
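The claim that there is no “getting close” can be illustrated by counting how many output bits change when a single input character changes. The sketch below assumes SHA-256 as the hash function. The function names sha256_int and differing_bits are hypothetical helpers, not part of any library.

```python
import hashlib


def sha256_int(text: str) -> int:
    """Hash `text` with SHA-256 and return the 256-bit digest as an integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Count the output bits that differ between the hashes of `a` and `b`."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")


# The two inputs differ only in their final character.
changed = differing_bits("hello world", "hello worle")
print(f"{changed} of 256 output bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to flip, so a nearly correct guess looks no closer to the target than a completely wrong one.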

Second, hashing produces a hash that can be used for public purposes. The hash of the information can be shared with others without sharing the underlying information. In this way, the information is private but the hash can be made public.
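One way to picture this is a publish-then-verify flow: the hash is shared publicly, and anyone who later receives the underlying information can recompute the hash and compare. The sketch below assumes SHA-256. The sample document text and the function names fingerprint and matches are invented for illustration.

```python
import hashlib
import hmac


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def matches(data: bytes, published_hash: str) -> bool:
    """Check whether `data` hashes to the previously published hash."""
    return hmac.compare_digest(fingerprint(data), published_hash)


# Hypothetical private document; only its hash is made public.
secret_document = b"Alice pays Bob 5 ETH on 1 January"
published = fingerprint(secret_document)

ok = matches(secret_document, published)
tampered = matches(b"Alice pays Bob 50 ETH on 1 January", published)
print(ok, tampered)
```

The check succeeds only for the exact original bytes, so the public hash lets others confirm the information later without seeing it in advance.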

Hashing has several features that help it accomplish these goals.

A hash function can be fed any information (input). The “information” that is input into the hash function can be text or a digital image. Remember that even digital images are made of 0s and 1s. In general, it can be any type of computer file. You can think of this as any form of data. An important aspect of this feature is that the size of the input (text, file, etc.) is not restricted in any practical way. The input can be small, like the two words “hello world,” or large, like a .png digital image. For a given hash function the size of the output is fixed: with a 256-bit hash function it will always be 64 hexadecimal characters.
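The fixed output size can be checked directly. The sketch below assumes SHA-256 and uses random bytes as a stand-in for a large file such as an image. The labels and the variable names inputs and lengths are illustrative only.

```python
import hashlib
import os

# Inputs of very different sizes; the random bytes stand in for a large file.
inputs = {
    "empty input": b"",
    "two words": b"hello world",
    "1 MiB of random bytes": os.urandom(1024 * 1024),
}

lengths = {}
for label, data in inputs.items():
    digest = hashlib.sha256(data).hexdigest()
    lengths[label] = len(digest)
    print(f"{label}: {len(data)} input bytes -> {len(digest)} hex characters")
```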

Hashing a given input will always produce the same 64 character string (output). In this way, hashing is a deterministic function. In other words, hashing the same information at different points in time will produce the same hash. Because there are more possible inputs than possible outputs, two different inputs that share a hash (a collision) must exist in theory, but for a secure hash function finding one is computationally infeasible, so in practice different information never produces the same hash. You can think of the hash as the “fingerprint” of the input information. It serves as a unique identifier.

These features have several useful implications.

A hash function can have multiple inputs, and the inputs can be hashes themselves. Two hashes can be combined and hashed together to produce a new single hash. This would be hashing of other hashes. Creating layers of hashing results in a tree structure where the initial hashes are the leaves and the final hash is the root. This is the basic premise of a Merkle tree. Merkle trees are named after Ralph Merkle, who proposed the idea in a 1987 paper titled “A Digital Signature Based on a Conventional Encryption Function.” The Merkle tree allows a user to verify that a specific transaction is included in a block using only the hashes along the path from that transaction to the root, without downloading the whole blockchain.
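The layering described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the exact structure any particular blockchain uses: it assumes SHA-256, joins two child hashes by plain concatenation, and duplicates the last hash when a level has an odd number of entries (one convention, used by Bitcoin). The names h and merkle_root and the transaction strings are hypothetical.

```python
import hashlib
from typing import Sequence


def h(data: bytes) -> bytes:
    """Return the raw 32-byte SHA-256 digest of `data`."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Hash the leaves, then hash pairs of hashes until one root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last hash to make the level even.
            level.append(level[-1])
        # Each parent is the hash of its two children concatenated.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


transactions = [b"tx1", b"tx2", b"tx3", b"tx4"]
root = merkle_root(transactions)
print("0x" + root.hex())
```

Changing any single leaf changes its hash, then its parent's hash, and so on up to the root, which is why the root alone commits to every leaf.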

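In a similar way, a blockchain uses hashing to link blocks together. Each block contains the hash of the previous block, so a single block indirectly references all previous blocks. Changing any earlier block would change its hash and break every link after it. This forms a chain that cannot easily be altered without detection. It is a record of hashed data. The blockchain is, in practice, an immutable history of all previous transactions.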
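The linking of blocks through hashes can be shown with a toy chain. This is a rough sketch under simplifying assumptions: blocks are plain dictionaries, they are serialized with JSON before hashing, and SHA-256 is the hash function. Real blockchains use their own block formats and hash functions. The field names and the functions block_hash, build_chain, and is_valid are all hypothetical.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block by serializing it deterministically and applying SHA-256."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(payloads: list) -> list:
    """Create blocks where each one stores the hash of the block before it."""
    chain = []
    prev = "0" * 64  # placeholder "previous hash" for the first block
    for index, data in enumerate(payloads):
        block = {"index": index, "data": data, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain: list) -> bool:
    """Check that every block's stored hash matches the actual previous block."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = build_chain(["A pays B 1", "B pays C 2", "C pays A 3"])
valid_before = is_valid(chain)

chain[0]["data"] = "A pays B 100"  # tamper with the oldest block
valid_after = is_valid(chain)
print(valid_before, valid_after)
```

Editing the oldest block changes its hash, so the hash stored in the next block no longer matches and the tampering is detected.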

The “Hash Function” video on ETH.Build provides a good overview of hashing.