Sometimes the need arises to check whether some data has been tampered with, either by accident or by a malicious user. Imagine downloading a file from the internet where the author has also published a second file containing the hash of the first. You can check whether the file you downloaded has the same hash as the one the author claims it should have. If so, you can be reasonably sure it hasn’t been tampered with.
Here are two implementations in Python 3.x that hash files and strings of any size. The first is probably the more understandable for anyone with some programming experience. The second is more idiomatic and makes use of the built-in function iter(callable, sentinel) (see the Python documentation for iter). By default the hash algorithm is SHA-1, but another one can easily be used by passing its constructor (e.g. hashlib.sha256) as the hasher_alg parameter.
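As a quick aside, here is a minimal sketch of how iter(callable, sentinel) behaves: the zero-argument callable is invoked repeatedly, and iteration stops as soon as it returns the sentinel value, which is never yielded. The names below are purely illustrative.

# iter_sentinel_demo.py -- illustrative only
values = [3, 1, 4, 0, 5]

def next_value():
    # Zero-argument callable handed to iter()
    return values.pop(0)

# Iteration stops as soon as next_value() returns the sentinel 0
for v in iter(next_value, 0):
    print(v)  # prints 3, 1, 4; the 0 is consumed but not yielded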
Basic hash utils file
# hashutils.py
import hashlib
from io import BytesIO

BLOCKSIZE = 65536


def digest_file(filename, hasher_alg=hashlib.sha1):
    # Build a fresh hasher on every call; a default of hashlib.sha1()
    # would be created once at definition time and shared across calls.
    hasher = hasher_alg()
    with open(filename, 'rb') as afile:
        buf = afile.read(BLOCKSIZE)
        while len(buf) > 0:
            hasher.update(buf)
            buf = afile.read(BLOCKSIZE)
    return hasher.hexdigest()


def digest_string(content, encoding='utf8', hasher_alg=hashlib.sha1):
    hasher = hasher_alg()
    content_io = BytesIO(content.encode(encoding))
    buf = content_io.read(BLOCKSIZE)
    while len(buf) > 0:
        hasher.update(buf)
        buf = content_io.read(BLOCKSIZE)
    return hasher.hexdigest()
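A quick usage sketch, assuming the module above is importable; 'example.txt' is a hypothetical file name:

import hashlib

from hashutils import digest_file, digest_string

print(digest_string('hello world'))                             # default SHA-1
print(digest_string('hello world', hasher_alg=hashlib.sha256))  # swap in SHA-256
print(digest_file('example.txt'))                               # hypothetical file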
Idiomatic hash utils file
# hashutils_idiomatic.py
import hashlib
from io import BytesIO

BLOCKSIZE = 65536


def digest_file_pythonic(filename, hasher_alg=hashlib.sha1):
    hasher = hasher_alg()  # fresh hasher per call
    with open(filename, 'rb') as afile:
        # iter(callable, sentinel): keep reading until read() returns b''
        for chunk in iter(lambda: afile.read(BLOCKSIZE), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


def digest_string_pythonic(content, encoding='utf8', hasher_alg=hashlib.sha1):
    hasher = hasher_alg()  # fresh hasher per call
    content_io = BytesIO(content.encode(encoding))
    for chunk in iter(lambda: content_io.read(BLOCKSIZE), b''):
        hasher.update(chunk)
    return hasher.hexdigest()
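Tying this back to the opening scenario, here is a minimal verification sketch; the file name and the published digest below are hypothetical placeholders. hmac.compare_digest performs a constant-time comparison, which is good practice when comparing digests.

import hmac

from hashutils_idiomatic import digest_file_pythonic

downloaded_file = 'download.iso'                               # hypothetical
published_digest = 'aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d'  # hypothetical

actual_digest = digest_file_pythonic(downloaded_file)
if hmac.compare_digest(actual_digest, published_digest):
    print('OK: the file matches the published digest')
else:
    print('WARNING: digest mismatch, the file may have been tampered with')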
I’ve based this post mainly on answers given on Stack Overflow.
And that’s it! Happy hashing!