As a Python enthusiast, I’m always on the lookout for efficient ways to handle data and ensure its integrity. One such method that often comes into play is creating a hash of a file, and in this case, specifically a SHA256 hash. In this blog post, I’ll guide you through the process of generating a SHA256 hash of a file using Python. It’s a handy skill to have, especially when it comes to verifying the integrity and authenticity of files. So, let’s dive in and uncover the power of SHA256 hashing in Python!
Encryption and hashing have served as the foundation for new security modules, among other network security developments. One of the most used hash algorithms is the Secure Hash Algorithm(SHA) with a digest size of 256 bits, or SHA 256. Although there are numerous variations, SHA 256 has been the most often used in practical applications. There are weaker predecessors to SHA3 like MD5, SHA1, SHA2. Interested in knowing the difference between SHA1, SHA2 & SHA3, this will give you a great insight on how SHA has evolved over the years.
SHA-256 is a part of the SHA 2 family of algorithms, where SHA stands for Secure Hash Algorithm. It was a joint effort between the NSA and NIST to introduce a successor to the weaker SHA 1 family. SHA2 was published in 2001 and has been effective ever since.
The hash function generates the same output hash for the same input string. This means that, you can use this string to validate files or text or anything when you pass it across the network or even otherwise. SHA256 can act as a stamp or for checking if the data is valid or not.
The 256 in the name SHA256 refers to the final hash digest value, meaning that regardless of the amount of plaintext or cleartext, the hash value will always be 256 bits.
For example –
Input String | Output Hash |
---|---|
hi | 8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4 |
debugpointer | ce7a00e4bf3e576bceb605c846923a634051ca695ff8a3270af998959e72d265 |
computer science is amazing! I love it. | a3f2b30d5d6ef9006dd09741aa90d595d8a90666f3fc3c3ae4bf1c1e9a8e3042 |
Check this out – If you are looking for SHA256 hash of a String.
Create SHA256 hash of a file in Python
SHA256 hash can be created using the python’s default module hashlib
.
Incorrect Way to create SHA256 Hash of a file in Python
But, you have to note that you cannot create a hash of a file by just specifying the name of the file like this-
# this is NOT correct
import hashlib
print(hashlib.sha256("filename.jpg".encode('UTF-8')).hexdigest())
Output of the above code-
e57e2454280342b30a5d32e814acdefb7f454120203e5ae1c9ae69980ef6253d
The above value is NOT the SHA256 hash of the file. But, it is the SHA256 hash of the string filename.jpg
itself.
Correct Way to create SHA256 Hash of a file in Python
You have to read the contents of the file to create SHA256 hash of the file itself. It’s simple, we can just read the contents of the file and create the hash.
The process of creating an SHA256 hash in python is very simple. First import hashlib, then encode your string that you want to hash i.e., converts the string into the byte equivalent using encode(), then pass it through the hashlib.sha256()
function. We print the hexdigest
value of the hash m
, which is the hexadecimal equivalent encoded string.
import hashlib
file_name = 'filename.jpg'
with open(file_name) as f:
data = f.read()
sha256hash = hashlib.sha256(data).hexdigest()
SHA256 Hash of Large Files in Python
In the above code, there is one problem. If the file is a 10 Gb file, let’s say a large log file or a dump of traffic or a Game like FIFA or others. If you want to compute SHA256 hash of it, it would probably chew up your memory.
Here is a memory optimized way of computing SHA256 hash, where we read chunks of 4096 bytes (can be customised as per your requirement, size of your system, size of your file etc.,). So, in this process we sequentially process the chunks and update the hash. So, in this process, let’s say there are 1000 such chunks of the file, the hash_sha256
is updated 1000 times.
At the end we return the hexdigest
value of the hash m
, which is the hexadecimal equivalent encoded string.
import hashlib
# A utility function that can be used in your code
def compute_sha256(file_name):
hash_sha256 = hashlib.sha256()
with open(file_name, "rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
hash_sha256.update(chunk)
return hash_sha256.hexdigest()
Compare and Verify SHA256 hash of a file using python
You need to verify the SHA256 hash at the server or at some other point or logic in your code.
To verify the SHA256 hash you will have to create the SHA256 hash of the original file again.
Then compare the original SHA256 value that the source has generated and SHA256 that you generate.
import hashlib
file_name = 'filename.jpg'
original_sha256 = 'e57e2454280342b30a5d32e814acdefb7f454120203e5ae1c9ae69980ef6253d'
with open(file_name) as f:
data = f.read()
sha256_returned = hashlib.sha256(data).hexdigest()
if original_sha256 == sha256_returned:
print "SHA256 verified."
else:
print "SHA256 verification failed."
A more complex hash can be created in SHA 2 family using the SHA-512 algorithm in python for a file.
The process of SHA256 creation and verification is easy as we discussed above. Happy Coding!
I’m glad that you found the content useful. And there you have it! We’ve successfully traversed the path of creating a SHA256 hash of a file using Python. It’s an essential skill to possess in today’s digital landscape, where data security is paramount. With this newfound knowledge, you can confidently verify the integrity and authenticity of files, ensuring that your data remains unaltered and trustworthy. So go ahead, implement this technique in your Python projects, and take your data handling to the next level. Happy hashing, Happy Coding.