As a Python developer, I understand the importance of data integrity and security. In today’s digital landscape, ensuring the authenticity and integrity of files is paramount. That’s why I’m excited to share with you the process of creating a SHA-512 hash of a file in Python. By leveraging the power of this cryptographic algorithm, we can generate a unique hash value that serves as a digital fingerprint for file verification. Join me on this journey as we dive into the world of file hashing in Python.
Encryption and hashing have served as the foundation for new security modules, among other network security developments. One of the most widely used hash algorithms is the Secure Hash Algorithm (SHA) with a digest size of 512 bits, known as SHA-512.
Introduction to SHA-512 Hashing
SHA-512 is currently one of the strongest and most secure hashing algorithms in common use, now that older hashes like MD5 and SHA-1 have been broken. Partly because of its greater complexity, it is not as widely adopted as SHA-256, which remains the general standard, but the industry is slowly moving towards it.
SHA-512 is part of the SHA-2 family of algorithms, where SHA stands for Secure Hash Algorithm. SHA-2 was a joint effort between the NSA and NIST to introduce a successor to the weaker SHA-1. It was published in 2001 and has remained effective ever since.
The hash function always generates the same output hash for the same input. This means you can use the hash to validate files, text, or any other data when you pass it across the network or store it. SHA-512 can act as a stamp for checking whether the data is still valid.
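As a minimal sketch of that property, the snippet below hashes the same string twice and then a slightly different string. The helper name sha512_hex is hypothetical and only introduced here for illustration.

```python
import hashlib

def sha512_hex(text):
    """Return the SHA-512 hex digest of a string (encoded as UTF-8)."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest()

first = sha512_hex("debugpointer")
second = sha512_hex("debugpointer")
changed = sha512_hex("debugpointer!")

print(first == second)   # same input, same digest
print(first == changed)  # one extra character, different digest
```

The first comparison holds because the function is deterministic; the second fails because even a one-character change produces a different digest.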
SHA-512 hasn’t gained the popularity of SHA-256, or even of some newer hashing algorithms, when it comes to real-world use in blockchain. That said, it does have a few noteworthy non-blockchain applications.
The 512 in the name SHA-512 refers to the size of the final hash digest: regardless of the amount of input (plaintext or cleartext), the hash value is always 512 bits, which is 128 hexadecimal characters.
For example –
Input String | Output Hash |
---|---|
hi | 150a14ed5bea6cc731cf86c41566ac427a8db48ef1b9fd626664b3bfbb99071fa4c922f33dde38719b8c8354e2b7ab9d77e0e67fc12843920a712e73d558e197 |
debugpointer | 7e2dd654680000d5e27bf67e5a2c440d122da180a7ee4f1dd792a28d17c8a2d34fafa7f9f6e5f837607d41521de42f628e8fa48c0be8a86953d9bc20006ca1fc |
computer science is amazing! I love it. | d46bc2b8b0e30ee6f2bfe42826a01d550451223e36d8ea73e46f283eeed3514480b16681ebb9ad8d72c7f9247b5711e5f0797578200afe8229abf86b6ade79cd |
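To check the fixed output size yourself, here is a small sketch that hashes the three inputs from the table and prints the input length next to the digest size. It assumes the strings are encoded as UTF-8.

```python
import hashlib

inputs = ["hi", "debugpointer", "computer science is amazing! I love it."]

for text in inputs:
    h = hashlib.sha512(text.encode("utf-8"))
    # digest_size is in bytes; hexdigest() uses two hex characters per byte
    bits = h.digest_size * 8
    hex_chars = len(h.hexdigest())
    print(f"input length {len(text):2d} -> {bits} bits, {hex_chars} hex characters")
```

Whatever the input length, the digest is 64 bytes, which is why every hash in the table has the same width.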
Check this out – If you are looking for SHA-512 hash of a String.
Create SHA-512 hash of a file in Python
A SHA-512 hash can be created using Python’s built-in hashlib module.
Incorrect Way to create SHA-512 Hash of a file in Python
But, you have to note that you cannot create a hash of a file by just specifying the name of the file like this-
# this is NOT correct
import hashlib
print(hashlib.sha512("filename.jpg".encode('UTF-8')).hexdigest())
Output of the above code-
2e0cf3b3ec65868637a1a0ef99ad53a14ab6a24bea42c5ef1957d81a39506acd8d0655b868438f45feff9d31376222da87b194874b4a225f9936d300acd6156e
The above value is NOT the SHA-512 hash of the file. It is the SHA-512 hash of the string filename.jpg itself.
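The difference is easy to see with a throwaway file. This sketch assumes a temporary file with made-up contents stands in for a real image; it hashes the file's path and the file's bytes and compares the two digests.

```python
import hashlib
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(b"pretend these are image bytes")
    path = tmp.name

try:
    # Hash of the file *name* (a string), as in the incorrect example
    name_hash = hashlib.sha512(path.encode("utf-8")).hexdigest()
    # Hash of the file *contents* (bytes read from disk)
    with open(path, "rb") as f:
        content_hash = hashlib.sha512(f.read()).hexdigest()
    print(name_hash == content_hash)  # the two digests differ
finally:
    os.remove(path)
```

Renaming the file would change the first digest but not the second, which is why only the content hash is useful for verification.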
Correct Way to create SHA-512 Hash of a file in Python
You have to read the contents of the file to create SHA-512 hash of the file itself. It’s simple, we can just read the contents of the file and create the hash.
The process of creating a SHA-512 hash of a file in Python is very simple. First import hashlib, then open the file in binary mode ('rb') so that read() returns bytes rather than text, and pass those bytes to the hashlib.sha512() function. Finally, call hexdigest() on the result to get the hash as a hexadecimal string.
import hashlib
file_name = 'filename.jpg'
# Open in binary mode so the raw bytes are hashed, not decoded text
with open(file_name, 'rb') as f:
    data = f.read()
sha512hash = hashlib.sha512(data).hexdigest()
print(sha512hash)
SHA-512 Hash of Large Files in Python
There is one problem with the code above: it reads the entire file into memory at once. If the file is 10 GB, say a large log file, a traffic dump, or a game like FIFA, computing its SHA-512 hash this way would probably chew up your memory.
Here is a memory-optimized way of computing the SHA-512 hash, where we read the file in chunks of 4096 bytes (this can be customised to suit your requirements, your system, the size of your file, etc.). We process the chunks sequentially and update the hash with each one. If the file has, say, 1000 such chunks, hash_sha512 is updated 1000 times.
At the end we return the hexdigest value of hash_sha512, which is the hash encoded as a hexadecimal string.
import hashlib
# A utility function that can be used in your code
def compute_sha512(file_name):
    hash_sha512 = hashlib.sha512()
    with open(file_name, "rb") as f:
        # iter() keeps calling f.read(4096) until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(4096), b""):
            hash_sha512.update(chunk)
    return hash_sha512.hexdigest()
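Newer Python versions (3.11 and later) ship hashlib.file_digest, which does the chunked reading for you. The sketch below uses it when present and otherwise falls back to a manual loop. The function name compute_sha512_stdlib and the 65536-byte fallback chunk size are assumptions made for this example.

```python
import hashlib

def compute_sha512_stdlib(file_name):
    """Hash a file with hashlib.file_digest when available (Python 3.11+)."""
    with open(file_name, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            # file_digest reads the file object in chunks internally
            return hashlib.file_digest(f, "sha512").hexdigest()
        # Fallback for older interpreters: read fixed-size chunks manually
        h = hashlib.sha512()
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
        return h.hexdigest()
```

Both branches produce the same digest as reading the whole file at once; only the memory footprint differs.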
Compare and Verify SHA-512 hash of a file using python
You need to verify the SHA-512 hash at the server, or at some other point in your code.
To verify it, you have to compute the SHA-512 hash of the file again on your side.
Then compare the original SHA-512 value generated by the source with the SHA-512 value that you computed.
import hashlib
file_name = 'filename.jpg'
# Hash published by the source; replace with the real expected value
original_sha512 = '2e0cf3b3ec65868637a1a0ef99ad53a14ab6a24bea42c5ef1957d81a39506acd8d0655b868438f45feff9d31376222da87b194874b4a225f9936d300acd6156e'
# Read as bytes: hashlib cannot hash text-mode strings
with open(file_name, 'rb') as f:
    data = f.read()
sha512_returned = hashlib.sha512(data).hexdigest()
if original_sha512 == sha512_returned:
    print("SHA-512 verified.")
else:
    print("SHA-512 verification failed.")
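For use inside a larger program, the check can be wrapped in a function that reads in chunks and handles a missing or unreadable file. This is one possible sketch, not the only way to do it: the name verify_sha512 is hypothetical, and treating a read error as a failed verification is a design assumption you may want to change.

```python
import hashlib
import hmac

def verify_sha512(file_name, expected_hex):
    """Return True if the file's SHA-512 digest matches expected_hex."""
    h = hashlib.sha512()
    try:
        with open(file_name, "rb") as f:
            for chunk in iter(lambda: f.read(4096), b""):
                h.update(chunk)
    except OSError as exc:
        # Missing or unreadable file: report it and treat as a failed check
        print(f"Could not read {file_name}: {exc}")
        return False
    # Normalise case and whitespace, since published checksums vary in format
    expected = expected_hex.strip().lower()
    # compare_digest compares in constant time instead of stopping early
    return hmac.compare_digest(h.hexdigest(), expected)
```

A call such as verify_sha512('filename.jpg', original_sha512) then returns a boolean that the caller can act on instead of printing a message.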
The process of SHA-512 creation and verification is easy as we discussed above. Happy Coding!
In conclusion, we’ve explored the fascinating world of creating SHA-512 hashes of files using Python. Armed with this knowledge, you now have the ability to verify the integrity of your files, detect any modifications, and ensure data authenticity. The SHA-512 algorithm provides a robust level of security, making it a reliable choice for file hashing. Remember to implement proper error handling and consider the computational cost of hashing large files. With these considerations in mind, you can confidently incorporate file hashing into your Python projects and enhance the security of your data. Happy hashing!