Signature-based detection is one of the first techniques used to identify potentially malicious software. Signatures are often represented in the form of a hash of the file (MD5, SHA1, SHA256).
This is a great start to your investigation, but there are limitations. For example, an attacker could make a small change in their code and the signature value would change. This is exactly what polymorphic malware does: it constantly changes to evade signature-based detection mechanisms. See the examples below of some simple code with one change that is then compiled into an executable.
We’ll use MD5 in this example to keep the signatures short. In your investigations I’d recommend SHA1 or SHA256.
Notice how the single addition of the letter “s” changes the MD5 hash value completely. Advanced malware could easily slip through simple signature-based detections. This is where Fuzzy Hashes come into the picture.
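To make the point concrete, here is a minimal Python sketch of the same effect. It hashes two short byte strings rather than compiled executables, and the strings themselves are hypothetical stand-ins for the two programs, so treat it as an illustration of the behavior and not a reproduction of the example above.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical stand-ins for the two programs: they differ by a single "s".
original = b'printf("Hello World");'
modified = b'printf("Hello Worlds");'

h1 = md5_hex(original)
h2 = md5_hex(modified)

print(h1)
print(h2)
print("signatures match:", h1 == h2)
```

The two digests share nothing recognizable, even though the inputs differ by one character. That is the property that makes a cryptographic hash a good exact-match signature and a poor similarity measure.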
We can use a tool called SSDeep to help us identify software that is similar. The first step is to create a file that contains our hashes. In this example the file is called fuzzyhashes.ssd. The contents of the file are below.
Using the fuzzyhashes.ssd file, we can have SSDeep compare these values and report how similar the files are as a percentage. Additional files were included and will be explained later in this post.
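If you are curious why a fuzzy hash survives a small edit, the toy sketch below may help. It is not the SSDeep algorithm. It is a simplified illustration, loosely modeled on the piecewise hashing idea SSDeep is built on: cut the input into chunks at boundaries chosen from the content itself, then emit one signature character per chunk. The function names, the window and modulus values, the random test data, and the use of difflib for scoring are all my own assumptions.

```python
import hashlib
import random
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def toy_fuzzy_hash(data: bytes, window: int = 4, modulus: int = 16) -> str:
    """Toy piecewise hash: one signature character per content-defined chunk."""
    signature = []
    chunk_start = 0
    for i in range(len(data)):
        # The boundary test only looks at the last `window` bytes, so an edit
        # can move nearby boundaries but leaves distant ones where they were.
        context = data[max(0, i - window + 1): i + 1]
        if sum(context) % modulus == modulus - 1:
            chunk = data[chunk_start: i + 1]
            signature.append(ALPHABET[hashlib.md5(chunk).digest()[0] % 64])
            chunk_start = i + 1
    if chunk_start < len(data):
        # Hash whatever is left after the last boundary.
        tail = data[chunk_start:]
        signature.append(ALPHABET[hashlib.md5(tail).digest()[0] % 64])
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> int:
    """Score from 0 to 100 using difflib; a stand-in for SSDeep's own scoring."""
    matcher = SequenceMatcher(None, sig_a, sig_b, autojunk=False)
    return round(100 * matcher.ratio())


# Hypothetical "files": 2000 pseudo-random bytes, and a copy with one byte inserted.
rng = random.Random(0)
file_a = bytes(rng.getrandbits(8) for _ in range(2000))
file_b = file_a[:1000] + b"s" + file_a[1000:]

sig_a = toy_fuzzy_hash(file_a)
sig_b = toy_fuzzy_hash(file_b)

print("MD5 equal:     ", hashlib.md5(file_a).digest() == hashlib.md5(file_b).digest())
print("toy similarity:", similarity(sig_a, sig_b))
```

Because each signature character depends only on its own chunk, the inserted byte disturbs the few characters around the edit and the rest of the signature lines up again. The MD5 values, by contrast, simply do not match.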
Notice in the output above that although there was a small change in the source code of hello2.c, SSDeep was able to show that the two executables were similar. Please note that because the source code is so simple, a small change has an outsized effect on the percentage reported.
File1 and File2 were larger, so a change of similar size had a smaller effect on the percentage.
As with the source code above, the differences were very small. Notice the extra period.
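The effect of file size on the score can be sketched with a few lines of Python. This uses difflib from the standard library directly on the bytes as a stand-in for SSDeep's scoring, so the numbers will not match SSDeep's output. The inputs are hypothetical: a short pair that differs by an “s” and a longer pair that differs by an extra period.

```python
from difflib import SequenceMatcher


def percent_similar(a: bytes, b: bytes) -> float:
    """Similarity of two byte strings as a percentage (difflib ratio * 100)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return 100 * matcher.ratio()


# Short inputs: one extra letter is a large share of the content.
small_a = b"Hello World\n"
small_b = b"Hello Worlds\n"

# Longer inputs: one extra period is a tiny share of the content.
line = b"The quick brown fox jumps over the lazy dog\n"
large_a = line * 40
large_b = line * 20 + b"The quick brown fox jumps over the lazy dog.\n" + line * 19

small_score = percent_similar(small_a, small_b)
large_score = percent_similar(large_a, large_b)

print(f"small inputs: {small_score:.2f}%")
print(f"large inputs: {large_score:.2f}%")
```

Both pairs differ by a single character, but the longer pair scores closer to 100 because the unchanged content dominates the comparison.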
I hope this helps others better understand Fuzzy Hashes. My goal is to share my knowledge, as many others have done with me.
Thank you for reading!