What is binary diffing?
Binary diffing is the comparison of two or more binary files (in this case, executable files) using different heuristics.
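What kind of heuristics are used?
In most cases the heuristics are based on graph isomorphism, that is, on deciding whether two graphs have the same structure even when their nodes carry different labels.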
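To make the idea of graph isomorphism concrete, here is a minimal sketch in Python. It is only an illustration, not how any of the tools listed later work: it tries every possible node mapping, which is only feasible for tiny graphs (and is the reason practical tools rely on heuristics). The function name `are_isomorphic`, the edge-list representation and the sample addresses are all assumptions made up for this example, and every node is assumed to appear in at least one edge.

```python
from itertools import permutations

def are_isomorphic(edges_a, edges_b):
    """Brute-force check: do two directed graphs have the same shape?

    Each graph is a list of (source, target) edges. Node labels
    (here: made-up basic-block addresses) are ignored; only the
    structure matters.
    """
    nodes_a = sorted({n for edge in edges_a for n in edge})
    nodes_b = sorted({n for edge in edges_b for n in edge})
    # Cheap rejections first: node count and edge count must agree.
    if len(nodes_a) != len(nodes_b) or len(set(edges_a)) != len(set(edges_b)):
        return False
    target = set(edges_b)
    # Try every way of renaming the nodes of A to the nodes of B.
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        if {(mapping[u], mapping[v]) for u, v in edges_a} == target:
            return True
    return False

# An "if/else" diamond at two different (hypothetical) addresses.
cfg_old = [(0x1000, 0x1010), (0x1000, 0x1020), (0x1010, 0x1030), (0x1020, 0x1030)]
cfg_new = [(0x2000, 0x2008), (0x2000, 0x2014), (0x2008, 0x2020), (0x2014, 0x2020)]
# A loop with the same number of nodes and edges, but a different shape.
cfg_loop = [(1, 2), (2, 3), (3, 2), (3, 4)]

same_shape = are_isomorphic(cfg_old, cfg_new)     # True
other_shape = are_isomorphic(cfg_old, cfg_loop)   # False
```

The two diamonds match even though no address is shared, while the loop is rejected although it has the same number of nodes and edges.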
What is binary diffing used for?
Binary diffing can be used to detect code changes between different versions of the same program, code theft, or mutations of virus code.
What is the main problem with binary diffing?
Well, if we need to compare binary files (executables), it's because we don't have the source code. That's a big problem. :-(
The next problem (the real one) is that you need to compare binaries, which is not the same as comparing source code (e.g. C, C++, Delphi).
During compilation and linking (".C" --> ".OBJ" --> ".EXE") much information is lost, such as variable names, comments and type information.
Comparing the files byte by byte wouldn't work, because the compilation process changes many bytes in the binary even when the source barely changed: functions get moved around, different registers are chosen to store variables, and different compiler switches may have been used.
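A small sketch shows how quickly a byte-by-byte comparison falls apart. The two byte strings below are hand-written x86 snippets invented for this example (they do not come from a real program), and `count_byte_mismatches` is a hypothetical helper. The only difference between them is a single `nop` inserted in the middle, yet everything after it is reported as changed.

```python
def count_byte_mismatches(a: bytes, b: bytes) -> int:
    """Compare two byte strings position by position.

    Returns the number of differing positions, plus the number of
    trailing bytes that exist in only one of the inputs.
    """
    common = min(len(a), len(b))
    mismatches = sum(1 for i in range(common) if a[i] != b[i])
    return mismatches + abs(len(a) - len(b))

# push ebp / mov ebp, esp / xor eax, eax / pop ebp / ret
old = bytes([0x55, 0x8B, 0xEC, 0x33, 0xC0, 0x5D, 0xC3])
# Same code with one extra "nop" (0x90) after the prologue.
new = bytes([0x55, 0x8B, 0xEC, 0x90, 0x33, 0xC0, 0x5D, 0xC3])

print(count_byte_mismatches(old, new))  # prints 5
```

One inserted byte out of eight produces five mismatches, because every following byte is shifted by one position. In a real executable the same shift would also change the addresses used by jumps and calls.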
So the problem changes from comparing many simple lines of code (e.g. "file1.c" vs "file2.c") to disassembling the binary, rebuilding its functions from the assembly code, converting these functions to graphs, and finally comparing and matching them, partially or completely.
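The last two steps (functions as graphs, then matching) can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the algorithm of any particular tool: the disassembly step is skipped, each function is already given as a hand-written control-flow graph, and the signature (number of basic blocks, edges and calls) is just one possible heuristic. The names `cfg_signature`, `match_functions` and the `sub_...` functions are hypothetical.

```python
from collections import defaultdict

def cfg_signature(cfg):
    """Return (basic blocks, edges, calls) for one function.

    `cfg` maps a basic-block id to a pair:
    (list of successor block ids, number of call instructions).
    """
    blocks = len(cfg)
    edges = sum(len(succs) for succs, _ in cfg.values())
    calls = sum(ncalls for _, ncalls in cfg.values())
    return (blocks, edges, calls)

def match_functions(old_funcs, new_funcs):
    """Pair up functions whose signature is unique in both binaries."""
    def by_signature(funcs):
        index = defaultdict(list)
        for name, cfg in funcs.items():
            index[cfg_signature(cfg)].append(name)
        return index

    old_index = by_signature(old_funcs)
    new_index = by_signature(new_funcs)
    matches = {}
    for sig, old_names in old_index.items():
        new_names = new_index.get(sig, [])
        # Ambiguous signatures are left unmatched in this sketch.
        if len(old_names) == 1 and len(new_names) == 1:
            matches[old_names[0]] = new_names[0]
    return matches

# Made-up functions from two versions of a program.
diamond = {0: ([1, 2], 0), 1: ([3], 1), 2: ([3], 0), 3: ([], 0)}
old_funcs = {"sub_401000": diamond, "sub_401050": {0: ([], 2)}}
new_funcs = {
    "sub_402000": {0: ([], 2)},
    "sub_402030": dict(diamond),
    "sub_402090": {0: ([1], 0), 1: ([], 0)},  # only in the new version
}

matches = match_functions(old_funcs, new_funcs)
# {'sub_401000': 'sub_402030', 'sub_401050': 'sub_402000'}
```

Functions are matched even though they moved to different addresses, and `sub_402090` stays unmatched, which marks it as a candidate for newly added code. Functions that share a signature with others would need further heuristics to be told apart.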
What tools exist to do that?
* turbodiff (free)
* BinDiff (commercial)
* PatchDiff2 (free)
* DarunGrim (free)
To be continued (Part 2) ...