How common are CRC-16 collisions? Let’s find out with some Python magic.
Library’s used: crcmod .
from itertools import product import crcmod.predefined from math import factorial import string # Init variables found = False currLen = 1 txtTarget = "" crcTarget = 0 # Init functions def crc(data): crcInstance = crcmod.predefined.Crc("crc-16") crcInstance.update(str(data).encode("utf-8", "replace")) return crcInstance.crcValue def nCr(n,r): f = factorial return round(f(n) / f(r) / f(n-r)) # Get input txtTarget = input("Enter string to find CRC16 collisions for: ") crcTarget = crc(txtTarget) print ("CRC of target: " + hex(crcTarget)) availableChars = input(""" Enter level of detail: 1. Only lowercase letters (abc) 2. Letters (ABc) 3. Letters and numbers (ABc123) 4. Letters, numbers, punctuation and whitespace (ABc123 .,) """) if availableChars == "1": availableChars = string.ascii_lowercase elif availableChars == "2": availableChars = string.ascii_letters elif availableChars == "3": availableChars = string.ascii_letters + string.digits elif availableChars == "4": availableChars = string.printable else: print ("Incorrect choice, detail set to 1") availableChars = string.ascii_lowercase # Main loop while True: # Get new combinations print ("\nGenerating CRC's of all combinations of " + str(currLen) + " characters...") print (" (Total of " + str(nCr(len(availableChars),currLen)) + " combinations)") for possibleCombination in product(availableChars, repeat=currLen): currTxt = "".join(possibleCombination) currCrc = crc(currTxt) if currCrc == crcTarget: found = True if currTxt == txtTarget: print (" ORIGINAL FOUND (" + currTxt + ": " + hex(currCrc) + ")") else: print (" COLLISION FOUND: " + currTxt + ": " + hex(currCrc)) currLen = currLen + 1 if found: if input("\nContinue searching? Y/N ").upper() == "Y": found = False else: break else: print (" Nothing found.")
In the start of this script I generate the target CRC and define the available characters to create combinations with. Because Python 3.x uses unicode strings by default and the crc lib doesn’t like that, I have to convert it to an utf-8 byte array. (See highlighted line 15.)
After the initialization, I just loop through all available combinations and check their CRC’s for a match. When something is found, you can abort or continue searching for more.
Example results:
Enter string to find CRC16 collisions for: sample text CRC of target: 0xb40d Enter level of detail: 1. Only lowercase letters (abc) 2. Letters (ABc) 3. Letters and numbers (ABc123) 4. Letters, numbers and special chars (ABc123# %€) 1 Generating CRC's of all combinations of 1 characters... (Total of 26 combinations) Nothing found. Generating CRC's of all combinations of 2 characters... (Total of 325 combinations) Nothing found. Generating CRC's of all combinations of 3 characters... (Total of 2600 combinations) Nothing found. Generating CRC's of all combinations of 4 characters... (Total of 14950 combinations) COLLISION FOUND: cgxj: 0xb40d COLLISION FOUND: ckxo: 0xb40d COLLISION FOUND: csxe: 0xb40d COLLISION FOUND: scti: 0xb40d COLLISION FOUND: sotl: 0xb40d COLLISION FOUND: swtf: 0xb40d COLLISION FOUND: wgwj: 0xb40d COLLISION FOUND: wkwo: 0xb40d COLLISION FOUND: wswe: 0xb40d Continue searching? Y/N N
So that’s 9 collisions in 17901 combinations in ~5 minutes.