Hey, pabs3! Actually this is not using a rolling checksum for detection but rather a combo of language model, checksums, automatons, bitvectors, inverted indexes and multiple sequences alignment (e.g. a specialized diff). I put some docs there to explain the approach at ahttps://github.com/nexB/scancode-toolkit/blob/develop/src/li...