The software and text similarity tester SIM

Dick Grune

SIM tests lexical similarity in natural language texts and in programs in C, C++, Java, Pascal, Modula-2, Miranda, Lisp, and 8086 assembler code. It is used
  • to detect duplicated code in large software projects, in program text, in shell scripts and in documentation;
  • to detect plagiarism in (software) projects, educational and otherwise.
SIM 3.0.2 is available as C sources and as MSDOS binaries. (The C sources for the previous version, 2.89, are still available here.)
There is a Unix-style manual page.
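
As a rough illustration of what "lexical similarity" can mean, the sketch below splits two texts into tokens, ignoring layout and letter case, and reports the longest run of tokens they share. It is not SIM's actual algorithm: the names tokens and longest_common_run and the tokenization rule are assumptions made only for this example.

```python
import re


def tokens(text):
    """Split text into lowercase word, number and punctuation tokens.

    Whitespace and line breaks are discarded, so layout changes
    do not affect the comparison.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", text.lower())


def longest_common_run(a, b):
    """Return (length, start_in_a, start_in_b) of the longest run of
    consecutive tokens that occurs in both token lists a and b."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous token of a
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended just before this pair of tokens.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    first = tokens("for (i = 0; i < n; i++)  sum += x[i];")
    second = tokens("total = 0;\nfor (i = 0; i < n; i++)\n    sum += x[i];")
    length, start_a, start_b = longest_common_run(first, second)
    print(length, "shared tokens:", " ".join(first[start_a:start_a + length]))
```

This quadratic comparison is fine for two short fragments; a tool that compares many years of submissions needs something considerably more efficient.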

The software similarity tester is very efficient and allows us to compare this year's students' work with that collected from many past years (much to the dismay of some, mostly non-CS, students). Students are told that their work is going to be compared, but some are non-believers ...

We are not afraid that students would try to tune their work to the similarity tester. We reckon if they can do that they can also do the exercise.

Since this piece of handicraft did not qualify as research, there are no international papers on it. The work was described in Dutch in Dick Grune, Matty Huntjens, "Het detecteren van kopieën bij informatica-practica", Informatie, 31, 11, Nov 1989, pp. 864-867 (lit. ref.). An English translation of the paper is also available. There is a (probably obsolete) terse technical report about the internal workings of the program.
