Part of my work on refactoring legacy code was finding a way to identify code clones (duplicates) in large repositories. During my refactoring experiments, I found that it is much faster to work in this order:
- Remove dead code (CSRS ~ 600 LPH; this also cuts duplicate code by about 10%)
- Remove exact clones (CSRS ~ 300-400 LPH)
- Remove similar clones
CSRS (Cost Size Reduction Speed) is measured in lines of code per hour (LPH).
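To make the metric concrete, here is a rough effort estimate in Python that simply divides the lines to be removed by the CSRS. The LOC counts are made-up example values, 350 LPH is an assumed midpoint of the 300-400 range, and reading the 10% figure as "removing dead code first shrinks the duplicate code by 10%" is an assumption. Treat it as a back-of-the-envelope sketch, not a measurement.

```python
# Back-of-the-envelope effort estimate using CSRS (LOC removed per hour).
DEAD_CODE_CSRS = 600    # LPH for dead code removal (figure from the list above)
EXACT_CLONE_CSRS = 350  # LPH for exact clones: assumed midpoint of 300-400

# Assumption: removing dead code first removes ~10% of the duplicate code.
DUPLICATE_REDUCTION = 0.10


def hours_needed(loc_to_remove: float, csrs: float) -> float:
    """Effort (hours) = lines to remove / removal speed (LOC per hour)."""
    return loc_to_remove / csrs


def estimate(dead_loc: float, exact_clone_loc: float) -> dict:
    """Hours per phase when dead code is removed before exact clones."""
    remaining_clone_loc = exact_clone_loc * (1 - DUPLICATE_REDUCTION)
    return {
        "dead_code_hours": hours_needed(dead_loc, DEAD_CODE_CSRS),
        "exact_clone_hours": hours_needed(remaining_clone_loc, EXACT_CLONE_CSRS),
    }


if __name__ == "__main__":
    # Made-up sizes: 6,000 LOC of dead code, 7,000 LOC of exact clones.
    print(estimate(dead_loc=6000, exact_clone_loc=7000))
```

With those assumed sizes, the estimate comes to 10 hours for the dead code and 18 hours for the remaining 6,300 LOC of exact clones.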
Exact clones: code fragments that are identical, except possibly for whitespace.
Similar clones: exact clones in which variables have been renamed.
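The difference between the two clone types comes down to normalization. The Python sketch below (hypothetical helper names, not ConQat's implementation) compares token sequences: dropping whitespace finds exact clones, and additionally replacing each identifier with a placeholder, consistently by order of first appearance, finds similar clones. The tokenizer is deliberately crude: it does not treat comments or string literals specially, and it renames every non-keyword identifier (including method and type names), not just variables.

```python
import re

# Crude tokenizer for C-like code: identifiers, digit runs, or any other single non-space char.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")

# Small, incomplete keyword list; a real tool would use the language grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "new", "class",
            "int", "long", "double", "boolean", "void",
            "public", "private", "protected", "static", "final"}


def tokens(code: str) -> list:
    """Split code into tokens; whitespace is discarded."""
    return TOKEN_RE.findall(code)


def is_exact_clone(a: str, b: str) -> bool:
    """Exact clone: identical token sequences, i.e. at most whitespace differs."""
    return tokens(a) == tokens(b)


def normalize_identifiers(toks: list) -> list:
    """Replace identifiers with ID0, ID1, ... in order of first appearance."""
    names = {}
    out = []
    for t in toks:
        if IDENT_RE.fullmatch(t) and t not in KEYWORDS:
            names.setdefault(t, f"ID{len(names)}")
            out.append(names[t])
        else:
            out.append(t)
    return out


def is_similar_clone(a: str, b: str) -> bool:
    """Similar clone: identical once identifiers are consistently renamed."""
    return normalize_identifiers(tokens(a)) == normalize_identifiers(tokens(b))


if __name__ == "__main__":
    a = "int total = price * qty;"
    b = "int   total=price*qty;"
    c = "int sum = cost * count;"
    print(is_exact_clone(a, b), is_exact_clone(a, c), is_similar_clone(a, c))
    print(is_similar_clone("x = x + y;", "x = y + y;"))
```

With this sketch, `int total = price * qty;` and `int   total=price*qty;` are exact clones, while `int sum = cost * count;` is only a similar clone of them. Because the renaming must be consistent, `x = x + y;` and `x = y + y;` are not reported as similar.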
Detecting Exact and Similar Code Duplicates
If you would like to try this, download this adjusted set of blocks, then follow these steps to replace the old ConQat blocks on your machine:
- Extract the file “ConQat blocks - vX.Y.rar” and use its contents to replace the following folder: <conqat eclipse folder>\conqat-2011.9\conqat\bundles\org.conqat.engine.code_clones\blocks
- Press “Enforce Full Model Rebuild”.
- Open the ConQat Runtime view and press “New”. You will find two new parameters, “equality” and “similarity”.
- Add the “equality” parameter and set its threshold to 1, meaning that clones must be 100% identical.
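One way to picture the “equality” threshold is as a score between 0 and 1 that a clone pair must reach. The Python sketch below illustrates that idea with `difflib.SequenceMatcher` over token sequences; this is an assumed stand-in chosen for illustration, and ConQat's actual computation of “equality” and “similarity” may well differ. At a threshold of 1, only fragments whose tokens match completely pass.

```python
import difflib
import re

# Crude tokenizer: identifiers, digit runs, or single non-space chars (whitespace is dropped).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_similarity(a: str, b: str) -> float:
    """Score in [0, 1] for two fragments' token sequences; 1.0 means identical tokens."""
    ta = TOKEN_RE.findall(a)
    tb = TOKEN_RE.findall(b)
    # ratio() = 2 * matched tokens / total tokens in both sequences
    return difflib.SequenceMatcher(None, ta, tb).ratio()


def passes_threshold(a: str, b: str, threshold: float = 1.0) -> bool:
    """Hypothetical filter: keep a pair only if its score reaches the threshold."""
    return token_similarity(a, b) >= threshold


if __name__ == "__main__":
    print(passes_threshold("x = a + b;", "x = a+b;"))         # whitespace differs only
    print(passes_threshold("x = a + b;", "x = a - b;"))       # one token differs
    print(passes_threshold("x = a + b;", "x = a - b;", 0.8))  # looser threshold
```

For example, `x = a + b;` and `x = a+b;` score 1.0, whereas `x = a + b;` and `x = a - b;` score about 0.83 (5 of their 6 tokens fall in matching blocks), so they pass a threshold of 0.8 but not 1.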
See the Result!
In the result, only exact clones are listed.
Which Languages Does It Work For?
This applies to (and has been tested against):
- Java code using the JavaCloneAnalysis block
- .NET code using the CsCloneAnalysis block
- C/C++ code using the StatementCloneAnalysis block. In fact, StatementCloneAnalysis can be applied to 10 different languages; see the ConQat documentation for details.