Mullen, Lincoln
(rOpenSci, 2015-11-05)
This R package provides functions for measuring similarity among documents and detecting reused passages. It implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity ...
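To make the terms in the description concrete, the sketch below shows one common reading of shingled n-gram tokenization paired with a Jaccard similarity score. It is an independent Python illustration, not the package's R API: the function names `shingle_ngrams` and `jaccard_similarity`, the whitespace tokenization, and the sample sentences are all assumptions chosen for the example.

```python
def shingle_ngrams(text, n=3):
    """Return overlapping word n-grams (shingles) of a text.

    Assumption: words are lowercased and split on whitespace only.
    """
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard similarity of two token collections: |A & B| / |A | B|."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # convention chosen here for two empty inputs
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical sample documents sharing a reused passage.
doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "a quick brown fox jumps over a sleeping dog"

score = jaccard_similarity(shingle_ngrams(doc_a), shingle_ngrams(doc_b))
print(round(score, 3))
```

Because shingles overlap, a run of shared words yields several shared tokens, so a higher score suggests that a passage of one document may have been reused in the other.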