This queue is for tickets about the Text-Levenshtein CPAN distribution.

Report information
The Basics
 Id: 76008 Status: new Priority: 0/ Queue: Text-Levenshtein

People
 Owner: Nobody in particular Requestors: swapnil.tailor [...] gmail.com Cc: AdminCc:

Bug Information
 Severity: Wishlist Broken in: 0.05 Fixed in: (no value)

Attachments
LevenshteinDistance.pm

History
Download (untitled) / with headers
text/plain 563b
As mentioned on wiki: http://en.wikipedia.org/wiki/Levenshtein_distance#Possible_improvements "If we are only interested in the distance if it is smaller than a threshold k, then it suffices to compute a diagonal stripe of width 2k+1 in the matrix. In this way, the algorithm can be run in O(kl) time, where l is the length of the shortest string." I have implemented the functionality and would like to be added in the module. Attaching the file with source code and how to use it. Let me know if you can include that as part of your module. Thanks, Swapnil
 Subject: LevenshteinDistance.pm
#Call should be fastdistancewiththreshold("word1" , "word2", \$threshold) sub fastdistancewiththreshold { my (\$word1, \$word2, \$threshold) = @_; return 0 if \$word1 eq \$word2; my \$i; my \$j; my \$len1 = length(\$word1); my \$len2 = length(\$word2); if(\$len1 > \$len2){ \$tmp = \$word1; \$word1 = \$word2; \$word2 = \$tmp; \$tmp = \$len1; \$len1 = \$len2; \$len2 = \$tmp; } my \$limit = 2*\$threshold+1; \$d = 0; for (1 ... \$limit) { \$d[\$_] = \$_; if(\$_ != \$m && substr(\$word1, \$_) eq substr(\$word2, \$_)){ return \$_; } } for (1 ... \$limit) { \$d[\$_] = \$_; if(\$_ != \$n && substr(\$word1, \$_) eq substr(\$word2, \$_)){ return \$_; } } my \$l = 1; for my \$i (1 ... \$m) { my \$w = substr(\$word1, \$i-1, 1); for (\$j = \$l-\$threshold; \$j <= \$l + \$threshold && \$j <= \$n; \$j++){ if(\$w eq substr(\$word2, \$j-1, 1)){ \$d[\$i][\$j] = \$d[\$i-1][\$j-1]; } else { if(\$i == \$j) { \$d[\$i][\$j] = minimum(\$d[\$i-1][\$j], \$d[\$i][\$j-1], \$d[\$i-1][\$j-1]) + 1; } else { if(\$i < \$j) { \$d[\$i][\$j] = min(\$d[\$i][\$j-1] , \$d[\$i-1][\$j-1]) + 1; } else { \$d[\$i][\$j] = min(\$d[\$i-1][\$j] , \$d[\$i-1][\$j-1]) + 1; } } } } \$l++; } return \$d[\$m][\$n]; }

This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.