This queue is for tickets about the Algorithm-Diff CPAN distribution.

Report information
The Basics
Id:
101105
Status:
resolved
Priority:
Low/Low

People
Owner:
Nobody in particular
Requestors:
XENU [...] cpan.org
Cc:
AdminCc:

BugTracker
Severity:
(no value)
Broken in:
(no value)
Fixed in:
(no value)



Subject: [PATCH] 2x faster LCS
Hello, I have optimized LCS function, so now it runs 96% faster when $keyGen is undef, the only change is removal of unnecessary calls of keygen functions. Looks like subroutine calls are really expensive in Perl. Unfortunately my patch has one downside, when $keyGen is actually defined, LCS is 4% slower than it was without my changes. I don't think it's a problem, because typically LCS is used without $keyGen. Patch is attached. Cheers, Tomasz Test script: use strict; use warnings; use Algorithm::Diff; use Algorithm::OldDiff; use Benchmark qw/:all/; use Digest::MD5 qw/md5_hex/; my @aa = (('z') x 100, 'a', 'b', 'c', ('dupa')x70000, 'd', 'kupa'); my @bb = (('z') x 100, 'a', 'b', 'c', ('dupz')x70000, 'f', 'kupa'); my $hash = sub { md5_hex(shift) }; cmpthese(50, { 'my patch' => sub { Algorithm::Diff::LCS(\@aa, \@bb) }, 'orginal' => sub { Algorithm::OldDiff::LCS(\@aa, \@bb) }, }); cmpthese(50, { 'my patch' => sub { Algorithm::Diff::LCS(\@aa, \@bb, $hash) }, 'original' => sub { Algorithm::OldDiff::LCS(\@aa, \@bb, $hash) }, }); Results: PS E:\Dokumenty\difftest> perl .\lcs.pl Rate orginal my patch orginal 2.41/s -- -49% my patch 4.72/s 96% -- Rate my patch original my patch 1.39/s -- -3% original 1.44/s 4% --
Subject: faster_lcs.patch
ÿþ--- Diff.pm.old 2014-11-26 06:41:54.000000000 +0100 +++ Diff.pm 2014-12-23 21:03:51.000000000 +0100 @@ -42,7 +42,7 @@ for ( $index = $start ; $index <= $end ; $index++ ) { my $element = $aCollection->[$index]; - my $key = &$keyGen( $element, @_ ); + my $key = $keyGen ? &$keyGen( $element, @_ ) : $element; if ( exists( $d{$key} ) ) { unshift ( @{ $d{$key} }, $index ); @@ -147,12 +147,7 @@ # set up code refs # Note that these are optimized. - if ( !defined($keyGen) ) # optimize for strings - { - $keyGen = sub { $_[0] }; - $compare = sub { my ( $a, $b ) = @_; $a eq $b }; - } - else + if ( $keyGen ) # optimize for strings { $compare = sub { my $a = shift; @@ -175,7 +170,8 @@ # First we prune off any common elements at the beginning while ( $aStart <= $aFinish and $bStart <= $bFinish - and &$compare( $a->[$aStart], $b->[$bStart], @_ ) ) + and ( $keyGen ? &$compare( $a->[$aStart], $b->[$bStart], @_ ) + : ( $a->[$aStart] eq $b->[$bStart] ) ) ) { $matchVector->[ $aStart++ ] = $bStart++; $prunedCount++; @@ -184,7 +180,8 @@ # now the end while ( $aStart <= $aFinish and $bStart <= $bFinish - and &$compare( $a->[$aFinish], $b->[$bFinish], @_ ) ) + and ( $keyGen ? &$compare( $a->[$aFinish], $b->[$bFinish], @_ ) + : ( $a->[$aFinish] eq $b->[$bFinish] ) ) ) { $matchVector->[ $aFinish-- ] = $bFinish--; $prunedCount++; @@ -200,7 +197,7 @@ my ( $i, $ai, $j, $k ); for ( $i = $aStart ; $i <= $aFinish ; $i++ ) { - $ai = &$keyGen( $a->[$i], @_ ); + $ai = $keyGen ? &$keyGen( $a->[$i], @_ ) : $a->[$i]; if ( exists( $bMatches->{$ai} ) ) { $k = 0;
Oops, just noticed that patch file contains BOM, sorry for that, I'm attaching corrected version.
Subject: faster_lcs.patch
--- Diff.pm.old 2014-11-26 06:41:54.000000000 +0100 +++ Diff.pm 2014-12-23 21:03:51.000000000 +0100 @@ -42,7 +42,7 @@ for ( $index = $start ; $index <= $end ; $index++ ) { my $element = $aCollection->[$index]; - my $key = &$keyGen( $element, @_ ); + my $key = $keyGen ? &$keyGen( $element, @_ ) : $element; if ( exists( $d{$key} ) ) { unshift ( @{ $d{$key} }, $index ); @@ -147,12 +147,7 @@ # set up code refs # Note that these are optimized. - if ( !defined($keyGen) ) # optimize for strings - { - $keyGen = sub { $_[0] }; - $compare = sub { my ( $a, $b ) = @_; $a eq $b }; - } - else + if ( $keyGen ) # optimize for strings { $compare = sub { my $a = shift; @@ -175,7 +170,8 @@ # First we prune off any common elements at the beginning while ( $aStart <= $aFinish and $bStart <= $bFinish - and &$compare( $a->[$aStart], $b->[$bStart], @_ ) ) + and ( $keyGen ? &$compare( $a->[$aStart], $b->[$bStart], @_ ) + : ( $a->[$aStart] eq $b->[$bStart] ) ) ) { $matchVector->[ $aStart++ ] = $bStart++; $prunedCount++; @@ -184,7 +180,8 @@ # now the end while ( $aStart <= $aFinish and $bStart <= $bFinish - and &$compare( $a->[$aFinish], $b->[$bFinish], @_ ) ) + and ( $keyGen ? &$compare( $a->[$aFinish], $b->[$bFinish], @_ ) + : ( $a->[$aFinish] eq $b->[$bFinish] ) ) ) { $matchVector->[ $aFinish-- ] = $bFinish--; $prunedCount++; @@ -200,7 +197,7 @@ my ( $i, $ai, $j, $k ); for ( $i = $aStart ; $i <= $aFinish ; $i++ ) { - $ai = &$keyGen( $a->[$i], @_ ); + $ai = $keyGen ? &$keyGen( $a->[$i], @_ ) : $a->[$i]; if ( exists( $bMatches->{$ai} ) ) { $k = 0;
Subject: Re: [rt.cpan.org #101105] [PATCH] 2x faster LCS
Date: Thu, 25 Dec 2014 20:03:28 -0800
To: bug-Algorithm-Diff@rt.cpan.org
From: Tye McQueen <tyemq@cpan.org>
Thanks!

I'll apply the patch and let you know.

Tye

On Tue, Dec 23, 2014 at 12:41 PM, Tomasz Konojacki via RT <bug-Algorithm-Diff@rt.cpan.org> wrote:
Show quoted text
Tue Dec 23 15:41:34 2014: Request 101105 was acted upon.
Transaction: Ticket created by XENU
       Queue: Algorithm-Diff
     Subject: [PATCH] 2x faster LCS
   Broken in: (no value)
    Severity: (no value)
       Owner: Nobody
  Requestors: XENU@cpan.org
      Status: new
 Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=101105 >


Hello,

I have optimized LCS function, so now it runs 96% faster when $keyGen is undef, the only change is removal of unnecessary calls of keygen functions. Looks like subroutine calls are really expensive in Perl.

Unfortunately my patch has one downside, when $keyGen is actually defined, LCS is 4% slower than it was without my changes. I don't think it's a problem, because typically LCS is used without $keyGen.

Patch is attached.

Cheers,
Tomasz

Test script:

use strict;
use warnings;

use Algorithm::Diff;
use Algorithm::OldDiff;
use Benchmark qw/:all/;
use Digest::MD5 qw/md5_hex/;

my @aa = (('z') x 100, 'a', 'b', 'c', ('dupa')x70000, 'd', 'kupa');
my @bb = (('z') x 100, 'a', 'b', 'c', ('dupz')x70000, 'f', 'kupa');

my $hash = sub { md5_hex(shift) };

cmpthese(50, {
    'my patch' => sub { Algorithm::Diff::LCS(\@aa, \@bb) },
    'orginal' => sub { Algorithm::OldDiff::LCS(\@aa, \@bb) },
});

cmpthese(50, {
    'my patch' => sub { Algorithm::Diff::LCS(\@aa, \@bb, $hash) },
    'original' => sub { Algorithm::OldDiff::LCS(\@aa, \@bb, $hash) },
});

Results:

PS E:\Dokumenty\difftest> perl .\lcs.pl
           Rate  orginal my patch
orginal  2.41/s       --     -49%
my patch 4.72/s      96%       --
           Rate my patch original
my patch 1.39/s       --      -3%
original 1.44/s       4%       --

On Czw 25 Gru 2014, 23:03:39, TYEMQ wrote:
Show quoted text
> Thanks! > > I'll apply the patch and let you know. > > Tye
Any news? Tomasz
Thanks, I have applied this to the git repo and will make a release soon. -- rjbs


This service runs on Request Tracker, is sponsored by The Perl Foundation, and maintained by Best Practical Solutions.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.