This queue is for tickets about the Kasago CPAN distribution.

Report information
The Basics
Id: 98566
Status: new
Priority: 0
Queue: Kasago

People
Owner: Nobody in particular
Requestors: ZARQUON [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Indexing bug for large source trees
For large source trees (200k lines), the implicit btree index that the unique constraint creates on words(word) can make Postgres fail with an index-size error. The attached patch adds optional support for hash indexing that column instead.
Subject: Kasago.patch
diff --git a/CHANGES b/CHANGES
index c7aaa13..46c1059 100644
--- a/CHANGES
+++ b/CHANGES
@@ -1,4 +1,6 @@
 CHANGES file for Kasago:

+0.3 Tue Sep 3
+    Add support for hash indexing on words(word)
 0.29 Tue Jul 26 14:32:05 BST 2005
     - first release
\ No newline at end of file
diff --git a/lib/Kasago.pm b/lib/Kasago.pm
index 687a455..9abd51d 100644
--- a/lib/Kasago.pm
+++ b/lib/Kasago.pm
@@ -12,7 +12,7 @@ use PPI;
 use Search::QueryParser;
 use base qw( Class::Accessor::Chained::Fast );
 __PACKAGE__->mk_accessors(qw( dbh ));
-our $VERSION = '0.29';
+our $VERSION = '0.3';

 sub new {
     my $class = shift;
@@ -31,7 +31,7 @@ sub DESTROY {
 }

 sub init {
-    my $self = shift;
+    my ($self, $index_type) = @_;
     my $dbh = $self->dbh;

     eval {
@@ -65,12 +65,20 @@ CREATE TABLE files (
 CREATE INDEX source_id_index ON files(source_id);
 ");

-    $dbh->do("
+    my $words_table = "
 CREATE TABLE words (
   word_id SERIAL PRIMARY KEY,
-  word TEXT UNIQUE
+  word TEXT ";
+    if ($index_type eq 'hash') {
+        $words_table = ."
 ) WITHOUT OIDS;
-");
+CREATE INDEX words_word ON words USING hash (word);
+";
+    }
+    else {
+        $words_table = " UNIQUE) WITHOUT OIDS;";
+    }
+    $dbh->do($words_table);

     $dbh->do("
 CREATE TABLE lines (
@@ -552,12 +560,15 @@ You pass a source name and the directory path:

   $kasago->import($source, $dir);

-=head2 init
+=head2 init ($index_type)

-This created the tables needed by Kasago in the database. You only need run this
-once. If you run this after initialisation, it will delete the index.
-
-  $kasago->init;
+This created the tables needed by Kasago in the database. You only need run
+this once. If you run this after initialisation, it will delete the index.
+If $index_type eq 'hash' then a hash based index will be created on
+words(word). Otherwise an implicit btree index will be created. For large
+codebases, postgres can complain about index size for the btree index. The
+hash index fixes this, but at the expense of only being useful for equality
+operators.
   $kasago->init;

 =head2 search
The code in the previous patch was broken: it assigned to $words_table with = where it needed to append with .=. This is the correct one.
Subject: kasago.patch
diff --git a/CHANGES b/CHANGES
index c7aaa13..46c1059 100644
--- a/CHANGES
+++ b/CHANGES
@@ -1,4 +1,6 @@
 CHANGES file for Kasago:

+0.3 Tue Sep 3
+    Add support for hash indexing on words(word)
 0.29 Tue Jul 26 14:32:05 BST 2005
     - first release
\ No newline at end of file
diff --git a/lib/Kasago.pm b/lib/Kasago.pm
index 687a455..b2539c9 100644
--- a/lib/Kasago.pm
+++ b/lib/Kasago.pm
@@ -12,7 +12,7 @@ use PPI;
 use Search::QueryParser;
 use base qw( Class::Accessor::Chained::Fast );
 __PACKAGE__->mk_accessors(qw( dbh ));
-our $VERSION = '0.29';
+our $VERSION = '0.3';

 sub new {
     my $class = shift;
@@ -31,7 +31,7 @@ sub DESTROY {
 }

 sub init {
-    my $self = shift;
+    my ($self, $index_type) = @_;
     my $dbh = $self->dbh;

     eval {
@@ -65,12 +65,20 @@ CREATE TABLE files (
 CREATE INDEX source_id_index ON files(source_id);
 ");

-    $dbh->do("
+    my $words_table = "
 CREATE TABLE words (
   word_id SERIAL PRIMARY KEY,
-  word TEXT UNIQUE
+  word TEXT ";
+    if ($index_type eq 'hash') {
+        $words_table .="
 ) WITHOUT OIDS;
-");
+CREATE INDEX words_word ON words USING hash (word);
+";
+    }
+    else {
+        $words_table .= " UNIQUE) WITHOUT OIDS;";
+    }
+    $dbh->do($words_table);

     $dbh->do("
 CREATE TABLE lines (
@@ -552,12 +560,15 @@ You pass a source name and the directory path:

   $kasago->import($source, $dir);

-=head2 init
+=head2 init ($index_type)

-This created the tables needed by Kasago in the database. You only need run this
-once. If you run this after initialisation, it will delete the index.
-
-  $kasago->init;
+This created the tables needed by Kasago in the database. You only need run
+this once. If you run this after initialisation, it will delete the index.
+If $index_type eq 'hash' then a hash based index will be created on
+words(word). Otherwise an implicit btree index will be created. For large
+codebases, postgres can complain about index size for the btree index. The
+hash index fixes this, but at the expense of only being useful for equality
+operators.
   $kasago->init;

 =head2 search
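
For reference, here is a minimal sketch of the branching that the corrected patch adds to init, pulled out into a standalone helper so it can be run without a database. The helper name words_table_sql and the 'btree' default label are hypothetical and not part of Kasago; the SQL text follows the patch. One consequence worth noting: the hash branch has no UNIQUE constraint, since PostgreSQL only allows btree indexes to be declared unique, so the database would no longer reject duplicate words.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Kasago): returns the SQL that the patched
# init() would build for the words table, given an optional index type.
sub words_table_sql {
    my ($index_type) = @_;

    # Assumption: treat a missing argument as the existing btree behaviour.
    # This also avoids an "uninitialized value" warning in the eq test below.
    $index_type = 'btree' unless defined $index_type;

    my $sql = "CREATE TABLE words (\n"
            . "  word_id SERIAL PRIMARY KEY,\n"
            . "  word TEXT";

    if ($index_type eq 'hash') {
        # Plain column plus an explicit hash index; no UNIQUE constraint.
        $sql .= "\n) WITHOUT OIDS;\n"
              . "CREATE INDEX words_word ON words USING hash (word);\n";
    }
    else {
        # UNIQUE makes PostgreSQL create the implicit btree index.
        $sql .= " UNIQUE\n) WITHOUT OIDS;\n";
    }
    return $sql;
}

print words_table_sql('hash');
print words_table_sql();
```

With the patch applied, the hash variant would be requested as $kasago->init('hash'), while a bare $kasago->init keeps the existing unique btree index.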

