Skip Menu |
 

This queue is for tickets about the TheSchwartz CPAN distribution.

Report information
The Basics
Id: 34843
Status: resolved
Priority: 0/
Queue: TheSchwartz

People
Owner: Jeff.Fearn [...] gmail.com
Requestors: jesper [...] krogh.cc
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.04
Fixed in: 1.11



Subject: Using an ordering on the select.
If possible, please use an ordering when selecting from the database. In PostgreSQL this makes queries on large set enormously faster: # explain analyze SELECT job.jobid, job.funcid, job.arg, job.uniqkey, job.insert_time, job.run_after, job.grabbed_until, job.priority, job.coalesce from workqueue.job where (job.funcid in (3)) and (job.run_after <= 1207833398 ) and (job.grabbed_until <= 1207833398) limit 50; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..2.16 rows=50 width=88) (actual time=1513.607..1513.745 rows=50 loops=1) -> Seq Scan on job (cost=0.00..303726.95 rows=7035367 width=88) (actual time=1513.603..1513.663 rows=50 loops=1) Filter: ((funcid = 3) AND (run_after <= 1207833398) AND (grabbed_until <= 1207833398)) Total runtime: 1513.863 ms (4 rows) # explain analyze SELECT job.jobid, job.funcid, job.arg, job.uniqkey, job.insert_time, job.run_after, job.grabbed_until, job.priority, job.coalesce from workqueue.job where (job.funcid in (3)) and (job.run_after <= 1207833398 ) and (job.grabbed_until <= 1207833398) order by run_after limit 50; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..3.42 rows=50 width=88) (actual time=43.736..44.059 rows=50 loops=1) -> Index Scan using func_id_run_after_grabbed_until on job (cost=0.00..480655.21 rows=7035367 width=88) (actual time=43.732..43.982 rows=50 loops=1) Index Cond: ((funcid = 3) AND (run_after <= 1207833398) AND (grabbed_until <= 1207833398)) Total runtime: 44.173 ms (4 rows) The queue is currently around 7m.. I just have trouble tweaking Data::ObjectDriver to accept ordering on the data.
From: asjo [...] koldfront.dk
Here is a quick patch that adds the ORDER BY to TheSchwartz.pm.
Download add_order_by.patch
text/x-diff 1.6k
--- TheSchwartz.pm.orig 2008-04-11 11:58:11.000000000 +0200 +++ TheSchwartz.pm 2008-04-11 11:59:32.000000000 +0200 @@ -24,6 +24,7 @@ ## Number of jobs to fetch at a time in find_job_for_workers. our $FIND_JOB_BATCH_SIZE = 50; +our $FIND_JOB_SORT='funcid'; sub new { my TheSchwartz $client = shift; @@ -165,12 +166,12 @@ } $driver->search('TheSchwartz::Job' => { funcid => $funcid, @options - }, { limit => $limit }); + }, { limit => $limit, sort=>$FIND_JOB_SORT }); } else { push @jobs, $driver->search('TheSchwartz::Job' => { funcid => $funcid, @options - }, { limit => $limit }); + }, { limit => $limit, sort=>$FIND_JOB_SORT }); } } return @jobs; @@ -213,7 +214,7 @@ run_after => \ "<= $unixtime", grabbed_until => \ "<= $unixtime", coalesce => { op => $op, value => $coval }, - }, { limit => $FIND_JOB_BATCH_SIZE }); + }, { limit => $FIND_JOB_BATCH_SIZE, sort=>$FIND_JOB_SORT }); }; if ($@) { unless (OK_ERRORS->{ $driver->last_error || 0 }) { @@ -252,7 +253,7 @@ funcid => \@ids, run_after => \ "<= $unixtime", grabbed_until => \ "<= $unixtime", - }, { limit => $FIND_JOB_BATCH_SIZE }); + }, { limit => $FIND_JOB_BATCH_SIZE, sort=>$FIND_JOB_SORT }); }; if ($@) { unless (OK_ERRORS->{ $driver->last_error || 0 }) {
Download (untitled) / with headers
text/plain 265b
Hi, I made it default to sort by jobid as it's the only field with an index, and defaulting to sort by an un-indexed column has made things really slow on other systems I've worked on. To git@github.com:jfearn/TheSchwartz.git 02c14c9..9bb4120 master -> master
A fix for this issue shipped in Version 1.11.


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.