
This queue is for tickets about the Email-Find CPAN distribution.

Report information
The Basics
Id: 79271
Status: open
Priority: 0/
Queue: Email-Find

People
Owner: Nobody in particular
Requestors: ALUCAS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: (no value)
Fixed in: (no value)

Attachments


Subject: Specify a maximum number of emails to find
Hi,

A user recently asked for help with this module on #perl at freenode. They wanted to stop processing after a specific number of matches. I had a look at your module and noticed that there's no easy way to bail out of a find early.

I had to suggest that they subclass and override your find method in order to do it.

Would you be interested in implementing such a feature in your module?

The normal way to do it would be to walk the string with while(m//g) {} and use substr with @- etc. for the replacement, instead of the s///ge, but I think that would be slower and messier than simply returning from the find routine early from within the regexp replacement block, since it is quite a short routine anyway.

I've attached a simple patch to illustrate.
Subject: find_match_limit_support.patch
40c40
< my($self, $r_text) = @_;
---
> my($self, $r_text, $match_limit) = @_;
45,47c45,53
< my($replace, $found) = $self->validate($1);
< $emails_found += $found;
< $replace;
---
> #return early if match limit reached
> if( defined $match_limit && $emails_found >= $match_limit ) {
> return $emails_found;
> }
>
> #otherwise process match
> my($replace, $found) = $self->validate($1);
> $emails_found += $found;
> $replace;
79,80c85,86
< sub find_emails(\$&) {
< my($r_text, $callback) = @_;
---
> sub find_emails(\$&$) {
> my($r_text, $callback, $match_limit) = @_;
82c88
< $finder->find($r_text);
---
> $finder->find($r_text, $match_limit);
129a136
> $num_emails_found = $finder->find(\$text, 10); #limit to 10 addresses
131a139,140
> Takes an optional integer as a second argument, which indicates the maximum number of email
> addresses that will be matched.
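For comparison, the while(m//g) alternative mentioned in the report might look roughly like the sketch below. It is not taken from Email::Find: the helper name find_limited and the deliberately simplified address pattern are assumptions made so the example runs with core Perl only. It uses @- and @+ to locate each match, substr to splice in the replacement, and pos() to resume scanning after the replaced text.

```perl
use strict;
use warnings;

# Simplified pattern for illustration only; this is an assumption and
# is much cruder than the matching Email::Find actually performs.
my $addr_re = qr/[\w.+-]+\@[\w-]+(?:\.[\w-]+)+/;

# Hypothetical helper: rewrite at most $limit matches in $$r_text with
# the value returned by $callback, and return how many were processed.
# An undefined $limit means "no limit".
sub find_limited {
    my ($r_text, $callback, $limit) = @_;
    my $found = 0;

    while ($$r_text =~ /($addr_re)/g) {
        # Copy the match data before the callback can clobber it.
        my ($start, $end, $match) = ($-[1], $+[1], $1);
        my $replace = $callback->($match);

        # Splice the replacement in, then set pos() explicitly so the
        # next iteration resumes just after the text we wrote.
        substr($$r_text, $start, $end - $start) = $replace;
        pos($$r_text) = $start + length $replace;

        $found++;
        last if defined $limit && $found >= $limit;
    }
    return $found;
}

my $text  = 'mail a@x.org or b@y.org or c@z.org';
my $count = find_limited(\$text, sub { "<$_[0]>" }, 2);
print "$count: $text\n";
```

The loop needs more bookkeeping than the s///ge form, which is the trade-off the report describes, but leaving it early is an ordinary last.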
From Tatsuhiko Miyagawa <miyagawa [...] gmail.com> Tue Aug 28 18:11:59 2012
On Aug 28, 2012, at 2:51 PM, Anthony J Lucas via RT wrote:

> Hi,
>
> A user recently asked for help with this module on #perl at freenode.
> They wanted to stop processing after a specific number of matches. I had a look at your module
> and noticed that there's no easy way to bail out of a find early.
>
> I had to suggest that they subclass and override your find method in order to do it.
>
> Would you be interested in implementing such a feature in your module?

No. You can easily die from the callback to stop processing once the number of collected addresses reaches the limit you want.

> The normal way to do it would be to walk the string with while(m//g) {} and using substr with @-
> etc. for the replacement, instead of the s///ge, but I think that would be slower and messier
> than simply returning the find routine early from within the regexp replacement block, since it is
> quite a short routine anyway.
>
> I've attached a simple patch to illustrate.
>
> <find_match_limit_support.patch>
Try this

my @addr;
my $f = Email::Find->new(sub { die if @addr >= $max; push @addr, $_[1] });
eval { $f->find(\$text) };
On Tue Aug 28 19:55:59 2012, MIYAGAWA wrote:

> Try this
>
> my @addr;
> my $f = Email::Find->new(sub { die if @addr >= $max; push @addr, $_[1] });
> eval { $f->find(\$text) };

Thanks for replying. Hmmm, I was just thinking about this.

The problem with using die in this way is that it limits the usefulness of having the callback functionality.

Beyond using the callback as a closure for pushing to an array, it gets a bit more complicated. In order to handle real errors in functions called inside the callback, you end up having to parse $@, like below.

For example, just to find 2 email addresses:

    my $success = UNIQUE_SUCCESS_STRING;
    eval {
        find_emails($text, sub {
            state $count = 0;

            # stop once 2 addresses have been processed
            die $success if( ++$count > 2 );
            my($email_obj, $email) = @_;
            $my_obj->do_stuff($email_obj);
            return $email;
        });
    };
    if ($@ =~ /^$success/) {
        #we found stuff
    } else {
        die "It was really an error from somewhere inside ->do_stuff: $@";
    }

It would be nice to cover this common case with some simpler syntax.

At the moment, using die to abort means the callback can only be used for extremely simple operations (ones that won't die themselves).
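One way to avoid matching on the text of $@ is to die with a dedicated object and check its class afterwards, rethrowing anything else. The following is a minimal sketch and not part of Email::Find: the class My::StopFinding and the helpers each_address and collect_up_to are hypothetical, and each_address is only a crude stand-in for a callback-driven finder so that the example runs with core Perl alone.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical marker class: an exception of this type means
# "stop early", not "something went wrong".
{
    package My::StopFinding;
    sub new { return bless {}, shift }
}

# Stand-in for a callback-driven finder (an assumption for this sketch):
# calls $callback once for every whitespace-separated token containing "@".
sub each_address {
    my ($text, $callback) = @_;
    $callback->($_) for grep { /\@/ } split ' ', $text;
    return;
}

# Collect at most $max addresses. Real errors raised inside the
# callback are rethrown unchanged; only the stop marker is swallowed.
sub collect_up_to {
    my ($text, $max) = @_;
    my @addr;

    my $ok = eval {
        each_address($text, sub {
            die My::StopFinding->new if @addr >= $max;
            push @addr, $_[0];
        });
        1;
    };
    if (!$ok) {
        my $err = $@;
        die $err unless blessed($err) && $err->isa('My::StopFinding');
    }
    return @addr;
}

my @first_two = collect_up_to('a@x.org b@y.org c@z.org', 2);
print "@first_two\n";
```

With this shape the callback is free to call code that may itself die, because the caller distinguishes the two cases by class rather than by message text.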
Sorry, it opened the case. I was just following up. Set back to rejected.
From Tatsuhiko Miyagawa <miyagawa [...] gmail.com> Tue Aug 28 20:20:53 2012
On Aug 28, 2012, at 5:10 PM, Anthony J Lucas via RT <bug-Email-Find@rt.cpan.org> wrote:

> Queue: Email-Find
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=79271 >
>
> On Tue Aug 28 19:55:59 2012, MIYAGAWA wrote:
>> Try this
>>
>> my @addr;
>> my $f = Email::Find->new(sub { die if @addr >= $max; push @addr, $_[1] });
>> eval { $f->find(\$text) };
>
> Thanks for replying. Hmmm, I was just thinking about this.
>
> The problem with using die in this way is that it limits the usefulness
> of having the callback functionality.
>
> Beyond using the callback as a closure for pushing to an array, it gets
> a bit more complicated. In order to handle real errors in functions
> called inside the callback you end up having to parse $@, like below.

It is common to allow die in callbacks to stop further processing. This is commonly done in LWP handlers, for instance.

> for example, just to find 2 email addresses:
>
>     my $success = UNIQUE_SUCCESS_STRING;
>     eval {
>         find_emails($text, sub {
>             state $count = 0;
>
>             # stop once 2 addresses have been processed
>             die $success if( ++$count > 2 );
>             my($email_obj, $email) = @_;
>             $my_obj->do_stuff($email_obj);
>             return $email;
>         });
>     };
>     if ($@ =~ /^$success/) {
>         #we found stuff
>     } else {
>         die "It was really an error from somewhere inside ->do_stuff: $@";
>     }
>
> It would be nice to cover this common case with some simpler syntax.

I don't consider it a common case, given that this module has been on CPAN for 10 years and you're almost the first to request that it be built into the module.

Having said that, you can always subclass or fork a module and ship it to CPAN as a separate module. Nobody would stop you from doing that.

> At the moment, using die to abort means the callback can only be used
> for extremely simple operations (ones that won't die themselves).

