This queue is for tickets about the Perl-Critic CPAN distribution.

Report information
The Basics
Id: 64776
Status: open
Priority: 0/
Queue: Perl-Critic

People
Owner: Nobody in particular
Requestors: EDAVIS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 1.111
Fixed in: (no value)



Subject: Suggested policy: forbid .* at start or end of unanchored regexp
Beginning perl programmers will sometimes write regexp tests like:

    if ($string =~ /.*(\d+).*/) {
        # do something with $1
    }

The intention is to match a number somewhere in the input string. But because the regexp engine tries all possible start positions anyway, the initial .* is redundant, and since the regexp engine ignores unmatching stuff at the end, the final .* is also redundant.

I suggest that code like this indicates some confusion about how perl's regular expressions work, and a warning is very worthwhile to explain to the programmer his or her mistake.

A policy should warn about .* at the very start or end of a regexp used for m// matching (but not for s///). The warning text should suggest using either /\A(\d+)\z/ to match the entire string, or /(\d+)/ to search the string and find a match at any point.
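
The redundancy claim for a pure presence test can be checked with a small script. This is only a sketch: the sample strings are made up for illustration, and it compares match success only, not what ends up in $1.

```perl
use strict;
use warnings;

# Hypothetical sample inputs, chosen only for illustration.
my @samples = ('abc 123 def', 'no digits here', "line one\nline 2\n", '42');

my $disagreements = 0;
for my $string (@samples) {
    # Boolean outcome with and without the leading/trailing .*
    my $with_dot_star    = ($string =~ /.*\d+.*/) ? 1 : 0;
    my $without_dot_star = ($string =~ /\d+/)     ? 1 : 0;
    $disagreements++ if $with_dot_star != $without_dot_star;
}

print "disagreements: $disagreements\n";
```

Both patterns succeed or fail together on every sample, because .* is allowed to match the empty string at whatever position the engine tries.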
The documentation for this should recognize the subtle difference between /\d+/ and /.*\d+/ -- the former matches the FIRST occurrence, and the latter matches the LAST occurrence, since .* is greedy. Not relevant if you are just checking presence, but relevant if you make use of pos() or $+[0].
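
A short sketch of that difference, using a made-up input string and reading the end offset of the match from $+[0]:

```perl
use strict;
use warnings;

my $string = 'a1 b22 c333';   # hypothetical input, 11 characters long

my ($first_end, $last_end);

# Without .* the match is the first run of digits ("1").
if ($string =~ /\d+/)   { $first_end = $+[0]; }

# With a greedy leading .* the match extends to the last digit in the string.
if ($string =~ /.*\d+/) { $last_end  = $+[0]; }

print "end offset without .*: $first_end\n";
print "end offset with .*:    $last_end\n";
```

On this input the first offset is 2 and the second is 11, so code that relies on pos() or $+[0] would behave differently if the leading .* were removed.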
You're right. .* at the beginning of a regexp is not a mistake if that regexp has capturing groups. However, .* at the start still merits a warning for a plain non-capturing regexp, and .* at the end is pretty much always a mistake (unless with /g perhaps?).
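
To illustrate both points, here is a small sketch with invented input strings. It shows how a leading .* changes what a capturing group receives, and how a trailing .* changes the result of a /g match in list context.

```perl
use strict;
use warnings;

my $string = 'order 123 shipped';   # hypothetical input

# Capturing group without and with a greedy leading .*
my ($plain)  = $string =~ /(\d+)/;       # captures the whole first number
my ($greedy) = $string =~ /.*(\d+).*/;   # .* swallows all but the last digit

# Trailing .* under /g: it consumes the rest of the string,
# so later numbers are never found as separate matches.
my $list        = 'a1 b2';               # hypothetical input
my @without_end = $list =~ /\d+/g;
my @with_end    = $list =~ /\d+.*/g;

print "plain capture:  $plain\n";
print "greedy capture: $greedy\n";
print "matches without trailing .*: ", scalar(@without_end), "\n";
print "matches with trailing .*:    ", scalar(@with_end), "\n";
```

Here the plain capture is "123" while the greedy one is "3", and the /g match finds two numbers without the trailing .* but only one match with it. A policy would need to take both cases into account before flagging the pattern.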

