This queue is for tickets about the Regexp-Grammars CPAN distribution.

Report information
The Basics
Id: 99980
Status: resolved
Priority: 0/
Queue: Regexp-Grammars

People
Owner: Nobody in particular
Requestors: bbkr [...] post.pl
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.036
Fixed in: (no value)



Subject: utf8 flag is lost in match object on v5.20+
#!/usr/bin/env perl

use utf8;
use Regexp::Grammars;

my $parser = qr{
    <TOP>

    <rule: TOP>
        .*
}xms;

'zażółć_gęślą_jaźń' =~ $parser;

print "parsed_as_utf8 = ", utf8::is_utf8( $/{'TOP'} );

__END__
On Perl 5.14.4 and 5.16.3 it correctly sets the utf8 flag on the captured string.
On Perl 5.20.1 and 5.21.5 the flag is lost.
Subject: Re: [rt.cpan.org #99980] utf8 flag is lost in match object on v5.20+
Date: Tue, 4 Nov 2014 19:24:36 +1100
To: bug-Regexp-Grammars [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
Thanks for the report. However, this problem is not specific to Regexp::Grammars, as the attached test script demonstrates. I will report the issue to the core developers.

Damian
Attachment: utf8_bug.pl (text/x-perl, 431 b)

Message body is not shown because sender requested not to inline it.
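The body of utf8_bug.pl is not shown here, so the script below is not that attachment. It is a hypothetical minimal sketch, assuming (from the Perl 5.20.2 changelog entry quoted at the end of this ticket) that the problem lies in reading $^N from inside a (?{ ... }) code block. It does not use Regexp::Grammars at all; the variable names are illustrative. On an affected perl the "code block" line would be expected to print is_utf8=0, while on other versions both lines should print 1.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;

# Read $^N from inside a (?{ ... }) code block, right after group 1 closes.
my $from_code_block;
'zażółć_gęślą_jaźń' =~ /(.*)(?{ $from_code_block = $^N })/;

# For comparison, read the same capture through $1 once the match is over.
my $after_match = $1;

printf "code block:  is_utf8=%d\n", utf8::is_utf8($from_code_block) ? 1 : 0;
printf "after match: is_utf8=%d\n", utf8::is_utf8($after_match)     ? 1 : 0;
```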

From: bbkr [...] post.pl
Thanks! Can you please link the ticket where it is reported, so everyone can track its status? I don't see it on RT.
Subject: Re: [rt.cpan.org #99980] utf8 flag is lost in match object on v5.20+
Date: Thu, 6 Nov 2014 07:54:41 +1100
To: bug-Regexp-Grammars [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
> Can you please link ticket where it is reported so everyone can track its status? I don't see it on RT.
There was a problem with the original report. I have just resubmitted it. The ticket is: https://rt.perl.org/Ticket/Display.html?id=123135

Damian
From: bbkr [...] post.pl
The bug is confirmed and should be fixed in the Perl 5.20.2 release: https://rt.perl.org/Ticket/Display.html?id=122913
On Thu Nov 06 12:16:54 2014, bbkr@post.pl wrote:
> Bug is confirmed, should be fixed in Perl 5.20.2 release:
>
> https://rt.perl.org/Ticket/Display.html?id=122913
Yes, the fix has been backported to maint-5.20. If you need to work around the bug for 5.20.0 and 5.20.1, I believe

    my $x = $^N;
    utf8::decode $x if utf8::is_utf8 $_;
    ... do something with $x, not $^N ...

will do the trick. Within regexp code blocks, $_ is aliased to the string being matched against. And it is within code blocks that $^N behaves erratically.
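A self-contained sketch of the workaround above, as a runnable script. One assumption is added that is not in the original suggestion: the decode is also guarded by !utf8::is_utf8($x), so it only fires when the copy of $^N has actually lost the flag. Without that guard, calling utf8::decode on an already-decoded string on a fixed perl could, for some inputs, decode it a second time. The names $word and $x are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

my $word;
# Inside a (?{ ... }) block, $_ is aliased to the target string.
# Decode the copy of $^N only if the target is UTF8-flagged but the
# copy is not, i.e. only when the 5.20.0 and 5.20.1 bug has struck.
'zażółć gęślą' =~ /(\w+)(?{
    my $x = $^N;
    utf8::decode($x) if utf8::is_utf8($_) && !utf8::is_utf8($x);
    $word = $x;
})/;

# From here on, work with the repaired copy rather than $^N.
print "word = $word, is_utf8 = ", (utf8::is_utf8($word) ? 1 : 0), "\n";
```

On a perl without the bug the guard is false and $x is used unchanged, so the same code should be safe to leave in place after upgrading to 5.20.2 or later.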
From: bbkr [...] post.pl
From the Perl 5.20.2 changelog: "In Perl 5.20.0, $^N accidentally had the internal UTF8 flag turned off if accessed from a code block within a regular expression, effectively UTF8-encoding the value. This has been fixed. [perl #123135]" So the ticket can be closed. Thanks!
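To make "effectively UTF8-encoding the value" concrete, the sketch below simulates the effect with utf8::encode rather than triggering the bug itself: the flagged string 'zażółć' (6 characters) becomes a flag-off string holding its 10 UTF-8 octets, and utf8::decode turns it back. This is an illustration only, not a description of how perl produced the broken value internally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;

# A character string: 6 characters, UTF8 flag on.
my $chars = 'zażółć';

# Simulate the buggy value: same text as raw UTF-8 octets, flag off.
my $bytes = $chars;
utf8::encode($bytes);

printf "chars: length=%d is_utf8=%d\n", length $chars, utf8::is_utf8($chars) ? 1 : 0;
printf "bytes: length=%d is_utf8=%d\n", length $bytes, utf8::is_utf8($bytes) ? 1 : 0;

# utf8::decode turns the octets back into the original characters.
my $restored = $bytes;
utf8::decode($restored);
print "restored eq chars: ", ($restored eq $chars ? 'yes' : 'no'), "\n";
```

This prints lengths 6 and 10 with the flag on and off respectively, then "restored eq chars: yes", which is why a code-block copy of $^N on 5.20.0 and 5.20.1 compared unequal to the expected text.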

