This queue is for tickets about the Sereal-Encoder CPAN distribution.

Report information

The Basics
Id: 101876
Status: open
Priority: Low/Low

People
Owner: Nobody in particular
Requestors: zefram [...] fysh.org
Cc:
AdminCc:

BugTracker
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



From: Zefram <zefram@fysh.org>
To: bug-Sereal-Encoder@rt.cpan.org
Date: Mon, 2 Feb 2015 10:33:01 +0000
Subject: losing string value of semi-numeric string

$ perl -MSereal::Encoder=encode_sereal -MSereal::Decoder=decode_sereal -lwe 'print $]; print $Sereal::Encoder::VERSION; my $a="0 but true"; print decode_sereal(encode_sereal($a)); my $b = $a+0; print $a; print decode_sereal(encode_sereal($a));'
5.018002
3.005
0 but true
0 but true
0

I believe the first encoding is representing $a as a string but the second encoding is representing it as a pure integer, based on the IOK flag. In the case of this string, along with infinitely many others such as "00", "01", and "1 ", the integer representation is lossy. It's particularly significant for strings such as "0 but true" and "00" which qualify as true but come out as false when mangled by the lossy encoding. But even when the truth value doesn't change, it is not at all acceptable to lose the string value.

The underlying mistake is that you've treated the IOK flag as implying that the scalar is fully characterised by its IV. In general that is not the case. For scalars that are both IOK and POK, to see whether integer representation suffices you need to perform the IV->PV coercion yourself, and see whether the PV generated from the IV matches the scalar's actual PV. Similar remarks apply to NOK and NV. For extra fun, the exact meaning of the [PIN]OK flags varies between Perl versions.

-zefram
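
The IV->PV check described in this report can be sketched in pure Perl with the core B module. This is only an illustration of the idea, not Sereal's code (the encoder is written in C); the helper name iv_is_faithful is hypothetical, and which flags a numeric use sets may differ between Perl versions.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper (not part of Sereal): true only when encoding the
# scalar as a plain integer would lose nothing.
sub iv_is_faithful {
    my $obj   = B::svref_2object(\$_[0]);     # inspect the caller's SV, not a copy
    my $flags = $obj->FLAGS;
    return 0 unless $flags & B::SVf_IOK();    # no public integer value at all
    return 1 unless $flags & B::SVf_POK();    # pure integer: no string to lose
    # Both IOK and POK: redo the IV->PV coercion and compare with the real PV.
    return $obj->PV eq ("" . $obj->int_value) ? 1 : 0;
}

for my $str ("0 but true", "00", "1 ", "42") {
    my $sv  = $str;
    my $num = $sv + 0;                        # numeric use may set IOK on $sv
    printf "%-12s integer encoding %s\n", "'$str'",
        iv_is_faithful($sv) ? "is safe" : "would be lossy";
}
```

A scalar that fails this check would have to be serialised from its PV. The same comparison would be needed for NOK scalars against the NV slot, which this sketch does not attempt.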

From: demerphq <demerphq@gmail.com>
To: bug-Sereal-Encoder@rt.cpan.org
CC: perlbug@perl.org
Date: Mon, 2 Feb 2015 11:50:57 +0100
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string

No. I disagree. This is a bug in perl itself.

$ perl -MDevel::Peek -le'my $x="0 but true"; my $y=0+$x; Dump($x)'
SV = PVIV(0x7cdd88) at 0x7d9a48
  REFCNT = 1
  FLAGS = (PADMY,IOK,POK,pIOK,pPOK)
  IV = 0
  PV = 0x7d2b90 "0 but true"\0
  CUR = 10
  LEN = 16

The IOK flag should NOT be set here, it should be pIOK only.

IOK means that the integer representation is either a) canonical, or b) a faithful representation of the PV.

pIOK is supposed to mean that the cached value of the string can be used, but that it is not a faithful representation of the string it was derived from.

(If IOK and pIOK do not mean these things then it is a total waste to have both sets of flags, which seems an unreasonable interpretation.)

Compare to this:

$ perl -MDevel::Peek -le'my $x="0blahblah"; my $y=0+$x; Dump($x)'
SV = PVNV(0x1bcaf10) at 0x1beaa58
  REFCNT = 1
  FLAGS = (PADMY,POK,pIOK,pNOK,pPOK)
  IV = 0
  NV = 0
  PV = 0x1be3ba0 "0blahblah"\0
  CUR = 9

IMO this is clearly a bug in the special case logic for "0 but true". It should NOT set the IOK flag, it should set only the pIOK flag.

I will naturally try to fix this in Sereal, but I consider this a bug in Perl and I am sending this to perlbug because of it.

cheers,
Yves

perl -Mre=debug -e "/just|another|perl|hacker/"

From: Zefram <zefram@fysh.org>
To: demerphq via RT <bug-Sereal-Encoder@rt.cpan.org>
Date: Mon, 2 Feb 2015 12:19:00 +0000
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string

demerphq via RT wrote:
>IMO this is clearly a bug in the special case logic for "0 but true".

The bug is certainly not in the "0 but true" special case. The bug is not specific to "0 but true", but is also shared by "00", "1 ", and the like.

$ perl -MDevel::Peek -lwe 'my $a="1 "; my $b = $a+0; Dump $a'
SV = PVIV(0xb0d1b0) at 0xb09b48
  REFCNT = 1
  FLAGS = (PADMY,IOK,POK,pIOK,pPOK)
  IV = 1
  PV = 0xb02020 "1 "\0
  CUR = 2
  LEN = 16

>IOK means that the integer representation is either a) canonical, or
>b) a faithful representation of the PV.

There is certainly a case to be made that the flag *should* mean that, but de facto it doesn't mean that and never has. It's never even been close to that. De facto, in current Perl the IOK flag means that the scalar has an IV immediately available and *is an acceptable (non-warning) operand for numeric operations*. It currently looks rather pointless to preserve this aspect of a scalar, because non-numeric operands actually only warn on their first numeric operation, the one that generates the IV and sets the pIOK flag. (On subsequent operations the pIOK flag effectively muffles the warning despite the lack of IOK.) But that's where we are: IOK tells you very little about the relationship between the IV and PV slots (even excluding dualvars).

>(If IOK and pIOK do not mean these things then it is a total waste to
>have both set of flags, which seems an unreasonable interpretation.)

What IOK actually represents does seem wasteful, considering how little use is made of it. Not a *total* waste, but a poor use of a precious flag bit. But being wasteful doesn't mean it isn't what actually happens.

-zefram
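
The warning behaviour described in this message (a non-numeric string warns on its first numeric use, after which the cached pIOK value suppresses further warnings) can be observed with a short script. This is a sketch only: the variable names are illustrative, and the script prints the number of warnings rather than assuming it, since the details may vary between Perl versions. On the versions discussed in this ticket one would expect a single warning.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $str    = "0blahblah";
my $first  = $str + 0;   # first numeric use: parses the string and caches the result
my $second = $str + 0;   # repeated numeric use of the same scalar

printf "numeric warnings seen: %d\n", scalar @warnings;
print $warnings[0] if @warnings;
```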

From: demerphq <demerphq@gmail.com>
To: bug-Sereal-Encoder@rt.cpan.org
Date: Mon, 2 Feb 2015 13:30:49 +0100
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string

On 2 February 2015 at 13:19, Zefram via RT <bug-Sereal-Encoder@rt.cpan.org> wrote:
>>IMO this is clearly a bug in the special case logic for "0 but true".
>
> The bug is certainly not in the "0 but true" special case. The bug is not
> specific to "0 but true", but is also shared by "00", "1 ", and the like.

As far as I can tell it is a bug in the "0 but true" branch as it does not set the IS_NUMBER_TRAILING flag.

And it seems to also be a bug in sv_2iuv_common() which seems to not check the IS_NUMBER_TRAILING flag that is set on these examples.

Actually I can't find *any* code that checks the IS_NUMBER_TRAILING flag.

> What IOK actually represents does seem wasteful, considering how little
> use is made of it. Not a *total* waste, but a poor use of a precious
> flag bit. But being wasteful doesn't mean it isn't what actually happens.

I am not sure that "what happens" is relevant here, I think that what is relevant here is what *should* happen.

It seems to me we have bug on top of bug here that builds up the status quo, and IMO it is better to fix the status quo than it is to live with it.

This whole business is *insanely* complex and as you say the flags are being poorly used. Let's fix this once and for all.

Yves

From: Zefram <zefram@fysh.org>
To: demerphq via RT <bug-Sereal-Encoder@rt.cpan.org>
Date: Mon, 2 Feb 2015 13:01:40 +0000
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string

demerphq via RT wrote:
>As far as I can tell it is a bug in the "0 but true" branch as it does
>not set the IS_NUMBER_TRAILING flag.

No, that's not a bug. The special handling is precisely that the trailing "but true" is not regarded as "trailing trash" (which would render the string non-numeric). Note that trailing spaces are likewise not regarded as trash, nor are leading spaces or leading zeroes. If you want grok_number_flags() to detect non-canonical integer representations, it would need a new output flag to represent that state and several new bits of logic to set it.

>And it seems to also be a bug in sv_2iuv_common() which seems to not
>check the IS_NUMBER_TRAILING flag that is set on these examples.

sv_2iuv_common() doesn't pass in the PERL_SCAN_TRAILING flag that would request the IS_NUMBER_TRAILING flag to be used. Without it, the "trailing trash" case returns numtype==0 rather than a numtype with the IS_NUMBER_TRAILING flag set. It is the lack of the IS_NUMBER_IN_UV flag, rather than the presence of IS_NUMBER_TRAILING, that controls sv_2iuv_common().

>Actually I can't find *any* code that checks the IS_NUMBER_TRAILING flag.

I concur. There's only one use of PERL_SCAN_TRAILING, connected with magic incrementation, and it's being used to avoid getting numtype==0 rather than to get the IS_NUMBER_TRAILING flag.

>I am not sure that "what happens" is relevant here, I think that what
>is relevant here is what *should* happen.

What happens is very relevant when processing strings on current Perl versions and attempting to serialise data structures containing them. Changing the behaviour of the IOK flag in future Perl versions wouldn't invalidate the need for serialisation to preserve string values on recent versions.

-zefram

From: demerphq <demerphq@gmail.com>
To: bug-Sereal-Encoder@rt.cpan.org
Date: Mon, 2 Feb 2015 14:31:24 +0100
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string

On 2 February 2015 at 14:01, Zefram via RT <bug-Sereal-Encoder@rt.cpan.org> wrote:
>>As far as I can tell it is a bug in the "0 but true" branch as it does
>>not set the IS_NUMBER_TRAILING flag.
>
> No, that's not a bug. The special handling is precisely that the
> trailing "but true" is not regarded as "trailing trash" (which would
> render the string non-numeric). Note that trailing spaces are likewise
> not regarded as trash, nor are leading spaces or leading zeroes. If you
> want grok_number_flags() to detect non-canonical integer representations,
> it would need a new output flag to represent that state and several new
> bits of logic to set it.

It looks like a bug/oversight to me.

>>And it seems to also be a bug in sv_2iuv_common() which seems to not
>>check the IS_NUMBER_TRAILING flag that is set on these examples.
>
> sv_2iuv_common() doesn't pass in the PERL_SCAN_TRAILING flag that
> would request the IS_NUMBER_TRAILING flag to be used.

Yes, that is also true, probably because it calls the grok_number() wrapper which has no way to pass in this flag.

> Without it, the
> "trailing trash" case returns numtype==0 rather than a numtype with the
> IS_NUMBER_TRAILING flag set. It is the lack of the IS_NUMBER_IN_UV
> flag, rather than the presence of IS_NUMBER_TRAILING, that controls
> sv_2iuv_common().

Yes I know. And my position is that that is the bug.

>>Actually I can't find *any* code that checks the IS_NUMBER_TRAILING flag.
>
> I concur. There's only one use of PERL_SCAN_TRAILING, connected with
> magic incrementation, and it's being used to avoid getting numtype==0
> rather than to get the IS_NUMBER_TRAILING flag.

Hrm.

>>I am not sure that "what happens" is relevant here, I think that what
>>is relevant here is what *should* happen.
>
> What happens is very relevant when processing strings on current Perl
> versions and attempting to serialise data structures containing them.
> Changing the behaviour of the IOK flag in future Perl versions wouldn't
> invalidate the need for serialisation to preserve string values on
> recent versions.

I think the right plan is to fix Perl, and then fix Sereal to work around previous versions.

The current state of this logic is completely unacceptable and not fit for purpose. *an entire bit in every SV to prevent "not a number" warnings*? Wtf?

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

From: Zefram <zefram@fysh.org>
To: demerphq via RT <bug-Sereal-Encoder@rt.cpan.org>
Date: Mon, 2 Feb 2015 17:42:09 +0000
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string

Attached are two patches. The first adds failing test cases for various kinds of semi-numeric string. The second implements the simplest possible fix: to prefer PV serialisation where multiple OK flags are set. This totally fixes strings, but at the expense of both some space efficiency and, more importantly, behaviour on some numeric values that have been coerced to string. (Those get POK set in a way that's just as misleading as IOK being set on "00".) For example, Data::Float::nextup(0.5) stringifies as "0.5", just as 0.5 does, so using string encoding for it is lossy. Fully general fix really needs to try the implicit coercions, to see which slot fully describes the scalar.

-zefram
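
The attached patches are not shown on this page, so the following is only a rough Perl-level sketch of the "prefer PV where multiple OK flags are set" rule described above. The function name preferred_slot is hypothetical, and the real fix would live in the encoder's C code.

```perl
use strict;
use warnings;
use B ();

# Hypothetical sketch of the "prefer PV" rule; not the code from the patch.
sub preferred_slot {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return 'PV' if $flags & B::SVf_POK();   # any public string value wins
    return 'IV' if $flags & B::SVf_IOK();
    return 'NV' if $flags & B::SVf_NOK();
    return 'other';                          # undef, references, and so on
}

my $semi = "00";
my $n    = $semi + 0;                        # numeric use may add IOK next to POK
print preferred_slot($semi), "\n";           # PV
```

As the message notes, this rule trades correctness on strings for lossiness on numbers that have picked up a string value, so it is a stopgap rather than the fully general fix.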

Attachments (bodies not shown because the sender requested not to inline them):
seminumeric_test.patch (text/x-diff, Content-Length: 520)
seminumeric_simple_fix.patch (text/x-diff, Content-Length: 2939)

