
This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 70161
Status: open
Priority: 0/
Queue: URI

People
Owner: Nobody in particular
Requestors: ruz [...] bestpractical.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



From: Ruslan Zakirov <ruz [...] bestpractical.com>
To: bug-URI [...] rt.cpan.org
Date: Tue, 9 Aug 2011 14:57:01 +0400
Subject: URI parsing may corrupt data if argument is UTF-8 string
Message-ID: <CAMOxC8t1GrhUdFfMS4mwVjN4+UOYBvjse7kNR4K32qX2NgLPwg [...] mail.gmail.com>
Hello Gisle,

Do you consider the following a bug, or something that requires an explanation in the docs?

    use Encode;
    use URI;
    use Devel::Peek;
    my $uri = URI->new(decode_utf8 '?Query=%C3%A4%C3%B6%C3%BC');
    Dump( ($uri->query_form('Query'))[1] );

If you drop me ideas on how you want this addressed, I can write a patch.

--
Best regards, Ruslan.
In-Reply-To: <CAMOxC8t1GrhUdFfMS4mwVjN4+UOYBvjse7kNR4K32qX2NgLPwg [...] mail.gmail.com>
Message-ID: <rt-3.8.HEAD-22511-1313344605-1922.70161-0-0 [...] rt.cpan.org>
It might be considered an issue that the internal UTF8 flag set on the string that initialized the URI gets propagated to the values returned by query_form(). In an ideal world this should not change the semantics of the return value, but currently it has issues. For instance, decode_utf8() will not decode such values.

I fixed that issue in <https://github.com/gisle/uri/commit/8803283ed9d1b67c7f58d2b5d507ede2602c477a>. After this patch your query_form() call will return a byte string.

In general it's more problematic that the UTF8 flag determines how chars in the 128 .. 255 range are percent-encoded by URI. I don't really have a good (and backwards-compatible) plan for addressing this.
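The byte-string result described in the reply above can be illustrated with core modules only. This is a minimal sketch, not the code from the linked commit: `unescape_to_bytes` is a hypothetical helper, and the sketch assumes the escaped input contains no characters above 255.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper (not part of URI): percent-decode into a byte string.
sub unescape_to_bytes {
    my ($escaped) = @_;
    # Drop the internal UTF8 flag before decoding the escapes, so the
    # result is a plain byte string. Dies if a character above 255 is present.
    utf8::downgrade($escaped);
    $escaped =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $escaped;
}

# ASCII-only content, but carrying the UTF8 flag, as in the report.
my $flagged = decode_utf8('%C3%A4%C3%B6%C3%BC');

my $bytes = unescape_to_bytes($flagged);   # six raw bytes
my $chars = decode_utf8($bytes);           # three characters

printf "bytes=%d flagged=%d\n", length($bytes), utf8::is_utf8($bytes) ? 1 : 0;
printf "chars=%d\n", length($chars);
# prints:
# bytes=6 flagged=0
# chars=3
```

Returning bytes leaves the choice of encoding to the caller, who can then apply decode_utf8() or another decoder as appropriate.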
From: Ruslan Zakirov <ruz [...] bestpractical.com>
To: bug-URI [...] rt.cpan.org
Date: Mon, 15 Aug 2011 00:10:30 +0400
Subject: Re: [rt.cpan.org #70161] URI parsing may corrupt data if argument is UTF-8 string
In-Reply-To: <rt-3.8.HEAD-22511-1313344605-893.70161-6-0 [...] rt.cpan.org>
Message-ID: <CAMOxC8u6md0hB29eeEQN3+bLmXXiOzAOWc-p+wWMYdxexzqEVA [...] mail.gmail.com>
On Sun, Aug 14, 2011 at 9:56 PM, Gisle_Aas via RT <bug-URI@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=70161 >
>
> It might be considered an issue that the internal UTF8-flag set on the string that initialized the URI gets propagated to the values returned
> by query_form().  In an ideal world this should not change the semantics of the return value; but currently this has issues.  For instance
> decode_utf8() will not decode such values.
>
> I fixed that issue in <https://github.com/gisle/uri/commit/8803283ed9d1b67c7f58d2b5d507ede2602c477a>.  After this patch your
> query_form() call will return a byte string.
I expected a different reaction. Thanks for implementing this change. Bytes are good in this case. Escaped data may be in any encoding.
> In general it's more problematic that the UTF8 flag determines how chars in the 128 .. 255 range are percent encoded by URI.  Don't really
> have a good (and backwards-compatible) plan for addressing this.
Understood.

--
Best regards, Ruslan.
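The ambiguity discussed in this thread (escaped data may be in any encoding, and a character in the 128 .. 255 range has more than one plausible escaped form) can be sketched as follows. `escape_bytes` is a hypothetical helper, not URI's API, and the sketch does not reproduce how URI itself picks between the two forms; it only shows that both exist for the same character.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of URI): percent-encode every byte of a byte string.
sub escape_bytes {
    my ($bytes) = @_;
    return join '', map { sprintf('%%%02X', ord($_)) } split //, $bytes;
}

my $char = "\x{E4}";   # U+00E4 (a-umlaut), one character in the 128 .. 255 range

# The same character yields different escapes depending on the byte encoding chosen.
my $latin1_form = escape_bytes(encode('ISO-8859-1', $char));
my $utf8_form   = escape_bytes(encode('UTF-8',      $char));

print "$latin1_form\n";   # prints %E4
print "$utf8_form\n";     # prints %C3%A4
```

A receiver that sees only `%E4` or `%C3%A4` cannot tell which encoding was intended without outside knowledge, which is why returning undecoded bytes is the conservative choice.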

