This queue is for tickets about the SOAP-Lite CPAN distribution.

Report information
The Basics
Id: 32952
Status: resolved
Priority: 0/
Queue: SOAP-Lite

People
Owner: Nobody in particular
Requestors: gwittel [...] proofpoint.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: UTF8 Strings Not Marked as UTF8 If Base64 encoded
Date: Tue, 05 Feb 2008 12:20:02 -0800
From: Greg Wittel <gwittel [...] proofpoint.com>
Tried on SOAP::Lite 0.70_4.

If a UTF-8 string is subjected to base64 encoding (see RT Bug# 30271; http://rt.cpan.org/Public/Bug/Display.html?id=30271), the deserialized data does not have its is_utf8 flag set. This means the client gets octets back rather than the character string it expects.

Based on Bug# 30271, there are two ways to fix this:

1) Fix data type detection so that UTF-8 data is not detected as binary and sent to base64 encoding. In SOAP::Serializer, change:

    _typelookup => {
        'base64binary' => [10, sub { $_[0] =~ ...}, ... ]

to (adding the appropriate 'use' statements):

    _typelookup => {
        'base64binary' => [10, sub { ( ! Encode::is_utf8($_[0]) ) && $_[0] =~ .... }, ... ]

This assumes that the transport charset is UTF-8. I'm not sure what happens if it's not.

2) Create a data type 'utf8base64' and properly encode/decode it. The expected behavior should be equivalent to:

    Serialize:   encode_base64( Encode::encode(...) )
    Deserialize: Encode::decode(decode_base64() ... )

This method would be less sensitive to the transport charset, but I'm guessing that it would cause interop problems.

-Greg
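Both behaviours described in the report can be reproduced with a short, self-contained Perl sketch. It uses only the core Encode and MIME::Base64 modules (not SOAP::Lite itself) and assumes UTF-8 as the charset applied before base64 encoding; the variable names are illustrative. The first half shows a plain base64 round trip returning octets without the UTF-8 flag; the second half applies the encode/decode pairing proposed for a 'utf8base64' type.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);
use MIME::Base64 qw(encode_base64 decode_base64);

# A character string with non-ASCII characters (three Japanese characters).
my $text = "\x{65E5}\x{672C}\x{8A9E}";

# Wire form: MIME::Base64 works on octets, so the text is encoded to
# UTF-8 first (assumption: UTF-8 is the agreed charset).
my $wire = encode_base64(encode('UTF-8', $text), '');

# Behaviour from the report: decoding the base64 yields raw octets,
# with no UTF-8 flag and one length unit per byte.
my $plain = decode_base64($wire);
printf "plain:      is_utf8=%d length=%d\n",
    (is_utf8($plain) ? 1 : 0), length($plain);

# Proposed 'utf8base64' handling: decode the octets from UTF-8 after
# base64 decoding, which restores a flagged character string.
my $roundtrip = decode('UTF-8', decode_base64($wire));
printf "utf8base64: is_utf8=%d length=%d\n",
    (is_utf8($roundtrip) ? 1 : 0), length($roundtrip);
print "round trip matches original text\n" if $roundtrip eq $text;
```

encode_base64 croaks on wide characters, which is why the sketch calls Encode::encode first; the receiving side only regains character semantics because both ends agree on UTF-8 out of band.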
From: Martin Kutter <kutterma [...] users.sourceforge.net>
I'd suggest a resolution similar to RT Bug# 30271: perl 5.8 and above should not detect UTF-8 as binary, and there's no point in fixing it for older perls (to which Unicode strings are just octets).
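The detection change proposed as option 1, and suggested here for perl 5.8 and above, can be sketched as a guarded predicate. This is a hypothetical illustration, not SOAP::Serializer code: the real base64binary regex is elided in the report, so looks_binary_naive uses an assumed stand-in that treats any character outside printable ASCII and common whitespace as binary. Both helper names are invented for this sketch.

```perl
use strict;
use warnings;
use Encode qw(encode is_utf8);

# Hypothetical stand-in for the elided base64binary test: any character
# outside tab, LF, CR and printable ASCII counts as "binary".
sub looks_binary_naive {
    my ($value) = @_;
    return $value =~ /[^\x09\x0A\x0D\x20-\x7E]/ ? 1 : 0;
}

# Guarded version in the spirit of option 1: a string carrying Perl's
# UTF-8 flag is character data and is never classified as binary.
sub looks_binary_guarded {
    my ($value) = @_;
    return 0 if is_utf8($value);
    return looks_binary_naive($value);
}

my $japanese = "\x{65E5}\x{672C}\x{8A9E}";   # character string, flag on
my $octets   = encode('UTF-8', $japanese);    # same text as raw octets
my $blob     = "\x00\x01\xFF";                # genuinely binary data

for my $case ([japanese => $japanese], [octets => $octets], [blob => $blob]) {
    my ($name, $value) = @$case;
    printf "%-8s naive=%d guarded=%d\n",
        $name, looks_binary_naive($value), looks_binary_guarded($value);
}
```

Encode::is_utf8 reports Perl's internal representation rather than where the data came from, so binary data that has been upgraded internally (for example by concatenation with a character string) would also be classified as text by the guarded check.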
Subject: Re: [rt.cpan.org #32952] UTF8 Strings Not Marked as UTF8 If Base64 encoded
Date: Tue, 05 Feb 2008 13:31:09 -0800
From: Greg Wittel <gwittel [...] proofpoint.com>
Thanks for the quick response. To do something similar to Bug# 30271, how would we handle marking data as UTF-8 on deserialization? It's just a bunch of base64-encoded octets, so there's no way to know whether it should be marked as such.

If you mean implementing a new data type in a way similar to #30271, that should work.

The deserialization problem is why I suggested fixing the base64binary type lookup, since it incorrectly detects some UTF-8 strings (such as Japanese characters) as binary.

-Greg
From: Martin Kutter <kutterma [...] users.sourceforge.net>
Sorry for misleading you: a similar fix would mean that SOAP::Lite should not encode Unicode strings as base64binary in perl 5.8 and above. The SOAP 1.2 standard demands the use of UTF-8 or UTF-16 (at least for HTTP), so there should be no problems (SOAP 1.1 does not demand a specific encoding).

Introducing a "utf8base64" type would only help perls before 5.8, and that's pretty useless, as these have no Unicode handling and there's no way to reliably detect whether a sequence of octets is a UTF-8 string or not.

The problem is that this may affect existing SOAP clients and servers, since many of them rely on SOAP::Lite's autotyping, so I'd like to discuss it on the SOAP::Lite mailing list first.
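The point that a sequence of octets cannot be reliably classified can be shown with two octets that decode cleanly under more than one charset. This sketch uses the core Encode module with strict checking (FB_CROAK, plus LEAVE_SRC so the input is not consumed); the byte values are an arbitrary example chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two octets as they might arrive after base64 decoding, with no type
# information attached.
my $octets = "\xC3\xA9";

# Strict decoding: croak on malformed input, and leave $octets intact.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

my $as_utf8   = decode('UTF-8',      $octets, $check);
my $as_latin1 = decode('ISO-8859-1', $octets, $check);

# Both strict decodings succeed, yet they produce different strings.
for my $case (['UTF-8' => $as_utf8], ['Latin-1' => $as_latin1]) {
    my ($name, $string) = @$case;
    printf "%-8s %d character(s): %s\n", $name, length($string),
        join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $string));
}
```

Without out-of-band type information, a deserializer has no principled way to choose between the two results.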
Subject: Re: [rt.cpan.org #32952] UTF8 Strings Not Marked as UTF8 If Base64 encoded
Date: Wed, 06 Feb 2008 12:50:59 -0800
From: Greg Wittel <gwittel [...] proofpoint.com>
Thanks for the clarification. That makes sense. I look forward to seeing what comes of it, since it will finally let me make all of our client/server code fully UTF-8 transparent.

-Greg
Hi,

after discussion on the mailing list, I'm going to resolve this as follows:

- UTF-8 strings will not be base64 encoded in the future.

To avoid breaking things, this behaviour will not be included in the next stable release (which should be out in a few days), but will be included in the next devel release after that stable release.

Thanks for reporting,
Martin
Subject: Re: [rt.cpan.org #32952] UTF8 Strings Not Marked as UTF8 If Base64 encoded
Date: Tue, 19 Feb 2008 08:24:16 -0800
From: Greg Wittel <gwittel [...] proofpoint.com>
Hi Martin,

Thanks for the update. I look forward to seeing the patch, as I have to backport it to 0.60 for our internal use.

Regards,
-Greg


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.