This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 67569
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: florz [...] florz.de
Cc: pali [...] cpan.org
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



From: Florian Zumbiehl <florz [...] florz.de>
To: bug-Encode [...] rt.cpan.org
Date: Tue, 19 Apr 2011 01:25:55 +0200
Subject: incorrect unfolding and other decoding bugs in Encode::MIME::RFC2047
Hi,

I started digging because of the incorrect unfolding in
Encode::MIME::RFC2047, which I now noticed is already reported as
bug #40027. Essentially people have already explained it correctly:
unfolding only eats the CRLF, nothing else (just as the RFC quite
clearly states). RFC2047 decoding then additionally eats whitespace
between encoded words in *text. _Between_ encoded words only.

As I couldn't figure out how to submit additional information for
a bug without opening an account, please feel free to merge things
as appropriate.

This is very relevant practically as the traditional way for breaking
long Subject headers, for example, was to insert CRLFs at the beginning
of whitespace sequences (well, and still is where no RFC2047 encoding
is necessary), which you corrupt with the current code.

While digging, I found a bunch more bugs and put together a fix
which you find below that should bring the code a lot closer to
the RFC.

This code indeed is for *text only - there is no way to decode
other headers that contain encoded words without first taking apart
the respective headers and then decoding words separately anyhow.

Also, here is a list of test cases with their respective correct
decoding:

"foo =?us-ascii?q?bar?=" => "foo bar"
"foo\r\n =?us-ascii?q?bar?=" => "foo bar"
"=?us-ascii?q?foo?= bar" => "foo bar"
"=?us-ascii?q?foo?=\r\n bar" => "foo bar"
"foo bar" => "foo bar"
"foo\r\n bar" => "foo bar"
"=?us-ascii?q?foo?= =?us-ascii?q?bar?=" => "foobar"
"=?us-ascii?q?foo?=\r\n =?us-ascii?q?bar?=" => "foobar"
"foo=?us-ascii?q?bar?=" => "foo=?us-ascii?q?bar?="
"=?us-ascii?q?foo?==?us-ascii?q?bar?=" => "foo=?us-ascii?q?bar?="
"=?us-ascii?q?foo bar?=" => "=?us-ascii?q?foo bar?="
"=?us-ascii?q?foo\r\n bar?=" => "=?us-ascii?q?foo bar?="
"foo =?us-ascii?q?=20?==?us-ascii?q?bar?=" => "foo =?us-ascii?q?bar?="

Please note that the code is untested as a whole, I just tested pieces
separately.

diff --git a/cpan/Encode/lib/Encode/MIME/Header.pm b/cpan/Encode/lib/Encode/MIME/Header.pm
index 9728dc3..44c7024 100644
--- a/cpan/Encode/lib/Encode/MIME/Header.pm
+++ b/cpan/Encode/lib/Encode/MIME/Header.pm
@@ -40,23 +40,25 @@ sub decode($$;$) {
     use utf8;
     my ( $obj, $str, $chk ) = @_;
 
-    # zap spaces between encoded words
-    $str =~ s/\?=\s+=\?/\?==\?/gos;
-
     # multi-line header to single line
-    $str =~ s/(?:\r\n|[\r\n])[ \t]//gos;
-
-    1 while ( $str =~
-        s/(=\?[-0-9A-Za-z_]+\?[Qq]\?)(.*?)\?=\1(.*?\?=)/$1$2$3/ )
+    $str =~ s/(?:\r\n|[\r\n])(?=[ \t])//gos;
+
+    1 while ( $str =~ s/
+            (?:\A|(?<=[ \t]))
+            (=\?[-0-9A-Za-z_]+\?[Qq]\?)([\x21-\x3e\x40-\x7e]+)\?=
+            [ \t]+
+            \1([\x21-\x3e\x40-\x7e]+\?=)
+        /$1$2$3/x )
       ;    # Concat consecutive QP encoded mime headers
            # Fixes breaking inside multi-byte characters
 
     $str =~ s{
+        (?:\A|\G[ \t]+|(?<=[ \t]))
         =\?              # begin encoded word
         ([-0-9A-Za-z_]+) # charset (encoding)
         (?:\*[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*)? # language (RFC 2231)
         \?([QqBb])\?     # delimiter
-        (.*?)            # Base64-encodede contents
+        ([\x21-\x3e\x40-\x7e]+)
         \?=              # end encoded word
     }{
         if (uc($2) eq 'B'){

Florian
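The unfolding difference described in the report can be reproduced in isolation. The following is a minimal sketch that reuses the two substitutions from the diff; the sub names `unfold_old` and `unfold_rfc` are made up for illustration and are not part of Encode.

```perl
use strict;
use warnings;

# Substitution removed by the patch: it deletes the line break AND the
# first whitespace character that follows it.
sub unfold_old {
    my ($str) = @_;
    $str =~ s/(?:\r\n|[\r\n])[ \t]//g;
    return $str;
}

# Substitution added by the patch: only the line break is deleted; the
# lookahead leaves the following whitespace in place.
sub unfold_rfc {
    my ($str) = @_;
    $str =~ s/(?:\r\n|[\r\n])(?=[ \t])//g;
    return $str;
}

my $folded = "foo\r\n bar";
printf "old: '%s'\n", unfold_old($folded);    # old: 'foobar'
printf "rfc: '%s'\n", unfold_rfc($folded);    # rfc: 'foo bar'
```

With the old substitution a Subject folded at a space loses that space, which is the corruption the report complains about.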
I tried your patch but unfortunately it breaks existing tests.

Dan the Maintainer Thereof
On Sat May 21 19:10:27 2011, DANKOGAI wrote:
> I tried your patch but unfortunately it breaks existing tests.
The two tests it breaks may not actually be correct--- they aren't RFC2047-conformant examples, at least. As Florian Zumbiehl says, decoding To/From headers has to happen after tokenization if you want to get the right results; unless the caller has already done that, Encode::MIME::RFC2047 can probably only correctly decode the *text headers such as Subject.
From: Florian Zumbiehl <florz [...] florz.de>
To: wiml [...] hhhh.org
CC: Wim Lewis via RT <bug-Encode [...] rt.cpan.org>
Date: Wed, 18 Sep 2013 08:39:23 +0200
Subject: Re: [rt.cpan.org #67569] incorrect unfolding and other decoding bugs in Encode::MIME::RFC2047
Hi,

> <URL: https://rt.cpan.org/Ticket/Display.html?id=67569 >
>
> On Sat May 21 19:10:27 2011, DANKOGAI wrote:
> > I tried your patch but unfortunately it breaks existing tests.
>
> The two tests it breaks may not actually be correct--- they aren't RFC2047-conformant examples, at least. As Florian Zumbiehl says, decoding To/From headers has to happen after tokenization if you want to get the right results; unless the caller has already done that, Encode::MIME::RFC2047 can probably only correctly decode the *text headers such as Subject.

can you point me to the tests that are failing with the patch? I asked the maintainer for more specific information about two years ago by email but never got a reply, and unfortunately have forgotten most of the details by now ...

Regards, Florian
On Wed Sep 18 02:39:15 2013, florz@florz.de wrote:
> can you point me to the tests that are failing with the patch?

The tests are the ones decoding $bheader and $qheader in t/mime-header.t. The relevant bits of text are

From:=?UTF-8?B?IOWwj+mjvCDlvL4g?=<dankogai@dan.co.jp>
To: dankogai@dan.co.jp (=?UTF-8?B?5bCP6aO8?==Kogai,=?UTF-8?B?IOW8vg==?==
 Dan)

In both cases, the patched MIME::RFC2047 decoder doesn't translate the encoded-words which are run together with adjacent tokens, and I think the patched behavior is more correct. The first line would be decodable by a program which tokenized the header and passed only the 2047-encodable phrases to Encode, but the To: header shouldn't be decodable by an RFC2047 6.1(2) compliant decoder.
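The "tokenize first, then decode" idea can be sketched in Perl as follows. This is only an illustration under loud assumptions: `split_mailbox_header` and `decode_b_word` are hypothetical helpers, the split is a rough regex rather than a real RFC 5322 parser (no quoted strings, comments or groups), and only B-encoded words are handled. `Encode::decode` and `MIME::Base64::decode_base64` are the real core-module functions.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: decode a token only if the WHOLE token is one
# B-encoded word; anything else is returned unchanged.
sub decode_b_word {
    my ($word) = @_;
    return $word
        unless $word =~ /\A=\?([-0-9A-Za-z_]+)\?[Bb]\?([A-Za-z0-9+\/=]+)\?=\z/;
    my ( $charset, $payload ) = ( $1, $2 );
    return decode( $charset, decode_base64($payload) );
}

# Hypothetical helper: rough split of "Field: phrase <addr>".
sub split_mailbox_header {
    my ($header) = @_;
    my ( $field, $phrase, $addr ) =
        $header =~ /\A([^:]+):\s*(.*?)\s*<([^>]*)>\s*\z/s
        or return;
    return ( $field, $phrase, $addr );
}

# The From line quoted from the test file.
my ( $field, $phrase, $addr ) = split_mailbox_header(
    'From:=?UTF-8?B?IOWwj+mjvCDlvL4g?=<dankogai@dan.co.jp>');

# The phrase is a single encoded-word here, so it can be decoded directly.
my $name = decode_b_word($phrase);

binmode STDOUT, ':encoding(UTF-8)';
print "$field: [$name] <$addr>\n";
```

Once the phrase has been separated from the angle-bracketed address, the encoded-word is no longer run together with a neighbouring token, so a strict decoder has no reason to reject it.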
From: Florian Zumbiehl <florz [...] florz.de>
To: Wim Lewis via RT <bug-Encode [...] rt.cpan.org>
Date: Thu, 19 Sep 2013 13:46:00 +0200
Subject: Re: [rt.cpan.org #67569] incorrect unfolding and other decoding bugs in Encode::MIME::RFC2047
Hi,

> The tests are the ones decoding $bheader and $qheader in t/mime-header.t. The relevant bits of text are
>
> From:=?UTF-8?B?IOWwj+mjvCDlvL4g?=<dankogai@dan.co.jp>
> To: dankogai@dan.co.jp (=?UTF-8?B?5bCP6aO8?==Kogai,=?UTF-8?B?IOW8vg==?==
>  Dan)
>
> In both cases, the patched MIME::RFC2047 decoder doesn't translate the encoded-words which are run together with adjacent tokens, and I think the patched behavior is more correct. The first line would be decodable by a program which tokenized the header and passed only the 2047-encodable phrases to Encode, but the To: header shouldn't be decodable by an RFC2047 6.1(2) compliant decoder.

Well, formally, it (the comment, that is) certainly should be decodable, and it should decode to "=?UTF-8?B?5bCP6aO8?==Kogai,=?UTF-8?B?IOW8vg==?== Dan" ;-)

But, yeah, I agree, the test really doesn't make much sense. If $bheader is supposed to be an RFC2047 encoded string, then the decoding in $dheader is wrong, the only thing a correct decoder should change during decoding in that case is (part of) the Subject: field (strictly, only the last three atoms, the first one is not an encoded-word due to its length), and unfold some of the line breaks. Rejecting the input presumably would also be OK as the line breaks in parts don't actually follow the rules for RFC2047 encoded strings.

If, on the other hand, this is supposed to be a full set of RFC822 message headers, then putting that through an RFC2047 decoder makes no sense at all, you might just as well try an HTML parser. This is an RFC2047 parser, not an RFC822 parser.

I just noticed, though, that these test cases that I submitted probably are wrong as well:

"=?us-ascii?q?foo?==?us-ascii?q?bar?=" => "foo=?us-ascii?q?bar?="
"foo =?us-ascii?q?=20?==?us-ascii?q?bar?=" => "foo =?us-ascii?q?bar?="

The correct decodings probably rather should look like this:

"=?us-ascii?q?foo?==?us-ascii?q?bar?=" => "=?us-ascii?q?foo?==?us-ascii?q?bar?="
"foo =?us-ascii?q?=20?==?us-ascii?q?bar?=" => "foo =?us-ascii?q?=20?==?us-ascii?q?bar?="

If someone finally manages to merge this bugfix, I might be willing to fix the parser to handle those cases correctly as well, but for now the fix as it is should still be much better than the current state of affairs.

Regards, Florian
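To make the test cases from this thread executable, here is a small stand-alone Perl sketch of a *text decoder, checked against the expectations as corrected in the message above. It is not the code from the patch and not Encode's implementation: `decode_text` is a made-up name, only Q encoding is handled, the charset label is ignored (raw bytes are returned), and the 75-character limit on encoded-words is not checked.

```perl
use strict;
use warnings;

sub decode_text {
    my ($str) = @_;

    # Unfold: drop the line break, keep the whitespace after it.
    $str =~ s/(?:\r\n|[\r\n])(?=[ \t])//g;

    my ( $out, $ws, $prev_encoded ) = ( '', '', 0 );
    for my $tok ( split /([ \t]+)/, $str ) {
        if ( $tok =~ /\A[ \t]+\z/ ) {
            $ws = $tok;    # hold whitespace until we see what follows it
            next;
        }
        # A token counts as an encoded-word only if it is one in its
        # entirety, i.e. it is delimited by whitespace or string ends.
        if ( $tok =~ /\A=\?[-0-9A-Za-z_]+\?[Qq]\?([\x21-\x3e\x40-\x7e]+)\?=\z/ ) {
            my $text = $1;
            $text =~ tr/_/ /;
            $text =~ s/=([0-9A-Fa-f]{2})/chr(hex($1))/ge;
            # Whitespace between two adjacent encoded-words is dropped.
            $out .= $ws unless $prev_encoded;
            $out .= $text;
            $prev_encoded = 1;
        }
        else {
            $out .= $ws . $tok;
            $prev_encoded = 0;
        }
        $ws = '';
    }
    return $out . $ws;
}

my $run_together = "=?us-ascii?q?foo?==?us-ascii?q?bar?=";
my @cases = (
    [ "foo =?us-ascii?q?bar?=",                     "foo bar" ],
    [ "foo\r\n =?us-ascii?q?bar?=",                 "foo bar" ],
    [ "=?us-ascii?q?foo?= bar",                     "foo bar" ],
    [ "=?us-ascii?q?foo?=\r\n bar",                 "foo bar" ],
    [ "foo\r\n bar",                                "foo bar" ],
    [ "=?us-ascii?q?foo?= =?us-ascii?q?bar?=",      "foobar" ],
    [ "=?us-ascii?q?foo?=\r\n =?us-ascii?q?bar?=",  "foobar" ],
    [ "foo=?us-ascii?q?bar?=",                      "foo=?us-ascii?q?bar?=" ],
    [ $run_together,                                $run_together ],
    [ "=?us-ascii?q?foo\r\n bar?=",                 "=?us-ascii?q?foo bar?=" ],
);

for my $case (@cases) {
    my ( $in, $want ) = @$case;
    my $got = decode_text($in);
    print( ( $got eq $want ? "ok" : "not ok" ), " - $want\n" );
}
```

Splitting on whitespace first is a design shortcut: it makes the "must be delimited by whitespace" rule fall out of the tokenization instead of needing lookbehind assertions as in the patch.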
At least the broken decoding is consistent with the encoding in eating white spaces on line wraps. Given the header

$h = "From: The quick brown fox runs over the lazy dog \N{WOLF FACE} <wolfy\@example.com>";

Encode produces the incorrect output:

From:=?UTF-8?Q?=20The=20quick=20brown=20fox=20?= =?UTF-8?Q?runs=20over=20the=20lazy=20do?= =?UTF-8?Q?g=20=F0=9F=90=BA=20?=< wolfy@example.com>

There are two things wrong here:

First, when a properly implemented decode is run against that, there is a space between the "<" and the "wolfy". The broken decoder in Encode::MIME::Header eats that space, so it is self-consistent, yet wrong.

Secondly, RFC 2047 forbids encoding the address part. Encode does that as well even though the documents state it will not encode parts that are not supposed to be. Given this header:

my $h = "From: The quick brown fox runs over the lazy dog \N{WOLF FACE} <wolfy\N{WOLF FACE}\@example.com>";

The output looks like this:

From:=?UTF-8?Q?=20The=20quick=20brown=20fox=20?= =?UTF-8?Q?runs=20over=20the=20lazy=20do?= =?UTF-8?Q?g=20=F0=9F=90=BA=20?=< =?UTF-8?Q?wolfy=F0=9F=90=BA=40example=2Ecom?=>

Now, I'm not sure what you're supposed to do with that utf8 character in the address part, but 2047 says don't mess with it. Sending it raw works with at least some mail servers and clients.
From: Florian Zumbiehl <florz [...] florz.de>
To: Vivek Khera via RT <bug-Encode [...] rt.cpan.org>
Date: Thu, 29 Oct 2015 21:28:11 +0100
Subject: Re: [rt.cpan.org #67569] incorrect unfolding and other decoding bugs in Encode::MIME::RFC2047
Hi,

> Now, I'm not sure what you're supposed to do with that utf8 character in the address part, but 2047 says don't mess with it. Sending it raw works with at least some mail servers and clients.

You essentially have it all backwards. RFC2047, as far as address fields are concerned, is for encoding the display name, not for "encoding an address field" - feeding an address field into an RFC2047 encoder is a type error.

This supposed RFC2047 encoder still is horribly broken, as this:

> my $h = "From: The quick brown fox runs over the lazy dog \N{WOLF FACE} <wolfy\N{WOLF FACE}\@example.com>";

Would have to be encoded correctly into something like this:

"=?UTF-8?Q?From=3A?= The quick brown fox runs over the lazy dog =?UTF-8?Q?=F0=9F=90=BA_=3Cwolfy=F0=9F=90=BA=40example=2Ecom=3E?="

And then you could append an address and prepend "From:" in order to use this rather weird display name in the source address of an email.

Regards, Florian
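The workflow described above (encode the display name only, then append the address and prepend the field name) might look like this in Perl. It is a sketch under assumptions: `build_from_header` is a hypothetical helper, the display name is assumed to be plain text with no characters that need RFC 5322 quoting, and the exact encoded-word layout produced by `encode('MIME-Q', ...)` depends on the installed Encode version, so no expected output is shown.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::MIME::Header;    # provides the MIME-Header, MIME-B and MIME-Q encodings

# Hypothetical helper: only the display name goes through the RFC 2047
# encoder. The address is appended untouched and the field name is
# prepended afterwards, so neither is ever seen by the encoder.
sub build_from_header {
    my ( $display_name, $address ) = @_;
    my $encoded_name = encode( 'MIME-Q', $display_name );
    return "From: $encoded_name <$address>";
}

my $header = build_from_header(
    "The quick brown fox \x{1F43A}",    # U+1F43A is WOLF FACE
    'wolfy@example.com',
);
print "$header\n";
```

Keeping the address out of the encoder avoids both problems raised earlier in the thread: the address is never turned into an encoded-word, and no stray whitespace ends up inside the angle brackets.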
cf. https://rt.cpan.org/Ticket/Display.html?id=88717
From: Florian Zumbiehl <florz [...] florz.de>
To: Dan Kogai via RT <bug-Encode [...] rt.cpan.org>
Date: Fri, 22 Jan 2016 18:48:32 +0100
Subject: Re: [rt.cpan.org #67569] Resolved: incorrect unfolding and other decoding bugs in Encode::MIME::RFC2047
> <URL: https://rt.cpan.org/Ticket/Display.html?id=67569 >
>
> According to our records, your request has been resolved. If you have any
> further questions or concerns, please respond to this message.
That's obviously bullshit.
It should be fixed in 2.83.

