
This queue is for tickets about the MIME-tools CPAN distribution.

Report information
The Basics
Id: 5462
Status: resolved
Priority: 0/
Queue: MIME-tools

People
Owner: dfs+pause [...] roaringpenguin.com
Requestors: jonas [...] paranormal.se
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 5.411a
Fixed in: (no value)



Subject: MIME::Words::encode_mimewords strips spaces
MIME::Words::encode_mimewords removes the space between two consecutive words when both of them get MIME-encoded.

I'm currently working around this problem by doing:

    sub encode_mimewords
    {
        use MIME::Words;
        my $string = MIME::Words::encode_mimewords($_[0]);
        # put the lost space back as "_" at the start of each following encoded word
        $string =~ s/\?= =\?ISO-8859-1\?Q\?/?= =?ISO-8859-1?Q?_/g;
        return $string;
    }
From: christian.jaeger-rtcpanorg [...] ethlife.ethz.ch
[JONAS - Thu Feb 26 08:18:16 2004]:
> MIME::Words::encode_mimewords removes the spaces between words in the
> cases there two words in a row is mime encoded.
..
> $string =~ s/\?= =\?ISO-8859-1\?Q\?/?= =?ISO-8859-1?Q?_/g;

Hm, I've also seen that two MIME-encoded words (two encoded tokens separated only by a space, as in ...?= =?...) are displayed without the space when read in Eudora. But then I noticed that SquirrelMail displays them with a space, so I concluded that it was a bug in Eudora. I haven't read the specs, so I can't say which is correct; I just wanted to let you know about the differing client behaviour.

Cheers
Christian
From: Olivier Salaun [...] cru.fr
Mozilla (1.6) interprets encoded subjects the same way. RFC 1522 seems to indicate that spaces between encoded-words are ignored:
ftp://ftp.cru.fr/pub/reseau/RFCs/rfc1522.txt, chapter 5

    ... an encoded-word that appears in a header field defined as **text** MUST be separated from any adjacent encoded-word or **text** by linear-white-space.

Therefore, encoding a string like "word1 word2" should look like "=?charset?Q?encoded_word1_?= =?charset?Q?encoded_word2?=". Note that the '_' at the end of encoded_word1 is needed to preserve the white-space between the words.

[guest - Thu Aug 5 20:34:36 2004]:
> Hm, I've seen too that two mime encoded words (two encoded tokens, only
> separated by a space like in ...?= =?...) would be viewed without the
> space when read in Eudora. But then I've noticed that squirrelmail
> displays that with a space. So I then concluded that it was a bug in
> Eudora. I've not read the specs, so can't decide what's correct, but
> just wanted to let you know about the differing client behaviour.
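The decoding rule discussed here can be checked with any RFC 2047-aware decoder. A minimal sketch, assuming Perl's core Encode module and its 'MIME-Header' decoder as a stand-in for a mail client (the sample strings are made up for illustration; this is not MIME::Words itself):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two adjacent encoded-words separated only by a space: a conforming
# decoder treats that space as a separator and drops it.
my $joined = decode('MIME-Header',
    '=?ISO-8859-1?Q?h=E9?= =?ISO-8859-1?Q?h=E9?=');

# Same header, but the space is carried inside the first encoded-word
# as "_", so it is part of the encoded text and survives decoding.
my $spaced = decode('MIME-Header',
    '=?ISO-8859-1?Q?h=E9_?= =?ISO-8859-1?Q?h=E9?=');

binmode STDOUT, ':encoding(UTF-8)';
print "without underscore: $joined\n";
print "with underscore:    $spaced\n";
```

If the first line prints the two words run together and the second keeps the space, the decoder follows the behaviour reported for Eudora and Mozilla in this thread.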
From: Alexey Mahotkin <alexm [...] hsys.msk.ru>
[JONAS - Thu Feb 26 08:18:16 2004]:
> MIME::Words::encode_mimewords removes the spaces between words in the
> cases there two words in a row is mime encoded.

Jonas, please try the attached patch. Thank you.

>
> I'm currently working around this problem by doing:
>
> sub encode_mimewords
> {
>     use MIME::Words;
>     my $string = MIME::Words::encode_mimewords($_[0]);
>     $string =~ s/\?= =\?ISO-8859-1\?Q\?/?= =?ISO-8859-1?Q?_/g;
>     return $string;
> }
Content-Disposition: inline; filename="encode_mimewords.patch"
? Makefile
? blib
? encode_mimewords.patch
? pm_to_blib
? testout
Index: ChangeLog
===================================================================
RCS file: /home/cvs/src/perl-MIME-tools/ChangeLog,v
retrieving revision 1.1.1.1
retrieving revision 1.2
diff -u -r1.1.1.1 -r1.2
--- ChangeLog 2004/10/28 17:12:16 1.1.1.1
+++ ChangeLog 2004/10/28 18:05:22 1.2
@@ -1,3 +1,7 @@
+2004-10-28 Alexey Mahotkin <alexm:eternal-eval.com>
+
+    * Made encode_mimewords fully compliant to RFC1522
+
 2004-10-27 David F. Skoll <dfs@roaringpenguin.com>

     * VERSION 5.415 RELEASED
Index: lib/MIME/Words.pm
===================================================================
RCS file: /home/cvs/src/perl-MIME-tools/lib/MIME/Words.pm,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -r1.1.1.1 -r1.4
--- lib/MIME/Words.pm 2004/10/28 17:12:16 1.1.1.1
+++ lib/MIME/Words.pm 2004/10/28 18:05:22 1.4
@@ -267,7 +267,7 @@

 I<Function.>
 Given a RAW string, try to find and encode all "unsafe" sequences
-of characters:
+of characters, according to RFC1522:

     ### Encode a string with some unsafe "words":
     $encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB");
@@ -292,13 +292,6 @@

 =back

-B<Warning:> this is a quick-and-dirty solution, intended for character
-sets which overlap ASCII. B<It does not comply with the RFC-1522
-rules regarding the use of encoded words in message headers>.
-You may want to roll your own variant,
-using C<encoded_mimeword()>, for your application.
-I<Thanks to Jan Kasprzak for reminding me about this problem.>
-
 =cut

 sub encode_mimewords {
@@ -306,17 +299,60 @@
     my $charset = $params{Charset} || 'ISO-8859-1';
     my $encoding = lc($params{Encoding} || 'q');

-    ### Encode any "words" with unsafe characters.
-    ### We limit such words to 18 characters, to guarantee that the
-    ### worst-case encoding give us no more than 54 + ~10 < 75 characters
-    my $word;
-    $rawstr =~ s{([a-zA-Z0-9\x7F-\xFF]{1,18})}{ ### get next "word"
-        $word = $1;
-        (($word !~ /[$NONPRINT]/o)
-         ? $word ### no unsafe chars
-         : encode_mimeword($word, $encoding, $charset)); ### has unsafe chars
-    }xeg;
-    $rawstr;
+    my $safe_chars = "-+*/=_!A-Za-z0-9";
+    my $re = "[$safe_chars]";
+    my $nre = "[^$safe_chars]";
+
+    my $result = "";
+    my $current = $rawstr;
+
+    while ($current ne "") {
+        if ($current =~ s/^(([$safe_chars]|\s)+)//) {
+            # safe chars (w/spaces) are handled as-is
+            $result .= $1;
+            next;
+        } elsif ($current =~ s/^(([^$safe_chars]|\s)+)//) {
+            # unsafe chars (w/spaces) are encoded
+            my $unsafe_chars = $1;
+          CHUNK75:
+            while ($unsafe_chars ne "") {
+
+                my $full_len = length($unsafe_chars);
+                my $len = 1;
+                my $prev_encoded = "";
+
+                while ($len <= $full_len) {
+                    # we try to encode next beginning of unsafe string
+                    my $possible = substr $unsafe_chars, 0, $len;
+                    my $encoded = encode_mimeword($possible, $encoding, $charset);
+
+                    if (length($encoded) < 75) {
+                        # if it could be encoded in specified maximum length, try
+                        # bigger beginning...
+                        $prev_encoded = $encoded;
+                    } else {
+                        #
+                        # ...otherwise, add encoded chunk which still fits, and
+                        # restart with rest of unsafe string
+                        $result .= $prev_encoded;
+                        $prev_encoded = "";
+                        substr $unsafe_chars, 0, $len - 1, "";
+                        next CHUNK75;
+                    }
+
+                    # if we have reached the end of the string, add final
+                    # encoded chunk
+                    if ($len == $full_len) {
+                        $result .= $encoded;
+                        last CHUNK75;
+                    }
+
+                    $len++;
+                }
+            }
+        }
+    }
+    return $result;
 }

 1;
@@ -331,10 +367,11 @@

 MIME::Base64 and MIME::QuotedPrint.

-=head1 AUTHOR
+=head1 AUTHORS

 Eryq (F<eryq@zeegee.com>), ZeeGee Software Inc (F<http://www.zeegee.com>).
 David F. Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com
+Alexey Mahotkin (alexm:eternal-eval.com) http://eternal-eval.com/

 All rights reserved. This program is free software; you can redistribute
 it and/or modify it under the same terms as Perl itself.
Index: t/Words.t
===================================================================
RCS file: /home/cvs/src/perl-MIME-tools/t/Words.t,v
retrieving revision 1.1.1.1
retrieving revision 1.4
diff -u -r1.1.1.1 -r1.4
--- t/Words.t 2004/10/28 17:12:16 1.1.1.1
+++ t/Words.t 2004/10/28 18:08:52 1.4
@@ -4,7 +4,7 @@

 use ExtUtils::TBone;
 use MIME::QuotedPrint qw(decode_qp);
-use MIME::Words qw(decode_mimewords);
+use MIME::Words qw(decode_mimewords encode_mimewords);

 #------------------------------------------------------------
 # BEGIN
@@ -12,8 +12,22 @@

 # Create checker:
 my $T = typical ExtUtils::TBone;
-$T->begin(10);
+# we test each non-empty line in subjects.txt
+open WORDS, "<testin/subjects.txt" or die "open: $!";
+my $subjects_count = 0;
+while (my $line = <WORDS>) {
+    next if ($line =~ /^\s*$/);
+    $subjects_count++;
+}
+close WORDS;
+
+
+# for each line we do 4 tests:
+# whether each line correctly encodes/decodes, twice for each encoding
+# whethere encoded chunks are smaller than 75 bytes
+$T->begin(10 + $subjects_count * 4);
+

 {
     local($/) = '';
     open WORDS, "<testin/words.txt" or die "open: $!";
@@ -36,6 +50,47 @@
     }
     close WORDS;
 }
+
+{
+    open WORDS, "<testin/subjects.txt" or die "open: $!";
+    while (my $line = <WORDS>) {
+        chomp $line;
+        next if ($line =~ /^\s*$/);
+
+        foreach my $encoding (qw(q b)) {
+            my $encoded = encode_mimewords($line,
+                                           Encoding => $encoding,
+                                           );
+            my $decoded = decode_mimewords($encoded,
+                                           Encoding => $encoding,
+                                           );
+            if ($line eq $decoded) {
+                # warn "ok: $line\nencoded: $encoded\ndecoded: $decoded\n";
+                $T->ok( 1 );
+            } else {
+
+                warn "in: $line\nencoded: $encoded\ndecoded: $decoded\n";
+
+                $T->ok( 0 );
+            }
+
+            my $failed_token = "";
+            while ($encoded =~ /(=\?[^\?]+\?[bq]\?[^\?]+\?=)/ig) {
+                if (length($1) > 75) {
+                    $failed_token = $1;
+                }
+            }
+            if ($failed_token ne "") {
+                warn "failed_token: '$failed_token'";
+                $T->ok(0);
+            } else {
+                $T->ok(1);
+            }
+        }
+    }
+
+    close WORDS;
+}

 # Done!
 $T->end;
Index: testin/subjects.txt
===================================================================
RCS file: subjects.txt
diff -N subjects.txt
--- /dev/null Wed May 6 00:32:27 1998
+++ /tmp/cvsfOgeLU Thu Oct 28 22:08:59 2004
@@ -0,0 +1,19 @@
+Á
+Á
+ÁÂ×
+ÁÂ×Ç
+ÁÂ×ÇÄ
+ÍÁÍÁ ÍÙÌÁ ÒÁÍÕ
+
+a
+ab
+abc
+abcd
+hello world
+
+hello ÒÕÓÓËÉÊ
+hello ÒÕÓÓËÉÊ hello
+
+ÒÕÓÓËÉÊ a ÒÕÓÓËÉÊ b ÒÕÓÓËÉÊ c ÒÕÓÓËÉÊ d ÒÕÓÓËÉÊ e ÒÕÓÓËÉÊ
+ËÁÖÄÙÊ ÏÈÏÔÎÉË ÖÅÌÁÅÔ ÚÎÁÔØ, ÇÄÅ ÓÉÄÉÔ ÆÁÚÁÎ ÓßÅÛØ ÅÝ£ ÜÔÉÈ ÍÑÇËÉÈ ÆÒÁÎÃÕÚÓËÉÊ ÂÕÌÏÞÅË, ÄÁ ×ÙÐÅÊ ÞÁÀ
+
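The chunking idea in this patch (grow a prefix until its encoded form would exceed the length limit, emit what still fits, continue with the rest) can be sketched independently of MIME::Words. The following is a simplified illustration and not the patch itself: q_word and chunked_words are hypothetical helper names, only the Q encoding is handled, the input is assumed to be bytes in a single-byte charset, and the limit is taken to be the 75-character maximum that RFC 2047 sets for one encoded-word.

```perl
use strict;
use warnings;

# Hypothetical helper: Q-encode a run of bytes as ONE encoded-word.
# Everything except ASCII letters, digits and space is hex-escaped;
# a space is written as "_".
sub q_word {
    my ($bytes, $charset) = @_;
    (my $q = $bytes) =~ s/([^A-Za-z0-9 ])/sprintf('=%02X', ord $1)/ge;
    $q =~ tr/ /_/;
    return "=?$charset?Q?$q?=";
}

# Hypothetical helper: greedily cut $bytes into encoded-words of at
# most 75 characters each and join them with a single space.
sub chunked_words {
    my ($bytes, $charset) = @_;
    my @words;
    while (length $bytes) {
        my $len = 1;
        # Extend the prefix while one more byte still fits in 75 chars.
        $len++ while $len < length($bytes)
            && length(q_word(substr($bytes, 0, $len + 1), $charset)) <= 75;
        # 4-argument substr removes the prefix from $bytes and returns it.
        push @words, q_word(substr($bytes, 0, $len, ''), $charset);
    }
    return join ' ', @words;
}

print chunked_words("\xE9" x 30, 'ISO-8859-1'), "\n";
```

Because spaces inside a chunk are encoded as "_", the single space used to join the encoded-words carries no content and can safely be dropped by the decoder. Cutting on bytes like this is only safe for single-byte charsets; a multi-byte charset would need the cut to respect character boundaries.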
From: os [...] cru.fr
We are using MIME-tools and MIME::Words in the Sympa software (http://www.sympa.org). This problem with stripped spaces was also reported to us: Sympa needs to decode and then re-encode Subject header fields, and that round trip removes the spaces.

Here is a short script that demonstrates the problem:

    #!/usr/bin/perl

    use MIME::Words qw(:all);

    $s1 = 'hé hé';
    # the option is named 'Encoding'; 'q' is the default anyway
    $s2 = encode_mimewords($s1, ('Encoding' => 'Q', 'Charset' => 'iso-8859-1'));
    $s3 = decode_mimewords($s2);

    printf "S1: %s\nS2: %s\nS3: %s\n", $s1, $s2, $s3;

    ## here is the output:
    S1: hé hé
    S2: =?ISO-8859-1?Q?h=E9?= =?ISO-8859-1?Q?h=E9?=
    S3: héhé

We look forward to a patch for this problem.
Hi,

we stumbled over this bug today, too. I've prepared a patch for this problem myself, because I was impulsive enough not to look at rt.cpan.org before solving the problem on my own. The attached patch also fixes the problem of quoted-printable-encoded strings still containing spaces, which is invalid according to RFC1521 and RFC2047.

Since this problem appears to be quite old and the fix is simple, I'd appreciate it if you could pick a solution and apply it. Thank you :)

Regards,
-octo
Content-Disposition: inline; filename="encode_mimewords-octo.patch"
diff -pur a/lib/MIME/Words.pm b/lib/MIME/Words.pm
--- a/lib/MIME/Words.pm 2006-03-17 22:03:23.000000000 +0100
+++ b/lib/MIME/Words.pm 2006-11-23 16:28:21.000000000 +0100
@@ -117,7 +117,8 @@ sub _decode_Q {
 # almost, but not exactly, quoted-printable. :-P
 sub _encode_Q {
     my $str = shift;
-    $str =~ s{([_\?\=$NONPRINT])}{sprintf("=%02X", ord($1))}eog;
+    $str =~ s{([_\?\=\s$NONPRINT])}{sprintf("=%02X", ord($1))}eog;
+    $str =~ s/=20/_/g;
     $str;
 }

@@ -306,17 +307,41 @@ sub encode_mimewords {
     my $charset = $params{Charset} || 'ISO-8859-1';
     my $encoding = lc($params{Encoding} || 'q');

-    ### Encode any "words" with unsafe characters.
-    ### We limit such words to 18 characters, to guarantee that the
-    ### worst-case encoding give us no more than 54 + ~10 < 75 characters
-    my $word;
-    $rawstr =~ s{([a-zA-Z0-9\x7F-\xFF]{1,18})}{ ### get next "word"
-        $word = $1;
-        (($word !~ /[$NONPRINT]/o)
-         ? $word ### no unsafe chars
-         : encode_mimeword($word, $encoding, $charset)); ### has unsafe chars
-    }xeg;
-    $rawstr;
+    my $return = '';
+
+    while ($rawstr =~ m!
+        ^([^$NONPRINT]*\s+)?            # Words that don't need quoting
+        (
+            \S*[$NONPRINT]\S*           # Word that needs quoting
+            (?:\s+\S*[$NONPRINT]\S*)*   # More words that need quoting
+        )
+        (.*)$                           # Rest of the string, to get around using $'.
+        !x)                             # look, no /g modifier!
+    {
+        my $match = $2;
+
+        $return .= $1;
+        $rawstr = $3;
+
+        while ($match)
+        {
+            my $i = length ($match);
+            $i = 68 if ($i > 68);
+
+            # While there is no limit to the length of a multiple-line
+            # header field, each line of a header field that contains
+            # one or more 'encoded-word's is limited to 76 characters.
+            # -- RFC2047
+            while (length (encode_mimeword (substr ($match, 0, $i))) > 74)
+            {
+                $i--;
+            }
+            $return .= encode_mimeword (substr ($match, 0, $i));
+            $match = substr ($match, $i);
+            if ($match) { $return .= ' '; }
+        }
+    }
+    return ($return);
 }

 1;
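The first hunk of this patch changes how a space inside an encoded-word is represented. A standalone sketch of that idea follows; encode_q_text is a hypothetical name, and the $NONPRINT range is an assumption meant to resemble the module's character class rather than a copy of it.

```perl
use strict;
use warnings;

# Assumption: control characters and all 8-bit bytes count as non-printable.
my $NONPRINT = '\x00-\x1F\x7F-\xFF';

sub encode_q_text {
    my ($str) = @_;
    # Hex-escape "_", "?", "=", whitespace and non-printable bytes ...
    $str =~ s{([_\?\=\s$NONPRINT])}{sprintf('=%02X', ord $1)}eg;
    # ... then switch to the shorter "_" form that the Q encoding
    # allows for a plain space.
    $str =~ s/=20/_/g;
    return $str;
}

print encode_q_text("h\xE9 h\xE9"), "\n";
```

A literal "_" in the input is escaped as =5F before the second substitution runs, so an underscore in the output always stands for a space and the encoding stays unambiguous.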
From: martini [...] cpan.org
Hi,

I also got this problem! :-(((

Anyway, there is another two-line fix for it:

http://bugs.otrs.org/show_bug.cgi?id=1428#c4

Please fix it in a future release!!!

Thx,

-Martin
From: martini [...] cpan.org
Just the patch for Words.pm:

-----------------------------------------------------
--- Words.pm Fri Aug 10 12:29:35 2007
+++ Words.pm Fri Aug 10 13:45:57 2007
@@ -117,7 +117,7 @@
 # almost, but not exactly, quoted-printable. :-P
 sub _encode_Q {
     my $str = shift;
-    $str =~ s{([_\?\=$NONPRINT])}{sprintf("=%02X", ord($1))}eog;
+    $str =~ s{([ _\?\=$NONPRINT])}{sprintf("=%02X", ord($1))}eog;
     $str;
 }

@@ -310,7 +310,7 @@
     ### We limit such words to 18 characters, to guarantee that the
     ### worst-case encoding give us no more than 54 + ~10 < 75 characters
     my $word;
-    $rawstr =~ s{([a-zA-Z0-9\x7F-\xFF]{1,18})}{ ### get next "word"
+    $rawstr =~ s{([a-zA-Z0-9\x7F-\xFF]+\s*)}{ ### get next "word"
         $word = $1;
         (($word !~ /[$NONPRINT]/o)
          ? $word ### no unsafe chars

On Mon Dec 10 18:46:36 2007, martini2 wrote:
> Hi,
>
> I also got this problem! :-(((
>
> Anyway, there is a other 2 line fix for that:
>
> http://bugs.otrs.org/show_bug.cgi?id=1428#c4
>
>
> Please fix it in further releases!!!
>
> Thx,
>
> -Martin
Date: Tue, 11 Dec 2007 10:16:21 -0500
From: "Dave O'Neill" <dmo [...] dmo.ca>
On Mon, Dec 10, 2007 at 06:46:38PM -0500, Martin Edenhofer via RT wrote:
>
> Please fix it in further releases!!!
>

Thanks for the patch. It will appear in the next release.

Dave
Patch is in 5.426
From: me+bitcard [...] bogen.net
On Tue Mar 18 16:45:15 2008, DONEILL wrote:
> Patch is in 5.426

Just FYI, the patch is not in v5.427. :( With the patch applied to v5.427, it works fine again.

To reproduce, create a UTF-8 mail using MIME::Tools with the subject "это специальныйсабжект для теста системы тикетов". The generated subject is broken and not readable.

See also http://bugs.otrs.org/show_bug.cgi?id=3121 for more information.

Feel free to ask further questions.

-Martin
Can you provide a short testcase that triggers the problem you're seeing? Cheers, Dave
Appears to have been fixed long ago.
From: mg.pub [...] gmx.net
On Wed 28 Apr 2010, 13:29:16, DONEILL wrote:
> Appears to have been fixed long ago.

This bug is still open. Here is a small snippet to reproduce the problem:

    use utf8;    # the script contains a UTF-8 string literal
    use MIME::Words;
    use MIME::WordDecoder;
    use Encode;

    my $String  = "Служба поддержки";
    my $Encoded = MIME::Words::encode_mimewords(Encode::encode('utf-8', $String,), Charset => 'utf-8');
    my $Decoded = MIME::WordDecoder::mime_to_perl_string($Encoded);

    print "$String, $Encoded, $Decoded, " . ($String eq $Decoded ? 'equal' : 'not equal') . "\n";

With the current version 5.503 this will print:

    Служба поддержки, =?UTF-8?Q?=D0=A1=D0=BB=D1=83=D0=B6=D0=B1=D0=B0=20=D0=BF=D0=BE=D0?= =?UTF-8?Q?=B4=D0=B4=D0=B5=D1=80=D0=B6=D0=BA=D0=B8?=, Служба по\xD0\xB4держки, not equal

We worked around this problem as follows:

    sub encode_mimewords {
        my ($rawstr, %params) = @_;
        my $charset  = $params{Charset} || 'ISO-8859-1';
        my $encoding = lc($params{Encoding} || 'q');

        ### Encode any "words" with unsafe characters.
        ### We limit such words to 18 characters, to guarantee that the
        ### worst-case encoding give us no more than 54 + ~10 < 75 characters
        my $word;
        local $1;
        # ---
        # OTRS
        # ---
        # 2008-08-02 added patch/workaround for bug in MIME::Words (v5.428, maybe
        # also higher)
        # see also: http://rt.cpan.org/Public/Bug/Display.html?id=5462
        # http://bugs.otrs.org/show_bug.cgi?id=3121
        # $rawstr =~ s{([a-zA-Z0-9\x7F-\xFF]+\s*)}{ ### get next "word"
        # ---
        $rawstr =~ s{([ a-zA-Z0-9\x7F-\xFF]{1,18})}{ ### get next "word"
            $word = $1;
            (($word !~ /(?:[$NONPRINT])|(?:^\s+$)/o)
             ? $word ### no unsafe chars
             : encode_mimeword($word, $encoding, $charset)); ### has unsafe chars
        }xeg;
        $rawstr =~ s/\?==\?/?= =?/g;
        $rawstr;
    }
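In the printed result, the two bytes of one character (=D0 and =B4) end up in different encoded-words, which is why the decoded string comes out unequal. One way to avoid that, sketched here under the assumption that the input is available as a Perl character string, is to cut on character boundaries before encoding. utf8_chunks is a hypothetical helper and not part of MIME::Words; the 18-byte limit in the usage line is borrowed from the comment in the code above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: split a character string into UTF-8 byte chunks
# of at most $max_bytes each, never cutting inside a character.
# Assumes $max_bytes is at least 4 (the longest UTF-8 sequence).
sub utf8_chunks {
    my ($text, $max_bytes) = @_;
    my @chunks = ('');
    for my $char (split //, $text) {
        my $bytes = encode('UTF-8', $char);
        # Start a new chunk if this character would not fit any more.
        push @chunks, '' if length($chunks[-1]) + length($bytes) > $max_bytes;
        $chunks[-1] .= $bytes;
    }
    return grep { length } @chunks;
}

# "Служба поддержки", written with escapes so the file stays ASCII.
my $subject = "\x{421}\x{43B}\x{443}\x{436}\x{431}\x{430} "
            . "\x{43F}\x{43E}\x{434}\x{434}\x{435}\x{440}\x{436}\x{43A}\x{438}";
printf "%d bytes\n", length $_ for utf8_chunks($subject, 18);
```

Each chunk can then be handed to an encoder for a single encoded-word. RFC 2047 requires every encoded-word to represent a whole number of characters, so a byte-oriented 18-character cut is only safe for single-byte charsets.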
From: mg.pub [...] gmx.net
X-RT-Original-Encoding: utf-8
Content-Length: 90
(Activate the line in the commented part instead of the original line, and it works fine.)
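The failure reported above can be reproduced without MIME-tools at all. What follows is only a minimal sketch under stated assumptions, not the code path MIME::Words or MIME::WordDecoder actually uses: `q_payload_to_bytes` is a hypothetical helper written for this illustration, and the two payloads are copied from the output in the report. The first payload ends with the lone UTF-8 lead byte 0xD0 and the second begins with its continuation byte 0xB4, so neither encoded-word holds a whole number of characters. RFC 2047 requires each encoded-word to do so, which is why a decoder that handles the words one at a time cannot recover the text.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of MIME-tools): turn the payload of a
# Q-encoded word back into raw bytes.
sub q_payload_to_bytes {
    my ($payload) = @_;
    (my $bytes = $payload) =~ tr/_/ /;              # "_" stands for a space
    $bytes =~ s/=([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # "=XX" is one raw byte
    return $bytes;
}

# Payloads of the two encoded-words from the report. The split falls
# between the bytes 0xD0 and 0xB4 of a single two-byte character.
my @payloads = (
    '=D0=A1=D0=BB=D1=83=D0=B6=D0=B1=D0=B0=20=D0=BF=D0=BE=D0',
    '=B4=D0=B4=D0=B5=D1=80=D0=B6=D0=BA=D0=B8',
);

# The original string from the report, written with escapes so that this
# sketch does not depend on the encoding of the source file.
my $expected = "\x{421}\x{43B}\x{443}\x{436}\x{431}\x{430} "
             . "\x{43F}\x{43E}\x{434}\x{434}\x{435}\x{440}\x{436}\x{43A}\x{438}";

# Decoding each word on its own: the broken character cannot be rebuilt.
# With its default settings, Encode substitutes U+FFFD for the stray bytes.
my $per_word = join '', map { decode('UTF-8', q_payload_to_bytes($_)) } @payloads;

# Joining the raw bytes first and decoding once restores the character.
my $joined = decode('UTF-8', join '', map { q_payload_to_bytes($_) } @payloads);

print 'decoded per word:      ', ($per_word eq $expected ? 'equal' : 'not equal'), "\n";
print 'decoded after joining: ', ($joined   eq $expected ? 'equal' : 'not equal'), "\n";
```

In this sketch the per-word result contains U+FFFD replacement characters rather than the raw `\xD0\xB4` bytes shown in the report, since the substitution behaviour depends on the decoder. The underlying cause is the same in both cases: the encoder cut the string in the middle of a multi-byte character.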
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
X-RT-Original-Encoding: utf-8
Subject: Re: [rt.cpan.org #5462] MIME::Words::encode_mimewords strips spaces
Date: Wed, 30 Jan 2013 15:23:36 -0500
To: bug-MIME-tools [...] rt.cpan.org
Content-Transfer-Encoding: 7bit
From dfs [...] roaringpenguin.com Wed Jan 30 15:23:47 2013
In-Reply-To: <rt-3.8.HEAD-18700-1359472364-754.5462-5-0 [...] rt.cpan.org>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; x86_64-pc-linux-gnu)
References: <RT-Ticket-5462 [...] rt.cpan.org> <rt-3.8.HEAD-6782-1272475756-1176.5462-5-0 [...] rt.cpan.org> <rt-3.8.HEAD-7766-1359472247-624.5462-5-0 [...] rt.cpan.org> <rt-3.8.HEAD-18700-1359472364-754.5462-5-0 [...] rt.cpan.org>
Message-ID: <20130130152336.6ed8f84f [...] hydrogen.roaringpenguin.com>
Organization: Roaring Penguin Software Inc.
Return-Path: <dfs [...] roaringpenguin.com>
X-RT-Mail-Extension: mime-tools
From: "David F. Skoll" <dfs [...] roaringpenguin.com>
RT-Message-ID: <rt-3.8.HEAD-12834-1359577428-1495.5462-0-0 [...] rt.cpan.org>
Content-Length: 283
On Tue, 29 Jan 2013 10:12:44 -0500 " via RT" <bug-MIME-tools@rt.cpan.org> wrote:
> (Activate the line in the commented part instead of the original line, and it works fine.)
Thanks. I have applied your patch and it will be in the next release of MIME-tools. Regards, David.
MIME-Version: 1.0
X-Mailer: MIME-tools 5.427 (Entity 5.427)
Content-Disposition: inline
Content-Type: text/plain; charset="UTF-8"
Message-ID: <rt-3.8.HEAD-14579-1359580046-1225.5462-0-0 [...] rt.cpan.org>
Content-Transfer-Encoding: binary
X-RT-Original-Encoding: utf-8
Content-Length: 130
Hi, I've uploaded MIME-tools 5.504 to CPAN. It should appear soon in the module index, and it fixes this bug. Regards, David.