
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 18105
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: jgmyers [...] proofpoint.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 2.14
Fixed in: (no value)



Subject: UTF-8 decodes illegal (non)character U+FFFE
No input should cause the UTF-8 decoder to produce illegal characters; any such character should be replaced with U+FFFD. The attached script generates the following output and warning:

fffe
Unicode character 0xfffe is illegal at utf8-nonchar.pl line 11.

It should instead produce:

fffd

and no warning.
Subject: utf8-nonchar.pl
use Encode;
use strict;
use warnings;

my $text = "aaa\xef\xbf\xbebbb";
my $utf = Encode::decode('UTF-8', $text, 0);
printf "%x\n", ord(substr($utf, 3, 1));
$utf =~ /\b(?:https?|ftp)/o;
The same goes for U+1FFFE (UTF-8 bytes F0 9F BF BE). Presumably every U+xFFFE code point up to U+10FFFE is affected, but I haven't tested that.
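For readers checking the byte sequences quoted above, here is a minimal sketch of standard UTF-8 encoding in C. The function name utf8_encode is hypothetical and is not part of Encode or perl. The sketch deliberately does not reject surrogates or noncharacters; it only shows how the bytes are derived from the code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[4].
 * Returns the number of bytes written, or 0 if cp is above U+10FFFF.
 * Hypothetical helper for illustration; performs no surrogate or
 * noncharacter checks. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                 /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {            /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                        /* beyond the Unicode range */
}
```

With this sketch, utf8_encode(0xFFFE, buf) yields EF BF BE (the bytes embedded in the test script) and utf8_encode(0x1FFFE, buf) yields F0 9F BF BE.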
From: jgmyers [...] proofpoint.com
This also affects U+1FFFF, U+2FFFF, and so on up to U+10FFFF, as well as U+FDD0 through U+FDEF. I filed perl #38722 on the underlying bug in Perl_utf8n_to_uvuni().
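The affected code points listed so far are the Unicode noncharacters: the last two code points of every plane, plus U+FDD0 through U+FDEF. A standalone C sketch of that classification might look like the following. The names is_nonchar and count_nonchars are hypothetical and do not exist in Encode or perl, and the range check against U+10FFFF is an addition for the standalone case.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if cp is a Unicode noncharacter.
 * Hypothetical helper for illustration only. */
bool is_nonchar(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;                      /* not a code point at all */
    /* Masking with 0xFFFE ignores the lowest bit, so this matches both
     * xFFFE and xFFFF in every plane. */
    return (cp & 0xFFFE) == 0xFFFE
        || (cp >= 0xFDD0 && cp <= 0xFDEF);
}

/* Count noncharacters by brute force over the whole code space. */
unsigned count_nonchars(void)
{
    unsigned n = 0;
    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++)
        if (is_nonchar(cp))
            n++;
    return n;
}
```

count_nonchars() comes to 66: two code points in each of the 17 planes, plus the 32 in U+FDD0..U+FDEF.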
From: jgmyers [...] proofpoint.com
Proposed fix.
[Only in Encode-2.12-1utf8nonchar/: listing of build products (blib, Makefile, pm_to_blib, generated *.c, *.o, *.bs, *_t.* files, and editor backups such as Encode.xs~) omitted]

diff -ru Encode-2.12-0orig/Encode.xs Encode-2.12-1utf8nonchar/Encode.xs
--- Encode-2.12-0orig/Encode.xs  2006-03-13 10:09:45.000000000 -0800
+++ Encode-2.12-1utf8nonchar/Encode.xs  2006-03-13 11:19:59.000000000 -0800
@@ -335,6 +335,10 @@
         if (strict && uv > PERL_UNICODE_MAX)
             ulen = -1;
 #endif
+        /* Work around perl #38722 */
+        if (strict && ((uv & 0xFFFE) == 0xFFFE ||
+                       (uv >= 0xFDD0 && uv <= 0xFDEF)))
+            ulen = -1;
         if (ulen == -1) {
             if (strict) {
                 uv = utf8n_to_uvuni(s, e - s, &ulen,
From: jgmyers [...] proofpoint.com
Updated proposed fix. Needed to adjust a test case to avoid a problematic character.
diff -ru Encode-2.12-0orig/Encode.xs Encode-2.12-1utf8nonchar/Encode.xs
--- Encode-2.12-0orig/Encode.xs  2006-03-13 10:09:45.000000000 -0800
+++ Encode-2.12-1utf8nonchar/Encode.xs  2006-03-13 11:19:59.000000000 -0800
@@ -335,6 +335,10 @@
         if (strict && uv > PERL_UNICODE_MAX)
             ulen = -1;
 #endif
+        /* Work around perl #38722 */
+        if (strict && ((uv & 0xFFFE) == 0xFFFE ||
+                       (uv >= 0xFDD0 && uv <= 0xFDEF)))
+            ulen = -1;
         if (ulen == -1) {
             if (strict) {
                 uv = utf8n_to_uvuni(s, e - s, &ulen,
diff -ru Encode-2.12-0orig/t/utf8strict.t Encode-2.12-1utf8nonchar/t/utf8strict.t
--- Encode-2.12-0orig/t/utf8strict.t  2006-03-13 10:09:43.000000000 -0800
+++ Encode-2.12-1utf8nonchar/t/utf8strict.t  2006-03-13 13:46:54.000000000 -0800
@@ -43,7 +43,7 @@
 %SEQ = (
     qq/ed 9f bf/    => 0, # 2.3.1
     qq/ee 80 80/    => 0, # 2.3.2
-    qq/f4 8f bf bf/ => 0, # 2.3.3
+    qq/f4 8f bf bd/ => 0, # 2.3.3
     qq/f4 90 80 80/ => 1, # 2.3.4 -- out of range so NG
     # "3 Malformed sequences" are checked by perl.
     # "4 Overlong sequences" are checked by perl.
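To see why the 2.3.3 test case had to change, it helps to decode the two byte sequences by hand. The following C sketch does that for 4-byte sequences only. utf8_decode4 and UTF8_INVALID are hypothetical names, not Encode or perl internals, and the function checks only the basic byte patterns and the U+10000..U+10FFFF range.

```c
#include <stdint.h>

#define UTF8_INVALID UINT32_MAX   /* hypothetical sentinel for bad input */

/* Decode exactly one 4-byte UTF-8 sequence.
 * Returns the code point, or UTF8_INVALID if the bytes are not a
 * well-formed 4-byte sequence in the range U+10000..U+10FFFF. */
uint32_t utf8_decode4(const unsigned char s[4])
{
    if ((s[0] & 0xF8) != 0xF0)            /* lead byte must be 11110xxx */
        return UTF8_INVALID;
    for (int i = 1; i < 4; i++)
        if ((s[i] & 0xC0) != 0x80)        /* continuation: 10xxxxxx */
            return UTF8_INVALID;

    uint32_t cp = ((uint32_t)(s[0] & 0x07) << 18)
                | ((uint32_t)(s[1] & 0x3F) << 12)
                | ((uint32_t)(s[2] & 0x3F) << 6)
                |  (uint32_t)(s[3] & 0x3F);

    if (cp < 0x10000 || cp > 0x10FFFF)    /* overlong or out of range */
        return UTF8_INVALID;
    return cp;
}
```

Under this sketch, f4 8f bf bf decodes to U+10FFFF, which is a noncharacter and would now be rejected in strict mode, while f4 8f bf bd decodes to U+10FFFD, the highest code point that is not a noncharacter. f4 90 80 80 would be U+110000 and is rejected as out of range, matching the 2.3.4 entry.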
On Mon Mar 13 16:54:08 2006, guest wrote:
> Updated proposed fix. Needed to adjust a test case to avoid a
> problematic character.
The test in your attachment passes on Encode 2.17, so I consider this one fixed.

Dan the Encode Maintainer

