
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 46701
Status: open
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: bhawkeslewis [...] googlemail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.33
Fixed in: (no value)



Subject: Incorrect character mapping in Encode::GSM0338
As you can see from the source code:

http://cpansearch.perl.org/src/DANKOGAI/Encode-2.33/lib/Encode/GSM0338.pm

Encode::GSM maps 0x09 in GSM to lowercase c cedilla in Unicode (U+00E7).

"\x{00E7}" => "\x09", # LATIN SMALL LETTER C WITH CEDILLA

But I think this is wrong.

GSM 03.38 maps the same character to /uppercase/ c cedilla (U+00C7).

See ETSI TS 100 900 V7.2.0 (1999-07) Digital cellular telecommunications system (Phase 2+); Alphabets and language-specific information (GSM 03.38 version 7.2.0 Release 1998), Section 6.2.1 ("Default Alphabet"):

http://pda.etsi.org/exchangefolder/ts_100900v070200p.pdf

So this line needs changing to:

"\x{00C7}" => "\x09", # LATIN CAPITAL LETTER C WITH CEDILLA
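A quick way to see which mapping an installed Encode actually uses is to decode the byte and print the code point. This is only a sketch: it assumes the encoding is registered under the name 'gsm0338', and the result depends on the Encode version installed, so no particular output is claimed here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode the single GSM 03.38 byte 0x09 and report the resulting code point.
my $octets    = "\x09";
my $char      = decode('gsm0338', $octets);
my $codepoint = sprintf 'U+%04X', ord $char;
print "0x09 decodes to $codepoint\n";

# Encode both candidate letters and show the bytes produced, as hex.
for my $cp (0x00C7, 0x00E7) {
    my $bytes = encode('gsm0338', chr $cp);
    printf "U+%04X encodes to hex %s\n", $cp, unpack('H*', $bytes);
}
```

If the first line reports U+00E7, the installed module follows the mapping quoted above; U+00C7 would mean it follows the change proposed in this ticket.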
On Sat Jun 06 06:03:09 2009, benjaminhawkeslewis wrote:
> As you can see from the source code:
>
> http://cpansearch.perl.org/src/DANKOGAI/Encode-2.33/lib/Encode/GSM0338.pm
>
> Encode::GSM maps 0x09 in GSM to lowercase c cedilla in Unicode (U+00E7).
>
> "\x{00E7}" => "\x09", # LATIN SMALL LETTER C WITH CEDILLA
>
> But I think this is wrong.
>
> GSM 03.38 maps the same character to /uppercase/ c cedilla (U+00C7).
>
> See ETSI TS 100 900 V7.2.0 (1999-07) Digital cellular telecommunications
> system (Phase 2+); Alphabets and language-specific information (GSM
> 03.38 version 7.2.0 Release 1998), Section 6.2.1 ("Default Alphabet"):
>
> http://pda.etsi.org/exchangefolder/ts_100900v070200p.pdf
>
> So this line needs changing to:
>
> "\x{00C7}" => "\x09", # LATIN CAPITAL LETTER C WITH CEDILLA

But that conflicts with what http://pda.etsi.org/exchangefolder/ts_100900v070200p.pdf says. Section 6.2.1 just shows the glyph, with no Unicode code point.

Dan the Encode Maintainer
The Unicode Consortium's mapping table for GSM 03.38 has this to say on the matter:

# The ETSI GSM 03.38 specification shows an uppercase C-cedilla
# glyph at 0x09. This may be the result of limited display
# capabilities for handling characters with descenders. However, the
# language coverage intent is clearly for the lowercase c-cedilla, as shown
# in the mapping below. The mapping for uppercase C-cedilla is shown
# in a commented line in the mapping table.

The other accented characters in column 0000 of the table are mostly lowercase with no uppercase equivalents elsewhere in the mapping, so who knows.
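To make the ambiguity concrete, here is a small self-contained sketch in plain Perl. The two hashes are hypothetical and are not taken from Encode::GSM0338 or from the Unicode table itself; they only illustrate that decoding has to pick one code point for 0x09, while encoding could in principle accept both letters.

```perl
use strict;
use warnings;

# Hypothetical tables for illustration only.
# Decoding needs exactly one code point per GSM byte; this one follows
# the lowercase choice described in the Unicode Consortium's note.
my %gsm_to_unicode = (
    "\x09" => "\x{00E7}",    # LATIN SMALL LETTER C WITH CEDILLA
);

# Encoding can map both letters onto the same GSM byte.
my %unicode_to_gsm = (
    "\x{00E7}" => "\x09",    # LATIN SMALL LETTER C WITH CEDILLA
    "\x{00C7}" => "\x09",    # LATIN CAPITAL LETTER C WITH CEDILLA
);

# Round-trip both letters through the tables and show what comes back.
for my $char ("\x{00C7}", "\x{00E7}") {
    my $byte = $unicode_to_gsm{$char};
    my $back = $gsm_to_unicode{$byte};
    printf "U+%04X -> 0x%02X -> U+%04X\n", ord $char, ord $byte, ord $back;
}
```

With tables like these, only the letter chosen for decoding survives a round trip; the other one comes back with its case changed, which is why the choice at 0x09 matters.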

