
This queue is for tickets about the XML-SAX-PurePerl CPAN distribution.

Report information
The Basics
Id: 19543
Status: new
Priority: 0/
Queue: XML-SAX-PurePerl

Owner: Nobody in particular
Requestors: jmf [...]

Bug Information
Severity: Critical
Broken in: 0.80
Fixed in: (no value)

Subject: XML::SAX::PurePerl causes parse_string() to crash when handling UTF-8 combining characters
When XML::SAX is handed a string to parse that contains UTF-8 combining characters, and XML::SAX::PurePerl is the SAX parser, it dies with the error:

Cannot decode string with wide characters at /usr/local/lib/perl/5.8.4/ line 188.

I've attached a short script that demonstrates the problem on my Linux box (Debian sarge) running kernel 2.6.12-1.1372_FC3, Perl v5.8.5.

The application I'm working with, Koha, is an open-source integrated library automation system (library as in public library). It uses the MARC::File::XML module (which uses XML::SAX) to handle bibliographic records in the MARCXML format. This bug is a major problem for us, as many of our users have records with combining characters in their systems.

I'm sorry I don't have a patch; I'm still pretty new to SAX and to encoding issues in general. Thanks!
text/x-perl 431b
#!/usr/bin/perl
use XML::SAX;

my $parser = XML::SAX::ParserFactory->parser(
    Handler => MySAXHandler->new
);

binmode STDOUT, ":utf8";
print "\x{65}\x{301}\n";

$parser->parse_string("<xml>\xEF\xBB\xBF\x{65}\x{301}</xml>");

package MySAXHandler;
use base qw(XML::SAX::Base);

sub start_document {
    my ($self, $doc) = @_;
    # process document start event
}

sub start_element {
    my ($self, $el) = @_;
    # process element start event
}
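
The error text appears to match the message that the core Encode module raises when it is asked to decode a string that already contains characters above 0xFF. The sketch below is not part of the original attachment and does not touch XML::SAX at all; it only shows, under the assumption that the parser ends up calling Encode on its input, why the string in the script cannot be decoded while a pure octet string can. The helper name can_decode and the variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# The payload from the script above: three raw BOM octets followed by
# two characters, the second of which (U+0301) is above 0xFF.
my $mixed = "\xEF\xBB\xBF\x{65}\x{301}";

# Hypothetical helper: returns 1 if Encode can decode the argument as
# UTF-8 octets, 0 if decode() dies.
sub can_decode {
    my ($input) = @_;
    my $copy = $input;    # defensive copy; decode() gets its own scalar
    return eval { decode('UTF-8', $copy); 1 } ? 1 : 0;
}

# The same text written purely as characters (U+FEFF is the BOM),
# then converted to UTF-8 octets.
my $chars  = "\x{FEFF}\x{65}\x{301}";
my $octets = encode('UTF-8', $chars);

printf "mixed string decodes: %s\n", can_decode($mixed)  ? "yes" : "no";
printf "octet string decodes: %s\n", can_decode($octets) ? "yes" : "no";
```

If the assumption holds, handing the parser either pure octets or pure characters, rather than a mix of both, may be worth trying as a workaround. Whether XML::SAX::PurePerl 0.80 then parses the input correctly is not something this sketch establishes.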
