rt.cpan.org will be shut down on March 1st, 2021.

This queue is for tickets about the XML-Simple CPAN distribution.

Report information
The Basics
Id: 108956
Status: rejected
Priority: 0/
Queue: XML-Simple

People
Owner: grantm [...] cpan.org
Requestors: PJNEWMAN [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.18
Fixed in: (no value)



Subject: XML::Simple doesn't encode unprintable characters
XML::Simple doesn't encode unprintable characters and therefore generates invalid XML.

XML::Simple isn't escaping the characters, hence the invalid XML being generated. See the example code below:

    #!/usr/bin/perl -w
    use strict;
    use XML::Simple;

    my $conf;
    $conf->{baz}[0] = "foo\x07bar";
    print XMLout($conf, keyattr => ['']);

Running it through xmllint:

    ./testxml | xmllint --noout -
    -:2: parser error : PCDATA invalid Char value 7
    <baz>foobar</baz>
    ^
My workaround is currently this, but it's not particularly ideal:

    #!/usr/bin/perl -w
    use strict;
    use XML::Simple;

    my $conf;
    my $string = "foo\x07bar";
    $string =~ s/([\0-\x08\x0b\x0c\x0e-\x1f\x7f])/sprintf("\\x%02x",ord($1));/xeg;
    $conf->{baz}[0] = $string;
    print XMLout($conf, keyattr => ['']);

On Sun Nov 15 10:38:43 2015, PJNEWMAN wrote:
> XML::Simple doesn't encode unprintable characters and therefore
> generates invalid XML:
> [...]
Of course I actually meant the following to correctly escape to XML:

    #!/usr/bin/perl -w
    use strict;
    use XML::Simple;

    my $conf;
    my $string = "foo\x07bar";
    $string =~ s/([\0-\x08\x0b\x0c\x0e-\x1f\x7f])/sprintf("&#x%02x;",ord($1));/eg;
    $conf->{baz}[0] = $string;
    print XMLout($conf, keyattr => ['']);

On Sun Nov 15 11:06:14 2015, PJNEWMAN wrote:
> My workaround is currently this, but it's not particularly ideal:
> [...]
Hi Peter

The problem with the characters you're encountering is that they are not valid characters in an XML document, regardless of whether they are represented as simple bytes or as numeric character references. The relevant section of the spec is:

    http://www.w3.org/TR/REC-xml/#charsets

and it defines a character as:

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

You'll notice that this excludes characters in the ranges x00-x08, x0B-x0C and x0E-x1F.

I think that XML 1.1 does relax this restriction, but that won't help unless you're passing the XML to a 1.1 compliant parser (you'd probably also need to add an XML declaration with version='1.1').

If you are feeding the resulting XML to a parser that accepts &#x07; as a valid character, then you can subclass XML::Simple to implement your escaping:

==========
package XML::SimpleCustomEscapes;

use parent 'XML::Simple';

sub escape_value {
    my $self = shift;

    # Let XML::Simple do its normal escaping first
    my $data = $self->SUPER::escape_value(shift);

    # Then turn the remaining control characters into numeric references
    $data =~ s/([\0-\x08\x0b\x0c\x0e-\x1f\x7f])/sprintf("&#x%02x;",ord($1));/eg;
    return $data;
}

1;
==========

Regards
Grant
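For anyone whose consumer is a strict XML 1.0 parser, escaping will not help, so one alternative is to check or strip the offending characters before the data reaches XMLout. The following is only a sketch built from the Char production quoted above; the helper names is_valid_xml10_text and sanitize_xml10_text are made up for this illustration and are not part of XML::Simple. Note that under this production x7F is a permitted character, so unlike the regex in the workaround it is left alone here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Anything NOT matched by the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
my $INVALID_XML10 =
    qr/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;

# Hypothetical helper: true if every character is allowed in XML 1.0.
sub is_valid_xml10_text {
    my ($text) = @_;
    return $text !~ $INVALID_XML10;
}

# Hypothetical helper: replace each disallowed character with
# $replacement (default: the empty string, i.e. strip it).
sub sanitize_xml10_text {
    my ($text, $replacement) = @_;
    $replacement = '' unless defined $replacement;
    (my $clean = $text) =~ s/$INVALID_XML10/$replacement/g;
    return $clean;
}

# The string from the ticket, with the BEL character removed.
print sanitize_xml10_text("foo\x07bar"), "\n";    # prints "foobar"
```

The cleaned string can then be stored in the data structure passed to XMLout as usual. Whether stripping, replacing, or rejecting such input is appropriate depends on what the data means to the application.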

