 

This queue is for tickets about the XML-Twig CPAN distribution.


Subject: xml_split behaves incorrectly when encoding is set in XML declaration and xml_pp is incorrect
Date: Wed, 6 May 2009 16:33:36 +0200
To: "bug-XML-Twig [...] rt.cpan.org" <bug-XML-Twig [...] rt.cpan.org>
From: Frederik Fouvry <Frederik.Fouvry [...] acrolinx.com>
Attachments:
  ok.xml        text/xml  132b
  notok-00.xml  text/xml  89b
  notok-01.xml  text/xml  22b
  notok-02.xml  text/xml  22b
  notok.xml     text/xml  149b
  ok-00.xml     text/xml  89b
  ok-01.xml     text/xml  41b
  ok-02.xml     text/xml  45b

Hi,

xml_split does not seem to be working correctly when the xml
declaration contains encoding="utf-8" or encoding="UTF-8": in those
cases, it removes all element content.

XML::Twig version 3.32
Perl version:
This is perl, v5.10.0 built for cygwin-thread-multi-64int
(with 6 registered patches, see perl -V for more detail)

OS version:
CYGWIN_NT-5.1 fenera 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin

Attached are some test data to reproduce the problem:
Notok.xml contains the encoding in the XML declaration, and the output
files have no content.
Ok.xml does not contain the encoding attribute, and the output files
work fine.

Expected behaviour: the encoding attribute is respected, or at least
does not remove any content ;-)

In char_parser(), the print to $_[0] seems odd after shifting two
arguments (according to the XML::Parser documentation, it only has
two). Also, $state->{current_fh} never seemed to have a value when
the encoding is set.

And while I'm at it:
xml_pp does not seem to be syntactically correct in the same version
of XML::Twig:
$ xml_pp
Bareword "pod2text" not allowed while "strict subs" in use at
/usr/bin/xml_pp line 119.
Execution of /usr/bin/xml_pp aborted due to compilation errors.

Many thanks!

Frederik Fouvry
Senior Linguistic Engineer

--
Telephone +49 (0)30 288 84 83 34 - Facsimile: +49 (0)30 288 84 83 39
acrolinx GmbH, Rosenstraße 2, 10178 Berlin, Germany - WWW: www.acrolinx.com
Geschäftsführer: Andrew Bredenkamp
Registration HRB 84183, Amtsgericht Berlin-Charlottenburg

On Wed May 06 10:34:58 2009, Frederik.Fouvry@acrolinx.com wrote:
> Hi,
>
> xml_split does not seem to be working correctly when the xml
> declaration contains encoding="utf-8" or encoding="UTF-8": in those
> cases, it removes all element content.
>
> XML::Twig version 3.32
> Perl version:
> This is perl, v5.10.0 built for cygwin-thread-multi-64int
> (with 6 registered patches, see perl -V for more detail)
>
> OS version:
> CYGWIN_NT-5.1 fenera 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin
>
> Attached are some test data to reproduce the problem:
> Notok.xml contains the encoding in the XML declaration, and the output
> files have no content.
> Ok.xml does not contain the encoding attribute, and the output files
> work fine.
>
> Expected behaviour: the encoding attribute is respected, or at least
> does not remove any content ;-)
>
> In char_parser(), the print to $_[0] seems odd after shifting two
> arguments (according to the XML::Parser documentation, it only has
> two). Also, $state->{current_fh} never seemed to have a value when
> the encoding is set.
>
> And while I'm at it:
> xml_pp does not seem to be syntactically correct in the same version
> of XML::Twig:
> $ xml_pp
> Bareword "pod2text" not allowed while "strict subs" in use at
> /usr/bin/xml_pp line 119.
> Execution of /usr/bin/xml_pp aborted due to compilation errors.

Both problems are fixed in the development version.

I still have to figure out why I set that character handler. The tests
don't show any problem when I don't set it, but I have to see what
happens in the usual annoying cases, like a long CDATA section.

__
mirod
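
As a side note on the char_parser() observation in the report, here is a minimal Perl sketch of why a print to $_[0] after two shifts looks suspicious. It assumes only what the XML::Parser documentation states, namely that a Char handler is called with two arguments (the Expat object and the string). The sub name char_handler_sketch and the placeholder arguments are hypothetical; this is not the actual xml_split code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Char handler. XML::Parser documents the
# call as Char(Expat, String), i.e. exactly two arguments.
sub char_handler_sketch {
    my $expat  = shift;    # first argument: the parser object
    my $string = shift;    # second argument: the character data

    # Both arguments have been shifted off, so @_ is now empty and
    # $_[0] no longer refers to anything the caller passed in.
    return defined $_[0] ? 'defined' : 'undef';
}

# Placeholder values stand in for the real Expat object and text.
print char_handler_sketch('expat-placeholder', 'some text'), "\n";
```

If the handler really receives only two arguments, anything that uses $_[0] as a filehandle at that point has nothing to write to, which would be consistent with the empty output files described in the report.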

