This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id:
45782
Status:
resolved
Priority:
Low/Low
Queue:

People
Owner:
Nobody in particular
Requestors:
Frederik.Fouvry [...] acrolinx.com
Cc:
AdminCc:

BugTracker
Severity:
(no value)
Broken in:
(no value)
Fixed in:
3.33

Attachments
notok-02.xml notok.xml ok-00.xml ok-01.xml ok-02.xml



Subject: xml_split behaves incorrectly when encoding is set in XML declaration and xml_pp is incorrect
Date: Wed, 6 May 2009 16:33:36 +0200
To: "bug-XML-Twig@rt.cpan.org" <bug-XML-Twig@rt.cpan.org>
From: Frederik Fouvry <Frederik.Fouvry@acrolinx.com>

Message body is not shown because sender requested not to inline it.


Hi,

 

xml_split does not seem to be working correctly when the XML declaration contains encoding="utf-8" or encoding="UTF-8": in those cases, it removes all element content.

 

XML::Twig version 3.32

Perl version:

This is perl, v5.10.0 built for cygwin-thread-multi-64int

(with 6 registered patches, see perl -V for more detail)

 

OS version:

CYGWIN_NT-5.1 fenera 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin

 

Attached are some test data to reproduce the problem:

notok.xml contains the encoding in the XML declaration, and the output files have no content.

ok.xml does not contain the encoding attribute, and the output files work fine.

 

Expected behaviour: the encoding attribute is respected, or at least does not remove any content ;-)

 

In char_parser(), the print to $_[0] seems odd after shifting two arguments (according to the XML::Parser documentation, it only has two).  Also, $state->{current_fh} never seemed to have a value when the encoding is set.
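To illustrate the point about the argument list, here is a minimal sketch in plain Perl. It is not the xml_split source: char_handler_sketch is a hypothetical stand-in, and the only assumption taken from the XML::Parser documentation is that a Char handler is called with two arguments (the Expat object and the string).

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Char handler (not the real char_parser()).
# Assumption: the handler is called with exactly two arguments,
# the Expat object and the character string.
sub char_handler_sketch {
    my $expat  = shift;    # first documented argument
    my $string = shift;    # second documented argument

    # Both documented arguments are consumed, so @_ holds whatever is left.
    my $remaining = scalar @_;
    my $has_first = defined $_[0] ? 1 : 0;
    return ( $remaining, $has_first );
}

# Placeholder values stand in for the Expat object and the text.
my ( $remaining, $has_first ) = char_handler_sketch( 'expat-object', 'some text' );
print "remaining args: $remaining, \$_[0] defined: $has_first\n";
```

With only two arguments passed in, nothing is left in @_ after the two shifts, so $_[0] is undefined at that point, which is why printing to it looks suspicious.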

 

And while I’m at it:

xml_pp does not seem to be syntactically correct in the same version of XML::Twig:

$ xml_pp

Bareword "pod2text" not allowed while "strict subs" in use at /usr/bin/xml_pp line 119.

Execution of /usr/bin/xml_pp aborted due to compilation errors.
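For reference, one common way to get this class of error is an unquoted command name under "use strict". The sketch below is only an assumption about the kind of construct involved; I have not checked what line 119 of xml_pp actually contains, and the snippets inside the strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical snippets, NOT the actual xml_pp source.
# An unquoted name in a list is a bareword, which "strict subs" rejects.
my $bareword_src = q{ use strict; my @cmd = ( pod2text, 'some_file' ); 1; };
# The same list with the name quoted as a plain string.
my $quoted_src   = q{ use strict; my @cmd = ( 'pod2text', 'some_file' ); 1; };

# eval of a string compiles it at run time, so the compile error lands in $@.
my $bareword_ok    = eval $bareword_src;
my $bareword_error = $@;
my $quoted_ok      = eval $quoted_src;

print "bareword version: ", ( $bareword_ok ? "compiles" : "fails: $bareword_error" ), "\n";
print "quoted version: ",   ( $quoted_ok   ? "compiles" : "fails" ), "\n";
```

If the real cause is similar, quoting the name (or calling a properly declared function) would be enough to make the script compile again.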

 

Many thanks!


Frederik Fouvry
Senior Linguistic Engineer

--
Telephone +49 (0)30 288 84 83 34       -         Facsimile: +49 (0)30 288 84 83 39
acrolinx GmbH, Rosenstraße 2, 10178 Berlin, Germany - WWW: www.acrolinx.com
Geschäftsführer: Andrew Bredenkamp
Registration HRB 84183, Amtsgericht Berlin-Charlottenburg

 

Both problems are fixed in the development version.

I still have to figure out why I set that character handler. The tests don't show any problem when I don't set it, but I have to see what happens in the usual annoying cases, like a long CDATA section.

__
mirod

