Report information
Id: 35672
Status: patched
Left: 0 min
Priority: 0/0
Queue: XML-Twig

Owner: Nobody
Requestors: seth.viebrock <seth.viebrock [...] gmail.com>
Cc:
AdminCc:

Severity: Critical
Broken in: 3.32
Fixed in: (no value)




History
#   Tue May 06 18:01:45 2008 seth.viebrock - Ticket created  
Subject: Large element size makes XML::Twig stall while parsing
[text/plain 532b]
See the attached xml doc.
I am working on a major project that relies on XML::Twig at its core. We
embed files in an xml doc in base64 format, after compressing with
Compress::Zlib, so that we can pass xml around to different systems. The
base64 is housed in a single element. I've found that the size of this
element is exponentially related to processing time during parsing.
Creating a single element with 4MB of base64 makes XML::Twig completely
unusable. Tested on Mac OS Leopard Perl 5.10.0 and Ubuntu 7.10 Server
Perl 5.8.8.
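
The setup described above can be sketched roughly as follows. This is an illustration only, not the reporter's actual code: the element name `file`, its attributes, and the helper names `embed_payload` and `extract_payload` are hypothetical. `compress`/`uncompress` come from Compress::Zlib and `encode_base64`/`decode_base64` from MIME::Base64.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: compress a payload, base64-encode it, and wrap it
# in one XML element. encode_base64 breaks its output into lines of at
# most 76 characters, so a large payload becomes many short text lines.
sub embed_payload {
    my ($bytes) = @_;
    my $b64 = encode_base64( compress($bytes) );
    return qq{<file encoding="base64" compression="zlib">\n$b64</file>\n};
}

# Hypothetical inverse. A regex is enough for this sketch because base64
# text never contains '<'; real code would use an XML parser.
sub extract_payload {
    my ($xml) = @_;
    my ($b64) = $xml =~ m{<file[^>]*>(.*)</file>}s
        or die "no <file> element found";
    return uncompress( decode_base64($b64) );
}

my $original = join '', map { chr( $_ % 256 ) } 1 .. 5000;
my $xml      = embed_payload($original);
my $restored = extract_payload($xml);
print "round trip ok\n" if $restored eq $original;
```

The point relevant to this ticket is that the whole encoded payload ends up as the text content of a single element, spread over many lines.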
Subject: big_element.xml

[text/xml 4632.8k]
Message body not shown because it is too large or is not plain text.
#   Tue May 06 21:49:41 2008 seth.viebrock - Correspondence added  
From: seth.viebrock[...]gmail.com
[text/plain 136b]
From further investigation this seems to be a "bug" in XML::Parser. The
code that makes it hang is:
$t= eval { $t->SUPER::parse( @_); };
#   Tue May 06 22:22:28 2008 seth.viebrock - Correspondence added  
From: seth.viebrock[...]gmail.com
[text/plain 270b]
...which ultimately ends up calling the following code in XML::Parser::Expat,
and hangs. A cold call to XML::Parser->parse does not yield this error,
so it seems related to the arguments that Twig is ultimately passing to
Expat.

eval {
$result = $expat->parse($arg);
};
#   Wed May 07 03:45:23 2008 xmltwig[...]gmail.com - Correspondence added  
Subject: Re: [rt.cpan.org #35672] Large element size makes XML::Twig stall while parsing
Date: Wed, 07 May 2008 09:45:37 +0200
To: bug-XML-Twig[...]rt.cpan.org
From: mirod <xmltwig[...]gmail.com>
[text/plain 1.4k]
Seth Viebrock via RT wrote:
> Queue: XML-Twig
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=35672 >
>
> ...which ultimately ends up calling the following code in XML::Parser::Expat,
> and hangs. A cold call to XML::Parser->parse does not yield this error,
> so it seems related to the arguments that Twig is ultimately passing to
> Expat.
>
> eval {
> $result = $expat->parse($arg);
> };

Hi,

The problem is that the simple call to expat doesn't include any
handlers. As XML::Twig builds the tree for the XML, OTOH, it kinda needs
to set handlers on the various events.

In this case the character handler is called for each line of the data,
actually twice for each line: once for the data and once for the line
break. So it ends up being called over 120 000 times for your example.
That's always going to be slower than not calling the handler at all!
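
As a rough cross-check of that call count, here is a small sketch. It rests on assumptions that are not stated in the ticket: that the attachment size of 4632.8k means 4632.8 * 1024 bytes, that nearly all of it is base64 text, and that the lines are 76 characters long (the MIME::Base64 default). The function name `estimate_char_calls` is made up for this illustration.

```perl
use strict;
use warnings;

# Estimate how often a character handler fires if it is called twice per
# line: once for the line's data and once for the line break.
sub estimate_char_calls {
    my ( $bytes, $line_len ) = @_;
    my $lines = int( $bytes / ( $line_len + 1 ) );    # +1 for the newline
    return 2 * $lines;
}

my $size  = 4632.8 * 1024;    # assumed size in bytes of the attached file
my $calls = estimate_char_calls( $size, 76 );
print "estimated handler calls: $calls\n";
```

Under those assumptions the estimate lands a little above 120 000, in line with the figure quoted above.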

The good news is that I made a mistake in that handler. I did not
provide an explicit return: the returned value is not used in any way,
so why bother? Why? Because as it was written it returned the partial
content of the element. So it ended up passing 120 000 * 4MB/2 (average
size of the text content of the element), so roughly 240 GB of data to be
allocated, copied, and de-allocated (one hopes!). I added an explicit
empty return and voilà! Processing time went from 581s down to 2s.
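
The effect of a missing explicit return can be illustrated with a minimal sketch. This is not the actual XML::Twig handler; the names `append_implicit` and `append_explicit` and the variable `$text` are hypothetical. It relies only on standard Perl behaviour: a sub without `return` yields the value of its last evaluated expression, while a bare `return;` yields undef in scalar context.

```perl
use strict;
use warnings;

my $text = '';

# No explicit return: the sub hands back the value of the last
# expression, which here is the whole accumulated string.
sub append_implicit {
    my ($chunk) = @_;
    $text .= $chunk;
}

# Bare return: nothing is copied back to the caller.
sub append_explicit {
    my ($chunk) = @_;
    $text .= $chunk;
    return;
}

my $r1 = append_implicit('abc');    # copy of "abc"
my $r2 = append_implicit('def');    # copy of "abcdef"
my $r3 = append_explicit('ghi');    # undef

# The total returned over many calls grows quadratically with the number
# of chunks, although only 100 * 10 bytes are ever appended.
$text = '';
my $returned = 0;
$returned += length( append_implicit( 'x' x 10 ) ) for 1 .. 100;
print "appended: ", length($text), " bytes, returned: $returned bytes\n";
```

With n chunks of c bytes each, the implicit version returns c * n * (n + 1) / 2 bytes in total, which is why the cost explodes for a multi-megabyte element delivered in small pieces.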

The new version is at the usual place: http://xmltwig.com/xmltwig/

Thanks a lot for the bug report, this improvement should benefit most
users (including me!)

--
mirod

#   Wed May 07 03:45:24 2008 RT_System - Status changed from 'new' to 'open'  
#   Wed May 07 14:34:27 2008 seth.viebrock - Correspondence added  
From: seth.viebrock[...]gmail.com
[text/plain 1.7k]
Beautiful! Thanks so much for the quick response. This definitely saved
my hide, and I'm glad it will help others, too. Open source and
XML::Twig rule!


#   Mon Aug 17 05:04:26 2009 MIROD - Status changed from 'open' to 'patched'