
This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 35672
Status: resolved
Priority: 0
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: seth.viebrock [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 3.32
Fixed in: 3.33



Subject: Large element size makes XML::Twig stall while parsing
See the attached XML doc. I am working on a major project that relies on XML::Twig at its core. We embed files in an XML doc in base64 format, after compressing them with Compress::Zlib, so that we can pass XML around to different systems. The base64 is housed in a single element.

I've found that processing time during parsing appears to grow exponentially with the size of this element. Creating a single element with 4MB of base64 makes XML::Twig completely unusable.

Tested on Mac OS X Leopard with Perl 5.10.0 and Ubuntu 7.10 Server with Perl 5.8.8.
Subject: big_element.xml
text/xml 4.5m

Message body is not shown because it is too large.

From: seth.viebrock [...] gmail.com
From further investigation this seems to be a "bug" in XML::Parser. The code that makes it hang is:

    $t= eval { $t->SUPER::parse( @_); };
From: seth.viebrock [...] gmail.com
...which ultimately ends up calling the following code in XML::Expat, and hangs. A cold call to XML::Parser->parse does not yield this error, so it seems related to the arguments that Twig is ultimately passing to Expat.

    eval {
        $result = $expat->parse($arg);
    };
Subject: Re: [rt.cpan.org #35672] Large element size makes XML::Twig stall while parsing
Date: Wed, 07 May 2008 09:45:37 +0200
To: bug-XML-Twig [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
Seth Viebrock via RT wrote:
> Queue: XML-Twig
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=35672 >
>
> ...which ultimately ends up calling the following code in XML::Expat,
> and hangs. A cold call to XML::Parser->parse does not yield this error,
> so it seems related to the arguments that Twig is ultimately passing to
> Expat.
>
> eval {
>     $result = $expat->parse($arg);
> };
Hi,

The problem is that the simple call to expat doesn't include any handlers. As XML::Twig builds the tree for the XML, OTOH, it kinda needs to set handlers on the various events.

In this case the character handler is called for each line of the data, actually twice for each line, once for the data and once for the line return. So it ends up being called over 120 000 times for your example. That's always going to take longer than not calling the handler at all!

The good news is that I made a mistake in that handler. I did not provide an explicit return: the returned value is not used in any way, so why bother? Why? Because as it was written it returned the partial content of the element. So it ended up passing 120 000 * 4 MB / 2 (the average size of the text content of the element), so 500G of data to be allocated, copied, and de-allocated (one hopes!). I added an explicit empty return and voilà! Processing time went from 581s down to 2s.

The new version is at the usual place: http://xmltwig.com/xmltwig/

Thanks a lot for the bug report, this improvement should benefit most users (including me!)

-- mirod
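
The effect of the missing return can be reproduced in a few lines of Perl. This is a minimal sketch, not XML::Twig's actual handler: the names char_handler_implicit, char_handler_explicit and bytes_returned and the $buffer variable are made up for illustration, and it assumes the handler's last statement was an append to the accumulated text.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the text accumulated for one element.
my $buffer = '';

# No explicit return: a Perl sub returns the value of its last evaluated
# statement, which here is the whole accumulated buffer.
sub char_handler_implicit {
    my ($chunk) = @_;
    $buffer .= $chunk;
}

# A bare return hands back an empty list, so nothing is copied out.
sub char_handler_explicit {
    my ($chunk) = @_;
    $buffer .= $chunk;
    return;
}

# Call a handler $calls times and add up the size of everything it returns.
sub bytes_returned {
    my ($handler, $calls, $chunk) = @_;
    $buffer = '';
    my $total = 0;
    for (1 .. $calls) {
        my @result = $handler->($chunk);
        $total += length $_ for @result;
    }
    return $total;
}

my $implicit = bytes_returned(\&char_handler_implicit, 4, 'abc');
my $explicit = bytes_returned(\&char_handler_explicit, 4, 'abc');
print "implicit: $implicit bytes, explicit: $explicit bytes\n";
```

This should print "implicit: 30 bytes, explicit: 0 bytes". The implicit version hands back the whole buffer on every call, so the amount returned grows with the square of the number of calls; at roughly 120 000 calls on a 4 MB element that adds up to hundreds of gigabytes, whereas the bare return hands back nothing.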
From: seth.viebrock [...] gmail.com
On Wed May 07 03:45:23 2008, xmltwig@gmail.com wrote:
> [...]
Beautiful! Thanks so much for the quick response. This definitely saved my hide, and I'm glad it will help others, too. Open source and XML::Twig rule!

