
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 63449
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: ambs [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.70
Fixed in: (no value)



Subject: Parsing HTML files using too much memory
Hello,

Some script worked with an older incarnation of XML::LibXML and libxml2 (unfortunately I am not able to tell you which versions, but I am trying to find out), and now it is not working because XML::LibXML is eating too much memory.

The attached file is the test case. xmllint --html processes it very fast, but with

    perl -MXML::LibXML -e '$d = XML::LibXML->load_html(location => shift, recover => 2, encoding=>"UTF-8")' file.html

it never finishes processing (Perl runs out of memory).

I'll add any information as soon as I can dig it out.

Thank you,
Alberto

This is perl 5, version 12, subversion 2 (v5.12.2) built for darwin-thread-multi-2level on Mac OS X, but I had the same problem on Linux. I can provide more details.
Attachment: agriculture.html (text/html, 1.1 MB; too large to display inline)

From: psantann [...] gmail.com
On Mon Nov 29 17:28:38 2010, AMBS wrote: (quoted text of the original report)
Hello,

I have a similar issue when loading a large XSD schema (32 MB):

http://code.activestate.com/lists/perl-xml/8898/

With xmllint it is fine, but XML::LibXML uses far more memory (I needed at least 8 GB of RAM). vmstat output (note the free column in the XML::LibXML row):

    procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
     r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy  id wa st
    # system idle
     2  0     92 1467960 435156 5180300    0    0     0    10    1    2  0  0 100  0  0
    # xmllint
     1  0     92 1289672 435156 5180304    0    0     0   232 1071  438  8  6  86  0  0
    # XML::LibXML
     3  0     92   43880 434892 5126952    0    0     0    33 1061  195 13 10  77  0  0
                   ^^^^^

I believe the root cause is the same, but let me know if you need more information to reproduce this.

Thanks,
Paulo
From: psantann [...] gmail.com
On Thu Jun 30 16:25:22 2011, psantanna wrote: (quoted text of the earlier messages)
From: psantann [...] gmail.com
I'm not sure whether it was a side effect of another fix or whether someone purposefully pursued this bug, but I just tested this again with the new 1.77 version and the memory issue is gone.

AMBS: this should probably fix it for you as well, and the bug can be closed.

Thanks a lot to everyone who worked on getting this fixed!

Paulo

On Wed Jul 06 12:51:48 2011, psantanna wrote: (quoted text of the earlier messages)
I solved my problem some time ago. It was related to loading external entities, though unfortunately I can't recall exactly what it was. I'm happy for the ticket to be closed. Cheers.
OK, so in accordance with the conversation here, I am closing this bug. Thanks for the report. Please report a new bug if there are still any problems.

Regards,
-- Shlomi Fish


This service is sponsored and maintained by Best Practical Solutions and runs on Perl.org infrastructure.

Please report any issues with rt.cpan.org to rt-cpan-admin@bestpractical.com.