This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 58024
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: milu71 [...] googlemail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 1.70
Fixed in: (no value)



Subject: XML::LibXML->new, recover flag, suppress warnings
As reported today on the Perl-XML mailing list:

http://aspn.activestate.com/ASPN/Mail/Message/Perl-XML/3862042

In XML::LibXML, warnings are not suppressed when specifying the recover or recover_silently flags, although the following excerpt from the manpage says they should be:

--------
recover
/parser, html, reader/ recover from errors; possible values are 0, 1, and 2

A true value turns on recovery mode which allows one to parse broken XML or HTML data. The recovery mode allows the parser to return the successfully parsed portion of the input document. This is useful for almost well-formed documents, where for example a closing tag is missing somewhere. Still, XML::LibXML will only parse until the first fatal (non-recoverable) error occurs, reporting recoverable parsing errors as warnings. To suppress even these warnings, use recover=>2.
--------

http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Parser.pod

Here's a test case demonstrating the behaviour:

# use strict;
# use warnings;
# use utf8;
use XML::LibXML;

my $txt = <<'EOS';
<div>
<a href="/app/search?op=list&type=50">eins</a>
<!-- HTML parser error : htmlParseEntityRef: expecting ';' -->
</div>
EOS

my $prsr = XML::LibXML->new( # see perldoc XML::LibXML::Parser
    recover => 2,            # makes parser go on despite errors
    # suppress_warnings => 1, # doesn't shut the warning off
    # suppress_errors => 1,   # not either
);
my $dom = $prsr->load_html( string => $txt );
print $dom->toString( 1 );

print "$_\n" for
    XML::LibXML::LIBXML_DOTTED_VERSION,  # 2.7.6 in my case
    XML::LibXML::LIBXML_VERSION,         # 20706
    XML::LibXML::LIBXML_RUNTIME_VERSION; # 20707 yeah, I know ;-)
From milu71 [...] gmx.de Tue Jun 1 17:03:33 2010
Subject: Re: [rt.cpan.org #58024] AutoReply: XML::LibXML->new, recover flag, suppress warnings
Date: Tue, 1 Jun 2010 23:03:56 +0200
From: Michael Ludwig <milu71 [...] gmx.de>
Here's a better test case using Test::More:

use strict;
use warnings;
use utf8;
use XML::LibXML;
use Test::More tests => 2;

my $txt = <<'EOS';
<div>
<a href="milu?a=eins&b=zwei"> ampersand not URL-encoded </a>
<!-- HTML parser error : htmlParseEntityRef: expecting ';' -->
</div>
EOS

my %opt = ( # see perldoc XML::LibXML::Parser
    recover => 1,            # makes parser go on despite errors
    # suppress_warnings => 1, # doesn't shut the warning off
    # suppress_errors => 1,   # not either
);

my( $fh, $buf );
{
    open $fh, '>', \$buf;  # open filehandle to scalar variable
    local *STDERR = $fh;   # redirect STDERR there
    XML::LibXML->new( %opt )->load_html( string => $txt );
    close $fh;             # warning now in scalar variable
    like $buf, qr/htmlParseEntityRef:/, 'warning emitted';

    open $fh, '>', \$buf;  # new filehandle, clears buffer
    $opt{recover} = 2;     # suppress warnings
    XML::LibXML->new( %opt )->load_html( string => $txt );
    close $fh;
    is $buf, '', 'no warning emitted';
}

--
Michael Ludwig
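The test above captures warnings by aliasing STDERR to an in-memory filehandle. An alternative, sketched below under the assumption that XML::LibXML hands recoverable errors to Perl's warn() (rather than C code writing straight to file descriptor 2), is a localized $SIG{__WARN__} handler. The sketch is pure Perl and does not load XML::LibXML: `emit_warning` is a hypothetical stand-in for the load_html call, and `capture_warnings` is an illustrative helper name, not part of any library.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical stand-in for XML::LibXML->new(%opt)->load_html(string => $txt);
# it warns unless asked to stay quiet, mimicking recover => 1 vs. recover => 2.
sub emit_warning {
    my ($quiet) = @_;
    warn "htmlParseEntityRef: expecting ';'\n" unless $quiet;
    return 1;
}

# Run a code block and return every message it passed to warn().
# The handler is localized, so it is restored when this sub returns.
sub capture_warnings {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    $code->();
    return @warnings;
}

my @w = capture_warnings( sub { emit_warning(0) } );
like( join( '', @w ), qr/htmlParseEntityRef:/, 'warning emitted' );

@w = capture_warnings( sub { emit_warning(1) } );
is( scalar @w, 0, 'no warning emitted' );
```

Unlike redirecting STDERR, a __WARN__ handler only sees warnings raised through warn(); if a library prints diagnostics some other way, the STDERR approach from the original test is the one that will catch them.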
Hi.

Thanks for the report. I've integrated your test code into t/49_load_html.t and there's a fix here:

https://bitbucket.org/shlomif/perl-xml-libxml

It will be uploaded to CPAN later.

Regards,

-- Shlomi Fish