This queue is for tickets about the HTML-Tidy CPAN distribution.

Report information

The Basics
Id: 5548
Status: resolved
Priority: Low/Low
Queue: HTML-Tidy

People
Owner: Nobody in particular
Requestors: ben [...] sixapart.com
Cc:
AdminCc:

BugTracker
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



To: Andy Lester <andy@petdance.com>
From: Benjamin Trott <ben@sixapart.com>
Subject: HTML::Tidy -- Patch for a "clean" method
Date: Tue, 2 Mar 2004 17:51:52 -0800
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Andy,

Attached is a patch against 1.01_01 to provide a "clean" method to
HTML::Tidy. If you're interested in providing HTML cleaning mechanisms
through HTML::Tidy, this might be a useful start.

I think it could take some thinking into how it should actually
work--currently, for example, this:

    <a href="http://www.example.com/"><em>This is a test.</a>

turns into this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
    <html>
    <head>
    <meta name="generator" content=
    "HTML Tidy for Linux/x86 (vers 1st March 2004), see www.w3.org">
    <title></title>
    </head>
    <body>
    <a href="http://www.example.com/"><em>This is a test.</em></a>
    </body>
    </html>

In other words, tidy encapsulates it in a full HTML document format. I
don't know if this is what the caller would expect. It's possible that
"clean" should get the string back into the minimal form that it was
originally given, but I fear that may be a rather difficult task, and
that callers should just be instructed to provide an entire HTML
document, rather than a fragment.

Anyway, just some thoughts. :)

Ben

- --- HTML-Tidy-1.01_01/lib/HTML/Tidy.pm 2004-02-29 09:41:29.000000000 -0800
+++ HTML-Tidy-1.01_01-new/lib/HTML/Tidy.pm 2004-03-02 17:44:38.000000000 -0800
@@ -198,6 +198,20 @@
     return !$parse_errors;
 }

+=head2 clean( $str [, $str...] )
+
+Cleans a string, or list of strings, that make up a single HTML file.
+
+Returns true if all went OK, or false if there was some problem calling
+tidy, or parsing tidy's output.
+
+=cut
+
+sub clean {
+    my $self = shift;
+    _tidy_clean(join( "", @_ ));
+}
+
 # Tells whether a given message object is one that we should keep.
 sub _is_keeper {
- --- HTML-Tidy-1.01_01/Tidy.xs 2004-02-29 09:37:40.000000000 -0800
+++ HTML-Tidy-1.01_01-new/Tidy.xs 2004-03-02 17:44:27.000000000 -0800
@@ -37,3 +37,32 @@
     OUTPUT:
         RETVAL

+SV *
+_tidy_clean(input)
+    INPUT:
+        char *input
+    CODE:
+        TidyBuffer errbuf = {0};
+        TidyDoc tdoc = tidyCreate(); // Initialize "document"
+        TidyBuffer output = {0};
+
+        int rc;
+
+        rc = tidySetErrorBuffer( tdoc, &errbuf ); // Capture diagnostics
+        if ( rc >= 0 )
+            rc = tidyParseString( tdoc, input ); // Parse the input
+        if (tidyCleanAndRepair(tdoc) >= 0) {
+            tidySaveBuffer(tdoc, &output);
+            char *str = (char *)output.bp;
+            RETVAL = newSVpvn( str, strlen(str) );
+            tidyBufFree( &output );
+        } else {
+            XSRETURN_UNDEF;
+        }
+
+        tidyBufFree( &errbuf );
+        tidyRelease( tdoc );
+
+    OUTPUT:
+        RETVAL
+
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (Darwin)

iD8DBQFARTo4zGeEk2uv818RAhR3AJ9Ui4DQQ0stQBJOg23fLrITNQr+bwCfUG9O
FiYz/5qK0k8MjkvIh1PCY1w=
=pIm5
-----END PGP SIGNATURE-----

Done. Will be in 1.02. Thanks, Ben.
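
On the open question in the original message (whether "clean" should hand a fragment back in its minimal form), here is a rough C sketch of one possible post-processing step: copy out whatever sits between the body tags of tidy's output. The function name extract_body_fragment is hypothetical and is not part of HTML::Tidy or libtidy. The sketch assumes lowercase tags, as in the output quoted above, and a body start tag whose attributes contain no '>' character. It is plain string searching, not HTML parsing.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of HTML::Tidy or libtidy.
 * Returns a newly allocated copy of the text between the <body ...> start
 * tag and the </body> end tag, with surrounding whitespace trimmed.
 * Returns NULL if either tag is missing or if allocation fails.
 * The caller is responsible for free()ing the result.
 */
char *extract_body_fragment(const char *doc)
{
    /* Assumption: tags are lowercase, as in the tidy output shown above. */
    const char *open = strstr(doc, "<body");
    if (open == NULL)
        return NULL;

    /* Skip to the end of the start tag (assumes no '>' inside attributes). */
    const char *start = strchr(open, '>');
    if (start == NULL)
        return NULL;
    start++;

    const char *end = strstr(start, "</body>");
    if (end == NULL)
        return NULL;

    /* Trim the newlines and indentation tidy places around the content. */
    while (start < end && isspace((unsigned char)*start))
        start++;
    while (end > start && isspace((unsigned char)end[-1]))
        end--;

    size_t len = (size_t)(end - start);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, start, len);
    out[len] = '\0';
    return out;
}
```

Given the full document quoted in the message, this would return only the repaired anchor line. A caller that passed in a complete document would still want tidy's output untouched, so a step like this would only make sense as an opt-in.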

