Date: Thu, 20 Jan 2005 10:55:40 +0100
From: Anders Ardo <anders [...] it.lth.se>
To: bug-html-tidy [...] rt.cpan.org
CC: anders.ardo [...] it.lth.se
Subject: Loading of tidy config files / small patch
Hi Andy Lester,

I'm using your HTML::Tidy with success, thanks! It's used to clean HTML files inside a focused Web crawler. In this context it would be extremely handy to be able to influence Tidy's output through some of its many configuration options. So here is a small patch that implements that. Could you please have a look at it and see whether it merits inclusion in the distribution? Thanks.

The approach taken is to pass the configuration filename as a parameter to the new() method and then use it in calls to the internal _tidy_clean procedure. An alternative would of course be to add a new method that sets the config-file name more explicitly. The patch passes your tests and meets my requirements, although I haven't tested it extensively or added a test to the 'make test' suite.

The other small change I've made is to append a "\n" to the end of the HTML string to be cleaned. It turned out that in a few cases tidy produced incomplete output, which is disastrous in my application. If you clean the included t.html, the output ends with a '<p>' instead of '</body></html>' as it should. Appending "\n" to the HTML string fixes that.

t.pl is a small test script (usage: ./t.pl < t.html).
tidy.cfg is a Tidy configuration file used by t.pl.

Please let me know if there is anything else I can do to get this patch into the distribution.

Cheers
Anders
--
Anders Ardö
Department of Information Technology, Lund Institute of Technology
Tel: +46 46 2227522 ; URL: