
This queue is for tickets about the HTML-Format CPAN distribution.

Report information
The Basics
Id: 69426
Status: open
Priority: 0/
Queue: HTML-Format

People
Owner: Nobody in particular
Requestors: jik [...] kamens.brookline.ma.us
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 2.05
Fixed in: (no value)



Subject: &rsquo; in HTML input yields garbage character in PostScript output
Test script:

---cut here---
#!/usr/bin/perl
use HTML::TreeBuilder;
use HTML::FormatPS;
$html = "<html><body>it&rsquo;s an apostrophe</body></html>";
$tree = HTML::TreeBuilder->new_from_content($html);
$formatter = HTML::FormatPS->new();
$ps = $formatter->format($tree);
binmode STDOUT;
print $ps;
---cut here---

Redirect the output of the script to test.ps and then view test.ps, and you'll see that there's a garbage character where the apostrophe is supposed to be.
This should be fixed in 2.08. See https://github.com/nigelm/html-format/commit/58fc839da0a0102d80c43acc1376347c7e56153e
Subject: Re: [rt.cpan.org #69426] &rsquo; in HTML input yields garbage character in PostScript output
Date: Wed, 13 Jul 2011 17:18:46 -0400
To: bug-HTML-Format [...] rt.cpan.org
From: Jonathan Kamens <jik [...] kamens.us>
You fixed &rsquo;, but it looks like you didn't fix &rdquo; or &ldquo;, and I don't know whether you fixed &rdquo;. Is it possible to do a more comprehensive fix that covers all the HTML entities that could cause problems? Thanks.

Subject: Re: [rt.cpan.org #69426] &rsquo; in HTML input yields garbage character in PostScript output
Date: Wed, 13 Jul 2011 17:19:33 -0400
To: bug-HTML-Format [...] rt.cpan.org
From: Jonathan Kamens <jik [...] kamens.us>
Sorry, I meant to say I don't know whether you fixed &lsquo;

On Wed Jul 13 17:19:43 2011, jik@kamens.us wrote:
> Sorry, I meant to say I don't know whether you fixed &lsquo;

&lsquo; is fixed in 2.08.

The double quote sets cannot be fixed without just mapping both open/close (right/left) quote sets to &quot;, which would have people screaming about that too.

The PostScript is using latin1 encoding. If you look at the latin1 character set - http://www.utoronto.ca/web/HTMLdocs/NewHTML/iso_table.html - you will see that there is only one double quote character. So to make this work correctly we would have to either:

- change the PostScript encoding (along with the embedded code font encoding vector)
- use a hacked latin1 encoding with 2 glyphs replaced with double quote chars
- special case the double quote chars so the string is rendered differently

Any of these is a bit of a hack (the best one is just making it handle Unicode throughout, but that's a ton of work and would mean a huge boilerplate encoding vector).

Alternative solutions welcome, but I don't think there is a reasonable fix.
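
The remark about latin1 having only one double quote character can be checked with the core Encode module. The following is a minimal sketch, not code from HTML::Format: the helper name in_latin1 is hypothetical, and the list of code points is just the entities mentioned in this ticket.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK);

# Hypothetical helper (not part of HTML::Format): returns 1 if the
# code point can be encoded in ISO-8859-1 (latin1), 0 otherwise.
sub in_latin1 {
    my ($code_point) = @_;
    my $char = chr $code_point;
    # FB_CROAK makes encode() die on an unmappable character
    # instead of substituting a replacement.
    return eval { encode('iso-8859-1', $char, FB_CROAK); 1 } ? 1 : 0;
}

# The quote characters discussed in this ticket.
my @quotes = (
    [ 0x0022, '&quot;'  ],
    [ 0x2018, '&lsquo;' ],
    [ 0x2019, '&rsquo;' ],
    [ 0x201C, '&ldquo;' ],
    [ 0x201D, '&rdquo;' ],
);

for my $quote (@quotes) {
    my ($code_point, $entity) = @$quote;
    printf "U+%04X %-8s %s\n", $code_point, $entity,
        in_latin1($code_point) ? 'in latin1' : 'not in latin1';
}
```

Only U+0022 is reported as being in latin1; all four typographic quotes fall outside it, which is why each needs either a substitute character or a different output encoding.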
Subject: Re: [rt.cpan.org #69426] &rsquo; in HTML input yields garbage character in PostScript output
Date: Thu, 14 Jul 2011 13:49:25 -0400
To: bug-HTML-Format [...] rt.cpan.org
From: Jonathan Kamens <jik [...] kamens.us>
Any of the options you listed is better than what happens now, which is that &ldquo; and &rdquo; show up as garbage characters.

On Thu Jul 14 13:49:47 2011, jik@kamens.us wrote:
> Any of the options you listed is better than what happens now, which is
> that &ldquo; and &rdquo; show up as garbage characters.

The unmappable characters should now be replaced by ? chars; the Encode to latin1 should do that. However, I have changed all the double quote code points to map to ", which is wrong, but the best that can be done without significant re-architecting.

I would love someone to do the work of reimplementing the whole thing in Unicode throughout, but I took this on as a basic maintainer and do not intend to get into serious rewrite work.

2.09 has just been uploaded.
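
For readers who want to see the behaviour described here in isolation, the following is a rough sketch of the two steps: fold the typographic double quotes onto ", then let Encode replace anything else that latin1 cannot represent with ?. It is an illustration only, not the code shipped in 2.09. The helper name to_latin1_bytes is hypothetical, and the set of code points folded (U+201C and U+201D) is an assumption based on the entities named in this ticket.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper, not the actual HTML::Format implementation.
sub to_latin1_bytes {
    my ($text) = @_;
    # Assumption: only &ldquo; (U+201C) and &rdquo; (U+201D) are folded.
    # The replacement list is shorter than the search list, so tr///
    # reuses its final character (") for both.
    $text =~ tr/\x{201C}\x{201D}/"/;
    # With no CHECK argument, encode() substitutes "?" for any
    # character that has no latin1 equivalent.
    return encode('iso-8859-1', $text);
}

binmode STDOUT;
print to_latin1_bytes("\x{201C}quoted\x{201D} and more\x{2026}"), "\n";
# prints: "quoted" and more?
```

The trailing ellipsis (U+2026) is not in latin1 and is not folded, so it comes out as ?, matching the fallback described above.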

