Subject: Perl internal representation and output encoding
Sorry to bother you again with a question you have answered hundreds of times before, though each time in a slightly different form.

On the output side of $doc->toString we have a byte stream. On the Perl side we have strings in Perl's internal representation, which can be either (something close to) latin1 or (something close to) UTF-8. As a user, you expect it to DWIM: the difference should be invisible.

However, when I run the attached script (the string contains three latin1 characters, which may get mangled in transport), the output does not represent them as UTF-8 bytes; they are still latin1. This breaks the parser on the other side.

Of course, I can utf8::upgrade all the data fields myself, in dozens of places, and every other XML::LibXML user can do the same. But that does not DWIM, is more error-prone, is more work, and is probably slower than doing the upgrade once inside XML::LibXML.

I was under the impression that this worked correctly some time ago, but in my current setup it fails: Perl 5.10, libxml2 2.7.3, XML::LibXML 1.69.
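For reference, the manual workaround with utf8::upgrade looks roughly like this. It is a minimal sketch using only core Perl. The helper upgrade_fields and the sample %record are hypothetical, and the sketch says nothing about what XML::LibXML does with the result. It only shows that utf8::upgrade switches a string to the internal UTF-8 form without changing its characters.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::LibXML: upgrade every plain
# string value of a hash in place. utf8::upgrade only changes the
# internal representation; the characters themselves stay the same.
sub upgrade_fields {
    my ($fields) = @_;
    for my $value (values %$fields) {    # values() aliases the hash entries
        utf8::upgrade($value) if defined $value && !ref $value;
    }
    return $fields;
}

# Without "use utf8", these literals are stored in the
# one-byte-per-character (latin1-like) form.
my %record   = (name => "caf\xE9", city => "K\xF6ln");
my $original = $record{name};

upgrade_fields(\%record);

printf "flag before: %d\n", utf8::is_utf8($original)     ? 1 : 0;   # 0
printf "flag after:  %d\n", utf8::is_utf8($record{name}) ? 1 : 0;   # 1
printf "same string: %s\n", $original eq $record{name} ? "yes" : "no";  # yes
```

This is exactly the per-field bookkeeping that has to be repeated at every place data enters the document, which is why doing it once inside the library would be preferable.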