I was pleased to find that the Perl Encode module makes it fairly straightforward to convert between 8-bit character sets. The documentation isn’t as clear as one might like, so here’s an example.
<p>Say you have some Mac data and need to display it on the web, typically in ISO-8859-1 format. Pump it through this subroutine:</p>
<p><code>sub convertFromMacToISO8859 {<br />
my $line = shift;</p>
<p> use Encode;<br />
my $utf8 = decode("MacRoman", $line);<br />
my $iso = encode("iso-8859-1", $utf8, Encode::FB_HTMLCREF);</p>
<p> return $iso;<br />
}</code></p>
<p>and you’ll see those gremlins go away. The odd-looking Encode::FB_HTMLCREF constant at the end of the encode call puts the error checker into a mode where characters that don’t exist in the target character set are encoded as HTML numeric character references to their Unicode code points. Unicode is a superset of both MacRoman and ISO-8859-1 (Latin-1), but neither MacRoman nor Latin-1 is a superset of the other. The most visible gaps are punctuation characters missing from Latin-1, such as en and em dashes and ‘smart quotes’.</p>
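<p>To make the fallback behaviour concrete, here is a minimal sketch that runs a few MacRoman bytes through the same decode/encode pair. The sample string and the subroutine name mac_to_latin1_html are made up for illustration; only decode, encode and Encode::FB_HTMLCREF come from the Encode module.</p>

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: MacRoman bytes in, ISO-8859-1 bytes out.
# Characters that Latin-1 cannot represent become decimal HTML
# character references instead of being lost.
sub mac_to_latin1_html {
    my ($bytes) = @_;
    my $chars = decode("MacRoman", $bytes);
    return encode("iso-8859-1", $chars, Encode::FB_HTMLCREF);
}

# Illustrative input: in MacRoman, 0x8E is e-acute (U+00E9)
# and 0xD1 is the em dash (U+2014).
my $mac = "caf\x8E \xD1";
my $out = mac_to_latin1_html($mac);

# Show non-ASCII bytes as \xNN so the result is readable on any terminal.
(my $shown = $out) =~ s/([^\x20-\x7E])/sprintf("\\x%02X", ord $1)/ge;
print "$shown\n";    # prints: caf\xE9 &#8212;
```

<p>The e-acute exists in Latin-1, so it comes through as the single byte 0xE9. The em dash does not, so it is replaced by a character reference that a browser will render correctly.</p>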