Convert from MacRoman to ISO 8859-1 with Perl

I was pleased to find that Perl's Encode module makes it fairly straightforward to convert between 8-bit character sets. The documentation isn’t as clear as one might like, so here’s an example.

<p>Say you have some Mac data and need to display it on the web, typically in ISO-8859-1 format.  Pump it through this subroutine:</p>
<pre><code>sub convertFromMacToISO8859 {
    my $line = shift;

    use Encode;
    # Decode the MacRoman bytes into a Perl character string.
    my $text = decode("MacRoman", $line);
    # Characters missing from Latin-1 become HTML numeric references.
    my $iso = encode("iso-8859-1", $text, Encode::FB_HTMLCREF);

    return $iso;
}</code></pre>


<p>and you’ll see those gremlins go away. The odd-looking Encode::FB_HTMLCREF constant passed as the last argument to encode puts the error handler into a mode where characters that don’t exist in the target character set are written as HTML numeric character references instead. Unicode is a superset of both MacRoman and ISO-8859-1 (Latin-1), but neither MacRoman nor Latin-1 is a superset of the other. The most visible gaps are punctuation characters missing from Latin-1, such as en and em dashes and ‘smart quotes’.</p>
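<p>To see the fallback in action, here is a minimal, self-contained sketch. The sample bytes are a made-up test string, not data from a real file: in MacRoman, 0x8E is é, 0xD1 is an em dash, and 0xD2 and 0xD3 are the curly double quotes. The é has a Latin-1 equivalent and should come through as a single byte, while the dash and quotes should come out as numeric character references.</p>

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Same approach as the subroutine in the post: MacRoman bytes in,
# Latin-1 bytes out, with HTML numeric references as the fallback.
sub convertFromMacToISO8859 {
    my ($line) = @_;
    my $text = decode("MacRoman", $line);
    return encode("iso-8859-1", $text, Encode::FB_HTMLCREF);
}

# Hypothetical sample input: "café", an em dash, then "hi" in curly
# double quotes, all written as raw MacRoman bytes.
my $mac = "caf\x8E \xD1 \xD2hi\xD3";
my $iso = convertFromMacToISO8859($mac);

# The e-acute is now the Latin-1 byte 0xE9; the dash and quotes are
# &#8212;, &#8220; and &#8221;.
print "$iso\n";
```

<p>If the page is served as XML rather than HTML, Encode::FB_XMLCREF is the analogous constant and produces hexadecimal references instead of decimal ones.</p>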