Remove byte order mark (BOM)
I have been doing a lot of localization work, and part of it involves a bunch of SQL scripts encoded in UTF-8. Every now and then, I would get a syntax error on the very first line of a script. The culprit: the byte order mark, three bytes (EF BB BF in hex) at the very start of the file. Some programs insert it when they save a file, others don’t.
Under Linux, it is easy enough to find out if there is a byte order mark in a file. I used the following command on the SQL files in question:
> file filename.sql
filename.sql: UTF-8 Unicode (with BOM) text, with CRLF line terminators
A file without the byte order mark gave me the following:
> file filename.sql
filename.sql: UTF-8 Unicode text, with CRLF line terminators
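For checking from inside a script, the same test can be sketched by looking at the first three bytes of the file directly. This is only a rough sketch: has_bom is a hypothetical helper name of my own, and it assumes a head that supports the -c option (GNU and BSD versions do).

```shell
#!/bin/sh
# has_bom is a hypothetical helper name, not a standard tool.
# It succeeds (exit status 0) when the file starts with the bytes EF BB BF.
has_bom() {
    # Take the first 3 bytes and render them as hex with no separators.
    first_bytes=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first_bytes" = "efbbbf" ]
}

# Report on every file named on the command line.
for f in "$@"; do
    if has_bom "$f"; then
        echo "$f: BOM"
    else
        echo "$f: no BOM"
    fi
done
```

Saved under a name of your choosing, it can be run with the SQL files as arguments and prints one line per file.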
I wanted a simple way to remove the byte order mark. I found a newsgroup post where Benjamin A’Lee provided a little Perl script to do the job. I named the script removeBOM.pl:
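#!/usr/bin/perl
# Usage: perl removeBOM.pl input.sql > output.sql
use strict;
use warnings;
my @file = <>;                             # read every input line
$file[0] =~ s/^\xEF\xBB\xBF// if @file;    # strip the UTF-8 BOM bytes from the first line
print @file;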
This little script works like a champ.
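For fixing several files in place without Perl, a plain shell alternative might look something like the sketch below. It is not the script from the newsgroup post but a different technique: it checks the first three bytes and, if they are EF BB BF, rewrites the file starting from the fourth byte. The name strip_bom_in_place is hypothetical, and the sketch assumes head -c and mktemp are available.

```shell
#!/bin/sh
# strip_bom_in_place is a hypothetical helper, not the Perl script above.
# It rewrites the file without its first three bytes when they are EF BB BF.
strip_bom_in_place() {
    first_bytes=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$first_bytes" = "efbbbf" ]; then
        tmp=$(mktemp) || return 1
        # tail -c +4 starts output at the fourth byte, skipping the BOM.
        tail -c +4 "$1" > "$tmp" && cat "$tmp" > "$1"
        rm -f "$tmp"
    fi
}

# Apply to every file named on the command line.
for f in "$@"; do
    strip_bom_in_place "$f"
done
```

Files without a byte order mark are left untouched, so it should be safe to run over a whole directory of scripts. Writing back with cat rather than mv keeps the original file's permissions.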