On Sat, Mar 5, 2011 at 11:46 PM, Mike Miller <mbmiller+l at gmail.com> wrote:

> On Sat, 5 Mar 2011, Adam Morris wrote:
>
>> Try \x{8a0} instead. I think that \x normally accepts only two following
>> characters, so you have to use \x{} for long hexadecimal numbers.
>
> You top posted, so I have to ignore you.
>
> Just kidding. I did try that and that didn't work either. Then I did
> this...
>
> perl -pe 's/[[:ascii:]]//g ; s/(.)/$1\n/g' file.txt | sort | uniq -c >| bad_chars.txt
>
> ...and when I looked at the resulting bad_chars.txt file in emacs again,
> the characters looked different. Before they were appearing as purple
> rectangles, but now they appeared as a pair of characters that looked like
> this: \302\240
>
> I could represent them exactly that way in perl and delete them. I don't
> really get what was happening there.

I'm guessing you were looking at (possibly variable-length) unicode
characters, and your perl filter split them into fixed-length octets or
something.

-Rob