help-gnu-emacs

Encoding help


From: B. T. Raven
Subject: Encoding help
Date: Mon, 01 Jun 2009 11:51:13 -0500
User-agent: Thunderbird 2.0.0.21 (Windows/20090302)

I have a file created by saving a PDF as text, and I want to convert the whole thing to UTF-8. If I force the save encoding to utf-8 in Emacs 23.0, I get the following in a *Warning* buffer:

These default coding systems were tried to encode text
in the buffer `span.txt':
  (utf-8-dos (122 . 4194285) (165 . 4194257) (204 . 4194285) (253
  . 4194257) (292 . 4194285) (372 . 4194289) (410 . 4194285) (418
  . 4194285) (653 . 4194217) (689 . 4194285) (731 . 4194285))
  (iso-latin-1-dos (122 . 4194285) (165 . 4194257) (204 . 4194285)
  (253 . 4194257) (292 . 4194285) (372 . 4194289) (410 . 4194285) (418
  . 4194285) (653 . 4194217) (689 . 4194285) (731 . 4194285))
However, each of them encountered characters it couldn't encode:

[Below are many dozens of \xxx octal escape sequences]

  utf-8-dos cannot encode these:                     ...
  iso-latin-1-dos cannot encode these:                     ...
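
The large numbers in the warning can be decoded by hand. Emacs 23 represents an undecoded raw byte 0x80..0xFF internally as a character code in the range 0x3FFF80..0x3FFFFF, so subtracting 0x3FFF00 recovers the original byte. The sketch below (Python, run outside Emacs) does that for the four distinct codes listed above. The helper name raw_byte is hypothetical, and reading the bytes as Latin-1 is only an assumption; vowels with macrons, for instance, do not exist in Latin-1 at all.

```python
# Emacs 23 stores raw byte 0xNN (0x80-0xFF) as character code 0x3FFF00 + 0xNN.
RAW_BYTE_BASE = 0x3FFF00


def raw_byte(code):
    """Return the original byte value behind an Emacs raw-byte character code."""
    if not 0x3FFF80 <= code <= 0x3FFFFF:
        raise ValueError("not an Emacs raw-byte character code: %d" % code)
    return code - RAW_BYTE_BASE


# The four distinct codes that appear in the *Warning* buffer.
codes = [4194285, 4194257, 4194289, 4194217]

for code in codes:
    b = raw_byte(code)
    # ASSUMPTION: the file is Latin-1; another 8-bit encoding would give other letters.
    ch = bytes([b]).decode("latin-1")
    print("%d -> byte 0x%02X -> %r" % (code, b, ch))
```

Under the Latin-1 assumption the four codes correspond to the bytes 0xED, 0xD1, 0xF1 and 0xA9, that is í, Ñ, ñ and ©, which would fit a Spanish text.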

The original PDF shows many standard diacritics for Romance languages, along with a few vowels with macrons. There is no option in Adobe Reader for saving as encoded text. If my only option is to search and replace these escape sequences with Unicode characters, how can I get a list of all these bad characters? (They all show in red in Emacs 23 anyway.) Have any of you written routines to replace things like these using a list of dotted pairs or something similar?
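
One possible shape for such a routine, sketched in Python rather than Emacs Lisp and run on the file's raw bytes: first count every distinct byte above 0x7F, then rebuild the text using a table of byte-to-character pairs, falling back to a default 8-bit encoding for bytes the table does not mention. The function names, the Latin-1 fallback, and the 0x8C-to-ā pair are all illustrative assumptions, not facts about span.txt; the approach also assumes a single-byte source encoding.

```python
from collections import Counter


def list_high_bytes(data):
    """Count each distinct non-ASCII byte (>= 0x80) in the raw file contents."""
    return Counter(b for b in data if b >= 0x80)


def to_utf8(data, overrides, fallback="latin-1"):
    """Re-encode single-byte text as UTF-8.

    overrides maps a byte value to its replacement string (the "dotted pairs");
    any other high byte is decoded with the fallback encoding (an assumption).
    """
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))
        elif b in overrides:
            out.append(overrides[b])
        else:
            out.append(bytes([b]).decode(fallback))
    return "".join(out).encode("utf-8")


# Made-up sample; for the real file use: data = open("span.txt", "rb").read()
sample = b"Espa\xf1a, aqu\xed, \xa9 2009, am\x8cre"

for byte, count in sorted(list_high_bytes(sample).items()):
    print("0x%02X occurs %d time(s)" % (byte, count))

# HYPOTHETICAL pair: pretend byte 0x8C stands for a-macron in this file.
overrides = {0x8C: "\u0101"}
print(to_utf8(sample, overrides).decode("utf-8"))
```

The byte listing is the part worth checking against the PDF first: once each high byte has been matched to the character shown in the original, the overrides table can be filled in with confirmed pairs instead of guesses.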


Thanks,

Ed

