bug-gnu-emacs


From: MON KEY
Subject: bug#6283: doc/lispref/searching.texi reference to octal code `0377' correct?
Date: Fri, 28 May 2010 19:20:18 -0400

On Fri, May 28, 2010 at 3:15 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> Sorry, I don't see the relevance.  The manual talks about the
> _numeric_ code of characters, not about their read syntax.

I must be misunderstanding something.
What is the numeric code of \255 ?
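
Read purely as octal numerals, the two values in question work out as
in the sketch below. It is written in Python rather than Emacs Lisp
and only does the base conversion; it makes no claim about how Emacs
reads the `\255' escape or which character, if any, it assigns to
either value. The helper name `octal_value' is hypothetical.

```python
def octal_value(octal_text: str) -> int:
    """Interpret a string of octal digits as an integer (hypothetical helper)."""
    return int(octal_text, 8)


# Print each value in the three notations that come up in this thread.
for octal_text in ("255", "377"):
    value = octal_value(octal_text)
    print(f"octal {octal_text} = decimal {value} = hex {value:X}")
```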

> It uses "octal 0377" to present values because octal notation of
> single-byte characters is something many people are familiar with,

Where is this convention detailed/discussed in the manual?
I don't find it mentioned in the (info "(elisp)Conventions").

Should it be, especially since 0377 is not a representation exposed by
the Emacs user-level interface (at least none that I'm aware of)?

> After all, that is the codepoint of the character.

Of which character?

0377 doesn't have a character that I'm aware of.

> This is explained in "Non-ASCII Characters".  But we generally try not


But this is my point: that section (being the most relevant to
non-ASCII notation) tends to use the #<radix> notation.

> to advertise this issue too much, because there should be no good
> reason for a Lisp program to create raw bytes.  Emacs is a text
> editor, while raw bytes are not text

That's just silly. Emacs accommodates noodling w/ raw bytes because it
is necessary to edit them on occasion. Heck, Emacs w32 distributes
with a dedicated executable just to edit binary data in hexadecimal
form.

>> whenever I need to manually revert some raw-bytes or improperly
>> encoded bit-rotted text using regexps.
>
> It's hard to believe Emacs couldn't handle any such text in some other
> way.

It generally can. However, sometimes file encodings get out of whack
over time and once they are more than a generation away from
rightedness Emacs isn't always able to revert them.

The good thing is Emacs can do this and I'm very glad it does :)

Besides, it's my prerogative how I choose to abuse Emacs into abusing
my data.

> What "improper encoding" was that which Emacs couldn't handle?

The "mixed bag encoding". Not all of my files originated in Emacs. Not
all of them get read into an Emacs buffer without problems.

GIGO c'est la vie.

FWIW, I have entire SQL databases of multilingual, multi-encoding data
that was improperly uploaded via a misconfigured PHP script with a
funky encoding declaration. That script in turn got its input from a
certain legacy proprietary w32 web browser that understood (read:
willfully misinterpreted) UTF-8 according to its own whims. I can
assure you that encodings don't translate perfectly, nor are the
mistranslations always easily caught or corrected.

Stuff like this can sometimes happen with system locales too.
Transitioning files from vfat will clobber file names as well if
you're not careful.

Sometimes I need to find the raw-bytes and replace them with their
character equivalent.
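
A minimal sketch of one such repair, in Python rather than Emacs Lisp.
It assumes the simplest kind of damage: UTF-8 bytes that were decoded
once with the wrong single-byte codec (Latin-1 is assumed here purely
for illustration). The data described above is messier than that, and
the function name `repair_double_encoded' is hypothetical.

```python
def repair_double_encoded(text: str, wrong_codec: str = "latin-1") -> str:
    """Undo one round of 'UTF-8 bytes decoded with the wrong codec'.

    Assumption: the damage is exactly one mis-decoding step. If the text
    cannot be mapped back to bytes, or those bytes are not valid UTF-8,
    the input is returned unchanged rather than guessed at.
    """
    try:
        raw_bytes = text.encode(wrong_codec)   # recover the original bytes
        return raw_bytes.decode("utf-8")       # decode them as intended
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate the damage: UTF-8 bytes for "café" read as if they were Latin-1.
garbled = "caf\u00e9".encode("utf-8").decode("latin-1")
print(repr(garbled))
print(repr(repair_double_encoded(garbled)))
```

Text that is already correct fails the round trip and is left alone,
which is why the function falls back to returning its input. Mixed
damage within a single file would need to be handled piece by piece.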

> Could it be that you simply gave up too early and tried to solve the
> problem by treating text as bytes, while it really wasn't?

Nope. I'm usually pretty good about _not_ approaching these problems
with this type of hammer unless it is a last resort.

--
/s_P\




