Re: Making re-search-forward search for \377


From: Tyler Spivey
Subject: Re: Making re-search-forward search for \377
Date: Sun, 02 Nov 2008 20:54:52 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Tyler Spivey <tspivey@pcdesk.net>
>> Date: Sun, 02 Nov 2008 01:12:10 -0800
>> 
>> I'm probably going to end up working with binary data in a temp
>> buffer. Doing more research, I want enable-multibyte-characters to be
>> off. Given that, if we go to *scratch*
>> and run M-X toggle-enable-multibyte-characters until that variable
>> becomes nil, doing C-Q 377 RET gives 0xff, which is what I want
>> (according to C-x =, C-u C-x = and M-x describe-char). Now to
>> match it, I try:
>> 
>> (re-search-forward "\xff") - no luck
>> 
>> What did you use to figure out that the multibyte version of that
>> character was 0x00FF? I found it out accidentally as a lisp error, but
>> none of the previously described commands (C-X =, M-X describe-char or
>> C-u C-x =) will show that it is 0x00ff, they just show FF.
>
> Why are you trying to use re-search-forward with octal codes such as
> \377?  What are you trying to do? does the buffer you are searching
> hold human-readable text or does it hold binary data, i.e. raw bytes?
>
> In the former case, you need to use characters in the search string,
> not literal codes like \377 or xff, and the buffer should be in the
> (default) multibyte mode.  \377 is not a character code, as far as
> Emacs is concerned, it's an encoding of some character.  Do _not_ make
> a mistake of turning enable-multibyte-characters off and using raw
> bytes such as \377 for searching normal text, that way lies madness.

I think this is partly a problem with Emacs and partly a problem with
what I'm trying to do, or with my understanding of regexps. I have also
posted to emacs-devel; maybe someone there knows more. What I'm trying
to do is split text up for use in a MUD client, based on the following
regexp:
"\\(\377[\371\357]\\)\\|\\(\n\\)"
The coding system of the process is raw-text-unix.
Manually running M-: (re-search-forward "\\(\377[\371\357]\\)") fails,
but running M-: (re-search-forward "\377\371") works fine. However, I
want to match the longer regexp stated above, and searching with it
matches only the newlines.

This is mostly text, with telnet control characters thrown in that I
want to use as delimiters of a sort and process on them, while deleting
them from the text. Using a re-search would be perfect for this if I
could figure out how to do it.
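
To make the goal concrete, here is a minimal sketch of the search I am
after. It assumes a temporary buffer made unibyte with
set-buffer-multibyte, and the helper name demo-telnet-delimiters is made
up for illustration. Whether the bracket expression matches may depend
on the Emacs version, so treat this as the intended behaviour rather
than a guaranteed one.

```elisp
;; Hypothetical helper for illustration; not part of any existing package.
(defun demo-telnet-delimiters (bytes)
  "Return the kinds of delimiters found in the unibyte string BYTES, in order."
  (with-temp-buffer
    ;; Make the buffer unibyte before inserting, so each byte stays a raw byte.
    (set-buffer-multibyte nil)
    (insert bytes)
    (goto-char (point-min))
    ;; Octal escapes only, so the regexp itself is a unibyte string.
    (let ((re "\\(\377[\371\357]\\)\\|\\(\n\\)")
          (kinds '()))
      (while (re-search-forward re nil t)
        ;; Group 1 is a telnet sequence; otherwise group 2 (newline) matched.
        (push (if (match-beginning 1) 'telnet 'newline) kinds))
      (nreverse kinds))))

;; Sample input: text, \377\371, more text, a newline, then \377\357.
(demo-telnet-delimiters "prompt> \377\371line two\n\377\357")
```

The list it returns shows which alternative matched at each step, which
is what I would branch on before deleting the delimiter from the text.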

In reading section 2.3.8.2 of the manual, we get this:
   You can represent a unibyte non-ASCII character with its character
code, which must be in the range from 128 (0200 octal) to 255 (0377
octal).  If you write all such character codes in octal and the string
contains no other characters forcing it to be multibyte, this produces
a unibyte string.  However, using any hex escape in a string (even for
an ASCII character) forces the string to be multibyte.
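
The octal case described there can be checked directly. This is only a
sketch, and demo-describe-string is a made-up name. I have left hex
escapes out of it, since their treatment may differ between Emacs
versions.

```elisp
;; Hypothetical helper for illustration only.
(defun demo-describe-string (s)
  "Return a list (MULTIBYTE-P LENGTH FIRST-ELEMENT) describing string S."
  (list (multibyte-string-p s)   ; nil means S is a unibyte string
        (length s)               ; number of elements in S
        (aref s 0)))             ; code of the first element

;; A string written with octal escapes only.
(demo-describe-string "\377\371")
```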

I've left enable-multibyte-characters alone, but even searching for
"[\377]\371" fails, while "\377\371" succeeds.

