Matching programming language identifiers, not "words"

emacs-pretest-bug

From:	Tim Van Holder
Subject:	Matching programming language identifiers, not "words"
Date:	Tue, 18 May 2004 09:26:45 +0200
User-agent:	Mozilla Thunderbird 0.6 (Windows/20040502)

I originally submitted the following as a bug report to the cc-mode maintainer:

Currently (CVS emacs, daily builds at approx. 8am CET), cc-mode does not
consider underscores to be word components.  This never gave me problems;
in fact, it was occasionally useful that forward-word and backward-word
jumped to 'words' within identifiers.  However, for a programming language,
identifiers ARE the words, so underscores need to have word syntax ('_' by
itself is even a valid identifier).
In particular, a replace-regexp of \<long\> (e.g. by int32_t) should
replace all occurrences of the word "long", and nothing else.  This means
that, say, "long_answer" should be left alone.  Currently this is not the
case, which made such replacements in large files a much bigger pain than it
should have been.
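
To make the difference concrete, here is a rough sketch in Python rather than
Emacs Lisp.  It only emulates the two behaviours with lookarounds; the
character sets and the helper name replace_bounded are my own assumptions for
illustration, not anything cc-mode or Emacs defines.

```python
import re

# Assumption: ASCII letters and digits stand in for Emacs "word constituents";
# adding "_" gives the identifier (word + symbol) character set.
WORD_CHARS = "A-Za-z0-9"
IDENT_CHARS = "A-Za-z0-9_"


def replace_bounded(text, name, replacement, constituents):
    """Replace name only where no character from constituents is adjacent."""
    pattern = rf"(?<![{constituents}]){re.escape(name)}(?![{constituents}])"
    return re.sub(pattern, replacement, text)


source = "long x; long long_answer = 42;"

# Word-style boundaries: "_" ends the word, so "long_answer" is rewritten too.
word_mode = replace_bounded(source, "long", "int32_t", WORD_CHARS)

# Identifier-style boundaries: "long_answer" is left alone.
ident_mode = replace_bounded(source, "long", "int32_t", IDENT_CHARS)
```

With word-style boundaries the identifier "long_answer" gets mangled into
"int32_t_answer"; with identifier-style boundaries only the standalone "long"
tokens change.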


However, as he pointed out, the current cc-mode behaviour is completely in
line with what the emacs lisp manual says.  According to the documentation,
an underscore does indeed belong in the "symbol constituent" syntax class.
Unfortunately, this means that processing symbol names is more of a pain than
it needs to be.  When editing program source code, I generally care very
little about what is a valid "word constituent" in the natural language sense,
and more about what is a valid identifier.  In fact, I can't seem to find a
decent way to say \<foo\> so that symbol constituents are taken into account;
\S_\<foo\>\S_ works, but can hardly be called convenient, as \( \) would have
to be used to keep the leading and trailing non-symbol characters out of the
match (and, in addition, it cannot match at the start or end of the
buffer/string).
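
The start/end problem is easy to reproduce outside Emacs.  The sketch below is
again Python, not Emacs Lisp, and the identifier character set and function
names are assumptions of mine.  It compares a pattern that consumes one
neighbouring character on each side (the analogue of the \S_ workaround) with
a zero-width boundary test.

```python
import re

IDENT = "A-Za-z0-9_"  # assumed identifier character set


def consuming(name):
    # Needs a real non-identifier character on both sides, so it can never
    # match at the very start or end of the text.
    return re.compile(rf"[^{IDENT}]({re.escape(name)})[^{IDENT}]")


def zero_width(name):
    # Only asserts that no identifier character is adjacent; the assertions
    # also succeed at the edges of the text.
    return re.compile(rf"(?<![{IDENT}]){re.escape(name)}(?![{IDENT}])")


text = "foo = foo_bar + foo"
consuming_hits = [m.start(1) for m in consuming("foo").finditer(text)]
zero_width_hits = [m.start() for m in zero_width("foo").finditer(text)]
```

Here the consuming pattern finds nothing, because both standalone occurrences
of foo touch an edge of the string, while the zero-width version finds both
and still skips foo_bar.  A boundary operator that consumes no text is the kind
of thing I am asking for.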

So I would ask that one of the following be done:
- a \<\>-like regexp syntax is added for word+symbol grouping;
- a (buffer-local?) variable is added that would switch \< \> to word+symbol
  mode; this variable could then be set by programming modes (or customized
  globally);
- some similar option that I didn't consider is added.


By the way, I find it odd that, given the multilingual environments emacs
supports, the manual still lists word constituents as "parts of normal English
words", leaving accented characters and the like in an apparent twilight zone.


