Re: Understanding Word Boundaries


From: Xah Lee
Subject: Re: Understanding Word Boundaries
Date: Wed, 08 Dec 2010 15:14:48 -0000
User-agent: G2/1.0

On Jun 16, 3:44 am, Paul Drummond <paul.drumm...@iode.co.uk> wrote:
> I have been an Emacs user for a few years now so definitely still a
> newbie!  While initially I struggled to control its power, I eventually came
> round.  Every issue I've had so far I've been able to fix by a quick search
> in EmacsWiki, except for one frustrating and re-occurring problem that has
> plagued me for years - word boundaries.
>
> Before Emacs I used Vim exclusively and the word boundary behaviour in Vim
> *just worked* - I didn't even have to think about it.  No matter what
> language I used I could navigate and manipulate words without thinking about
> it.  The way word boundaries work in Vim is elegant and I have spent a lot
> of time trying to find some elisp to replicate the behaviour in Emacs but to
> no avail.
>
> I could write some elisp myself but I am still very new to it so it will
> take a while - it's something I would like to do but I don't have time at
> the moment.  Regardless, an elisp solution to the problem is not the point
> of this post.  I want to understand why word boundaries behave the way they
> do in Vanilla Emacs and I would greatly appreciate some views on this from
> some Emacs Gurus!
>
> Every time I notice the word boundary behaviour when hacking in Emacs I
> wonder to myself - "I must be missing something here.  Surely, experienced
> Emacs users don't just *put up* with this!"  Yet every forum response, blog
> post, mailing-list post I have read suggests they do.  This is atypical of
> the Emacs community in my experience.  Usually when something behaves wrong
> in Emacs, it's easy to find some elisp that just fixes the problem full
> stop.  Yet with word-boundaries all I can find is suggestions that fix a
> particular gripe but nothing that provides a general solution.
>
> I have loads of examples but I will mention just a few here to hopefully
> kick-start further discussion.
>
> ** Example 1
>
> I use org-mode for my journal and today I hit the word-boundary problem
> while entering my morning journal entry - here's a contrived example of what
> I entered:
>
> ** [10:27] Understanding Word Boundaries in Emacs
>                                    ^
> With point at the end of the word "Understanding" I hit C-w (which I bind to
> backward-kill-word) and the word "Understanding" is killed as expected.  But
> when I hit C-w again, the point kills to the colon.  Why?  Why is colon a
> word-boundary but the closing square bracket isn't?
>
> ** Example 2
>
> When editing C++ files I often need to delete the "ClassName::" part when
> declaring functions in the header:
>
> void ClassName::function();
>        ^
>
> With point at the start of ClassName I want to press M-d twice to delete
> ClassName and :: but "::" isn't recognised as a word.  In Vim I just type
> "dw" twice and it *just works*.
>
> ** Example 3
>
> I have loads of problems when deleting and navigating words over multiple
> lines.  In the following C++ code for instance:
>
>     Page *page = new _Page(this);
>     page.load();
>            ^
>
> When point is after "page", before the dot on the second line and I hit M-b
> (backward-word) point ends up at the first opening bracket of "Page(" !!!
>
> Again, vim does the right thing here - pressing 'b' takes the point to the
> closing bracket of Page(this) so it doesn't recognise the semi-colon as a
> bracket which is intuitive and what I would expect.  This is really the
> point I am trying to make.  I have never taken the time to understand the
> behaviour of word boundaries in Vim because *it just works*.  In Emacs I am
> forced to think about word boundaries because Emacs keeps surprising me with
> its weird behaviour!
>
> Note: My examples happen to be C++ but I use lots of other languages too
> including elisp, Clojure, JavaScript, Python and Java and the
> word-boundaries seem to be wrong for all of them.
>
> I have tried several different elisp solutions but each one has at least one
> feature that isn't quite right.  Here are some links I kept, I've tried many
> other solutions but don't have the links to hand:
>
> http://stackoverflow.com/questions/2078855/about-the-forward-and-back...
> http://stackoverflow.com/questions/1771102/changing-emacs-forward-wor...
>
> So to wrap up, the point of this post is to kick-start a discussion about
> why the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1
> in my case) seem to be so awkward and unintuitive.
>
> Regards,
> Paul Drummond


Good point.

I remember i felt something similar some 5 or 7 years ago and was
annoyed. But now i can't remember any detail... i just got used to
emacs and can't say i find it a problem at all.

Actually, i think the point is a valid one, and a bit technically
involved in detail.

i'll have to study this in detail some other day, but here are some
points.

For testing, save a file with this line as content:
something in the water does not compute

Now, you can try the word movement in different editors.

I tested this on Notepad, Notepad++, vim, emacs, Mac's TextEdit.

In short, different text editors all behave a bit differently.
Here, Notepad, Notepad++, vim have the same behavior, while emacs and
TextEdit have similar behavior.

In Notepad, Notepad++, vim, the cursor always ends at the beginning of
each word.

In emacs and TextEdit, the cursor ends at the beginning of the word if
you are using backward-word, but at the end of the word if you are
using forward-word.

That's the first major difference.
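
To make the difference concrete, here is a rough Python sketch of the two
forward-movement models on the test line. It is only an illustration, not
how any of these editors is implemented: the function names are made up,
"word" is approximated by the regex class \w, and the vim/Notepad-style
model just treats any run of non-whitespace as one unit.

```python
import re

# Hypothetical model 1 (vim / Notepad / Notepad++ style, simplified):
# skip the rest of the current run, then whitespace, and land on the
# first character of the next run.
_TO_NEXT_START = re.compile(r"\S*\s+")

# Hypothetical model 2 (emacs / TextEdit style forward-word, simplified):
# skip any non-word characters, then a run of word characters, and land
# just after the last character of that word.
_TO_WORD_END = re.compile(r"\W*\w+")


def forward_to_word_start(text, pos):
    m = _TO_NEXT_START.match(text, pos)
    return m.end() if m else len(text)


def forward_to_word_end(text, pos):
    m = _TO_WORD_END.match(text, pos)
    return m.end() if m else len(text)


def stops(step, text):
    """Collect every cursor position reached by repeating `step` from 0."""
    pos, out = 0, []
    while pos < len(text):
        pos = step(text, pos)
        out.append(pos)
    return out


line = "something in the water does not compute"
print(stops(forward_to_word_start, line))  # [10, 13, 17, 23, 28, 32, 39]
print(stops(forward_to_word_end, line))    # [9, 12, 16, 22, 27, 31, 39]
```

In the first list every stop except the last is the first character of a
word; in the second list every stop is the position just past a word.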

--------------------------------------------------
Now, try this line:

something !! in @@ the ## water $$ does %% not ^^ compute

Now, the behavior of vim and Notepad++ is identical. It is pretty
simple and like before: they put the cursor at the beginning of each
string sequence, no matter what the characters are. Notepad is
similar, except that it also stops between the two % characters in %%.

emacs and TextEdit behaved similarly.
Emacs will skip the symbol clusters entirely, except %%. (this depends
on what mode you are in)
TextEdit will also stop in the middle of $$ and ^^, but otherwise skips
the other symbol clusters entirely.

So, from this, it is clear that different editors have different
concepts of syntax group, or no such concept at all.

I understand the emacs case well. Emacs has a syntax table concept
that groups chars into classes such as “whitespace”, “word”,
“symbol”, “punctuation”, ...etc. When you use backward-word, it simply
moves until it reaches a char that's not in the “word” group. So,
depending on which mode you are in, it'll either skip a character
sequence of identical chars entirely, or stop at their boundary. And
if the char sequence is of different symbols such as !@#$%&*() then
emacs may stop in the middle of them.
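
Here is a small Python sketch of that idea. It is a toy model under my own
assumptions, not the real emacs code: the class names, make_table, and
backward_word are all made up for illustration, and a real syntax table has
more classes than these three.

```python
def make_table(extra_word_chars=""):
    """Build a toy 'syntax table': a function from a char to a class name.

    `extra_word_chars` is a hypothetical knob standing in for a mode that
    declares some symbol characters to be word constituents.
    """
    def classify(ch):
        if ch.isalnum() or ch in extra_word_chars:
            return "word"
        if ch.isspace():
            return "whitespace"
        return "punctuation"
    return classify


def backward_word(text, pos, classify):
    """Move left over any non-word chars, then left over one run of word chars."""
    while pos > 0 and classify(text[pos - 1]) != "word":
        pos -= 1
    while pos > 0 and classify(text[pos - 1]) == "word":
        pos -= 1
    return pos


text = "does %% not"
plain = make_table()                          # % counts as punctuation
percent_is_word = make_table(extra_word_chars="%")  # % counts as a word char

start_of_not = backward_word(text, len(text), plain)
print(start_of_not)                                        # 8
print(backward_word(text, start_of_not, plain))            # 0, %% is skipped
print(backward_word(text, start_of_not, percent_is_word))  # 5, stops before %%
```

The movement code is the same in both cases. Only the table changes, and
that alone decides whether the %% cluster is skipped or treated as a word.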

The question is whether other editors have a syntax group notion, or
whether their word movement behavior depends on the language mode at all.

--------------------------------------------------

Now, the interesting question is which model is more efficient for
general everyday coding of different languages.

First question is: is it more efficient in general for forward/
backward word motions to always land in front of the word as in vim,
Notepad, Notepad++ ?

Certainly i think it is more intuitive that way. But otherwise i don't
know. I'll have to do research on this some day.

The second question is whether it is good to have the movement
dependent on the language mode. Again i don't know.

Though, i do find emacs syntax table annoying from my experience of
working with it a bit in the past few years... from the little i know,
i felt that it doesn't do much, its power to model syntax is quite
weak, and very complicated to use... but i don't know for sure.

Btw, one of your examples, this one:

Page *page = new _Page(this);
page.load();

i cannot duplicate.

  Xah
∑ http://xahlee.org/
