Re: Understanding Word Boundaries

I have been an Emacs user for a few years now so definitely still a newbie! While initially I struggled to control its power, I eventually came round. Every issue I've had so far I've been able to fix by a quick search in EmacsWiki, except for one frustrating and recurring problem that has plagued me for years - word boundaries.

Before Emacs I used Vim exclusively and the word boundary behaviour in Vim *just worked* - I didn't even have to think about it. No matter what language I used I could navigate and manipulate words without thinking about it. The way word boundaries work in Vim is elegant and I have spent a lot of time trying to find some elisp to replicate the behaviour in Emacs but to no avail.

I could write some elisp myself but I am still very new to it so it will take a while - it's something I would like to do but I don't have time at the moment. Regardless, an elisp solution to the problem is not the point of this post. I want to understand why word boundaries behave the way they do in Vanilla Emacs and I would greatly appreciate some views on this from some Emacs Gurus!

Every time I notice the word boundary behaviour when hacking in Emacs I wonder to myself - "I must be missing something here. Surely, experienced Emacs users don't just *put up* with this!" Yet every forum response, blog post and mailing-list post I have read suggests they do. This is atypical of the Emacs community in my experience. Usually when something behaves wrongly in Emacs, it's easy to find some elisp that just fixes the problem, full stop. Yet with word boundaries all I can find are suggestions that fix a particular gripe but nothing that provides a general solution.

I have loads of examples but I will mention just a few here to hopefully kick-start further discussion.

** Example 1

I use org-mode for my journal and today I hit the word-boundary problem while entering my morning journal entry - here's a contrived example of what I entered:

** [10:27] Understanding Word Boundaries in Emacs
                                   ^
With point at the end of the word "Understanding" I hit C-w (which I bind to backward-kill-word) and the word "Understanding" is killed as expected. But when I hit C-w again, Emacs kills everything back to the colon. Why? Why is the colon a word boundary but the closing square bracket isn't?

** Example 2

When editing C++ files I often need to delete the "ClassName::" part when declaring functions in the header:

void ClassName::function();
       ^

With point at the start of ClassName I want to press M-d twice to delete ClassName and :: but "::" isn't recognised as a word. In Vim I just type "dw" twice and it *just works*.
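
The difference in this example can be sketched roughly in Python. This is only a simplified model of the two rules as they are usually described, not how either editor is implemented. The function names are made up for illustration, Python's \w stands in for both Vim's keyword characters and Emacs's word-constituent characters (so "_" counts as part of a word here, which may not match a given major mode), and only a single line is handled.

```python
import re

# Vim-style "word": a run of keyword characters OR a run of other
# non-blank characters (so "::" is a word in its own right).
VIM_WORD = re.compile(r"\w+|[^\w\s]+")
BLANKS = re.compile(r"[ \t]*")

# Emacs-style kill-word: skip any non-word characters, then kill the
# following run of word characters.
EMACS_WORD = re.compile(r"\W*\w+")


def vim_dw(text, pos):
    """Hypothetical model of Vim's `dw`: delete one word plus trailing blanks."""
    m = VIM_WORD.match(text, pos)
    end = m.end() if m else pos
    end = BLANKS.match(text, end).end()
    return text[:pos] + text[end:]


def emacs_kill_word(text, pos):
    """Hypothetical model of Emacs's M-d (kill-word) with a plain syntax table."""
    m = EMACS_WORD.match(text, pos)
    end = m.end() if m else len(text)
    return text[:pos] + text[end:]


line = "void ClassName::function();"
pos = line.index("ClassName")

# Two presses in each model, starting with point at the start of ClassName.
print(vim_dw(vim_dw(line, pos), pos))                    # void function();
print(emacs_kill_word(emacs_kill_word(line, pos), pos))  # void ();
```

In this model the second Vim-style deletion removes only "::", while the second Emacs-style kill skips over "::" and takes "function" with it.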

** Example 3

I have loads of problems when deleting and navigating words over multiple lines. In the following C++ code for instance:

    Page *page = new _Page(this);
    page.load();
           ^

When point is after "page", before the dot on the second line and I hit M-b (backward-word) point ends up at the first opening bracket of "Page(" !!!

Again, Vim does the right thing here - pressing 'b' takes the point to the closing bracket of Page(this) so it doesn't recognise the semi-colon as a bracket which is intuitive and what I would expect. This is really the point I am trying to make. I have never taken the time to understand the behaviour of word boundaries in Vim because *it just works*. In Emacs I am forced to think about word boundaries because Emacs keeps surprising me with its weird behaviour!

Note: My examples happen to be C++ but I use lots of other languages too including elisp, Clojure, JavaScript, Python and Java and the word boundaries seem to be wrong for all of them.

I have tried several different elisp solutions but each one has at least one feature that isn't quite right. Here are some links I kept; I've tried many other solutions but don't have the links to hand:

http://stackoverflow.com/questions/2078855/about-the-forward-and-backward-a-word-behaviour-in-emacs
http://stackoverflow.com/questions/1771102/changing-emacs-forward-word-behaviour/1772365#1772365

So to wrap up, the point of this post is to kick-start a discussion about why the word boundaries in Vanilla Emacs (specifically GNU Emacs 23.1.50.1 in my case) seem to be so awkward and unintuitive.

Regards,
Paul Drummond

From:	Karan Bathla
Subject:	Re: Understanding Word Boundaries
Date:	Wed, 16 Jun 2010 13:07:01 -0700 (PDT)