Re: Understanding Word and Sentence Boundaries

help-gnu-emacs

From:	Eric Abrahamsen
Subject:	Re: Understanding Word and Sentence Boundaries
Date:	Wed, 12 Dec 2012 15:02:03 +0800
User-agent:	Gnus/5.130006 (Ma Gnus v0.6) Emacs/24.2 (gnu/linux)

ken <gebser@mousecar.com> writes:

> On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
>> ken<gebser@mousecar.com>  writes:
>>
>>> On 06/26/2010 11:05 PM Deniz Dogan wrote:
>>>> 2010/6/27 ken<gebser@mousecar.com>:
>>>>>
>>>>> On 06/26/2010 06:53 AM Paul Drummond wrote:
>>>>>> Thanks for the responses guys.
>>>>>>
>>>>>> ....
>>>>>>
>>>>>> I presume that Emacs hackers either a) put up with it or b) spend a lot
>>>>>> of time fixing each case until they are happy.
>>>>>>
>>>>>> I suspect the answer is b. ;-)
>>>>>>
>>>>>> I wish there was a single minor-mode that fixes all the word boundary
>>>>>> issues for every major-mode I use!  I can but dream.   Or maybe I will
>>>>>> get round to doing it myself one day!  ;)
>>>>>>
>>>>>> Cheers,
>>>>>> Paul Drummond
>>>>>
>>>>>
>>>>> Is it possible to specify word boundaries for a particular mode?
>>>>>
>>>>
>>>> Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.
>>>
>>> Thanks for the pointer to that function.
>>>
>>> The behavior I see in need of repair is the role of so-called "comments"
>>> in sentence syntax.</tag>   For instance, immediately before this
>>> sentence are two spaces... which should signify the end of the
>>> previous sentence.  But functions like "forward-sentence" and
>>> "fill-paragraph" and "backward-sentence" don't recognize it.
>>>
>>> Said another way, the "</tag>" string obscures the relationship
>>> between the period before it and the two spaces after it and so fails
>>> to see that one sentence ends and another starts.  This occurs in
>>> text-mode and seems to be inherited by other modes.
>>>
>>> If I'm reading "modify-syntax-entry" correctly, the default meanings
>>> of '<' and '>' are, respectively, beginning and end of comment, so
>>> modifying them wouldn't fix this problem.  Or can this be remedied by
>>> a change in the syntax table?  Or is this a bug?
>>
>> For this particular case, I think you can modify the value of the
>> `sentence-end' variable (which is returned by the `sentence-end'
>> function? The whole thing is a little confusing). You'd probably be best
>> off starting with the docstring for the sentence-end function, and
>> working back from there.
>>
>> I think the `sentence-end' variable is automatically buffer-local, which
>> means if you change it in a mode-hook it ought to work the way you want.
>> I agree that the whole syntax thing feels like a very well-polished
>> hack.
>>
>> E
>
> Eric,
>
> Yes, that would be the variable to adjust.  I took a hard look at it
> and discussed it (I believe) on this list years ago, but never came up
> with a fix.  As I see it, there are two problems:
>
> First, "one" of the items in that RE would need to be "zero or more
> consecutive instances of '<' followed by any number of other
> characters up until the next '>' is found."  E.g., the RE would need
> to be able to find the end of this
> sentence</b></i>.)</q></p></span></div>  Though I've used REs
> successfully in quite a few instances and so with a small bit of help
> could probably figure that part out, there's a second issue.
>
> My considered opinion is that in the above and similar examples, the
> end of the sentence is immediately after the period ('.')... or
> question mark, exclamation mark, etc. and not after the </div>.  That
> is where the point should go when forward-sentence is executed.  This
> means that no RE would work because, once it finds the RE-defined
> sentence-end, it then needs to go backwards within the found string
> until it encounters [.!?]+ and then search forward again to the first
> character after.  IOW, unless I'm missing some capability of REs,
> "sentence-end" needs to be a function rather than an RE and would be a
> different function than one which finds the beginning of a sentence.
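
To make the two problems in the quoted message concrete, here is a
rough sketch in Python rather than Emacs Lisp. It is only an
illustration of the regexp logic, not how `forward-sentence' works
internally. The pattern and the names SENTENCE_END and
sentence_end_positions are assumptions made up for this example. A
capturing group records where the punctuation stops, so the caller
gets both the end of the whole match and the spot right after the
period. Whatever consumes the regexp still has to know to use that
group, which is the part a bare regexp cannot express.

```python
import re

# Hypothetical pattern for illustration; this is NOT the value of
# Emacs's `sentence-end'.
SENTENCE_END = re.compile(
    r"""([.!?]+)                 # group 1: the terminating punctuation
        (?:[\]"')}]|<[^<>]*>)*   # closing quotes/brackets and tags, in any order
        (?:[ \t]{2,}|\n|$)       # two or more spaces/tabs, a newline, or end of text
    """,
    re.VERBOSE,
)


def sentence_end_positions(text, start=0):
    """Return (after_punctuation, after_whole_match), or None if no match.

    after_punctuation is the offset just past the [.!?]+ run, where the
    quoted message says point should land. after_whole_match is the
    offset just past the trailing tags and whitespace.
    """
    m = SENTENCE_END.search(text, start)
    if m is None:
        return None
    return m.end(1), m.end()


sample = "the end of this sentence</b></i>.)</q></p></span></div>  Though I have"
after_punct, after_match = sentence_end_positions(sample)
print(repr(sample[:after_punct]))   # text up to and including the period
print(repr(sample[after_match:]))   # text from the start of the next sentence
```

The single search yields both offsets, so no backward scan over the
matched string is needed in this sketch.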

I'm getting way out of my depth here, both regarding regexps and Emacs'
sentence-related shenanigans, but you could consider advising the
`sentence-end' function so that it checks the current major mode and
delegates to a different sentence-end function depending on the mode (or
declines to handle it and falls back to the built-in sentence-end).

Each mode-specific sentence-end function would look at the text after
point and return a regexp tailored to that particular sentence in that
particular mode. The call to `forward-sentence' or whatever would then
happily use a different regexp every time it is called.

Feels hacky, but I guess `sentence-end' is already doing this in a
sense -- potentially returning a different regexp every time.
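
A minimal sketch of that dispatch idea, written in Python only to
show the shape of it. Nothing here is Emacs API: the HANDLERS table,
html_sentence_end, the PLAIN pattern, and this sentence_end function
are all hypothetical names for the example, and the major mode is
passed in as a plain string.

```python
import re

# Hypothetical fallback pattern standing in for the built-in regexp.
PLAIN = r"[.!?]+[\]\"')}]*(?:[ \t]{2,}|\n|$)"


def html_sentence_end(text_after_point):
    """Hypothetical handler: tolerate tags after the punctuation.

    It inspects the upcoming text and only returns the tag-aware regexp
    when a tag could actually be present.
    """
    if "<" in text_after_point:
        return r"[.!?]+(?:[\]\"')}]|<[^<>]*>)*(?:[ \t]{2,}|\n|$)"
    return PLAIN


# Mode name -> function that builds a regexp from the text after point.
HANDLERS = {"html-mode": html_sentence_end}


def sentence_end(major_mode, text_after_point):
    """Delegate to a mode-specific handler, or bail to the fallback."""
    handler = HANDLERS.get(major_mode)
    if handler is None:
        return PLAIN
    return handler(text_after_point)


text = "done</b>.</p>  Next sentence"
print(re.search(sentence_end("text-mode", text), text))   # fallback finds nothing
print(re.search(sentence_end("html-mode", text), text))   # tag-aware regexp matches
```

The caller never needs to know which regexp it got, which is the same
arrangement as a function that may return a different value on every
call.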

My brain is exhausted!

E
