help-gnu-emacs

Finding end of sentence[ was Re: Understanding ... Sentence Boundaries]


From: ken
Subject: Finding end of sentence[ was Re: Understanding ... Sentence Boundaries]
Date: Wed, 12 Dec 2012 09:32:31 -0500
User-agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.11) Gecko/20121120 Thunderbird/10.0.11

On 12/12/2012 02:02 AM Eric Abrahamsen wrote:
ken<gebser@mousecar.com>  writes:

On 12/11/2012 07:03 AM Eric Abrahamsen wrote:
ken<gebser@mousecar.com>   writes:

On 06/26/2010 11:05 PM Deniz Dogan wrote:
2010/6/27 ken<gebser@mousecar.com>:

On 06/26/2010 06:53 AM Paul Drummond wrote:
Thanks for the responses guys.

....

Is it possible to specify word boundaries for a particular mode?


Yes, it's part of the syntax table. See e.g. `modify-syntax-entry'.
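
As a rough illustration of that pointer (not code from the thread), the sketch below makes "_" a word constituent in a private syntax table and checks where `forward-word' stops. The names `my-demo-table' and `my-demo-word-end' are hypothetical; a real mode would normally modify its own syntax table instead.

```elisp
;; Sketch only: `my-demo-table' and `my-demo-word-end' are made-up names.
(defvar my-demo-table
  (let ((table (make-syntax-table)))   ; inherits from the standard table
    (modify-syntax-entry ?_ "w" table) ; "_" now has word syntax
    table)
  "Syntax table in which `_' counts as part of a word.")

(defun my-demo-word-end (text)
  "Return the position after the first word of TEXT under `my-demo-table'."
  (with-temp-buffer
    (set-syntax-table my-demo-table)
    (insert text)
    (goto-char (point-min))
    (forward-word 1)
    (point)))
```

With this table, "foo_bar" is treated as a single word, so word motion no longer stops at the underscore.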

Thanks for the pointer to that function.

The behavior I see in need of repair is the role of so-called "comments"
in sentence syntax.</tag>    For instance, immediately before this
sentence are two spaces... which should signify the end of the
previous sentence.  But functions like "forward-sentence" and
"fill-paragraph" and "backward-sentence" don't recognize it.

Said another way, the "</tag>" string obscures the relationship
between the period before it and the two spaces after it and so fails
to see that one sentence ends and another starts.  This occurs in
text-mode and seems to be inherited by other modes.

If I'm reading "modify-syntax-entry" correctly, the default meanings
of '<' and '>' are, respectively, beginning and end of comment, so
modifying them wouldn't fix this problem.  Or can this be remedied by
a change in the syntax table?  Or is this a bug?

For this particular case, I think you can modify the value of the
`sentence-end' variable (which is returned by the `sentence-end'
function? The whole thing is a little confusing). You'd probably be best
off starting with the docstring for the sentence-end function, and
working back from there.

I think the `sentence-end' variable is automatically buffer-local, which
means if you change it in a mode-hook it ought to work the way you want.
I agree that the whole syntax thing feels like a very well-polished
hack.
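
A minimal sketch of the mode-hook idea, assuming a reasonably recent Emacs (for `setq-local') and assuming that only closing tags of the form </...> ever sit between the punctuation and the whitespace. The function name and the regexp are illustrative guesses, not a tested replacement for the default value.

```elisp
;; Sketch only: `my-tag-aware-sentence-end' is a hypothetical name, and
;; the regexp is a simplified assumption about what follows a period.
(defun my-tag-aware-sentence-end ()
  "Set a buffer-local `sentence-end' that tolerates closing tags."
  (setq-local sentence-end
              (concat "[.?!]"                 ; sentence punctuation
                      "\\(?:</[^>\n]*>\\)*"   ; zero or more closing tags
                      "\\(?:$\\|  \\|\t\\)"   ; end of line, two spaces, or tab
                      "[ \t\n]*")))           ; any further whitespace

(add-hook 'text-mode-hook #'my-tag-aware-sentence-end)
```

With this value, `forward-sentence' should stop after the closing tag rather than skipping to a later sentence. It still does not stop directly after the period.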

E

Eric,

Yes, that would be the variable to adjust.  I took a hard look at it
and discussed it (I believe) on this list years ago, but never came up
with a fix.  As I see it, there are two problems:

First, "one" of the items in that RE would need to be "zero or more
consecutive instances of '<' followed by any number of other
characters up until the next '>' is found."  E.g., the RE would need
to be able to find the end of this
sentence</b></i>.)</q></p></span></div>   Though I've used REs
successfully in quite a few instances and so with a small bit of help
could probably figure that part out, there's a second issue.


[In my original post the paragraph below was unclear.  So changed it.]

My considered opinion is that in the above and similar examples, the
end of the sentence is immediately after the period ('.')... or
question mark, exclamation mark, etc. and not after the </div>.  That
is where the point should go when forward-sentence is executed.  This
means that no RE would work because, once it finds the RE-defined
sentence-end, it then needs to go backwards within the found string
until it encounters [.!?]+ and then move point to just after that
punctuation.  IOW, unless I'm missing some capability of REs,
"sentence-end" needs to be a function rather than an RE and would be a
different function than one which finds the beginning of a sentence.
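
One possible way to get that behavior, offered as a sketch under assumptions: wrap the punctuation in a capture group, let the rest of the regexp consume any closing quotes, brackets and tags, and then jump back to the end of the group. `my-sentence-punct-re' and `my-forward-to-punct' are hypothetical names, and this only handles forward motion within the text after point.

```elisp
;; Sketch only: both names below are made up for illustration.
(defconst my-sentence-punct-re
  (concat "\\([.?!]+\\)"                    ; group 1: the punctuation itself
          "\\(?:[]\"')}]\\|</[^>\n]*>\\)*"  ; closers and closing tags
          "\\(?:$\\|  \\|\t\\)")            ; end of line, two spaces, or tab
  "Regexp whose first group is the sentence-ending punctuation.")

(defun my-forward-to-punct ()
  "Move point to just after the next sentence-ending punctuation.
Return the new position, or nil if no sentence end is found."
  (interactive)
  (when (re-search-forward my-sentence-punct-re nil t)
    (goto-char (match-end 1))))
```

On the example above, point would land between the "." and the ")" instead of after the final </div>.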

I'm getting way out of my depth here, both regarding regexps and emacs'
sentence-related shenanigans, but you could consider advising the
`sentence-end' function so that it checks the current major mode, and
delegates to a different sentence-end function depending on the mode (or
declines to handle and bails to the built-in sentence-end).

The individual mode-specific sentence-end functions look at the text
after point, and return a different regexp every time, one specifically
tailored to this particular sentence in this particular mode. The call to
`forward-sentence' or whatever happily uses a different regexp every
time it is called.

Feels hacky, but I guess `sentence-end' is already doing this in a
sense -- potentially returning a different regexp every time.
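
The advising idea could look roughly like the sketch below, assuming an Emacs with `advice-add' (24.4 or later). The alist, the function names, and the choice of `html-mode' as the example mode are all assumptions made for illustration. The regexp is a fixed placeholder, where the idea above would compute one from the text after point.

```elisp
;; Sketch only: every `my-...' name here is hypothetical.
(defun my-tagged-sentence-end-regexp ()
  "Return a sentence-end regexp that tolerates closing tags."
  "[.?!]\\(?:</[^>\n]*>\\)*\\(?:$\\|  \\|\t\\)[ \t\n]*")

(defvar my-sentence-end-alist
  '((html-mode . my-tagged-sentence-end-regexp))
  "Alist mapping major modes to functions that return a regexp.")

(defun my-sentence-end-dispatch (orig-fun &rest args)
  "Around advice: use a mode-specific regexp, else call ORIG-FUN with ARGS."
  (let ((fn (cdr (assq major-mode my-sentence-end-alist))))
    (if fn
        (funcall fn)
      (apply orig-fun args))))

(advice-add 'sentence-end :around #'my-sentence-end-dispatch)
```

Modes not listed in the alist fall through to the built-in `sentence-end', so the advice can be removed again with `advice-remove' without other cleanup.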

My brain is exhausted!

E

If one were to write a mode-specific replacement for the existing "forward-sentence" and "sentence-end", what are some ways in elisp to ensure that they're invoked when working in that mode? Would it be enough to include (the recoded) "forward-sentence" and "sentence-end" in the code for that mode...? or would some kind of specific hook language need to be included in ~/.emacs?
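
One approach that keeps everything inside the mode's own code, sketched here under assumptions: give the mode a command of its own and remap `forward-sentence' to it in the mode's keymap. `my-markup-mode' and `my-markup-forward-sentence' are hypothetical names, and the command body is only a placeholder that delegates to the built-in.

```elisp
;; Sketch only: the mode and command names are made up.
(defun my-markup-forward-sentence (&optional arg)
  "Mode-specific sentence motion; placeholder that calls the built-in."
  (interactive "p")
  ;; Custom logic would go here.  Calling `forward-sentence' from Lisp
  ;; is not affected by the key remapping below, so this cannot recurse.
  (forward-sentence arg))

(define-derived-mode my-markup-mode text-mode "MyMarkup"
  "Hypothetical major mode with its own sentence commands.")

;; `define-derived-mode' creates `my-markup-mode-map' automatically.
(define-key my-markup-mode-map [remap forward-sentence]
  #'my-markup-forward-sentence)
```

Any key bound to `forward-sentence' then runs the replacement in buffers using that mode, with nothing needed in ~/.emacs. Commands that call `forward-sentence' or `sentence-end' from Lisp, such as filling, would not see the remapping and would still need a buffer-local `sentence-end' or similar.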


