emacs-devel

Re: paragraphs.el: do forward-sentence and friends not work?


From: David Reitter
Subject: Re: paragraphs.el: do forward-sentence and friends not work?
Date: Thu, 14 Feb 2008 09:45:01 +0000

On 14 Feb 2008, at 04:42, Richard Stallman wrote:

> Using two spaces after end of sentence enables Emacs to distinguish
> between periods that end sentences and periods for abbreviations.
> That is why it should be the default.


We can improve this so that it works without depending on the double space.

Sentence tokenization is a known problem. You can throw machine learning algorithms at it, but that's not a viable option in our case. However, Grefenstette & Tapanainen (1994) examined this in detail for English, using the Brown corpus. They report that with a small lexicon of common abbreviations, 99.1% of all periods can be classified correctly. Even without the lexicon, the right regular expressions achieve 97.7% accuracy on English, and I expect the results to be similar for other languages. I think that's good enough for M-e and M-a.

http://citeseer.ist.psu.edu/grefenstette94what.html
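
To make the idea concrete, here is a rough sketch in Python of the kind of heuristic I mean: a regular expression finds candidate sentence ends, and a small abbreviation lexicon vetoes some of them. The regex, the lexicon entries and the function names are my own illustrative assumptions, not the rules from the paper, and nothing here reflects how paragraphs.el works.

```python
import re

# Hypothetical mini-lexicon, for illustration only. A usable list would be
# larger and language-specific.
ABBREVIATIONS = {"mr", "mrs", "dr", "prof", "e.g", "i.e", "etc", "vs", "cf"}

# Candidate sentence end: '.', '!' or '?', optional closing quotes/brackets,
# then (not consumed) whitespace and an uppercase letter or digit, possibly
# preceded by an opening quote/bracket. A single space is enough.
CANDIDATE = re.compile(r"""([.!?])["')\]]*(?=\s+["'(\[]?[A-Z0-9])""")


def _is_abbreviation(text, dot_pos, use_lexicon):
    """Guess whether the period at dot_pos belongs to an abbreviation."""
    m = re.search(r"(\S+)$", text[:dot_pos])
    token = m.group(1).lstrip("\"'([").lower() if m else ""
    # Regex-only rule: a single letter before the period looks like an initial.
    if re.fullmatch(r"[a-z]", token):
        return True
    # Lexicon rule: the token is a known abbreviation.
    return use_lexicon and token in ABBREVIATIONS


def split_sentences(text, use_lexicon=True):
    """Split text into sentences without relying on double spaces."""
    sentences, start = [], 0
    for m in CANDIDATE.finditer(text):
        if m.group(1) == "." and _is_abbreviation(text, m.start(1), use_lexicon):
            continue
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

With `use_lexicon=False` only the regex rules apply, so something like "Dr. Smith" gets split after "Dr."; with the lexicon enabled that period is treated as part of an abbreviation. A period followed by a lowercase word is never a candidate in this sketch, which already handles many abbreviations in the middle of a sentence.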





