help-gnu-emacs

Re: "split-sentences"?


From: Tomas Hlavaty
Subject: Re: "split-sentences"?
Date: Sat, 23 Jan 2021 10:07:06 +0100

On Sat 23 Jan 2021 at 09:41, <tomas@tuxteam.de> wrote:
> On Sat, Jan 23, 2021 at 07:38:49AM +0100, moasenwood--- via Users list for 
> the GNU Emacs text editor wrote:
>> Can I parse/split a string into sentences based on
>> human-language punctuation?

Not easily.

>> Did anyone do that already?

https://www.unicode.org/reports/tr29/#Sentence_Boundaries

Does Emacs expose Unicode text functions, for example to classify
characters or to determine graphemes, words, sentences, line breaks, etc.?
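
At least character classification is available: `get-char-code-property'
looks up Unicode character properties such as the general category.  A
minimal sketch follows; the helper `my-char-class' and its coarse classes
are made up for illustration, and it does not cover the grapheme, word, or
sentence segmentation rules from TR29:

```elisp
;; Classify characters by their Unicode general category.
;; `get-char-code-property' is built in; `my-char-class' is a
;; hypothetical helper, not an existing Emacs function.
(defun my-char-class (char)
  "Return a coarse class symbol for CHAR from its Unicode general category."
  (let ((cat (get-char-code-property char 'general-category)))
    (cond ((memq cat '(Lu Ll Lt Lm Lo)) 'letter)
          ((memq cat '(Nd Nl No)) 'number)
          ((memq cat '(Pc Pd Ps Pe Pi Pf Po)) 'punctuation)
          ((memq cat '(Zs Zl Zp)) 'separator)
          (t 'other))))

(mapcar #'my-char-class "Mr. W")
;; => (letter letter punctuation separator letter)
```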

>> I mean very mechanically is fine, no linguistics or anything.
>> 
>> So this
>> 
>> "'This sentence is spoken by Mr. W. E. B Dubois, Esq.!' played
>> through amazon.com alexa speakers?"
>>
>> would be
>> 
>> ("'" "This sentence is spoken by Mr" "." "W" "." "E" "." "B
>> Dubois" "," "Esq" "." "!" "'" "played through amazon" "."
>> "com alexa speakers" "?")

That is not really split-sentences.

The example has two sentences.  Moreover, the first sentence is the
subject of the second.

This would be represented something like this:

(sentence
  (sentence "This sentence is spoken by Mr. W. E. B Dubois, Esq.!")
  "played through amazon.com alexa speakers?")

But it depends on what you want to achieve.
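
If the purely mechanical split from the question is all that is wanted,
something like the following might do.  It is only a sketch: the function
name `my-split-at-punctuation' and the variable `my-punctuation-regexp'
are made up here, and the set of punctuation characters is an assumption
that would need adjusting for real text.

```elisp
(require 'subr-x)  ; `string-trim' and `string-empty-p' on older Emacs

;; Assumed set of punctuation characters; extend as needed.
(defvar my-punctuation-regexp "[.,!?;:'\"]"
  "Regexp matching a single punctuation character to split at.")

(defun my-split-at-punctuation (string)
  "Split STRING into text runs and single punctuation characters.
Whitespace around the text runs is trimmed and empty runs are dropped."
  (let ((start 0)
        (tokens '()))
    (while (string-match my-punctuation-regexp string start)
      ;; Save the match bounds first: `string-trim' may change match data.
      (let* ((beg (match-beginning 0))
             (end (match-end 0))
             (text (string-trim (substring string start beg))))
        (unless (string-empty-p text)
          (push text tokens))
        (push (substring string beg end) tokens)
        (setq start end)))
    ;; Keep whatever follows the last punctuation character.
    (let ((rest (string-trim (substring string start))))
      (unless (string-empty-p rest)
        (push rest tokens)))
    (nreverse tokens)))

(my-split-at-punctuation "Mr. W. E.")
;; => ("Mr" "." "W" "." "E" ".")
```

This yields a flat token list, not the nested sentence structure above;
deciding which periods actually end a sentence is the hard part.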


