emacs-devel

Re: Beginingless paragraphs


From: Alan Mackenzie
Subject: Re: Beginingless paragraphs
Date: Sat, 3 Sep 2005 12:26:23 +0000 (GMT)

Hi, Emacs!

On Fri, 2 Sep 2005, Richard M. Stallman wrote:

>    For example, I was tearing my hair out in frustration a couple of years
>    back, trying to get the sentence/paragraph movement and filling stuff to
>    work properly in CC Mode.

>If the documentation of paragraph-start and paragraph-separate is not
>clear enough, we can clarify it.  I doubt that this would take the form
>of a "definition of paragraphs", though.  The reason is that there is no
>simple "definition of paragraphs" at the base of the current code or
>these two variables.

Read my patch and reconsider!

>The concepts that the design is based on are the concepts that you see
>in the manual.

I've worked out just what's been bugging me, and that's the definition of
`paragraph-start':  It suggests (though it doesn't quite explicitly say)
that paragraph-start matches the start of _every_ paragraph.  This isn't
true: any line following a separator line starts a paragraph, whether or
not paragraph-start matches it.
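
To make the point concrete, here is a rough model of that rule as a Python
sketch.  It is not how paragraphs.el works internally: it uses Python regexp
syntax, assumes a zero left margin, and leaves out the whitespace-line
heuristic.  The function name `split_paragraphs' and the "- " list-item
alternative added to paragraph-start are made up for the illustration.

```python
import re

def split_paragraphs(lines, paragraph_start, paragraph_separate):
    """Group lines into paragraphs using the two regexps.

    Simplified model (assumptions): regexps are matched at the start of
    each line (zero left margin); separator lines belong to no paragraph.
    """
    start_re = re.compile(paragraph_start)
    sep_re = re.compile(paragraph_separate)
    paragraphs = []
    current = None  # None means "no paragraph is open"
    for line in lines:
        if sep_re.match(line):
            # A separator line ends the open paragraph and is dropped.
            current = None
            continue
        # A new paragraph begins after a separator (or at buffer start),
        # or when paragraph-start matches a non-separator line.
        if current is None or start_re.match(line):
            current = []
            paragraphs.append(current)
        current.append(line)
    return paragraphs

# Default-like values, plus a hypothetical "- " alternative for list items.
PARAGRAPH_SEPARATE = r"[ \t\f]*$"
PARAGRAPH_START = r"\f|[ \t]*$|- "

lines = [
    "intro text",
    "more intro",
    "- item one",
    "  continued",
    "- item two",
    "",
    "after blank",
    "tail",
]
paragraphs = split_paragraphs(lines, PARAGRAPH_START, PARAGRAPH_SEPARATE)
```

In this model "after blank" starts a paragraph although paragraph-start does
not match it, simply because the line before it is a separator.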

>    The four regexps documented on this page all define chunks of
>    natural-language text: paragraphs, pages and sentences.  So how
>    about renaming this @section something like "Sentences, Paragraphs
>    and Pages", and making the focus of the @node the definition of
>    these things in terms of the regexps, rather than the regexps
>    themselves?

>I would be glad to consider a change of this sort.

OK, here's my first shot at a patch.  As a matter of interest, what's
this node doing in "Searching and Matching"?  Would it not be more at
home under "Text"?



2005-09-03  Alan Mackenzie  <address@hidden>

        * searching.texi (Standard Regexps): Rename the @section "Regular
        Expressions for Pages, Paragraphs, and Sentences".  Insert a full
        description of paragraphs.


*** searching.texi      Tue Aug 30 09:15:42 2005
--- searching-1.67.acm.texi     Sat Sep  3 12:01:10 2005
***************
*** 1643,1686 ****
  @end table
  
  @node Standard Regexps
! @section Standard Regular Expressions Used in Editing
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing
  
!   This section describes some variables that hold regular expressions
! used for certain purposes in editing:
! 
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that separate
! pages.  The default value is @code{"^\014"} (i.e., @code{"^^L"} or
! @code{"^\C-l"}); this matches a line that starts with a formfeed
! character.
  @end defvar
  
!   The following two regular expressions should @emph{not} assume the
! match always starts at the beginning of a line; they should not use
! @samp{^} to anchor the match.  Most often, the paragraph commands do
! check for a match only at the beginning of a line, which means that
! @samp{^} would be superfluous.  When there is a nonzero left margin,
! they accept matches that start after the left margin.  In that case, a
! @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
! where a left margin is never used.
  
  @defvar paragraph-separate
! This is the regular expression for recognizing the beginning of a line
! that separates paragraphs.  (If you change this, you may have to
! change @code{paragraph-start} also.)  The default value is
! @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
! spaces, tabs, and form feeds (after its left margin).
  @end defvar
  
  @defvar paragraph-start
! This is the regular expression for recognizing the beginning of a line
! that starts @emph{or} separates paragraphs.  The default value is
! @w{@code{"\f\\|[ \t]*$"}}, which matches a line containing only
! whitespace or starting with a form feed (after its left margin).
  @end defvar
  
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
--- 1643,1729 ----
  @end table
  
  @node Standard Regexps
! @section Regular Expressions for Pages, Paragraphs, and Sentences
  @cindex regexps used standardly in editing
  @cindex standard regexps used in editing
  
!   This section describes the regular expressions Emacs uses to
! recognize pages, paragraphs, and sentences.  By setting these
! variables appropriately, the Elisp programmer can control the precise
! effect of the standard commands that move over, kill, fill, mark,
! narrow to, and otherwise operate on these pieces of text.  Note that
! these variables are @emph{not} buffer local by default.
! 
! @table @asis
! @cindex page
! @item Pages
  @defvar page-delimiter
! This is the regular expression describing line-beginnings that
! separate pages.  The default value is @code{"^\014"} (i.e.,
! @code{"^^L"} or @code{"^\C-l"}); this matches a line that starts with
! a formfeed character.
  @end defvar
  
! @cindex paragraph
! @item Paragraphs
!   Buffers divide into @dfn{paragraphs}, sequences of whole lines which
! normally don't overlap@footnote{It is possible for a blank line to be
! both the last line of one paragraph and the first line of the next.}.
! Between two paragraphs there may optionally be one or more
! @dfn{separator lines}, which aren't part of any paragraph.  The two
! regular expressions @code{paragraph-separate} and
! @code{paragraph-start} fully determine where paragraphs start and end.
! The beginning and end of the buffer always count as paragraph
! boundaries.
  
  @defvar paragraph-separate
! This regular expression recognizes a separator line by matching any
! portion of it which begins at its left margin (@pxref{Margins for
! Filling}).  (If you change this, you may have to change
! @code{paragraph-start} also.)  The default value is @w{@code{"[@
! \t\f]*$"}}, which matches a line that consists entirely of spaces,
! tabs, and form feeds (after its left margin).
  @end defvar
  
  @defvar paragraph-start
! This regular expression recognizes a line which starts a paragraph
! when the previous line is not a separator.  It need only match some
! portion beginning at the line's left margin (@pxref{Margins for
! Filling}), not the whole line.  It must also be set up to recognize a
! separator line.  The default value is @w{@code{"\f\\|[ \t]*$"}}, which
! matches a line containing only whitespace or starting with a form feed
! (after its left margin).
  @end defvar
  
+ The two variant forms of paragraph breaks are:
+ 
+ @table @asis
+ @item Paragraph break without separator lines
+ Any line, apart from a separator line, which @code{paragraph-start}
+ recognizes starts a new paragraph.
+ 
+ @item Paragraph break with separator lines
+ One or more separator lines split the old paragraph from the new one.
+ Whether @code{paragraph-start} would also recognize the first line of
+ the new paragraph is irrelevant.
+ @end table
+ 
+   As a heuristic feature, if a line tentatively recognized as the
+ start of a paragraph follows a whitespace line, the whitespace line
+ becomes the start of the paragraph instead.
+ 
+   Since the above two regular expressions, @code{paragraph-start} and
+ @code{paragraph-separate}, are matched against text at the left
+ margin, they should @emph{not} use @samp{^} to anchor the match to the
+ beginning of the line.  Most often, the paragraph commands do check
+ for a match only at the beginning of a line, which means that @samp{^}
+ would be superfluous.  When there is a nonzero left margin, they
+ accept matches that start after the left margin.  In that case, a
+ @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
+ where a left margin is never used.
+ 
+ @cindex sentence
+ @item Sentences
  @defvar sentence-end
  If non-@code{nil}, the value should be a regular expression describing
  the end of a sentence, including the whitespace following the
***************
*** 1700,1705 ****
--- 1743,1749 ----
  @code{sentence-end-without-period} and
  @code{sentence-end-without-space}.
  @end defun
+ @end table
  
  @ignore
     arch-tag: c2573ca2-18aa-4839-93b8-924043ef831f



-- 
Alan Mackenzie (Munich, Germany)





