replace-match problem

bug-gnu-emacs

From:	Wolfgang Scherer
Subject:	replace-match problem
Date:	Fri, 3 May 2002 17:44:13 +0200

In GNU Emacs 21.1.1 (i386-suse-linux, X toolkit, Xaw3d scroll bars)
 of 2002-03-25 on stephens
configured using `configure --with-gcc --with-pop --with-system-malloc 
--prefix=/usr --exec-prefix=/usr --infodir=/usr/share/info 
--mandir=/usr/share/man --sharedstatedir=/var/lib --libexecdir=/usr/lib 
--with-x --with-xpm --with-jpeg --with-tiff --with-gif --with-png 
--with-x-toolkit=lucid --x-includes=/usr/X11R6/include 
--x-libraries=/usr/X11R6/lib i386-suse-linux CC=gcc 'CFLAGS=-O2 -march=i486 
-mcpu=i686 -pipe          -DSYSTEM_PURESIZE_EXTRA=25000   
-DSITELOAD_PURESIZE_EXTRA=10000         -D_GNU_SOURCE ' LDFLAGS=-s 
build_alias=i386-suse-linux host_alias=i386-suse-linux 
target_alias=i386-suse-linux'
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: POSIX
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: german
  locale-coding-system: iso-latin-1
  default-enable-multibyte-characters: nil

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

REPLACE-MATCH PROBLEM
=====================

The built-in function `replace-match' seems to behave inconsistently.
Specifically, I have a problem with the semantics of "words" and
"newtext".

From the documentation of `replace-match':

    Otherwise maybe capitalize the whole text, or maybe just word
    initials, based on the replaced text.

    [1] If the replaced text has only capital letters and has at
    least one multiletter word, convert NEWTEXT to all caps.

    [2] If the replaced text has at least one word starting with a
    capital letter, then capitalize each word in NEWTEXT.


1. The lower-case and upper-case examples in lines 1, 2, 6, 7, 11, 12,
   16, 17 of the table under TEST CASE could suggest that "\\&" is
   subject to case conversion.
   Lines 4, 5, 9, 10, 14, 15, 19, 20 show that this is not the case.
   (I think a clarification would be nice, e.g. "Case conversion is
    done before any special sequences are expanded.")

2. The lower-case and upper-case examples also suggest that the
   number of non-word-constituent characters between words does not
   make a difference.
   The examples for mixed-case replaced text in lines 3, 8, 13, 18
   show that the number of non-word-constituent characters does in
   fact make a difference.

   This is a consequence of replace-match in search.c not checking the
   syntax code of the current character.  As a result, the second and
   any further separators are treated as if they were the initial
   characters of a word.

3. The test examples for mixed-case replaced text in lines 4, 5, 9,
   10, 14, 15, 19, 20 show that description [2] is plainly wrong.  It
   should state that capitalization is done only when ALL words in
   the replaced text are capitalized.

   At least the code in search.c says so:

       /* Capitalize each word, if the old text has all capitalized words.  */

TEST CASE
=========

The following table was generated with a test expression that first
copies INPUT with fixed case ("\\& => \\&") and then replaces the copy
of INPUT with case conversion enabled (e.g. "\\& : your-string").

    INPUT          \& REPL       STRING-REPL

  1 my-string   => my-string   : your--string 
  2 MY-STRING   => MY-STRING   : YOUR--STRING 
  3 My-String   => My-String   : Your--String 
  4 My-string   => My-string   : your--string 
  5 my-String   => my-String   : your--string 
          
  6 my--string  => my--string  : your--string
  7 MY--STRING  => MY--STRING  : YOUR--STRING
  8 My--String  => My--String  : your--string
  9 My--string  => My--string  : your--string
 10 my--String  => my--String  : your--string
          
 11 my string   => my string   : your string 
 12 MY STRING   => MY STRING   : YOUR STRING 
 13 My String   => My String   : Your String 
 14 My string   => My string   : your string 
 15 my String   => my String   : your string 
          
 16 my  string  => my  string  : your  string
 17 MY  STRING  => MY  STRING  : YOUR  STRING
 18 My  String  => My  String  : your  string
 19 My  string  => My  string  : your  string
 20 my  String  => my  String  : your  string
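
To make the table easier to check outside Emacs, the flag-setting loop
from search.c can be modelled in plain C.  This is only a sketch under
stated assumptions: `classify_current', `is_word' and the enum are names
made up for this illustration, the <ctype.h> predicates stand in for
LOWERCASEP, NOCASEP and SYNTAX, and the final decision step is
reconstructed from the documentation and the source comment quoted
above, not copied from search.c.

```c
#include <ctype.h>

/* Case action chosen for NEWTEXT (hypothetical names). */
enum case_action { NOCHANGE, CAP_INITIAL, ALL_CAPS };

/* Stand-in for SYNTAX (c) == Sword: letters and digits (assumption). */
static int
is_word (int c)
{
  return isalnum ((unsigned char) c) != 0;
}

/* Model of the current loop: every character is examined, whether or
   not it is a word constituent.  The some_uppercase flag is left out
   because this sketch does not use it in the decision step.  */
enum case_action
classify_current (const char *old)
{
  int some_multiletter_word = 0;
  int some_lowercase = 0;
  int some_nonuppercase_initial = 0;
  int prevc = '\n';

  for (; *old; old++)
    {
      int c = (unsigned char) *old;

      if (islower (c))
        {
          some_lowercase = 1;
          if (!is_word (prevc))
            some_nonuppercase_initial = 1;
          else
            some_multiletter_word = 1;
        }
      else if (isupper (c))
        {
          if (is_word (prevc))
            some_multiletter_word = 1;
        }
      else
        {
          /* Caseless character; this branch is also reached for
             separators such as '-' and ' '.  */
          if (!is_word (prevc))
            some_nonuppercase_initial = 1;
        }
      prevc = c;
    }

  /* Decision step (assumption, see the text above). */
  if (!some_lowercase && some_multiletter_word)
    return ALL_CAPS;
  if (!some_nonuppercase_initial && some_multiletter_word)
    return CAP_INITIAL;
  return NOCHANGE;
}
```

In this model "My-String" and "My String" yield CAP_INITIAL, while
"My--String" and "My  String" yield NOCHANGE, which agrees with lines 3,
13, 8 and 18 of the table.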

EMACS search.c
==============

(No difference between 20.7, 21.1 and 21.2.)

>>        if (LOWERCASEP (c))
>>          {
>>            /* Cannot be all caps if any original char is lower case */
>>
>>            some_lowercase = 1;
>>            if (SYNTAX (prevc) != Sword)
>>              some_nonuppercase_initial = 1;
>>            else
>>              some_multiletter_word = 1;
>>          }
>>        else if (!NOCASEP (c))
>>          {
>>            some_uppercase = 1;
>>            if (SYNTAX (prevc) != Sword)
>>              ;
>>            else
>>              some_multiletter_word = 1;
>>          }
>>        else
>>          {
>>            /* If the initial is a caseless word constituent,
>>               treat that like a lowercase initial.  */
>>            if (SYNTAX (prevc) != Sword)
>>              some_nonuppercase_initial = 1;
>>          }

I think the following would be more correct:

      if (SYNTAX (c) == Sword)
        {
>>        if (LOWERCASEP (c))
>>          {
>>            /* Cannot be all caps if any original char is lower case */
>>
>>            some_lowercase = 1;
>>            if (SYNTAX (prevc) != Sword)
>>              some_nonuppercase_initial = 1;
>>            else
>>              some_multiletter_word = 1;
>>          }
>>        else if (!NOCASEP (c))
>>          {
>>            some_uppercase = 1;
>>            if (SYNTAX (prevc) != Sword)
>>              ;
>>            else
>>              some_multiletter_word = 1;
>>          }
>>        else
>>          {
>>            /* If the initial is a caseless word constituent,
>>               treat that like a lowercase initial.  */
>>            if (SYNTAX (prevc) != Sword)
>>              some_nonuppercase_initial = 1;
>>          }
        }

Or:

>>        if (LOWERCASEP (c))
>>          {
>>            /* Cannot be all caps if any original char is lower case */
>>
>>            some_lowercase = 1;
>>            if (SYNTAX (prevc) != Sword)
>>              some_nonuppercase_initial = 1;
>>            else
>>              some_multiletter_word = 1;
>>          }
>>        else if (!NOCASEP (c))
>>          {
>>            some_uppercase = 1;
>>            if (SYNTAX (prevc) != Sword)
>>              ;
>>            else
>>              some_multiletter_word = 1;
>>          }
          else if (SYNTAX (c) == Sword)
>>          {
>>            /* If the initial is a caseless word constituent,
>>               treat that like a lowercase initial.  */
>>            if (SYNTAX (prevc) != Sword)
>>              some_nonuppercase_initial = 1;
>>          }
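
The effect of the first variant can be checked with a small standalone
model.  Again this is only a sketch: `classify', `is_word', the enum and
the `word_only' switch are hypothetical names, the <ctype.h> predicates
stand in for LOWERCASEP, NOCASEP and SYNTAX, and the decision step at
the end is an assumption reconstructed from the documentation and the
source comment, not code taken from search.c.

```c
#include <ctype.h>

/* Case action chosen for NEWTEXT (hypothetical names). */
enum case_action { NOCHANGE, CAP_INITIAL, ALL_CAPS };

/* Stand-in for SYNTAX (c) == Sword: letters and digits (assumption). */
static int
is_word (int c)
{
  return isalnum ((unsigned char) c) != 0;
}

/* word_only == 0 models the current loop; word_only != 0 adds the
   proposed guard, so that only word constituents set the flags.  */
enum case_action
classify (const char *old, int word_only)
{
  int some_multiletter_word = 0;
  int some_lowercase = 0;
  int some_nonuppercase_initial = 0;
  int prevc = '\n';

  for (; *old; old++)
    {
      int c = (unsigned char) *old;

      if (!word_only || is_word (c))
        {
          if (islower (c))
            {
              some_lowercase = 1;
              if (!is_word (prevc))
                some_nonuppercase_initial = 1;
              else
                some_multiletter_word = 1;
            }
          else if (isupper (c))
            {
              if (is_word (prevc))
                some_multiletter_word = 1;
            }
          else if (!is_word (prevc))
            {
              /* Caseless initial: treated like a lowercase initial. */
              some_nonuppercase_initial = 1;
            }
        }
      /* prevc is updated for every character, guarded or not. */
      prevc = c;
    }

  /* Decision step (assumption, see the text above). */
  if (!some_lowercase && some_multiletter_word)
    return ALL_CAPS;
  if (!some_nonuppercase_initial && some_multiletter_word)
    return CAP_INITIAL;
  return NOCHANGE;
}
```

Under these assumptions, classify ("My--String", 0) gives NOCHANGE and
classify ("My--String", 1) gives CAP_INITIAL.  Inputs with a single
separator, such as "My-String", give the same result either way.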

TEST EXPRESSION (JUST FOR REFERENCE)
====================================

;; |:debug:|
(let ((case-fold-search t)
      (case-replace nil)
      (str-wid 13)
      (line-no 1)
      (r-s (function
            (lambda (SEARCH REPL)
              (while (search-forward SEARCH nil t)
                ;; Duplicate SEARCH (with FIXEDCASE == t)
                (replace-match (format (format "%%-%ss => %%s"
                                               (- str-wid
                                                  (length (match-string 0))))
                                       "\\&" "\\&")
                               t nil)
                (goto-char (match-beginning 0))
                ;; Find copy of SEARCH
                (search-forward SEARCH nil t 2)
                ;; Replace SEARCH (with FIXEDCASE == nil) by "\\& : REPL"
                (replace-match (format (format "%%-%ss : %%s"
                                               (- str-wid
                                                  (length (match-string 0))))
                                       "\\&" REPL)
                               nil nil)
                ;; Add line number
                (beginning-of-line)
                (insert (format "%3d " line-no))
                (end-of-line)
                (setq line-no (1+ line-no)))))))
  (save-excursion
    (funcall r-s "my-string"  "your--string")
    (funcall r-s "my--string" "your--string")
    (funcall r-s "my string"  "your string")
    (funcall r-s "my  string" "your  string")
    ))
;; |:debug:|



Recent input:
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <down> <down> <down> <down> <down> <down> 
<down> <down> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <up> <up> <up> <up> <up> <up> <up> <up> 
<up> <up> <up> <down> <down> <down> <down> <down> <down> 
<down> <down> <up> <C-prior> C-x C-f m a i l <return> 
C-u C-c u d f C-x k <return> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <help-echo> <help-echo> <help-echo> <help-echo> 
<help-echo> <menu-bar> <help-menu> <report-emacs-b
ug>

Recent messages:
emacs-replace-match-bug.el has auto save data; consider M-x recover-file
Scanning buffer for index (  0%)
Scanning buffer for index (100%)
call-interactively: Quit
Wrote /usr/people/ws/emacs-init/replace-match/emacs-replace-match-bug.el [3 times]
Mark set [4 times]
Wrote /usr/people/ws/emacs-init/replace-match/emacs-replace-match-bug.el [2 times]
Mark set
(New file)
Loading emacsbug...done