m4-discuss

improve substr


From: Eric Blake
Subject: improve substr
Date: Wed, 24 Dec 2008 23:23:22 +0000 (UTC)
User-agent: Loom/3.14 (http://gmane.org/)

POSIX is not very specific about negative arguments to substr.  It is only 
explicit that a positive second argument larger than the first argument's 
length is okay (the empty string must silently result).  Furthermore, BSD m4 
segfaults on substr(abc,-2), which gives a bit of weight to the argument that 
negative arguments aren't really standardized, so we might as well make 
behavior nice.

Up till now, we've been silently returning the empty string if any negative 
arguments occur, which matches Solaris m4, but is not very useful.  So, I think 
it's high time that we adopt perl's semantics for negative arguments 
(from 'perldoc -f substr'):

   my $s = "The black cat climbed the green tree";
   my $color  = substr $s, 4, 5;       # black
   my $middle = substr $s, 4, -11;     # black cat climbed the
   my $end    = substr $s, 14;         # climbed the green tree
   my $tail   = substr $s, -4;         # tree
   my $z      = substr $s, -4, 2;      # tr
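
To pin down the intended results independently of m4 quoting, here is a small
reference model in Python (not m4). It is only a sketch: the name model_substr
is hypothetical, and clamping an out-of-range negative offset to 0 is an
assumption that follows the m4 emulation in this message rather than anything
POSIX or perl guarantees.

```python
def model_substr(s, offset, length=None):
    """Sketch of the proposed semantics for negative substr arguments."""
    n = len(s)
    # A negative offset counts back from the end of the string.
    start = offset + n if offset < 0 else offset
    # Assumption: an offset before the start of the string is clamped to 0.
    start = max(start, 0)
    if start > n:
        return ""
    if length is None:
        end = n                  # no length: take everything to the end
    elif length < 0:
        end = n + length         # negative length: leave that many off the end
    else:
        end = start + length
    end = min(end, n)
    # An empty or inverted range silently yields the empty string.
    return s[start:end] if end > start else ""


s = "The black cat climbed the green tree"
print(model_substr(s, 4, 5))     # black
print(model_substr(s, 4, -11))   # black cat climbed the
print(model_substr(s, 14))       # climbed the green tree
print(model_substr(s, -4))       # tree
print(model_substr(s, -4, 2))    # tr
```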

While it is true that this can be done with existing m4, it seems inefficient 
to have to use the following (lightly tested) code.  The use of incr/decr 
strips leading zeros while preserving the sign, so that the argument -08 is 
parsed as decimal -8, as in the original substr, rather than being rejected as 
an invalid octal number, as in eval:

define(`substr', `ifelse(`$#', `0', ``$0'',
`_substr1(`$1', incr(decr(`$2')),
  ifelse(`$3', `', `len(`$1')', `incr(decr(`$3'))'), len(`$1'))')')
define(`_substr1', `_substr2(`$1',
  eval($2 < 0 ? ($2 + $4 < 0 ? 0 : $2 + $4) : $2),
  `$3', `$4')')
define(`_substr2', `builtin(`substr', `$1', `$2',
  eval($3 < 0 ? ($3 + $4 - $2 < 0 ? 0 : $3 + $4 - $2) : $3))')


Also, perl's use of an optional fourth argument to be spliced into the original 
string is cool.  Perl only allows a fourth argument when substr is used on an 
lvalue, which doesn't translate very well to m4, but m4 could treat it roughly 
like the following (untested):

define(`substr', `ifelse(`$#', `4',
  `$0(`$1', `0', `$2')$4`'$0(`$1', eval($2 + $3))',
  `builtin(`$0', $@)')')

but with support for negative arguments, expecting decimal arguments, and 
issuing a warning like perl if the entire substring selected lies outside the 
original string.
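
As a rough illustration of the four-argument form with those additions, here
is a sketch in Python (not m4).  The name splice_substr is hypothetical, and
returning the string unchanged after the warning is an assumption; the
behavior in that case is not settled above.

```python
import warnings


def splice_substr(s, offset, length, replacement):
    """Sketch: replace the selected substring of s with replacement."""
    n = len(s)
    # Resolve negative offset and length the same way as plain substr.
    start = offset + n if offset < 0 else offset
    start = max(start, 0)        # assumption: clamp to the start of the string
    if start > n:
        # The whole selection lies outside the string: warn, as perl does.
        warnings.warn("substr outside of string")
        return s                 # assumption: leave the string untouched
    end = n + length if length < 0 else start + length
    end = max(min(end, n), start)
    # Prefix, then the replacement, then whatever follows the selection.
    return s[:start] + replacement + s[end:]


print(splice_substr("abcdef", 2, 3, "XY"))   # abXYf
print(splice_substr("abcdef", -2, 1, "-"))   # abcd-f
```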

If there are no newlines, this could also be achieved on the master branch with 
an extended regular expression, although that is probably slower:

define(`substr', `ifelse(`$#', `4',
  `patsubst(`$1', `^(.{$2}).{$3}', `\1$4', `extended')',
  `builtin(`$0', $@)')')

Again, implementing this natively will be more efficient.  What do you think of 
adding these two enhancements to substr?

-- 
Eric Blake





