bug-texinfo

Re: "special" spaces in Texinfo parsing and output


From: Dumas Patrice
Subject: Re: "special" spaces in Texinfo parsing and output
Date: Mon, 25 Mar 2013 02:56:06 +0100
User-agent: Mutt/1.5.20 (2009-12-10)

On Sun, Mar 24, 2013 at 06:42:37PM +0000, Karl Berry wrote:
> 
> Consider [\r\n\t ] to be space characters?  That's what seems like the
> simplest and most expected result to me.

That's not what I would expect.  I would expect any Unicode space to be
treated as a space with respect to word and paragraph breaking.  Form
feed is very different: I would expect it to be kept and not considered
a space, but rather a 'ghost' character that is always kept and is
neither a space nor a non-space character.

That said, treating only [\r\n\t ] as space characters, and everything
else as non-space characters handled like letters, would simplify my life.
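To make the difference between the two options concrete, here is a
minimal Python sketch (not the Perl parser; both function names are
hypothetical).  The first splits only on [\r\n\t ]; the second splits on
every character for which Python's str.isspace() is true, which includes
form feed, EM SPACE and NO-BREAK SPACE:

```python
import re

# Option 1 (hypothetical): only [\r\n\t ] count as spaces; everything
# else, form feed and NO-BREAK SPACE included, stays inside a "word".
ASCII_SPACES = re.compile(r'[\r\n\t ]+')

def words_ascii_spaces(text):
    return [w for w in ASCII_SPACES.split(text) if w]

# Option 2 (hypothetical): every Unicode whitespace character counts as
# a space, so form feed, U+2003 EM SPACE and U+00A0 all split words.
def words_unicode_spaces(text):
    return text.split()

sample = "a\fb c\u2003d\u00a0e"
print(words_ascii_spaces(sample))    # ['a\x0cb', 'c\u2003d\xa0e']
print(words_unicode_spaces(sample))  # ['a', 'b', 'c', 'd', 'e']
```

With the first option the form feed survives inside a word, so it cannot
be lost, but it is also never seen as a break point.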

>     Do something different for parsing or is it ok to have all the space
>     like characters be considered as spaces?  
> 
> No, it's not ok.  That's what occasioned the report in the first place
> -- we don't want to lose form feeds.

But should the other space characters, including form feeds, simply be
considered non-space, or should they be handled differently?

>     And for the output?  Break words only at [\r\n\t ]?  
> 
> Sounds right to me.  As Eli said, it seems reasonable to have a conf
> variable to allow people to make their own.  Maybe.

Ok.
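A minimal sketch of such a fill, assuming a hypothetical configuration
value (here the break_chars parameter, defaulting to [\r\n\t ]) that
lists the characters at which lines may be broken.  This is Python for
illustration only, not the texi2any implementation:

```python
import re

# Hypothetical configuration value: characters at which output lines
# may be broken.  The default follows the [\r\n\t ] proposal.
DEFAULT_BREAK_CHARS = "\r\n\t "

def fill(text, width=72, break_chars=DEFAULT_BREAK_CHARS):
    """Greedy fill that breaks lines only at characters in break_chars.

    Any other character, form feed included, is treated as part of a
    word, so it is never dropped or turned into a line break.
    """
    if not break_chars:
        return text  # nothing to break at
    splitter = re.compile("[" + re.escape(break_chars) + "]+")
    words = [w for w in splitter.split(text) if w]
    lines, current = [], ""
    for word in words:
        if current and len(current) + 1 + len(word) > width:
            lines.append(current)
            current = word
        else:
            current = current + " " + word if current else word
    if current:
        lines.append(current)
    return "\n".join(lines)
```

Someone who wants form feeds treated as ordinary spaces could then pass
break_chars="\r\n\t \f".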

>     Keep the first space character only if it is not [\r\n]?
> 
> I'm not sure what you mean.

Suppose the text contains a '* SPACE' character.  What should be done
with it at the end of a line: could it be replaced by a newline?  And
what if a space follows or precedes the '* SPACE'?  Should the '* SPACE'
simply be considered a normal character?  Or should it be considered to
have a single-column width, with a possible line break right after it,
as for fullwidth East Asian characters?
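The last option could look roughly like this Python sketch.  It assumes
that '* SPACE' means any character whose Unicode name ends in " SPACE"
(EM SPACE, IDEOGRAPHIC SPACE, ...), excluding the no-break spaces; the
helper names are hypothetical:

```python
import unicodedata

def is_star_space(ch):
    """True for characters whose Unicode name ends in ' SPACE',
    except the no-break spaces.  Plain SPACE is named just 'SPACE'."""
    name = unicodedata.name(ch, "")
    return name.endswith(" SPACE") and "NO-BREAK" not in name

def is_wide(ch):
    # East Asian Width classes 'F' (fullwidth) and 'W' (wide).
    return unicodedata.east_asian_width(ch) in ("F", "W")

def break_segments(text):
    """Split text into segments after which a line break is allowed.

    A '* SPACE' is kept as an ordinary character, but, like fullwidth
    East Asian characters, it allows a break right after it.
    """
    segments, current = [], ""
    for ch in text:
        current += ch
        if is_star_space(ch) or is_wide(ch):
            segments.append(current)
            current = ""
    if current:
        segments.append(current)
    return segments

# break_segments("a\u2003b c") returns ['a\u2003', 'b c']
```

Breaking at the ASCII space inside 'b c' is left to the normal word
breaking; this only adds the extra break opportunities.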

> However, we don't really want to implement the semantics of the Unicode
> space characters, let alone the full Unicode line-breaking algorithm, I
> feel sure.  At least not right now.  That's another whole major project.
> Just for example, there are other characters which end a sentence in
> Unicode too ... http://www.fileformat.info/info/unicode/char/3002/
> etc.  It's endless.

Not necessarily: there is already some special handling of fullwidth
East Asian characters, and of all the Texinfo spacing controls, so it
may not turn out to be overly complex.  That being said, I won't do it
unless told to...

> Meanwhile, if someone declares us-ascii but then uses non-ascii
> characters, that's not our problem.  It would be nice to output them
> as-is instead of losing them (that's what C makeinfo did, I believe),
> but I wouldn't say even that is 100% mandatory.

I'd say we let Perl have its way.  What would have been nice is to be
able to catch the Perl error message and replace it with something more
consistent with the other error messages, with proper line number
accounting, but I don't know how to do that.
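For comparison, here is the idea in Python (an analogue only, not what
the Perl code does; function names are hypothetical, and the
"file:line: warning:" message format is an assumption).  Each line is
decoded separately so the line number is known, the codec error is
replaced by our own message, and the surrogateescape error handler keeps
the undecodable bytes so they can be output as-is:

```python
import sys

def decode_lines(data, filename, encoding="us-ascii"):
    """Decode bytes line by line, reporting bad bytes as file:line."""
    lines = []
    for lineno, raw in enumerate(data.splitlines(keepends=True), start=1):
        try:
            lines.append(raw.decode(encoding))
        except UnicodeDecodeError as err:
            print(f"{filename}:{lineno}: warning: byte 0x{raw[err.start]:02x} "
                  f"is not valid {encoding}", file=sys.stderr)
            # Map undecodable bytes to lone surrogates instead of losing them.
            lines.append(raw.decode(encoding, errors="surrogateescape"))
    return lines

def encode_lines(lines, encoding="us-ascii"):
    # surrogateescape turns the preserved bytes back into the originals.
    return "".join(lines).encode(encoding, errors="surrogateescape")

data = b"@node Top\ncaf\xe9\n"
lines = decode_lines(data, "doc.texi")
# stderr: doc.texi:2: warning: byte 0xe9 is not valid us-ascii
assert encode_lines(lines) == data
```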

-- 
Pat


