texinfo-devel

Re: "special" spaces in Texinfo parsing and output


From: Patrice Dumas
Subject: Re: "special" spaces in Texinfo parsing and output
Date: Sun, 7 Apr 2013 13:44:40 +0200
User-agent: Mutt/1.5.20 (2009-12-10)

On Tue, Mar 26, 2013 at 09:19:02PM +0000, Karl Berry wrote:
> (Switching to texinfo-devel)
> 
> Apparently Unicode agrees with you -- search for "breaking space" in
> http://www.unicode.org/reports/tr14; all the Unicode space chars are
> deemed breakpoints.  That seems quite wrong to me -- as an author, I
> would certainly not want a line break at, say, a thin space -- but
> Unicode is what it is.  Fine. 

Breaking at any type of space seems right to me.

>     Yet, considering [\r\n\t ] only to be space characters and everything
>     else to be non-space, treated as letters would simplify my life.
> 
> I think that is actually better, because makeinfo is not a display
> engine.  In practice, makeinfo has never tried to implement full (or
> most) Unicode semantics and I don't see any users wanting it, so I see
> no problem with just saying "Unicode chars stay as is in utf-8 encoding,
> all else is undefined".

Ok, but if we can do something sane anyway, why not try?  My feeling is
that treating anything perl considers a space as a space, while
preserving it in the output when it falls between words, would be easy
to implement and ok for most cases.
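
A minimal sketch of that idea, written in Python rather than the Perl
that makeinfo uses, and not taken from the texinfo sources.  It assumes
that Python's `\s` on str patterns is close enough to a Unicode-aware
`\s` in perl for illustration; the function name `fill` and the `width`
parameter are hypothetical.  Any whitespace run is a break opportunity,
a Unicode space kept inside a line is output unchanged, and a run at a
line break is replaced by a newline.

```python
import re


def fill(text, width=72):
    """Greedy paragraph fill where every whitespace run may break a line."""
    # The capturing group makes re.split keep the separators, giving
    # [word, space, word, space, ..., word] once the text is stripped.
    parts = re.split(r"(\s+)", text.strip())
    lines = []
    current = parts[0]
    for i in range(1, len(parts), 2):
        space, word = parts[i], parts[i + 1]
        # Drop ASCII whitespace from the run; keep any Unicode space as is,
        # and fall back to a single ASCII space if nothing else remains.
        kept = re.sub(r"[ \t\r\n\f\v]", "", space) or " "
        # len() counts code points, not display columns (an assumption
        # that ignores fullwidth characters).
        if len(current) + len(kept) + len(word) <= width:
            current += kept + word
        else:
            # The space at the break point is replaced by the newline.
            lines.append(current)
            current = word
    lines.append(current)
    return "\n".join(lines)
```

With this sketch a THIN SPACE between two words survives when both
words fit on the line and disappears when the line is broken there.
Note that `\s` also matches NO-BREAK SPACE, which would need to be
excluded separately if it must never be a breakpoint.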

>     Suppose we have a text with a '* SPACE' what should be done at the
>     end of a line, could it be replaced by a new line?  
> 
> I'm sorry, but I don't understand what you mean by '* SPACE'.
> Do you mean three characters: an asterisk, a normal ASCII space, and
> then an unusual Unicode space character?  From the rest of
> what you write, I don't think so, but I can't figure it out.  

'* SPACE' meant any of the Unicode spaces, like EN SPACE, EM SPACE,
THIN SPACE...

>     Not necessarily, there is already some special handling of fullwidth
>     east asian characters, 
> 
> Sure, I know.  But there's a lot more to Unicode line breaking than East
> Asian character widths.  See above TR.  I would prefer that we *not*
> implement it.  No one is expecting it.  I foresee it causing only
> trouble to do so.

Agreed.  However, for spaces there is already some support in perl, so
it is not much of a bother; if anything, not using \s would be more
complicated.  It seems we already need specific handling of form feed
(though only in the output, not in the parser, unless I missed
something), but otherwise it could be simpler to go with what perl gives.
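
One way the form feed exception could be expressed on top of `\s`,
again as a Python sketch under the assumption that its `\s` stands in
for perl's; the names `SPACE_EXCEPT_FF` and `split_words` are
hypothetical.  The double negation `[^\S\f]` reads as "neither a
non-space nor a form feed", that is, any whitespace except form feed.

```python
import re

# Any whitespace character except form feed.
SPACE_EXCEPT_FF = re.compile(r"[^\S\f]+")


def split_words(text):
    """Split on whitespace runs, leaving form feeds inside the words."""
    return [word for word in SPACE_EXCEPT_FF.split(text) if word]
```

The same character class syntax is accepted by perl, so the pattern
itself would carry over; how the form feed is then treated in the
output is a separate question.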

-- 
Pat


