[screen-devel] [bug #51890] screen randomly injects \b into UTF8 streams


From: Mike Frysinger
Subject: [screen-devel] [bug #51890] screen randomly injects \b into UTF8 streams when processing combining characters
Date: Tue, 29 Aug 2017 17:50:39 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; CrOS x86_64 9869.0.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3193.0 Safari/537.36

URL:
  <http://savannah.gnu.org/bugs/?51890>

                 Summary: screen randomly injects \b into UTF8 streams when
processing combining characters
                 Project: GNU Screen
            Submitted by: vapier
            Submitted on: Tue 29 Aug 2017 09:50:37 PM UTC
                Category: None
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 4.5.0
           Fixed Release: None
         Planned Release: None
           Work Required: None

    _______________________________________________________

Details:

simple example:
  printf 'xA\U0000030Ax\n'

that will write out the UTF-8 byte stream (hexdump view):
  78 41 cc 8a 78 0a  |xA..x.|
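
the \U escape above depends on the shell's printf supporting it. a minimal sketch that produces the same six bytes with octal escapes and dumps them as hex is below; the helper names emit_sample and hex_bytes are made up for illustration.

```shell
# emit the sample string with octal escapes instead of \U;
# \314\212 is cc 8a, the UTF-8 encoding of U+030A (combining ring above)
emit_sample() {
  printf 'xA\314\212x\n'
}

# turn stdin into one flat lowercase hex string (no spaces, no newlines)
hex_bytes() {
  od -An -v -tx1 | tr -d ' \n'
}

emit_sample | hex_bytes
echo
```

running the same pipeline inside and outside of screen only shows what the program writes; the bytes screen passes on have to be captured on the terminal side.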

when i'm not using screen, the terminal emulator sees that exactly.  however,
screen will read that and then mangle it, passing along:
  xA\bA\U0000030Ax\n
  78 41 08 41 cc 8a 78 0a  |xA.A..x.|
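
a rough way to check a captured stream for the injected byte is to count the 0x08 bytes in it. this is only a sketch: emit_expected, emit_mangled and count_backspaces are hypothetical helpers, and the two streams are hardcoded from the hexdumps above rather than captured from screen.

```shell
# the stream the program writes (first hexdump above)
emit_expected() {
  printf 'xA\314\212x\n'
}

# the stream screen passes along (second hexdump above), with \010 = \b
emit_mangled() {
  printf 'xA\010A\314\212x\n'
}

# count backspace (0x08) bytes on stdin; LC_ALL=C keeps tr byte-oriented
count_backspaces() {
  LC_ALL=C tr -cd '\010' | wc -c | tr -d ' '
}

echo "expected: $(emit_expected | count_backspaces) backspace(s)"
echo "mangled:  $(emit_mangled | count_backspaces) backspace(s)"
```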

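a proper terminal emulator is able to deal with this: the \b moves the cursor
back over the A, which is then redrawn together with the combining character,
so the end result looks the same.  but the question still stands: why is
screen doing this?  i couldn't locate the logic in the screen source.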

poking at it through strace shows the screen process doing the read() on the
pty master (/dev/ptmx) and getting the correct UTF-8 stream, then doing a
write() on its slave pty with the mangled stream.  so the mangling happens
inside screen, not somewhere external to it.
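
for anyone wanting to repeat the strace check, something along these lines should work. it is a sketch, not the exact command used for this report: trace_screen_io is a made-up wrapper and the pid argument has to be the screen backend process on your system.

```shell
# attach to a running screen process and log its read()/write() calls.
# $1: pid of the screen process to trace (you have to look this up yourself)
# -f   follow child processes
# -xx  print all strings as hex, so \b and UTF-8 bytes are unambiguous
# -s   raise the max string length so the buffers aren't truncated
trace_screen_io() {
  strace -f -xx -s 256 -e trace=read,write -p "$1"
}
```

the function is only defined here; call it as trace_screen_io PID and then run the printf test in a screen window.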

my locale is set to en_US.UTF8, screen was launched with -U, and .screenrc
has:
  defutf8 on
  defencoding utf8
using screen 4.05.00

since the whole pipeline is UTF-8 aware, i can't explain why screen would need
to interject these things.  i might understand if it was dealing with some
semi-broken systems where it tried to get slightly better output, but that
doesn't apply here.

the odd thing is that when screen redraws lines from its history (e.g. when
you reattach or scroll back), it doesn't inject the \b sequence.  that only
happens for new content.

i originally noticed this with a somewhat more pathological line:
1.001.01a अ॒ग्निमी॑ळे पु॒रोहि॑तं
य॒ज्ञस्य॑ दे॒वमृ॒त्विज॑म् ।
that inserts a number of \b bytes (all around combining chars?  i didn't look
too closely).




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51890>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



