bug-apl

Re: ⎕FIO Buffer limit is 5000 Bytes


From: Hans-Peter Sorge
Subject: Re: ⎕FIO Buffer limit is 5000 Bytes
Date: Tue, 3 Nov 2020 13:24:34 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.1

Hello Jürgen,

thank you for the insight.

Best Regards
Hans-Peter

Am 03.11.20 um 11:29 schrieb Dr. Jürgen Sauermann:
Hi Hans-Peter,

see below.

Best Regards,
Jürgen


On 11/2/20 8:02 PM, Hans-Peter Sorge wrote:
Hello Jürgen,

as far as some UTF study got me .... I have 2 questions:

Two bytes in range x8000 - xFFFF will be changed to x108000 - x10FFFF.  Correct?

No. The conversion works byte by byte. So if you have an invalid sequence of bytes,
say ... 0x80 0x90 ... then the 0x80 (which is an invalid UTF start byte) is translated
to 0x100080 and the process restarts at 0x90 (which is again invalid)...

So you should get ... 0x100080 0x100090 ...

Could it happen that the last byte in a file is an invalid UTF-8 char - leading to x1080__ - x10FF__.  What then would be __?

That can of course happen, and in that case the UTF start character of the sequence (!) and not the offending
character is mapped to some 0x1000XX, and the process is repeated after the start character. This way, when an error
is detected, the conversion resynchronises itself at the earliest possible time.
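
The rule described above can be sketched in a few lines of Python. This is only an illustration of the behaviour as stated in this thread, not the code in UCS_string.cc; the names PUA_B_BASE, sequence_length and decode_lenient are made up for the example, and the sketch does not reject overlong encodings or surrogates.

```python
PUA_B_BASE = 0x100000  # offset mentioned in this thread (hypothetical name)


def sequence_length(start):
    """Expected UTF-8 sequence length for a start byte, or 0 if it is not one."""
    if start < 0x80:
        return 1
    if 0xC0 <= start < 0xE0:
        return 2
    if 0xE0 <= start < 0xF0:
        return 3
    if 0xF0 <= start < 0xF8:
        return 4
    return 0  # continuation byte (0x80..0xBF) or 0xF8..0xFF


def decode_lenient(data):
    """Decode bytes to a list of code points, mapping bad start bytes to PUA-B."""
    out = []
    i = 0
    while i < len(data):
        start = data[i]
        n = sequence_length(start)
        tail = data[i + 1:i + n]
        bad = (n == 0
               or len(tail) != n - 1                      # truncated at end of input
               or any((c & 0xC0) != 0x80 for c in tail))  # not a continuation byte
        if bad:
            # Map the start byte only, then resume at the very next byte.
            out.append(PUA_B_BASE + start)
            i += 1
            continue
        if n == 1:
            cp = start
        else:
            cp = start & (0xFF >> (n + 1))  # payload bits of the start byte
            for c in tail:
                cp = (cp << 6) | (c & 0x3F)
        out.append(cp)
        i += n
    return out


# The example from this thread: two invalid start bytes in a row.
print([hex(cp) for cp in decode_lenient(bytes([0x80, 0x90]))])
```

A truncated sequence at the end of the input is handled by the same branch: the start byte is mapped, and the leftover continuation bytes are then mapped one by one.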

Best Regards
Hans-Peter



Am 02.11.20 um 14:51 schrieb Dr. Jürgen Sauermann:
Hi,

I have done some rework of the UTF8-to-Unicode conversion.
It now maps incorrect characters in a UTF8 encoding to corresponding
characters in the "Supplementary Private Use Area-B" (so that the
offending character becomes available at APL level and can be
recovered by subtracting 0x100000 from the codepoint) rather than raising
an error.

SVN 1352.
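
As a rough illustration of the recovery step (subtracting 0x100000), here is a small Python sketch that turns a list of code points back into the original bytes. The names PUA_B_BASE and recover_bytes are hypothetical, and the sketch assumes that every code point in 0x100000..0x1000FF stands for a mapped byte, so a genuine character from that range would be misread.

```python
PUA_B_BASE = 0x100000  # offset mentioned in this thread (hypothetical name)


def recover_bytes(codepoints):
    """Rebuild the original byte string from leniently decoded code points."""
    out = bytearray()
    for cp in codepoints:
        if PUA_B_BASE <= cp <= PUA_B_BASE + 0xFF:
            # A byte that was not valid UTF-8: undo the offset.
            out.append(cp - PUA_B_BASE)
        else:
            # An ordinary character: re-encode it as UTF-8.
            out += chr(cp).encode("utf-8")
    return bytes(out)


# Code points for the bytes 0x48 0x8B 0x45 0xD8 0x48 under the mapping above.
restored = recover_bytes([0x48, 0x10008B, 0x45, 0x1000D8, 0x48])
print(restored.hex())
```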

Best Regards,
Jürgen



On 11/2/20 10:05 AM, Hans-Peter Sorge wrote:
Hi Jürgen,

I agree. A  cat BIN_FILE  in a terminal session is of artistic value only.

Best Regards
Hans-Peter

Am 01.11.20 um 20:48 schrieb Dr. Jürgen Sauermann:
Hi Hans-Peter,

the result of an ⍎'ed command is the output of that command, normally
one (nested) APL string for every line of command output.

This requires that the command output can be represented as APL
strings. This is the case for "normal" text output which must then
be either normal ASCII or else UTF8-encoded.

In theory one could have used raw bytes instead of UTF8 encoded
APL characters, but in most cases (and especially for interactive use
cases) the current solution is more convenient since the result can
be displayed directly (at least for text output).

Best Regards,
Jürgen



On 10/31/20 4:10 PM, Hans-Peter Sorge wrote:
EIJHHH - never thought about it - COOOL.


⍝IBM APL:
     ⍎ ')HOST ls'
VALUE ERROR
      )HOST ls
            ^
      ⍎')HOST ls'
                 ^
⍝ GNU-APL:
	l ← ⍎ ')HOST ls -1'
And it works:-)) Makes life much easier.

	f ← ⍎ ')HOST cat filename'
⍝ returns the file as nested vector
⍝ No intermediate file required.

⍝ Does not like binary data:
      ⍴⍎')HOST cat /OTH/APL/trunk/src/apl-Symbol.o'
Bad UTF8 string: 0x48 0x8B 0x45 0xD8 0x48 0x83 0xC0 0x30 0xEB 0x05 0xB8 at UCS_string.cc:120 .....



doc/apl.info (incorrectly) reads:
snip
Like system commands, user-defined commands can only be executed in immediate
execution mode and not from user-defined functions or from ⍎.
/snip


A last thought:
How to connect apl-command_line to host-stdin? Like
        &⍞ ← 'test test test' 
	⍎ ')HOST &0  > data_entry_from_apl'

OK - Just a weekend. Asking too much:-)

Best Regards
Hans-Peter




 

Am 31.10.20 um 14:39 schrieb Dr. Jürgen Sauermann:
Hi,

On 10/30/20 6:14 PM, Kacper Gutowski wrote:
On Fri, Oct 30, 2020 at 02:34:35PM +0100, Dr. Jürgen Sauermann wrote:
There is also ⎕FIO[26] which reads an entire file, but I am not sure how it works with popen()ed streams.

It doesn't at all because it takes a path which additionally needs to be a regular file because it's mmaped rather than read.

For the record, something like ⍎')HOST ...' might sometimes be practical.

-k
That is actually a cool idea: run your pipe or program with )HOST, forward the
output into some /tmp/xxx and read back /tmp/xxx. It also gives you some more
control over using stdout, stderr, or both from the executed program.
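
The same pattern (run a program, send its output to a temporary file, read the file back) looks like this outside APL. It is a generic Python sketch and not GNU APL code; run_via_tempfile and the file names are made up for the example. Keeping stdout and stderr in separate files is what gives the extra control mentioned above.

```python
import os
import subprocess
import sys
import tempfile


def run_via_tempfile(argv):
    """Run argv with stdout and stderr redirected to temp files; read both back."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "stdout")
        err_path = os.path.join(tmp, "stderr")
        with open(out_path, "wb") as out, open(err_path, "wb") as err:
            status = subprocess.run(argv, stdout=out, stderr=err).returncode
        # Read back as raw bytes, so binary output is not a problem here.
        with open(out_path, "rb") as f:
            out_bytes = f.read()
        with open(err_path, "rb") as f:
            err_bytes = f.read()
    return status, out_bytes, err_bytes


# Example: a child process that writes to both streams.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write('a'); sys.stderr.write('b')"]
status, out_bytes, err_bytes = run_via_tempfile(child)
print(status, out_bytes, err_bytes)
```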

Jürgen








