From: Hans-Peter Sorge
Subject: Re: ⎕FIO Buffer limit is 5000 Bytes
Date: Tue, 3 Nov 2020 13:24:34 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.1
Hi Hans-Peter,
see below.
Best Regards,
Jürgen
On 11/2/20 8:02 PM, Hans-Peter Sorge wrote:
Hello Jürgen,
as far as some UTF study got me .... I have 2 questions:
Two bytes in range x8000 - xFFFF will be changed to x108000 - x10FFFF. Correct?

No. The conversion works byte by byte. So if you have an invalid sequence of bytes,
say ... 0x80 0x90 ... then the 0x80 (which is an invalid UTF start byte) is translated
to 0x100080 and the process restarts at 0x90 (which is again invalid)...
So you should get ... 0x100080 0x100090 ...

Could it happen that the last byte in a file is an invalid UTF-8 char - leading to x1080__ - x10FF__. What then would be __?

That can of course happen, and in that case the UTF start character of the sequence (!) and not the offending
character is mapped to some 0x1000XX and the process is repeated after the start character. This way, when an error
is detected, the conversion resynchronises itself at the earliest possible time.
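
The byte-by-byte behaviour described above can be sketched roughly as follows. This is an illustrative Python model and not the GNU APL code in UCS_string.cc: the names `PUA_B_BASE` and `lenient_utf8_decode` are hypothetical, and the sketch assumes that a broken or truncated sequence is detected purely from the start byte and its continuation bytes (it does not check overlong encodings or surrogates).

```python
PUA_B_BASE = 0x100000  # offset mentioned in the thread (assumed to apply to every bad byte)

def lenient_utf8_decode(data):
    """Decode UTF-8 bytes to a list of code points.

    A byte that cannot start or complete a sequence is mapped to
    PUA_B_BASE + byte, and decoding resumes at the very next byte.
    """
    out = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:                      # plain ASCII
            out.append(b0)
            i += 1
            continue
        if 0xC0 <= b0 < 0xE0:              # start of a 2-byte sequence
            need, cp = 1, b0 & 0x1F
        elif 0xE0 <= b0 < 0xF0:            # start of a 3-byte sequence
            need, cp = 2, b0 & 0x0F
        elif 0xF0 <= b0 < 0xF8:            # start of a 4-byte sequence
            need, cp = 3, b0 & 0x07
        else:                              # 0x80..0xBF or 0xF8..0xFF: invalid start byte
            out.append(PUA_B_BASE + b0)
            i += 1
            continue
        tail = data[i + 1:i + 1 + need]
        if len(tail) == need and all(0x80 <= t < 0xC0 for t in tail):
            for t in tail:
                cp = (cp << 6) | (t & 0x3F)
            out.append(cp)
            i += 1 + need
        else:
            # Broken or truncated sequence: map the START byte, then
            # resynchronise by continuing right after it.
            out.append(PUA_B_BASE + b0)
            i += 1
    return out

# The example from the thread: two invalid start bytes in a row.
print([hex(c) for c in lenient_utf8_decode(bytes([0x80, 0x90]))])
# → ['0x100080', '0x100090']
```

In this model a file ending in a truncated sequence, say `b"A\xE2\x82"`, yields 0x41 0x1000E2 0x100082: the start byte is mapped first, and the leftover continuation byte is then invalid on its own.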
Best Regards
Hans-Peter
Am 02.11.20 um 14:51 schrieb Dr. Jürgen Sauermann:
Hi,
I have done some rework of the UTF8-to-Unicode conversion.
It now maps incorrect characters in a UTF8 encoding to corresponding
characters in the "Supplementary Private Use Area-B" (so that the
offending character becomes available at APL level and can be
recovered by subtracting 0x100000 from the codepoint) rather than raising
an error.
SVN 1352.
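
As a rough illustration of the recovery step, the following Python sketch turns a list of code points back into the original bytes. The function name `recover_bytes` is hypothetical, and the sketch assumes that only bytes 0x80..0xFF are ever mapped, so that mapped characters fall in 0x100080..0x1000FF; a genuine character in that range in the input would be indistinguishable from a mapped byte.

```python
PUA_B_BASE = 0x100000  # offset named in the message above

def recover_bytes(codepoints):
    """Rebuild the original byte string from decoded code points."""
    out = bytearray()
    for cp in codepoints:
        if PUA_B_BASE + 0x80 <= cp <= PUA_B_BASE + 0xFF:
            # A mapped invalid byte: subtract the offset to get it back.
            out.append(cp - PUA_B_BASE)
        else:
            # An ordinary character: re-encode it as UTF-8.
            out += chr(cp).encode("utf-8")
    return bytes(out)

print(recover_bytes([0x48, 0x10008B, 0x45]))
# → b'H\x8bE'
```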
Best Regards,
Jürgen
On 11/2/20 10:05 AM, Hans-Peter Sorge wrote:
Hi Jürgen,
I agree. A cat BIN_FILE in a terminal session is of artistic value only.
Best Regards
Hans-Peter
Am 01.11.20 um 20:48 schrieb Dr. Jürgen Sauermann:
Hi Hans-Peter,
the result of an ⍎'ed command is the output of that command, normally
one (nested) APL string for every line of command output.
This requires that the command output can be represented as APL
strings. This is the case for "normal" text output which must then
be either normal ASCII or else UTF8-encoded.
In theory one could have used raw bytes instead of UTF8 encoded
APL characters, but in most cases (and especially for interactive use
cases) the current solution is more convenient since the result can
be displayed directly (at least for text output).
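
For readers less familiar with APL, the idea of "one string per line of command output" can be modelled in Python as below. This is only an analogy, not how GNU APL implements ⍎')HOST ...'; the helper name `host_lines` is hypothetical, and strict UTF-8 decoding is assumed here to mirror the pre-SVN-1352 behaviour of rejecting binary output.

```python
import subprocess
import sys

def host_lines(argv):
    """Run a command and return its stdout as a list of text lines."""
    raw = subprocess.run(argv, stdout=subprocess.PIPE, check=True).stdout
    # Strict decoding: output that is not valid UTF-8 raises UnicodeDecodeError.
    return raw.decode("utf-8").splitlines()

# Usage: a child Python process stands in for a shell command such as 'ls -1'.
print(host_lines([sys.executable, "-c", "print('a'); print('b')"]))
# → ['a', 'b']
```

Returning raw bytes instead would avoid the decoding failure on binary data, at the cost of a result that cannot be displayed directly.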
Best Regards,
Jürgen
On 10/31/20 4:10 PM, Hans-Peter Sorge wrote:
EIJHHH - never thought about it - COOOL.
⍝ IBM APL:
      ⍎ ')HOST ls'
VALUE ERROR
      )HOST ls
      ^
      ⍎')HOST ls'
      ^
⍝ GNU-APL:
      l ← ⍎ ')HOST ls -1'
And it works:-)) Makes life much easier.
      f ← ⍎ ')HOST cat filename'   ⍝ returns the file as nested vector
⍝ No intermediate file required.
⍝ Does not like binary data:
      ⍴⍎')HOST cat /OTH/APL/trunk/src/apl-Symbol.o'
Bad UTF8 string: 0x48 0x8B 0x45 0xD8 0x48 0x83 0xC0 0x30 0xEB 0x05 0xB8 at UCS_string.cc:120 .....
doc/apl.info (incorrectly) reads:
snip
Like system commands, user-define commands can only be executed in immediate
execution mode and not from user-defined functions or from ⍎.
/snip
A last thought:
How to connect apl-command_line to host-stdin? Like
&⍞ ← 'test test test' ⍎ ')HOST &0 > data_entry_from_apl'
OK - Just a weekend. Asking too much:-)
Best Regards
Hans-Peter
Am 31.10.20 um 14:39 schrieb Dr. Jürgen Sauermann:
Hi,
On 10/30/20 6:14 PM, Kacper Gutowski wrote:
On Fri, Oct 30, 2020 at 02:34:35PM +0100, Dr. Jürgen Sauermann wrote:
There is also ⎕FIO[26] which reads an entire file, but I am not sure how it works with popen()ed streams.
It doesn't at all because it takes a path which additionally needs to be a regular file because it's mmaped rather than read.
For the record, something like ⍎')HOST ...' might sometimes be practical.
-k

That is actually a cool idea: run your pipe or program with )HOST, forward the
output into some /tmp/xxx and read back /tmp/xxx. It also gives you some more
control over using stdout, stderr, or both from the executed program.
Jürgen
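
The "redirect into a temporary file, then read it back" pattern suggested above might look like this in Python. It is a sketch of the general technique only, not of )HOST or ⎕FIO: the function name `run_via_tempfile` and its `capture` parameter are hypothetical, and the temporary file stands in for the /tmp/xxx mentioned in the message.

```python
import os
import subprocess
import sys
import tempfile

def run_via_tempfile(argv, capture="stdout"):
    """Run argv, send the chosen stream(s) to a temp file, return its raw bytes.

    capture is 'stdout', 'stderr', or 'both'; streams not captured are discarded.
    """
    fd, path = tempfile.mkstemp(prefix="apl_host_")
    try:
        with os.fdopen(fd, "wb") as f:
            out = f if capture in ("stdout", "both") else subprocess.DEVNULL
            err = f if capture in ("stderr", "both") else subprocess.DEVNULL
            subprocess.run(argv, stdout=out, stderr=err, check=False)
        # Read the file back as bytes, so no UTF-8 decoding is involved.
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

# Usage: a child Python process writes one line to each stream.
child = [sys.executable, "-c",
         "import sys; print('out'); print('err', file=sys.stderr)"]
print(run_via_tempfile(child, "stderr").split())
# → [b'err']
```

Because the file is a regular file, it can afterwards be read by tools that need one (for example, anything that mmaps its input), which a pipe cannot offer.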