bug-apl

Re: [Bug-apl] apl symbols from a file


From: Kacper Gutowski
Subject: Re: [Bug-apl] apl symbols from a file
Date: Mon, 15 Feb 2016 01:43:38 +0100

On Mon, Feb 15, 2016 at 12:08 AM,  <address@hidden> wrote:
> the system's word count mechanism says length four. APL says length three, I
> thought this was going to be length 1 and the symbol '⍝'. Does anyone know
> what I am doing incorrectly:
>
> address@hidden:~/aplstuff$ echo "⍝" > txt
> address@hidden:~/aplstuff$ cat txt
>
> address@hidden:~/aplstuff$ wc txt
> 1 1 4 txt

Your file ‘txt’ contains 4 bytes: the character ⍝ (U+235D, which takes
3 bytes when encoded in UTF-8: e2 8d 9d), followed by a newline (0a).  And
this is what ‘wc’ reports: 1 line, 1 word, and 4 bytes.  Use ‘wc -m’
to get the count in characters (under the current locale's encoding),
which should be 2 for that file (the newline still counts).
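For comparison outside APL, here is a minimal Python sketch that encodes the same two characters and counts both bytes and characters. It assumes a UTF-8 locale for the `wc -m` analogy.

```python
# The file written by `echo "⍝" > txt` holds U+235D followed by a newline.
text = "\u235d\n"            # two characters: ⍝ and newline
data = text.encode("utf-8")  # the bytes actually stored in the file

print(len(data))      # 4, the byte count reported by plain `wc`
print(len(text))      # 2, what `wc -m` reports in a UTF-8 locale
print(data.hex(" "))  # e2 8d 9d 0a
```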

>       tie←⎕FIO[3] path
>       fontents←{⍵, ⎕FIO[6] tie}⍣{⍺⊢⎕FIO[10] tie}''

⎕FIO[6] reads bytes, not Unicode code points, and returns them as a
numeric vector, so at this point your fontents should be a vector of
4 values: 226 141 157 10.
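The same thing can be seen in Python. Reading a file in binary mode gives raw bytes, and listing them yields integers, much like the numeric vector from ⎕FIO[6]. The temporary file below is a hypothetical stand-in for ‘txt’.

```python
import os
import tempfile

# Write the same 4 bytes that `echo "⍝" > txt` produces (hypothetical temp file).
path = os.path.join(tempfile.mkdtemp(), "txt")
with open(path, "wb") as f:
    f.write("\u235d\n".encode("utf-8"))

# Binary mode returns bytes; converting to a list gives the integer values.
with open(path, "rb") as f:
    raw = f.read()

print(list(raw))  # [226, 141, 157, 10]
```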

As a side note, this is better written using ⎕FIO[26], which reads the
whole named file (it did not work with stdin, but there is no reason
not to use it here).  Using the trick above, you will get an
additional zero at the end if the size of the file happens to be an
exact multiple of the default block size used by ⎕FIO[6] (which is 5000).
⎕FIO[26] doesn't have such problems.
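One plausible mechanism for the stray zero is the read-then-test loop itself. An end-of-file condition is typically noticed only by a read that comes up short, similar to C's feof(). So when the size is an exact multiple of the block size, one more read happens and returns nothing, and in the APL loop that empty result apparently shows up as a 0. The Python model below is an assumption about the mechanism, not GNU APL's actual implementation. `read_flag_loop` and `read_whole` are hypothetical helpers.

```python
import io

BLOCK = 5000  # default ⎕FIO[6] block size, as stated in the text

def read_flag_loop(f, block=BLOCK):
    """Read a block, then test an end-of-file flag (modeled as a short read)."""
    chunks = []
    at_eof = False
    while not at_eof:
        chunk = f.read(block)
        chunks.append(chunk)
        at_eof = len(chunk) < block  # only a short or empty read sets the flag
    return chunks

def read_whole(f):
    """Whole-file read, loosely analogous to ⎕FIO[26]: no block boundaries."""
    return f.read()

exact = bytes(2 * BLOCK)  # size is an exact multiple of BLOCK
print([len(c) for c in read_flag_loop(io.BytesIO(exact))])  # [5000, 5000, 0]

odd = bytes(7000)  # size is not a multiple of BLOCK
print([len(c) for c in read_flag_loop(io.BytesIO(odd))])    # [5000, 2000]

print(len(read_whole(io.BytesIO(exact))))                   # 10000
```

The trailing empty chunk in the first case is the extra read. A whole-file read never has to guess where the end is.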

>       fontents←⎕ucs⊃,/⍪(~ fontents∊10 11)⊂fontents
>       fontents
> â

And you passed this to ⎕UCS, which obediently converted each number to
the character with the corresponding Unicode code point, i.e. 226 141 157
got turned into U+00E2 U+008D U+009D (â followed by two non-printable
characters).
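A minimal Python sketch of the same mistake: mapping each byte value directly to a code point. This is exactly what Latin-1 decoding does, since it maps byte n to U+00nn. The newline is left out here because your partition removed it.

```python
raw = bytes([226, 141, 157])  # UTF-8 bytes of ⍝, newline already dropped

# One character per byte value, as ⎕UCS did with the numeric vector.
wrong = "".join(chr(b) for b in raw)

print([f"U+{ord(c):04X}" for c in wrong])  # ['U+00E2', 'U+008D', 'U+009D']
print(wrong[0])                            # â
print(wrong == raw.decode("latin-1"))      # True
```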

What you probably wanted was to decode the contents as UTF-8-encoded
characters.  You can achieve this by using 19 ⎕CR fontents.
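In Python terms, repairing the mangled string means turning each character back into its byte value and decoding those bytes as UTF-8. Here that is done with a Latin-1 encode followed by a UTF-8 decode. This is only an analogy for what 19 ⎕CR accomplishes, not a description of how it is implemented.

```python
mangled = "\u00e2\u008d\u009d"  # what fontents held after ⎕UCS

# Characters -> byte values (Latin-1 is the identity on U+0000..U+00FF),
# then decode those bytes as UTF-8.
fixed = mangled.encode("latin-1").decode("utf-8")

print(fixed == "\u235d")  # True
print(len(fixed))         # 1
```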

In short:
      "⍝\n" ≡ 19⎕CR⎕FIO[26]'txt'
1

Note that ⎕FIO[26] returns bytes as a character array (similar to what
you got by using ⎕UCS on a numeric array of bytes), and this is exactly
what 19 ⎕CR expects.
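The whole pipeline from the one-liner above can be mirrored in Python. The sketch first reads the file as one character per byte (Latin-1, with newline translation disabled), loosely like ⎕FIO[26]. It then decodes those byte-characters as UTF-8, the role 19 ⎕CR plays. The temporary file is a hypothetical stand-in for ‘txt’.

```python
import os
import tempfile

# Hypothetical temp file holding the same bytes as `txt`.
path = os.path.join(tempfile.mkdtemp(), "txt")
with open(path, "wb") as f:
    f.write(b"\xe2\x8d\x9d\n")

# Whole-file read, one character per byte (byte n becomes U+00nn).
with open(path, encoding="latin-1", newline="") as f:
    byte_chars = f.read()

# Recover the byte values and decode them as UTF-8.
decoded = byte_chars.encode("latin-1").decode("utf-8")

print(len(byte_chars))         # 4
print(decoded == "\u235d\n")   # True
```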

-k


