From: Juergen Sauermann
Subject: Re: [Bug-apl] Spell corrector - APL
Date: Sat, 10 Sep 2016 16:12:05 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
Hi Ala'a,

In GNU APL every cell of a value takes 20 bytes. In your example, in function hist, ∪⍵ is probably the entire alphabet with 26 characters, and therefore the outer product has 26 × 6.2 million = 161,200,000 cells, or about 3.2 GB. Depending on your OS, this will give you a WS FULL or the machine will start swapping.

The nature of the underlying problem seems to be fairly sequential, so using the outer product (and reducing it right after having created it) may not be the best way of achieving the desired result.

/// Jürgen

On 09/09/2016 11:39 PM, Ala'a Mohammad wrote:
Hi,

I'm trying to create a simple spell corrector (Norvig at http://norvig.com/spell-correct.html) in APL. I tried, but stumbled upon the frequency/count stage and could not move further. The stopper was either WS FULL or the apl process being killed. I'm assuming the main issue is my lack of experience with APL, and thus inefficient coding.

    ftxt ← { ⎕FIO[26] ⍵ }
    a ← 'abcdefghijklmnopqrstuvwxyz'
    A ← 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    downcase ← { (a,⎕AV)[(A,⎕AV)⍳⍵] }
    nl ← ⎕UCS 10    ⍝ line feed
    cr ← ⎕UCS 13    ⍝ carriage return
    tab ← ⎕UCS 9
    nonalpha ← nl, cr, tab, ' 0123456789()[]!?%$,.:;/+*=<>-_#"`~@&'
    alphamask ← { ~ ⍵ ∊ nonalpha }
    hist ← { (⍪∪⍵),+/∨/¨(∪⍵)∘.⍷⍵ }
    fhist ← { hist (alphamask txt) ⊂ downcase txt ← ftxt ⍵ }

    ⍝ file ← '/misc/small.txt' ~ 28K
    ⍝ file ← '/misc/xaa' ~ 1.3M
    file ← '/misc/big.txt' ⍝ ~ 6.2M

    ⍝ following 2 lines for debugging
    ⎕ ← ⍴w ← (alphamask txt) ⊂ downcase txt ← ftxt file
    ⎕ ← ⍴u ← ∪w

    fhist file

The errors happened inside the 'hist' function, and I presume mostly due to the jot-dot find (if I understand correctly, it operates on a matrix of size unique-length × words-length).

Is there any way to fix the issue and then proceed to complete the solution? Also, is this the way to create a simple spell corrector in APL (that is, one which capitalizes on APL's strength as an array language)?

I'm using:

    Linux Mint 17.1 (kernel 3.13.0-37-generic #64-Ubuntu)
    GNU APL 1.6 (794)
    Zsh 5.0.2
    Emacs 25.1.50.1

Best,
Ala'a

P.S.: I hoped that I could create the solution in APL and then get some whacks on the head from fellow experienced APL programmers before submitting it as 'another solution in X language', but the hope stopped short before even getting to the probability stage.
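For illustration only, here is a Python sketch (not GNU APL, and not code from either poster) of the two points in the reply: the memory estimate for the outer product in hist, and a sequential alternative for the counting stage. Instead of comparing every unique word against every word, it sorts the words once and measures runs of equal neighbours, so memory grows with the number of words rather than unique × words. The helper names (outer_product_bytes, words, sorted_counts) are hypothetical, and treating every character outside a-z as a separator is an assumption that only approximates the nonalpha set in the quoted code. In APL the same sort-then-compare-neighbours idea could presumably be built on grade (⍋), but that is not shown here.

```python
import re

# Figure from the reply: every cell of a GNU APL value takes 20 bytes.
BYTES_PER_CELL = 20


def outer_product_bytes(n_unique: int, n_items: int,
                        bytes_per_cell: int = BYTES_PER_CELL) -> int:
    """Bytes needed for an n_unique x n_items outer product, one cell per pair."""
    return n_unique * n_items * bytes_per_cell


def words(text: str) -> list[str]:
    """Lowercase the text and keep only runs of the letters a-z.

    Rough analogue of (alphamask txt) ⊂ downcase txt; treating every
    character outside a-z as a separator is an assumption.
    """
    return re.findall(r"[a-z]+", text.lower())


def sorted_counts(ws: list[str]) -> list[tuple[str, int]]:
    """Count words by sorting once, then measuring runs of equal neighbours.

    Extra memory is proportional to len(ws), not len(set(ws)) * len(ws).
    Returns (word, count) pairs in alphabetical order.
    """
    s = sorted(ws)
    # A run starts at index 0 and at every i where s[i] differs from s[i - 1].
    starts = [i for i in range(len(s)) if i == 0 or s[i] != s[i - 1]]
    # Each run ends where the next one starts (or at the end of the list).
    ends = starts[1:] + [len(s)]
    return [(s[b], e - b) for b, e in zip(starts, ends)]


if __name__ == "__main__":
    print(f"{outer_product_bytes(26, 6_200_000):,} bytes")
    sample = "The cat saw the other cat. THE END"
    print(sorted_counts(words(sample)))
```

Running it prints `3,224,000,000 bytes` (the roughly 3.2 GB quoted above), followed by `[('cat', 2), ('end', 1), ('other', 1), ('saw', 1), ('the', 3)]`. Sorting costs O(n log n) time but never materialises the unique × words matrix that triggers WS FULL.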