From: Sean Charles
Subject: Reading a file for lexing, one way to do it...
Date: Mon, 15 Jul 2013 20:47:36 +0100

OK, well, I have produced this tiny little program which so far does what I wanted as a first step, but I am a little confused by my somewhat miraculous arrival at this point and I want to make sure that I understand what is going on.

I ran it with tracing enabled on a small file containing just "ABC\n", and it was reassuring to see it do exactly what I thought it would do, in the order that I expected. Even so, I would appreciate a little reassurance that I am on the right lines with it.

Here is the code:

    lexit(Filename, Tokens) :-
        open(Filename, read, In),
        lexread(In, Tokens),
        close(In).

    lexread(In, _) :-
        at_end_of_stream(In), !.
    lexread(In, [ chr(C, Line, Col) | Tokens ]) :-
        stream_line_column(In, Line, Col),
        get_char(In, C),
        lexread(In, Tokens).

Running it:

    lexit('small.txt',T).
    T = [chr('A',1,1),chr('B',1,2),chr('C',1,3),chr('\n',1,4)|_]

Somehow I managed to figure out that I could put the "chr()" term inside the head, as I read somewhere on Stack Overflow recently that you could do that to save a step or something. See, I am already running on vague; that's my lack of Prolog experience showing already!

The "confusing" bit is this:

    lexread(In, [ chr(C, Line, Col) | Tokens ]) :-

I can see that "Tokens" remains uninstantiated until the end-of-file condition triggers, at which point the complete call stack is picked up, but I am unsure of the reasoning as to why the list comes out in the correct order, I think. I am seeing in my head a whole bunch of .() "conses" all waiting to go off one after the other.

Then this line:

    stream_line_column(In, Line, Col),

instantiates Line and Col, and get_char then instantiates C, so the term chr(C, Line, Col) is now fully instantiated. When the tail call to lexread() is made, a new temporary variable is created for Tokens because it is still uninstantiated. This continues until EOF, at which point the stack frame is unwound and the list is constructed. But why does it appear to be "right", i.e. the tokens read left to right in the same order as the characters in the file? I think I know, but I am still a little shaky at this point!

I have used Haskell for a few years now, and from a memory consumption perspective I have a hunch that this method is very, very bad, as it would be creating huge swathes of stack frames, especially for a very, very large file, but I am still learning. I have no doubt that there is a cleaner way using DCGs, but for now this is where my thinking is.

Thanks,
Sean.
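
A note on the ordering question, offered as a sketch rather than a definitive answer. In Prolog the head of a clause is unified before its body runs, so each call to lexread binds the incoming list to a cell [chr(C, Line, Col) | Tokens] first, and the recursive call only has to fill in the still-unbound tail. The list is therefore built front to back as the file is read, not assembled while the stack unwinds, which is why the tokens come out in file order. The trailing "|_" in the output above is the tail that the end-of-stream clause never binds. The variant below closes it with []; that change, and the helper tag_chars/2, are assumptions added for illustration and are not part of the original program. stream_line_column/3 is taken as given from the message.

```prolog
% Sketch only: lexit/2 and lexread/2 as in the message, except that the
% end-of-stream clause closes the list with [] (an assumed fix).
lexit(Filename, Tokens) :-
    open(Filename, read, In),
    lexread(In, Tokens),
    close(In).

% At end of stream the remaining tail becomes the empty list.
lexread(In, Tokens) :-
    at_end_of_stream(In), !,
    Tokens = [].
% Otherwise the cell [chr(C, Line, Col)|Tokens] is unified with the
% caller's list BEFORE the body runs; the body fills in Line, Col and C,
% and the final recursive call fills in the tail.
lexread(In, [chr(C, Line, Col)|Tokens]) :-
    stream_line_column(In, Line, Col),
    get_char(In, C),
    lexread(In, Tokens).

% Hypothetical helper (not in the message): the same head-first list
% construction with no I/O, so the pattern can be tried at the top level.
tag_chars([], []).
tag_chars([C|Cs], [chr(C)|Tokens]) :-
    tag_chars(Cs, Tokens).
```

For example, the query tag_chars([a,b,c], T) gives T = [chr(a),chr(b),chr(c)], in the same order as the input. On the memory worry: the recursive call is the last goal in its clause, so a system that performs last-call optimisation should be able to reuse the frame instead of stacking one per character. What does grow with the file is the token list itself.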