From: Sean Charles
Subject: Internal storage costs...
Date: Wed, 24 Jul 2013 23:35:02 +0100

Hello venerable list…

Is it more efficient to store a list of lexemes as character codes or as single-character atoms?

Without knowing the C code, other than what I know of the FFI: is it more compact to store a list of integers, which presumably represent themselves, or is it more efficient to use single-character atoms? I am guessing that the FlyWeight pattern or something similar is used, which means that the single-character atoms are actually pointers to the atom. A list of one hundred 'a'-s is then in fact a list of one hundred pointers into the atom store, but is the pointer size bigger than the character code size?

I ask because my lexer is working and producing output like this:

    | ?- feltlex('small.felt',X).

    X = [comment(block,pos(1,1),[' ','S',t,r,i,n,g,' ',t,e,s,t,i,n,g,'.','\n','\n',' ',' ',' ','A',l,l,o,w,' ',b,a,c,k,s,l,a,s,h,e,d,' ',d,e,l,i,m,t,e,r,' ',i,n,' ',t,h,e,' ',s,e,q,u,e,n,c,e,'.','.','.','.','\n']),
         chr(/),
         comment(single,pos(6,1),[' ','D',o,u,b,l,e,' ',q,u,o,t,e,d,' ',s,t,r,i,n,g,s,'.','.','.']),
         string(double,pos(7,1),[c,h,e,e,s,\,'"',e,b,u,r,g,e,r]),
         string(double,pos(8,1),[c,h,e,e,s,\,'''',e,b,u,r,g,e,r]),
         comment(single,pos(10,1),[' ','S',i,n,g,l,e,' ',q,u,o,t,e,d,' ',s,t,r,i,n,g,s,'.','.','.']),
         string(single,pos(11,1),[c,h,e,e,s,e,\,'"',b,u,r,g,e,r]),
         string(single,pos(12,1),[c,h,e,e,s,e,\,'''',b,u,r,g,e,r])]

That's from a source file:

    /* String testing.

       Allow backslashed delimter in the sequence....
    */

    ; Double quoted strings...
    "chees\"eburger"
    "chees\'eburger"

    ; Single quoted strings...
    'cheese\"burger'
    'cheese\'burger'

Not a brilliant example, but it was for testing the comment handling and the string consumption, which allows a backslashed single or double quote to be part of the string. It parses using get_char/peek_char with LA(1), and that allows me to cope well enough for now. It is s-expression based.

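One way to approach the atoms-versus-codes question empirically is to build the same list both ways and compare how much the global stack (heap) grows. The following is only a sketch: n_copies/3, heap_used/1 and measure/3 are hypothetical helper names that do not come from the lexer, and it assumes that statistics/2 with the global_stack key reports [Used, Free] in bytes, which is worth checking against the manual for the installed GNU Prolog version.

```prolog
% Sketch only: n_copies/3, heap_used/1 and measure/3 are hypothetical
% helpers, not part of the lexer and not GNU Prolog built-ins.

% n_copies(+N, +X, -List): List holds N copies of X.
n_copies(0, _, []) :- !.
n_copies(N, X, [X|T]) :-
    N > 0,
    N1 is N - 1,
    n_copies(N1, X, T).

% heap_used(-Bytes): space currently used on the global stack.
% Assumption: the global_stack key yields [Used, Free] in bytes.
heap_used(Bytes) :-
    statistics(global_stack, [Bytes, _Free]).

% measure(+N, +X, -Delta): global stack growth caused by building
% a list of N copies of X.
measure(N, X, Delta) :-
    heap_used(Before),
    n_copies(N, X, List),
    heap_used(After),
    Delta is After - Before,
    length(List, N).        % use List afterwards as a sanity check
```

At the top level, a query such as `measure(10000, a, Atoms), measure(10000, 0'a, Codes).` binds the two deltas side by side. If an atom and a small integer each occupy a single tagged machine word, as is common in WAM-based systems, the two figures should match, but that is for the measurement to confirm rather than something to take on trust.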
For a really large source file, I want to make sure that I am being as efficient with internal storage as possible. Once I have completed the lexer, I have to be able to create an AST from its output and then translate that into something else, and I have already found out recently that GNU Prolog seg-faults under OSX when dealing with large amounts of in-memory data.

So, does anybody know which is the more space-compact representation, atoms or character codes?

Thanks,
Sean.