Re: Seeking Understanding about Symbols, Strings, Structure and Storage

help-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Seeking Understanding about Symbols, Strings, Structure and Storage

From:	Pascal J. Bourguignon
Subject:	Re: Seeking Understanding about Symbols, Strings, Structure and Storage
Date:	Fri, 06 Nov 2009 13:03:27 +0100
User-agent:	Gnus/5.1008 (Gnus v5.10.8) Emacs/22.3 (darwin)

Nordlöw <per.nordlow@gmail.com> writes:

> I am trying really hard to get a unified view about all the atomic
> types and containers/structures/links (e)lisp provides me with:

First, sorry, in my previous answer, I said that an obarray is a tree,
but it's actually a hash-table.  

> What is the best (efficient) way to store symbols (references)
> extracted from an obarray?
> Should I intern them in a new array or should I store them in a
> sequence?

You cannot intern a symbol (see the doc of the function intern).  

The emacs lisp reference manual says:

    Do not try to put symbols in an obarray yourself. This does not
    work—only intern can enter a symbol in an obarray properly.

> Does interning symbols in an obarray require more memory than storing
> them in a vector?

Why do you care?  obarray is not a user level data structure, it's an
internal data structure used by intern and find-symbol.  You shouldn't
use it directly.  It is leaking its implementation, because you have
to know that to allocate a new, empty obarray, you must use
make-vector, but an obarray IS NOT a vector.  You cannot store
anything directly in a obarray yourself,  you must use intern. 

> When should I intern symbols in an obarray?; My guess: When I need
> fast (hashed) lookup of them.

Notice that in emacs lisp, symbols can belong only to a single one
obarray.  The reason is that symbols contain an hidden reference to
the following symbol in the obarray bucket, a single reference.
Therefore a symbol can be stored only in one obarray, otherwise the
buckets could be shared and wreak havoc.

> If I store symbols in a list, say '(a b), are these symbols "free",
> compared to if I first intern them?

What do you mean by "free"?

> I have heard the we should always prefer symbols before strings, when
> we can. Why?;

Don't listen to old optimization advices.  The computers have changed
a lot over the time.  You cannot use the same thumb rules when you
have a 3 ton computer with 32,768 words of memory able to do 100,000
additions a second, than when you have a 0.25 gram computer that has
17,179,869,184 bytes of RAM, and that is able to do 3,100,000,000
additions per second.

> I guess symbols require less memory because they don't carry string
> attributes. Are the any other reasons?

A symbol has a name which is a string, therefore, if you keep a naive
approach, they need strictly more memory than a string.

But when you have a language such as lisp which doesn't actively
prevent you to use the functionnal programming style (ie. without
mutation), data can be shared amonst several data structures, and
therefore it is very difficult to account memory for a single
structure, since that structure may be shared, it's memory cost may be
shared too.

Really, you should not care about these questions while you have
gigabytes of free RAM! 

> If I want to build a tree or even a cyclic graph (I guess this is
> possible because there are rings) do we always need to construct these
> from cons-cells?

Of course not.  You can use vectors, structures, (eieio) objects,
anything.

Unfortunately, emacs lisp has no closure, since it has only dynamic
binding, no lexical binding, therefore it is not possible to use
lambdas to implement data structure like in scheme or Common Lisp.
But you still can use the other kind of data types to build your data
abstractions.

> If so how do we realize this?: I know that setcar,setcdr,setf can play
> a key-role here. What happens if I try to print such an expression
> using princ(), prin1()?

Why don't you try it yourself?

(let ((a (cons 1 2)))
  (setf (cdr a) a)
  a)
--> #1=(1 . #1#)

(let ((a (vector 1 2)))
  (setf (aref a 1) a)
  a)
--> #1=[1 #1#]

#= and ## is a notation used to read and print cyclic references.

> For example how should we efficiently store a parse tree of tokens
> having properties?

(defstruct token
   children
   properties)

(make-token :children '() :properties '(blue))
--> [cl-struct-token nil (blue)]

> My suggestion: As a cons-tree of symbols were the symbols are interned
> in an obarray having properties accessed using get() and put().

If you want to use conses to implement the above structure add :type list:

(defstruct (token (:type list))
   children
   properties)

(make-token :children '() :properties '(blue))
--> (nil (blue))

Notice that in emacs lisp, structures are not a native data type,
they're always built upon vectors or cons cells.

If you use the eieio package (from http://cedet.sourceforge.net), you
will get CLOS like objects, (also built upon vectors, but you
shouldn't care).

> If I assign the same long symbol 

Symbols are not short or long.  Symbols are.

They have:

- a name (which is a string),     symbol-name
- an optional value (anything),   symbol-value
- an optional function,           symbol-function
- a property list,                symbol-plist
- various source files,           symbol-file

(which is not to say that they use 5 slots, or more or less.  As I
wrote above, a naive implementation could use a structure to implement symbols 
as:

(defstruct symbol
  name value function plist files
  %hidden-next-in-bucket
  ;; ...
  )

But it is also possible to store things differently.  For example, for
symbol-file, instead of keeping the list of files where the various
aspect of a symbol are defined with each symbol, we could have a list
of aspects, and for each aspect, a list of file with the symbols
defined there.

> to many variables is it assigned by
> reference, kind of like pointers in C? Example:
>   (setq x 'loooooooooooooooooooong 
>         y 'loooooooooooooooooooong)

Yes.   And it'd make no difference if it was (setq x 'a y 'a).

> Is there some web-page out there that highlights the choices the (e)
> lisp designers made concerning these things?

There are tons of web-pages about lisp.

Lisp is the oldest (family of) programming language(s) still in use,
beside Fortran, therefore you will be able to find a lot of papers and
documentation about a lot of ways to implement lisp and what design
choices can be used and what their trade-offs are.

If you're asking about a specific version of gnu emacs, then the best
is to have a look at the source code of emacs.

-- 
__Pascal Bourguignon__

[Prev in Thread]

Current Thread

[Next in Thread]

Seeking Understanding about Symbols, Strings, Structure and Storage, Nordlöw, 2009/11/06
- Re: Seeking Understanding about Symbols, Strings, Structure and Storage, Pascal J. Bourguignon <=

Prev by Date: Re: Length of obarray vectors
Next by Date: Re: Problems with mumamo in org-mode
Previous by thread: Seeking Understanding about Symbols, Strings, Structure and Storage
Next by thread: Length of obarray vectors
Index(es):
- Date
- Thread