emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Converting CC Mode's obarrays to hash tables. [Was: new `obarray` ty


From: Stefan Monnier
Subject: Re: Converting CC Mode's obarrays to hash tables. [Was: new `obarray` type]
Date: Mon, 24 Jul 2017 10:06:05 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.0.50 (gnu/linux)

> As for c-keywords-obarray (an obarray which has C (etc.) keywords as its
> symbols, whose property lists contain the various keyword categories the
> keywords belong to) - what is the benefit of holding collections of
> symbols in hash tables rather than obarrays?

I'd return the question.  What's the benefit of using an extra
indirection through symbols to hold plists, instead of just using the
plist directly.

For reference, here's the relevant part of the patch:

    diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el
    index a5ade09791..abddbdf5bf 100644
    --- a/lisp/progmodes/cc-engine.el
    +++ b/lisp/progmodes/cc-engine.el
    @@ -535,14 +535,14 @@ c-keyword-sym
       ;; Return non-nil if the string KEYWORD is a known keyword.  More
       ;; precisely, the value is the symbol for the keyword in
       ;; `c-keywords-obarray'.
    -  (intern-soft keyword c-keywords-obarray))
    +  (gethash keyword c-keywords-table))
     
     (defsubst c-keyword-member (keyword-sym lang-constant)
       ;; Return non-nil if the symbol KEYWORD-SYM, as returned by
       ;; `c-keyword-sym', is a member of LANG-CONSTANT, which is the name
       ;; of a language constant that ends with "-kwds".  If KEYWORD-SYM is
       ;; nil then the result is nil.
    -  (get keyword-sym lang-constant))
    +  (plist-get keyword-sym lang-constant))
     
     ;; String syntax chars, suitable for skip-syntax-(forward|backward).
     (defconst c-string-syntax (if (memq 'gen-string-delim c-emacs-features)
    
    diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el
    index 3b455fc090..8b6562f7df 100644
    --- a/lisp/progmodes/cc-langs.el
    +++ b/lisp/progmodes/cc-langs.el
    @@ -2749,8 +2747,8 @@ 'c-opt-op-identitier-prefix
            (setcdr elem (cons lang-const (cdr elem))))))
           result-alist))
     
    -(c-lang-defvar c-keywords-obarray
    -  ;; An obarray containing all keywords as symbols.  The property list
    +(c-lang-defvar c-keywords-table
    +  ;; A hash table containing all keywords as symbols.  The property list
       ;; of each symbol has a non-nil entry for the specific `*-kwds'
       ;; lists it's a member of.
       ;;
    @@ -2767,22 +2765,23 @@ 'c-opt-op-identitier-prefix
     
       (let* ((alist (c-lang-const c-keyword-member-alist))
         kwd lang-const-list
    -    (obarray (make-vector (* (length alist) 2) 0)))
    +    (table (make-hash-table :test #'equal)))
         (while alist
           (setq kwd (caar alist)
            lang-const-list (cdar alist)
            alist (cdr alist))
    -      (setplist (intern kwd obarray)
    -           ;; Emacs has an odd bug that causes `mapcan' to fail
    -           ;; with unintelligible errors.  (XEmacs works.)
    -           ;; (2015-06-24): This bug has not yet been fixed.
    -           ;;(mapcan (lambda (lang-const)
    -           ;;            (list lang-const t))
    -           ;;          lang-const-list)
    -           (apply 'nconc (mapcar (lambda (lang-const)
    +      (puthash kwd
    +          ;; Emacs has an odd bug that causes `mapcan' to fail
    +          ;; with unintelligible errors.  (XEmacs works.)
    +          ;; (2015-06-24): This bug has not yet been fixed.
    +          ;;(mapcan (lambda (lang-const)
    +          ;;             (list lang-const t))
    +          ;;           lang-const-list)
    +          (apply #'nconc (mapcar (lambda (lang-const)
                                        (list lang-const t))
    -                                 lang-const-list))))
    -    obarray))
    +                                 lang-const-list))
    +          table))
    +    table))
     
     (c-lang-defconst c-regular-keywords-regexp
       ;; Adorned regexp matching all keywords that should be fontified

BTW, having re-looked at the code, I see that those plists are really
justs sets, so it could be optimized even further, as in:

    diff --git a/lisp/progmodes/cc-engine.el b/lisp/progmodes/cc-engine.el
    index a5ade09791..abddbdf5bf 100644
    --- a/lisp/progmodes/cc-engine.el
    +++ b/lisp/progmodes/cc-engine.el
    @@ -535,14 +535,14 @@ c-keyword-sym
       ;; Return non-nil if the string KEYWORD is a known keyword.  More
       ;; precisely, the value is the symbol for the keyword in
       ;; `c-keywords-obarray'.
    -  (intern-soft keyword c-keywords-obarray))
    +  (gethash keyword c-keywords-table))
     
     (defsubst c-keyword-member (keyword-sym lang-constant)
       ;; Return non-nil if the symbol KEYWORD-SYM, as returned by
       ;; `c-keyword-sym', is a member of LANG-CONSTANT, which is the name
       ;; of a language constant that ends with "-kwds".  If KEYWORD-SYM is
       ;; nil then the result is nil.
    -  (get keyword-sym lang-constant))
    +  (memq keyword-sym lang-constant))
     
     ;; String syntax chars, suitable for skip-syntax-(forward|backward).
     (defconst c-string-syntax (if (memq 'gen-string-delim c-emacs-features)
    
    diff --git a/lisp/progmodes/cc-langs.el b/lisp/progmodes/cc-langs.el
    index 3b455fc090..8b6562f7df 100644
    --- a/lisp/progmodes/cc-langs.el
    +++ b/lisp/progmodes/cc-langs.el
    @@ -2749,8 +2747,8 @@ 'c-opt-op-identitier-prefix
            (setcdr elem (cons lang-const (cdr elem))))))
           result-alist))
     
    -(c-lang-defvar c-keywords-obarray
    -  ;; An obarray containing all keywords as symbols.  The property list
    +(c-lang-defvar c-keywords-table
    +  ;; A hash table containing all keywords as symbols.  The value
       ;; of each keyword is a set of symbols indicating presence in
       ;;    each `*-kwds' lists.
       ;;
    @@ -2767,22 +2765,23 @@ 'c-opt-op-identitier-prefix
     
       (let* ((alist (c-lang-const c-keyword-member-alist))
         kwd lang-const-list
    -    (obarray (make-vector (* (length alist) 2) 0)))
    +    (table (make-hash-table :test #'equal)))
         (while alist
           (setq kwd (caar alist)
            lang-const-list (cdar alist)
            alist (cdr alist))
    -      (setplist (intern kwd obarray)
    -           ;; Emacs has an odd bug that causes `mapcan' to fail
    -           ;; with unintelligible errors.  (XEmacs works.)
    -           ;; (2015-06-24): This bug has not yet been fixed.
    -           ;;(mapcan (lambda (lang-const)
    -           ;;            (list lang-const t))
    -           ;;          lang-const-list)
    -           (apply 'nconc (mapcar (lambda (lang-const)
                                        (list lang-const t))
    -                                 lang-const-list))))
    -    obarray))
    +      (puthash kwd (mapcar #'car lang-const-list)
    +               table))
    +    table))
     
     (c-lang-defconst c-regular-keywords-regexp
       ;; Adorned regexp matching all keywords that should be fontified

An obarray, is just a hash-table indexed by strings where every hash
entry holds a 4-field struct (the fields are: `value`, `function`, and
`plist`, plus the read-only `name` field) with some restrictions (a
given such struct can only be placed in a single obarray; you can
remove such a symbol from an obarray, but you can't put it back in).

It's really rare for a program to need a hash-table where the values
carried by each entry happen to have just the shape of symbols.
It's common for the extra restrictions to be satisfied, but it's rare
for them to be useful.

c-lang-constants is one of the rare cases where you managed to make
fairly good use of all 3 fields.  Of course, rather than cl-structs, we
could use (uninterned) symbols in the hash-table to recover the speed
advantage of symbols (due to the fact that symbol-value,
get, and symbol-function have their own bytecode, and symbol-plist is
also hard-coded in C; so they are faster than cl-lib's struct-field
access which uses aref after manually checking that the object has the
expected type; and although `aref' also has its own bytecode it's made
slower by the need to handle various array types and to check array
bounds).

> If this makes a program faster, it suggests that the obarray
> implementation could be improved to match the speed of the hash
> table implementation.

Indeed.  But in most cases, the source code is made more clear when
using the hash-table API rather than the obarray API.

I think we'd be better off declaring the second arg of `intern' as obsolete.
Or make it accept a hash-table as second argument.

> Why don't we store the main Emacs obarray in a hash table, if this
> will increase speed?

Lack of time/motivation (at least on my part).


        Stefan




reply via email to

[Prev in Thread] Current Thread [Next in Thread]