bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#74547: 31.0.50; igc: assertion failed in buffer.c


From: Eli Zaretskii
Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c
Date: Sun, 01 Dec 2024 17:23:45 +0200

> Cc: 74547@debbugs.gnu.org,
>  Óscar Fuentes <oscarfv@telefonica.net>, geza.herman@gmail.com
> Date: Sun, 01 Dec 2024 14:58:10 +0000
> From:  Pip Cet via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> 
> > Pip Cet <pipcet@protonmail.com> writes:
> 
> >> Gerd Möllmann <gerd.moellmann@gmail.com> writes:
> >>> Pip Cet <pipcet@protonmail.com> writes:
> >>> Yes, exactly, json.c. First thing I saw when searching for xfree
> >>>
> >>> static void
> >>> json_parser_done (void *parser)
> >>> {
> >>>   struct json_parser *p = (struct json_parser *) parser;
> >>>   if (p->object_workspace != p->internal_object_workspace)
> >>>     xfree (p->object_workspace);
> >>>
> >>> That at least needs an explanation. I would have expected it to be
> >>> allocated as root.
> >>
> >> Well, the explanation is this comment:
> >>
> >>   /* Lisp_Objects are collected in this area during object/array
> >>      parsing.  To avoid allocations, initially
> >>      internal_object_workspace is used.  If it runs out of space then
> >>      we switch to allocated space.  Important note: with this design,
> >>      GC must not run during JSON parsing, otherwise Lisp_Objects in
> >>      the workspace may get incorrectly collected. */
> >
> > That explains it, indeed :-(.
> 
> Just to be clear, I think the mixed heap/stack allocation is the right
> thing to do here, but we need to let both garbage collectors know about
> the Lisp_Objects we allocated.
> 
> I think the best way to do that is to use a Lisp_Vector when we run out
> of stack space. That way, we don't have to worry about forgetting to GC
> it, and we can use standard functions rather than rolling our own.
> 
> >> Obviously, we cannot make any such guarantees when MPS is in use. (I
> >> don't think we can make the guarantee when MPS is not in use, but I'm
> >> not totally certain; we certainly allocate strings while parsing JSON,
> >> which is sufficient to trigger GC in the MPS case).
> >
> > If json.c calls something like maybe_quit, which I's expect it must,
> > then GC can indeed happen. See bug#56108 for an example in the regexp
> > code found with ASAN. It's not as risky in the old code as with
> > concurrent GC, but anyway.
> 
> There's a rarely_quit in json_parse_array, which, AFAICS, always
> triggers in the first loop iteration (when i == 0), but probably never
> reaches 65536 for the second trigger.
> 
> My proposal is to modify json.c so it uses a lisp vector if more than 64
> objects are needed, and to remove the home-grown symset hash set,
> replacing it by a standard hash table.
> 
> Note that the symset is only used to detect duplicate JSON keys. When
> such duplication is detected, we simply ignore the second plist entry.
> (I think it would be better to throw an error, but the documentation
> disagrees.)
> 
> So here's the patch with the old behavior, where
> 
>   (json-serialize '(a "test" a "ignored"))
> 
> doesn't throw an error and simply returns
> 
> "{\"a\":\"test\"}"

Adding Mattias and Géza, who were involved in implementing json.c.

> commit 85fbd342d3b4a8afabe8078e19be9b45fe3e20d2
> Author: Pip Cet <pipcet@protonmail.com>
> Date:   Sun Dec 1 12:46:08 2024 +0000
> 
>     Use standard Lisp objects in json.c (bug#74547)
>     
>     * src/json.c (json_out_t): Make the symset table a Lisp_Object.
>     (symset_t):
>     (pop_symset):
>     (cleanup_symset_tables):
>     (symset_hash):
>     (symset_expand):
>     (symset_size): Remove.
>     (make_symset_table): Use an ordinary hash table for the symset.
>     (push_symset): Don't return a value.
>     (symset_add): Use ordinary hash table accessors.
>     (cleanup_json_out): Remove.
>     (json_out_object_cons): Use ordinary hash table for symsets.
>     (json_serialize):
>     (json_parser_init):
>     (json_parser_done): Adjust to use ordinary hash table code.
>     (json_make_object_workspace_for_slow_path): Use an ordinary vector for
>     the workspace.
>     (json_parse_array): Avoid calling rarely_quit(0)
>     (json_parser_done): Remove manual memory management.
> 
> diff --git a/src/json.c b/src/json.c
> index eb446f5c221..0e17b893087 100644
> --- a/src/json.c
> +++ b/src/json.c
> @@ -28,7 +28,6 @@ Copyright (C) 2017-2024 Free Software Foundation, Inc.
>  #include "lisp.h"
>  #include "buffer.h"
>  #include "coding.h"
> -#include "igc.h"
>  
>  enum json_object_type
>    {
> @@ -111,161 +110,9 @@ json_parse_args (ptrdiff_t nargs, Lisp_Object *args,
>    ptrdiff_t chars_delta;      /* size - {number of characters in buf} */
>  
>    int maxdepth;
> -  struct symset_tbl *ss_table;       /* table used by containing object */
>    struct json_configuration conf;
>  } json_out_t;
>  
> -/* Set of symbols.  */
> -typedef struct
> -{
> -  ptrdiff_t count;           /* symbols in table */
> -  int bits;                  /* log2(table size) */
> -  struct symset_tbl *table;  /* heap-allocated table */
> -} symset_t;
> -
> -struct symset_tbl
> -{
> -  /* Table used by the containing object if any, so that we can free all
> -     tables if an error occurs.  */
> -  struct symset_tbl *up;
> -  /* Table of symbols (2**bits elements), Qunbound where unused.  */
> -  Lisp_Object entries[];
> -};
> -
> -static inline ptrdiff_t
> -symset_size (int bits)
> -{
> -  return (ptrdiff_t) 1 << bits;
> -}
> -
> -static struct symset_tbl *
> -make_symset_table (int bits, struct symset_tbl *up)
> -{
> -  int maxbits = min (SIZE_WIDTH - 2 - (word_size < 8 ? 2 : 3), 32);
> -  if (bits > maxbits)
> -    memory_full (PTRDIFF_MAX);       /* Will never happen in practice.  */
> -#ifdef HAVE_MPS
> -  struct symset_tbl *st = igc_xzalloc_ambig (sizeof *st + (sizeof 
> *st->entries << bits));
> -#else
> -  struct symset_tbl *st = xmalloc (sizeof *st + (sizeof *st->entries << 
> bits));
> -#endif
> -  st->up = up;
> -  ptrdiff_t size = symset_size (bits);
> -  for (ptrdiff_t i = 0; i < size; i++)
> -    st->entries[i] = Qunbound;
> -  return st;
> -}
> -
> -/* Create a new symset to use for a new object.  */
> -static symset_t
> -push_symset (json_out_t *jo)
> -{
> -  int bits = 4;
> -  struct symset_tbl *tbl = make_symset_table (bits, jo->ss_table);
> -  jo->ss_table = tbl;
> -  return (symset_t){ .count = 0, .bits = bits, .table = tbl };
> -}
> -
> -/* Destroy the current symset.  */
> -static void
> -pop_symset (json_out_t *jo, symset_t *ss)
> -{
> -  jo->ss_table = ss->table->up;
> -#ifdef HAVE_MPS
> -  igc_xfree (ss->table);
> -#else
> -  xfree (ss->table);
> -#endif
> -}
> -
> -/* Remove all heap-allocated symset tables, in case an error occurred.  */
> -static void
> -cleanup_symset_tables (struct symset_tbl *st)
> -{
> -  while (st)
> -    {
> -      struct symset_tbl *up = st->up;
> -#ifdef HAVE_MPS
> -      igc_xfree (st);
> -#else
> -      xfree (st);
> -#endif
> -      st = up;
> -    }
> -}
> -
> -static inline uint32_t
> -symset_hash (Lisp_Object sym, int bits)
> -{
> -  EMACS_UINT hash;
> -#ifdef HAVE_MPS
> -  hash = igc_hash (sym);
> -#else
> -  hash = XHASH (sym);
> -#endif
> -  return knuth_hash (reduce_emacs_uint_to_hash_hash (hash), bits);
> -}
> -
> -/* Enlarge the table used by a symset.  */
> -static NO_INLINE void
> -symset_expand (symset_t *ss)
> -{
> -  struct symset_tbl *old_table = ss->table;
> -  int oldbits = ss->bits;
> -  ptrdiff_t oldsize = symset_size (oldbits);
> -  int bits = oldbits + 1;
> -  ss->bits = bits;
> -  ss->table = make_symset_table (bits, old_table->up);
> -  /* Move all entries from the old table to the new one.  */
> -  ptrdiff_t mask = symset_size (bits) - 1;
> -  struct symset_tbl *tbl = ss->table;
> -  for (ptrdiff_t i = 0; i < oldsize; i++)
> -    {
> -      Lisp_Object sym = old_table->entries[i];
> -      if (!BASE_EQ (sym, Qunbound))
> -     {
> -       ptrdiff_t j = symset_hash (sym, bits);
> -       while (!BASE_EQ (tbl->entries[j], Qunbound))
> -         j = (j + 1) & mask;
> -       tbl->entries[j] = sym;
> -     }
> -    }
> -#ifdef HAVE_MPS
> -  igc_xfree (old_table);
> -#else
> -  xfree (old_table);
> -#endif
> -}
> -
> -/* If sym is in ss, return false; otherwise add it and return true.
> -   Comparison is done by strict identity.  */
> -static inline bool
> -symset_add (json_out_t *jo, symset_t *ss, Lisp_Object sym)
> -{
> -  /* Make sure we don't fill more than half of the table.  */
> -  if (ss->count >= (symset_size (ss->bits) >> 1))
> -    {
> -      symset_expand (ss);
> -      jo->ss_table = ss->table;
> -    }
> -
> -  struct symset_tbl *tbl = ss->table;
> -  ptrdiff_t mask = symset_size (ss->bits) - 1;
> -  for (ptrdiff_t i = symset_hash (sym, ss->bits); ; i = (i + 1) & mask)
> -    {
> -      Lisp_Object s = tbl->entries[i];
> -      if (BASE_EQ (s, sym))
> -     return false;           /* Previous occurrence found.  */
> -      if (BASE_EQ (s, Qunbound))
> -     {
> -       /* Not in set, add it.  */
> -       tbl->entries[i] = sym;
> -       ss->count++;
> -       return true;
> -     }
> -    }
> -}
> -
>  static NO_INLINE void
>  json_out_grow_buf (json_out_t *jo, ptrdiff_t bytes)
>  {
> @@ -283,7 +130,6 @@ cleanup_json_out (void *arg)
>    json_out_t *jo = arg;
>    xfree (jo->buf);
>    jo->buf = NULL;
> -  cleanup_symset_tables (jo->ss_table);
>  }
>  
>  /* Make room for `bytes` more bytes in buffer.  */
> @@ -442,8 +288,8 @@ json_out_unnest (json_out_t *jo)
>  static void
>  json_out_object_cons (json_out_t *jo, Lisp_Object obj)
>  {
> +  Lisp_Object symset = CALLN (Fmake_hash_table, QCtest, Qeq);
>    json_out_nest (jo);
> -  symset_t ss = push_symset (jo);
>    json_out_byte (jo, '{');
>    bool is_alist = CONSP (XCAR (obj));
>    bool first = true;
> @@ -469,8 +315,9 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj)
>        key = maybe_remove_pos_from_symbol (key);
>        CHECK_TYPE (BARE_SYMBOL_P (key), Qsymbolp, key);
>  
> -      if (symset_add (jo, &ss, key))
> +      if (NILP (Fgethash (key, symset, Qnil)))
>       {
> +       Fputhash (key, Qt, symset);
>         if (!first)
>           json_out_byte (jo, ',');
>         first = false;
> @@ -486,7 +333,6 @@ json_out_object_cons (json_out_t *jo, Lisp_Object obj)
>      }
>    CHECK_LIST_END (tail, obj);
>    json_out_byte (jo, '}');
> -  pop_symset (jo, &ss);
>    json_out_unnest (jo);
>  }
>  
> @@ -591,7 +437,6 @@ json_serialize (json_out_t *jo, Lisp_Object object,
>    jo->capacity = 0;
>    jo->chars_delta = 0;
>    jo->buf = NULL;
> -  jo->ss_table = NULL;
>    jo->conf.object_type = json_object_hashtable;
>    jo->conf.array_type = json_array_array;
>    jo->conf.null_object = QCnull;
> @@ -729,6 +574,7 @@ #define JSON_PARSER_INTERNAL_BYTE_WORKSPACE_SIZE 512
>    Lisp_Object internal_object_workspace
>    [JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE];
>    Lisp_Object *object_workspace;
> +  Lisp_Object object_workspace_vector;
>    size_t object_workspace_size;
>    size_t object_workspace_current;
>  
> @@ -796,6 +642,7 @@ json_parser_init (struct json_parser *parser,
>    parser->object_workspace_size
>      = JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE;
>    parser->object_workspace_current = 0;
> +  parser->object_workspace_vector = Qnil;
>  
>    parser->byte_workspace = parser->internal_byte_workspace;
>    parser->byte_workspace_end = (parser->byte_workspace
> @@ -806,8 +653,6 @@ json_parser_init (struct json_parser *parser,
>  json_parser_done (void *parser)
>  {
>    struct json_parser *p = (struct json_parser *) parser;
> -  if (p->object_workspace != p->internal_object_workspace)
> -    xfree (p->object_workspace);
>    if (p->byte_workspace != p->internal_byte_workspace)
>      xfree (p->byte_workspace);
>  }
> @@ -818,6 +663,11 @@ json_parser_done (void *parser)
>  json_make_object_workspace_for_slow_path (struct json_parser *parser,
>                                         size_t size)
>  {
> +  if (NILP (parser->object_workspace_vector))
> +    {
> +      parser->object_workspace_vector =
> +     Fvector(parser->object_workspace_current, parser->object_workspace);
> +    }
>    size_t needed_workspace_size
>      = (parser->object_workspace_current + size);
>    size_t new_workspace_size = parser->object_workspace_size;
> @@ -829,23 +679,13 @@ json_make_object_workspace_for_slow_path (struct 
> json_parser *parser,
>       }
>      }
>  
> -  Lisp_Object *new_workspace_ptr;
> -  if (parser->object_workspace_size
> -      == JSON_PARSER_INTERNAL_OBJECT_WORKSPACE_SIZE)
> -    {
> -      new_workspace_ptr
> -     = xnmalloc (new_workspace_size, sizeof (Lisp_Object));
> -      memcpy (new_workspace_ptr, parser->object_workspace,
> -           (sizeof (Lisp_Object)
> -            * parser->object_workspace_current));
> -    }
> -  else
> -    {
> -      new_workspace_ptr
> -     = xnrealloc (parser->object_workspace, new_workspace_size,
> -                  sizeof (Lisp_Object));
> -    }
> +  Lisp_Object new_workspace_vector =
> +    larger_vector (parser->object_workspace_vector,
> +                new_workspace_size - parser->object_workspace_size, -1);
> +
> +  Lisp_Object *new_workspace_ptr = XVECTOR (new_workspace_vector)->contents;
>  
> +  parser->object_workspace_vector = new_workspace_vector;
>    parser->object_workspace = new_workspace_ptr;
>    parser->object_workspace_size = new_workspace_size;
>  }
> @@ -1476,7 +1316,7 @@ json_parse_array (struct json_parser *parser)
>       result = make_vector (number_of_elements, Qnil);
>       for (size_t i = 0; i < number_of_elements; i++)
>         {
> -         rarely_quit (i);
> +         rarely_quit (~i);
>           ASET (result, i, parser->object_workspace[first + i]);
>         }
>       parser->object_workspace_current = first;
> 
> 
> 
> 
> 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]