emacs-bug-tracker

From: GNU bug Tracking System
Subject: bug#66674: closed (30.0.50; Upstream tree-sitter and treesit disagree about fields)
Date: Mon, 11 Dec 2023 01:04:02 +0000

Your message dated Sun, 10 Dec 2023 17:02:48 -0800
with message-id <b68dc004-f292-4dfc-bbce-3c8e38370903@gmail.com>
and subject line Re: bug#66674: 30.0.50; Upstream tree-sitter and treesit 
disagree about fields
has caused the debbugs.gnu.org bug report #66674,
regarding 30.0.50; Upstream tree-sitter and treesit disagree about fields
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
66674: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66674
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Sat, 21 Oct 2023 22:36:30 +0200

Tree-sitter's CLI and the publicly hosted playground both produce parse
trees that differ from the one treesit produces in Emacs. Specifically,
the assignment of nodes to named fields differs.

Given the following C source:

    void main() {
      int x = // foo
        1+
        // comment
        2;
    }

treesit-explore-mode displays the following tree:

    (translation_unit
     (function_definition type: (primitive_type)
      declarator: 
       (function_declarator declarator: (identifier)
        parameters: (parameter_list ( )))
      body: 
       (compound_statement {
        (declaration type: (primitive_type)
         declarator: 
          (init_declarator declarator: (identifier) = value: (comment)
           (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
         ;)
        })))

Note how in the init_declarator node, the 'value' field is a comment
node, and similarly for the 'right' field in the binary_expression node.

Running 'tree-sitter parse file.c', on the other hand, produces the
following tree:

    (translation_unit [0, 0] - [6, 0]
      (function_definition [0, 0] - [5, 1]
        type: (primitive_type [0, 0] - [0, 4])
        declarator: (function_declarator [0, 5] - [0, 11]
          declarator: (identifier [0, 5] - [0, 9])
          parameters: (parameter_list [0, 9] - [0, 11]))
        body: (compound_statement [0, 12] - [5, 1]
          (declaration [1, 2] - [4, 6]
            type: (primitive_type [1, 2] - [1, 5])
            declarator: (init_declarator [1, 6] - [4, 5]
              declarator: (identifier [1, 6] - [1, 7])
              (comment [1, 10] - [1, 16])
              value: (binary_expression [2, 4] - [4, 5]
                left: (number_literal [2, 4] - [2, 5])
                (comment [3, 4] - [3, 14])
                right: (number_literal [4, 4] - [4, 5])))))))

Here, the two comment nodes appear as unnamed nodes. IMHO the second
tree is a more useful one, as the named fields contain the semantically
important subtrees (e.g. a binary expression is made up of a left and
right subtree, not a left subtree, a right comment, and then some
unnamed subtree.)

Emacs's tree makes writing queries less convenient, as instead of being
able to refer to well-defined names, one has to rely on child indices to
account for comments.
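
As a rough illustration of the difference between the two lookup styles,
here is a minimal Emacs Lisp sketch. It assumes Emacs 29 or later built
with tree-sitter and an installed C grammar; the function name
my/c-right-operand-types is made up for this example. Going by the tree
that 'tree-sitter parse' prints, the field lookup should reach the number
literal while the positional lookup lands on the comment, but no output
is asserted here.

```elisp
(require 'treesit)

(defun my/c-right-operand-types (source)
  "Return (BY-FIELD . BY-INDEX) for the first binary expression in SOURCE.
Both elements are node type strings, or nil.  Return nil when tree-sitter
or the C grammar is unavailable, or when no binary expression is found."
  (when (and (fboundp 'treesit-available-p)
             (treesit-available-p)
             (treesit-language-available-p 'c))
    (with-temp-buffer
      (insert source)
      (let* ((root (treesit-parser-root-node (treesit-parser-create 'c)))
             ;; A string predicate is matched against node types as a regexp.
             (expr (treesit-search-subtree root "\\`binary_expression\\'")))
        (when expr
          (cons
           ;; Lookup by field name: does not depend on interleaved comments.
           (treesit-node-type (treesit-node-child-by-field-name expr "right"))
           ;; Lookup by position: child 2 is the right operand only if no
           ;; comment sits between the operator and the operand.
           (treesit-node-type (treesit-node-child expr 2))))))))
```

It can be tried with the C source from this report, for example
(my/c-right-operand-types "void main() {\n  int x = // foo\n    1+\n    // comment\n    2;\n}\n").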


A further mismatch arises from repeated fields and separators.

Consider the following Go source:

    package pkg
    
    var a, b, c = 1, 2, 3

treesit-explore-mode displays the following tree:

    (source_file
     (package_clause package (package_identifier))
     \n
     (var_declaration var
      (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
       (expression_list (int_literal) , (int_literal) , (int_literal))))
     \n)

Here, the var_spec node has two fields named 'name' even though the
source specifies three names. Furthermore, the second 'name' as well as
'value' are set to the ',' separator between identifiers. Two of the
three identifiers aren't named.
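
One way to inspect this outside of treesit-explore-mode is to list the
field name that treesit-node-field-name reports for every child of the
var_spec node. The following is only a sketch: it assumes Emacs 29 or
later built with tree-sitter and an installed Go grammar, and the function
name my/go-var-spec-fields is hypothetical.

```elisp
(require 'treesit)

(defun my/go-var-spec-fields (source)
  "Return a list of (FIELD-NAME . NODE-TYPE) for the first var_spec in SOURCE.
FIELD-NAME is nil for children that carry no field.  Return nil when
tree-sitter or the Go grammar is unavailable, or when no var_spec is found."
  (when (and (fboundp 'treesit-available-p)
             (treesit-available-p)
             (treesit-language-available-p 'go))
    (with-temp-buffer
      (insert source)
      (let* ((root (treesit-parser-root-node (treesit-parser-create 'go)))
             ;; A string predicate is matched against node types as a regexp.
             (spec (treesit-search-subtree root "\\`var_spec\\'")))
        (when spec
          (mapcar (lambda (child)
                    (cons (treesit-node-field-name child)
                          (treesit-node-type child)))
                  ;; Omitting the NAMED argument keeps anonymous nodes like ",".
                  (treesit-node-children spec)))))))
```

Calling (my/go-var-spec-fields "package pkg\n\nvar a, b, c = 1, 2, 3\n")
gives one pair per child, which can be compared line by line with the
output of 'tree-sitter parse'.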

'tree-sitter parse file.go', on the other hand, produces this more
accurate tree:

    (source_file [0, 0] - [2, 21]
      (package_clause [0, 0] - [0, 11]
        (package_identifier [0, 8] - [0, 11]))
      (var_declaration [2, 0] - [2, 21]
        (var_spec [2, 4] - [2, 21]
          name: (identifier [2, 4] - [2, 5])
          name: (identifier [2, 7] - [2, 8])
          name: (identifier [2, 10] - [2, 11])
          value: (expression_list [2, 14] - [2, 21]
            (int_literal [2, 14] - [2, 15])
            (int_literal [2, 17] - [2, 18])
            (int_literal [2, 20] - [2, 21])))))

This reproduces with 29.1 as well as 30.0.50.



--- End Message ---
--- Begin Message ---
Subject: Re: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
Date: Sun, 10 Dec 2023 17:02:48 -0800
User-agent: Mozilla Thunderbird


On 12/10/23 6:28 AM, Dominik Honnef wrote:
Yuan Fu <casouri@gmail.com> writes:

On 11/25/23 2:03 AM, Eli Zaretskii wrote:
Ping! Ping!  Yuan, please chime in.

Cc: 66674@debbugs.gnu.org, dominik@honnef.co
Date: Sun, 19 Nov 2023 12:08:08 +0200
From: Eli Zaretskii <eliz@gnu.org>

Ping!  Yuan, any comments?

Cc: 66674@debbugs.gnu.org
Date: Wed, 25 Oct 2023 16:03:10 +0300
From: Eli Zaretskii <eliz@gnu.org>

Yuan, any comments or suggestions?
Sorry sorry sorry, another missed report. I think this is a bug in
treesit-explore-mode, I'll work on fixing it!

Yuan
I don't think that's the case, at least not exclusively. I used
treesit-explore-mode to debug patterns that matched in the playground
but not in Emacs. The matching behavior seemed pretty in line with what
treesit-explore-mode reported.
I did find that treesit-node-field-name was returning wrong field names.
That's why, in the first example, you see the "value" field name given to
the comment node rather than to the binary_expression that follows it. In
the actual parse tree, "value" belongs to binary_expression. With the fix
I just pushed to emacs-29, the explorer parse tree for the first example
becomes

(translation_unit
 (function_definition type: (primitive_type)
  declarator:
   (function_declarator declarator: (identifier)
    parameters: (parameter_list ( )))
  body:
   (compound_statement {
    (declaration type: (primitive_type)
     declarator:
      (init_declarator declarator: (identifier) = (comment)
       value: (binary_expression left: (number_literal) operator: +
                                 operator: (comment)
               right: (number_literal)))
     ;)
    })))

which should match the playground.

If you can find the pattern that matches in the playground but doesn't in Emacs, do please post it and I can see if there's anything wrong.
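
For comparing a playground pattern with Emacs, a small helper along these
lines may be handy. It is a sketch under assumptions: Emacs 29 or later
built with tree-sitter, the relevant grammar installed, and a made-up
function name, my/try-pattern. The pattern is passed as a string so it can
be pasted unchanged from the playground.

```elisp
(require 'treesit)

(defun my/try-pattern (language source pattern)
  "Run query PATTERN against SOURCE parsed as LANGUAGE.
PATTERN is a string in tree-sitter's query syntax.  Return a list of
\(CAPTURE-NAME NODE-TYPE START END), or nil when tree-sitter or the
grammar for LANGUAGE is unavailable."
  (when (and (fboundp 'treesit-available-p)
             (treesit-available-p)
             (treesit-language-available-p language))
    (with-temp-buffer
      (insert source)
      (let ((root (treesit-parser-root-node (treesit-parser-create language))))
        (mapcar (lambda (capture)
                  (let ((node (cdr capture)))
                    ;; Report type and buffer positions for each captured node.
                    (list (car capture)
                          (treesit-node-type node)
                          (treesit-node-start node)
                          (treesit-node-end node))))
                (treesit-query-capture root pattern))))))
```

For example, (my/try-pattern 'c "void main() { int x = 1 + 2; }"
"(init_declarator value: (_) @value)") lists whatever node Emacs
captures for the 'value' field.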

Yuan


--- End Message ---
