bug-gnu-emacs

bug#22241: 25.0.50; etags Ruby parser problems


From: Dmitry Gutov
Subject: bug#22241: 25.0.50; etags Ruby parser problems
Date: Sat, 23 Jan 2016 21:23:57 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Thunderbird/44.0

On 01/23/2016 07:38 PM, Eli Zaretskii wrote:

I don't speak Ruby.  So please give a more detailed spec for the
features you want added.  I wrote some questions below, but I'm quite
sure there are more questions I should ask, but don't know about.  So
please provide as complete specification for each feature as you
possibly can, TIA.

There's no actual up-to-date language spec, and when in doubt, I fire up the REPL and try things out (and forget many of the results afterwards). So there's no "detailed spec" in my head. Let me just do my best to answer your questions, for now.

- Constants are not indexed.

What is the full syntax of a "constant"?  Is it just

   IDENTIFIER "=" INTEGER-NUMBER

Pretty much. IDENTIFIER should be ALL_CAPS, or CamelCase, with underscores allowed.

INTEGER-NUMBER should be just EXPRESSION, because it can be any expression, possibly a multiline one.

CamelCase constants usually are assigned some "anonymous class" value, like in the following example:

SpecialError = Class.new(StandardError)

(Which is a metaprogramming-y way to define the class SpecialError).

But you probably shouldn't worry about ALL_CAPS vs CamelCase distinction here, and just treat them the same.

?  Is whitespace significant?  What about newlines?

Whitespace around "=" is optional: having no spaces there is fine, and spaces can also be replaced by tabs. A newline before "=" is not allowed.
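
To make the rule concrete, here is a rough sketch of it as a Ruby regexp. It only looks at a single line, so it finds where a constant assignment starts and ignores the (possibly multiline) expression on the right. The names CONSTANT_ASSIGNMENT and constant_name are made up for illustration, they are not anything in etags, and namespaced or multiple assignment is deliberately not handled.

```ruby
# Hypothetical pattern: optional indentation, an identifier that starts with
# an uppercase letter (ALL_CAPS and CamelCase are treated the same), optional
# spaces or tabs, then "=" that is not part of "==", "=~" or "=>".
CONSTANT_ASSIGNMENT = /\A[ \t]*([A-Z][A-Za-z0-9_]*)[ \t]*=(?![=~>])/

# Hypothetical helper: returns the constant's name, or nil if the line does
# not start a constant assignment.
def constant_name(line)
  match = CONSTANT_ASSIGNMENT.match(line)
  match && match[1]
end

constant_name("MAX_SIZE = 10")                            # "MAX_SIZE"
constant_name("SpecialError = Class.new(StandardError)")  # "SpecialError"
constant_name("Foo=1")                                    # "Foo"
constant_name("Foo == other")                             # nil
constant_name("foo = 1")                                  # nil
```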

- Class methods (def self.foo) are given the wrong name ("self."
   shouldn't be included).

Is it enough to remove a single "self.", case-sensitive, at the
beginning of an identifier?  Can there be more than one, like
"self.self.SOMETHING"?

Only one "self." is allowed. When you remove it, you should record that SOMETHING is a method defined on the current class (or module). In Java terms, say, it would be like a "static" method.

The upshot is, it can be called on the class itself, but not on its instance:

irb(main):001:0> class C
irb(main):002:1> def self.foo
irb(main):003:2> 3
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> C.foo
=> 3
irb(main):007:0> C.new.foo
NoMethodError: undefined method `foo' for #<C:0x000000020141e8>

So the qualified name of that method should be "C.foo", as opposed to "C#foo" for an instance method.

Your other example, i.e.

   def ModuleExample.singleton_module_method

indicates that anything up to and including the period should be
removed, is that correct?

More or less. This is an "explicit syntax", which is equivalent to using "self.". These two declarations are equivalent:

module ModuleExample
  def ModuleExample.foo
  end
end

module ModuleExample
  def self.foo
  end
end

Is there only one, or can there be many?

There can be only one dot there. There could be a scope resolution operator (::) in there, I suppose, but I'm not sure if you want to add support for that right now, or ever.

Should they all be removed for an unqualified name?

Yes.
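
As an illustration only, the stripping could look like the sketch below. It assumes the parser already has the text that follows "def " in hand; unqualified_method_name is a hypothetical name. It drops everything up to and including the last "." or "::", which covers "self.", an explicit module name, and the "::" case if you ever decide to support it.

```ruby
# Hypothetical helper: given the text after "def ", return the bare method
# name, or nil if no name is found.
def unqualified_method_name(def_target)
  # Take the token up to the first whitespace, "(" or ";".
  name = def_target[/\A[^\s(;]+/]
  return nil unless name

  # Drop the receiver: everything up to and including the last "." or "::".
  name.sub(/\A.*(?:\.|::)/, "")
end

unqualified_method_name("self.foo")                                # "foo"
unqualified_method_name("ModuleExample.singleton_module_method")  # "singleton_module_method"
unqualified_method_name("foo(a, b)")                               # "foo"
```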

- "class << self" blocks are given a separate entry.

What should be done instead?  Can't a class be named "<<"?

A class cannot be named "<<". You should not add that line to the index, but record that the method definitions inside the following scope are defined on the current class or module. These are equivalent:

class C
  def self.foo
  end
end

class C
  class << self
    def foo
    end
  end
end
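
If you want to check the equivalence yourself, something like this should do. C1 and C2 are throwaway names I picked so that both forms can live in one file, and the method returns 3 just to have something to compare.

```ruby
# Form 1: class method defined with "def self.".
class C1
  def self.foo
    3
  end
end

# Form 2: the same method defined inside a "class << self" block.
class C2
  class << self
    def foo
      3
    end
  end
end

# Both classes end up with a singleton (class-level) method named foo ...
C1.singleton_methods(false)  # [:foo]
C2.singleton_methods(false)  # [:foo]

# ... and neither form defines an instance method.
C1.new.respond_to?(:foo)     # false
C2.new.respond_to?(:foo)     # false
```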

- Qualified tag names are never generated.

(Etags never promised qualified names except for C and derived
languages, and also in Java.)

OK, that would be a nice bonus, but we can live without it. ctags doesn't define qualified names either.

Without qualified names, I suppose you should treat

def self.foo
end

and

def foo
end

and

def Class.foo
end

the same. Only record those as "foo".

How to know when a module's or a class's scope ends?  Is it enough to
count "end" lines?

Hmm, maybe? I'm guessing etags doesn't really handle heredoc syntax, or multiline strings defined with percent literals (examples here: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#.22Here_document.22_notation).

The result shouldn't be too bad if you do that, anyway. Except:

Can I assume that "end" will always appear by
itself on a line?

Unfortunately, no. It can also be on the same line, after a semicolon (or on any other line, I suppose, but nobody writes Ruby like that). Examples:

class SpecialError < StandardError; end

or

class MyStruct < Struct.new(:a, :b, :c); end

(One could also stick a method definition inside that, but I haven't seen that in practice yet). So, either:

- 'end' is on a separate line (after ^[ \t]*).
- class/module Name[< ]...; end$

'end' can also be followed by "# some comment" in both cases.
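
Spelled out as Ruby regexps, the two cases might look roughly like this. The constant and method names are hypothetical, and the sketch makes no attempt to deal with heredocs, percent literals, or forms like "end if cond".

```ruby
# Case 1: "end" alone on its line, optionally followed by a comment.
END_ON_OWN_LINE = /\A[ \t]*end[ \t]*(?:#.*)?\z/

# Case 2: a class or module opened and closed on the same line,
# i.e. "class/module Name ...; end", optionally followed by a comment.
ONE_LINE_SCOPE =
  /\A[ \t]*(?:class|module)[ \t]+[A-Z]\w*(?:::[A-Z]\w*)*.*;[ \t]*end[ \t]*(?:#.*)?\z/

def closes_scope?(line)
  END_ON_OWN_LINE.match?(line.chomp)
end

def one_line_scope?(line)
  ONE_LINE_SCOPE.match?(line.chomp)
end

closes_scope?("  end # of class C")                             # true
closes_scope?("ending = 1")                                     # false
one_line_scope?("class SpecialError < StandardError; end")      # true
one_line_scope?("class MyStruct < Struct.new(:a, :b, :c); end") # true
```

A one-line scope opens and closes at once, so a counter of open scopes would stay unchanged for it.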

Can I disregard indentation of "end" (and of
everything else) when I determine where a scope begins and ends?

Probably, yes.

Indentation is not significant in Ruby, but heredocs can mess up the detection of 'end' keywords, so we could use indentation as a way to detect where each scope ends. But if etags doesn't normally do that, let's not go there now.

A
A::B
A::B::ABC
A::B#foo!
A::B.bar?
A::B.qux=

Why did 'foo!' get a '#' instead of a '.', as for '_bar'?

It's common to use '#' in the qualified names of instance methods, in Java, Ruby and JS docstrings. '.' is used for class methods (static methods, in Java), or methods defined on other singleton objects.

Examples:

http://usejsdoc.org/tags-inline-link.html (search for '#' there)
http://stackoverflow.com/questions/5915992/javadoc-writing-links-to-methods
http://docs.ruby-lang.org/en/2.1.0/RDoc/Markup.html#class-RDoc::Markup-label-Links (the documentation also says to use ":: for class methods", but let's not do that)
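
In code, the convention amounts to something like the following sketch. qualified_name and its parameters are hypothetical; the scope is assumed to be the list of enclosing class and module names that the parser has collected so far.

```ruby
# Hypothetical helper: join the enclosing scopes with "::", then use "#" for
# an instance method and "." for a class (singleton) method.
def qualified_name(scope, method_name, class_method:)
  separator = class_method ? "." : "#"
  scope.join("::") + separator + method_name
end

qualified_name(%w[A B], "foo!", class_method: false)  # "A::B#foo!"
qualified_name(%w[A B], "bar?", class_method: true)   # "A::B.bar?"
qualified_name(%w[A B], "qux=", class_method: true)   # "A::B.qux="
```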

Why doesn't
"class << self" count as a class scope, and add something to qualified
names?

It just served to turn 'qux=' into a class (static) method.

should become (the unqualified version):

A
foo
bar=
tee
tee=
qux

All attr_* methods can take a variable number of arguments. The parser
should take each argument, check that it's a symbol and not a variable
(starts with :), and if so, record the corresponding method name.

Why did 'bar' and 'tee' get a '=' appended?

Because 'attr_writer :bar' effectively expands to

def bar=(val)
  @bar = val
end

and 'attr_accessor :tee' expands into

def tee
  @tee
end

def tee=(val)
  @tee = val
end
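
Put together, the name generation for the three macros could be sketched like this. attr_method_names is a hypothetical helper working on one source line; it only accepts plain symbol arguments and silently skips variables and anything else.

```ruby
# Hypothetical helper: return the method names that an attr_* line defines,
# or an empty array if the line is not an attr_* call.
def attr_method_names(line)
  match = /\A[ \t]*(attr_reader|attr_writer|attr_accessor)\b(.*)\z/.match(line.chomp)
  return [] unless match

  macro = match[1]
  # Drop a trailing comment and optional parentheses, then split the arguments.
  args = match[2].sub(/#.*/, "").delete("()").split(",")

  args.flat_map do |arg|
    symbol = arg.strip[/\A:([A-Za-z_]\w*)\z/, 1]
    next [] unless symbol  # not a plain symbol (e.g. a variable): skip it

    case macro
    when "attr_reader"   then [symbol]
    when "attr_writer"   then ["#{symbol}="]
    when "attr_accessor" then [symbol, "#{symbol}="]
    end
  end
end

attr_method_names("attr_reader :foo")    # ["foo"]
attr_method_names("attr_writer :bar")    # ["bar="]
attr_method_names("attr_accessor :tee")  # ["tee", "tee="]
```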

Are there any other such "append rules"?

There are other macros (any code can define a macro), but let's not worry about them now.




