Re: Emacs contributions, C and Lisp

emacs-devel

From:	Eric Ludlam
Subject:	Re: Emacs contributions, C and Lisp
Date:	Fri, 09 Jan 2015 23:06:35 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

On 01/09/2015 10:09 AM, David Engster wrote:
> Richard Stallman writes:
>> You and several others are trying to pressure me to decide to make
>> GCC output the full AST.  I have seen insults and harassment.
>
> Not from me, and I haven't seen anything like it on this thread.
>
>> This is not the way to convince me.  It is the way to make me resent
>> your behavior.
>
> I've no idea what I've done to earn your resentment. I think my behavior
> was entirely reasonable, given that I've started with this only because
> you asked to base our tooling efforts on GCC. Anyway, you don't have to
> worry that I'll continue with this.

This conversation seems unnecessarily final. Richard has valid concerns that need details, but since the AST (which I know almost nothing about) is so huge (based on parsers I've written), no matter how many details we may think of that are needed, someone else can think of a bit that could indeed be unnecessary.

I wrote the first run for the "smart completion" engine that currently ships in Emacs, the parts of CEDET that include EDE and Semantic. While I personally think it is pretty awesome, it really isn't hard to fool it, which is where a lot of this GCC interest comes from. It took many years of my part-time work (and contributions from others like David) to assemble what is there now into a robust, well-tested system.

The basic pieces of the system, which is implemented in Emacs Lisp, consist of a parser generator plus some parsers written in a bison-like syntax, including a C++ parser. Due to limitations of Emacs' performance, only the parts of the language that handle definitions are implemented (i.e., tags for functions, variables, structures, etc.). The parser outputs a tag table with lots of details. Having a full parser generator and the parser is what makes this convenient to do. Hacks like etags, GNU Global, etc. can't produce enough information for the next step.

The next step is the completion engine. This is where regexp hacks exist to "parse" a statement like:


  i = foo.bar.substring

which peels it apart into a variable "i", a notion of assignment, and ("foo" "bar" "substring") via several assumptions, such as that users don't write code like this:


  i /* some variable */
  = /* equals */
  foo. /* mystruct */
  bar.substring
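
To make the fragility concrete, here is a minimal sketch of that kind of regexp splitting. It is written in Python purely for illustration; the real engine is Emacs Lisp, and the pattern and the name `split_statement` are hypothetical, not taken from Semantic.

```python
import re

# Hypothetical pattern, not the one Semantic uses: a naive regexp that
# expects "variable = a.b.c" with nothing else in between.
_STATEMENT = re.compile(r"^\s*(\w+)\s*=\s*([\w.]+)\s*$")


def split_statement(text):
    """Return (variable, [symbols...]), or None if the regexp does not match."""
    match = _STATEMENT.match(text)
    if match is None:
        return None
    variable, chain = match.groups()
    return variable, chain.split(".")


simple = "i = foo.bar.substring"
commented = (
    "i /* some variable */\n"
    "= /* equals */\n"
    "foo. /* mystruct */\n"
    "bar.substring"
)

print(split_statement(simple))     # the well-behaved form is split apart
print(split_statement(commented))  # comments and line breaks defeat the regexp
```

The first call yields the variable and the symbol chain; the second returns None because the interleaved comments break the pattern, which is exactly the kind of input a real parser would handle and a regexp hack does not.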

The engine then goes and looks up i in reverse to see what it is. It then looks up foo in various tables that get built of known symbols, derives the data type, and thus members of foo. It iterates down through the "." symbols, dereferencing each symbol by data type to get to the next step. This depends on the fact that most projects compile all their headers "the same way" so that tables parsed from some header included in this C file will have the same symbols when included in a different C file.
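
The lookup walk described above can be sketched roughly as follows, again in Python for illustration only. The tables, the type names (`mystruct`, `string_t`), and the functions `resolve_chain` and `complete` are all made up for this example; real Semantic tag tables carry far more detail.

```python
# Hypothetical tables standing in for what the tag parser would produce.
# Maps a struct type name to {member name: member type}.
STRUCT_MEMBERS = {
    "mystruct": {"bar": "string_t"},
    "string_t": {"substring": "char *", "length": "int"},
}
# Maps variables visible at the completion point to their declared types.
LOCAL_VARIABLES = {"foo": "mystruct", "i": "char *"}


def resolve_chain(symbols):
    """Walk symbols such as ["foo", "bar"] and return the type of the last one.

    Returns None as soon as a symbol or member cannot be found.
    """
    current_type = LOCAL_VARIABLES.get(symbols[0])
    for name in symbols[1:]:
        members = STRUCT_MEMBERS.get(current_type)
        if members is None or name not in members:
            return None
        current_type = members[name]
    return current_type


def complete(symbols, prefix):
    """List members of the type reached by `symbols` that start with `prefix`."""
    owner = resolve_chain(symbols)
    members = STRUCT_MEMBERS.get(owner, {})
    return sorted(name for name in members if name.startswith(prefix))


print(complete(["foo", "bar"], "sub"))
```

Every step depends on the tables being right: if a header was parsed under different preprocessor settings than the file being edited, the walk silently dead-ends or offers the wrong members.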

With that background, there are a couple options for a GCC plugin. One option would be to have one plugin that outputs tags compatible with some standard. Naturally I suggest the one already in Emacs. A second plugin could be used to figure out all the state I mentioned earlier when looking up symbols, and provide completions directly (i.e., a list of text strings to offer as completions). That plugin would ONLY be used for completion, and all the internal logic couldn't be reused for another purpose in Emacs.

The alternative is to dump out the AST into an Emacs-friendly form, and write the above logic in Emacs instead. This is convenient because Emacs is easy to hack, and gcc plugins (based on what I've been reading) are really complicated. In terms of "get up and running quickly", dumping a big scary data structure out of a scary environment into a friendly, easy-to-hack environment is a desirable path for us, and as Richard points out, for non-free software.

I personally think that if there were a good way to bridge the gap so that gcc could directly output tags for the existing Semantic engine, then there is an incremental benefit of nearly perfect tag generation for the existing tool AND a performance boost. It won't solve the whole problem though. To solve the rest of it, we'd need a gcc plugin to parse a file up to a chosen point. For a file with 1000 lines of code and included headers, gcc needs the WHOLE AST to make sense of "the last line", or the part that needs the completion. This is because we can't guess at what isn't needed until we've actually processed it all. This is where the boundary between gcc and Emacs comes in. In theory, the GCC plugin could process the AST and output ONLY the completions, or ONLY whatever was asked for (local types, scope information, refactoring data or what not). An alternative might be to output a subset of the AST for processing in Emacs that is local to the completion area, and depend on our old Emacs code to do the type lookup, etc. This would improve the current completion, but it could still be fooled based on the quality of the Emacs data, which is now, by definition, incomplete. In the past (call it year 2000) people thought my smart completion was lame (i.e., inaccurate) and slow compared to dynamic abbreviation completion, and claims of "dabbrevs is good enough" were made. This proposal could be "good enough" for this single feature.

So, I've laid out some scenarios that are "not full AST" friendly. There are some benefits (performance), and tradeoffs (difficulty). Even so, we've only touched on one feature. There are lots of other features in the existing Semantic tool already in Emacs derived from having a parser built right into Emacs, such as highlighting code with syntax errors (but only the code for definitions, not the logic). I have a long list of other things I'd love to do, such as redoing font-lock with the many "hints" about what your code is doing that only the compiler could know, but I can't because writing a parser from scratch is actually pretty hard and error prone, regardless of doing so in Emacs. Many folks have touched on those features in a myriad of other thread replies in this mailing list. I've taken my best stab at some of them that seemed attainable, but feel I've gone as far as I can aside from some incremental improvements, or just adding new languages.

I've been very thankful for David's help with the CEDET project, and the many improvements in our existing smart completion engine he's made. For myself, and I imagine David, having gone through that and persevered simulating a compiler for so long to try and get these features, only to have dabbrev people scoff on one side and clang users sneer on the other, is disheartening. The hope of having a "real compiler" to lean on could open so many doors for us that we just can't get to right now that it is hard not to be discouraged by non-technical issues.

I would hope that David, who is looking into the gcc plugin route, and Richard can find a reasonable compromise that enables Emacs to have data from gcc that would enable our existing tools to grow in accuracy, and would encourage contributions from folks who do not have the skills to hack gcc plugins to create new features. I suspect that isn't possible until someone learns more about gcc's AST and thinks about what a good abstraction model for Emacs is, and how it could be applied to the existing pretty good smart completion system.


Eric
