Re: dataframe dereferencing

On Fri, Sep 3, 2010 at 2:55 AM, Jaroslav Hajek <address@hidden> wrote:

>> I see your point--you think it's a performance issue, but I think it is
>> incorrect to assume that subsetting a dataframe is necessarily
>> inefficient. Really, that's a question of implementation not semantics. I
>> don't think that linguistic novelty is a good approach to optimization. Two
>> competing semantic models is a bad thing.

> Competing? Oh no, these would be just happily co-existing :) Besides,
> for a dataframe df there are actually two cell conversions, df.cell
> and df.as.cell, and you need to distinguish between them.

Let's forget about "{}" indexing for now; I need to study the cs-list stuff more. However, my condensed opinion is that emulated postfix OOP is a terrible idea. Honestly, it accomplishes nothing and only adds complexity, because the language is designed otherwise. What I'm saying is: do something like ascell(df) or frame2cell(df) instead of df.as.cell(), following the patterns used elsewhere in the language and in practice. I don't think anything is gained by pretending that Octave follows postfix-style OO. It's just confusing, and it's easy to fall into the mental trap of thinking that foo.changestate() actually does something to foo. Because objects are passed by value, it's more true to reality to require foo = changestate(foo). Whether we like it or not, prefix notation has been chosen. The built-in mechanisms for operator overloading and polymorphism rely on prefix notation. Injecting postfix notation seems pointless, and my feeling is that it will lead to unforeseen difficulties and suffering.
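To make the value-semantics point concrete, here is a minimal sketch using a plain struct rather than an actual dataframe class. The helper set_label and the field label are hypothetical, chosen only for illustration. A function receives a copy of its argument, so the caller's variable changes only when the result is assigned back, which is the foo = changestate(foo) pattern.

```octave
1;  % leading statement so Octave treats this file as a script that may define functions

% Hypothetical helper: modifies its argument and returns it.
% Octave passes arguments by value, so the caller's variable is untouched
% unless the returned value is assigned back.
function s = set_label (s, label)
  s.label = label;
endfunction

df = struct ("label", "old");

set_label (df, "new");        % result discarded; df itself is unchanged
printf ("%s\n", df.label);    % prints: old

df = set_label (df, "new");   % the foo = changestate (foo) pattern
printf ("%s\n", df.label);    % prints: new
```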

>> Personally, I think octave's internal function dispatch is
>> always going to be faster than a cobbled-together m-file-based dispatch.

> The dispatch is not the problem, the intermediate object is.

Dispatch is the problem, but I conflated things by also suggesting using "{}" indexing. It's mostly the emulated postfix notation that bothers me; "{}" indexing is a separate issue. Following standard Octave OOP practice is best because the language supports polymorphism based on prefix notation, so creating derived classes with extended functionality will simply be easier. Emulating postfix syntax will require either a lot of extra effort or extra complexity. The preference for df.as.cell() as opposed to the more idiomatic ascell(df) seems purely stylistic.

> Expressions like A{I} and A(I).B may generate a cs-list. This is
> especially important in assignment, where the cs-list length needs to
> be evaluated *prior* to the right hand side (and hence prior to the
> subsasgn call).
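A small sketch of why the cs-list length must be known before the right-hand side runs, using built-in cells and struct arrays rather than a dataframe class. The helper report_nargout is hypothetical: it reports how many outputs it was asked for, and that count comes from the left-hand side. For a user-defined class the same count would likewise have to be known before subsasgn is invoked.

```octave
1;  % script marker so the file may define a function

% Hypothetical helper: reports how many outputs were requested and
% returns that many values (1, 2, ..., nargout).
function varargout = report_nargout ()
  printf ("called with nargout = %d\n", nargout);
  varargout = num2cell (1:nargout);
endfunction

% c{1:3} on the left-hand side is a cs-list of length 3, so the
% interpreter must determine that length before calling the right-hand side.
c = cell (1, 3);
[c{1:3}] = report_nargout ();    % prints: called with nargout = 3

% The same holds for a field reference on a struct array, A(I).B.
s = struct ("v", {0, 0});        % 1x2 struct array
[s(1:2).v] = report_nargout ();  % prints: called with nargout = 2
```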

Ok. I'm going to think about this.

--judd

From:	Judd Storrs
Subject:	Re: dataframe dereferencing
Date:	Fri, 3 Sep 2010 11:40:11 -0400