octave-maintainers

Re: GSoC/SoCiS: general maximum entropy algorithm


From: Tomasz Ożański
Subject: Re: GSoC/SoCiS: general maximum entropy algorithm
Date: Thu, 23 Apr 2015 01:36:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 31.03.2015 at 17:45, Olaf Till wrote:

I've undertaken some effort in studying the paper. In places, I have difficulties following the derivations or ideas, to the degree that I couldn't verify them. To mentor, I think it would be necessary to check the plausibility of all the details of the algorithm before an application for this project. But I see now that this would cost me too much time and work. I'm sorry.

That's fine. But just to be clear: is there no chance of going ahead with this topic for SoCiS? Perhaps I could try one of the proposed topics instead.

Anyway, I am going to get involved in maximum entropy again sooner or later, so I could try to work this out then.

Meanwhile I've found my old code and will try to prepare a draft algorithm.

I still want to address the issues you mention.
Some issues have occurred to me. I report them here in case they are still useful.

General:

- I'm not sure that your characterization of the algorithm is always
   correct. Chi-squared is not the objective function, and the
   algorithm is not least squares with entropy as a
   regularizer. Rather, the entropy (of the image) is the objective
   function, and the normalized sum of squared deviations of a
   (linear) user function (applied to the image) from the observed
   values is an equality constraint (formally an inequality constraint,
   but treated as an equality, since otherwise the entropy would be too
   large anyway (?)).
Removing the entropy criterion from the algorithm would yield unconstrained least squares. I do not know much about regularization, but it seems that one usually takes the l2 norm of the parameter vector, multiplies it by some factor, adds it to the objective function, and optimizes both together. This algorithm seems very similar, except that the norm is replaced by the entropy, and the rules for adjusting the Lagrange multiplier may be somewhat different.
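To make the two views concrete, here is a small Python sketch on an invented toy problem (the matrix `R`, the image `f_true`, and all sizes are illustrative assumptions, not from the paper): the "regularizer" view adds a multiple of the entropy to the misfit, while the "maximum entropy" view maximizes the entropy subject to the normalized misfit equalling the number of observations.

```python
# Toy sketch (assumed setup, not the paper's code): contrast the
# regularized-least-squares view with the constrained MaxEnt view.
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 100                       # pixels, observations (illustrative)
R = rng.random((m, n)) / n           # assumed linear user function (response)
f_true = rng.random(n) + 0.1         # a positive toy "image"
sigma = 0.05                         # assumed user-supplied standard error
d = R @ f_true + sigma * rng.standard_normal(m)

def entropy(f):
    return -np.sum(f * np.log(f))    # S(f) = -sum_i f_i log f_i

def chi2(f):
    return np.sum(((R @ f - d) / sigma) ** 2)   # normalized misfit

# At a statistically plausible image, chi2 is about m (one unit of
# squared error per observation), which is why chi2 = m is taken as
# the (in)equality constraint level in the MaxEnt formulation.
print(chi2(f_true) / m)              # statistically close to 1

# The regularizer view for one fixed weight alpha: minimize this sum.
# Both views share the Lagrangian stationarity condition; they differ
# in how the multiplier (here alpha) is chosen.
alpha = 0.1
penalized = chi2(f_true) - alpha * entropy(f_true)
```

In the constrained view, the multiplier is not fixed in advance but adjusted until chi2 = m holds, which matches the remark above about the rules for adjusting the Lagrange multiplier.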

- Having this as an extra algorithm is justified by its applicability
   to a large scale (a million parameters (pixels)).

- The value of the (in-)equality constraint (a still "plausible"
   normalized sum of squared deviations) depends on the standard error
   of the observations (due to the normalization). The paper does not
   mention how this standard error is provided. By the user?
That is how I understand it -- they are given by the user based on the measurement error estimate.
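The dependence on the user-supplied errors can be seen directly from the normalization; a minimal Python sketch with invented numbers (all values here are illustrative assumptions):

```python
# Sketch: the constraint level depends entirely on the user-supplied
# standard errors sigma_k through the normalization of the misfit.
import numpy as np

residual = np.array([0.02, -0.01, 0.03])   # F(f) - d, illustrative values
sigma = np.array([0.01, 0.01, 0.01])       # assumed measurement error estimates

chi2 = np.sum((residual / sigma) ** 2)          # normalized misfit
chi2_half = np.sum((residual / (2 * sigma)) ** 2)

# Doubling the quoted errors divides chi2 by 4, so whether the
# constraint chi2 = m is "plausible" hinges on the user's estimates.
print(chi2, chi2_half)
```

So a mis-estimated standard error directly shifts which images the constraint accepts.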

Description of algorithm details in the paper:

- In equation (5), 'A' is just given, it does not follow from the
   derivation, not even considering (6) and (7). Maybe this is not so
   important.

I agree -- this is not important since it is absent in the main algorithm.

- Possibly more important, though I'm not sure whether I've misunderstood
   it: do they want to apply the metric 'g_ij = 1/f^i (i=j)' (18) for
   limiting the step, by computing the step with the (later defined)
   quadratic model with suitably normalized subspace base vectors and a
   suitable quadratic term? In that context: in the paragraph before (19),
   do they mean that 'f(Nabla Q)' is dQ/df with its components
   (subscript i) multiplied by the components 1/f_i? This seems
   reasonable for the linear term in the quadratic model (23) (the linear
   term produces a difference of S given by a sum (over index i) of
   ((dS/df_i)^2/f_i), as possibly intended (?) considering (18));
   however, its quadratic term produces a sum of
   ((dS/df_i)^2/(f_i)^2), while the metric (18) applied to steps would
   yield a sum of ((dS/df_i)^2/f_i) ... ?

That's the same as I understand it -- they say that the 'big quadratic model' (i.e. the gradients and Hessians of entropy and chi2 in the problem) lives in the space with metric given by g^ij = f^i (i=j), so to go to the space with the unit metric you need to multiply by the inverse, i.e. g_ij, as you write.

The gradients are covariant vectors (lower index) here, and to compute a length you need to multiply one of them by g^ij to raise the index, so that you contract one vector with a lower index against one with an upper index -- so the formula for the lengths of the gradients looks fine to me. I do not really know what you mean by the quadratic term -- they are all second derivatives here.

Anyway, during the implementation one can check whether all the metric transformations are right by comparing numerical values.
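The numerical check suggested above can be sketched in a few lines of Python. This assumes the conventions discussed in the thread (entropy S(f) = -sum f_i log f_i, metric g_ij = delta_ij/f_i, inverse g^ij = f_i delta_ij); the specific image values are made up for illustration:

```python
# Sketch of the suggested numerical check (assumed conventions):
# with S(f) = -sum f_i log f_i and metric g_ij = delta_ij / f_i,
# the squared length of grad S is  sum_i f_i (dS/df_i)^2.
import numpy as np

f = np.array([0.2, 0.5, 0.3])                 # illustrative image values

def S(f):
    return -np.sum(f * np.log(f))

grad_analytic = -np.log(f) - 1.0              # dS/df_i (covariant components)

# finite-difference check of the gradient itself
eps = 1e-6
grad_fd = np.array([(S(f + eps * np.eye(3)[i]) - S(f - eps * np.eye(3)[i]))
                    / (2 * eps) for i in range(3)])
print(np.max(np.abs(grad_analytic - grad_fd)))   # should be tiny

# Raising one index with g^ij = f_i delta_ij before contracting gives
# the invariant squared length of the gradient in this metric.
length_sq = np.sum(f * grad_analytic ** 2)
```

The same pattern (analytic expression vs. finite differences, with and without the metric factor) would disambiguate which of the two contractions discussed above the paper actually intends.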

greetings,
Tomasz Ożański


