Re: GSoC/SoCiS: general maximum entropy algorithm
From: Tomasz Ożański
Subject: Re: GSoC/SoCiS: general maximum entropy algorithm
Date: Thu, 23 Apr 2015 01:36:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
On 31.03.2015 at 17:45, Olaf Till wrote:
I've undertaken some effort in studying the paper. In places, I have
difficulties following the derivations or ideas, to a degree that I
couldn't verify them. To mentor, I'd think it necessary to check the
plausibility of all the details of the algorithm before an application
for this project. But I see now that this would cost me too much time
and work. I'm sorry.
That's fine. But just to be clear: is there no chance of going with
this topic for SoCiS? Perhaps I could try one of the proposed topics
instead.
Anyway, I am going to get involved again in maximum entropy sooner or
later so I could try to work this out later.
Meanwhile I've found my old code and will try to prepare some draft
algorithm.
I still want to address the issues you mention:
Some issues have occurred to me. I report them here, in case they
are still useful.
General:
- I'm not sure that your characterization of the algorithm is always
correct. Chi-squared is not the objective function, and the
algorithm is not least squares with entropy as a
regularizer. Rather, entropy (of the image) is the objective
function, and the normalized sum of squared deviations from observed
values of a (linear) user function (applied to the image) is an
equality constraint (formally an inequality constraint, but taken as
equality since otherwise entropy would be too large anyway (?)).
Removing the entropy criterion from the algorithm would yield
unconstrained least squares. I do not know much about regularization,
but it seems you usually take the l2 norm of the parameter vector,
multiply it by some factor, add it to the objective function and
optimize the sum. This algorithm seems very similar, except that you
substitute entropy for the norm and perhaps use somewhat different
rules for adjusting the Lagrange multiplier.
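To make the comparison concrete, here is a minimal sketch in Python/NumPy (the data, matrix `R`, and all names are invented for illustration; only the shape of the objectives follows the discussion) contrasting the usual l2-regularized objective with a maximum-entropy Lagrangian, where entropy is the objective and chi2 enters through a multiplier:

```python
import numpy as np

# Hypothetical tiny problem: recover f (the "image") from data d = R @ f + noise.
rng = np.random.default_rng(0)
R = rng.random((20, 5))          # linear response matrix (made up for illustration)
f_true = np.array([0.5, 1.0, 2.0, 1.5, 0.8])
sigma = 0.05                     # assumed measurement standard error
d = R @ f_true + rng.normal(0.0, sigma, 20)

def chi2(f):
    """Normalized sum of squared deviations from the observations."""
    return np.sum(((R @ f - d) / sigma) ** 2)

def l2_regularized(f, lam):
    """Classical Tikhonov form: chi2 plus a weighted l2 penalty."""
    return chi2(f) + lam * np.sum(f ** 2)

def entropy(f):
    """Image entropy S(f) = -sum_i f_i log f_i (one common convention)."""
    return -np.sum(f * np.log(f))

def maxent_lagrangian(f, lam):
    """Maximum entropy: maximize S subject to chi2 = C, i.e. minimize
    -S(f) + lam * chi2(f) for a suitable Lagrange multiplier lam."""
    return -entropy(f) + lam * chi2(f)
```

The structural similarity to l2 regularization is visible: both add a data-misfit term and a "prior" term; the maximum-entropy version swaps the norm for -S and treats lam as a multiplier to be adjusted until the chi2 constraint holds.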
- Having this as an extra algorithm is justified by its applicability
at large scale (on the order of a million parameters, i.e. pixels).
- The value of the (in-)equality constraint (a still "plausible"
normalized sum of squared deviations) depends on the standard error
of the observations (due to the normalization). The paper does not
mention how this standard error is provided. By the user?
That is how I understand it -- they are given by the user, based on an
estimate of the measurement error.
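As a small illustration of that normalization (the arrays and values below are invented; only the formula follows the discussion), with per-observation errors supplied by the user:

```python
import numpy as np

d_obs = np.array([1.02, 1.98, 3.05])     # observed values
d_model = np.array([1.00, 2.00, 3.00])   # model predictions, e.g. R @ f
sigma = np.array([0.05, 0.05, 0.10])     # user-supplied measurement errors

# Normalized sum of squared deviations; with correctly estimated errors
# one expects chi2 to be near N (the number of observations), which is
# the usual choice for the equality-constraint target.
chi2 = np.sum(((d_model - d_obs) / sigma) ** 2)
N = len(d_obs)
```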
Description of algorithm details in the paper:
- In equation (5), 'A' is just given, it does not follow from the
derivation, not even considering (6) and (7). Maybe this is not so
important.
I agree -- this is not important since it is absent in the main algorithm.
- Possibly more important, but not sure if I've misunderstood it: Do
they want to apply the metric 'g_ij = 1/f^i(i=j)' (18) for limiting
the step by computing the step with the (later defined) quadratic
model with suitably normalized subspace base-vectors and a suitable
quadratic term? In that context: in the paragraph before (19), do
they mean that 'f(Nabla Q)' is dQ/df, its components (subscript i)
multiplied by the components 1/f_i ? This seems reasonable for the
linear term in the quadratic model (23) (linear term produces a
difference of S given by a sum (subindex i) of ((dS/df_i)^2/f_i) as
possibly intended (?) considering (18)), however, its quadratic term
produces a sum of ((dS/df_i)^2/(f_i)^2), while a metric (18) applied
to steps would yield a sum of ((dS/df_i)^2/f_i) ... ?
That's the same as I understand it -- they say that the 'big quadratic
model' (i.e. the gradients and Hessians of entropy and chi2 in the
problem) lives in the space with metric given by g^ij = f^i(i=j) (so if
you want to go to a space with the unit metric, you need to multiply by
the inverse, i.e. g_ij, as you write).
The gradients are covariant vectors (lower index) here, and if you want
to compute a length you need to multiply one of them by g^ij to raise
the index, so that you contract one vector with a lower index against
one with an upper index -- so the formula for the lengths of the
gradients looks fine to me.
I am not sure what you mean by the quadratic term -- they are all
second derivatives here.
Anyway, during the implementation one can check whether all the metric
transformations are correct by comparing numerical values.
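Such a numerical check could look like this sketch (the entropy convention S(f) = -sum_i f_i log f_i and the metric g^ij = f^i(i=j) are assumptions matching the discussion above, not taken verbatim from the paper):

```python
import numpy as np

f = np.array([0.5, 1.0, 2.0])   # example image values (must be positive)

def entropy(f):
    """S(f) = -sum_i f_i log f_i (assumed convention)."""
    return -np.sum(f * np.log(f))

# Analytic gradient (covariant, lower index): dS/df_i = -(log f_i + 1)
grad_analytic = -(np.log(f) + 1.0)

# Central finite differences as the independent numerical check
eps = 1e-7
grad_fd = np.array([
    (entropy(f + eps * np.eye(3)[i]) - entropy(f - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
assert np.allclose(grad_analytic, grad_fd, atol=1e-6)

# Raise one index with g^ij = f^i(i=j), then contract with the lower-index
# gradient: squared length |grad S|^2 = sum_i f_i * (dS/df_i)^2
length_sq = np.sum(f * grad_analytic ** 2)
```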
Olaf
greetings,
Tomasz Ożański