[gnuastro-commits] master d281fb9a: Book: new section explaining the dif
From: |
Mohammad Akhlaghi |
Subject: |
[gnuastro-commits] master d281fb9a: Book: new section explaining the difference between std and error |
Date: |
Mon, 15 May 2023 19:17:28 -0400 (EDT) |
branch: master
commit d281fb9aaa4352c777d3e5c95d47c3966f892c54
Author: Raul Infante-Sainz <infantesainz@gmail.com>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Book: new section explaining the difference between std and error
Until this commit, we did not have a section explaining the difference
between the standard deviation and the error, and sometimes this causes
confusion.
With this commit, a new section has been added under MakeCatalog for this
purpose. Using a practical example, we show the different concepts and how
they can be derived from each other.
---
NEWS | 6 ++
doc/gnuastro.texi | 176 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 179 insertions(+), 3 deletions(-)
diff --git a/NEWS b/NEWS
index c7537884..ba8e9e61 100644
--- a/NEWS
+++ b/NEWS
@@ -8,6 +8,12 @@ See the end of the file for license conditions.
** New features
+ Book:
+ - New "Standard deviation vs. error" sub-section added under the
+ MakeCatalog section. It uses real examples to clearly show the
+ fundamental difference between the two (which are sometimes confused
+ with each other). This was written with the help of Raul Infante-Sainz.
+
Configuration files
- To separate the option name and value, you can now also use the '='
character. This allows your custom configuration files to also be
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 95d21a65..886f9acd 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -653,7 +653,8 @@ MakeCatalog
Quantifying measurement limits
-* Magnitude measurement error of each detection:: Derivation of mag error
equation
+* Standard deviation vs. error:: The std is not a measure of the error.
+* Magnitude measurement error of each detection:: Error in measuring
magnitude.
* Surface brightness error of each detection:: Error in measuring the Surface
brightness.
* Completeness limit of each detection:: Possibility of detecting similar
objects?
* Upper limit magnitude of each detection:: How reliable is your magnitude?
@@ -25763,7 +25764,8 @@ In astronomy, it is common to use the magnitude (a
unit-less scale) and physical
Therefore the measurements discussed here are commonly used in units of
magnitudes.
@menu
-* Magnitude measurement error of each detection:: Derivation of mag error
equation
+* Standard deviation vs. error:: The std is not a measure of the error.
+* Magnitude measurement error of each detection:: Error in measuring
magnitude.
* Surface brightness error of each detection:: Error in measuring the Surface
brightness.
* Completeness limit of each detection:: Possibility of detecting similar
objects?
* Upper limit magnitude of each detection:: How reliable is your magnitude?
@@ -25772,7 +25774,175 @@ Therefore the measurements discussed here are
commonly used in units of magnitud
* Upper limit magnitude of image:: Measure the noise-level for a certain
aperture.
@end menu
-@node Magnitude measurement error of each detection, Surface brightness error
of each detection, Quantifying measurement limits, Quantifying measurement
limits
+@node Standard deviation vs. error, Magnitude measurement error of each
detection, Quantifying measurement limits, Quantifying measurement limits
+@subsubsection Standard deviation vs. error
+The error and the standard deviation are sometimes confused with each other.
+Therefore, before continuing with the various measurement limits below, let's
review these two fundamental concepts.
+Instead of going into the theoretical definitions of the two (which you can see
in their respective Wikipedia pages), we'll discuss the concepts in a hands-on
and practical way here.
+
+Let's simulate an observation of the sky, but without any astronomical sources!
+In other words, we only have a background flux level (from the sky emission).
+With the first command below, let's make an image called @file{1.fits} that
contains @mymath{200\times200} pixels that are filled with random noise from a
Poisson distribution with a mean of 100 counts (the flux from the background
sky).
+Recall that the Poisson distribution is well approximated by a normal distribution for
large mean values (as in this case).
+
+The standard deviation (@mymath{\sigma}) of the Poisson distribution is the
square root of the mean, see @ref{Photon counting noise}.
+With the second command, we'll have a look at the image.
+Note that due to the random nature of the noise, the values reported in the
next steps on your computer will be very slightly different.
+To reproduce exactly the same values in different runs, see @ref{Generating
random numbers}, and for more on the first command, see @ref{Arithmetic}.
+
+@example
+$ astarithmetic 200 200 2 makenew 100 mknoise-poisson \
+ --output=1.fits
+
+$ astscript-fits-view 1.fits
+@end example
+
+Each pixel shows the result of one sampling from the Poisson distribution.
+In other words, assuming the sky emission in our simulation is constant over
our field of view, each pixel's value shows one measurement of the sky emission.
+Statistically speaking, a ``measurement'' is a sampling from an underlying
distribution of values.
+Through our measurements, we aim to identify that underlying distribution (the
``truth'')!
+With the command below, let's look at the pixel statistics of @file{1.fits}
(output is shown immediately under it).
+
+@c If you change this output, replace the standard deviation (10.09) below
+@c in the text.
+@example
+$ aststatistics 1.fits
+Statistics (GNU Astronomy Utilities) @value{VERSION}
+-------
+Input: 1.fits (hdu: 1)
+-------
+ Number of elements: 40000
+ Minimum: -4.72824245470431e+01
+ Maximum: 4.24861780263050e+01
+ Mode: 0.09274776246
+ Mode quantile: 0.5004125103
+ Median: 8.36190404450713e-02
+ Mean: 0.098637593
+ Standard deviation: 10.09065298
+-------
+Histogram:
+ | * ****
+ | *********
+ | ************
+ | **************
+ | *****************
+ | ********************
+ | ***********************
+ | **************************
+ | ******************************
+ | **************************************
+ |* * *********************************************************** * *
+ |----------------------------------------------------------------------
+@end example
+
+As expected, you see that the ASCII histogram nicely resembles a normal
distribution.
+The measured mean and standard deviation (@mymath{\sigma_x}) are also very
similar to the input (mean of 100, standard deviation of @mymath{\sigma=10}).
+But the measured mean (and standard deviation) aren't exactly equal to the
input!
+
+Every time we make a different simulated image from the same distribution, the
measured mean and standard deviation will slightly differ.
+With the second command below, let's build 500 images like above and measure
their mean and standard deviation.
+The outputs will be written into a file (@file{mean-stds.txt}; in the first
command we are deleting it to make sure we write into an empty file within the
loop).
+With the third command, let's view the top 10 rows:
+
+@example
+$ rm -f mean-stds.txt
+$ for i in $(seq 500); do \
+ astarithmetic 200 200 2 makenew 100 mknoise-poisson \
+ --output=$i.fits --quiet; \
+ aststatistics $i.fits --mean --std >> mean-stds.txt; \
+ echo "$i: complete"; \
+ done
+
+$ asttable mean-stds.txt -Y --head=10
+99.989381 9.936407
+100.036622 10.059997
+100.006054 9.985470
+99.944535 9.960069
+100.050318 9.970116
+100.002718 9.905395
+100.067555 9.964038
+100.027167 10.018562
+100.051951 9.995859
+100.000212 9.970293
+@end example
+
+From this table, you see that each simulation has produced a slightly
different measured mean and measured standard deviation (@mymath{\sigma_x})
that are just fluctuating around the input mean (which was 100) and input
standard deviation (@mymath{\sigma=10}).
+Let's have a look at the distribution of mean measurements:
+
+@example
+$ aststatistics mean-stds.txt -c1
+Statistics (GNU Astronomy Utilities) @value{VERSION}
+-------
+Input: mean-stds.txt
+Column: 1
+-------
+ Number of elements: 500
+ Minimum: 9.98183528700191e+01
+ Maximum: 1.00146490891332e+02
+ Mode: 99.99709739
+ Mode quantile: 0.49498998
+ Median: 9.99977393190436e+01
+ Mean: 99.99891826
+ Standard deviation: 0.04901635275
+-------
+Histogram:
+ | *
+ | * **
+ | ****** **** * *
+ | ****** **** * * *
+ | * * ************* * *
+ | * ****************** **
+ | * ********************* *** *
+ | * ***************************** ***
+ | *** ********************************** *
+ | *** ******************************************* **
+ | * ************************************************* ** *
+ |----------------------------------------------------------------------
+@end example
+
+@cindex Standard error of mean
+The standard deviation of the various mean measurements above shows the
scatter in measuring the mean with an image of this size from this underlying
distribution.
+This is therefore defined as the @emph{standard error of the mean}, or
``error'' for short (since most measurements are actually the mean of a
population) and shown with @mymath{\widehat\sigma_{\bar{x}}}.
+
+From the example above, you see that the error is smaller than the standard
deviation (and it becomes smaller as the sample becomes larger).
+In fact, @url{https://en.wikipedia.org/wiki/Standard_error#Derivation, it can
be shown} that this ``error of the mean'' (@mymath{\sigma_{\bar{x}}}) is
related to the distribution standard deviation (@mymath{\sigma}) through the
following equation.
+Where @mymath{N} is the number of points used to measure the mean in one
sample (@mymath{200\times200=40000} in this case).
+Note that the @mymath{10.09} below was reported as ``standard deviation'' in
the first run of @code{aststatistics} on @file{1.fits} above:
+
+@c The 10.09 depends on the 'aststatistics 1.fits' command above.
+@dispmath{\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{N}} \quad\quad {\rm or}
\quad\quad \widehat\sigma_{\bar{x}}\approx\frac{\sigma_x}{\sqrt{N}} =
\frac{10.09}{200} = 0.05}
+
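As a quick arithmetic check of the equation above, the short shell sketch below recomputes the estimated error from the two numbers quoted in the text (the measured standard deviation of 10.09 and the 40000 pixels of one image). The variable names are only illustrative and are not part of Gnuastro.

```shell
# Values quoted in the text above (variable names are illustrative).
sigma_x=10.09      # measured standard deviation reported for 1.fits
n=40000            # number of pixels in one image (200 x 200)

# Estimated standard error of the mean: sigma_x / sqrt(N).
se=$(awk -v s="$sigma_x" -v n="$n" 'BEGIN { printf "%.3f", s / sqrt(n) }')
echo "estimated error of the mean: $se"
```

The printed value (0.050) is close to the 0.049 scatter that was measured directly from the 500 simulated means.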
+@noindent
+Taking the considerations above into account, we should clearly distinguish
the following concepts when talking about the standard deviation or error:
+
+@table @asis
+@item Standard deviation of population
+This is the standard deviation of the underlying distribution (10 in the
example above), and shown by @mymath{\sigma}.
+This is something you can never measure, and is just the ideal value.
+
+@item Standard deviation of mean
+Ideal error of measuring the mean (assuming we know @mymath{\sigma}).
+
+@item Standard deviation of sample (i.e., @emph{Standard deviation})
+Measured standard deviation from a sampling of the ideal distribution.
+This is the second column of @file{mean-stds.txt} above and is shown with
@mymath{\sigma_x} above.
+In astronomical literature, this is simply referred to as the ``standard
deviation''.
+
+In other words, the standard deviation is computed on the input itself and
MakeCatalog just needs a ``values'' file.
+For example, when measuring the standard deviation of an astronomical object
using MakeCatalog it is computed directly from the input values.
+
+@item Standard error (i.e., @emph{error})
+Measurable scatter of measuring the mean (@mymath{\widehat\sigma_{\bar{x}}})
that can be estimated from the size of the sample and the measured standard
deviation (@mymath{\sigma_x}).
+In astronomical literature, this is simply referred to as the ``error''.
+
+In other words, when asking for an ``error'' measurement with MakeCatalog, a
separate standard deviation dataset should always be provided.
+This dataset should take into account all sources of scatter.
+For example, during the reduction of an image, the standard deviation dataset
should take into account the dispersion of each pixel that comes from the bias,
dark, flat fielding, etc.
+If this image is not available, it is possible to use the @code{SKY_STD}
extension from NoiseChisel as an estimate.
+For more see @ref{NoiseChisel output}.
+@end table
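The quantities in the table above can also be compared in a small stand-alone simulation. The sketch below is not the Gnuastro workflow: it assumes plain awk, replaces the Poisson noise with normal noise (mean 100, standard deviation 10), and uses only 400 values per sample instead of 40000 so that it runs quickly. All variable names are hypothetical.

```shell
# Stand-in simulation (assumptions: normal noise, 400 values per sample, 500 samples).
result=$(awk 'BEGIN {
    srand(1)                  # fixed seed; exact numbers still differ between awk versions
    mu = 100; sigma = 10      # population mean and standard deviation (the "truth")
    n = 400; runs = 500       # values per sample, and number of samples
    pi = atan2(0, -1)
    for (r = 1; r <= runs; r++) {
        s = 0; ss = 0
        for (i = 1; i <= n; i++) {
            # Box-Muller transform: one normal deviate from two uniform deviates.
            x = mu + sigma * sqrt(-2 * log(1 - rand())) * cos(2 * pi * rand())
            s += x; ss += x * x
        }
        m = s / n                                  # measured mean of this sample
        sx = sqrt((ss - n * m * m) / (n - 1))      # standard deviation of this sample
        msum += m; msq += m * m; sxsum += sx
    }
    mm = msum / runs
    # Scatter of the measured means over all samples.
    scatter = sqrt((msq - runs * mm * mm) / (runs - 1))
    # Ideal error, error estimated from the average sample std, measured scatter.
    printf "%.4f %.4f %.4f", sigma / sqrt(n), (sxsum / runs) / sqrt(n), scatter
}')
read -r ideal estimated scatter <<EOF
$result
EOF
echo "ideal=$ideal estimated=$estimated scatter=$scatter"
```

Here `ideal` is the standard deviation of the mean (it needs the true sigma), `estimated` is the standard error obtained from the measured standard deviation, and `scatter` is the directly measured spread of the 500 means. With these settings all three should be close to 0.5, although the exact digits depend on the random number generator.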
+
+@node Magnitude measurement error of each detection, Surface brightness error
of each detection, Standard deviation vs. error, Quantifying measurement limits
@subsubsection Magnitude measurement error of each detection
The raw error in measuring the magnitude is only meaningful when the object's
magnitude is brighter than the upper-limit magnitude (see below).
As discussed in @ref{Brightness flux magnitude}, the magnitude (@mymath{M}) of
an object with brightness @mymath{B} and zero point magnitude @mymath{z} can be
written as: