[gnuastro-commits] master 45775c65: MakeCatalog: new --mean-error option
From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master 45775c65: MakeCatalog: new --mean-error option and re-write of std vs. std error
Date: Thu, 8 Aug 2024 15:19:21 -0400 (EDT)
branch: master
commit 45775c651974736af82800f02f74c606f98de560
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>
MakeCatalog: new --mean-error option and re-write of std vs. std error
Until now, the section of the book that described the difference between the
standard deviation and the standard error was confusing because it referred
to the "standard error" simply as the "error" (which is also another name for
the standard deviation when dealing with a normal distribution). Also,
MakeCatalog didn't have any column to measure the standard error, and there
was no way to tell MakeCatalog to ignore the pixel values when measuring the
error columns (for the case where the standard deviation image already
contains the signal's contribution to the error).

With this commit, all these issues are addressed: the title of that section
of the book was edited to be clearer and the text was fully edited and
re-written where necessary. Also, the '--mean-error' option was added to let
people measure the standard error, and the '--novalinerror' option was
defined so that the pixel values can be ignored when measuring the errors.

The issue of the confusing text of that section in the book was raised by
Rahna Payyasseri Thanduparackal.
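
A minimal usage sketch of the two new options (the file names below are
hypothetical, and depending on how the labels were produced you may also need
to give a separate values file). It requests the sum and mean together with
their errors, while telling MakeCatalog not to add the pixel values to the
error budget:

    $ astmkcatalog seg.fits --instd=sky-std.fits \
                   --sum --sum-error --mean --mean-error \
                   --novalinerror
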
---
NEWS | 11 ++++
bin/mkcatalog/args.h | 29 +++++++++-
bin/mkcatalog/columns.c | 42 +++++++++++++--
bin/mkcatalog/main.h | 1 +
bin/mkcatalog/parse.c | 36 ++++++-------
bin/mkcatalog/ui.c | 5 +-
bin/mkcatalog/ui.h | 2 +
doc/gnuastro.texi | 138 ++++++++++++++++++++++++++++--------------------
8 files changed, 181 insertions(+), 83 deletions(-)
diff --git a/NEWS b/NEWS
index 808be250..3553709d 100644
--- a/NEWS
+++ b/NEWS
@@ -6,6 +6,17 @@ See the end of the file for license conditions.
* Noteworthy changes in release X.XX (library XX.X.X) (YYYY-MM-DD)
** New publications
** New features
+*** MakeCatalog
+
+   --novalinerror: ignore the pixels of the values image when calculating
the variance of each label's pixels: only use the pixel's value in the
'--instd' image (which can also be the variance, with the '--variance'
option). This is useful in scenarios where the STD/variance image is not
just the sky STD/variance, but also contains the signal's contribution.
+
+ --mean-error: Measure the error of the mean for each label (object or
+ clump).
+
*** astscript-radial-profile
--azimuth option can be called multiple times so the profile is generated
diff --git a/bin/mkcatalog/args.h b/bin/mkcatalog/args.h
index 3cf1bd5f..9110d606 100644
--- a/bin/mkcatalog/args.h
+++ b/bin/mkcatalog/args.h
@@ -162,6 +162,19 @@ struct argp_option program_options[] =
GAL_OPTIONS_NOT_MANDATORY,
GAL_OPTIONS_NOT_SET
},
+ {
+ "novalinerror",
+ UI_KEY_NOVALINERRORS,
+ 0,
+ 0,
+ "Ignore pixel values when estimating errors.",
+ GAL_OPTIONS_GROUP_INPUT,
+ &p->novalinerror,
+ GAL_OPTIONS_NO_ARG_TYPE,
+ GAL_OPTIONS_RANGE_0_OR_1,
+ GAL_OPTIONS_NOT_MANDATORY,
+ GAL_OPTIONS_NOT_SET
+ },
{
"forcereadstd",
UI_KEY_FORCEREADSTD,
@@ -1079,7 +1092,7 @@ struct argp_option program_options[] =
UI_KEY_SUMERROR,
0,
0,
- "Error (1-sigma) in measuring sum.",
+ "Standard deviation (error) in measuring sum.",
UI_GROUP_COLUMNS_BRIGHTNESS,
0,
GAL_TYPE_INVALID,
@@ -1130,6 +1143,20 @@ struct argp_option program_options[] =
GAL_OPTIONS_NOT_SET,
ui_column_codes_ll
},
+ {
+ "mean-error",
+ UI_KEY_MEANERROR,
+ 0,
+ 0,
+ "Error of the mean in object/clump.",
+ UI_GROUP_COLUMNS_BRIGHTNESS,
+ 0,
+ GAL_TYPE_INVALID,
+ GAL_OPTIONS_RANGE_ANY,
+ GAL_OPTIONS_NOT_MANDATORY,
+ GAL_OPTIONS_NOT_SET,
+ ui_column_codes_ll
+ },
{
"std",
UI_KEY_STD,
diff --git a/bin/mkcatalog/columns.c b/bin/mkcatalog/columns.c
index 6e47ac54..b0dfb806 100644
--- a/bin/mkcatalog/columns.c
+++ b/bin/mkcatalog/columns.c
@@ -1260,7 +1260,7 @@ columns_define_alloc(struct mkcatalogparams *p)
case UI_KEY_SUMERROR:
name = "SUM_ERROR";
unit = MKCATALOG_NO_UNIT;
- ocomment = "Error (1-sigma) in measuring sum.";
+      ocomment = "Standard deviation (error) in measuring sum.";
ccomment = ocomment;
otype = GAL_TYPE_FLOAT32;
ctype = GAL_TYPE_FLOAT32;
@@ -1318,6 +1318,23 @@ columns_define_alloc(struct mkcatalogparams *p)
ciflag[ CCOL_RIV_SUM ] = 1;
break;
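+      /* The error of the mean needs the label's pixel count and the sum of
+         the per-pixel variances (filled during parsing); the column itself
+         is derived later in 'columns_fill' as the error of the sum (from
+         'columns_sum_std') divided by the number of pixels. */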
+ case UI_KEY_MEANERROR:
+ name = "MEAN_ERROR";
+ unit = MKCATALOG_NO_UNIT;
+ ocomment = "Error of mean of sky subtracted values";
+ ccomment = ocomment;
+ otype = GAL_TYPE_FLOAT32;
+ ctype = GAL_TYPE_FLOAT32;
+ disp_fmt = GAL_TABLE_DISPLAY_FMT_GENERAL;
+ disp_width = 10;
+ disp_precision = 5;
+ oiflag[ OCOL_NUM ] = ciflag[ CCOL_NUM ] = 1;
+ oiflag[ OCOL_SUM_VAR ] = ciflag[ CCOL_SUM_VAR ] = 1;
+ oiflag[ OCOL_SUM_VAR_NUM ] = ciflag[ CCOL_SUM_VAR_NUM ] = 1;
+ ciflag[ CCOL_RIV_NUM ] = 1;
+ ciflag[ CCOL_RIV_SUM_VAR ] = 1;
+ break;
+
case UI_KEY_STD:
name = "STD";
unit = MKCATALOG_NO_UNIT;
@@ -2365,7 +2382,7 @@ columns_define_alloc(struct mkcatalogparams *p)
to find variance and number of pixels used to find brightness are the
same). */
static double
-columns_sum_error(struct mkcatalogparams *p, double *row, int o0c1)
+columns_sum_std(struct mkcatalogparams *p, double *row, int o0c1)
{
size_t numind = o0c1 ? CCOL_NUM : OCOL_NUM;
double V = row[ o0c1 ? CCOL_SUM_VAR : OCOL_SUM_VAR ];
@@ -2397,7 +2414,7 @@ columns_sn(struct mkcatalogparams *p, double *row, int
o0c1)
: 0.0 );
/* Return the derived value. */
- return (I-O) / columns_sum_error(p, row, o0c1);
+ return (I-O) / columns_sum_std(p, row, o0c1);
}
@@ -2932,7 +2949,7 @@ columns_fill(struct mkcatalog_passparams *pp)
break;
case UI_KEY_SUMERROR:
- ((float *)colarr)[oind] = columns_sum_error(p, oi, 0);
+ ((float *)colarr)[oind] = columns_sum_std(p, oi, 0);
break;
case UI_KEY_CLUMPSSUM:
@@ -2947,6 +2964,15 @@ columns_fill(struct mkcatalog_passparams *pp)
: NAN );
break;
+          /* In 'columns_sum_std', we take the square root of the sum of
             variances. So here, we just need to divide by the total number
             of elements used: since the mean is the sum divided by the
             number of pixels (which is fixed), e(mean)=e(sum)/number. */
+ case UI_KEY_MEANERROR:
+ ((float *)colarr)[oind] = ( columns_sum_std(p, oi, 0)
+ / oi[OCOL_NUM] );
+ break;
+
case UI_KEY_STD:
((float *)colarr)[oind] =
gal_statistics_std_from_sums(oi[ OCOL_SUM ], oi[ OCOL_SUMP2 ],
@@ -3331,7 +3357,7 @@ columns_fill(struct mkcatalog_passparams *pp)
break;
case UI_KEY_SUMERROR:
- ((float *)colarr)[cind] = columns_sum_error(p, ci, 1);
+ ((float *)colarr)[cind] = columns_sum_std(p, ci, 1);
break;
case UI_KEY_SUMNORIVER:
@@ -3344,6 +3370,12 @@ columns_fill(struct mkcatalog_passparams *pp)
/ci[CCOL_NUM] );
break;
+ case UI_KEY_MEANERROR:
+ ((float *)colarr)[cind] = ( columns_sum_std(p, ci, 1)
+ / ( oi[CCOL_NUM]
+ + oi[CCOL_RIV_NUM] ) );
+ break;
+
case UI_KEY_STD:
((float *)colarr)[cind] =
gal_statistics_std_from_sums(oi[ CCOL_SUM ],
diff --git a/bin/mkcatalog/main.h b/bin/mkcatalog/main.h
index 221898f9..976e9847 100644
--- a/bin/mkcatalog/main.h
+++ b/bin/mkcatalog/main.h
@@ -262,6 +262,7 @@ struct mkcatalogparams
uint8_t noclumpsort; /* Don't sort the clumps catalog. */
float zeropoint; /* Zero-point magnitude of object. */
uint8_t variance; /* Input STD file is actually variance. */
+ uint8_t novalinerror; /* Ignore values when estimating errors.*/
uint8_t forcereadstd; /* Read STD even if not needed. */
uint8_t subtractsky; /* ==1: subtract the Sky from values. */
float sfmagnsigma; /* Surface brightness multiple of sigma.*/
diff --git a/bin/mkcatalog/parse.c b/bin/mkcatalog/parse.c
index 25a00269..ede28844 100644
--- a/bin/mkcatalog/parse.c
+++ b/bin/mkcatalog/parse.c
@@ -169,11 +169,12 @@ parse_vector_dim3(struct mkcatalog_passparams *pp,
gal_data_t *xybin)
double var;
int needsvar;
gal_data_t *vector=pp->vector;
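+  /* When '--novalinerror' is given, 'vine' is zero, so the pixel values
+     will not be added to the per-pixel variance below. */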
+ uint8_t vine=!p->novalinerror;
float *std=p->std?p->std->array:NULL;
size_t c[3], *dsize=p->objects->dsize;
size_t sind=0, pind=0, num_increment=1;
+ float sval, *st_v, *st_std, *V=NULL, *ST=NULL;
uint8_t *xybinarr = xybin ? xybin->array : NULL;
- float st, sval, *st_v, *st_std, *V=NULL, *ST=NULL;
int32_t *st_o, *O, *OO, *objarr=p->objects->array;
size_t tid, *tsize, increment=0, start_end_inc[2], ndim=p->objects->ndim;
@@ -193,7 +194,8 @@ parse_vector_dim3(struct mkcatalog_passparams *pp,
gal_data_t *xybin)
/* Prepare the parsing information. Also, if tile-id isn't necessary, set
'tid' to a blank value to cause a crash with a mistake. */
- tsize=parse_vector_dim3_prepare(pp, start_end_inc, &st_o, &st_v, &st_std);
+ tsize=parse_vector_dim3_prepare(pp, start_end_inc, &st_o, &st_v,
+ &st_std);
tid = (p->std && p->std->size>1 && st_std == NULL)?0:GAL_BLANK_SIZE_T;
/* Check if we need the variance. */
@@ -242,8 +244,7 @@ parse_vector_dim3(struct mkcatalog_passparams *pp,
gal_data_t *xybin)
we are given a variance dataset already, there is no
need to use 'st*st', we can directly use 'sval'. */
sval = st_std ? *ST : (p->std->size>1?std[tid]:std[0]);
- st = p->variance ? sqrt(sval) : sval;
- var = (p->variance ? sval : st*st) + fabs(*V);
+ var = (p->variance ? sval : sval*sval) + (vine?*V:0);
}
else var = NAN;
@@ -325,11 +326,11 @@ parse_objects(struct mkcatalog_passparams *pp)
double *oi=pp->oi;
gal_data_t *xybin=NULL;
size_t *tsize=pp->tile->dsize;
- uint8_t *u, *uf, goodvalue, *xybinarr=NULL;
double minima_v=FLT_MAX, maxima_v=-FLT_MAX;
size_t d, pind=0, increment=0, num_increment=1;
int32_t *O, *OO, *C=NULL, *objarr=p->objects->array;
- float var, sval, varval, skyval, *V=NULL, *SK=NULL, *ST=NULL;
+ float var, sval, skyval, *V=NULL, *SK=NULL, *ST=NULL;
+ uint8_t *u, *uf, goodvalue, vine=!p->novalinerror, *xybinarr=NULL;
float *std=p->std?p->std->array:NULL, *sky=p->sky?p->sky->array:NULL;
/* If tile processing isn't necessary, set 'tid' to a blank value. */
@@ -589,17 +590,13 @@ parse_objects(struct mkcatalog_passparams *pp)
standard deviation of the signal (independent of the
sky) is 'sqrt(*V)'. Therefore the total variance of
this pixel is the variance of the sky added with the
- absolute value of its sky-subtracted flux. We use the
- absolute value, because especially as the signal gets
- noisy there will be negative values, and we don't want
- them to decrease the variance. */
+ absolute value of its sky-subtracted flux. */
if(oif[ OCOL_SUM_VAR ] && goodvalue)
{
- varval=p->variance ? var : sval;
- if(!isnan(varval))
+ if(!isnan(var))
{
oi[ OCOL_SUM_VAR_NUM ]++;
- oi[ OCOL_SUM_VAR ] += varval + fabs(*V);
+ oi[ OCOL_SUM_VAR ] += var + (vine?*V:0);
}
}
}
@@ -725,13 +722,14 @@ parse_clumps(struct mkcatalog_passparams *pp)
double *ci, *cir;
gal_data_t *xybin=NULL;
+ uint8_t vine=!p->novalinerror;
int32_t *O, *OO, *C=NULL, nlab;
size_t cind, *tsize=pp->tile->dsize;
double *minima_v=NULL, *maxima_v=NULL;
uint8_t *u, *uf, goodvalue, *cif=p->ciflag;
size_t nngb=gal_dimension_num_neighbors(ndim);
size_t i, ii, d, pind=0, increment=0, num_increment=1;
- float var, sval, varval, skyval, *V=NULL, *SK=NULL, *ST=NULL;
+ float var, sval, skyval, *V=NULL, *SK=NULL, *ST=NULL;
int32_t *objects=p->objects->array, *clumps=p->clumps->array;
float *std=p->std?p->std->array:NULL, *sky=p->sky?p->sky->array:NULL;
@@ -995,11 +993,10 @@ parse_clumps(struct mkcatalog_passparams *pp)
}
if(cif[ CCOL_SUM_VAR ] && goodvalue)
{
- varval=p->variance ? var : sval;
- if(!isnan(varval))
+ if(!isnan(var))
{
ci[ CCOL_SUM_VAR_NUM ]++;
- ci[ CCOL_SUM_VAR ] += varval + fabs(*V);
+ ci[ CCOL_SUM_VAR ] += var+(vine?*V:0);
}
}
}
@@ -1077,8 +1074,9 @@ parse_clumps(struct mkcatalog_passparams *pp)
: ( p->std->size>1
? std[tid]
: std[0] ) );
- cir[ CCOL_RIV_SUM_VAR ] += fabs(*V)
- + (p->variance ? sval : sval*sval);
+ cir[ CCOL_RIV_SUM_VAR ] +=
+ (p->variance ? sval : sval*sval)
+ + (vine?*V:0);
}
}
}
diff --git a/bin/mkcatalog/ui.c b/bin/mkcatalog/ui.c
index f6517ccb..c7c2c4e5 100644
--- a/bin/mkcatalog/ui.c
+++ b/bin/mkcatalog/ui.c
@@ -382,7 +382,7 @@ ui_check_only_options(struct mkcatalogparams *p)
"If you want the upperlimit check table for an object, only "
"give one value (the object's label) to '--checkuplim'.");
- /* See if '--skyin' is a filename or a value. When the string is ONLY a
+ /* See if '--insky' is a filename or a value. When the string is ONLY a
number (and nothing else), 'tailptr' will point to the end of the
string ('\0'). */
if(p->skyfile)
@@ -1981,6 +1981,9 @@ ui_read_check_inputs_setup(int argc, char *argv[],
if(p->subtractsky)
printf(" - Sky has been subtracted from values "
"internally.\n");
+ else
+ printf(" - Sky NOT SUBTRACTED from values, only used "
+ "in needed columns.\n");
}
if(p->std)
diff --git a/bin/mkcatalog/ui.h b/bin/mkcatalog/ui.h
index c511689b..b092862c 100644
--- a/bin/mkcatalog/ui.h
+++ b/bin/mkcatalog/ui.h
@@ -84,6 +84,7 @@ enum option_keys_enum
UI_KEY_ZEROPOINT,
UI_KEY_SIGMACLIP,
UI_KEY_VARIANCE,
+ UI_KEY_NOVALINERRORS,
UI_KEY_SUBTRACTSKY,
UI_KEY_SFMAGNSIGMA,
UI_KEY_SFMAGAREA,
@@ -153,6 +154,7 @@ enum option_keys_enum
UI_KEY_SUMNORIVER,
UI_KEY_STD,
UI_KEY_MEAN,
+ UI_KEY_MEANERROR,
UI_KEY_MEDIAN,
UI_KEY_MAXIMUM,
UI_KEY_MAGNITUDEERROR,
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 21505c7b..ea899289 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -696,7 +696,7 @@ MakeCatalog
Quantifying measurement limits
-* Standard deviation vs error:: The std is not a measure of the error.
+* Standard deviation vs Standard error:: Difference between these important
measures.
* Magnitude measurement error of each detection:: Error in measuring
magnitude.
* Surface brightness error of each detection:: Error in measuring the Surface
brightness.
* Completeness limit of each detection:: Possibility of detecting similar
objects?
@@ -29371,8 +29371,10 @@ It might even be so intertwined with its processing,
that adding new columns mig
@node Brightness flux magnitude, Quantifying measurement limits, Detection and
catalog production, MakeCatalog
@subsection Brightness, Flux, Magnitude and Surface brightness
-After taking an image with your smartphone camera (or a large telescope), the
value in each pixel of the output file is a proxy for the amount of
@emph{energy} that accumulated in it (while it was exposed to light).
-In astrophysics, the centimetre–gram–second (cgi) units are commonly used so
energy is commonly reported in units of @mymath{erg}.
+@cindex cgs units (centimeter–gram–second)
+@cindex centimeter–gram–second (cgs) units
+After taking an image with your camera (or a large telescope), the value in
each pixel of the output file is a proxy for the amount of @emph{energy} that
accumulated in it (while it was exposed to light).
+In astrophysics, the centimeter–gram–second (cgs) units are commonly used,
so energy is typically reported in units of @mymath{erg}.
In an image, the energy of a galaxy for example will be distributed over many
pixels.
Therefore, the collected energy of an object in the image is the total sum of
the values in the pixels associated to the object.
@@ -29386,8 +29388,8 @@ To be able to compare with data taken with different
exposure times, we define t
@item Flux (@mymath{erg/s/cm^2})
@cindex Flux
-To be able to compare with data from different telescope collecting areas, we
define the @emph{flux} which is defined in units of brightness per
collecting-area.
-This area is historically reported in units of @mymath{cm^2}.
+To be able to compare with data from different telescopes (with different
collecting areas), we define the @emph{flux}, which is in units of brightness
per collecting-area.
+Because we are using the cgs units, the collecting area is reported in
@mymath{cm^2}.
@cindex Luminosity
Knowing the flux (@mymath{f}) and distance to the object (@mymath{r}), we can
define its @emph{luminosity}: @mymath{L=4{\pi}r^2f}.
@@ -29405,11 +29407,11 @@ To take into account the spectral coverage of our
data, we define the @emph{spec
Like other objects in nature, astronomical objects do not emit or reflect the
same flux at all wavelengths.
On the other hand, our detector technologies are different for different
wavelength ranges.
Therefore, even if we wanted to, there is no way to measure the ``total'' (at
all wavelengths) brightness of an object with a single tool.
-To be able to analyze objects with different spectral features, it is
therefore important to account for the wavelength (or frequency) range of the
photons that we measured through the spectral flux density.
+To be able to analyze objects with different spectral features (compare
measurements of the same object taken in different spectral regimes), it is
therefore important to account for the wavelength (or frequency) range of the
photons that we measured through the spectral flux density.
@item Jansky (@mymath{10^{-23}erg/s/cm^2/Hz})
@cindex Jansky (Jy)
-A ``Jansky'' is a certain value of frequency flux density (commonly used in
radio astronomy).
+A ``Jansky'' is a certain level of frequency flux density (commonly used in
radio astronomy).
Janskys can be converted to wavelength flux density using the
@code{jy-to-wavelength-flux-density} operator of Gnuastro's Arithmetic program,
see the derivation under this operator's description in @ref{Unit conversion
operators}.
@end table
@@ -29568,7 +29570,7 @@ In astronomy, it is common to use the magnitude (a
unit-less scale) and physical
Therefore the measurements discussed here are commonly used in units of
magnitudes.
@menu
-* Standard deviation vs error:: The std is not a measure of the error.
+* Standard deviation vs Standard error:: Difference between these important
measures.
* Magnitude measurement error of each detection:: Error in measuring
magnitude.
* Surface brightness error of each detection:: Error in measuring the Surface
brightness.
* Completeness limit of each detection:: Possibility of detecting similar
objects?
@@ -29578,14 +29580,19 @@ Therefore the measurements discussed here are
commonly used in units of magnitud
* Upper limit surface brightness of image:: Measure the noise-level for a
certain aperture.
@end menu
-@node Standard deviation vs error, Magnitude measurement error of each
detection, Quantifying measurement limits, Quantifying measurement limits
-@subsubsection Standard deviation vs error
-The error and the standard deviation are sometimes confused with each other.
-Therefore, before continuing with the various measurement limits below, let's
review these two fundamental concepts.
-Instead of going into the theoretical definitions of the two (which you can
see in their respective Wikipedia pages), we'll discuss the concepts in a
hands-on and practical way here.
+@node Standard deviation vs Standard error, Magnitude measurement error of
each detection, Quantifying measurement limits, Quantifying measurement limits
+@subsubsection Standard deviation vs Standard error
+The standard deviation and the standard error are different concepts that
convey different aspects of a measurement, but they are easily confused with
each other: for example, the standard deviation is also often called the
``error''.
+A nice description of this difference is given in the following quote from
Wikipedia.
+In this section, we'll show the concept described in this quote with a
hands-on and practical set of commands to clarify this important distinction.
-Let's simulate an observation of the sky, but without any astronomical sources!
-In other words, we only have a background flux level (from the sky emission).
+@quotation
+The standard deviation of the sample data is a description of the variation in
measurements, while the standard error of the mean is a probabilistic statement
about how the sample size will provide a better bound on estimates of the
population mean, in light of the central limit theorem. Put simply, the
@emph{standard error} of the sample mean is an estimate of how far the sample
mean is likely to be from the population mean, whereas the @emph{standard
deviation} of the sample is the deg [...]
+@author Wikipedia page on ``Standard error'' (retrieved on 2024-08-09)
+@end quotation
+
+Let's simulate an observation of the sky, but without any astronomical sources
to simplify the logic.
+In other words, we only have a background flux level (assume you want to
measure the brightness of the twilight sky, which is not yet dark enough for
stars to be visible).
With the first command below, let's make an image called @file{1.fits} that
contains @mymath{200\times200} pixels that are filled with random noise from a
Poisson distribution with a mean of 100 counts (the flux from the background
sky).
With the second command, we'll have a look at the image.
Recall that the Poisson distribution is equal to a normal distribution for
large mean values (as in this case).
@@ -29598,11 +29605,11 @@ $ astscript-fits-view 1.fits
@end example
The standard deviation (@mymath{\sigma}) of the Poisson distribution is the
square root of the mean, see @ref{Photon counting noise}.
-Note that due to the random nature of the noise, the values reported in the
next steps on your computer will be very slightly different.
+Note that due to the random nature of the noise, the values reported in the
next steps on your computer will be slightly different (which is one of the
points of this section!).
To reproduce exactly the same values in different runs, see @ref{Generating
random numbers}, and for more on the first command, see @ref{Arithmetic}.
Each pixel shows the result of one sampling from the Poisson distribution.
-In other words, assuming the sky emission in our simulation is constant over
our field of view, each pixel's value shows one measurement of the sky emission.
+In other words, assuming the sky emission in our simulation is constant over
our field of view, each pixel's value shows one measurement of the same sky
emission.
Statistically speaking, a ``measurement'' is a sampling from an underlying
distribution of values.
Through our measurements, we aim to identify that underlying distribution (the
``truth'')!
With the command below, let's look at the pixel statistics of @file{1.fits}
(output is shown immediately under it).
@@ -29638,12 +29645,15 @@ Histogram:
@end example
As expected, you see that the ASCII histogram nicely resembles a normal
distribution.
-The measured mean and standard deviation (@mymath{\sigma_x}) are also very
similar to the input (mean of 100, standard deviation of @mymath{\sigma=10}).
+The measured mean and standard deviation are also very similar to the input
(mean of 100, standard deviation of 10).
But the measured mean (and standard deviation) aren't exactly equal to the
input!
Every time we make a different simulated image from the same distribution, the
measured mean and standard deviation will slightly differ.
-With the second command below, let's build 500 images like above and measure
their mean and standard deviation.
-The outputs will be written into a file (@file{mean-stds.txt}; in the first
command we are deleting it to make sure we write into an empty file within the
loop).
+Run the commands above one more time (this time calling the output
@file{2.fits}) and check for yourself (actually open the two FITS images and
compare them visually; don't just rely on the statistics).
+
+Now that you have a good feeling for the change, let's automate this and
scale it up for better statistics.
+With the commands below, you will build 500 images like the one above,
measure the mean and standard deviation of each, and save the measurements
into a file (@file{mean-stds.txt}).
+In the first command we delete any existing copy of that file, to make sure
the loop writes into an empty file on its first iteration.
With the third command, let's view the top 10 rows:
@example
@@ -29668,7 +29678,8 @@ $ asttable mean-stds.txt -Y --head=10
100.000212 9.970293
@end example
-From this table, you see that each simulation has produced a slightly
different measured mean and measured standard deviation (@mymath{\sigma_x})
that are just fluctuating around the input mean (which was 100) and input
standard deviation (@mymath{\sigma=10}).
+From this table, you see that each simulation has produced a slightly
different measured mean and measured standard deviation.
+They are just fluctuating around the input mean (which was 100) and input
standard deviation (which was 10).
Let's have a look at the distribution of mean measurements:
@example
@@ -29703,48 +29714,46 @@ Histogram:
@end example
@cindex Standard error of mean
-The standard deviation of the various mean measurements above shows the
scatter in measuring the mean with an image of this size from this underlying
distribution.
-This is therefore defined as the @emph{standard error of the mean}, or
``error'' for short (since most measurements are actually the mean of a
population) and shown with @mymath{\widehat\sigma_{\bar{x}}}.
+The standard deviation you see above (approximately @mymath{0.05}) shows the
scatter in measuring the mean with an image of this size and is different from
the standard deviation of the Poisson distribution that the values were drawn
from.
+This is therefore defined as the @emph{standard error of the mean}, or
``standard error'' for short (since most measurements are actually the mean of
a population).
From the example above, you see that the standard error is smaller than the
standard deviation (and becomes even smaller for larger samples).
-In fact, @url{https://en.wikipedia.org/wiki/Standard_error#Derivation, it can
be shown} that this ``error of the mean'' (@mymath{\sigma_{\bar{x}}}) is
related to the distribution standard deviation (@mymath{\sigma}) through the
following equation.
-Where @mymath{N} is the number of points used to measure the mean in one
sample (@mymath{200\times200=40000} in this case).
+In fact, @url{https://en.wikipedia.org/wiki/Standard_error#Derivation, it can
be shown} that this ``error of the mean'' (@mymath{\sigma_{\bar{x}}}; recall
that @mymath{\bar{x}} represents the mean) is related to the distribution
standard deviation (@mymath{\sigma}) through the following equation.
+Where @mymath{N} is the number of points used to measure the mean in one
sample (@mymath{N=200\times200=40000} in this case).
Note that the @mymath{10.0} below was reported as ``standard deviation'' in
the first run of @code{aststatistics} on @file{1.fits} above:
-@c The 10.0 depends on the 'aststatistics 1.fits' command above.
-@dispmath{\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{N}} \quad\quad {\rm or}
\quad\quad \widehat\sigma_{\bar{x}}\approx\frac{\sigma_x}{\sqrt{N}} =
\frac{10.0}{200} = 0.05}
+@dispmath{\sigma_{\bar{x}}\approx\frac{\sigma}{\sqrt{N}} = \frac{10.0}{200} =
0.05}
-@noindent
-Taking the considerations above into account, we should clearly distinguish
the following concepts when talking about the standard deviation or error:
+Therefore, the standard error of the mean is directly related to the number
of pixels you used to measure the mean.
+You can test this by changing the @code{200}s in the commands above to smaller
or larger values.
+As you make larger and larger images, you will be able to measure the mean
much more precisely (the standard error of the mean will go to zero).
+But no matter how many pixels you use, the standard deviation will always be
the same.
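+
+As a quick, hands-on check (a minimal sketch; the exact value on your
computer will be slightly different because of the random noise), you can
measure the scatter of the 500 mean measurements directly from the first
column of the table built above:
+
+@example
+$ aststatistics mean-stds.txt -c1 --std
+@end example
+
+@noindent
+The printed value should be close to the @mymath{0.05} of the equation above.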
+Within MakeCatalog, the options related to dispersion/error in the
measurements have the following conventions:
@table @asis
-@item Standard deviation of population
-This is the standard deviation of the underlying distribution (10 in the
example above), and shown by @mymath{\sigma}.
-This is something you can never measure, and is just the ideal value.
-
-@item Standard deviation of mean
-Ideal error of measuring the mean (assuming we know @mymath{\sigma}).
-
-@item Standard deviation of sample (i.e., @emph{Standard deviation})
-Measured Standard deviation from a sampling of the ideal distribution.
-This is the second column of @file{mean-stds.txt} above and is shown with
@mymath{\sigma_x} above.
-In astronomical literature, this is simply referred to as the ``standard
deviation''.
-
-In other words, the standard deviation is computed on the input itself and
MakeCatalog just needs a ``values'' file.
-For example, when measuring the standard deviation of an astronomical object
using MakeCatalog it is computed directly from the input values.
-
-@item Standard error (i.e., @emph{error})
-Measurable scatter of measuring the mean (@mymath{\widehat\sigma_{\bar{x}}})
that can be estimated from the size of the sample and the measured standard
deviation (@mymath{\sigma_x}).
-In astronomical literature, this is simply referred to as the ``error''.
-
-In other words, when asking for an ``error'' measurement with MakeCatalog, a
separate standard deviation dataset should be always provided.
-This dataset should take into account all sources of scatter.
-For example, during the reduction of an image, the standard deviation dataset
should take into account the dispersion of each pixel that comes from the bias,
dark, flat fielding, etc.
-If this image is not available, it is possible to use the @code{SKY_STD}
extension from NoiseChisel as an estimation.
-For more see @ref{NoiseChisel output}.
+@item @option{--*std}
+For example @option{--std} or @option{--sigclip-std}.
+These return the standard deviation of the values within a label.
+If the underlying object (beneath the noise) is flat, then this will be the
@mymath{\sigma} that is mentioned above.
+However, no object in astronomy is flat!
+So this option should be used with extreme care!
+It only makes sense in special contexts like measuring the radial profile
where we assume that the values at a certain radius have the same flux (see
@ref{Generate radial profile}).
+
+@item @option{--*error}
+For example @option{--mag-error}, @option{--mean-error} or
@option{--sum-error}.
+These options should be used when the pixel values are different.
+When the pixels do not have the same value (for example different parts of one
galaxy), their standard deviation is meaningless.
+To measure the total error in such cases, we need to know the standard
deviation of each pixel separately.
+Therefore, for these columns, MakeCatalog needs a separate dataset that
contains the underlying sky standard deviation for those pixels.
+That dataset should have the same size (number and dimension of pixels) as the
values dataset.
+You can use @ref{NoiseChisel} to generate such an image.
+
+If the underlying profile and the sky standard deviation are both flat, then
@option{--mean-error} will be the standard error that we discussed in this
section (and @option{--sum-error} is simply @mymath{N} times it, where
@mymath{N} is the number of pixels in the label).
+When the values are different, the combined error is calculated by adding,
over all the pixels of the label, the variance (square of the sky standard
deviation) of each pixel plus its value; see the sketch just after this list.
+When the values are smaller than one, a correction is applied (defined in
Section 3.3 of Akhlaghi and Ichikawa @url{https://arxiv.org/abs/1505.01664,
2015}).
@end table
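+
+As a rough sketch of the convention above (ignoring the corrections for
negative or very faint pixels that are described under the individual
measurement columns, like @option{--sum-error}), for a label containing
@mymath{N} pixels with values @mymath{v_i} and sky standard deviations
@mymath{\sigma_i}:
+
+@dispmath{e_{\rm sum}\approx\sqrt{\sum_{i=1}^{N}\left(\sigma_i^2+v_i\right)} \quad\quad {\rm and} \quad\quad e_{\rm mean}\approx\frac{e_{\rm sum}}{N}}
+
+@noindent
+When @option{--novalinerror} is given, the @mymath{v_i} term is dropped and
only the sky variance @mymath{\sigma_i^2} enters the sum.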
-@node Magnitude measurement error of each detection, Surface brightness error
of each detection, Standard deviation vs error, Quantifying measurement limits
+@node Magnitude measurement error of each detection, Surface brightness error
of each detection, Standard deviation vs Standard error, Quantifying
measurement limits
@subsubsection Magnitude measurement error of each detection
The raw error in measuring the magnitude is only meaningful when the object's
magnitude is brighter than the upper-limit magnitude (see below).
As discussed in @ref{Brightness flux magnitude}, the magnitude (@mymath{M}) of
an object with brightness @mymath{B} and zero point magnitude @mymath{z} can be
written as:
@@ -30415,6 +30424,7 @@ Within an image, pixels have both a position and a
value.
In the sections above all the measurements involved position (see
@ref{Position measurements in pixels} or @ref{Position measurements in WCS}).
The measurements in this section only deal with pixel values and ignore the
pixel positions completely.
In other words, for the options of this section each labeled region within the
input is just a group of values (and their associated error values if
necessary), and they let you do various types of measurements on the resulting
distribution of values.
+For more on the difference between the @option{--*error} and @option{--*std}
columns, see @ref{Standard deviation vs Standard error}.
@table @option
@item --sum
@@ -30426,8 +30436,14 @@ So the sum of all the clump-sums in the clump catalog
of one object will be smal
If no usable pixels are present over the clump or object (for example, they
are all blank), the returned value will be NaN (note that zero is meaningful).
@item --sum-error
-The error in measuring the sum of values of a label (objects or clumps).
+The standard deviation of the sum of values of a label (objects or clumps).
The value is calculated by using the values image (for signal above the sky
level) and the sky standard deviation image (extension @option{--stdhdu} of
file given to @option{--instd}); which you can derive for any image using
@ref{NoiseChisel}.
+This column is internally used to measure the signal-to-noise (@option{--sn}).
+
+For objects, this is calculated by adding, for each pixel in the label, the
sky variance (square of the sky standard deviation image) and the pixel's
value when that value is not negative (so the error only increases).
+This is done to account for brighter pixels, which have higher noise because
of the Poisson distribution (a side effect is that the error will always be
slightly over-estimated, due to positive values that are close to the noise
level).
+A correction may be applied if the sky standard deviation is negative; see
Section 3.3 of Akhlaghi & Ichikawa @url{https://arxiv.org/abs/1505.01664, 2015}.
+For clumps, the variance of the rivers (which are subtracted from the pixel
values when calculating the sum) is also added to produce the final standard
deviation.
The returned value will be NaN when the label covers only NaN pixels in the
values image, or a pixel is NaN in the @option{--instd} image, but non-NaN in
the values image.
The latter situation usually happens when there is a bug in the previous steps
of your analysis.
@@ -30450,6 +30466,10 @@ If no usable pixels are present over the clump or
object (for example, they are
The mean sky subtracted value of pixels within the object or clump.
For clumps, the average river flux is subtracted from the sky subtracted mean.
+@item --mean-error
+The error in measuring the mean (the error of the sum, described above,
divided by the number of pixels); it uses both the values file and the sky
standard deviation image.
+If the given standard deviation or variance image already contains the
contribution of the pixel values (so it is not just the sky standard
deviation), use @option{--novalinerror}.
+
@item --std
The standard deviation of the pixels within the object or clump.
For clumps, the river pixels are not subtracted because that is a constant
(per pixel) value and should not affect the standard deviation.
@@ -30957,7 +30977,7 @@ For the full list of available measurements, see
@ref{MakeCatalog measurements}.
The ``values'' dataset is used for measurements like brightness/magnitude, or
flux-weighted positions.
If it is a real image, by default it is assumed to be already Sky-subtracted
prior to running MakeCatalog.
-If it is not, you use the @option{--subtractsky} option to, so MakeCatalog
reads and subtracts the Sky dataset before any processing.
+If it is not, you should use the @option{--subtractsky} option so MakeCatalog
reads and subtracts the Sky dataset before any processing.
To obtain the Sky value, you can use the @option{--sky} option of
@ref{Statistics}, but the best recommended method is @ref{NoiseChisel}, see
@ref{Sky value}.
MakeCatalog can also do measurements on sub-structures of detections.
@@ -31044,7 +31064,11 @@ Therefore if the input standard deviation (or
variance) image also contains the
The HDU of the Sky value standard deviation image.
@item --variance
-The dataset given to @option{--instd} (and @option{--stdhdu} has the Sky
variance of every pixel, not the Sky standard deviation.
+The value/file given to @option{--instd} (and @option{--stdhdu}) has the Sky
variance of every pixel, not the Sky standard deviation.
+
+@item --novalinerror
+Use this option when the value/file given to @option{--instd} is not only due
to the sky (noise), but also contains the contribution of the signal to each
pixel's standard deviation or variance.
+When this option is given, the pixel values will be ignored when measuring
the @option{--*error} columns.
@item --forcereadstd
Read the input STD image even if it is not required by any of the requested
columns.