[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[gnuastro-commits] master 7354e33 09/19: Library (match.h): match_coordi
From: |
Mohammad Akhlaghi |
Subject: |
[gnuastro-commits] master 7354e33 09/19: Library (match.h): match_coordinate_ replaced by match_sort_based_ |
Date: |
Sun, 14 Nov 2021 20:40:59 -0500 (EST) |
branch: master
commit 7354e3361899e4c0481b48ba09123cc1767f77a4
Author: Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Library (match.h): match_coordinate_ replaced by match_sort_based_
Until now, there was only a single match algorithm in Gnuastro, so the name
of the respective functions in the library had a 'match_coordinate_'
prefix. However, in this branch we are adding a new k-d tree based
matching, so that name for the initial algorithm could cause confusion
(because k-d tree also uses coordinates!).
With this commit, those same function names are now prefixed with
'match_sort_based_'. This change is done in all the relevant places:
bin/match/match.c, lib/gnuastro/match.h, lib/match.c and in
doc/gnuastro.texi.
Furthermore, a 'make check' test has been added for the k-dtree based
matching which does simple matching based on the predefined inputs used for
the old matching algorithm.
Finally, some scripts have been added in the temporary directory for
benchmarking tests to see the efficiency of k-d tree over sort-based
matching. Two scripts are written in '/during-dev-test-data/scripts'
(namely 'kdtree-gen.py', which generates the pseudo random output tests,
and 'benchmark.py') which will calculate the run time for the scripts and
generate a graph for number of times the program is executed vs. time
taken. The script for benchmarking is simple and might take long time for
>10 executions. A possible efficient method is multithreading, which will
be implemented soon. After this we can implement the --kdtree=automatic
option based on the result.
---
bin/match/match.c | 7 +--
doc/gnuastro.texi | 2 +-
during-dev-test-data/match-query.txt | 13 ++++--
during-dev-test-data/scripts/benchmark.py | 69 ++++++++++++++++++++++++++++++
during-dev-test-data/scripts/kdtree-gen.py | 31 ++++++++++++++
lib/gnuastro/match.h | 3 +-
lib/match.c | 31 +++++++-------
tests/Makefile.am | 4 +-
tests/match/kdtree-match.sh | 59 +++++++++++++++++++++++++
9 files changed, 195 insertions(+), 24 deletions(-)
diff --git a/bin/match/match.c b/bin/match/match.c
index f421e0a..c398146 100644
--- a/bin/match/match.c
+++ b/bin/match/match.c
@@ -5,6 +5,7 @@ Match is part of GNU Astronomy Utilities (Gnuastro) package.
Original author:
Mohammad Akhlaghi <mohammad@akhlaghi.org>
Contributing author(s):
+ Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
Copyright (C) 2017-2021, Free Software Foundation, Inc.
Gnuastro is free software: you can redistribute it and/or modify it
@@ -497,7 +498,7 @@ match_catalog_kdtree_auto(struct matchparams *p)
/* Wrapper over the k-d tree library to return an output in the same format
- as 'gal_match_coordinates'. */
+ as 'gal_match_sort_based'. */
static gal_data_t *
match_catalog_kdtree(struct matchparams *p, size_t *nummatched)
{
@@ -563,7 +564,7 @@ match_catalog(struct matchparams *p)
place. Incase it was decided not to use a k-d tree at all
(in 'automatic' mode), then we need to use the classic mode. */
if(mcols==NULL)
- mcols=gal_match_coordinates(p->cols1, p->cols2, p->aperture->array,
+ mcols=gal_match_sort_based(p->cols1, p->cols2, p->aperture->array,
0, 1, p->cp.minmapsize, p->cp.quietmmap,
&nummatched);
@@ -618,7 +619,7 @@ match_catalog(struct matchparams *p)
mcols=tmp;
/* We also want everything to be incremented by one. In a C
- program, counting starts with zero, so 'gal_match_coordinates'
+ program, counting starts with zero, so 'gal_match_sort_based'
will return indexs starting from zero. But outside a C
program, on the command-line people expect counting to start
from 1 (for example with AWK). */
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 5ddd4bb..cb1accb 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -29375,7 +29375,7 @@ When the aperture is an ellipse, distances between the
points are also calculate
-@deftypefun {gal_data_t *} gal_match_coordinates (gal_data_t @code{*coord1},
gal_data_t @code{*coord2}, double @code{*aperture}, int @code{sorted_by_first},
int @code{inplace}, size_t @code{minmapsize}, int @code{quietmmap}, size_t
@code{*nummatched})
+@deftypefun {gal_data_t *} gal_match_sort_based (gal_data_t @code{*coord1},
gal_data_t @code{*coord2}, double @code{*aperture}, int @code{sorted_by_first},
int @code{inplace}, size_t @code{minmapsize}, int @code{quietmmap}, size_t
@code{*nummatched})
Use a basic sort-based match to find the matching points of two input
coordinates.
See the descriptions above on the format of the inputs and outputs.
diff --git a/during-dev-test-data/match-query.txt
b/during-dev-test-data/match-query.txt
index f88c129..394bf51 100644
--- a/during-dev-test-data/match-query.txt
+++ b/during-dev-test-data/match-query.txt
@@ -1,3 +1,10 @@
-7.2 2.1
-7.1 2.01
-4.2 7.01
+14.55495528995644 13.952740903340256 14.512879995960738
+14.156180015003525 12.80845342418684 14.826212427784835
+13.505267742341026 13.013443271481133 11.183241471767612
+13.09715467421369 10.936844869289065 10.692733635319755
+11.855001056969742 14.100516383294387 11.839440238220615
+11.766844757741728 10.752701002494069 12.958678702895487
+12.455862588902148 10.943892162638429 11.946732314414476
+13.386374627834787 11.49741032889326 13.000603833689892
+14.49239538261452 11.133665168041762 12.959051224814937
+10.968887621940603 13.55609990177711 14.863758607257214
diff --git a/during-dev-test-data/scripts/benchmark.py
b/during-dev-test-data/scripts/benchmark.py
new file mode 100755
index 0000000..830e642
--- /dev/null
+++ b/during-dev-test-data/scripts/benchmark.py
@@ -0,0 +1,69 @@
+#! /usr/bin/env python
+# Do benchmark test and generate resulting graphs.
+# TODO: Use multithreading for faster parallel execution.
+
+import time
+import subprocess
+import matplotlib.pyplot as plt
+
+
+######## Set the variable and execute the script. #########
+script = "/home/sachin/gnuastro_dev/gnuastro/tests/during-dev.sh"
+maximum_executions = 10
+maximun_jump_factor = 10
+graph_name = "KD-tree"
+###########################################################
+
+
+# Wrapper function for calculating runtime.
+def timed_execution(function):
+ def wrapper(args):
+ start = time.time()
+ function(args)
+ end = time.time()
+ return (end-start)
+ return wrapper
+
+
+@timed_execution
+def execute_script(script_):
+ subprocess.call([script_], stdout=subprocess.DEVNULL,
stderr=subprocess.STDOUT)
+
+
+# Make the list of number of times the script is to be executed.
+def make_num_execution_list(num, jump_factor):
+ num_execution_list = []
+ i = 1
+ while(i <= num):
+ num_execution_list.append(i)
+ i *= jump_factor
+
+ return num_execution_list
+
+
+# Make the list for the time taken by the script to run for a particular
number of times.
+def make_time_list(num_execution_list):
+ time_list = []
+ for num in num_execution_list:
+ time_taken = 0
+ for _ in range(int(num)):
+ t = execute_script(script)
+ time_taken += t
+ time_list.append(time_taken)
+
+ return time_list
+
+
+# Make a graph between the number of times the script is executed and the time
taken for it.
+def make_graph(name, time_list, num_execution_list):
+ plt.plot(num_execution_list, time_list, marker='o', markerfacecolor='red',
markersize=10)
+ plt.xlabel("number of executions")
+ plt.ylabel("time required for executions")
+ plt.title(name)
+ plt.show()
+
+
+# Interface
+if __name__ == "__main__":
+ num_execution_list = make_num_execution_list(maximum_execution,
maximum_jump_factor)
+ make_graph(graph_name, make_time_list(num_execution_list),
num_execution_list)
diff --git a/during-dev-test-data/scripts/kdtree-gen.py
b/during-dev-test-data/scripts/kdtree-gen.py
new file mode 100755
index 0000000..8583506
--- /dev/null
+++ b/during-dev-test-data/scripts/kdtree-gen.py
@@ -0,0 +1,31 @@
+#! /usr/bin/env python
+# Generate random floating numbers to make a kd-tree.
+# Usage: ./kdtree-gen.py [filename] [number of rows] [number of columns]
+
+import sys
+import random
+
+
+if len(sys.argv) == 1:
+ row_num = 10
+ col_num = 2
+else:
+ row_num = sys.argv[2]
+ col_num = sys.argv[3]
+
+
+def format_row(ncols=col_num, seed=10):
+ row = ""
+ for _ in range(int(ncols)):
+ col = random.uniform(seed, seed+5)
+ row += f"{col}\t"
+ row += "\n"
+
+ return row
+
+
+if __name__ == "__main__":
+ with open(f"../{sys.argv[1]}", "w+") as input_file:
+ for _ in range(int(row_num)):
+ input_file.write(format_row())
+
diff --git a/lib/gnuastro/match.h b/lib/gnuastro/match.h
index 1921464..c0b3c44 100644
--- a/lib/gnuastro/match.h
+++ b/lib/gnuastro/match.h
@@ -5,6 +5,7 @@ This is part of GNU Astronomy Utilities (Gnuastro) package.
Original author:
Mohammad Akhlaghi <mohammad@akhlaghi.org>
Contributing author(s):
+ Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
Copyright (C) 2017-2021, Free Software Foundation, Inc.
Gnuastro is free software: you can redistribute it and/or modify it
@@ -45,7 +46,7 @@ along with Gnuastro. If not, see
<http://www.gnu.org/licenses/>.
gal_data_t *
-gal_match_coordinates(gal_data_t *coord1, gal_data_t *coord2,
+gal_match_sort_based(gal_data_t *coord1, gal_data_t *coord2,
double *aperture, int sorted_by_first,
int inplace, size_t minmapsize, int quietmmap,
size_t *nummatched);
diff --git a/lib/match.c b/lib/match.c
index 6a0f9fb..dbea82d 100644
--- a/lib/match.c
+++ b/lib/match.c
@@ -5,6 +5,7 @@ This is part of GNU Astronomy Utilities (Gnuastro) package.
Original author:
Mohammad Akhlaghi <mohammad@akhlaghi.org>
Contributing author(s):
+ Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
Copyright (C) 2017-2021, Free Software Foundation, Inc.
Gnuastro is free software: you can redistribute it and/or modify it
@@ -47,7 +48,7 @@ along with Gnuastro. If not, see
<http://www.gnu.org/licenses/>.
/**********************************************************************/
-/***************** Coordinate match custom list *******************/
+/***************** Sort-Based match custom list *******************/
/**********************************************************************/
struct match_sfll
{
@@ -501,12 +502,12 @@ gal_match_output(gal_data_t *A, gal_data_t *B, size_t
*A_perm,
/********************************************************************/
-/************* Coordinate matching *************/
+/************* Sort-Based matching *************/
/********************************************************************/
/* Since these checks are repetative, its easier to have a separate
function for both inputs. */
static void
-match_coordinate_sanity_check_columns(gal_data_t *coord, char *info,
+match_sort_based_sanity_check_columns(gal_data_t *coord, char *info,
int inplace, int *allf64)
{
gal_data_t *tmp;
@@ -545,7 +546,7 @@ match_coordinate_sanity_check_columns(gal_data_t *coord,
char *info,
/* To keep the main function clean, we'll do the sanity checks here. */
static void
-match_coordinates_sanity_check(gal_data_t *coord1, gal_data_t *coord2,
+match_sort_based_sanity_check(gal_data_t *coord1, gal_data_t *coord2,
double *aperture, int inplace, int *allf64)
{
size_t ncoord1=gal_list_data_number(coord1);
@@ -565,8 +566,8 @@ match_coordinates_sanity_check(gal_data_t *coord1,
gal_data_t *coord2,
"maximum of 3 dimensions", __func__, ncoord1);
/* Check the column properties. */
- match_coordinate_sanity_check_columns(coord1, "first", inplace, allf64);
- match_coordinate_sanity_check_columns(coord2, "second", inplace, allf64);
+ match_sort_based_sanity_check_columns(coord1, "first", inplace, allf64);
+ match_sort_based_sanity_check_columns(coord2, "second", inplace, allf64);
/* Check the aperture values. */
if(aperture[0]<=0)
@@ -607,7 +608,7 @@ match_coordinates_sanity_check(gal_data_t *coord1,
gal_data_t *coord2,
/* To keep things clean, the sorting of each input array will be done in
this function. */
static size_t *
-match_coordinates_prepare_sort(gal_data_t *coords, size_t minmapsize)
+match_sort_based_prepare_sort(gal_data_t *coords, size_t minmapsize)
{
size_t i;
double *darr;
@@ -662,7 +663,7 @@ match_coordinates_prepare_sort(gal_data_t *coords, size_t
minmapsize)
/* Do the preparations for matching of coordinates. */
static void
-match_coordinates_prepare(gal_data_t *coord1, gal_data_t *coord2,
+match_sort_based_prepare(gal_data_t *coord1, gal_data_t *coord2,
int sorted_by_first, int inplace, int allf64,
gal_data_t **A_out, gal_data_t **B_out,
size_t **A_perm, size_t **B_perm,
@@ -714,8 +715,8 @@ match_coordinates_prepare(gal_data_t *coord1, gal_data_t
*coord2,
}
/* Sort each dataset by the first coordinate. */
- *A_perm = match_coordinates_prepare_sort(*A_out, minmapsize);
- *B_perm = match_coordinates_prepare_sort(*B_out, minmapsize);
+ *A_perm = match_sort_based_prepare_sort(*A_out, minmapsize);
+ *B_perm = match_sort_based_prepare_sort(*B_out, minmapsize);
}
}
@@ -727,7 +728,7 @@ match_coordinates_prepare(gal_data_t *coord1, gal_data_t
*coord2,
catalog (catalog b) are within the acceptable distance of each record in
the first (a). */
static void
-match_coordinates_second_in_first(gal_data_t *A, gal_data_t *B,
+match_sort_based_second_in_first(gal_data_t *A, gal_data_t *B,
double *aperture,
struct match_sfll **bina)
{
@@ -883,7 +884,7 @@ match_coordinates_second_in_first(gal_data_t *A, gal_data_t
*B,
Node 2: Second catalog index (counting from zero).
Node 3: Distance between the match. */
gal_data_t *
-gal_match_coordinates(gal_data_t *coord1, gal_data_t *coord2,
+gal_match_sort_based(gal_data_t *coord1, gal_data_t *coord2,
double *aperture, int sorted_by_first,
int inplace, size_t minmapsize, int quietmmap,
size_t *nummatched)
@@ -895,9 +896,9 @@ gal_match_coordinates(gal_data_t *coord1, gal_data_t
*coord2,
/* Do a small sanity check and make the preparations. After this point,
we'll call the two arrays 'a' and 'b'.*/
- match_coordinates_sanity_check(coord1, coord2, aperture, inplace,
+ match_sort_based_sanity_check(coord1, coord2, aperture, inplace,
&allf64);
- match_coordinates_prepare(coord1, coord2, sorted_by_first, inplace,
+ match_sort_based_prepare(coord1, coord2, sorted_by_first, inplace,
allf64, &A, &B, &A_perm, &B_perm,
minmapsize);
@@ -914,7 +915,7 @@ gal_match_coordinates(gal_data_t *coord1, gal_data_t
*coord2,
/* All records in 'b' that match each 'a' (possibly duplicate). */
- match_coordinates_second_in_first(A, B, aperture, bina);
+ match_sort_based_second_in_first(A, B, aperture, bina);
/* Two re-arrangings will fix the issue. */
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 31182ff..0240bb4 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -126,10 +126,12 @@ if COND_FITS
fits/copyhdu.sh: fits/write.sh.log mkprof/mosaic2.sh.log
endif
if COND_MATCH
- MAYBE_MATCH_TESTS = match/positions.sh match/merged-cols.sh
+ MAYBE_MATCH_TESTS = match/positions.sh match/merged-cols.sh \
+ match/kdtree-match.sh
match/positions.sh: prepconf.sh.log
match/merged-cols.sh: prepconf.sh.log
+ match/kdtree-match.sh: prepconf.sh.log
endif
if COND_MKCATALOG
MAYBE_MKCATALOG_TESTS = mkcatalog/detections.sh mkcatalog/simple-3d.sh \
diff --git a/tests/match/kdtree-match.sh b/tests/match/kdtree-match.sh
new file mode 100644
index 0000000..2e8ac0e
--- /dev/null
+++ b/tests/match/kdtree-match.sh
@@ -0,0 +1,59 @@
+# Match the two input catalogs based on kdtree matching
+#
+# See the Tests subsection of the manual for a complete explanation
+# (in the Installing gnuastro section).
+#
+# Original author:
+# Sachin Kumar Singh <sachinkumarsingh092@gmail.com>
+# Contributing author(s):
+# Copyright (C) 2015-2021, Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved. This file is offered as-is,
+# without any warranty.
+
+
+
+
+
+# Preliminaries
+# =============
+#
+# Set the variables (The executable is in the build tree). Do the
+# basic checks to see if the executable is made or if the defaults
+# file exists (basicchecks.sh is in the source tree).
+prog=match
+execname=../bin/$prog/ast$prog
+cat1=$topsrc/tests/$prog/positions-1.txt
+cat2=$topsrc/tests/$prog/positions-2.txt
+
+
+
+
+
+# Skip?
+# =====
+#
+# If the dependencies of the test don't exist, then skip it. There are two
+# types of dependencies:
+#
+# - The executable was not made (for example due to a configure option),
+#
+# - The input data was not made (for example the test that created the
+# data file failed).
+if [ ! -f $execname ]; then echo "$execname not created."; exit 77; fi
+
+
+
+
+
+# Actual test script
+# ==================
+#
+# 'check_with_program' can be something like Valgrind or an empty
+# string. Such programs will execute the command if present and help in
+# debugging when the developer doesn't have access to the user's system.
+$check_with_program $execname $cat1 $cat2 --aperture=0.5 --ccol1=2,3 \
+ --ccol2=2,3 --kdtree=internal \
+ --output=match-kdtree.fits
- [gnuastro-commits] master updated (33b7b70 -> f5d7d1a), Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 0dcaf02 02/19: Match: Added build functionalty for kdtree, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master c36f753 01/19: Match: Option to work with k-d trees has been defined, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master e04f4ac 03/19: First implementation of k-d tree matching, not complete, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 20ad77d 07/19: Final output for kdtree matching, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 7354e33 09/19: Library (match.h): match_coordinate_ replaced by match_sort_based_,
Mohammad Akhlaghi <=
- [gnuastro-commits] master 03d1a25 15/19: Library (fits.h): corrected segmentation fault in parallel reading, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 336ddee 04/19: Error in `match_kdtree_worker` in lib/match.c, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 2877acb 05/19: Added fits files for testing in `during-dev-test-data`, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 48d760d 12/19: Library (match.h): k-d tree method discards regions with no overlap, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master d0b19b8 10/19: Match: make check script for k-d tree matching now executable, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 095c788 13/19: Match: matched rows aren't permuted, but new column allocated, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 9e258a8 14/19: Library (fits.h): reading table columns now done in parallel, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 29f8b20 16/19: Table and Arithmetic: corrected some memory leaks, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master 8b17675 06/19: Corrected segmentation fault, included aperture, Mohammad Akhlaghi, 2021/11/14
- [gnuastro-commits] master a7bfa5b 11/19: Match: k-d tree matching UI fixed, to do: finalizing docs and tests, Mohammad Akhlaghi, 2021/11/14