join with header line support
From: Assaf Gordon
Subject: join with header line support
Date: Fri, 30 Oct 2009 19:02:53 -0400
User-agent: Mozilla-Thunderbird 2.0.0.22 (X11/20090707)
Hello,
I'd like to suggest a small feature for 'join':
"--header" makes join treat the first line of each file as a header: the two
first lines are joined to each other regardless of their join fields and of the
sort order.
This allows joining files that have header lines.
Example:
===============
$ cat 1.txt
ID Color Name
1 green Alice
2 red Bob
3 blue Carol
4 black Dave
$ cat 2.txt
ID Age
2 55
4 24
$ join --check-order --header -j 1 -a 1 -e unknown -o "0 1.3 2.2" 1.txt 2.txt
ID Name Age
1 Alice unknown
2 Bob 55
3 Carol unknown
4 Dave 24
===============
Although the above can be accomplished with a combination of other utilities
(cut, head, paste, sed or similar), having this feature built into join makes
life a lot easier, especially when joining several files (using pipes) or
selecting specific output fields (with "-o"): join then takes care of putting
the right field headers into the header line.
The following patch adds the "--header" option. If "--header" is not used,
the regular program flow is unchanged.
Comments are welcome. This patch is released under GPLv3 or later.
If you're willing to accept this patch, I'll be happy to assign copyright to
GNU, etc.
thanks,
gordon
=============================
--- join.orig.c 2009-09-23 04:25:44.000000000 -0400
+++ join.c 2009-10-30 19:00:01.000000000 -0400
@@ -146,6 +146,7 @@ static struct option const longopts[] =
   {"ignore-case", no_argument, NULL, 'i'},
   {"check-order", no_argument, NULL, CHECK_ORDER_OPTION},
   {"nocheck-order", no_argument, NULL, NOCHECK_ORDER_OPTION},
+  {"header", no_argument, NULL, 'H'},
   {GETOPT_HELP_OPTION_DECL},
   {GETOPT_VERSION_OPTION_DECL},
   {NULL, 0, NULL, 0}
@@ -157,6 +158,10 @@ static struct line uni_blank;
 /* If nonzero, ignore case when comparing join fields.  */
 static bool ignore_case;
 
+/* If nonzero, treat the first line of each file as column headers -
+   join them without checking for ordering */
+static bool join_header_lines;
+
 void
 usage (int status)
 {
@@ -191,6 +196,7 @@ by whitespace.  When FILE1 or FILE2 (not
   --check-order     check that the input is correctly sorted, even\n\
                       if all input lines are pairable\n\
   --nocheck-order   do not check that the input is correctly sorted\n\
+  --header          treat first line in each file as field header line.\n\
 "), stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
       fputs (VERSION_OPTION_DESCRIPTION, stdout);
@@ -616,6 +622,15 @@ join (FILE *fp1, FILE *fp2)
   initseq (&seq2);
   getseq (fp2, &seq2, 2);
 
+  if (join_header_lines && seq1.count && seq2.count)
+    {
+      prjoin (seq1.lines[0], seq2.lines[0]);
+      prevline[0] = NULL;
+      prevline[1] = NULL;
+      advance_seq (fp1, &seq1, true, 1);
+      advance_seq (fp2, &seq2, true, 2);
+    }
+
   while (seq1.count && seq2.count)
     {
       size_t i;
@@ -1052,6 +1067,10 @@ main (int argc, char **argv)
                          &nfiles, &prev_optc_status, &optc_status);
           break;
 
+        case 'H':
+          join_header_lines = true;
+          break;
+
         case_GETOPT_HELP_CHAR;
 
         case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);