bug-coreutils

bug#36130: split bug


From: Assaf Gordon
Subject: bug#36130: split bug
Date: Fri, 7 Jun 2019 19:29:42 -0600
User-agent: Mutt/1.11.4 (2019-03-13)

Hello,

On Fri, Jun 07, 2019 at 02:23:15PM -0400, Heather Wick wrote:
> I am using split to split up some large, paired fastq files [...]:
>
>   zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
>   zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_
>
> This creates 96 chunks for the R1 and 95 chunks for R2, even though the
> original fastq files have the same number of reads.
>
> Do you have any suggestions for how to proceed? Perhaps zcatting and piping
> the files is not the best way to call split?

To help diagnose the issue, please run the following commands
and tell us the results:

1. The number of lines in each file:

   zcat MH1_R1.fastq.gz | wc -l
   zcat MH1_R2.fastq.gz | wc -l

2. The first two sequence IDs:

   zcat MH1_R1.fastq.gz | head -n8 | grep ^@
   zcat MH1_R2.fastq.gz | head -n8 | grep ^@

3. The last two sequence IDs:

   zcat MH1_R1.fastq.gz | tail -n8 | grep ^@
   zcat MH1_R2.fastq.gz | tail -n8 | grep ^@

These checks verify that the FASTQ files are indeed paired, with no
surprises: both files should have the same number of lines, and the
sequence IDs of the first and last records should match.
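
If it is more convenient, the three checks can be combined into one
pass per file. The following is only a rough sketch, not a tested
tool: the function names fastq_summary and fastq_pair_check are made
up for illustration, "gzip -dc" stands in for zcat, and it assumes
strict 4-line records and mate IDs that are identical up to the first
whitespace (older "/1" and "/2" suffixes would need stripping first).
Selecting every fourth line also avoids picking up quality lines that
happen to start with "@".

```shell
#!/bin/sh
# Hypothetical helper: print "<line count> <first ID> <last ID>" for a
# gzip-compressed FASTQ file. Assumes 4-line records (no wrapped lines).
fastq_summary() {
    gzip -dc "$1" | awk '
        # Header lines are lines 1, 5, 9, ... of the file.
        NR % 4 == 1 { sub(/^@/, "", $1); id = $1; if (NR == 1) first = id }
        END { print NR, first, id }'
}

# Hypothetical helper: succeed only if both mates give the same summary.
fastq_pair_check() {
    r1=$(fastq_summary "$1") || return 2
    r2=$(fastq_summary "$2") || return 2
    printf 'R1: %s\nR2: %s\n' "$r1" "$r2"
    [ "$r1" = "$r2" ]
}
```

With these definitions loaded, something like
"fastq_pair_check MH1_R1.fastq.gz MH1_R2.fastq.gz" would print both
summaries and return non-zero if they differ.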

regards,
 - assaf





