bug#36130: split bug
From: Heather Wick
Subject: bug#36130: split bug
Date: Fri, 7 Jun 2019 21:48:44 -0400
Hi,
Yes, sorry, I should have specified that I already checked that the
original fastq files are indeed paired and sorted with the same number of
lines and same starting/ending IDs, narrowing down the issue to a problem
with split.
~ Heather
(base) [hwick@zappalogin ~]$ zcat MH2_R2.fastq.gz | wc -l
3778103832
(base) [hwick@zappalogin ~]$ zcat MH2_R1.fastq.gz | wc -l
3778103832
(base) [hwick@zappalogin test_2019]$ zcat MH2_R1.fastq.gz | head -n8 | grep ^@
@A00197:48:HF2GWDMXX:1:1101:1741:1000 1:N:0:GATCAG+TCTTTCCC
@A00197:48:HF2GWDMXX:1:1101:2754:1000 1:N:0:GATCAG+TCTTTCCC
(base) [hwick@zappalogin test_2019]$ zcat MH2_R2.fastq.gz | head -n8 | grep ^@
@A00197:48:HF2GWDMXX:1:1101:1741:1000 2:N:0:GATCAG+TCTTTCCC
@A00197:48:HF2GWDMXX:1:1101:2754:1000 2:N:0:GATCAG+TCTTTCCC
(base) [hwick@zappalogin test_2019]$ zcat MH2_R1.fastq.gz | tail -n8 | grep ^@
@E00489:288:HMFWCCCXY:2:2224:29305:73106 1:N:0:GATCAG
@E00489:288:HMFWCCCXY:2:2224:29325:73106 1:N:0:GATCAG
(base) [hwick@zappalogin test_2019]$ zcat MH2_R2.fastq.gz | tail -n8 | grep ^@
@E00489:288:HMFWCCCXY:2:2224:29305:73106 2:N:0:GATCAG
@E00489:288:HMFWCCCXY:2:2224:29325:73106 2:N:0:GATCAG
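
For reference, the line count above implies how many chunks `split -l 40000000` should produce. This is only a rough sketch: it assumes the count reported for the MH2 files also applies to whichever files are being split (the original commands named MH1), and `expected_chunks` is a hypothetical helper name, not part of coreutils.

```shell
# Hypothetical helper: number of output files split should create,
# i.e. ceil(total_lines / lines_per_chunk) using integer arithmetic.
expected_chunks() {
  total=$1
  per=$2
  echo $(( (total + per - 1) / per ))
}

# Line count reported by wc -l above, chunk size from the split command.
expected_chunks 3778103832 40000000   # prints 95
```

Under those assumptions both R1 and R2 should yield 95 chunks, so the 96-chunk R1 output would be the unexpected one. Since 40000000 is a multiple of 4, no FASTQ record should straddle a chunk boundary either.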
On Fri, Jun 7, 2019 at 9:29 PM Assaf Gordon <address@hidden> wrote:
> Hello,
>
> On Fri, Jun 07, 2019 at 02:23:15PM -0400, Heather Wick wrote:
> > I am using split to split up some large, paired fastq files [...]:
> >
> > zcat MH1_R1.fastq.gz | split - -l 40000000 DHT_R1_
> > zcat MH1_R2.fastq.gz | split - -l 40000000 DHT_R2_
> >
> > This creates 96 chunks for the R1 and 95 chunks for R2, even though the
> > original fastq files have the same number of reads.
> >
> > Do you have any suggestions for how to proceed? Perhaps zcatting and
> > piping the files is not the best way to call split?
>
> To help diagnose the issue better, please run the following commands
> and tell us what the results are:
>
> 1. number of lines in each file:
>
> zcat MH1_R1.fastq.gz | wc -l
> zcat MH1_R2.fastq.gz | wc -l
>
> 2. The first two sequence IDs:
>
> zcat MH1_R1.fastq.gz | head -n8 | grep ^@
> zcat MH1_R2.fastq.gz | head -n8 | grep ^@
>
> 3. Last two sequence IDs:
>
> zcat MH1_R1.fastq.gz | tail -n8 | grep ^@
> zcat MH1_R2.fastq.gz | tail -n8 | grep ^@
>
> These will just verify the FASTQ files are indeed paired with no
> surprises. The files should have the same number of lines,
> and matching sequence IDs in the first and last lines.
>
> regards,
> - assaf
>
>
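
Beyond the head/tail checks quoted above, one way to see where the two sets of chunks diverge might be to list the line count of every chunk and compare the R1 and R2 listings. A minimal sketch, assuming the chunks sit in the current directory under the `DHT_R1_` and `DHT_R2_` prefixes from the original commands; `chunk_line_counts` is a hypothetical helper name.

```shell
# Hypothetical helper: print "<chunk file> <line count>" for every file
# whose name starts with the given prefix.
chunk_line_counts() {
  prefix=$1
  for f in "$prefix"*; do
    [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
    printf '%s %s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
  done
}

# Example use (prefixes taken from the split commands in this thread):
#   chunk_line_counts DHT_R1_ > r1_counts.txt
#   chunk_line_counts DHT_R2_ > r2_counts.txt
```

Every chunk except the last should report 40000000 lines; the first chunk that does not would mark where the outputs stop lining up.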
--
Heather Wick
PhD Candidate, Human Genetics
Labs of Sarah Wheelan and Vasan Yegnasubramanian
Institute of Genetic Medicine
Johns Hopkins University School of Medicine
address@hidden