gatk HaplotypeCaller gives me an empty vcf
Hi, I am using GATK version 4.0.0 on mouse data.
# Loop over the recalibrated BAMs and write one GVCF per sample
for fpath in *_bqsr.bam; do
    fname=${fpath%_bqsr.bam}
    gatk HaplotypeCaller -R "${REF}" --emit-ref-confidence GVCF \
        -I "${fname}_bqsr.bam" -O "${fname}.g.vcf"
done
Empty VCF file
##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --emit-ref-confidence GVCF --output Ghr-0077_chr15.g.vcf --input Ghr-0077_chr15_bqsr.bam --reference mm10.fa --annotation-group StandardAnnotation --annotation-group StandardHCAnnotation --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --recover-dangling-heads false --do-not-recover-dangling-branches false --min-dangling-branch-length 4 --consensus false --max-num-haplotypes-in-population 128 --error-correct-kmers false 
--min-pruning 2 --debug-graph-transformations false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --debug false --use-filtered-reads-for-annotations false --bam-writer-type CALLED_HAPLOTYPES --disable-optimizations false --just-determine-active-regions false --dont-genotype false --dont-use-soft-clipped-bases false --capture-assembly-failure-bam false --error-correct-reads false --do-not-run-physical-phasing false --min-base-quality-score 10 --smith-waterman JAVA --use-new-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 10.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --genotyping-mode DISCOVERY --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --help false --version false --showHidden 
false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false --minimum-mapping-quality 20",Version=4.0.0.0,Date="January 22, 2021 10:01:34 AM EST">
##GVCFBlock0-1=minGQ=0(inclusive),maxGQ=1(exclusive)
##GVCFBlock1-2=minGQ=1(inclusive),maxGQ=2(exclusive)
##GVCFBlock10-11=minGQ=10(inclusive),maxGQ=11(exclusive)
##GVCFBlock11-12=minGQ=11(inclusive),maxGQ=12(exclusive)
##GVCFBlock12-13=minGQ=12(inclusive),maxGQ=13(exclusive)
##GVCFBlock13-14=minGQ=13(inclusive),maxGQ=14(exclusive)
##GVCFBlock14-15=minGQ=14(inclusive),maxGQ=15(exclusive)
##GVCFBlock15-16=minGQ=15(inclusive),maxGQ=16(exclusive)
##GVCFBlock16-17=minGQ=16(inclusive),maxGQ=17(exclusive)
##GVCFBlock17-18=minGQ=17(inclusive),maxGQ=18(exclusive)
##GVCFBlock18-19=minGQ=18(inclusive),maxGQ=19(exclusive)
##GVCFBlock19-20=minGQ=19(inclusive),maxGQ=20(exclusive)
##GVCFBlock2-3=minGQ=2(inclusive),maxGQ=3(exclusive)
##GVCFBlock20-21=minGQ=20(inclusive),maxGQ=21(exclusive)
##GVCFBlock21-22=minGQ=21(inclusive),maxGQ=22(exclusive)
##GVCFBlock22-23=minGQ=22(inclusive),maxGQ=23(exclusive)
##GVCFBlock23-24=minGQ=23(inclusive),maxGQ=24(exclusive)
##GVCFBlock24-25=minGQ=24(inclusive),maxGQ=25(exclusive)
##GVCFBlock25-26=minGQ=25(inclusive),maxGQ=26(exclusive)
##GVCFBlock26-27=minGQ=26(inclusive),maxGQ=27(exclusive)
##GVCFBlock27-28=minGQ=27(inclusive),maxGQ=28(exclusive)
##GVCFBlock28-29=minGQ=28(inclusive),maxGQ=29(exclusive)
##GVCFBlock29-30=minGQ=29(inclusive),maxGQ=30(exclusive)
##GVCFBlock3-4=minGQ=3(inclusive),maxGQ=4(exclusive)
##GVCFBlock30-31=minGQ=30(inclusive),maxGQ=31(exclusive)
##GVCFBlock31-32=minGQ=31(inclusive),maxGQ=32(exclusive)
##GVCFBlock32-33=minGQ=32(inclusive),maxGQ=33(exclusive)
##GVCFBlock33-34=minGQ=33(inclusive),maxGQ=34(exclusive)
##GVCFBlock34-35=minGQ=34(inclusive),maxGQ=35(exclusive)
##GVCFBlock35-36=minGQ=35(inclusive),maxGQ=36(exclusive)
##GVCFBlock36-37=minGQ=36(inclusive),maxGQ=37(exclusive)
##GVCFBlock37-38=minGQ=37(inclusive),maxGQ=38(exclusive)
##GVCFBlock38-39=minGQ=38(inclusive),maxGQ=39(exclusive)
##GVCFBlock39-40=minGQ=39(inclusive),maxGQ=40(exclusive)
##GVCFBlock4-5=minGQ=4(inclusive),maxGQ=5(exclusive)
##GVCFBlock40-41=minGQ=40(inclusive),maxGQ=41(exclusive)
##GVCFBlock41-42=minGQ=41(inclusive),maxGQ=42(exclusive)
##GVCFBlock42-43=minGQ=42(inclusive),maxGQ=43(exclusive)
##GVCFBlock43-44=minGQ=43(inclusive),maxGQ=44(exclusive)
##GVCFBlock44-45=minGQ=44(inclusive),maxGQ=45(exclusive)
##GVCFBlock45-46=minGQ=45(inclusive),maxGQ=46(exclusive)
##GVCFBlock46-47=minGQ=46(inclusive),maxGQ=47(exclusive)
##GVCFBlock47-48=minGQ=47(inclusive),maxGQ=48(exclusive)
##GVCFBlock48-49=minGQ=48(inclusive),maxGQ=49(exclusive)
##GVCFBlock49-50=minGQ=49(inclusive),maxGQ=50(exclusive)
##GVCFBlock5-6=minGQ=5(inclusive),maxGQ=6(exclusive)
##GVCFBlock50-51=minGQ=50(inclusive),maxGQ=51(exclusive)
##GVCFBlock51-52=minGQ=51(inclusive),maxGQ=52(exclusive)
##GVCFBlock52-53=minGQ=52(inclusive),maxGQ=53(exclusive)
##GVCFBlock53-54=minGQ=53(inclusive),maxGQ=54(exclusive)
##GVCFBlock54-55=minGQ=54(inclusive),maxGQ=55(exclusive)
##GVCFBlock55-56=minGQ=55(inclusive),maxGQ=56(exclusive)
##GVCFBlock56-57=minGQ=56(inclusive),maxGQ=57(exclusive)
##GVCFBlock57-58=minGQ=57(inclusive),maxGQ=58(exclusive)
##GVCFBlock58-59=minGQ=58(inclusive),maxGQ=59(exclusive)
##GVCFBlock59-60=minGQ=59(inclusive),maxGQ=60(exclusive)
##GVCFBlock6-7=minGQ=6(inclusive),maxGQ=7(exclusive)
##GVCFBlock60-70=minGQ=60(inclusive),maxGQ=70(exclusive)
##GVCFBlock7-8=minGQ=7(inclusive),maxGQ=8(exclusive)
##GVCFBlock70-80=minGQ=70(inclusive),maxGQ=80(exclusive)
##GVCFBlock8-9=minGQ=8(inclusive),maxGQ=9(exclusive)
##GVCFBlock80-90=minGQ=80(inclusive),maxGQ=90(exclusive)
##GVCFBlock9-10=minGQ=9(inclusive),maxGQ=10(exclusive)
##GVCFBlock90-99=minGQ=90(inclusive),maxGQ=99(exclusive)
##GVCFBlock99-100=minGQ=99(inclusive),maxGQ=100(exclusive)
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##contig=<ID=chrchr1,length=195471971>
##contig=<ID=chrchr10,length=130694993>
##contig=<ID=chrchr11,length=122082543>
##contig=<ID=chrchr12,length=120129022>
##contig=<ID=chrchr13,length=120421639>
##contig=<ID=chrchr14,length=124902244>
##contig=<ID=chr15,length=104043685>
##contig=<ID=chrchr16,length=98207768>
##contig=<ID=chrchr17,length=94987271>
##contig=<ID=chrchr18,length=90702639>
##contig=<ID=chrchr19,length=61431566>
##contig=<ID=chrchr1_GL456210_random,length=169725>
##contig=<ID=chrchr1_GL456211_random,length=241735>
##contig=<ID=chrchr1_GL456212_random,length=153618>
##contig=<ID=chrchr1_GL456213_random,length=39340>
##contig=<ID=chrchr1_GL456221_random,length=206961>
##contig=<ID=chrchr2,length=182113224>
##contig=<ID=chrchr3,length=160039680>
##contig=<ID=chrchr4,length=156508116>
##contig=<ID=chrchr4_GL456216_random,length=66673>
##contig=<ID=chrchr4_JH584292_random,length=14945>
##contig=<ID=chrchr4_GL456350_random,length=227966>
##contig=<ID=chrchr4_JH584293_random,length=207968>
##contig=<ID=chrchr4_JH584294_random,length=191905>
##contig=<ID=chrchr4_JH584295_random,length=1976>
##contig=<ID=chrchr5,length=151834684>
##contig=<ID=chrchr5_JH584296_random,length=199368>
##contig=<ID=chrchr5_JH584297_random,length=205776>
##contig=<ID=chrchr5_JH584298_random,length=184189>
##contig=<ID=chrchr5_GL456354_random,length=195993>
##contig=<ID=chrchr5_JH584299_random,length=953012>
##contig=<ID=chrchr6,length=149736546>
##contig=<ID=chrchr7,length=145441459>
##contig=<ID=chrchr7_GL456219_random,length=175968>
##contig=<ID=chrchr8,length=129401213>
##contig=<ID=chrchr9,length=124595110>
##contig=<ID=chrchrM,length=16299>
##contig=<ID=chrchrX,length=171031299>
##contig=<ID=chrchrX_GL456233_random,length=336933>
##contig=<ID=chrchrY,length=91744698>
##contig=<ID=chrchrY_JH584300_random,length=182347>
##contig=<ID=chrchrY_JH584301_random,length=259875>
##contig=<ID=chrchrY_JH584302_random,length=155838>
##contig=<ID=chrchrY_JH584303_random,length=158099>
##contig=<ID=chrchrUn_GL456239,length=40056>
##contig=<ID=chrchrUn_GL456367,length=42057>
##contig=<ID=chrchrUn_GL456378,length=31602>
##contig=<ID=chrchrUn_GL456381,length=25871>
##contig=<ID=chrchrUn_GL456382,length=23158>
##contig=<ID=chrchrUn_GL456383,length=38659>
##contig=<ID=chrchrUn_GL456385,length=35240>
##contig=<ID=chrchrUn_GL456390,length=24668>
##contig=<ID=chrchrUn_GL456392,length=23629>
##contig=<ID=chrchrUn_GL456393,length=55711>
##contig=<ID=chrchrUn_GL456394,length=24323>
##contig=<ID=chrchrUn_GL456359,length=22974>
##contig=<ID=chrchrUn_GL456360,length=31704>
##contig=<ID=chrchrUn_GL456396,length=21240>
##contig=<ID=chrchrUn_GL456372,length=28664>
##contig=<ID=chrchrUn_GL456387,length=24685>
##contig=<ID=chrchrUn_GL456389,length=28772>
##contig=<ID=chrchrUn_GL456370,length=26764>
##contig=<ID=chrchrUn_GL456379,length=72385>
##contig=<ID=chrchrUn_GL456366,length=47073>
##contig=<ID=chrchrUn_GL456368,length=20208>
##contig=<ID=chrchrUn_JH584304,length=114452>
##source=HaplotypeCaller
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ghr-0077
Without the --emit-ref-confidence (ERC) option, it gave me a proper-looking VCF, but that led to another error at GenomicsDBImport.
VCF file without ERC option
##fileformat=VCFv4.2
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --output Ghr-0077_chr15_g.vcf --input Ghr-0077_chr15_bqsr.bam --reference mm10.fa --annotation-group StandardAnnotation --annotation-group StandardHCAnnotation --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --recover-dangling-heads false --do-not-recover-dangling-branches false --min-dangling-branch-length 4 --consensus false --max-num-haplotypes-in-population 128 --error-correct-kmers false --min-pruning 2 
--debug-graph-transformations false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --debug false --use-filtered-reads-for-annotations false --emit-ref-confidence NONE --bam-writer-type CALLED_HAPLOTYPES --disable-optimizations false --just-determine-active-regions false --dont-genotype false --dont-use-soft-clipped-bases false --capture-assembly-failure-bam false --error-correct-reads false --do-not-run-physical-phasing false --min-base-quality-score 10 --smith-waterman JAVA --use-new-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 10.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --genotyping-mode DISCOVERY --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --help false --version false 
--showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false --minimum-mapping-quality 20",Version=4.0.0.0,Date="January 22, 2021 10:28:45 AM EST">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##contig=<ID=chrchr1,length=195471971>
##contig=<ID=chrchr10,length=130694993>
##contig=<ID=chrchr11,length=122082543>
##contig=<ID=chrchr12,length=120129022>
##contig=<ID=chrchr13,length=120421639>
##contig=<ID=chrchr14,length=124902244>
##contig=<ID=chr15,length=104043685>
##contig=<ID=chrchr16,length=98207768>
##contig=<ID=chrchr17,length=94987271>
##contig=<ID=chrchr18,length=90702639>
##contig=<ID=chrchr19,length=61431566>
##contig=<ID=chrchr1_GL456210_random,length=169725>
##contig=<ID=chrchr1_GL456211_random,length=241735>
##contig=<ID=chrchr1_GL456212_random,length=153618>
##contig=<ID=chrchr1_GL456213_random,length=39340>
##contig=<ID=chrchr1_GL456221_random,length=206961>
##contig=<ID=chrchr2,length=182113224>
##contig=<ID=chrchr3,length=160039680>
##contig=<ID=chrchr4,length=156508116>
##contig=<ID=chrchr4_GL456216_random,length=66673>
##contig=<ID=chrchr4_JH584292_random,length=14945>
##contig=<ID=chrchr4_GL456350_random,length=227966>
##contig=<ID=chrchr4_JH584293_random,length=207968>
##contig=<ID=chrchr4_JH584294_random,length=191905>
##contig=<ID=chrchr4_JH584295_random,length=1976>
##contig=<ID=chrchr5,length=151834684>
##contig=<ID=chrchr5_JH584296_random,length=199368>
##contig=<ID=chrchr5_JH584297_random,length=205776>
##contig=<ID=chrchr5_JH584298_random,length=184189>
##contig=<ID=chrchr5_GL456354_random,length=195993>
##contig=<ID=chrchr5_JH584299_random,length=953012>
##contig=<ID=chrchr6,length=149736546>
##contig=<ID=chrchr7,length=145441459>
##contig=<ID=chrchr7_GL456219_random,length=175968>
##contig=<ID=chrchr8,length=129401213>
##contig=<ID=chrchr9,length=124595110>
##contig=<ID=chrchrM,length=16299>
##contig=<ID=chrchrX,length=171031299>
##contig=<ID=chrchrX_GL456233_random,length=336933>
##contig=<ID=chrchrY,length=91744698>
##contig=<ID=chrchrY_JH584300_random,length=182347>
##contig=<ID=chrchrY_JH584301_random,length=259875>
##contig=<ID=chrchrY_JH584302_random,length=155838>
##contig=<ID=chrchrY_JH584303_random,length=158099>
##contig=<ID=chrchrUn_GL456239,length=40056>
##contig=<ID=chrchrUn_GL456367,length=42057>
##contig=<ID=chrchrUn_GL456378,length=31602>
##contig=<ID=chrchrUn_GL456381,length=25871>
##contig=<ID=chrchrUn_GL456382,length=23158>
##contig=<ID=chrchrUn_GL456383,length=38659>
##contig=<ID=chrchrUn_GL456385,length=35240>
##contig=<ID=chrchrUn_GL456390,length=24668>
##contig=<ID=chrchrUn_GL456392,length=23629>
##contig=<ID=chrchrUn_GL456393,length=55711>
##contig=<ID=chrchrUn_GL456394,length=24323>
##contig=<ID=chrchrUn_GL456359,length=22974>
##contig=<ID=chrchrUn_GL456360,length=31704>
##contig=<ID=chrchrUn_GL456396,length=21240>
##contig=<ID=chrchrUn_GL456372,length=28664>
##contig=<ID=chrchrUn_GL456387,length=24685>
##contig=<ID=chrchrUn_GL456389,length=28772>
##contig=<ID=chrchrUn_GL456370,length=26764>
##contig=<ID=chrchrUn_GL456379,length=72385>
##contig=<ID=chrchrUn_GL456366,length=47073>
##contig=<ID=chrchrUn_GL456368,length=20208>
##contig=<ID=chrchrUn_JH584304,length=114452>
##source=HaplotypeCaller
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Ghr-0077
chr15 3050192 . C T 234.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=4.998;ClippingRankSum=0.000;DP=46;ExcessHet=3.0103;FS=3.755;MLEAC=1;MLEAF=0.500;MQ=47.38;MQRankSum=-5.317;QD=5.10;ReadPosRankSum=2.220;SOR=1.417 GT:AD:DP:GQ:PL 0/1:35,11:46:99:263,0,1314
chr15 3050282 . A G 303.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=5.484;ClippingRankSum=0.000;DP=69;ExcessHet=3.0103;FS=11.290;MLEAC=1;MLEAF=0.500;MQ=50.89;MQRankSum=-6.722;QD=4.40;ReadPosRankSum=1.235;SOR=0.324 GT:AD:DP:GQ:PL 0/1:52,17:69:99:332,0,1972
chr15 3050374 . C T 666.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-3.558;ClippingRankSum=0.000;DP=80;ExcessHet=3.0103;FS=6.517;MLEAC=1;MLEAF=0.500;MQ=52.93;MQRankSum=-8.425;QD=8.33;ReadPosRankSum=1.970;SOR=0.261 GT:AD:DP:GQ:PL 0/1:54,26:80:99:695,0,2072
chr15 3050437 . T A 620.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.666;ClippingRankSum=0.000;DP=93;ExcessHet=3.0103;FS=14.133;MLEAC=1;MLEAF=0.500;MQ=53.97;MQRankSum=-8.983;QD=6.67;ReadPosRankSum=-2.935;SOR=1.119 GT:AD:DP:GQ:PL 0/1:66,27:93:99:649,0,2406
chr15 3050504 . G A 781.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-2.918;ClippingRankSum=0.000;DP=108;ExcessHet=3.0103;FS=14.658;MLEAC=1;MLEAF=0.500;MQ=54.19;MQRankSum=-8.424;QD=7.24;ReadPosRankSum=3.486;SOR=1.616 GT:AD:DP:GQ:PL 0/1:76,32:108:99:810,0,2887
chr15 3050605 . G A 136.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.043;ClippingRankSum=0.000;DP=79;ExcessHet=3.0103;FS=2.845;MLEAC=1;MLEAF=0.500;MQ=50.80;MQRankSum=-5.400;QD=1.73;ReadPosRankSum=-2.310;SOR=1.276 GT:AD:DP:GQ:PL 0/1:67,12:79:99:165,0,2646
chr15 3051101 . C T 580.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-4.787;ClippingRankSum=0.000;DP=74;ExcessHet=3.0103;FS=24.022;MLEAC=1;MLEAF=0.500;MQ=53.11;MQRankSum=-4.828;QD=7.85;ReadPosRankSum=-0.306;SOR=3.258 GT:AD:DP:GQ:PL 0/1:52,22:74:99:609,0,2013
chr15 3051120 . G A 719.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=6.766;ClippingRankSum=0.000;DP=82;ExcessHet=3.0103;FS=25.992;MLEAC=1;MLEAF=0.500;MQ=52.61;MQRankSum=-4.095;QD=8.78;ReadPosRankSum=2.528;SOR=3.331 GT:AD:DP:GQ:PL 0/1:57,25:82:99:748,0,1901
chr15 3051207 . G A 24.78 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.999;ClippingRankSum=0.000;DP=125;ExcessHet=3.0103;FS=1.067;MLEAC=1;MLEAF=0.500;MQ=45.18;MQRankSum=-4.390;QD=0.20;ReadPosRankSum=1.895;SOR=0.519 GT:AD:DP:GQ:PL 0/1:110,15:125:53:53,0,3553
chr15 3051289 . A T 1156.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.136;ClippingRankSum=0.000;DP=113;ExcessHet=3.0103;FS=7.741;MLEAC=1;MLEAF=0.500;MQ=38.15;MQRankSum=-0.091;QD=10.24;ReadPosRankSum=-0.361;SOR=0.553 GT:AD:DP:GQ:PL 0/1:72,41:113:99:1185,0,2139
chr15 3051417 . G A 2285.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.853;ClippingRankSum=0.000;DP=131;ExcessHet=3.0103;FS=1.382;MLEAC=1;MLEAF=0.500;MQ=51.46;MQRankSum=-4.294;QD=17.45;ReadPosRankSum=-0.392;SOR=0.569 GT:AD:DP:GQ:PL 0/1:73,58:131:99:2314,0,4696
chr15 3051422 . C T 2286.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.465;ClippingRankSum=0.000;DP=133;ExcessHet=3.0103;FS=0.641;MLEAC=1;MLEAF=0.500;MQ=52.30;MQRankSum=-3.903;QD=17.19;ReadPosRankSum=-0.160;SOR=0.623 GT:AD:DP:GQ:PL 0/1:74,59:133:99:2315,0,4739
chr15 3051456 . C T 1637.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=-6.985;ClippingRankSum=0.000;DP=130;ExcessHet=3.0103;FS=12.886;MLEAC=1;MLEAF=0.500;MQ=55.94;MQRankSum=-6.077;QD=12.60;ReadPosRankSum=0.167;SOR=0.964 GT:AD:DP:GQ:PL 0/1:73,57:130:99:1666,0,2738
chr15 3051546 . G A 1012.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=5.102;ClippingRankSum=0.000;DP=96;ExcessHet=3.0103;FS=14.892;MLEAC=1;MLEAF=0.500;MQ=55.24;MQRankSum=-5.830;QD=10.55;ReadPosRankSum=-0.394;SOR=1.704 GT:AD:DP:GQ:PL 0/1:63,33:96:99:1041,0,2089
chr15 3051576 . C T 708.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=5.392;ClippingRankSum=0.000;DP=68;ExcessHet=3.0103;FS=5.519;MLEAC=1;MLEAF=0.500;MQ=53.17;MQRankSum=-5.915;QD=10.42;ReadPosRankSum=-2.180;SOR=1.440 GT:AD:DP:GQ:PL 0/1:47,21:68:99:737,0,1944
chr15 3051581 . C T 708.77 .
I then ran:
gatk --java-options "-Xmx4g -Xms4g" \
GenomicsDBImport -R mm10.fa \
-V Ghr-0008_chr15_g1.vcf.gz \
-V Ghr-0063_chr15_g1.vcf.gz \
-V Ghr-0077_chr15_g1.vcf.gz \
--genomicsdb-workspace-path database2 \
-L chr15
A USER ERROR has occurred: The list of input alleles must contain <NON_REF> as an allele but that is not the case at position 3050192; please use the Haplotype Caller with gVCF output to generate appropriate records
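A quick way to see whether a given file is the kind GenomicsDBImport is asking for: in the two headers pasted above, the one written with --emit-ref-confidence GVCF declares `##ALT=<ID=NON_REF,...>` and the one written without it does not. The sketch below only looks for that header line. The function name `is_gvcf` is made up for illustration, and it assumes the file is either plain text or gzip/bgzip-compressed with a `.gz` suffix.

```shell
# Hypothetical helper: report whether a VCF header declares the <NON_REF>
# symbolic allele (present in the GVCF-mode header pasted above, absent
# from the header of the run without ERC).
is_gvcf() {
    local vcf="$1"
    local reader="cat"
    # Assumption: compressed inputs end in .gz (bgzip output is gzip-readable).
    case "$vcf" in
        *.gz) reader="gzip -dc" ;;
    esac
    if $reader "$vcf" | grep -q '^##ALT=<ID=NON_REF'; then
        echo "GVCF"
    else
        echo "plain VCF"
    fi
}
```

Running something like `is_gvcf Ghr-0077_chr15_g1.vcf.gz` on each file passed with -V would show whether any of them was produced without the ERC option.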
One possible concern is that my chromosome re-annotation of the BAM files failed, so some chromosomes are labeled differently between the reference and the BAM files. However, I only need the data on chr15, which I relabelled properly.
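The contig lines in both pasted headers show the relabelling problem directly: every contig except chr15 carries a doubled prefix (`chrchr1`, `chrchr10`, and so on). Whether that is what empties the GVCF is not established here, but it is easy to list. A minimal sketch, with hypothetical function names `vcf_contigs` and `doubled_prefix_contigs`, assuming an uncompressed VCF:

```shell
# Hypothetical helper: print the contig IDs declared in a VCF header,
# one per line, by stripping everything around the ID field.
vcf_contigs() {
    grep '^##contig=<ID=' "$1" | sed -e 's/^##contig=<ID=//' -e 's/[,>].*$//'
}

# Hypothetical helper: print only the contig IDs with a doubled "chr" prefix.
# "|| true" keeps the exit status zero when no such contig exists.
doubled_prefix_contigs() {
    vcf_contigs "$1" | grep '^chrchr' || true
}
```

For example, `doubled_prefix_contigs Ghr-0077_chr15.g.vcf` would print every mislabelled contig, and an empty result would mean the header names are consistent.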
Thank you very much for your help. I appreciate your comments and suggestions.
-
I am facing the same issue with Arabidopsis RNA-seq data. I checked my BAM file with ValidateSamFile, but it did not report any errors or warnings, so I don't think there is a problem with the BAM file. I still get an empty VCF file every time I run HaplotypeCaller. I also tried Mutect2, but it likewise produces an empty VCF file containing only the header.
Can anyone help me understand the problem? Any suggestions/comments will be of great help.
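To confirm that an output really is header-only before digging further, the records can be counted directly. This is a small sketch, not a GATK feature; the function name `count_vcf_records` is invented here, and it assumes compressed files end in `.gz`.

```shell
# Hypothetical helper: count the data records in a VCF, i.e. lines that
# are neither header lines (starting with '#') nor blank.
count_vcf_records() {
    local vcf="$1"
    case "$vcf" in
        *.gz) gzip -dc "$vcf" ;;   # assumption: .gz means gzip/bgzip
        *)    cat "$vcf" ;;
    esac | grep -vc -e '^#' -e '^$' || true   # grep exits 1 on a count of 0
}
```

A result of 0 means the file holds only the header, which is the situation described in this thread.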
-
Hi Marie Saitou,
First, I would recommend using a more recent version of GATK, because quite a few issues in 4.0.0 have since been resolved. We are currently on version 4.1.9.0, which includes many improvements.
You can see more information about our releases here: https://github.com/broadinstitute/gatk/releases
Genevieve
-
Will try, thank you very much!
-
Hi Candace Grimes,
I'm looking at these screenshots and I'm not sure your issue is from the same cause, since the other two users said that they only had a header and no variants. It looks like there are variants in your file. So there may be a problem with your HaplotypeCaller or SelectVariants commands.
Can you open a new post to look into that issue?
Thank you,
Genevieve
-
Yes, I will. Thank you!
-
Hi!
We are having the same issue with HC and Mutect2. I tried to follow this thread but never saw how you resolved the issue. Can someone repeat the answer or point me in the right direction?
-
Here is the resolution for Candace Grimes' issue: https://gatk.broadinstitute.org/hc/en-us/community/posts/360077747051-Receiving-zero-variants-processed-after-HaplotypeCaller-and-Select-Variants
Please let me know if this is not what you were looking for.
-
It's not what I'm looking for. My vcfs are truly empty, with just a header.
My command:
java -Xmx4g -jar ${GATK_DIR}/gatk-package-4.1.8.0-local.jar HaplotypeCaller -ERC GVCF -R $REF -I ${BAMDIR}/515010.bam --tmp-dir ${TMPDIR} -O ${OUTPUTDIR}/515010.a.vcf.gz --intervals $WORK2/references/xaa.bed
The output:
15:05:31.628 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 21, 2021 3:05:32 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:05:32.497 INFO HaplotypeCaller - ------------------------------------------------------------
15:05:32.497 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.8.0
15:05:32.497 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
15:05:32.497 INFO HaplotypeCaller - Executing as sprakash@c205-003.frontera.tacc.utexas.edu on Linux v3.10.0-1127.19.1.el7.x86_64 amd64
15:05:32.498 INFO HaplotypeCaller - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_262-b10
15:05:32.498 INFO HaplotypeCaller - Start Date/Time: June 21, 2021 3:05:31 PM CDT
15:05:32.498 INFO HaplotypeCaller - ------------------------------------------------------------
15:05:32.498 INFO HaplotypeCaller - ------------------------------------------------------------
15:05:32.498 INFO HaplotypeCaller - HTSJDK Version: 2.22.0
15:05:32.498 INFO HaplotypeCaller - Picard Version: 2.22.8
15:05:32.498 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:05:32.498 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:05:32.498 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:05:32.498 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:05:32.498 INFO HaplotypeCaller - Deflater: IntelDeflater
15:05:32.498 INFO HaplotypeCaller - Inflater: IntelInflater
15:05:32.498 INFO HaplotypeCaller - GCS max retries/reopens: 20
15:05:32.498 INFO HaplotypeCaller - Requester pays: disabled
15:05:32.498 INFO HaplotypeCaller - Initializing engine
15:05:33.092 INFO FeatureManager - Using codec BEDCodec to read file file:///work2/03437/sprakash/lonestar/references/xaa.bed
15:05:33.343 INFO IntervalArgumentCollection - Processing 6198806 bp from intervals
15:05:33.394 INFO HaplotypeCaller - Done initializing engine
15:05:33.395 INFO HaplotypeCallerEngine - Tool is in reference confidence mode and the annotation, the following changes will be made to any specified annotations: 'StrandBiasBySample' will be enabled. 'ChromosomeCounts', 'FisherStrand', 'StrandOddsRatio' and 'QualByDepth' annotations have been disabled
15:05:33.413 INFO HaplotypeCallerEngine - Standard Emitting and Calling confidence set to 0.0 for reference-model confidence output
15:05:33.413 INFO HaplotypeCallerEngine - All sites annotated with PLs forced to true for reference-model confidence output
15:05:33.424 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
15:05:33.456 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
15:05:33.546 INFO IntelPairHmm - Using CPU-supported AVX-512 instructions
15:05:33.546 INFO IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
15:05:33.546 INFO IntelPairHmm - Available threads: 1
15:05:33.546 INFO IntelPairHmm - Requested threads: 4
15:05:33.546 WARN IntelPairHmm - Using 1 available threads, but 4 were requested
15:05:33.546 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
15:05:33.571 INFO ProgressMeter - Starting traversal
15:05:33.571 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
15:05:43.571 INFO ProgressMeter - 1:7895855 0.2 2150 12900.0
15:05:53.572 INFO ProgressMeter - 1:17599820 0.3 4630 13889.3
15:06:03.602 INFO ProgressMeter - 1:27995085 0.5 7360 14704.8
15:06:13.625 INFO ProgressMeter - 1:40661208 0.7 10240 15339.3
15:06:23.626 INFO ProgressMeter - 1:53723984 0.8 13000 15582.9
15:06:33.645 INFO ProgressMeter - 1:91297263 1.0 16110 16090.2
15:06:43.661 INFO ProgressMeter - 1:117127741 1.2 19190 16427.7
15:06:53.729 INFO ProgressMeter - 1:152282107 1.3 22310 16699.5
15:07:03.754 INFO ProgressMeter - 1:157789896 1.5 24840 16526.4
15:07:13.766 INFO ProgressMeter - 1:174670118 1.7 27640 16551.7
15:07:23.816 INFO ProgressMeter - 1:201982273 1.8 30520 16610.3
15:07:33.822 INFO ProgressMeter - 1:222711985 2.0 33350 16640.2
15:07:43.826 INFO ProgressMeter - 1:241846767 2.2 36230 16688.8
15:07:50.370 INFO HaplotypeCaller - 684519 read(s) filtered by: MappingQualityReadFilter
0 read(s) filtered by: MappingQualityAvailableReadFilter
0 read(s) filtered by: MappedReadFilter
12170 read(s) filtered by: NotSecondaryAlignmentReadFilter
651896 read(s) filtered by: NotDuplicateReadFilter
0 read(s) filtered by: PassesVendorQualityCheckReadFilter
0 read(s) filtered by: NonZeroReferenceLengthAlignmentReadFilter
0 read(s) filtered by: GoodCigarReadFilter
0 read(s) filtered by: WellformedReadFilter
1348585 total reads filtered
15:07:50.370 INFO ProgressMeter - 2:11332470 2.3 38024 16677.3
15:07:50.370 INFO ProgressMeter - Traversal complete. Processed 38024 total regions in 2.3 minutes.
15:07:50.383 INFO VectorLoglessPairHMM - Time spent in setup for JNI call : 0.0
15:07:50.383 INFO PairHMM - Total compute time in PairHMM computeLogLikelihoods() : 0.0
15:07:50.383 INFO SmithWatermanAligner - Total compute time in java Smith-Waterman : 0.00 sec
15:07:50.383 INFO HaplotypeCaller - Shutting down engine
[June 21, 2021 3:07:50 PM CDT] org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCaller done. Elapsed time: 2.32 minutes.
Runtime.totalMemory()=1734868992
Essentially, all of my reads are getting filtered.
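For anyone who wants to check a similar log, the per-filter counts can be tallied straight from the saved output. This is a minimal sketch, assuming the log was saved to a file; the function name `summarize_read_filters` is made up for illustration and is not part of GATK.

```shell
# Tally the "N read(s) filtered by: <FilterName>" lines of a saved GATK log.
# Prints one line per filter that removed at least one read, then the total.
summarize_read_filters() {
    awk '
        {
            # The count sits just before "read(s)"; the name follows "by:".
            for (i = 2; i <= NF - 3; i++) {
                if ($i == "read(s)" && $(i + 1) == "filtered" && $(i + 2) == "by:") {
                    count = $(i - 1)
                    if (count > 0) printf "%s\t%d\n", $(i + 3), count
                    total += count
                }
            }
        }
        END { printf "TOTAL\t%d\n", total }
    ' "$1"
}

# Hypothetical usage:
# summarize_read_filters haplotypecaller.log
```

The total printed this way can be compared against the "total reads filtered" line that the tool reports itself.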
I ran ValidateSamFile on the target .bam and got this:
java -Xmx4g -jar ${GATK_DIR}/gatk-package-4.1.8.0-local.jar ValidateSamFile --INPUT $WORK2/apps/baf_analysis/515010.bam --MODE SUMMARY
15:11:05.400 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Mon Jun 21 15:11:05 CDT 2021] ValidateSamFile --INPUT /work2/03437/sprakash/lonestar/apps/baf_analysis/515010.bam --MODE SUMMARY --MAX_OUTPUT 100 --IGNORE_WARNINGS false --VALIDATE_INDEX true --INDEX_VALIDATION_STRINGENCY EXHAUSTIVE --IS_BISULFITE_SEQUENCED false --MAX_OPEN_TEMP_FILES 8000 --SKIP_MATE_VALIDATION false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Jun 21, 2021 3:11:05 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Mon Jun 21 15:11:05 CDT 2021] Executing as sprakash@c205-003.frontera.tacc.utexas.edu on Linux 3.10.0-1127.19.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_262-b10; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.8.0
WARNING 2021-06-21 15:11:05 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
INFO 2021-06-21 15:11:57 SamFileValidator Validated Read 10,000,000 records. Elapsed time: 00:00:51s. Time for last 10,000,000: 51s. Last read position: 1:197,234,413
INFO 2021-06-21 15:12:50 SamFileValidator Validated Read 20,000,000 records. Elapsed time: 00:01:44s. Time for last 10,000,000: 52s. Last read position: 2:220,497,821
INFO 2021-06-21 15:13:43 SamFileValidator Validated Read 30,000,000 records. Elapsed time: 00:02:38s. Time for last 10,000,000: 53s. Last read position: 4:68,919,521
INFO 2021-06-21 15:14:37 SamFileValidator Validated Read 40,000,000 records. Elapsed time: 00:03:32s. Time for last 10,000,000: 54s. Last read position: 6:33,281,603
INFO 2021-06-21 15:15:31 SamFileValidator Validated Read 50,000,000 records. Elapsed time: 00:04:25s. Time for last 10,000,000: 53s. Last read position: 7:149,076,451
INFO 2021-06-21 15:16:26 SamFileValidator Validated Read 60,000,000 records. Elapsed time: 00:05:21s. Time for last 10,000,000: 55s. Last read position: 10:409,196
INFO 2021-06-21 15:17:20 SamFileValidator Validated Read 70,000,000 records. Elapsed time: 00:06:14s. Time for last 10,000,000: 53s. Last read position: 11:87,030,402
INFO 2021-06-21 15:18:15 SamFileValidator Validated Read 80,000,000 records. Elapsed time: 00:07:09s. Time for last 10,000,000: 54s. Last read position: 13:103,387,827
INFO 2021-06-21 15:19:11 SamFileValidator Validated Read 90,000,000 records. Elapsed time: 00:08:06s. Time for last 10,000,000: 56s. Last read position: 16:3,652,424
INFO 2021-06-21 15:20:05 SamFileValidator Validated Read 100,000,000 records. Elapsed time: 00:08:59s. Time for last 10,000,000: 53s. Last read position: 17:56,272,511
INFO 2021-06-21 15:21:00 SamFileValidator Validated Read 110,000,000 records. Elapsed time: 00:09:54s. Time for last 10,000,000: 55s. Last read position: 19:45,377,091
INFO 2021-06-21 15:21:57 SamFileValidator Validated Read 120,000,000 records. Elapsed time: 00:10:52s. Time for last 10,000,000: 57s. Last read position: X:591,820
## HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_PLATFORM_VALUE 2
ERROR:MATES_ARE_SAME_END 660
ERROR:MISMATCH_FLAG_MATE_NEG_STRAND 948
ERROR:MISMATCH_FLAG_MATE_UNMAPPED 576
ERROR:MISMATCH_MATE_CIGAR_STRING 948
[Mon Jun 21 15:23:18 CDT 2021] picard.sam.ValidateSamFile done. Elapsed time: 12.22 minutes.
Runtime.totalMemory()=1766850560
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Tool returned:
3
Any suggestions?
-
Follow up:
I checked the .bam file that I used as the target for the previous command in IGV. This .bam was created from paired-end fastq files using the recommended pipeline in GATK 4.1.8, which I had used previously with success. However, the visualized alignment makes no sense. This explains why the sequences were filtered. What is going on? Did my bwa step fail?
-
Siddharth Prakash yes, I would recommend going back to your alignment and pre-processing steps to check for errors there. Make sure you are keeping the reference consistent!
-
I went back to my alignment and preprocessing steps and found no errors. I confirmed that my reference hs37d5 is consistent. This is my workflow:
module use /work2/03437/sprakash/lonestar/apps/modulefiles
module load bwa/ctr-0.7.17--pl5.22.0_2
module load tacc-singularity/3.7.2
module load cutadapt/ctr-3.1--py37h14c3975_1
java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar FastqToSam -F1 /corral-secure/uth/Sex-Chromosome-Loss/BGI/fastq/511458/V300087608_L02_HUMftlX009649-670_1.fq.gz -F2 /corral-secure/uth/Sex-Chromosome-Loss/BGI/fastq/511458/V300087608_L02_HUMftlX009649-670_2.fq.gz --TMP_DIR /scratch1/03437/sprakash/tmp -SM 511458 -RG 670 -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.unmapped.bam
cutadapt -a AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA -A AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG --minimum-length 1 --buffer-size=10000000 --interleaved -u 7 -U 7 -j 0 -o /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.cut.fq.gz /corral-secure/uth/Sex-Chromosome-Loss/BGI/fastq/511458/V300087608_L02_HUMftlX009649-670_1.fq.gz /corral-secure/uth/Sex-Chromosome-Loss/BGI/fastq/511458/V300087608_L02_HUMftlX009649-670_2.fq.gz
bwa mem -p -M -t 136 /work2/03437/sprakash/lonestar/references/hs37d5.fa /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.cut.fq.gz > /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.aligned.sam
java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar MergeBamAlignment -ALIGNED /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.aligned.sam -UNMAPPED /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.unmapped.bam --TMP_DIR /scratch1/03437/sprakash/tmp -R /work2/03437/sprakash/lonestar/references/hs37d5.fa -CREATE_INDEX true -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.merged.bam
java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar AddOrReplaceReadGroups -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.merged.bam --RGID 670 --RGLB HUMftlX009649 --RGPL NIMBLEGEN --RGPU V300087608 --RGSM 511458 --TMP_DIR /scratch1/03437/sprakash/tmp -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.fix_read_group.bam
java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar MarkDuplicates -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.fix_read_group.bam -M /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.marked_dup_metrics.txt --TMP_DIR /scratch1/03437/sprakash/tmp -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.marked_dup.bam
rm /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.fix_read_group.bam
java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar SortSam -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.marked_dup.bam --SORT_ORDER coordinate --TMP_DIR /scratch1/03437/sprakash/tmp -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.sorted.bam
java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar BaseRecalibrator -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.sorted.bam -R /work2/03437/sprakash/lonestar/references/hs37d5.fa --known-sites /work2/03437/sprakash/lonestar/references/dbSNP.151.vcf.gz --tmp-dir /scratch1/03437/sprakash/tmp -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.recal.table
TMPDIR=/scratch1/03437/sprakash/tmp java -Xmx8g -jar /work2/03437/sprakash/lonestar/apps/gatk/gatk-package-4.1.8.0-local.jar ApplyBQSR -I /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.sorted.bam -R /work2/03437/sprakash/lonestar/references/hs37d5.fa --bqsr-recal-file /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.recal.table -O /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.bam
I also viewed the header of the aberrant .bam file. I can't see anything unusual:
@SQ SN:1 LN:249250621 M5:1b22b98cdeb4a9304cb5d48026a85128 UR:file:C:\GATK\hs37d5.fa
@SQ SN:2 LN:243199373 M5:a0d9851da00400dec1098a9255ac712e UR:file:C:\GATK\hs37d5.fa
@SQ SN:3 LN:198022430 M5:fdfd811849cc2fadebc929bb925902e5 UR:file:C:\GATK\hs37d5.fa
@SQ SN:4 LN:191154276 M5:23dccd106897542ad87d2765d28a19a1 UR:file:C:\GATK\hs37d5.fa
@SQ SN:5 LN:180915260 M5:0740173db9ffd264d728f32784845cd7 UR:file:C:\GATK\hs37d5.fa
@SQ SN:6 LN:171115067 M5:1d3a93a248d92a729ee764823acbbc6b UR:file:C:\GATK\hs37d5.fa
@SQ SN:7 LN:159138663 M5:618366e953d6aaad97dbe4777c29375e UR:file:C:\GATK\hs37d5.fa
@SQ SN:8 LN:146364022 M5:96f514a9929e410c6651697bded59aec UR:file:C:\GATK\hs37d5.fa
@SQ SN:9 LN:141213431 M5:3e273117f15e0a400f01055d9f393768 UR:file:C:\GATK\hs37d5.fa
@SQ SN:10 LN:135534747 M5:988c28e000e84c26d552359af1ea2e1d UR:file:C:\GATK\hs37d5.fa
@SQ SN:11 LN:135006516 M5:98c59049a2df285c76ffb1c6db8f8b96 UR:file:C:\GATK\hs37d5.fa
@SQ SN:12 LN:133851895 M5:51851ac0e1a115847ad36449b0015864 UR:file:C:\GATK\hs37d5.fa
@SQ SN:13 LN:115169878 M5:283f8d7892baa81b510a015719ca7b0b UR:file:C:\GATK\hs37d5.fa
@SQ SN:14 LN:107349540 M5:98f3cae32b2a2e9524bc19813927542e UR:file:C:\GATK\hs37d5.fa
@SQ SN:15 LN:102531392 M5:e5645a794a8238215b2cd77acb95a078 UR:file:C:\GATK\hs37d5.fa
@SQ SN:16 LN:90354753 M5:fc9b1a7b42b97a864f56b348b06095e6 UR:file:C:\GATK\hs37d5.fa
@SQ SN:17 LN:81195210 M5:351f64d4f4f9ddd45b35336ad97aa6de UR:file:C:\GATK\hs37d5.fa
@SQ SN:18 LN:78077248 M5:b15d4b2d29dde9d3e4f93d1d0f2cbc9c UR:file:C:\GATK\hs37d5.fa
@SQ SN:19 LN:59128983 M5:1aacd71f30db8e561810913e0b72636d UR:file:C:\GATK\hs37d5.fa
@SQ SN:20 LN:63025520 M5:0dec9660ec1efaaf33281c0d5ea2560f UR:file:C:\GATK\hs37d5.fa
@SQ SN:21 LN:48129895 M5:2979a6085bfe28e3ad6f552f361ed74d UR:file:C:\GATK\hs37d5.fa
@SQ SN:22 LN:51304566 M5:a718acaa6135fdca8357d5bfe94211dd UR:file:C:\GATK\hs37d5.fa
@SQ SN:X LN:155270560 M5:7e0e2e580297b7764e31dbc80c2540dd UR:file:C:\GATK\hs37d5.fa
@SQ SN:Y LN:59373566 M5:1fa3474750af0948bdf97d5a0ee52e51 UR:file:C:\GATK\hs37d5.fa
@SQ SN:MT LN:16569 M5:c68f52674c9fb33aef52dcf399755519 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000207.1 LN:4262 M5:f3814841f1939d3ca19072d9e89f3fd7 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000226.1 LN:15008 M5:1c1b2cd1fccbc0a99b6a447fa24d1504 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000229.1 LN:19913 M5:d0f40ec87de311d8e715b52e4c7062e1 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000231.1 LN:27386 M5:ba8882ce3a1efa2080e5d29b956568a4 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000210.1 LN:27682 M5:851106a74238044126131ce2a8e5847c UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000239.1 LN:33824 M5:99795f15702caec4fa1c4e15f8a29c07 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000235.1 LN:34474 M5:118a25ca210cfbcdfb6c2ebb249f9680 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000201.1 LN:36148 M5:dfb7e7ec60ffdcb85cb359ea28454ee9 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000247.1 LN:36422 M5:7de00226bb7df1c57276ca6baabafd15 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000245.1 LN:36651 M5:89bc61960f37d94abf0df2d481ada0ec UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000197.1 LN:37175 M5:6f5efdd36643a9b8c8ccad6f2f1edc7b UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000203.1 LN:37498 M5:96358c325fe0e70bee73436e8bb14dbd UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000246.1 LN:38154 M5:e4afcd31912af9d9c2546acf1cb23af2 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000249.1 LN:38502 M5:1d78abec37c15fe29a275eb08d5af236 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000196.1 LN:38914 M5:d92206d1bb4c3b4019c43c0875c06dc0 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000248.1 LN:39786 M5:5a8e43bec9be36c7b49c84d585107776 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000244.1 LN:39929 M5:0996b4475f353ca98bacb756ac479140 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000238.1 LN:39939 M5:131b1efc3270cc838686b54e7c34b17b UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000202.1 LN:40103 M5:06cbf126247d89664a4faebad130fe9c UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000234.1 LN:40531 M5:93f998536b61a56fd0ff47322a911d4b UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000232.1 LN:40652 M5:3e06b6741061ad93a8587531307057d8 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000206.1 LN:41001 M5:43f69e423533e948bfae5ce1d45bd3f1 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000240.1 LN:41933 M5:445a86173da9f237d7bcf41c6cb8cc62 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000236.1 LN:41934 M5:fdcd739913efa1fdc64b6c0cd7016779 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000241.1 LN:42152 M5:ef4258cdc5a45c206cea8fc3e1d858cf UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000243.1 LN:43341 M5:cc34279a7e353136741c9fce79bc4396 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000242.1 LN:43523 M5:2f8694fc47576bc81b5fe9e7de0ba49e UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000230.1 LN:43691 M5:b4eb71ee878d3706246b7c1dbef69299 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000237.1 LN:45867 M5:e0c82e7751df73f4f6d0ed30cdc853c0 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000233.1 LN:45941 M5:7fed60298a8d62ff808b74b6ce820001 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000204.1 LN:81310 M5:efc49c871536fa8d79cb0a06fa739722 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000198.1 LN:90085 M5:868e7784040da90d900d2d1b667a1383 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000208.1 LN:92689 M5:aa81be49bf3fe63a79bdc6a6f279abf6 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000191.1 LN:106433 M5:d75b436f50a8214ee9c2a51d30b2c2cc UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000227.1 LN:128374 M5:a4aead23f8053f2655e468bcc6ecdceb UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000228.1 LN:129120 M5:c5a17c97e2c1a0b6a9cc5a6b064b714f UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000214.1 LN:137718 M5:46c2032c37f2ed899eb41c0473319a69 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000221.1 LN:155397 M5:3238fb74ea87ae857f9c7508d315babb UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000209.1 LN:159169 M5:f40598e2a5a6b26e84a3775e0d1e2c81 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000218.1 LN:161147 M5:1d708b54644c26c7e01c2dad5426d38c UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000220.1 LN:161802 M5:fc35de963c57bf7648429e6454f1c9db UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000213.1 LN:164239 M5:9d424fdcc98866650b58f004080a992a UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000211.1 LN:166566 M5:7daaa45c66b288847b9b32b964e623d3 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000199.1 LN:169874 M5:569af3b73522fab4b40995ae4944e78e UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000217.1 LN:172149 M5:6d243e18dea1945fb7f2517615b8f52e UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000216.1 LN:172294 M5:642a232d91c486ac339263820aef7fe0 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000215.1 LN:172545 M5:5eb3b418480ae67a997957c909375a73 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000205.1 LN:174588 M5:d22441398d99caf673e9afb9a1908ec5 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000219.1 LN:179198 M5:f977edd13bac459cb2ed4a5457dba1b3 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000224.1 LN:179693 M5:d5b2fc04f6b41b212a4198a07f450e20 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000223.1 LN:180455 M5:399dfa03bf32022ab52a846f7ca35b30 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000195.1 LN:182896 M5:5d9ec007868d517e73543b005ba48535 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000212.1 LN:186858 M5:563531689f3dbd691331fd6c5730a88b UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000222.1 LN:186861 M5:6fe9abac455169f50470f5a6b01d0f59 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000200.1 LN:187035 M5:75e4c8d17cd4addf3917d1703cacaf25 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000193.1 LN:189789 M5:dbb6e8ece0b5de29da56601613007c2a UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000194.1 LN:191469 M5:6ac8f815bf8e845bb3031b73f812c012 UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000225.1 LN:211173 M5:63945c3e6962f28ffd469719a747e73c UR:file:C:\GATK\hs37d5.fa
@SQ SN:GL000192.1 LN:547496 M5:325ba9e808f669dfeee210fdd7b470ac UR:file:C:\GATK\hs37d5.fa
@SQ SN:NC_007605 LN:171823 M5:6743bd63b3ff2b5b8985d8933c53290a UR:file:C:\GATK\hs37d5.fa
@SQ SN:hs37d5 LN:35477943 M5:5b6a4b3a81a2d3c134b7d14bf6ad39f1 UR:file:C:\GATK\hs37d5.fa
@RG ID:670 LB:HUMftlX009649 PL:NIMBLEGEN SM:511458 PU:V300087608
@PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:/usr/local/bin/bwa mem -p -M -t 136 /work2/03437/sprakash/lonestar/references/hs37d5.fa /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.cut.fq.gz
@PG ID:MarkDuplicates VN:Version:4.1.8.0 CL:MarkDuplicates --INPUT /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.fix_read_group.bam --OUTPUT /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.marked_dup.bam --METRICS_FILE /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.marked_dup_metrics.txt --TMP_DIR /scratch1/03437/sprakash/tmp --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --OPTICAL_DUPLICATE_PIXEL_DISTANCE 100 --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false PN:MarkDuplicates
PP:bwa
@PG ID:GATK ApplyBQSR VN:4.1.8.0 CL:ApplyBQSR --output /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.bam --bqsr-recal-file /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.recal.table --input /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L01_HUMftlX009649-670.sorted.bam --reference /work2/03437/sprakash/lonestar/references/hs37d5.fa --preserve-qscores-less-than 6 --use-original-qualities false --quantize-quals 0 --round-down-quantized false --emit-original-quals false --global-qscore-prior -1.0 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false
What do you suggest?
-
Siddharth Prakash is your alignment still looking similar to the image you shared above? If so, you will not be able to get results with HaplotypeCaller.
You can take a look at your bam/sam file before and after each pre-processing step in IGV to figure out when the alignment starts to have issues.
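Alongside IGV, a quick numeric comparison across the intermediate files can help narrow down the step where things change. The following is a rough sketch, assuming samtools is installed and on your PATH; the helper name `flagstat_each` is hypothetical, and the file suffixes in the usage comment are the ones from the workflow posted in this thread.

```shell
# Sketch only: assumes samtools is installed and on PATH.
# Runs "samtools flagstat" on each file given, skipping files that do not exist.
flagstat_each() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            continue
        fi
        echo "== $f"
        samtools flagstat "$f"
    done
}

# Hypothetical usage (p is a placeholder for the common output prefix):
# p=/path/to/output/V300087608_L02_HUMftlX009649-670
# flagstat_each "$p.aligned.sam" "$p.merged.bam" "$p.marked_dup.bam" "$p.sorted.bam" "$p.bam"
```

If the mapped or properly paired percentages already look wrong for the first file, the problem is likely at or before alignment rather than in the later GATK steps.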
-
Hi Genevieve,
Yes, I reran the pipeline and checked the preprocessed bams. They all look the same as what I posted. I'm calling bwa/0.7.17. Any suggestions?
-
You are using the -p argument with bwa mem, which does the following:
Assume the first input query file is interleaved paired-end FASTA/Q. See the command description for details.
Does your file /corral-secure/uth/Sex-Chromosome-Loss/BGI/output/V300087608_L02_HUMftlX009649-670.cut.fq.gz contain properly interleaved paired-end reads? I noticed it has "cut" in the name. Is it a subset of the reads? If so, that could be how you lost the mates.
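One way to test the interleaving is to walk the FASTQ eight lines at a time and confirm that each pair of records carries the same read name. This is a minimal sketch, assuming four-line FASTQ records and mate names that are either identical or differ only by a trailing /1 and /2; the function name `check_interleaved` is made up for illustration.

```shell
# Read an interleaved FASTQ on stdin and verify that records 1+2, 3+4, ...
# share a read name. Exits non-zero on the first mismatch or on a truncated file.
check_interleaved() {
    awk '
        NR % 8 == 1 { r1 = $1; sub(/\/1$/, "", r1) }   # header of the first mate
        NR % 8 == 5 {                                   # header of the second mate
            r2 = $1; sub(/\/2$/, "", r2)
            if (r1 != r2) {
                printf "mismatch at pair %d: %s vs %s\n", (NR + 3) / 8, r1, r2
                bad = 1
                exit
            }
        }
        END {
            if (bad) exit 1
            if (NR % 8 != 0) {
                print "line count is not a multiple of 8 (truncated or unpaired record)"
                exit 1
            }
            printf "OK: %d pairs\n", NR / 8
        }
    '
}

# Hypothetical usage:
# gzip -cd reads.cut.fq.gz | check_interleaved
```

A clean result here only shows that the names pair up. It does not rule out other problems with the reads themselves.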
-
Yes, my output is interleaved paired-end. Output is from cutadapt 3.1.
-
For some reason your reads are not aligning properly with bwa mem and you'll have to look closer into your data to determine where this issue is coming from.
-
I am stuck for two reasons:
1. When I ran this command 6 months ago it worked just fine. I used bwa 0.7.16 instead of 0.7.17 then. I just reran the same .fastq files that I had successfully aligned earlier and got the same mess of an output.
2. I don't know how to troubleshoot the issue other than to go back to bwa 0.7.16. Do you have any suggestions?
-
We only provide support for GATK issues on this forum. Since this sounds like a bwa issue, I would recommend reaching out to the bwa developers. You could also post this on Biostars.
I'll ask whether anyone on my team knows anything about this issue, but I can't guarantee I'll be able to provide answers.