HRun not found in vcf header or INFO
Answered
a) GATK version used: 4.1.2.0
Hi, I'm following the GATK germline short variant discovery pipeline (https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-) to call variants, mostly on whole-exome data. However, I can't find HRun (Largest Contiguous Homopolymer Run of Variant Allele In Either Direction) in my VCF header or in the INFO field. Why is it missing, how is HRun calculated, and is there any way to add it to my current VCF?
The header of my VCF:
##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
##FILTER=<ID=LowQual,Description="Low quality">
##FILTER=<ID=VQSRTrancheINDEL90.00to91.00,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: 1.3823 <= x < 1.5199">
##FILTER=<ID=VQSRTrancheINDEL91.00to92.00,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: 1.2198 <= x < 1.3823">
##FILTER=<ID=VQSRTrancheINDEL92.00to95.00,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: 0.8802 <= x < 1.2198">
##FILTER=<ID=VQSRTrancheINDEL95.00to99.00,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: -0.1415 <= x < 0.8802">
##FILTER=<ID=VQSRTrancheINDEL99.00to99.50,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: -0.6956 <= x < -0.1415">
##FILTER=<ID=VQSRTrancheINDEL99.50to100.00+,Description="Truth sensitivity tranche level for INDEL model at VQS Lod < -174.4657">
##FILTER=<ID=VQSRTrancheINDEL99.50to100.00,Description="Truth sensitivity tranche level for INDEL model at VQS Lod: -174.4657 <= x < -0.6956">
##FILTER=<ID=VQSRTrancheSNP99.00to99.50,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -0.1739 <= x < 0.4909">
##FILTER=<ID=VQSRTrancheSNP99.50to99.90,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -3.5031 <= x < -0.1739">
##FILTER=<ID=VQSRTrancheSNP99.90to99.95,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -8.9646 <= x < -3.5031">
##FILTER=<ID=VQSRTrancheSNP99.95to100.00+,Description="Truth sensitivity tranche level for SNP model at VQS Lod < -34285.4958">
##FILTER=<ID=VQSRTrancheSNP99.95to100.00,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -34285.4958 <= x < -8.9646">
##FORMAT=<ID=AB,Number=1,Type=Float,Description="Allele balance for each het genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=MQ0,Number=1,Type=Integer,Description="Number of Mapping Quality Zero Reads per sample">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[/data/analysis/DarnellR/Project_CCDG_13607_B01_GRM_WGS/Sample_HG02262/analysis/HG02262.final.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[BadCigar] disable_read_filter=[] intervals=[chr1] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/gpfs/internal/sweng/production/Resources/GRCh38_1000genomes/GRCh38_full_analysis_set_plus_decoy_hla.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=LINEAR variant_index_parameter=128000 reference_window_stop=0 logging_level=INFO log_to_file=null help=false version=false likelihoodCalculationEngine=PairHMM 
heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[AlleleBalanceBySample, DepthPerAlleleBySample, DepthPerSampleHC, InbreedingCoeff, MappingQualityZeroBySample, StrandBiasBySample, Coverage, FisherStrand, HaplotypeScore, MappingQualityRankSumTest, MappingQualityZero, QualByDepth, RMSMappingQuality, ReadPosRankSumTest, VariantType, StrandBiasBySample] excludeAnnotation=[ChromosomeCounts, FisherStrand, StrandOddsRatio, QualByDepth] group=[Standard, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=GVCF bamOutput=null bamWriterType=CALLED_HAPLOTYPES disableOptimizations=false annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence_threshold_for_calling=-0.0 standard_min_confidence_threshold_for_emitting=-0.0 max_alternate_alleles=6 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=true gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=false keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false",Date="Mon Dec 31 10:31:35 EST 2018",Epoch=1546270295854,Version=3.5-0-g36282e4>
##GATKCommandLine=<ID=ApplyVQSR,CommandLine="ApplyVQSR --recal-file /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/VQSR/INDEL/AUTOSOMAL.vqsr.indel.recal --tranches-file /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/VQSR/INDEL/AUTOSOMAL.vqsr.indel.tranches --output SNP.recal_99.0.INDEL.recal_90.0.vcf.gz --truth-sensitivity-filter-level 90.0 --mode INDEL --variant /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/VQSR/SNP/SNP.recalibrated_99.0.vcf.gz --reference /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_2020/GATK/ref/Homo_sapiens_assembly38.fasta --create-output-variant-index true --use-allele-specific-annotations false --ignore-all-filters false --exclude-filtered false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false",Version="4.1.2.0",Date="August 25, 2021 5:41:33 PM EDT">
##GATKCommandLine=<ID=ApplyVQSR,CommandLine="ApplyVQSR --recal-file /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/VQSR/SNP/AUTOSOMAL.vqsr.snp.recal --tranches-file /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/VQSR/SNP/AUTOSOMAL.vqsr.snp.tranches --output /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/VQSR/SNP/SNP.recalibrated_99.0.vcf.gz --truth-sensitivity-filter-level 99.0 --mode SNP --variant /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/variant_call_all/joint_vcf/AUTOSOMAL.jointcalls.vcf.gz --reference /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_2020/GATK/ref/Homo_sapiens_assembly38.fasta --create-output-variant-index true --use-allele-specific-annotations false --ignore-all-filters false --exclude-filtered false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false",Version="4.1.2.0",Date="August 25, 2021 5:10:52 PM EDT">
##GATKCommandLine=<ID=CombineGVCFs,CommandLine="CombineGVCFs --output chr1.ALL.combined.cohort.g.vcf.gz --variant /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/gvcf_samples/WES_AC_1KG_cohort_final.list --intervals /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/target_region/chr1.interval.bed --interval-padding 10 --reference /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_2020/GATK/ref/Homo_sapiens_assembly38.fasta --annotation-group StandardAnnotation --annotation-group AS_StandardAnnotation --annotation-group StandardHCAnnotation --convert-to-base-pair-resolution false --break-bands-at-multiples-of 0 --input-is-somatic false --drop-somatic-filtering-annotations false --ignore-variants-starting-outside-interval false --interval-set-rule UNION --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.2.0",Date="August 20, 2021 4:12:30 PM EDT">
##GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="GenotypeGVCFs --output /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/variant_call_all/joint_vcf/chr1.gatk4.jointcalls.vcf --max-alternate-alleles 20 --variant /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/combined_gVCF_Psomagen_broad_Macrogen_1KG/combineVCF/chr1.ALL.combined.cohort.g.vcf.gz --intervals /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/target_region/chr1.interval.bed --interval-padding 10 --reference /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_2020/GATK/ref/Homo_sapiens_assembly38.fasta --annotation-group StandardAnnotation --annotation-group AS_StandardAnnotation --annotation-group StandardHCAnnotation --include-non-variant-sites false --merge-input-intervals false --input-is-somatic false --tumor-lod-to-emit 3.5 --allele-fraction-error 0.001 --keep-combined-raw-annotations false --use-new-qual-calculator true --use-old-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --only-output-calls-starting-in-intervals false --interval-set-rule UNION --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false 
--use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.2.0",Date="August 24, 2021 1:41:15 PM EDT">
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --genotyping-mode DISCOVERY --bam-output gVCF/14788.realigned.g.vcf.out.bam --emit-ref-confidence GVCF --output gVCF/14788.g.vcf --intervals /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_2020/SureSelect_bed/Agilent_download_hg38/agilent_v5_S04380110_hs_hg38/v5_hg38_covered_cleaned.list --interval-padding 10 --input BAM-GATK-RA-RC/mapped_reads/14788.nodup.bqsr.mapped.sorted.bam --reference /gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_2020/GATK/ref/Homo_sapiens_assembly38.fasta --annotation-group StandardAnnotation --annotation-group AS_StandardAnnotation --annotation-group StandardHCAnnotation --use-new-qual-calculator true --use-old-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --genotype-filtered-alleles false --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 
--gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --disable-optimizations false --just-determine-active-regions false --dont-genotype false --do-not-run-physical-phasing false --use-filtered-reads-for-annotations false --correct-overlapping-quality false --adaptive-pruning false --do-not-recover-dangling-branches false --recover-dangling-heads false --consensus false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --max-unpruned-variants 100 --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --max-mnp-distance 0 --min-assembly-region-size 50 --max-assembly-region-size 300 
--assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --interval-set-rule UNION --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.2.0",Date="July 20, 2021 3:12:06 AM EDT">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=AS_BaseQRankSum,Number=A,Type=Float,Description="allele specific Z-score from Wilcoxon rank sum test of each Alt Vs. Ref base qualities">
##INFO=<ID=AS_FS,Number=A,Type=Float,Description="allele specific phred-scaled p-value using Fisher's exact test to detect strand bias of each alt allele">
##INFO=<ID=AS_InbreedingCoeff,Number=A,Type=Float,Description="allele specific heterozygosity as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation; relate to inbreeding coefficient">
##INFO=<ID=AS_MQ,Number=A,Type=Float,Description="Allele-specific RMS Mapping Quality">
##INFO=<ID=AS_MQRankSum,Number=A,Type=Float,Description="Allele-specific Mapping Quality Rank Sum">
##INFO=<ID=AS_QD,Number=A,Type=Float,Description="Allele-specific Variant Confidence/Quality by Depth">
##INFO=<ID=AS_RAW_BaseQRankSum,Number=1,Type=String,Description="raw data for allele specific rank sum test of base qualities">
##INFO=<ID=AS_RAW_MQ,Number=1,Type=String,Description="Allele-specfic raw data for RMS Mapping Quality">
##INFO=<ID=AS_RAW_MQRankSum,Number=1,Type=String,Description="Allele-specfic raw data for Mapping Quality Rank Sum">
##INFO=<ID=AS_RAW_ReadPosRankSum,Number=1,Type=String,Description="allele specific raw data for rank sum test of read position bias">
##INFO=<ID=AS_ReadPosRankSum,Number=A,Type=Float,Description="allele specific Z-score from Wilcoxon rank sum test of each Alt vs. Ref read position bias">
##INFO=<ID=AS_SB_TABLE,Number=1,Type=String,Description="Allele-specific forward/reverse read counts for strand bias tests">
##INFO=<ID=AS_SOR,Number=A,Type=Float,Description="Allele specific strand Odds Ratio of 2x|Alts| contingency table to detect allele specific strand bias">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=NEGATIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the negative training set of bad variants">
##INFO=<ID=POSITIVE_TRAIN_SITE,Number=0,Type=Flag,Description="This variant was used to build the positive training set of good variants">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for improved RMS Mapping Quality calculation. Incompatible with deprecated RAW_MQ formulation.">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##INFO=<ID=VQSLOD,Number=1,Type=Float,Description="Log odds of being a true variant versus being false under the trained gaussian mixture model">
##INFO=<ID=VariantType,Number=1,Type=String,Description="Variant type description">
##INFO=<ID=culprit,Number=1,Type=String,Description="The annotation which was the worst performing in the Gaussian mixture model, likely the reason why the variant was filtered out">
##contig=<ID=chr1,length=248956422>
....
##reference=file:///gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_2020/GATK/ref/Homo_sapiens_assembly38.fasta
##source=ApplyVQSR
##source=CombineGVCFs
##source=GenotypeGVCFs
##source=HaplotypeCaller
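A quick way to confirm that an annotation is absent from both the header and the records is to search the file directly. The following is a minimal sketch that assumes a plain-text or gzip/bgzip-compressed VCF and only standard tools (`gzip`, `grep`, `cut`); the function name `has_info_id` is hypothetical and not part of GATK.

```shell
#!/usr/bin/env bash
# has_info_id VCF ID
# Prints "header" if a ##INFO=<ID=ID,...> line is declared, and "records" if
# any data line carries ID in its INFO column (column 8).
# Returns 1 if the ID is found in neither place.
has_info_id() {
    local vcf=$1 id=$2 found=1
    # gzip -cdf decompresses .gz input and passes plain text through unchanged
    if gzip -cdf "$vcf" | grep -q "^##INFO=<ID=${id},"; then
        echo "header"
        found=0
    fi
    if gzip -cdf "$vcf" | grep -v '^#' | cut -f8 | grep -Eq "(^|;)${id}(=|;|$)"; then
        echo "records"
        found=0
    fi
    return $found
}
```

For the file in this post, `has_info_id SNP.recal_99.0.INDEL.recal_90.0.vcf.gz HRun` would be expected to print nothing, since no `##INFO=<ID=HRun` line appears in the header shown above.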
-
Hi Mingzhou Fu,
I am going to move your post into our Community Discussions -> General Discussion topic, as the Germline topic is for reporting bugs and issues with GATK. You can read more about our forum guidelines and the topics here: Forum Guidelines.
Best,
Bhanu
-
Mingzhou Fu, the GATK annotations are listed here: https://gatk.broadinstitute.org/hc/en-us/articles/4405443524763--Tool-Documentation-Index#VariantAnnotations
HRun is not a GATK annotation so it cannot be added with our tools. I'm not familiar enough with that annotation to know if there is a GATK annotation that would be a replacement for it.
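For orientation only, the description quoted in the question ("Largest Contiguous Homopolymer Run of Variant Allele In Either Direction") can be read literally as: count how many reference bases directly to the left of the site match the alternate allele, do the same to the right, and report the larger count. The sketch below implements that reading for single-base alleles. It is an assumption based on the description, not code taken from any GATK release, and the function name `homopolymer_run` and its inputs are hypothetical.

```shell
#!/usr/bin/env bash
# homopolymer_run SEQ POS ALT
#   SEQ: reference bases around the site (uppercase; hypothetical input)
#   POS: 0-based index of the variant site within SEQ
#   ALT: single-base alternate allele
# Prints the longer of the two runs of ALT directly left and right of POS.
homopolymer_run() {
    local seq=$1 pos=$2 alt=$3
    local left=0 right=0 i
    # Walk left from the site until a base differs from ALT
    for (( i = pos - 1; i >= 0; i-- )); do
        [[ ${seq:i:1} == "$alt" ]] || break
        left=$(( left + 1 ))
    done
    # Walk right from the site until a base differs from ALT
    for (( i = pos + 1; i < ${#seq}; i++ )); do
        [[ ${seq:i:1} == "$alt" ]] || break
        right=$(( right + 1 ))
    done
    echo $(( left > right ? left : right ))
}
```

With the reference window `GATTTTCAAG` and a site at index 6, `homopolymer_run GATTTTCAAG 6 T` prints 4 (four T bases to the left), while `homopolymer_run GATTTTCAAG 6 A` prints 2 (two A bases to the right). Indels and multi-base alleles are outside the scope of this sketch.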
-
Hi there,
I actually found HRun (named HomopolymerRun) in GATK 3.8, so I tried running the GATK 3.8 VariantAnnotator on my VCF file generated by GATK4.
Here's the command:
module load gatk/3.8.0
# $in_dir and $out_dir must be defined before this point
ref_genome=/gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_2020/GATK/ref/Homo_sapiens_assembly38.fasta
in_vcf="$in_dir/SNP.recal_99.0.INDEL.recal_90.0.vcf.gz"
gatk_bundle_sit=/gpfs/data/chaklab/data/HSCR_WES_2020_MF/Psomagen_broad_Macrogen_all/GATK_hg38_bundle
dbsnp="$gatk_bundle_sit/Homo_sapiens_assembly38.dbsnp138.vcf"
# Add GCContent, HomopolymerRun and HardyWeinberg annotations to the existing VCF
java -Xmx180g -Xms160g -jar /gpfs/share/apps/gatk/3.8.0/GenomeAnalysisTK.jar \
-T VariantAnnotator \
-R "$ref_genome" \
--variant "$in_vcf" \
--out "$out_dir/SNP.recal_99.0.INDEL.recal_90.0.extraannot.vcf.gz" \
-A GCContent -A HomopolymerRun -A HardyWeinberg \
--dbsnp "$dbsnp"
It ran without errors, but the log file had a lot of warnings like this one: 'HomopolymerRun - Encountered a homopolymer at chr1:44140831 longer than the tool's default window size, so the position was skipped. To process this position, add --reference_window_stop to your command with a value equal or greater than 85'.
I'd like to understand exactly what this warning means. Is there a web page or tool index for GATK3?
Thank you!
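Each warning of this kind names the window size needed for the site that was skipped, so one option is to collect the largest value from the log and rerun with it. The sketch below assumes the log lines use exactly the wording quoted above; the function name `max_window_stop` and the log file name in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# max_window_stop LOGFILE
# Scans a log for the HomopolymerRun warning quoted above and prints the
# largest "equal or greater than N" value. Prints nothing if no such warning
# is present.
max_window_stop() {
    local log=$1
    grep -o 'add --reference_window_stop to your command with a value equal or greater than [0-9]*' "$log" \
        | awk '{ n = $NF + 0; if (n > max) max = n } END { if (max > 0) print max }'
}
```

If the log were saved as `annotate.log`, then `max_window_stop annotate.log` would give a single number that could be passed as `--reference_window_stop` on the VariantAnnotator command, as the warning itself suggests. Whether a larger window affects runtime or the other annotations is not covered in this thread.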
-
Yes. Our support team does not currently support GATK3, but you can find GATK3 information on our legacy forum site: https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/
-
Thank you!