Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Input bam file not being recognized by BQSR

9 comments

  • Bhanu Gandham

    Hi 

     

    Diagnose your BAM files with ValidateSamFile.

    Take a look at this doc: https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile
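
    A minimal sketch of such a check, assuming the `gatk` wrapper is on the PATH; `input.bam` is a placeholder file name, not one taken from this thread. The command is assembled in an array and only printed here, so it can be reviewed before it is run. `--MODE SUMMARY` requests a summary of error types rather than the default verbose per-record output.

    ```shell
    #!/usr/bin/env bash
    # Placeholder input; substitute the path to your own BAM file.
    bam="input.bam"

    # ValidateSamFile is a Picard tool bundled with GATK4.
    # --MODE SUMMARY reports counts per error/warning type.
    validate_cmd=(gatk ValidateSamFile -I "$bam" --MODE SUMMARY)

    # Dry run: print the command for review without executing it.
    printf '%s\n' "${validate_cmd[*]}"

    # To actually run it, uncomment the next line:
    # "${validate_cmd[@]}"
    ```

    A clean file should end with a "No errors found" message.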

  • danilovkiri

    Hi Amujal Marion

    The problem might also be here. Look at the error:

    A USER ERROR has occurred: Invalid argument 'UGTB015-pe.sorted.marked_duplicates.bam'.

    When you specify 

    -I ${sample}-pe.sorted.marked_duplicates.bam \
    --known-sites ${ref_dbsnp}

     

    the ${ref_dbsnp} variable holds a well-formed path, i.e. either a full path (starting with `/`) or an explicit local path (starting with `.`). The ${sample} variable is the beginning of neither. When you specify `-I filename_whatever_lalala` (or `-O ...`), the tool resolves the bare filename against the current working directory, so the file is found only if the job is launched from the directory that contains it. It is good practice to always specify absolute paths to all files, so that the command does not depend on the cwd and such mistakes are avoided.
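
    A small sketch of building an absolute path before calling the tool, assuming bash. `to_abs_path` is a hypothetical helper written for this illustration, not part of GATK; the sample name is the one from the error message above.

    ```shell
    #!/usr/bin/env bash
    sample="UGTB015"
    bam="${sample}-pe.sorted.marked_duplicates.bam"

    # Hypothetical helper: turn a possibly relative path into an absolute one.
    # Only the directory part has to exist; the file is checked separately.
    to_abs_path() {
        local dir base
        dir="$(cd "$(dirname "$1")" && pwd)" || return 1
        base="$(basename "$1")"
        printf '%s/%s\n' "$dir" "$base"
    }

    abs_bam="$(to_abs_path "$bam")"
    echo "Resolved input: $abs_bam"

    # Report a missing file before the tool is launched.
    if [ ! -f "$abs_bam" ]; then
        echo "Input BAM not found: $abs_bam" >&2
    fi
    ```

    Passing "$abs_bam" to `-I` then gives the same result regardless of the directory the script is started from.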

     

  • danilovkiri

    Bhanu Gandham

    I also found that, judging by the error stack, barclay.argparser uses LegacyCommandLineArgumentParser.java (https://github.com/broadinstitute/barclay/blob/860ca6740459813372667f20ecabeced98b90837/src/main/java/org/broadinstitute/barclay/argparser/LegacyCommandLineArgumentParser.java) even though it goes under the name of CommandLineArgumentParser.java (the error formatting "A USER ERROR has occurred: Invalid argument" can be found only in the Legacy class). The Legacy class supports options of the form KEY=VALUE, plus positional arguments, as stated in its header (lines 55-56, 388). Is it possible that the GATK 4.1.7.0 wrapper used the wrong Barclay CLI parser class for some reason?

  • Amujal Marion

    1. Diagnose your BAM files with ValidateSamFile.

    No errors found
    [Mon Jun 15 12:30:34 CDT 2020] picard.sam.ValidateSamFile done. Elapsed time: 6.17 minutes.
    Runtime.totalMemory()=524812288

    2. I have also re-run the script with the absolute path to the BAM files specified, as suggested above, but still got the same error message.

    3. I haven't tried out the final recommendation "I also found that judging by the error stack the barclay.argparser uses LegacyCommandLineArgumentParser.java" because I don't understand how to go about it.

    Thank you!!

     

  • danilovkiri

    Amujal Marion

    Please copy and post the full BaseRecalibrator log here. It begins right after the `echo $sample` execution.

  • Amujal Marion

    UGTB015
    Using GATK jar /mnt/................../..........gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g -Djava.io.tmpdir=/mnt/............./.............../Test_data/New_test_fastQfiles/out/Aligned_Bam/trial/tmpdir -jar /mnt/............./................/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar BaseRecalibrator -R -I /mnt/.................../..................../Test_data/New_test_fastQfiles/out/Aligned_Bam/trial/UGTB015-pe.sorted.marked_duplicates.bam --known-sites /mnt/.................../...................../Known_sites/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf --known-sites /mnt/................./................/Known_sites/resources_broad_hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf --known-sites /mnt/......................./..................../Known_sites/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf -O UGTB015_recal_data.table
    USAGE: BaseRecalibrator [arguments]

    First pass of the Base Quality Score Recalibration (BQSR) -- Generates recalibration table based on various
    user-specified covariates (such as read group, reported quality score, machine cycle, and nucleotide context).
    Version:4.1.7.0


    Required Arguments:

    --input,-I:String BAM/SAM/CRAM file containing reads This argument must be specified at least once.
    Required.

    --known-sites:FeatureInput One or more databases of known polymorphic sites used to exclude regions around known
    polymorphisms from analysis. This argument must be specified at least once. Required.

    --output,-O:File The output recalibration table file to create Required.

    --reference,-R:GATKPathSpecifier
    Reference sequence file Required.


    Optional Arguments:

    --add-output-sam-program-record,-add-output-sam-program-record:Boolean
    If true, adds a PG tag to created SAM/BAM/CRAM files. Default value: true. Possible
    values: {true, false}

    --add-output-vcf-command-line,-add-output-vcf-command-line:Boolean
    If true, adds a command line header line to created VCF files. Default value: true.
    Possible values: {true, false}

    --arguments_file:File read one or more arguments files and add them to the command line This argument may be
    specified 0 or more times. Default value: null.

    --binary-tag-name:String the binary tag covariate name if using it Default value: null.

    --bqsr-baq-gap-open-penalty:Double
    BQSR BAQ gap open penalty (Phred Scaled). Default value is 40. 30 is perhaps better for
    whole genome call sets Default value: 40.0.

    --cloud-index-prefetch-buffer,-CIPB:Integer
    Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to
    cloudPrefetchBuffer if unset. Default value: -1.

    --cloud-prefetch-buffer,-CPB:Integer
    Size of the cloud-only prefetch buffer (in MB; 0 to disable). Default value: 40.

    --create-output-bam-index,-OBI:Boolean
    If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file. Default
    value: true. Possible values: {true, false}

    --create-output-bam-md5,-OBM:Boolean
    If true, create a MD5 digest for any BAM/SAM/CRAM file created Default value: false.
    Possible values: {true, false}

    --create-output-variant-index,-OVI:Boolean
    If true, create a VCF index when writing a coordinate-sorted VCF file. Default value:
    true. Possible values: {true, false}

    --create-output-variant-md5,-OVM:Boolean
    If true, create a a MD5 digest any VCF file created. Default value: false. Possible
    values: {true, false}

    --default-base-qualities:Byte Assign a default base quality Default value: -1.

    --deletions-default-quality:Byte
    default quality for the base deletions covariate Default value: 45.

    --disable-bam-index-caching,-DBIC:Boolean
    If true, don't cache bam indexes, this will reduce memory requirements but may harm
    performance if many intervals are specified. Caching is automatically disabled if there
    are no intervals specified. Default value: false. Possible values: {true, false}

    --disable-read-filter,-DF:String
    Read filters to be disabled before analysis This argument may be specified 0 or more
    times. Default value: null. Possible Values: {MappedReadFilter,
    MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter,
    NotDuplicateReadFilter, NotSecondaryAlignmentReadFilter,
    PassesVendorQualityCheckReadFilter, WellformedReadFilter}

    --disable-sequence-dictionary-validation,-disable-sequence-dictionary-validation:Boolean
    If specified, do not check the sequence dictionaries from our inputs for compatibility.
    Use at your own risk! Default value: false. Possible values: {true, false}

    --exclude-intervals,-XL:StringOne or more genomic intervals to exclude from processing This argument may be specified 0
    or more times. Default value: null.

    --gatk-config-file:String A configuration file to use with the GATK. Default value: null.

    --gcs-max-retries,-gcs-retries:Integer
    If the GCS bucket channel errors out, how many times it will attempt to re-initiate the
    connection Default value: 20.

    --gcs-project-for-requester-pays:String
    Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be
    accessed. Default value: .

    --help,-h:Boolean display the help message Default value: false. Possible values: {true, false}

    --indels-context-size,-ics:Integer
    Size of the k-mer context to be used for base insertions and deletions Default value: 3.

    --insertions-default-quality:Byte
    default quality for the base insertions covariate Default value: 45.

    --interval-exclusion-padding,-ixp:Integer
    Amount of padding (in bp) to add to each interval you are excluding. Default value: 0.

    --interval-merging-rule,-imr:IntervalMergingRule
    Interval merging rule for abutting intervals Default value: ALL. Possible values: {ALL,
    OVERLAPPING_ONLY}

    --interval-padding,-ip:IntegerAmount of padding (in bp) to add to each interval you are including. Default value: 0.

    --interval-set-rule,-isr:IntervalSetRule
    Set merging approach to use for combining interval inputs Default value: UNION. Possible
    values: {UNION, INTERSECTION}

    --intervals,-L:String One or more genomic intervals over which to operate This argument may be specified 0 or
    more times. Default value: null.

    --lenient,-LE:Boolean Lenient processing of VCF files Default value: false. Possible values: {true, false}

    --low-quality-tail:Byte minimum quality for the bases in the tail of the reads to be considered Default value: 2.

    --maximum-cycle-value,-max-cycle:Integer
    The maximum cycle value permitted for the Cycle covariate Default value: 500.

    --mismatches-context-size,-mcs:Integer
    Size of the k-mer context to be used for base mismatches Default value: 2.

    --mismatches-default-quality:Byte
    default quality for the base mismatches covariate Default value: -1.

    --preserve-qscores-less-than:Integer
    Don't recalibrate bases with quality scores less than this threshold (with -bqsr) Default
    value: 6.

    --quantizing-levels:Integer number of distinct quality scores in the quantized output Default value: 16.

    --QUIET:Boolean Whether to suppress job-summary info on System.err. Default value: false. Possible
    values: {true, false}

    --read-filter,-RF:String Read filters to be applied before analysis This argument may be specified 0 or more
    times. Default value: null. Possible Values: {AlignmentAgreesWithHeaderReadFilter,
    AllowAllReadsReadFilter, AmbiguousBaseReadFilter, CigarContainsNoNOperator,
    FirstOfPairReadFilter, FragmentLengthReadFilter, GoodCigarReadFilter,
    HasReadGroupReadFilter, IntervalOverlapReadFilter, LibraryReadFilter, MappedReadFilter,
    MappingQualityAvailableReadFilter, MappingQualityNotZeroReadFilter,
    MappingQualityReadFilter, MatchingBasesAndQualsReadFilter, MateDifferentStrandReadFilter,
    MateDistantReadFilter, MateOnSameContigOrNoMappedMateReadFilter,
    MateUnmappedAndUnmappedReadFilter, MetricsReadFilter,
    NonChimericOriginalAlignmentReadFilter, NonZeroFragmentLengthReadFilter,
    NonZeroReferenceLengthAlignmentReadFilter, NotDuplicateReadFilter,
    NotOpticalDuplicateReadFilter, NotProperlyPairedReadFilter,
    NotSecondaryAlignmentReadFilter, NotSupplementaryAlignmentReadFilter,
    OverclippedReadFilter, PairedReadFilter, PassesVendorQualityCheckReadFilter,
    PlatformReadFilter, PlatformUnitReadFilter, PrimaryLineReadFilter,
    ProperlyPairedReadFilter, ReadGroupBlackListReadFilter, ReadGroupReadFilter,
    ReadLengthEqualsCigarLengthReadFilter, ReadLengthReadFilter, ReadNameReadFilter,
    ReadStrandFilter, SampleReadFilter, SecondOfPairReadFilter, SeqIsStoredReadFilter,
    SoftClippedReadFilter, ValidAlignmentEndReadFilter, ValidAlignmentStartReadFilter,
    WellformedReadFilter}

    --read-index,-read-index:String
    Indices to use for the read inputs. If specified, an index must be provided for every read
    input and in the same order as the read inputs. If this argument is not specified, the
    path to the index for each input will be inferred automatically. This argument may be
    specified 0 or more times. Default value: null.

    --read-validation-stringency,-VS:ValidationStringency
    Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default
    stringency value SILENT can improve performance when processing a BAM file in which
    variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default
    value: SILENT. Possible values: {STRICT, LENIENT, SILENT}

    --seconds-between-progress-updates,-seconds-between-progress-updates:Double
    Output traversal statistics every time this many seconds elapse Default value: 10.0.

    --sequence-dictionary,-sequence-dictionary:String
    Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a
    .dict file. Default value: null.

    --sites-only-vcf-output:Boolean
    If true, don't emit genotype fields when writing vcf file output. Default value: false.
    Possible values: {true, false}

    --tmp-dir:GATKPathSpecifier Temp directory to use. Default value: null.

    --use-jdk-deflater,-jdk-deflater:Boolean
    Whether to use the JdkDeflater (as opposed to IntelDeflater) Default value: false.
    Possible values: {true, false}

    --use-jdk-inflater,-jdk-inflater:Boolean
    Whether to use the JdkInflater (as opposed to IntelInflater) Default value: false.
    Possible values: {true, false}

    --use-original-qualities,-OQ:Boolean
    Use the base quality scores from the OQ tag Default value: false. Possible values: {true,
    false}

    --verbosity,-verbosity:LogLevel
    Control verbosity of logging. Default value: INFO. Possible values: {ERROR, WARNING,
    INFO, DEBUG}

    --version:Boolean display the version number for this tool Default value: false. Possible values: {true,
    false}


    Advanced Arguments:

    --disable-tool-default-read-filters,-disable-tool-default-read-filters:Boolean
    Disable all tool default read filters (WARNING: many tools will not function correctly
    without their default read filters on) Default value: false. Possible values: {true,
    false}

    --showHidden,-showHidden:Boolean
    display hidden arguments Default value: false. Possible values: {true, false}

    Conditional Arguments for readFilter:

    Valid only if "AmbiguousBaseReadFilter" is specified:
    --ambig-filter-bases:Integer Threshold number of ambiguous bases. If null, uses threshold fraction; otherwise,
    overrides threshold fraction. Default value: null. Cannot be used in conjuction with
    argument(s) maxAmbiguousBaseFraction

    --ambig-filter-frac:Double Threshold fraction of ambiguous bases Default value: 0.05. Cannot be used in conjuction
    with argument(s) maxAmbiguousBases

    Valid only if "FragmentLengthReadFilter" is specified:
    --max-fragment-length:Integer Maximum length of fragment (insert size) Default value: 1000000.

    --min-fragment-length:Integer Minimum length of fragment (insert size) Default value: 0.

    Valid only if "IntervalOverlapReadFilter" is specified:
    --keep-intervals:String One or more genomic intervals to keep This argument must be specified at least once.
    Required.

    Valid only if "LibraryReadFilter" is specified:
    --library,-library:String Name of the library to keep This argument must be specified at least once. Required.

    Valid only if "MappingQualityReadFilter" is specified:
    --maximum-mapping-quality:Integer
    Maximum mapping quality to keep (inclusive) Default value: null.

    --minimum-mapping-quality:Integer
    Minimum mapping quality to keep (inclusive) Default value: 10.

    Valid only if "MateDistantReadFilter" is specified:
    --mate-too-distant-length:Integer
    Minimum start location difference at which mapped mates are considered distant Default
    value: 1000.

    Valid only if "OverclippedReadFilter" is specified:
    --dont-require-soft-clips-both-ends:Boolean
    Allow a read to be filtered out based on having only 1 soft-clipped block. By default,
    both ends must have a soft-clipped block, setting this flag requires only 1 soft-clipped
    block Default value: false. Possible values: {true, false}

    --filter-too-short:Integer Minimum number of aligned bases Default value: 30.

    Valid only if "PlatformReadFilter" is specified:
    --platform-filter-name:String Platform attribute (PL) to match This argument must be specified at least once. Required.

    Valid only if "PlatformUnitReadFilter" is specified:
    --black-listed-lanes:String Platform unit (PU) to filter out This argument must be specified at least once. Required.

    Valid only if "ReadGroupBlackListReadFilter" is specified:
    --read-group-black-list:StringA read group filter expression in the form "attribute:value", where "attribute" is a two
    character read group attribute such as "RG" or "PU". This argument must be specified at
    least once. Required.

    Valid only if "ReadGroupReadFilter" is specified:
    --keep-read-group:String The name of the read group to keep Required.

    Valid only if "ReadLengthReadFilter" is specified:
    --max-read-length:Integer Keep only reads with length at most equal to the specified value Required.

    --min-read-length:Integer Keep only reads with length at least equal to the specified value Default value: 1.

    Valid only if "ReadNameReadFilter" is specified:
    --read-name:String Keep only reads with this read name Required.

    Valid only if "ReadStrandFilter" is specified:
    --keep-reverse-strand-only:Boolean
    Keep only reads on the reverse strand Required. Possible values: {true, false}

    Valid only if "SampleReadFilter" is specified:
    --sample,-sample:String The name of the sample(s) to keep, filtering out all others This argument must be
    specified at least once. Required.

    Valid only if "SoftClippedReadFilter" is specified:
    --invert-soft-clip-ratio-filter:Boolean
    Inverts the results from this filter, causing all variants that would pass to fail and
    visa-versa. Default value: false. Possible values: {true, false}

    --soft-clipped-leading-trailing-ratio:Double
    Threshold ratio of soft clipped bases (leading / trailing the cigar string) to total bases
    in read for read to be filtered. Default value: null. Cannot be used in conjuction with
    argument(s) minimumSoftClippedRatio

    --soft-clipped-ratio-threshold:Double
    Threshold ratio of soft clipped bases (anywhere in the cigar string) to total bases in
    read for read to be filtered. Default value: null. Cannot be used in conjuction with
    argument(s) minimumLeadingTrailingSoftClippedRatio


    ***********************************************************************

    A USER ERROR has occurred: Invalid argument '/mnt/......................./.................../Test_data/New_test_fastQfiles/out/Aligned_Bam/trial/UGTB015-pe.sorted.marked_duplicates.bam'.

    ***********************************************************************
    org.broadinstitute.barclay.argparser.CommandLineException: Invalid argument '/mnt/................/................../Test_data/New_test_fastQfiles/out/Aligned_Bam/trial/UGTB015-pe.sorted.marked_duplicates.bam'.
    at org.broadinstitute.barclay.argparser.CommandLineArgumentParser.setPositionalArgument(CommandLineArgumentParser.java:600)
    at org.broadinstitute.barclay.argparser.CommandLineArgumentParser.parseArguments(CommandLineArgumentParser.java:432)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.parseArgs(CommandLineProgram.java:232)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:206)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)

  • danilovkiri

    Amujal Marion, can you spot something strange at the very beginning of the log, where GATK explicitly shows the command it is running? Look at the `BaseRecalibrator -R -I ...` part of the command reproduced here. Specifically, where is your $REF, which must come right after the -R argument? This is exactly the reason for the observed behaviour, which I can reproduce on my side.

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx4g -Djava.io.tmpdir=/mnt/............./.............../Test_data/New_test_fastQfiles/out/Aligned_Bam/trial/tmpdir -jar /mnt/............./................/gatk-4.1.7.0/gatk-package-4.1.7.0-local.jar BaseRecalibrator -R -I /mnt/.................../..................../Test_data/New_test_fastQfiles/out/Aligned_Bam/trial/UGTB015-pe.sorted.marked_duplicates.bam --known-sites /mnt/.................../...................../Known_sites/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf --known-sites /mnt/................./................/Known_sites/resources_broad_hg38_v0_Mills_and_1000G_gold_standard.indels.hg38.vcf --known-sites /mnt/......................./..................../Known_sites/resources_broad_hg38_v0_1000G_phase1.snps.high_confidence.hg38.vcf -O UGTB015_recal_data.table
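
    A missing value after -R is what an empty or unset shell variable produces when it is expanded without quotes. The sketch below reproduces this in plain bash and adds a guard. The variable name `REF` follows the comment above, while `require_set` and `sample.bam` are hypothetical names used only for illustration. With the reference missing, the parser most likely takes `-I` as the value of `-R` and is left with the BAM path as a stray positional argument, which would be consistent with `setPositionalArgument` in the stack trace.

    ```shell
    #!/usr/bin/env bash
    # An empty REF, as happens when the variable is misspelled or never assigned.
    REF=""

    # Unquoted expansion: the empty value vanishes from the argument list.
    set -- BaseRecalibrator -R $REF -I sample.bam
    n_unquoted=$#
    echo "unquoted: $n_unquoted arguments: $*"

    # Hypothetical guard: report an error when a required variable is empty.
    require_set() {
        local name="$1" value="$2"
        if [ -z "$value" ]; then
            echo "ERROR: \$$name is empty; set it before running GATK" >&2
            return 1
        fi
    }

    # Record whether the guard would let the run proceed.
    if require_set REF "$REF"; then
        guard_status="ok"
    else
        guard_status="blocked"
    fi
    echo "guard: $guard_status"
    ```

    Note that `set -u` catches variables that were never set, but not ones assigned an empty string, so an explicit check like this covers both cases.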

  • Amujal Marion

    Thank you so much!

    This has helped me resolve the issue.

  • Bhanu Gandham

    Good catch, danilovkiri! Thank you!

