Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

(How to) Call rare germline copy number variants

10 comments

  • Enrico Cocchi

    Is there any way to get the log2 output instead of the CN from PostprocessGermlineCNVCalls?

  • Calvin Hung

    Hi, I believe the *.tsv files in tutorial_11684.tar.gz, whether downloaded from Google Drive or from the FTP site, are in a deprecated format and cannot be run through GermlineCNVCaller since GATK v4.1.x.x. I managed to patch the format and fix them myself. You might want to update the tutorial files as well.

  • Ruqian Lyu

    Hi, 

    Thanks for the great tutorial.

    I'm trying to run the pipeline on 300 low-coverage (~5X) samples. At the GermlineCNVCaller step, I see that the tool keeps increasing the number of epochs because CNV calling has not converged; it is now at 50 epochs. Is this expected, or is it possible that the optimisation procedure has been "trapped"?


  • Ju Jose

    Thanks for the tutorial! Could you help me understand how the NA19017.chr20sub.bam file was prepared? Is it just BWA-mapped reads, or did it also go through sorting and duplicate marking?

  • astiac

    I would like to know where I can get the following files:

    mappability-track regions file (in either .bed or .bed.gz format).
    segmental-duplication-track regions file (in either .bed or .bed.gz format).
    contig-ploidy-priors_contig_ploidy_priors.tsv 
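For the third file, the contig-ploidy priors table can be written by hand rather than downloaded. Assuming the layout shown in the DetermineGermlineContigPloidy documentation (a CONTIG_NAME column followed by PLOIDY_PRIOR_0 to PLOIDY_PRIOR_3, each row summing to 1), the sketch below writes one for GRCh38-style contig names. The function name is hypothetical, the prior values are illustrative and modelled on that documentation's example, and the contig list is an assumption that should be adjusted to match the reference and intervals actually used.

```bash
#!/usr/bin/env bash
# Sketch: write a contig-ploidy priors table. Assumed layout (from the
# DetermineGermlineContigPloidy docs): tab-separated, CONTIG_NAME followed by
# PLOIDY_PRIOR_0..PLOIDY_PRIOR_3, each row summing to 1. Values are
# illustrative only; contig names must match your reference.
set -euo pipefail

write_ploidy_priors() {
  local out=$1 i
  {
    printf 'CONTIG_NAME\tPLOIDY_PRIOR_0\tPLOIDY_PRIOR_1\tPLOIDY_PRIOR_2\tPLOIDY_PRIOR_3\n'
    # Autosomes: strongly favour ploidy 2.
    for (( i = 1; i <= 22; i++ )); do
      printf 'chr%d\t0.01\t0.01\t0.97\t0.01\n' "$i"
    done
    # chrX: ploidy 1 (XY) or 2 (XX), about equally likely.
    printf 'chrX\t0.01\t0.49\t0.49\t0.01\n'
    # chrY: ploidy 0 (XX) or 1 (XY).
    printf 'chrY\t0.50\t0.50\t0.00\t0.00\n'
  } > "$out"
}
```

For example, write_ploidy_priors contig_ploidy_priors.tsv, then pass the file to DetermineGermlineContigPloidy with --contig-ploidy-priors.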

  • jfarrell

    The link below, from the tutorial above, is broken. Has the tutorial been updated to match the latest WDL pipeline?

    ftp://ftp.broadinstitute.org/tutorials/dataset

     

    Download tutorial_11684.tar.gz either from the GoogleDrive or from the FTP site. The bundle includes data for Notebook #11685 and Notebook #11686. To access the ftp site, leave the password field blank. If the GoogleDrive link is broken, please let us know. The tutorial also requires the GRCh38 reference FASTA, dictionary and index. These are available from the GATK Resource Bundle. The example data is from the 1000 Genomes project Phase 3 aligned to GRCh38.

  • Chipmunks

    Thanks for the tutorial! I'm having some trouble using it to call CNVs. Could you give me some suggestions? Here is my question:

    I can't generate the ploidy-calls directory or ploidy-calls/SAMPLE_0 when I run DetermineGermlineContigPloidy.
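A hedged sketch for checking this: as in the tutorial commands, DetermineGermlineContigPloidy writes its outputs as <output>/<output-prefix>-model and <output>/<output-prefix>-calls, so the directory is only called ploidy-calls when --output-prefix is ploidy, which is the name the GermlineCNVCaller command later in this thread passes to --contig-ploidy-calls. The function below runs the tool over every count file in a directory and then checks for SAMPLE_0. The function name, paths and the GATK_CMD variable (a hook for dry runs, defaulting to gatk) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: run DetermineGermlineContigPloidy on every count file in a
# directory, then confirm that <out_dir>/<prefix>-calls/SAMPLE_0 exists.
# Paths are placeholders; GATK_CMD is a hypothetical hook (defaults to gatk).
set -euo pipefail

run_ploidy() {
  local cvg_dir=$1 intervals=$2 priors=$3 out_dir=$4 prefix=${5:-ploidy}
  local -a gatk_cmd inputs=()
  local f
  read -r -a gatk_cmd <<< "${GATK_CMD:-gatk}"

  # Collect "-I <file>" pairs for every count file.
  for f in "$cvg_dir"/*.tsv; do
    [ -e "$f" ] || continue
    inputs+=(-I "$f")
  done
  (( ${#inputs[@]} > 0 )) || { echo "no .tsv files in $cvg_dir" >&2; return 1; }

  mkdir -p "$out_dir"
  "${gatk_cmd[@]}" DetermineGermlineContigPloidy \
    -L "$intervals" \
    --interval-merging-rule OVERLAPPING_ONLY \
    "${inputs[@]}" \
    --contig-ploidy-priors "$priors" \
    --output "$out_dir" \
    --output-prefix "$prefix" \
    --verbosity DEBUG

  # Calls should land in <out_dir>/<prefix>-calls/SAMPLE_<i>, one per input.
  if [ ! -d "$out_dir/$prefix-calls/SAMPLE_0" ]; then
    echo "missing $out_dir/$prefix-calls/SAMPLE_0; check the tool log" >&2
    return 1
  fi
}
```

For example, run_ploidy cvg intervals.interval_list contig_ploidy_priors.tsv . ploidy should leave ./ploidy-model and ./ploidy-calls behind if the run succeeds; if the check fails, the DEBUG log usually says why.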

  • Marcela Martinez

    Hi,

    I am planning to run the germline CNV workflow in Docker on 106 targeted exomes, without using the WDL pipeline. I have already run the preprocessing step, and I wonder how to run the next step, CollectReadCounts, over all of those exomes using the script below. Should I use a for loop to iterate over each BAM file and produce one HDF5 result per sample?

    In addition, is it necessary to build the cohort model from "normal" samples, or can it be built from the same batch of affected samples?

    Thanks

    gatk CollectReadCounts \
              -I sample.bam \
              -L intervals.interval_list \
              --interval-merging-rule OVERLAPPING_ONLY \
              -O sample.counts.hdf5
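A for loop is one straightforward way to do this. The sketch below wraps the CollectReadCounts command above in a bash function that runs it once per BAM in a directory and names each HDF5 output after its BAM. The function name, the directory layout and the GATK_CMD variable (a hook for dry runs, defaulting to gatk) are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch: one CollectReadCounts run per BAM, each writing its own HDF5 file.
# Directory names and the interval list are placeholders; GATK_CMD is a
# hypothetical hook (set GATK_CMD="echo gatk" to print commands only).
set -euo pipefail

run_collect_read_counts() {
  local bam_dir=$1 out_dir=$2 intervals=$3
  local -a gatk_cmd
  local bam sample
  read -r -a gatk_cmd <<< "${GATK_CMD:-gatk}"
  mkdir -p "$out_dir"

  for bam in "$bam_dir"/*.bam; do
    [ -e "$bam" ] || continue            # glob matched nothing
    sample=$(basename "$bam" .bam)       # bams/S01.bam -> S01
    "${gatk_cmd[@]}" CollectReadCounts \
      -I "$bam" \
      -L "$intervals" \
      --interval-merging-rule OVERLAPPING_ONLY \
      -O "$out_dir/$sample.counts.hdf5"
  done
}
```

For example, run_collect_read_counts bams counts intervals.interval_list, or prefix it with GATK_CMD="echo gatk" to print the commands first. Each sample's counts are collected independently, so the per-file runs could also be spread across parallel jobs.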

  • 梁家成

    Hi,

    In step 4, the command includes all of the input files at once, and running it takes a long time. Is it possible to run a separate command for each input file? Would the results be consistent with those from the single command that includes all of the input files?

    Thanks

    gatk GermlineCNVCaller \
            --run-mode COHORT \
            -L scatter-sm/twelve_1of2.interval_list \
            -I cvg/HG00096.tsv -I cvg/HG00268.tsv -I cvg/HG00419.tsv -I cvg/HG00759.tsv \
            -I cvg/HG01051.tsv -I cvg/HG01112.tsv -I cvg/HG01500.tsv -I cvg/HG01565.tsv \
            -I cvg/HG01583.tsv -I cvg/HG01595.tsv -I cvg/HG01879.tsv -I cvg/HG02568.tsv \
            -I cvg/HG02922.tsv -I cvg/HG03006.tsv -I cvg/HG03052.tsv -I cvg/HG03642.tsv \
            -I cvg/HG03742.tsv -I cvg/NA18525.tsv -I cvg/NA18939.tsv -I cvg/NA19017.tsv \
            -I cvg/NA19625.tsv -I cvg/NA19648.tsv -I cvg/NA20502.tsv -I cvg/NA20845.tsv \
            --contig-ploidy-calls ploidy-calls \
            --annotated-intervals twelveregions.annotated.tsv \
            --interval-merging-rule OVERLAPPING_ONLY \
            --output cohort24-twelve \
            --output-prefix cohort24-twelve_1of2 \
            --verbosity DEBUG
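A hedged sketch related to the run-time question: the command above is already one shard (twelve_1of2) of an interval scatter, and each shard can be launched as its own run over all samples. The function below loops over the shard interval lists and collects the -I arguments from cvg/ instead of listing them by hand. It assumes the file layout and names used in the command above (cvg/*.tsv, scatter-sm/twelve_*of*.interval_list, ploidy-calls, twelveregions.annotated.tsv); the function name and GATK_CMD (a dry-run hook defaulting to gatk) are hypothetical. It does not show whether per-sample runs would reproduce the joint result.

```bash
#!/usr/bin/env bash
# Sketch: run GermlineCNVCaller COHORT mode once per interval shard, with
# every sample's counts in each run. Paths mirror the command above and are
# assumptions about the local layout. GATK_CMD (hypothetical) defaults to
# "gatk"; set GATK_CMD="echo gatk" to print the commands instead.
set -euo pipefail

run_cohort_shards() {
  local cvg_dir=$1 scatter_dir=$2
  local -a gatk_cmd inputs=()
  local f shard name
  read -r -a gatk_cmd <<< "${GATK_CMD:-gatk}"

  # Collect "-I <file>" pairs for every count file in the directory.
  for f in "$cvg_dir"/*.tsv; do
    [ -e "$f" ] || continue
    inputs+=(-I "$f")
  done
  (( ${#inputs[@]} > 0 )) || { echo "no .tsv files in $cvg_dir" >&2; return 1; }

  # One run per shard; the runs do not depend on each other, so they could
  # also be submitted as separate jobs.
  for shard in "$scatter_dir"/twelve_*of*.interval_list; do
    [ -e "$shard" ] || continue
    name=$(basename "$shard" .interval_list)   # e.g. twelve_1of2
    "${gatk_cmd[@]}" GermlineCNVCaller \
      --run-mode COHORT \
      -L "$shard" \
      "${inputs[@]}" \
      --contig-ploidy-calls ploidy-calls \
      --annotated-intervals twelveregions.annotated.tsv \
      --interval-merging-rule OVERLAPPING_ONLY \
      --output cohort24-twelve \
      --output-prefix "cohort24-$name" \
      --verbosity DEBUG
  done
}
```

For example, run_cohort_shards cvg scatter-sm from the tutorial directory reproduces the command above for twelve_1of2 and issues the matching command for each other shard.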

     

  • Chris Pyatt

    I'm trying to run this on a WES cohort with 200k intervals, split into 5k groups by the scatter method described in section 4.2.

    When I compare scattered and non-scattered results (on a smaller subset that is small enough to run without scattering), the called segments are not the same. I presume this is because CNVs that span a boundary between scatter groups are being missed. How can I get around this?

    Thank you
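A hedged sketch of one thing to check: in the tutorial workflow, PostprocessGermlineCNVCalls takes every shard's model and calls directories for one sample in a single run and writes that sample's genotyped segments. If the shards are currently post-processed separately, combining them as below may be worth comparing against the unsharded run; this is not shown to remove all differences, since each shard still trains its own model. The function assumes the tutorial's <prefix>_<i>of<N>-model / -calls naming, GRCh38 chrX/chrY allosomes and a ploidy-calls directory in the working directory; the function name, output file names and GATK_CMD (a dry-run hook defaulting to gatk) are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: post-process one sample using ALL scatter shards in one run.
# Assumes shard outputs named <cohort_dir>/<prefix>_<i>of<N>-model and
# -calls, as in the tutorial commands; GATK_CMD is a hypothetical dry-run hook.
set -euo pipefail

postprocess_sample() {
  local cohort_dir=$1 prefix=$2 n_shards=$3 sample_index=$4 dict=$5
  local -a gatk_cmd shard_args=()
  local i shard
  read -r -a gatk_cmd <<< "${GATK_CMD:-gatk}"
  (( n_shards > 0 )) || { echo "need at least one shard" >&2; return 1; }

  # Add shards in order (1ofN, 2ofN, ...). Each shard's model and calls
  # paths are added together so both lists stay in the same order.
  for (( i = 1; i <= n_shards; i++ )); do
    shard="$cohort_dir/${prefix}_${i}of${n_shards}"
    shard_args+=(--model-shard-path "$shard-model" --calls-shard-path "$shard-calls")
  done

  # --output-denoised-copy-ratios is in recent GATK 4 releases; drop it on
  # older versions that do not accept it.
  "${gatk_cmd[@]}" PostprocessGermlineCNVCalls \
    "${shard_args[@]}" \
    --contig-ploidy-calls ploidy-calls \
    --allosomal-contig chrX --allosomal-contig chrY \
    --sample-index "$sample_index" \
    --sequence-dictionary "$dict" \
    --output-genotyped-intervals "genotyped-intervals-sample$sample_index.vcf.gz" \
    --output-genotyped-segments "genotyped-segments-sample$sample_index.vcf.gz" \
    --output-denoised-copy-ratios "denoised-copy-ratios-sample$sample_index.tsv"
}
```

For example, postprocess_sample cohort cohort-wes 40 0 ref.dict would combine 40 shards for the first sample; the call is repeated once per sample index, matching the SAMPLE_<i> directories.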
