Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

PlotModeledSegments leads to java.lang.IllegalArgumentException related to inconsistency

5 comments

  • Gökalp Çelik

    In this plotting step, are you using the hets.tsv file that ModelSegments produces?

    --allelic-counts <File>       Input file containing allelic counts at heterozygous sites (.hets.tsv output of
                                  ModelSegments).  Default value: null.

    If not, you need to replace the allelic counts file you are currently passing with that .hets.tsv file.

    Regards. 

  • Yuwei Bao

    Hi Gökalp:

    Thanks for your reply.

    Yes, I am using the sample.allelicCounts.tsv file created by CollectAllelicCounts.

    Here are example entries from DFG2.allelicCounts.tsv:

    CONTIG  POSITION        REF_COUNT       ALT_COUNT       REF_NUCLEOTIDE  ALT_NUCLEOTIDE
    2L      4954    7       0       G       N
    2L      4955    8       0       C       N
    2L      4956    8       0       G       N
    2L      4957    9       0       T       N
    2L      4958    9       0       A       N
    2L      4959    8       0       T       N
    2L      4960    8       0       G       N
    2L      4961    8       0       C       N
    2L      4962    8       0       G       N


    Prior to this step, I ran the following tools:

    CollectAllelicCounts
    Example entries were listed above.

    ModelSegments
    Example entries of DFG2.cr.seg
    CONTIG  START   END     NUM_POINTS_COPY_RATIO   MEAN_LOG2_COPY_RATIO
    2L      5001    1891000 1730    -0.033812
    2L      1891001 1907000 8       1.171973
    2L      1907001 3391000 1368    -0.007153
    2L      3391001 3433000 41      0.914498
    2L      3435001 23513712        16800   0.014652
    2R      1001    5796000 2477    0.075654
    2R      5796001 17813000        10998   -0.009842
    2R      17813001        17841000        28      0.916708
    2R      17841001        25286936        6823    -0.024294
     
    CallCopyRatioSegments
    Example entries of DFG2.called.seg
    CONTIG  START   END     NUM_POINTS_COPY_RATIO   MEAN_LOG2_COPY_RATIO    CALL
    2L      5001    1891000 1730    -0.033812       0
    2L      1891001 1907000 8       1.171973        +
    2L      1907001 3391000 1368    -0.007153       0
    2L      3391001 3433000 41      0.914498        +
    2L      3435001 23513712        16800   0.014652        0
    2R      1001    5796000 2477    0.075654        0
    2R      5796001 17813000        10998   -0.009842       0
    2R      17813001        17841000        28      0.916708        +
    2R      17841001        25286936        6823    -0.024294       0


    Please let me know what else I should check. Thanks a lot!

    Yuwei

  • Gökalp Çelik

    Actually, you need to use the allelic counts file created by the ModelSegments tool, not the one you generated with CollectAllelicCounts. ModelSegments filters the allelic counts down to the sites that are usable by the downstream models, so the model contains only the sites that ModelSegments selected. Among its outputs there should be a file whose name ends in hets.tsv. That is the file you need to use when plotting modeled segments.

    Regards. 
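
A quick way to see why the CollectAllelicCounts table quoted earlier in the thread differs from what the plotting step expects is to count how many sites have reads supporting both alleles. The sketch below is only a rough diagnostic written for this thread: the function name `summarize_allelic_counts` is hypothetical, and the "both counts above zero" test is an assumption used for illustration, not the filtering that ModelSegments actually applies.

```python
import csv
import io


def summarize_allelic_counts(tsv_text):
    """Return (total_sites, sites_with_both_alleles) for an allelic-counts table.

    Heuristic only: a site is counted when REF_COUNT and ALT_COUNT are both > 0.
    This is NOT the ModelSegments filter; it is an assumption for illustration.
    """
    # Drop empty lines and SAM-style header lines (starting with '@').
    body = [line for line in tsv_text.splitlines()
            if line.strip() and not line.startswith("@")]
    reader = csv.DictReader(io.StringIO("\n".join(body)), delimiter="\t")
    total = 0
    both = 0
    for row in reader:
        total += 1
        if int(row["REF_COUNT"]) > 0 and int(row["ALT_COUNT"]) > 0:
            both += 1
    return total, both


# First three rows quoted in the thread (tab-separated).
EXAMPLE = (
    "CONTIG\tPOSITION\tREF_COUNT\tALT_COUNT\tREF_NUCLEOTIDE\tALT_NUCLEOTIDE\n"
    "2L\t4954\t7\t0\tG\tN\n"
    "2L\t4955\t8\t0\tC\tN\n"
    "2L\t4956\t8\t0\tG\tN\n"
)

total, both = summarize_allelic_counts(EXAMPLE)
print(f"{both} of {total} sites have reads supporting both alleles")
```

For the three quoted rows this reports 0 of 3, which is consistent with the advice above: the raw CollectAllelicCounts table is not restricted to the heterozygous sites the model was fitted on.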

  • Yuwei Bao

    Thank you very much!! That leads to a plot! 

    However, I am not sure how to get rid of the extra chromosomes. For example, PlotDenoisedCopyRatios produces this result:


    These are the chromosomes I want to show.

    The result of PlotModeledSegments:


    Is there a way to get rid of the chromosomes beyond X?

    Also, to interpret this result: I observed consistency within each chromosome, and some segments were identified on each one. What other conclusions can I draw from this kind of result?

    Thanks a lot!

    Yuwei

  • Gökalp Çelik

    If you do not want any chromosomes beyond X, you need to exclude them when collecting read counts and allelic counts.

    Regards. 
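
One possible way to follow this advice is to build a list of the contigs to keep and pass it as an interval restriction when re-running the collection steps. The sketch below assumes a sequence dictionary with standard `@SQ` lines. The helper names, the contig set, and the placeholder lengths are all hypothetical and are not taken from the poster's data. It also assumes the resulting text is saved to a file such as `keep_contigs.list` (a hypothetical name) and supplied through the `-L` interval argument.

```python
def contigs_from_dict(dict_text):
    """Return contig names from the @SQ lines of a sequence dictionary, in order."""
    names = []
    for line in dict_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def keep_contigs(dict_text, wanted):
    """Keep only the wanted contigs, preserving dictionary order."""
    wanted = set(wanted)
    return [name for name in contigs_from_dict(dict_text) if name in wanted]


# Hypothetical dictionary: contig names are illustrative, lengths are placeholders.
EXAMPLE_DICT = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:2L\tLN:1000\n"
    "@SQ\tSN:2R\tLN:1000\n"
    "@SQ\tSN:X\tLN:1000\n"
    "@SQ\tSN:extra_contig_1\tLN:1000\n"
)

kept = keep_contigs(EXAMPLE_DICT, ["2L", "2R", "X"])
# One contig name per line; this text would be written to the intervals file.
interval_list_text = "\n".join(kept) + "\n"
print(interval_list_text, end="")
```

Keeping the dictionary order means the list stays sorted the same way as the reference, which avoids ordering surprises when the file is used as an interval list.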

