Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data

GATK process banner

Need Help?

Search our documentation

Community Forum

Hi, How can we help?

Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Learn more

sample_gender.report.txt is empty when running SVpreprocess restricted to chromosomes

0

8 comments

  • Avatar
    Bob Handsaker

    Much of the downstream code expects a (poorly named) "gender map" file giving the biological sex of each sample. This is used to compensate for the fact that X is significantly larger than Y and therefore it creates a bit of a batch effect if you don't normalize males and females separately.

    If you do nothing, parts of the code that are tolerant of an empty gender map will work, but the read depth estimates for males and females will be slightly biased. If you run into code paths that require a gender map, you can always supply your own (either based on self-declared sex or any other method you want to use).

    The effect on analyzing the autosomes is generally pretty small.

    Since you did the preprocessing by chromosome, another possibility would be to use the QScript for SVMergeMetadata to merge just the X and Y chromosomes together, then run the CallSampleGender utility on the merged metadata (see the SVPreprocess.q QScript or your log files to see an example). I think this should work.

    0
    Comment actions Permalink
  • Avatar
    zzq

    Dear Bob Handsaker

    Thank you. The autosomes, X and Y chromosomes are all included in the chr.list, why SVpreprocess can not predict the gender information of the sample. 

    1:1-end
    2:1-end
    3:1-end
    4:1-end
    5:1-end
    6:1-end
    7:1-end
    8:1-end
    X:1-end
    Y:1-end

    Best,

    Zheng zhuqing

    0
    Comment actions Permalink
  • Avatar
    Bob Handsaker

    I had thought from your former post that you were running preprocessing on each chromosome separately. If you are running them all together, then the sex calling should work.

    Can you clarify which reference genome and which GS reference metadata bundle you are using?

    Can you show me some info about what is in your metadata output file metadata/profiles_100Kb/rd.dat?

    0
    Comment actions Permalink
  • Avatar
    zzq

    dear Bob Handsaker

    I ran preprocessing for each sample with "-L chr.list" parameter, here all the autosomes and sex chromosomes are included in the chr.list file. Do you mean that preprocessing will be performed on each chromosome listed in the chr.list separately? If so, I wonder that how to run preprocessing for some specific chromosomes together? 

     

    Sorry, my study focus on animal in which I generated the reference metadata by myself. 

     

    Yes, the content of metadata/profiles_100Kb/rd.dat is as following:

    cat /metadata/profiles_100Kb/rd.dat 
    SAMPLE seq_1 seq_2 seq_3 seq_4 seq_5 seq_6 seq_7 seq_8 seq_9 seq_10 seq_11 seq_12 seq_13 seq_1seq_15 seq_16 seq_17 seq_18 seq_X seq_Y
    sample_1 2.0254 2.0093 2.0005 2.0130 2.0173 1.9989 2.0177 2.0211 2.0189 2.0335 1.9724 1.9435 2.0241.9990 2.0260 2.1718 1.9897 1.9783 2.0150 0.3271

    Best,

    Zheng zhuqing

    0
    Comment actions Permalink
  • Avatar
    Bob Handsaker

    OK, that all looks good.

    What are the arguments you are passing to CallSampleGender?  And what is in your ploidy map?

    CallSampleGender does a lot of defaulting to try to handle different cases, based on -L, -ploidyMapFile and -genderBedFile. It will generally compute the intersection of these to find contiguous intervals where the ploidy map is different between makes and females. You should also check that you are not passing a -genomeMaskFile argument to this utility that masks out either X or Y or both.

    I see in your rd.dat example data that Y is showing some mapped reads in what I might suspect is a female (based on X being at 2.015). We use -genderBedFile to select just "reliable" regions of X and Y to use for sex calling on human, which cleans up the results. You may want to do the same.

     

    0
    Comment actions Permalink
  • Avatar
    zzq

    Dear Bob Handsaker

    Thank you. The X and Y chromosomes are included in -L, the content of -ploidyMapFile is as following, in which I removed the PAR region from X and Y.

    X 1 1029481 F 2
    X 1 1029481 M 1
    X 6405431 125939595 F 2
    X 6405431 125939595 M 1
    Y 1 200469 F 0
    Y 1 200469 M 1
    Y 4791972 43547828 F 0
    Y 4791972 43547828 M 1
    * * * * 2

    Note, if I removed -L and ran Preprocessing on both chromosomes and scaffolds, the gender information deduced by GenomeSTRiP will be saved in the sample_gender.report.txt file. Thus, I just curious about the difference of running Preprocessing with -L or not.

     

    Thank you for your information, in the following study i will build a custom -genderBedFile for my species.

     

    Best,

    Zheng zhuqing

    0
    Comment actions Permalink
  • Avatar
    Bob Handsaker

    There is perhaps a bug, but it would be helpful to have a test case to reproduce it.

    Can you find in the log files the exact arguments being passed to CallSampleGender?

     

    0
    Comment actions Permalink
  • Avatar
    zzq

    Dear Bob Handsaker

    The log file is as follows:

    INFO 01:16:22,554 HelpFormatter - --------------------------------------------------------- 
    INFO 01:16:22,557 HelpFormatter - Program Name: org.broadinstitute.sv.apps.CallSampleGender
    INFO 01:16:22,564 HelpFormatter - Program Args: -I /path/metadata/headers.bam -configFile genstrip_parameters.txt -R ref.fa -md /path/metadata -genomeMaskFile svmask.fasta -L chr.list -ploidyMapFile ploidymap.txt -O /path/metadata/sample_gender.report.txt
    INFO 01:16:22,571 HelpFormatter - Executing as cche@s006 on Linux 3.10.0-862.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_161-b14.
    INFO 01:16:22,571 HelpFormatter - Date/Time: 2021/04/20 01:16:22
    INFO 01:16:22,571 HelpFormatter - ---------------------------------------------------------
    INFO 01:16:22,572 HelpFormatter - ---------------------------------------------------------
    INFO 01:16:22,574 CallSampleGender - Opening reference sequence ...
    INFO 01:16:22,575 CallSampleGender - Opened reference sequence.
    INFO 01:16:22,585 MetaData - Opening metadata ...
    INFO 01:16:22,585 MetaData - Adding metadata location /path/metadata ...
    INFO 01:16:22,600 MetaData - Opened metadata.
    INFO 01:16:22,819 GenderAlgorithm - There must be intervals defined for both chrX and chrY in order to call sample genders
    INFO 01:16:22,821 CommandLineProgram - Program completed.
    ------------------------------------------------------------------------------------------
    Done. ------------------------------------------------------------------------------------------
    There were no warn messages.
    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk