Genome Analysis Toolkit

Variant Discovery in High-Throughput Sequencing Data


Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

newbie question: how to split a genome bam file by using picard

8 comments

  • Bhanu Gandham

    Hi,

    The GATK support team is focused on resolving questions about GATK tool-specific errors and abnormal results from the tools. For all other questions, such as this one, we are building a backlog to work through when we have the capacity.

    Please continue to post your questions because we will be mining them for improvements to documentation, resources, and tools.

    We cannot guarantee a reply; however, we ask other community members to help out if they know the answer.

    For context, check out our support policy.

  • Louis Bergelson

    You can use the GATK tool PrintReads to select regions of a BAM.

    ```
    gatk PrintReads --input original.bam -L Y --output yOnly.bam
    ```
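
    A minimal sketch of the same step wrapped in a shell function, assuming the BAM has no index yet (as far as I know, GATK will not traverse intervals with `-L` on an unindexed BAM). The function name `subset_with_gatk` and the `DRY_RUN` switch are made up for illustration; `BuildBamIndex` is the Picard tool that ships with GATK4.

    ```shell
    # Sketch: index a BAM, then keep only the reads on one contig.
    # subset_with_gatk and DRY_RUN are illustrative names, not part of GATK.
    subset_with_gatk() {
      local in_bam=$1 contig=$2 out_bam=$3
      local prefix=""
      # With DRY_RUN=1 the commands are printed instead of executed
      if [ "${DRY_RUN:-0}" = "1" ]; then prefix="echo"; fi
      # Interval queries (-L) need an index for the input BAM
      $prefix gatk BuildBamIndex -I "$in_bam"
      # -L must match the contig name used in the BAM (e.g. Y or chrY)
      $prefix gatk PrintReads --input "$in_bam" -L "$contig" --output "$out_bam"
    }
    ```

    Running `DRY_RUN=1 subset_with_gatk original.bam Y yOnly.bam` only prints the two commands, which is a cheap way to check them before running them for real.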

  • danilovkiri

    Hi You Meng,

    The most convenient and conventional solution is to use samtools for any operations on SAM/BAM files (the documentation is available at http://www.htslib.org/doc/samtools-view.html).

    To subset a BAM file using samtools, you need to index it first (`samtools index -@ <threads> <input.bam>`), then run `samtools view -b -o <output.bam> -@ <threads> <input.bam> chrY`. Note that the `-@` argument only helps to speed things up, and only if you have more than one CPU. It takes an integer: the number of additional threads to use, which should not exceed the number of CPUs/threads available. Unless you specify `-b`, the output will be in SAM format (plain text).
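
    The two steps above can be sketched as one shell function. This is only an illustration: `extract_contig`, `run`, and the `DRY_RUN` switch are names I made up here, and the script assumes samtools is installed and on your PATH.

    ```shell
    # Sketch: extract one contig from a BAM with samtools.
    # Set DRY_RUN=1 to print the commands instead of running them.
    run() {
      if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
      else
        "$@"
      fi
    }

    extract_contig() {
      local in_bam=$1 contig=$2 out_bam=$3 threads=${4:-1}
      # Region queries need an index next to the input BAM
      run samtools index -@ "$threads" "$in_bam"
      # -b keeps the output in BAM format instead of plain-text SAM
      run samtools view -b -@ "$threads" -o "$out_bam" "$in_bam" "$contig"
    }
    ```

    For example, `extract_contig input.bam chrY chrY.bam 2` indexes `input.bam` and writes the chrY reads to `chrY.bam` using two additional threads.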

    Also be aware of chromosome naming in your BAM file. Chromosomes can be named chrN, N, etc. To find out how the chromosomes are named in your BAM, run `samtools view <input.bam> | head` and look at the values in the 3rd field.
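
    As an alternative to reading the 3rd field by eye, the contig names can also be pulled from the BAM header, where each `@SQ` line carries an `SN:` tag. The helper names `list_contigs` and `pick_y_contig` below are hypothetical, and the pattern assumes the Y chromosome is called either `Y` or `chrY`; treat this as a sketch.

    ```shell
    # Sketch: read SAM header text on stdin and print the contig names.
    list_contigs() {
      awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
      }'
    }

    # Print the name used for the Y chromosome (Y or chrY), if present.
    pick_y_contig() {
      list_contigs | grep -E '^(chr)?Y$' | head -n 1
    }
    ```

    With samtools installed, `samtools view -H <input.bam> | pick_y_contig` prints the name to pass as the region argument.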

  • You Meng

    Thanks Louis and danilovkiri,

    Because I don't have experience with Linux, I tried to find a tool on the Windows platform to do the extraction. It looks like Picard is the only tool I can find so far for Windows. Is it true that Picard does not have a tool/command to extract an individual chromosome from a BAM? (I went through each tool name in the list, but none of them seems like a command for this task.) I am not from a bio background, so all the terminology is really hard for me. (Do I need to do all that alignment, comparison...? Are all those steps mandatory before I can extract an individual chromosome? Sorry, I have no idea; I just want to send my Y to some company for analysis, that's all, at least for now.) Maybe I am wrong in thinking this is just the most basic or "first step" operation in genome analysis.

    Really appreciate your help.

  • danilovkiri

    Does anyone around you have Linux/macOS? It is really the easiest way, since Windows is not supported by most bioinformatics tools (not because the developers are Linux fans, although they are; the reason is simply the environment and convenience). Picard does not have the functionality you are looking for. If you are familiar with AWS/DigitalOcean/Hetzner/etc., you can try their cloud servers, but that still requires some prior Linux knowledge. As I said, you had better find a friend with Ubuntu/macOS, install samtools (pretty easy, googling can help), and run the commands mentioned above.

    As far as I understand, your BAM file comes from a genetic sequencing provider, so it already contains aligned sequencing data. You don't have to perform any extra procedures.

  • You Meng

    Got it. I will try the linux approach. 

    Thank you for all the info.

  • Louis Bergelson

    To second danilovkiri, Linux is pretty much an essential skill for doing bioinformatics. The good news is that it's actually pretty easy to get started nowadays! It used to be hard and scary to set up, but now there are Linux distributions that are very easy to install and use, with lots of tutorials on the internet.

    You can set up Linux on the same computer as Windows, either by installing it in a virtual machine or as a separate OS that you can boot into (called dual booting). Windows 10 has a new feature called Windows Subsystem for Linux, which might be worth checking out; it's like a Linux emulator for Windows. I suspect it will be trickier than a proper Linux installation, since the Windows subsystem is subtly different in a bunch of ways that will probably take more expertise to understand than just installing a separate Ubuntu installation.

    Using command line tools has a bit of a learning curve, but once you get past the initial bumps it's a very fast and convenient way to do things! Good luck!

  • You Meng

    Hi Louis,

    I have installed VirtualBox and then installed Mint in it, but encountered a lot of problems. It does not work the way it is supposed to. For example, fullscreen mode does not work, so it runs in a very small screen (I have followed some posts about “Insert Guest Additions***”, but none of them worked). I also need to access the host disk to read the BAM file, but after trying every solution I could find in online posts, the "shared folder" still does not work. Then I tried to install MX Linux, then Ubuntu... each distribution has its own problems. I may have to try the dual-booting way later.

    Thank you for the suggestion.
