Contig names lost after FastaAlternateReferenceMaker
Hi,
I have a transcriptomic dataset generated from CBA_J mice. I wanted to add the CBA_J SNPs to the standard mm10 reference, which is based on the C57 strain.
To do that, I ran FastaAlternateReferenceMaker after renaming the chromosomes in the CBA_J SNP file with bcftools. They were named 1, 2, 3, ... and I renamed them chr1, chr2, chr3, ... like the GRCm38 reference FASTA.
The run completed fine, but the final FASTA has lost all of the chromosome/region names.
Here are the names in the GRCm38 reference:
cat Mus_musculus.GRCm38.dna_sm.primary_assembly.fa.fai
1 195471971 59 60 61
10 130694993 198729958 60 61
11 122082543 331603262 60 61
12 120129022 455720576 60 61
13 120421639 577851810 60 61
14 124902244 700280538 60 61
15 104043685 827264548 60 61
16 98207768 933042355 60 61
17 94987271 1032886980 60 61
18 90702639 1129457433 60 61
19 61431566 1221671843 60 61
2 182113224 1284127328 60 61
3 160039680 1469275832 60 61
4 156508116 1631982899 60 61
5 151834684 1791099543 60 61
6 149736546 1945464865 60 61
7 145441459 2097697080 60 61
8 129401213 2245562623 60 61
9 124595110 2377120582 60 61
MT 16299 2503792335 60 61
X 171031299 2503808965 60 61
Y 91744698 2677690844 60 61
JH584299.1 953012 2770964691 60 61
GL456233.1 336933 2771933657 60 61
JH584301.1 259875 2772276276 60 61
GL456211.1 241735 2772540553 60 61
GL456350.1 227966 2772786387 60 61
JH584293.1 207968 2773018223 60 61
GL456221.1 206961 2773229728 60 61
JH584297.1 205776 2773440209 60 61
JH584296.1 199368 2773649485 60 61
GL456354.1 195993 2773852246 60 61
JH584294.1 191905 2774051576 60 61
JH584298.1 184189 2774246750 60 61
JH584300.1 182347 2774434079 60 61
GL456219.1 175968 2774619536 60 61
GL456210.1 169725 2774798507 60 61
JH584303.1 158099 2774971131 60 61
JH584302.1 155838 2775131935 60 61
GL456212.1 153618 2775290441 60 61
JH584304.1 114452 2775446690 60 61
GL456379.1 72385 2775563119 60 61
GL456216.1 66673 2775636780 60 61
GL456393.1 55711 2775704634 60 61
GL456366.1 47073 2775761343 60 61
GL456367.1 42057 2775809270 60 61
GL456239.1 40056 2775852097 60 61
GL456213.1 39340 2775892890 60 61
GL456383.1 38659 2775932955 60 61
GL456385.1 35240 2775972328 60 61
GL456360.1 31704 2776008225 60 61
GL456378.1 31602 2776040527 60 61
GL456389.1 28772 2776072725 60 61
GL456372.1 28664 2776102046 60 61
GL456370.1 26764 2776131257 60 61
GL456381.1 25871 2776158537 60 61
GL456387.1 24685 2776184909 60 61
GL456390.1 24668 2776210075 60 61
GL456394.1 24323 2776235224 60 61
GL456392.1 23629 2776260022 60 61
GL456382.1 23158 2776284114 60 61
GL456359.1 22974 2776307727 60 61
GL456396.1 21240 2776331153 60 61
GL456368.1 20208 2776352816 60 61
JH584292.1 14945 2776373430 60 61
JH584295.1 1976 2776388693 60 61
And here are the names in the VCF file with the SNPs:
[kbouzid@core-login1 References]$ bcftools query -f '%CHROM\n' renamed_CBA_J_snps.vcf | sort -u
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrX
chrY
And after making the alternate reference, here are the chromosome IDs:
[kbouzid@core-login1 References]$ cat Fasta_CBA.fa.fai
1 195471971 17 60 61
2 130694993 198729873 60 61
3 122082543 331603134 60 61
4 120129022 455720405 60 61
5 120421639 577851596 60 61
6 124902244 700280281 60 61
7 104043685 827264248 60 61
8 98207768 933042012 60 61
9 94987271 1032886594 60 61
10 90702639 1129457005 60 61
11 61431566 1221671373 60 61
12 182113224 1284126817 60 61
13 160039680 1469275280 60 61
14 156508116 1631982306 60 61
15 151834684 1791098909 60 61
16 149736546 1945464190 60 61
17 145441459 2097696364 60 61
18 129401213 2245561866 60 61
19 124595110 2377119784 60 61
20 16299 2503791495 60 61
21 171031299 2503808084 60 61
22 91744698 2677689922 60 61
23 953012 2770963723 60 61
24 336933 2771932643 60 61
25 259875 2772275216 60 61
26 241735 2772539447 60 61
27 227966 2772785235 60 61
28 207968 2773017025 60 61
29 206961 2773228484 60 61
30 205776 2773438919 60 61
31 199368 2773648149 60 61
32 195993 2773850864 60 61
33 191905 2774050148 60 61
34 184189 2774245276 60 61
35 182347 2774432559 60 61
36 175968 2774617970 60 61
37 169725 2774796895 60 61
38 158099 2774969473 60 61
39 155838 2775130231 60 61
40 153618 2775288691 60 61
41 114452 2775444894 60 61
42 72385 2775561277 60 61
43 66673 2775634892 60 61
44 55711 2775702700 60 61
45 47073 2775759363 60 61
46 42057 2775807244 60 61
47 40056 2775850025 60 61
48 39340 2775890772 60 61
49 38659 2775930791 60 61
50 35240 2775970118 60 61
51 31704 2776005969 60 61
52 31602 2776038225 60 61
53 28772 2776070377 60 61
54 28664 2776099652 60 61
55 26764 2776128817 60 61
56 25871 2776156051 60 61
57 24685 2776182377 60 61
58 24668 2776207497 60 61
59 24323 2776232600 60 61
60 23629 2776257352 60 61
61 23158 2776281398 60 61
62 22974 2776304965 60 61
63 21240 2776328345 60 61
64 20208 2776349962 60 61
65 14945 2776370530 60 61
66 1976 2776385747 60 61
The names of the chromosomes and regions are lost. Is this normal? How can I know which number corresponds to which chromosome? Is there an option I can add to keep the chromosome names in the final FASTA?
Best,
Kheira
REQUIRED for all errors and issues:
a) GATK version used:
[kbouzid@core-login1 References]$ gatk --version
Using GATK jar /gatk/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.6.1-local.jar --version
The Genome Analysis Toolkit (GATK) v4.2.6.1
HTSJDK Version: 2.24.1
Picard Version: 2.27.1
b) Exact command used:
gatk FastaAlternateReferenceMaker -R Mus_musculus.GRCm38.dna_sm.primary_assembly.fa -O Fasta_CBA.fa -V renamed_CBA_J_snps.vcf
c) Entire program log:
Using GATK jar /gatk/gatk-package-4.2.6.1-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.6.1-local.jar FastaAlternateReferenceMaker -R Mus_musculus.GRCm38.dna_sm.primary_assembly.fa -O Fasta_CBA.fa -V renamed_CBA_J_snps.vcf
08:15:41.929 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.2.6.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
08:15:42.111 INFO FastaAlternateReferenceMaker - ------------------------------------------------------------
08:15:42.112 INFO FastaAlternateReferenceMaker - The Genome Analysis Toolkit (GATK) v4.2.6.1
08:15:42.112 INFO FastaAlternateReferenceMaker - For support and documentation go to https://software.broadinstitute.org/gatk/
08:15:42.112 INFO FastaAlternateReferenceMaker - Executing as kbouzid@core-login1.cluster.france-bioinformatique.fr on Linux v3.10.0-1160.6.1.el7.x86_64 amd64
08:15:42.112 INFO FastaAlternateReferenceMaker - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_242-8u242-b08-0ubuntu3~18.04-b08
08:15:42.112 INFO FastaAlternateReferenceMaker - Start Date/Time: May 15, 2024 8:15:41 AM GMT
08:15:42.112 INFO FastaAlternateReferenceMaker - ------------------------------------------------------------
08:15:42.112 INFO FastaAlternateReferenceMaker - ------------------------------------------------------------
08:15:42.113 INFO FastaAlternateReferenceMaker - HTSJDK Version: 2.24.1
08:15:42.113 INFO FastaAlternateReferenceMaker - Picard Version: 2.27.1
08:15:42.113 INFO FastaAlternateReferenceMaker - Built for Spark Version: 2.4.5
08:15:42.113 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:15:42.113 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:15:42.113 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:15:42.113 INFO FastaAlternateReferenceMaker - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:15:42.113 INFO FastaAlternateReferenceMaker - Deflater: IntelDeflater
08:15:42.114 INFO FastaAlternateReferenceMaker - Inflater: IntelInflater
08:15:42.114 INFO FastaAlternateReferenceMaker - GCS max retries/reopens: 20
08:15:42.114 INFO FastaAlternateReferenceMaker - Requester pays: disabled
08:15:42.114 INFO FastaAlternateReferenceMaker - Initializing engine
08:15:42.592 INFO FeatureManager - Using codec VCFCodec to read file file:///shared/projects/placentatlas/References/renamed_CBA_J_snps.vcf
08:15:42.726 INFO FastaAlternateReferenceMaker - Done initializing engine
08:15:43.685 INFO ProgressMeter - Starting traversal
08:15:43.685 INFO ProgressMeter - Current Locus Elapsed Minutes Bases Processed Bases/Minute
08:15:53.694 INFO ProgressMeter - 1:21045000 0.2 21045000 126270000.0
08:16:03.687 INFO ProgressMeter - 1:38408000 0.3 38408000 115224000.0
08:16:13.688 INFO ProgressMeter - 1:54872000 0.5 54872000 109744000.0
08:16:23.687 INFO ProgressMeter - 1:67396000 0.7 67396000 101094000.0
08:16:33.687 INFO ProgressMeter - 1:79653000 0.8 79653000 95583600.0
08:16:43.687 INFO ProgressMeter - 1:97408000 1.0 97408000 97408000.0
08:16:53.686 INFO ProgressMeter - 1:109867000 1.2 109867000 94171714.3
08:17:03.687 INFO ProgressMeter - 1:129175000 1.3 129175000 96881250.0
08:17:13.686 INFO ProgressMeter - 1:142566000 1.5 142566000 95044000.0
08:17:23.687 INFO ProgressMeter - 1:160004000 1.7 160004000 96002400.0
08:17:33.687 INFO ProgressMeter - 1:174553000 1.8 174553000 95210727.3
08:17:43.687 INFO ProgressMeter - 1:187526000 2.0 187526000 93763000.0
08:17:55.293 INFO ProgressMeter - 10:29 2.2 195472000 89116909.6
08:18:05.293 INFO ProgressMeter - 10:13437029 2.4 208909000 88517012.0
08:18:15.293 INFO ProgressMeter - 10:26089029 2.5 221561000 87685579.7
08:18:25.294 INFO ProgressMeter - 10:37333029 2.7 232805000 86434290.8
08:18:35.292 INFO ProgressMeter - 10:51440029 2.9 246912000 86329848.6
08:18:45.293 INFO ProgressMeter - 10:69454029 3.0 264926000 87527724.9
08:18:55.293 INFO ProgressMeter - 10:81763029 3.2 277235000 86814087.2
08:19:05.293 INFO ProgressMeter - 10:94311029 3.4 289783000 86242373.7
08:19:15.294 INFO ProgressMeter - 10:112312029 3.5 307784000 87270871.3
08:19:27.948 INFO ProgressMeter - 11:36 3.7 326167000 87264092.9
08:19:37.949 INFO ProgressMeter - 11:24485036 3.9 350652000 89810212.5
08:19:47.949 INFO ProgressMeter - 11:45490036 4.1 371657000 91293037.8
08:19:57.950 INFO ProgressMeter - 11:69952036 4.2 396119000 93474998.2
08:20:07.949 INFO ProgressMeter - 11:94936036 4.4 421103000 95610341.3
08:20:17.949 INFO ProgressMeter - 11:121100036 4.6 447267000 97848116.0
08:20:27.948 INFO ProgressMeter - 12:16836493 4.7 465086000 98167043.1
08:20:37.949 INFO ProgressMeter - 12:41711493 4.9 489961000 99903011.6
08:20:47.949 INFO ProgressMeter - 12:68770493 5.1 517020000 101955551.5
08:20:57.948 INFO ProgressMeter - 12:98972493 5.2 547222000 104477537.9
08:21:07.948 INFO ProgressMeter - 13:1917471 5.4 570296000 105525038.4
08:21:17.948 INFO ProgressMeter - 13:30065471 5.6 598444000 107420646.1
08:21:27.948 INFO ProgressMeter - 13:60706471 5.7 629085000 109640622.5
08:21:37.948 INFO ProgressMeter - 13:92865471 5.9 661244000 111992367.2
08:21:49.601 INFO ProgressMeter - 14:832 6.1 688801000 112944735.6
08:21:59.600 INFO ProgressMeter - 14:31931832 6.3 720732000 115036737.1
08:22:09.600 INFO ProgressMeter - 14:63891832 6.4 752692000 117024829.4
08:22:19.600 INFO ProgressMeter - 14:95555832 6.6 784356000 118867632.9
08:22:31.586 INFO ProgressMeter - 15:588 6.8 813703000 119691542.0
08:22:41.586 INFO ProgressMeter - 15:31463588 7.0 845166000 121344723.6
08:22:51.586 INFO ProgressMeter - 15:63807588 7.1 877510000 123044169.2
08:23:01.586 INFO ProgressMeter - 15:94699588 7.3 908402000 124467047.3
08:23:11.586 INFO ProgressMeter - 16:14208903 7.5 931955000 124843268.6
08:23:21.586 INFO ProgressMeter - 16:45298903 7.6 963045000 126190653.0
08:23:31.586 INFO ProgressMeter - 16:77014903 7.8 994761000 127560718.1
08:23:41.589 INFO ProgressMeter - 17:3848135 8.0 1019802000 128034601.2
08:23:51.589 INFO ProgressMeter - 17:33659135 8.1 1049613000 129076435.3
08:24:01.590 INFO ProgressMeter - 17:65216135 8.3 1081170000 130286822.9
08:24:13.352 INFO ProgressMeter - 18:864 8.5 1110942000 130784710.0
08:24:23.352 INFO ProgressMeter - 18:32617864 8.7 1143559000 132033921.8
08:24:33.353 INFO ProgressMeter - 18:65115864 8.8 1176057000 133222483.6
08:24:43.428 INFO ProgressMeter - 19:225 9.0 1201644000 133579821.5
08:24:53.428 INFO ProgressMeter - 19:32381225 9.2 1234025000 134684088.2
08:25:03.881 INFO ProgressMeter - 2:659 9.3 1263076000 135282464.1
08:25:13.881 INFO ProgressMeter - 2:31527659 9.5 1294603000 136227395.9
08:25:23.882 INFO ProgressMeter - 2:62438659 9.7 1325514000 137076052.0
08:25:33.881 INFO ProgressMeter - 2:93365659 9.8 1356441000 137897576.2
08:25:43.882 INFO ProgressMeter - 2:122582659 10.0 1385658000 138520780.7
08:25:53.881 INFO ProgressMeter - 2:154215659 10.2 1417291000 139361122.3
08:26:07.047 INFO ProgressMeter - 3:435 10.4 1445189000 139103150.7
08:26:17.047 INFO ProgressMeter - 3:29835435 10.6 1475024000 139733232.3
08:26:27.046 INFO ProgressMeter - 3:58830435 10.7 1504019000 140265388.0
08:26:37.046 INFO ProgressMeter - 3:86406435 10.9 1531595000 140650942.8
08:26:47.046 INFO ProgressMeter - 3:114512435 11.1 1559701000 141072811.1
08:26:57.047 INFO ProgressMeter - 3:144284435 11.2 1589473000 141630598.8
08:27:07.046 INFO ProgressMeter - 4:2921755 11.4 1608150000 141197904.5
08:27:17.046 INFO ProgressMeter - 4:30392755 11.6 1635621000 141538681.2
08:27:27.046 INFO ProgressMeter - 4:57949755 11.7 1663178000 141877104.2
08:27:37.046 INFO ProgressMeter - 4:87205755 11.9 1692434000 142348940.2
08:27:47.047 INFO ProgressMeter - 4:116801755 12.1 1722030000 142835932.3
08:27:57.046 INFO ProgressMeter - 4:145181755 12.2 1750410000 143210156.0
08:28:07.046 INFO ProgressMeter - 5:7072639 12.4 1768809000 142768698.9
08:28:17.046 INFO ProgressMeter - 5:35422639 12.6 1797159000 143131490.9
08:28:27.046 INFO ProgressMeter - 5:63692639 12.7 1825429000 143478489.8
08:28:37.047 INFO ProgressMeter - 5:93139639 12.9 1854876000 143907830.8
08:28:47.046 INFO ProgressMeter - 5:120998639 13.1 1882735000 144204580.3
08:28:57.046 INFO ProgressMeter - 5:148366639 13.2 1910103000 144456715.7
08:29:07.046 INFO ProgressMeter - 6:15465955 13.4 1929037000 144072669.8
08:29:17.046 INFO ProgressMeter - 6:43318955 13.6 1956890000 144356004.7
08:29:27.047 INFO ProgressMeter - 6:72935955 13.7 1986507000 144761003.7
08:29:37.046 INFO ProgressMeter - 6:102980955 13.9 2016552000 145187098.0
08:29:47.049 INFO ProgressMeter - 6:117790955 14.1 2031362000 144519037.5
08:29:57.048 INFO ProgressMeter - 6:130950955 14.2 2044522000 143750792.5
08:30:07.048 INFO ProgressMeter - 6:143645955 14.4 2057217000 142968028.4
08:30:17.048 INFO ProgressMeter - 7:1798409 14.6 2065106000 141873017.0
08:30:27.048 INFO ProgressMeter - 7:14575409 14.7 2077883000 141134802.2
08:30:37.048 INFO ProgressMeter - 7:28096409 14.9 2091404000 140463082.7
08:30:47.048 INFO ProgressMeter - 7:40660409 15.1 2103968000 139742672.1
08:30:57.048 INFO ProgressMeter - 7:52770409 15.2 2116078000 139008212.5
08:31:07.048 INFO ProgressMeter - 7:65865409 15.4 2129173000 138353666.7
08:31:17.048 INFO ProgressMeter - 7:80039409 15.6 2143347000 137782508.6
08:31:27.047 INFO ProgressMeter - 7:92826409 15.7 2156134000 137135243.0
08:31:37.051 INFO ProgressMeter - 7:107147409 15.9 2170455000 136597955.4
08:31:47.049 INFO ProgressMeter - 7:121184409 16.1 2184492000 136054276.6
08:31:57.050 INFO ProgressMeter - 7:135327409 16.2 2198635000 135528302.9
08:32:07.049 INFO ProgressMeter - 8:6300950 16.4 2215050000 135151653.2
08:32:17.048 INFO ProgressMeter - 8:28110950 16.6 2236860000 135108449.9
08:32:27.048 INFO ProgressMeter - 8:52371950 16.7 2261121000 135212675.0
08:32:37.048 INFO ProgressMeter - 8:78663950 16.9 2287413000 135435096.2
08:32:47.048 INFO ProgressMeter - 8:106350950 17.1 2315100000 135734959.9
08:32:58.129 INFO ProgressMeter - 9:737 17.2 2338151000 135617970.3
08:33:08.129 INFO ProgressMeter - 9:26819737 17.4 2364970000 135860166.6
08:33:18.129 INFO ProgressMeter - 9:55355737 17.6 2393506000 136195470.0
08:33:28.129 INFO ProgressMeter - 9:83644737 17.7 2421795000 136510550.6
08:33:38.130 INFO ProgressMeter - 9:113849737 17.9 2452000000 136926761.1
08:33:48.129 INFO ProgressMeter - X:10205328 18.1 2472967000 136824176.1
08:33:58.130 INFO ProgressMeter - X:28091328 18.2 2490853000 136554557.9
08:34:08.129 INFO ProgressMeter - X:47179328 18.4 2509941000 136355122.0
08:34:18.129 INFO ProgressMeter - X:71187328 18.6 2533949000 136424150.9
08:34:28.129 INFO ProgressMeter - X:94524328 18.7 2557286000 136456147.6
08:34:38.129 INFO ProgressMeter - X:121344328 18.9 2584106000 136671794.0
08:34:48.130 INFO ProgressMeter - X:141292328 19.1 2604054000 136523391.7
08:34:58.129 INFO ProgressMeter - X:168341328 19.2 2631103000 136746621.5
08:35:08.130 INFO ProgressMeter - Y:7681029 19.4 2641474000 136106653.6
08:35:18.129 INFO ProgressMeter - Y:26299029 19.6 2660092000 135898907.0
08:35:28.130 INFO ProgressMeter - Y:48883029 19.7 2682676000 135895572.9
08:35:38.129 INFO ProgressMeter - Y:72470029 19.9 2706263000 135942677.9
08:35:48.782 INFO ProgressMeter - JH584299.1:331 20.1 2725538000 135700624.7
08:35:52.488 INFO ProgressMeter - JH584295.1:1202 20.1 2730871774 135549334.3
08:35:52.489 INFO ProgressMeter - Traversal complete. Processed 2730871774 total bases in 20.1 minutes.
08:35:52.491 INFO FastaAlternateReferenceMaker - Shutting down engine
[May 15, 2024 8:35:52 AM GMT] org.broadinstitute.hellbender.tools.walkers.fasta.FastaAlternateReferenceMaker done. Elapsed time: 20.18 minutes.
Runtime.totalMemory()=12896436224
-
Looking at your original reference index, Mus_musculus.GRCm38.dna_sm.primary_assembly.fa.fai, the contigs appear to use the 1, 2, ..., X, Y, MT convention, unless I'm missing something. A VCF with the "chr" prefix added therefore may not match the reference, which could be an issue here. As far as I know, this tool also accepts an interval list, so it might be a good idea to run a quick, small test using the original primary assembly, the unaltered VCF, and a very small known interval where you expect a SNP to be introduced into the reference.
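One way to confirm a naming mismatch before rerunning is to list the contig names that occur in the VCF but not in the reference index. The sketch below assumes an uncompressed VCF and a samtools-style .fai index; the function name and the file names in the usage line are placeholders, not part of GATK or bcftools.

```shell
# List contig names used in a plain-text VCF that are absent from a FASTA index.
# Usage: vcf_contigs_not_in_fai reference.fa.fai variants.vcf
vcf_contigs_not_in_fai() {
    # First file (the .fai): remember every contig name in column 1.
    # Second file (the VCF): skip header lines, report each unknown CHROM once.
    awk '
        NR == FNR { known[$1] = 1; next }
        /^#/      { next }
        !($1 in known) && !seen[$1]++ { print $1 }
    ' "$1" "$2"
}
```

An empty result means every CHROM value in the VCF has a matching contig in the reference index. With the files from the original post, this would be expected to print the chr-prefixed names.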
-
Don't worry, Kheira, I got you.
The problem is the -L BED file. If you specify this argument, the output FASTA will be broken. To fix this, first make sure the contigs (first column of the VCF) are all in "chr1" format, then remove the -L argument.
The headers in the output FASTA will then look like this:
❯ rg '>' masked.fa
1:>1 chr1:1-248956422
4149260:>2 chr10:1-133797422
6379213:>3 chr11:1-135086622
8630653:>4 chr12:1-133275309
10851906:>5 chr13:1-114364328
12757978:>6 chr14:1-107043718
14542039:>7 chr15:1-101991189
16241891:>8 chr16:1-90338345
17747527:>9 chr17:1-83257441
19135146:>10 chr18:1-80373285
20474705:>11 chr19:1-58617616
21451662:>12 chr2:1-242193529
25488222:>13 chr20:1-64444167
26562301:>14 chr21:1-46709983
27340798:>15 chr22:1-50818468
28187778:>16 chr3:1-198295559
31492703:>17 chr4:1-190214555
34662931:>18 chr5:1-181538259
37688580:>19 chr6:1-170805979
40535367:>20 chr7:1-159345973
43191140:>21 chr8:1-145138636
45610118:>22 chr9:1-138394717
47916706:>23 chrMT:1-16569
47916984:>24 chrX:1-156040895
50517655:>25 chrY:1-57227415
51471447:>26 KI270728.1:1-1872759
51502661:>27 KI270727.1:1-448248
51510133:>28 KI270442.1:1-392061
51516669:>29 KI270729.1:1-280839
51521351:>30 GL000225.1:1-211173
51524872:>31 KI270743.1:1-210658
51528384:>32 GL000008.2:1-209709
51531881:>33 GL000009.2:1-201709
51535244:>34 KI270747.1:1-198735
51538558:>35 KI270722.1:1-194050
51541794:>36 GL000194.1:1-191469
51544987:>37 KI270742.1:1-186739
51548101:>38 GL000205.2:1-185591
51551196:>39 GL000195.1:1-182896
51554246:>40 KI270736.1:1-181920
51557279:>41 KI270733.1:1-179772
51560277:>42 GL000224.1:1-179693
51563273:>43 GL000219.1:1-179198
51566261:>44 KI270719.1:1-176845
51569210:>45 GL000216.2:1-176608
51572155:>46 KI270712.1:1-176043
51575091:>47 KI270706.1:1-175055
51578010:>48 KI270725.1:1-172810
51580892:>49 KI270744.1:1-168472
51583701:>50 KI270734.1:1-165050
51586453:>51 GL000213.1:1-164239
51589192:>52 GL000220.1:1-161802
51591890:>53 KI270715.1:1-161471
51594583:>54 GL000218.1:1-161147
51597270:>55 KI270749.1:1-158759
51599917:>56 KI270741.1:1-157432
51602542:>57 GL000221.1:1-155397
51605133:>58 KI270716.1:1-153799
51607698:>59 KI270731.1:1-150754
51610212:>60 KI270751.1:1-150742
51612726:>61 KI270750.1:1-148850
51615208:>62 KI270519.1:1-138126
51617512:>63 GL000214.1:1-137718
51619809:>64 KI270708.1:1-127682
51621939:>65 KI270730.1:1-112551
51623816:>66 KI270438.1:1-112505
51625693:>67 KI270737.1:1-103838
51627425:>68 KI270721.1:1-100316
51629098:>69 KI270738.1:1-99375
51630756:>70 KI270748.1:1-93321
51632313:>71 KI270435.1:1-92983
51633864:>72 GL000208.1:1-92689
51635410:>73 KI270538.1:1-91309
51636933:>74 KI270756.1:1-79590
51638261:>75 KI270739.1:1-73985
51639496:>76 KI270757.1:1-71251
51640685:>77 KI270709.1:1-66860
51641801:>78 KI270746.1:1-66486
51642911:>79 KI270753.1:1-62944
51643962:>80 KI270589.1:1-44474
51644705:>81 KI270726.1:1-43739
51645435:>82 KI270735.1:1-42811
51646150:>83 KI270711.1:1-42210
51646855:>84 KI270745.1:1-41891
51647555:>85 KI270714.1:1-41717
51648252:>86 KI270732.1:1-41543
51648946:>87 KI270713.1:1-40745
51649627:>88 KI270754.1:1-40191
51650298:>89 KI270710.1:1-40176
51650969:>90 KI270717.1:1-40062
51651638:>91 KI270724.1:1-39555
51652299:>92 KI270720.1:1-39050
51652951:>93 KI270723.1:1-38115
51653588:>94 KI270718.1:1-38054
51654224:>95 KI270317.1:1-37690
51654854:>96 KI270740.1:1-37240
51655476:>97 KI270755.1:1-36723
51656090:>98 KI270707.1:1-32032
51656625:>99 KI270579.1:1-31033
51657144:>100 KI270752.1:1-27745
51657608:>101 KI270512.1:1-22689
51657988:>102 KI270322.1:1-21476
51658347:>103 GL000226.1:1-15008
51658599:>104 KI270311.1:1-12399
51658807:>105 KI270366.1:1-8320
51658947:>106 KI270511.1:1-8127
51659084:>107 KI270448.1:1-7992
51659219:>108 KI270521.1:1-7642
51659348:>109 KI270581.1:1-7046
51659467:>110 KI270582.1:1-6504
51659577:>111 KI270515.1:1-6361
51659685:>112 KI270588.1:1-6158
51659789:>113 KI270591.1:1-5796
51659887:>114 KI270522.1:1-5674
51659983:>115 KI270507.1:1-5353
51660074:>116 KI270590.1:1-4685
51660154:>117 KI270584.1:1-4513
51660231:>118 KI270320.1:1-4416
51660306:>119 KI270382.1:1-4215
51660378:>120 KI270468.1:1-4055
51660447:>121 KI270467.1:1-3920
51660514:>122 KI270362.1:1-3530
51660574:>123 KI270517.1:1-3253
51660630:>124 KI270593.1:1-3041
51660682:>125 KI270528.1:1-2983
51660733:>126 KI270587.1:1-2969
51660784:>127 KI270364.1:1-2855
51660833:>128 KI270371.1:1-2805
51660881:>129 KI270333.1:1-2699
51660927:>130 KI270374.1:1-2656
51660973:>131 KI270411.1:1-2646
51661019:>132 KI270414.1:1-2489
51661062:>133 KI270510.1:1-2415
51661104:>134 KI270390.1:1-2387
51661145:>135 KI270375.1:1-2378
51661186:>136 KI270420.1:1-2321
51661226:>137 KI270509.1:1-2318
51661266:>138 KI270315.1:1-2276
51661305:>139 KI270302.1:1-2274
51661344:>140 KI270518.1:1-2186
51661382:>141 KI270530.1:1-2168
51661420:>142 KI270304.1:1-2165
51661458:>143 KI270418.1:1-2145
51661495:>144 KI270424.1:1-2140
51661532:>145 KI270417.1:1-2043
51661568:>146 KI270508.1:1-1951
51661602:>147 KI270303.1:1-1942
51661636:>148 KI270381.1:1-1930
51661670:>149 KI270529.1:1-1899
51661703:>150 KI270425.1:1-1884
51661736:>151 KI270396.1:1-1880
51661769:>152 KI270363.1:1-1803
51661801:>153 KI270386.1:1-1788
51661832:>154 KI270465.1:1-1774
51661863:>155 KI270383.1:1-1750
51661894:>156 KI270384.1:1-1658
51661923:>157 KI270330.1:1-1652
51661952:>158 KI270372.1:1-1650
51661981:>159 KI270548.1:1-1599
51662009:>160 KI270580.1:1-1553
51662036:>161 KI270387.1:1-1537
51662063:>162 KI270391.1:1-1484
51662089:>163 KI270305.1:1-1472
51662115:>164 KI270373.1:1-1451
51662141:>165 KI270422.1:1-1445
51662167:>166 KI270316.1:1-1444
51662193:>167 KI270340.1:1-1428
51662218:>168 KI270338.1:1-1428
51662243:>169 KI270583.1:1-1400
51662268:>170 KI270334.1:1-1368
51662292:>171 KI270429.1:1-1361
51662316:>172 KI270393.1:1-1308
51662339:>173 KI270516.1:1-1300
51662362:>174 KI270389.1:1-1298
51662385:>175 KI270466.1:1-1233
51662407:>176 KI270388.1:1-1216
51662429:>177 KI270544.1:1-1202
51662451:>178 KI270310.1:1-1201
51662473:>179 KI270412.1:1-1179
51662494:>180 KI270395.1:1-1143
51662515:>181 KI270376.1:1-1136
51662535:>182 KI270337.1:1-1121
51662555:>183 KI270335.1:1-1048
51662574:>184 KI270378.1:1-1048
51662593:>185 KI270379.1:1-1045
51662612:>186 KI270329.1:1-1040
51662631:>187 KI270419.1:1-1029
51662650:>188 KI270336.1:1-1026
51662669:>189 KI270312.1:1-998
51662687:>190 KI270539.1:1-993
51662705:>191 KI270385.1:1-990
51662723:>192 KI270423.1:1-981
51662741:>193 KI270392.1:1-971
51662759:>194 KI270394.1:1-970
I think the GATK group should add a note about this to the FastaAlternateReferenceMaker documentation, for example: "If you want to generate a masked FASTA for general use, omit the -L argument."
Wait, it's still not right. We don't want the 1, 2, 3, ... contig names; they will cause trouble. We can fix them with sed, but why doesn't FastaAlternateReferenceMaker keep the original names? To use the masked FASTA as a drop-in replacement, the contig names have to stay unmodified.
BTW, the sed command that restores the contig names in my case is:
sed 's/>\([0-9]*\) \(.*\):\([0-9]*\)-\([0-9]*\)/>\2/g' output.fa
I hope someone sees this.
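A slightly more defensive variant of the sed one-liner can be sketched with awk. It assumes headers of the form ">N name:start-end", as in the listing above, touches only header lines, and leaves any header that does not fit that shape unchanged. The function name is a placeholder.

```shell
# Rewrite headers of the form ">N name:start-end" to ">name".
# Sequence lines pass through untouched.
# Usage: restore_contig_names output.fa > restored.fa
restore_contig_names() {
    awk '
        /^>/ {
            name = $2                          # second word, e.g. "chr1:1-248956422"
            sub(/:[0-9]+-[0-9]+$/, "", name)   # drop only the trailing ":start-end"
            if (name == "") { print; next }    # unexpected header shape: keep as is
            print ">" name
            next
        }
        { print }
    ' "$1"
}
```

Because the FASTA is rewritten, any existing .fai index for it would need to be regenerated afterwards.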
-
Hi all. This is the default behavior of FastaAlternateReferenceMaker, so depending on your purpose you may need to rename the contigs afterwards.
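To see which number corresponds to which original contig, the two .fai indexes can be paired line by line. This is only a sketch: it assumes the output keeps the reference's contig order, which the matching lengths in the two .fai listings in the original post suggest, and it uses contig length as a sanity check, which is only meaningful for a SNP-only VCF since indels would change lengths. The function name is a placeholder.

```shell
# Print "output_name<TAB>reference_name" for each contig, pairing the two
# indexes by line position. Exits non-zero if lengths disagree anywhere.
# Usage: map_contigs_by_order reference.fa.fai output.fa.fai
map_contigs_by_order() {
    awk '
        NR == FNR { ref_name[FNR] = $1; ref_len[FNR] = $2; next }
        {
            if (!(FNR in ref_name) || ref_len[FNR] != $2) {
                printf "mismatch at line %d of %s\n", FNR, FILENAME > "/dev/stderr"
                bad = 1
                exit
            }
            printf "%s\t%s\n", $1, ref_name[FNR]
        }
        END { exit bad + 0 }
    ' "$1" "$2"
}
```

For the indexes in the original post, the first two lines of output would pair 1 with 1 and 2 with 10.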
If you wish to avoid this behavior, you can open an issue on the GATK GitHub page and request a feature:
https://github.com/broadinstitute/gatk
Alternatively, you may try
bcftools consensus
which does not change sequence names unless a region is provided. Keep in mind, however, that its replacement behavior may differ from that of FastaAlternateReferenceMaker, so it is up to you to check the results.
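A minimal sketch of that route, assuming bgzip and bcftools are installed and that the contig names in the VCF match those in the reference. The function name and file names are placeholders, and a multi-sample VCF would likely need additional options that are not shown here.

```shell
# Apply the variants in a plain-text VCF to a reference with bcftools consensus.
# Usage: make_consensus reference.fa variants.vcf consensus.fa
make_consensus() {
    ref=$1; vcf=$2; out=$3
    # bcftools consensus needs a bgzip-compressed, indexed VCF.
    bgzip -c "$vcf" > "$vcf.gz" &&
    bcftools index -f "$vcf.gz" &&
    bcftools consensus -f "$ref" -o "$out" "$vcf.gz"
}
```

Comparing the headers of the result against the reference (for example with grep '>') is a quick check that the sequence names were preserved.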
I hope this helps.
EDIT: I just added a PR for this behavior. Please check back to see if this PR is merged.
-
Thanks for your endorsement and great work. I hope we can use it in an upcoming version.