
Benchmark result for HIV-1 A-K and CRF01_AE subtypes nucleotide sequences

(a) Number of gene segments per subtype

Gene 01_AE A B C D F G H J K Subtotal
3rev 198 209 3677 755 83 19 51 4 10 2 5008
3tat 193 197 3649 771 94 20 56 3 6 2 4991
5rev 110 33 1942 152 71 5 7 0 1 0 2321
5tat 120 60 2178 202 48 4 3 0 0 1 2616
env 2945 7293 47944 5229 1794 525 912 92 53 70 66857
gag 631 2411 8631 2483 700 137 379 83 32 23 15510
ltr 197 251 5027 654 103 16 36 9 1 8 6302
nef 234 334 6297 744 138 22 39 7 5 22 7842
pol 4687 2933 52002 5136 1288 808 1124 109 59 45 68191
vif 107 58 1106 98 31 6 3 4 1 0 1414
vpr 107 42 1518 44 13 4 3 1 1 0 1733
vpu 148 32 1063 157 14 4 18 2 1 1 1440
Total 9677 13853 135034 16425 4377 1570 2631 314 170 174 184225

(b) Subtype prediction concordance (%) with LANL gold standard

Gene 01_AE A B C D F G H J K Subtotal
3rev 87.9 91.9 99.9 98.7 81.9 78.9 88.2 25.0 10.0 100.0 98.2
3tat 90.2 92.4 99.9 96.5 72.3 75.0 85.7 33.3 66.7 50.0 97.9
5rev 98.2 87.9 99.9 75.0 57.7 40.0 28.6 - 100.0 - 96.4
5tat 90.8 95.0 100.0 99.5 89.6 100.0 100.0 - - 0.0 99.2
env 97.8 98.7 99.8 98.6 99.1 74.7 93.4 56.5 45.3 17.1 99.0
gag 89.4 98.1 99.3 98.9 94.7 83.9 98.2 31.3 31.2 43.5 97.7
ltr 99.5 98.0 99.9 99.4 95.1 87.5 91.7 33.3 0.0 12.5 99.4
nef 86.8 98.5 99.9 97.8 71.7 68.2 89.7 42.9 0.0 4.5 98.2
pol 93.5 96.5 99.8 96.0 81.4 92.6 90.9 66.1 55.9 13.3 98.2
vif 100.0 89.7 99.9 100.0 77.4 50.0 100.0 0.0 0.0 - 98.4
vpr 100.0 100.0 99.9 93.2 100.0 100.0 100.0 0.0 100.0 - 99.7
vpu 100.0 100.0 99.0 100.0 100.0 100.0 66.7 0.0 100.0 0.0 98.6
Total 94.6 97.8 99.8 97.5 90.4 84.8 92.4 50.3 44.1 19.0 98.5

(c) Confusion table (LANL on the left, MuLDAS at the top)

LANL 01_AE A B C D F G H J K Subtotal
01_AE 9154 51 58 18 7 2 3 1 1 1 9296
A 302 13551 25 103 107 11 124 76 42 13 14354
B 200 84 134771 242 275 104 32 14 11 30 135763
C 1 52 73 16020 22 27 7 15 8 15 16240
D 3 42 48 26 3956 36 3 8 6 40 4168
F 4 4 21 7 2 1331 8 2 2 19 1400
G 11 44 37 4 1 19 2430 26 10 9 2591
H 1 18 1 2 1 13 18 158 15 2 229
J 1 7 2 6 17 6 11 75 12 137
K 1 10 3 33 47
Total 9677 13853 135034 16425 4377 1570 2631 314 170 174 184225

(d) Number of gene segments per subtype after outlier filtering (O <= 2.0)

Gene 01_AE A B C D F G H J K Subtotal
3rev 195 202 3661 755 79 13 38 0 0 2 4945
3tat 189 194 3633 760 82 17 52 0 0 1 4928
5rev 110 32 1921 152 68 3 6 0 0 0 2292
5tat 119 58 2169 202 45 3 3 0 0 0 2599
env 2904 7134 47892 5206 1744 360 698 10 0 3 65951
gag 597 2391 8617 2450 682 103 305 4 0 3 15152
ltr 194 217 4689 630 81 1 4 3 0 1 5820
nef 203 331 6284 743 109 14 30 6 0 1 7721
pol 4326 2770 50539 5041 1170 534 787 9 0 0 65176
vif 107 58 1105 98 31 2 3 0 0 0 1404
vpr 107 42 1510 44 13 2 3 1 0 0 1722
vpu 148 32 1054 157 14 2 13 0 0 0 1420
Total 9199 13461 133074 16238 4118 1054 1942 33 0 11 179130

(e) Subtype prediction concordance (%) with LANL gold standard after outlier filtering (O <= 2.0)

Gene 01_AE A B C D F G H J K Subtotal
3rev 89.2 93.1 99.9 98.7 82.3 92.3 92.1 - - 100.0 98.7
3tat 90.5 92.3 99.9 97.4 78.0 82.4 88.5 - - 100.0 98.3
5rev 98.2 90.6 99.9 75.0 57.4 66.7 33.3 - - - 96.6
5tat 91.6 94.8 100.0 99.5 93.3 100.0 100.0 - - - 99.3
env 97.9 98.7 99.8 98.6 99.0 83.9 97.1 90.0 - 100.0 99.4
gag 91.0 98.2 99.4 98.9 95.3 85.4 99.0 75.0 - 66.7 98.5
ltr 99.5 98.6 99.9 99.5 97.5 100.0 100.0 100.0 - 100.0 99.8
nef 98.5 98.8 99.9 98.0 85.3 92.9 96.7 50.0 - 100.0 99.4
pol 96.0 97.0 99.8 96.0 84.4 95.7 93.9 100.0 - - 98.8
vif 100.0 89.7 99.9 100.0 77.4 100.0 100.0 - - - 99.0
vpr 100.0 100.0 99.9 93.2 100.0 100.0 100.0 0.0 - - 99.7
vpu 100.0 100.0 99.1 100.0 100.0 100.0 92.3 - - - 99.2
Total 96.3 98.0 99.8 97.6 92.2 90.3 95.6 81.8 - 90.9 99.0

(f) Confusion table (LANL on the left, MuLDAS at the top) after outlier filtering (O <= 2.0)

LANL 01_AE A B C D F G H J K Subtotal
01_AE 8858 47 54 17 5 2 8983
A 166 13193 21 96 94 2 60 1 13633
B 161 77 132832 240 210 56 8 1 133585
C 1 47 66 15845 6 14 1 1 15981
D 2 35 44 24 3798 5 2 1 3911
F 2 3 21 7 2 952 5 1 1 994
G 9 36 36 4 9 1856 1950
H 16 2 1 4 6 27 56
J 7 2 2 6 2 1 20
K 1 6 10 17
Total 9199 13461 133074 16238 4118 1054 1942 33 11 179130

 

Benchmark result for HIV-1 A-K and CRF01_AE subtypes protein sequences

(a) Number of gene segments per subtype

Gene 01_AE A B C D F G H J K Subtotal
env 2778 5990 43875 4884 1674 453 873 83 73 50 60733
gag 597 2045 7367 2745 696 155 361 48 25 32 14071
nef 181 156 3992 421 74 16 31 8 0 2 4881
pol 4891 2324 46947 4647 1816 956 835 210 85 215 62926
rev 156 45 675 474 8 2 0 0 0 0 1360
tat 111 64 1573 215 36 3 5 0 1 0 2008
vif 115 19 803 89 57 3 32 0 2 0 1120
vpr 112 41 1379 58 58 4 9 1 0 0 1662
vpu 148 48 903 157 14 2 13 0 1 0 1286
Total 9089 10732 107514 13690 4433 1594 2159 350 187 299 150047

(b) Subtype prediction concordance (%) with LANL gold standard

Gene 01_AE A B C D F G H J K Subtotal
env 97.3 97.1 99.6 96.1 96.4 66.9 84.9 39.8 13.7 22.0 98.2
gag 83.8 94.7 98.0 81.7 78.6 60.0 83.9 18.8 0.0 6.2 91.4
nef 97.8 94.9 99.7 94.5 75.7 50.0 54.8 12.5 - 50.0 98.1
pol 82.7 78.0 99.5 91.5 49.3 67.2 73.8 21.4 15.3 0.5 93.8
rev 71.8 97.8 100.0 99.6 50.0 100.0 - - - - 96.2
tat 100.0 98.4 99.8 99.5 97.2 100.0 60.0 - 100.0 - 99.6
vif 100.0 100.0 100.0 100.0 43.9 100.0 9.4 - 50.0 - 94.5
vpr 93.8 92.7 98.9 65.5 8.6 0.0 33.3 0.0 - - 93.4
vpu 92.6 87.5 99.8 94.9 78.6 100.0 92.3 - 100.0 - 97.6
Total 88.1 92.4 99.5 91.7 72.0 66.2 78.6 25.1 13.9 5.0 95.6

(c) Confusion table (LANL on the left, MuLDAS at the top)

LANL 01_AE A B C D F G H J K Subtotal
01_AE 8004 200 53 39 1 6 40 13 5 3 8364
A 716 9918 95 253 108 98 250 106 26 34 11604
B 192 102 106947 607 1073 269 45 78 55 154 109522
C 79 89 83 12549 26 41 41 14 17 9 12948
D 15 36 184 51 3191 17 10 10 45 12 3571
F 7 20 76 52 8 1056 37 5 2 50 1313
G 74 292 55 100 15 74 1698 24 10 16 2358
H 1 59 12 9 6 13 15 88 1 2 206
J 15 8 23 5 10 16 12 26 4 119
K 1 1 1 7 10 7 15 42
Total 9089 10732 107514 13690 4433 1594 2159 350 187 299 150047

(d) Number of gene segments per subtype after outlier filtering (O <= 2.0)

Gene 01_AE A B C D F G H J K Subtotal
env 2686 5788 43810 4854 1567 323 629 8 0 5 59670
gag 563 2025 7338 2723 668 103 289 6 0 7 13722
nef 174 154 3988 419 67 11 20 3 0 2 4838
pol 4457 2120 43934 4443 1630 529 562 23 0 3 57701
rev 156 44 670 474 8 2 0 0 0 0 1354
tat 111 64 1567 215 35 3 3 0 0 0 1998
vif 0 0 733 0 0 0 12 0 0 0 745
vpr 111 40 1378 58 55 4 7 0 0 0 1653
vpu 146 47 901 157 13 1 12 0 0 0 1277
Total 8404 10282 104319 13343 4043 976 1534 40 0 17 142958

(e) Subtype prediction concordance (%) with LANL gold standard after outlier filtering (O <= 2.0)

Gene 01_AE A B C D F G H J K Subtotal
env 97.4 97.2 99.6 96.1 96.7 69.7 87.0 75.0 - 100.0 98.6
gag 86.0 95.1 98.1 81.7 78.4 58.3 86.9 0.0 - 14.3 92.3
nef 97.7 94.8 99.7 95.0 80.6 54.5 85.0 33.3 - 50.0 98.6
pol 84.8 78.7 99.5 92.4 51.9 71.3 78.6 43.5 - 0.0 95.2
rev 71.8 97.7 100.0 99.6 50.0 100.0 - - - - 96.2
tat 100.0 98.4 99.8 99.5 100.0 100.0 66.7 - - - 99.7
vif - - 100.0 - - - 25.0 - - - 98.8
vpr 94.6 95.0 98.9 65.5 9.1 0.0 42.9 - - - 93.9
vpu 92.5 89.4 99.8 94.9 84.6 100.0 100.0 - - - 97.8
Total 89.4 92.9 99.5 91.9 74.1 69.1 83.2 42.5 - 41.2 96.6

(f) Confusion table (LANL on the left, MuLDAS at the top) after outlier filtering (O <= 2.0)

LANL 01_AE A B C D F G H J K Subtotal
01_AE 7514 184 51 38 1 6 20 1 1 7816
A 600 9551 90 233 99 67 142 5 4 10791
B 147 80 103779 584 898 152 25 12 4 105681
C 56 79 79 12266 21 14 18 1 12534
D 13 30 173 49 2994 9 8 2 1 3279
F 5 18 72 49 6 674 24 2 850
G 67 274 55 95 14 40 1277 1822
H 1 52 12 9 5 3 10 17 109
J 14 7 13 5 5 6 50
K 1 1 7 6 4 7 26
Total 8404 10282 104319 13343 4043 976 1534 40 17 142958

 

Benchmark result for HCV nucleotide sequences

(a) Number of gene segments per genotype

Gene 1 2 3 4 5 6 Subtotal
3utr 0 0 0 0 0 0 0
5utr 1665 314 621 386 71 238 3295
arfp 2802 310 580 235 24 435 4386
core 4088 461 721 283 31 492 6076
e1 15697 1805 2517 1031 137 341 21528
e2 18996 1705 2511 418 89 388 24107
ns2 1909 11 16 20 1 46 2003
ns3 2765 64 271 22 107 46 3275
ns4a 565 18 198 18 108 43 950
ns4b 746 13 39 20 108 42 968
ns5a 5487 51 259 20 1 53 5871
ns5b 3213 519 523 443 122 334 5154
okamoto 2913 471 429 403 119 246 4581
p7 2142 9 16 20 1 44 2232
Total 62988 5751 8701 3319 919 2748 84426

(b) Genotype prediction concordance (%) with LANL gold standard

Gene 1 2 3 4 5 6 Subtotal
3utr - - - - - - -
5utr 87.4 94.3 96.5 90.9 77.5 68.1 88.6
arfp 99.8 99.0 99.7 99.6 87.5 99.3 99.6
core 99.5 98.7 99.6 95.4 96.8 99.0 99.2
e1 96.3 93.3 89.4 100.0 100.0 99.4 95.5
e2 93.4 66.3 92.6 92.1 62.9 34.5 90.4
ns2 100.0 100.0 100.0 100.0 100.0 100.0 100.0
ns3 100.0 100.0 100.0 100.0 100.0 100.0 100.0
ns4a 100.0 88.9 100.0 100.0 100.0 95.3 99.6
ns4b 100.0 84.6 100.0 100.0 100.0 97.6 99.7
ns5a 100.0 98.0 100.0 100.0 100.0 100.0 100.0
ns5b 99.9 87.1 95.8 99.3 98.4 99.7 98.1
okamoto 99.8 85.8 95.3 99.8 100.0 100.0 98.0
p7 100.0 100.0 100.0 100.0 100.0 100.0 100.0
Total 96.7 85.0 94.0 97.4 94.0 87.5 95.3

(c) Confusion table (LANL on the left, MuLDAS at the top)

LANL 1 2 3 4 5 6 Subtotal
1 60919 748 69 28 17 72 61853
2 592 4890 58 28 13 52 5633
3 24 33 8179 28 7 11 8282
4 1260 63 183 3233 14 18 4771
5 24 11 180 1 864 191 1271
6 169 6 32 1 4 2404 2616
Total 62988 5751 8701 3319 919 2748 84426
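
The Total row of table (b) can be reproduced from table (c). The following is a minimal Python sketch, assuming that the concordance for a genotype is the diagonal count divided by the column total of the confusion table. That definition is inferred from the numbers on this page, not stated here, and the names confusion and concordance are illustrative only.

```python
# Confusion table (c) for HCV nucleotide sequences, copied from this page.
# Rows and columns follow the printed order of genotypes 1-6.
genotypes = ["1", "2", "3", "4", "5", "6"]
confusion = [
    [60919, 748, 69, 28, 17, 72],
    [592, 4890, 58, 28, 13, 52],
    [24, 33, 8179, 28, 7, 11],
    [1260, 63, 183, 3233, 14, 18],
    [24, 11, 180, 1, 864, 191],
    [169, 6, 32, 1, 4, 2404],
]


def concordance(matrix):
    """Return (per-class %, overall %) for a square confusion matrix.

    Assumption: per-class concordance = diagonal cell / column total.
    """
    n = len(matrix)
    col_totals = [sum(row[j] for row in matrix) for j in range(n)]
    per_class = [round(100.0 * matrix[j][j] / col_totals[j], 1) for j in range(n)]
    diagonal = sum(matrix[j][j] for j in range(n))
    overall = round(100.0 * diagonal / sum(col_totals), 1)
    return per_class, overall


per_class, overall = concordance(confusion)
for genotype, pct in zip(genotypes, per_class):
    print(f"genotype {genotype}: {pct}%")
print(f"overall: {overall}%")
```

With these inputs the per-genotype values are 96.7, 85.0, 94.0, 97.4, 94.0 and 87.5, and the overall value is 95.3, which agrees with the Total row of table (b).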

(d) Number of gene segments per genotype after outlier filtering (O <= 2.0)

Gene 1 2 3 4 5 6 Subtotal
3utr 0 0 0 0 0 0 0
5utr 1472 236 387 0 0 152 2247
arfp 2790 304 578 143 3 396 4214
core 4081 456 692 261 15 476 5981
e1 15682 1673 2508 886 78 339 21166
e2 18793 1545 2043 0 8 340 22729
ns2 1877 11 16 0 1 38 1943
ns3 2763 64 269 0 4 45 3145
ns4a 565 17 156 0 37 21 796
ns4b 743 13 33 0 3 42 834
ns5a 5470 51 220 0 1 53 5795
ns5b 3169 448 481 4 2 317 4421
okamoto 2913 470 429 403 117 244 4576
p7 2142 9 10 0 1 34 2196
Total 62460 5297 7822 1697 270 2497 80043
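
To see how unevenly the outlier filter affects the genotypes, the Total rows of tables (a) and (d) can be compared. This is a small Python sketch of that arithmetic; the function name retained_percent is illustrative, and the outlier score O itself is not defined on this page, so only the published counts are used.

```python
# Total rows for HCV nucleotide sequences, copied from tables (a) and (d).
genotypes = ["1", "2", "3", "4", "5", "6"]
before = [62988, 5751, 8701, 3319, 919, 2748]  # before filtering
after = [62460, 5297, 7822, 1697, 270, 2497]   # after filtering (O <= 2.0)


def retained_percent(before_counts, after_counts):
    """Percentage of segments kept per class; None where nothing was there."""
    return [
        round(100.0 * a / b, 1) if b else None
        for b, a in zip(before_counts, after_counts)
    ]


retained = retained_percent(before, after)
overall = round(100.0 * sum(after) / sum(before), 1)

for genotype, pct in zip(genotypes, retained):
    print(f"genotype {genotype}: {pct}% retained")
print(f"overall: {overall}% retained")
```

Overall 94.8% of the segments pass the filter, but only 51.1% of genotype 4 and 29.4% of genotype 5 segments do, so the post-filtering concordance for those genotypes rests on far fewer sequences.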

(e) Genotype prediction concordance (%) with LANL gold standard after outlier filtering (O <= 2.0)

Gene 1 2 3 4 5 6 Subtotal
3utr - - - - - - -
5utr 89.3 96.2 97.9 - - 82.2 91.0
arfp 99.8 99.3 99.7 99.3 100.0 99.2 99.7
core 99.6 98.9 99.6 96.9 100.0 98.9 99.4
e1 96.3 92.8 89.6 100.0 100.0 99.4 95.5
e2 93.9 65.1 93.6 - 87.5 37.9 91.1
ns2 100.0 100.0 100.0 - 100.0 100.0 100.0
ns3 100.0 100.0 100.0 - 100.0 100.0 100.0
ns4a 100.0 94.1 100.0 - 100.0 95.2 99.7
ns4b 100.0 84.6 100.0 - 100.0 97.6 99.6
ns5a 100.0 98.0 100.0 - 100.0 100.0 100.0
ns5b 99.9 85.3 96.3 75.0 100.0 99.7 98.0
okamoto 99.8 85.7 95.3 99.8 100.0 100.0 98.0
p7 100.0 100.0 100.0 - 100.0 100.0 100.0
Total 96.9 84.7 94.4 99.4 99.6 89.9 95.7

(f) Confusion table (LANL on the left, MuLDAS at the top) after outlier filtering (O <= 2.0)

LANL 1 2 3 4 5 6 Subtotal
1 60546 721 54 9 32 61362
2 500 4484 10 2 1 21 5018
3 15 28 7381 5 7429
4 1238 50 172 1686 9 3155
5 18 11 177 269 184 659
6 143 3 28 2246 2420
Total 62460 5297 7822 1697 270 2497 80043

 

Benchmark result for HCV protein sequences

(a) Number of gene segments per genotype

Gene 1 2 3 4 5 6 Subtotal
arfp 159 2 8 5 2 5 181
core 2941 323 516 221 33 410 4444
e1 13524 1384 1690 871 115 332 17916
e2 14538 1226 1656 440 107 147 18114
ns2 1859 415 104 74 10 129 2591
ns3 2686 63 270 22 85 55 3181
ns4a 536 18 197 26 88 40 905
ns4b 675 13 39 20 95 43 885
ns5a 5035 51 259 20 1 53 5419
ns5b 2649 504 491 395 122 315 4476
okamoto 2402 456 397 357 119 231 3962
p7 1135 39 19 20 1 43 1257
Total 48139 4494 5646 2471 778 1803 63331

(b) Genotype prediction concordance (%) with LANL gold standard

Gene 1 2 3 4 5 6 Subtotal
arfp 100.0 100.0 100.0 100.0 100.0 100.0 100.0
core 98.6 98.5 96.9 88.7 81.8 99.0 97.8
e1 95.7 93.4 95.0 80.0 99.1 99.1 94.8
e2 93.6 58.2 90.9 98.0 11.2 89.8 90.6
ns2 65.7 20.2 17.3 28.4 10.0 35.7 53.7
ns3 100.0 100.0 100.0 100.0 100.0 83.6 99.7
ns4a 99.6 88.9 100.0 76.9 100.0 100.0 98.9
ns4b 100.0 84.6 100.0 100.0 100.0 95.3 99.5
ns5a 100.0 98.0 100.0 100.0 100.0 100.0 100.0
ns5b 99.9 89.3 99.0 99.7 98.4 99.7 98.5
okamoto 99.8 88.2 99.2 100.0 100.0 99.6 98.4
p7 100.0 89.7 84.2 100.0 100.0 100.0 99.4
Total 95.4 76.5 93.9 89.2 85.5 93.5 93.5
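
The Subtotal column of table (b) behaves like a count-weighted mean of the per-genotype values, not a simple average. The Python sketch below checks this for the ns2 row, using the counts from table (a); the weighting is an assumption inferred from the numbers, and the function name weighted_concordance is illustrative.

```python
# ns2 row for HCV protein sequences: counts from table (a),
# concordance percentages from table (b), genotypes 1-6 in order.
counts = [1859, 415, 104, 74, 10, 129]
percent = [65.7, 20.2, 17.3, 28.4, 10.0, 35.7]


def weighted_concordance(counts, percent):
    """Mean of the percentages, weighted by the number of segments."""
    total = sum(counts)
    return sum(c * p for c, p in zip(counts, percent)) / total


weighted = weighted_concordance(counts, percent)
unweighted = sum(percent) / len(percent)

print(f"count-weighted: {weighted:.1f}%")
print(f"simple average: {unweighted:.1f}%")
```

The count-weighted value rounds to 53.7, the Subtotal printed for ns2, while the simple average is close to 30. Genotype 1 supplies most of the ns2 segments, so it dominates the Subtotal.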

(c) Confusion table (LANL on the left, MuLDAS at the top)

LANL 1 2 3 4 5 6 Subtotal
1 45941 823 56 7 76 90 46993
2 280 3437 74 9 15 8 3823
3 205 62 5299 71 12 4 5653
4 1606 16 184 2203 8 1 4018
5 101 142 27 179 665 15 1129
6 6 14 6 2 2 1685 1715
Total 48139 4494 5646 2471 778 1803 63331

(d) Number of gene segments per genotype after outlier filtering (O <= 2.0)

Gene 1 2 3 4 5 6 Subtotal
arfp 153 2 4 1 0 5 165
core 2917 313 504 206 24 391 4355
e1 13516 1374 1687 602 28 327 17534
e2 14316 1084 894 0 2 114 16410
ns2 923 10 12 0 1 38 984
ns3 2676 63 236 0 21 36 3032
ns4a 533 15 193 0 40 30 811
ns4b 670 13 28 0 22 41 774
ns5a 4993 47 249 0 1 53 5343
ns5b 2603 415 376 4 5 257 3660
okamoto 2400 455 397 353 118 230 3953
p7 1133 8 13 0 1 32 1187
Total 46833 3799 4593 1166 263 1554 58208

(e) Genotype prediction concordance (%) with LANL gold standard after outlier filtering (O <= 2.0)

Gene 1 2 3 4 5 6 Subtotal
arfp 100.0 100.0 100.0 100.0 - 100.0 100.0
core 98.8 98.4 97.0 90.8 83.3 99.2 98.1
e1 95.7 93.3 95.0 100.0 100.0 99.4 95.7
e2 94.3 59.8 92.7 - 50.0 93.9 91.9
ns2 100.0 100.0 100.0 - 100.0 100.0 100.0
ns3 100.0 100.0 100.0 - 100.0 97.2 99.9
ns4a 99.8 93.3 100.0 - 100.0 100.0 99.8
ns4b 100.0 84.6 100.0 - 100.0 97.6 99.6
ns5a 100.0 97.9 100.0 - 100.0 100.0 100.0
ns5b 99.9 88.2 99.2 100.0 100.0 100.0 98.5
okamoto 99.8 88.1 99.2 100.0 100.0 100.0 98.5
p7 100.0 100.0 100.0 - 100.0 100.0 100.0
Total 96.9 83.2 96.3 98.4 98.1 99.1 96.1

(f) Confusion table (LANL on the left, MuLDAS at the top) after outlier filtering (O <= 2.0)

LANL 1 2 3 4 5 6 Subtotal
1 45388 528 19 3 6 45944
2 232 3159 2 1 3394
3 20 51 4423 18 3 4515
4 1165 14 130 1147 2 1 2459
5 26 46 17 258 4 351
6 2 1 2 1540 1545
Total 46833 3799 4593 1166 263 1554 58208

 

(The nucleotide/protein sequence and genotype/subtype data were downloaded from GenBank/GenPept and LANL, respectively)


Copyrighted by Bio-Data Mining Lab, Department of Bioinformatics and Life Sciences, Soongsil University, Seoul, Korea