Benchmark results for HIV-1 subtypes A-K and CRF01_AE: nucleotide sequences
(a) Number of gene segments per subtype
Gene | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
3rev | 198 | 209 | 3677 | 755 | 83 | 19 | 51 | 4 | 10 | 2 | 5008
3tat | 193 | 197 | 3649 | 771 | 94 | 20 | 56 | 3 | 6 | 2 | 4991
5rev | 110 | 33 | 1942 | 152 | 71 | 5 | 7 | 0 | 1 | 0 | 2321
5tat | 120 | 60 | 2178 | 202 | 48 | 4 | 3 | 0 | 0 | 1 | 2616
env | 2945 | 7293 | 47944 | 5229 | 1794 | 525 | 912 | 92 | 53 | 70 | 66857
gag | 631 | 2411 | 8631 | 2483 | 700 | 137 | 379 | 83 | 32 | 23 | 15510
ltr | 197 | 251 | 5027 | 654 | 103 | 16 | 36 | 9 | 1 | 8 | 6302
nef | 234 | 334 | 6297 | 744 | 138 | 22 | 39 | 7 | 5 | 22 | 7842
pol | 4687 | 2933 | 52002 | 5136 | 1288 | 808 | 1124 | 109 | 59 | 45 | 68191
vif | 107 | 58 | 1106 | 98 | 31 | 6 | 3 | 4 | 1 | 0 | 1414
vpr | 107 | 42 | 1518 | 44 | 13 | 4 | 3 | 1 | 1 | 0 | 1733
vpu | 148 | 32 | 1063 | 157 | 14 | 4 | 18 | 2 | 1 | 1 | 1440
Total | 9677 | 13853 | 135034 | 16425 | 4377 | 1570 | 2631 | 314 | 170 | 174 | 184225
(b) Subtype prediction concordance (%) with LANL gold standard
Gene | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
3rev | 87.9 | 91.9 | 99.9 | 98.7 | 81.9 | 78.9 | 88.2 | 25.0 | 10.0 | 100.0 | 98.2
3tat | 90.2 | 92.4 | 99.9 | 96.5 | 72.3 | 75.0 | 85.7 | 33.3 | 66.7 | 50.0 | 97.9
5rev | 98.2 | 87.9 | 99.9 | 75.0 | 57.7 | 40.0 | 28.6 |  | 100.0 |  | 96.4
5tat | 90.8 | 95.0 | 100.0 | 99.5 | 89.6 | 100.0 | 100.0 |  |  | 0.0 | 99.2
env | 97.8 | 98.7 | 99.8 | 98.6 | 99.1 | 74.7 | 93.4 | 56.5 | 45.3 | 17.1 | 99.0
gag | 89.4 | 98.1 | 99.3 | 98.9 | 94.7 | 83.9 | 98.2 | 31.3 | 31.2 | 43.5 | 97.7
ltr | 99.5 | 98.0 | 99.9 | 99.4 | 95.1 | 87.5 | 91.7 | 33.3 | 0.0 | 12.5 | 99.4
nef | 86.8 | 98.5 | 99.9 | 97.8 | 71.7 | 68.2 | 89.7 | 42.9 | 0.0 | 4.5 | 98.2
pol | 93.5 | 96.5 | 99.8 | 96.0 | 81.4 | 92.6 | 90.9 | 66.1 | 55.9 | 13.3 | 98.2
vif | 100.0 | 89.7 | 99.9 | 100.0 | 77.4 | 50.0 | 100.0 | 0.0 | 0.0 |  | 98.4
vpr | 100.0 | 100.0 | 99.9 | 93.2 | 100.0 | 100.0 | 100.0 | 0.0 | 100.0 |  | 99.7
vpu | 100.0 | 100.0 | 99.0 | 100.0 | 100.0 | 100.0 | 66.7 | 0.0 | 100.0 | 0.0 | 98.6
Total | 94.6 | 97.8 | 99.8 | 97.5 | 90.4 | 84.8 | 92.4 | 50.3 | 44.1 | 19.0 | 98.5
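Each Total concordance in table (b) works like a segment-weighted mean of the per-gene concordances above it, with the weights taken from the segment counts in table (a). The short Python check below shows this for the CRF01_AE column, using values copied from tables (a) and (b). The per-gene percentages are rounded to one decimal, so the implied number of concordant segments (about 9153) differs slightly from the diagonal count of 9154 in confusion table (c). This is a re-derivation from the published numbers, not code taken from MuLDAS.

```python
"""Pooled concordance as a count-weighted mean of per-gene concordance.

Values are the CRF01_AE columns of tables (a) and (b) (HIV-1 nucleotide benchmark).
"""

COUNTS = {
    "3rev": 198, "3tat": 193, "5rev": 110, "5tat": 120, "env": 2945, "gag": 631,
    "ltr": 197, "nef": 234, "pol": 4687, "vif": 107, "vpr": 107, "vpu": 148,
}
CONCORDANCE_PCT = {
    "3rev": 87.9, "3tat": 90.2, "5rev": 98.2, "5tat": 90.8, "env": 97.8, "gag": 89.4,
    "ltr": 99.5, "nef": 86.8, "pol": 93.5, "vif": 100.0, "vpr": 100.0, "vpu": 100.0,
}


def pooled_concordance(counts, concordance_pct):
    """Return (pooled %, implied concordant segments, total segments) over all genes."""
    total = sum(counts.values())
    # Convert each gene's rounded percentage back into an approximate count of concordant segments.
    concordant = sum(counts[gene] * concordance_pct[gene] / 100.0 for gene in counts)
    return 100.0 * concordant / total, concordant, total


if __name__ == "__main__":
    pct, concordant, total = pooled_concordance(COUNTS, CONCORDANCE_PCT)
    print(f"{concordant:.1f} of {total} segments concordant -> {pct:.1f}%")
```

The result, about 9152.9 concordant segments out of 9677, rounds to the 94.6% reported in the Total row of (b).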
(c) Confusion table (LANL on the left, MuLDAS at the top)
LANL | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
01_AE | 9154 | 51 | 58 | 18 | 7 | 2 | 3 | 1 | 1 | 1 | 9296
A | 302 | 13551 | 25 | 103 | 107 | 11 | 124 | 76 | 42 | 13 | 14354
B | 200 | 84 | 134771 | 242 | 275 | 104 | 32 | 14 | 11 | 30 | 135763
C | 1 | 52 | 73 | 16020 | 22 | 27 | 7 | 15 | 8 | 15 | 16240
D | 3 | 42 | 48 | 26 | 3956 | 36 | 3 | 8 | 6 | 40 | 4168
F | 4 | 4 | 21 | 7 | 2 | 1331 | 8 | 2 | 2 | 19 | 1400
G | 11 | 44 | 37 | 4 | 1 | 19 | 2430 | 26 | 10 | 9 | 2591
H | 1 | 18 | 1 | 2 | 1 | 13 | 18 | 158 | 15 | 2 | 229
J | 1 | 7 |  | 2 | 6 | 17 | 6 | 11 | 75 | 12 | 137
K |  |  |  | 1 |  | 10 |  | 3 |  | 33 | 47
Total | 9677 | 13853 | 135034 | 16425 | 4377 | 1570 | 2631 | 314 | 170 | 174 | 184225
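Table (c) contains everything needed to regenerate the Total rows of tables (a) and (b). Each column total equals the per-subtype segment count in (a), and each Total concordance in (b) equals the diagonal cell divided by its column total (for CRF01_AE, 9154 / 9677 = 94.6%). The overall figure is the diagonal sum over the grand total (181479 / 184225 = 98.5%). A minimal Python sketch of that computation follows, with blank cells read as zero. It reproduces the published figures but is an independent re-derivation, not the MuLDAS implementation. Returning None for an empty column mirrors the blank cells in the concordance tables.

```python
"""Recompute subtype concordance from confusion table (c), HIV-1 nucleotide benchmark.

Rows follow the LANL labels on the left of table (c), columns the MuLDAS labels at the top;
blank cells are entered as 0.
"""

LABELS = ["01_AE", "A", "B", "C", "D", "F", "G", "H", "J", "K"]
CONFUSION = [
    [9154, 51, 58, 18, 7, 2, 3, 1, 1, 1],
    [302, 13551, 25, 103, 107, 11, 124, 76, 42, 13],
    [200, 84, 134771, 242, 275, 104, 32, 14, 11, 30],
    [1, 52, 73, 16020, 22, 27, 7, 15, 8, 15],
    [3, 42, 48, 26, 3956, 36, 3, 8, 6, 40],
    [4, 4, 21, 7, 2, 1331, 8, 2, 2, 19],
    [11, 44, 37, 4, 1, 19, 2430, 26, 10, 9],
    [1, 18, 1, 2, 1, 13, 18, 158, 15, 2],
    [1, 7, 0, 2, 6, 17, 6, 11, 75, 12],
    [0, 0, 0, 1, 0, 10, 0, 3, 0, 33],
]


def column_concordance(matrix, labels):
    """Diagonal cell as a percentage of each column total (None for an empty column)."""
    result = {}
    for j, label in enumerate(labels):
        column_total = sum(row[j] for row in matrix)
        result[label] = None if column_total == 0 else 100.0 * matrix[j][j] / column_total
    return result


def overall_concordance(matrix):
    """Sum of the diagonal as a percentage of all cells."""
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * diagonal / total


if __name__ == "__main__":
    for label, pct in column_concordance(CONFUSION, LABELS).items():
        print(f"{label:>5}: {pct:.1f}")
    print(f"overall: {overall_concordance(CONFUSION):.1f}")
```

Each printed value, rounded to one decimal, matches the Total row of (b). The same function applied to table (f) returns None for subtype J, whose column is empty after filtering, which corresponds to the blank J cell in table (e).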
(d) Number of gene segments per subtype after outlier filtering (O <= 2.0)
Gene | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
3rev | 195 | 202 | 3661 | 755 | 79 | 13 | 38 | 0 | 0 | 2 | 4945
3tat | 189 | 194 | 3633 | 760 | 82 | 17 | 52 | 0 | 0 | 1 | 4928
5rev | 110 | 32 | 1921 | 152 | 68 | 3 | 6 | 0 | 0 | 0 | 2292
5tat | 119 | 58 | 2169 | 202 | 45 | 3 | 3 | 0 | 0 | 0 | 2599
env | 2904 | 7134 | 47892 | 5206 | 1744 | 360 | 698 | 10 | 0 | 3 | 65951
gag | 597 | 2391 | 8617 | 2450 | 682 | 103 | 305 | 4 | 0 | 3 | 15152
ltr | 194 | 217 | 4689 | 630 | 81 | 1 | 4 | 3 | 0 | 1 | 5820
nef | 203 | 331 | 6284 | 743 | 109 | 14 | 30 | 6 | 0 | 1 | 7721
pol | 4326 | 2770 | 50539 | 5041 | 1170 | 534 | 787 | 9 | 0 | 0 | 65176
vif | 107 | 58 | 1105 | 98 | 31 | 2 | 3 | 0 | 0 | 0 | 1404
vpr | 107 | 42 | 1510 | 44 | 13 | 2 | 3 | 1 | 0 | 0 | 1722
vpu | 148 | 32 | 1054 | 157 | 14 | 2 | 13 | 0 | 0 | 0 | 1420
Total | 9199 | 13461 | 133074 | 16238 | 4118 | 1054 | 1942 | 33 | 0 | 11 | 179130
(e) Subtype prediction concordance (%) with LANL gold standard after outlier filtering (O <= 2.0)
Gene | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
3rev | 89.2 | 93.1 | 99.9 | 98.7 | 82.3 | 92.3 | 92.1 |  |  | 100.0 | 98.7
3tat | 90.5 | 92.3 | 99.9 | 97.4 | 78.0 | 82.4 | 88.5 |  |  | 100.0 | 98.3
5rev | 98.2 | 90.6 | 99.9 | 75.0 | 57.4 | 66.7 | 33.3 |  |  |  | 96.6
5tat | 91.6 | 94.8 | 100.0 | 99.5 | 93.3 | 100.0 | 100.0 |  |  |  | 99.3
env | 97.9 | 98.7 | 99.8 | 98.6 | 99.0 | 83.9 | 97.1 | 90.0 |  | 100.0 | 99.4
gag | 91.0 | 98.2 | 99.4 | 98.9 | 95.3 | 85.4 | 99.0 | 75.0 |  | 66.7 | 98.5
ltr | 99.5 | 98.6 | 99.9 | 99.5 | 97.5 | 100.0 | 100.0 | 100.0 |  | 100.0 | 99.8
nef | 98.5 | 98.8 | 99.9 | 98.0 | 85.3 | 92.9 | 96.7 | 50.0 |  | 100.0 | 99.4
pol | 96.0 | 97.0 | 99.8 | 96.0 | 84.4 | 95.7 | 93.9 | 100.0 |  |  | 98.8
vif | 100.0 | 89.7 | 99.9 | 100.0 | 77.4 | 100.0 | 100.0 |  |  |  | 99.0
vpr | 100.0 | 100.0 | 99.9 | 93.2 | 100.0 | 100.0 | 100.0 | 0.0 |  |  | 99.7
vpu | 100.0 | 100.0 | 99.1 | 100.0 | 100.0 | 100.0 | 92.3 |  |  |  | 99.2
Total | 96.3 | 98.0 | 99.8 | 97.6 | 92.2 | 90.3 | 95.6 | 81.8 |  | 90.9 | 99.0
(f) Confusion table (LANL on the left, MuLDAS at the top) after outlier filtering (O <= 2.0)
LANL | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
01_AE | 8858 | 47 | 54 | 17 | 5 |  | 2 |  |  |  | 8983
A | 166 | 13193 | 21 | 96 | 94 | 2 | 60 | 1 |  |  | 13633
B | 161 | 77 | 132832 | 240 | 210 | 56 | 8 | 1 |  |  | 133585
C | 1 | 47 | 66 | 15845 | 6 | 14 | 1 | 1 |  |  | 15981
D | 2 | 35 | 44 | 24 | 3798 | 5 | 2 | 1 |  |  | 3911
F | 2 | 3 | 21 | 7 | 2 | 952 | 5 | 1 |  | 1 | 994
G | 9 | 36 | 36 | 4 |  | 9 | 1856 |  |  |  | 1950
H |  | 16 |  | 2 | 1 | 4 | 6 | 27 |  |  | 56
J |  | 7 |  | 2 | 2 | 6 | 2 | 1 |  |  | 20
K |  |  |  | 1 |  | 6 |  |  |  | 10 | 17
Total | 9199 | 13461 | 133074 | 16238 | 4118 | 1054 | 1942 | 33 |  | 11 | 179130
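The outlier filter trades coverage for concordance, and comparing the Total rows before and after filtering makes the trade-off easy to quantify. The Python sketch below uses only numbers from tables (a), (b), (d) and (e). For each subtype it computes the share of segments kept after filtering (O <= 2.0) and the change in Total concordance. The O score itself is taken as given, since its definition is not part of these tables. The numbers suggest that well-represented subtypes lose few segments, while H, J and K keep only a small fraction (J none at all), and H and K show the largest concordance gains.

```python
"""Coverage versus concordance before and after outlier filtering (O <= 2.0).

Numbers are the Total rows of tables (a), (b), (d) and (e) for HIV-1 nucleotide sequences.
None marks a blank cell (no segments left, so concordance is undefined).
"""

SUBTYPES = ["01_AE", "A", "B", "C", "D", "F", "G", "H", "J", "K"]
COUNT_BEFORE = [9677, 13853, 135034, 16425, 4377, 1570, 2631, 314, 170, 174]
COUNT_AFTER = [9199, 13461, 133074, 16238, 4118, 1054, 1942, 33, 0, 11]
CONC_BEFORE = [94.6, 97.8, 99.8, 97.5, 90.4, 84.8, 92.4, 50.3, 44.1, 19.0]
CONC_AFTER = [96.3, 98.0, 99.8, 97.6, 92.2, 90.3, 95.6, 81.8, None, 90.9]


def filtering_summary():
    """Return (subtype, % of segments retained, change in concordance in points) tuples."""
    rows = []
    for name, n_before, n_after, c_before, c_after in zip(
        SUBTYPES, COUNT_BEFORE, COUNT_AFTER, CONC_BEFORE, CONC_AFTER
    ):
        retained = round(100.0 * n_after / n_before, 1)
        # Inputs are already rounded to 0.1%, so the difference is only approximate.
        change = None if c_after is None else round(c_after - c_before, 1)
        rows.append((name, retained, change))
    return rows


def overall_retention():
    """Percentage of all gene segments that survive the filter."""
    return 100.0 * sum(COUNT_AFTER) / sum(COUNT_BEFORE)


if __name__ == "__main__":
    for name, retained, change in filtering_summary():
        shown = "n/a" if change is None else f"{change:+.1f}"
        print(f"{name:>5}: {retained:5.1f}% retained, concordance change {shown}")
    print(f"overall: {overall_retention():.1f}% retained")
```

For example, subtype K keeps 11 of 174 segments (6.3%) while its concordance rises by 71.9 points, whereas about 97.2% of all segments pass the filter.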
Benchmark results for HIV-1 subtypes A-K and CRF01_AE: protein sequences
(a) Number of gene segments per subtype
Gene | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
env | 2778 | 5990 | 43875 | 4884 | 1674 | 453 | 873 | 83 | 73 | 50 | 60733
gag | 597 | 2045 | 7367 | 2745 | 696 | 155 | 361 | 48 | 25 | 32 | 14071
nef | 181 | 156 | 3992 | 421 | 74 | 16 | 31 | 8 | 0 | 2 | 4881
pol | 4891 | 2324 | 46947 | 4647 | 1816 | 956 | 835 | 210 | 85 | 215 | 62926
rev | 156 | 45 | 675 | 474 | 8 | 2 | 0 | 0 | 0 | 0 | 1360
tat | 111 | 64 | 1573 | 215 | 36 | 3 | 5 | 0 | 1 | 0 | 2008
vif | 115 | 19 | 803 | 89 | 57 | 3 | 32 | 0 | 2 | 0 | 1120
vpr | 112 | 41 | 1379 | 58 | 58 | 4 | 9 | 1 | 0 | 0 | 1662
vpu | 148 | 48 | 903 | 157 | 14 | 2 | 13 | 0 | 1 | 0 | 1286
Total | 9089 | 10732 | 107514 | 13690 | 4433 | 1594 | 2159 | 350 | 187 | 299 | 150047
(b) Subtype prediction concordance (%) with LANL gold standard
Gene | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
env | 97.3 | 97.1 | 99.6 | 96.1 | 96.4 | 66.9 | 84.9 | 39.8 | 13.7 | 22.0 | 98.2
gag | 83.8 | 94.7 | 98.0 | 81.7 | 78.6 | 60.0 | 83.9 | 18.8 | 0.0 | 6.2 | 91.4
nef | 97.8 | 94.9 | 99.7 | 94.5 | 75.7 | 50.0 | 54.8 | 12.5 |  | 50.0 | 98.1
pol | 82.7 | 78.0 | 99.5 | 91.5 | 49.3 | 67.2 | 73.8 | 21.4 | 15.3 | 0.5 | 93.8
rev | 71.8 | 97.8 | 100.0 | 99.6 | 50.0 | 100.0 |  |  |  |  | 96.2
tat | 100.0 | 98.4 | 99.8 | 99.5 | 97.2 | 100.0 | 60.0 |  | 100.0 |  | 99.6
vif | 100.0 | 100.0 | 100.0 | 100.0 | 43.9 | 100.0 | 9.4 |  | 50.0 |  | 94.5
vpr | 93.8 | 92.7 | 98.9 | 65.5 | 8.6 | 0.0 | 33.3 | 0.0 |  |  | 93.4
vpu | 92.6 | 87.5 | 99.8 | 94.9 | 78.6 | 100.0 | 92.3 |  | 100.0 |  | 97.6
Total | 88.1 | 92.4 | 99.5 | 91.7 | 72.0 | 66.2 | 78.6 | 25.1 | 13.9 | 5.0 | 95.6
(c) Confusion table (LANL on the left, MuLDAS at the top)
LANL | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
01_AE | 8004 | 200 | 53 | 39 | 1 | 6 | 40 | 13 | 5 | 3 | 8364
A | 716 | 9918 | 95 | 253 | 108 | 98 | 250 | 106 | 26 | 34 | 11604
B | 192 | 102 | 106947 | 607 | 1073 | 269 | 45 | 78 | 55 | 154 | 109522
C | 79 | 89 | 83 | 12549 | 26 | 41 | 41 | 14 | 17 | 9 | 12948
D | 15 | 36 | 184 | 51 | 3191 | 17 | 10 | 10 | 45 | 12 | 3571
F | 7 | 20 | 76 | 52 | 8 | 1056 | 37 | 5 | 2 | 50 | 1313
G | 74 | 292 | 55 | 100 | 15 | 74 | 1698 | 24 | 10 | 16 | 2358
H | 1 | 59 | 12 | 9 | 6 | 13 | 15 | 88 | 1 | 2 | 206
J |  | 15 | 8 | 23 | 5 | 10 | 16 | 12 | 26 | 4 | 119
K | 1 | 1 | 1 | 7 |  | 10 | 7 |  |  | 15 | 42
Total | 9089 | 10732 | 107514 | 13690 | 4433 | 1594 | 2159 | 350 | 187 | 299 | 150047
(d) Number of gene segments per subtype after outlier filtering (O <= 2.0)
Gene | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
env | 2686 | 5788 | 43810 | 4854 | 1567 | 323 | 629 | 8 | 0 | 5 | 59670
gag | 563 | 2025 | 7338 | 2723 | 668 | 103 | 289 | 6 | 0 | 7 | 13722
nef | 174 | 154 | 3988 | 419 | 67 | 11 | 20 | 3 | 0 | 2 | 4838
pol | 4457 | 2120 | 43934 | 4443 | 1630 | 529 | 562 | 23 | 0 | 3 | 57701
rev | 156 | 44 | 670 | 474 | 8 | 2 | 0 | 0 | 0 | 0 | 1354
tat | 111 | 64 | 1567 | 215 | 35 | 3 | 3 | 0 | 0 | 0 | 1998
vif | 0 | 0 | 733 | 0 | 0 | 0 | 12 | 0 | 0 | 0 | 745
vpr | 111 | 40 | 1378 | 58 | 55 | 4 | 7 | 0 | 0 | 0 | 1653
vpu | 146 | 47 | 901 | 157 | 13 | 1 | 12 | 0 | 0 | 0 | 1277
Total | 8404 | 10282 | 104319 | 13343 | 4043 | 976 | 1534 | 40 | 0 | 17 | 142958
(e) Subtype prediction concordance (%) with LANL gold standard after outlier filtering (O <= 2.0)
Gene | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
env | 97.4 | 97.2 | 99.6 | 96.1 | 96.7 | 69.7 | 87.0 | 75.0 |  | 100.0 | 98.6
gag | 86.0 | 95.1 | 98.1 | 81.7 | 78.4 | 58.3 | 86.9 | 0.0 |  | 14.3 | 92.3
nef | 97.7 | 94.8 | 99.7 | 95.0 | 80.6 | 54.5 | 85.0 | 33.3 |  | 50.0 | 98.6
pol | 84.8 | 78.7 | 99.5 | 92.4 | 51.9 | 71.3 | 78.6 | 43.5 |  | 0.0 | 95.2
rev | 71.8 | 97.7 | 100.0 | 99.6 | 50.0 | 100.0 |  |  |  |  | 96.2
tat | 100.0 | 98.4 | 99.8 | 99.5 | 100.0 | 100.0 | 66.7 |  |  |  | 99.7
vif |  |  | 100.0 |  |  |  | 25.0 |  |  |  | 98.8
vpr | 94.6 | 95.0 | 98.9 | 65.5 | 9.1 | 0.0 | 42.9 |  |  |  | 93.9
vpu | 92.5 | 89.4 | 99.8 | 94.9 | 84.6 | 100.0 | 100.0 |  |  |  | 97.8
Total | 89.4 | 92.9 | 99.5 | 91.9 | 74.1 | 69.1 | 83.2 | 42.5 |  | 41.2 | 96.6
(f) Confusion table (LANL on the left, MuLDAS at the top) after outlier filtering (O <= 2.0)
LANL | 01_AE | A | B | C | D | F | G | H | J | K | Subtotal
01_AE | 7514 | 184 | 51 | 38 | 1 | 6 | 20 | 1 |  | 1 | 7816
A | 600 | 9551 | 90 | 233 | 99 | 67 | 142 | 5 |  | 4 | 10791
B | 147 | 80 | 103779 | 584 | 898 | 152 | 25 | 12 |  | 4 | 105681
C | 56 | 79 | 79 | 12266 | 21 | 14 | 18 | 1 |  |  | 12534
D | 13 | 30 | 173 | 49 | 2994 | 9 | 8 | 2 |  | 1 | 3279
F | 5 | 18 | 72 | 49 | 6 | 674 | 24 | 2 |  |  | 850
G | 67 | 274 | 55 | 95 | 14 | 40 | 1277 |  |  |  | 1822
H | 1 | 52 | 12 | 9 | 5 | 3 | 10 | 17 |  |  | 109
J |  | 14 | 7 | 13 | 5 | 5 | 6 |  |  |  | 50
K | 1 |  | 1 | 7 |  | 6 | 4 |  |  | 7 | 26
Total | 8404 | 10282 | 104319 | 13343 | 4043 | 976 | 1534 | 40 |  | 17 | 142958
Benchmark results for HCV nucleotide sequences
(a) Number of gene segments per genotype
Gene | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
3utr | 0 | 0 | 0 | 0 | 0 | 0 | 0
5utr | 1665 | 314 | 621 | 386 | 71 | 238 | 3295
arfp | 2802 | 310 | 580 | 235 | 24 | 435 | 4386
core | 4088 | 461 | 721 | 283 | 31 | 492 | 6076
e1 | 15697 | 1805 | 2517 | 1031 | 137 | 341 | 21528
e2 | 18996 | 1705 | 2511 | 418 | 89 | 388 | 24107
ns2 | 1909 | 11 | 16 | 20 | 1 | 46 | 2003
ns3 | 2765 | 64 | 271 | 22 | 107 | 46 | 3275
ns4a | 565 | 18 | 198 | 18 | 108 | 43 | 950
ns4b | 746 | 13 | 39 | 20 | 108 | 42 | 968
ns5a | 5487 | 51 | 259 | 20 | 1 | 53 | 5871
ns5b | 3213 | 519 | 523 | 443 | 122 | 334 | 5154
okamoto | 2913 | 471 | 429 | 403 | 119 | 246 | 4581
p7 | 2142 | 9 | 16 | 20 | 1 | 44 | 2232
Total | 62988 | 5751 | 8701 | 3319 | 919 | 2748 | 84426
(b) Genotype prediction concordance (%) with LANL gold standard
Gene | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
3utr |  |  |  |  |  |  |
5utr | 87.4 | 94.3 | 96.5 | 90.9 | 77.5 | 68.1 | 88.6
arfp | 99.8 | 99.0 | 99.7 | 99.6 | 87.5 | 99.3 | 99.6
core | 99.5 | 98.7 | 99.6 | 95.4 | 96.8 | 99.0 | 99.2
e1 | 96.3 | 93.3 | 89.4 | 100.0 | 100.0 | 99.4 | 95.5
e2 | 93.4 | 66.3 | 92.6 | 92.1 | 62.9 | 34.5 | 90.4
ns2 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
ns3 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
ns4a | 100.0 | 88.9 | 100.0 | 100.0 | 100.0 | 95.3 | 99.6
ns4b | 100.0 | 84.6 | 100.0 | 100.0 | 100.0 | 97.6 | 99.7
ns5a | 100.0 | 98.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
ns5b | 99.9 | 87.1 | 95.8 | 99.3 | 98.4 | 99.7 | 98.1
okamoto | 99.8 | 85.8 | 95.3 | 99.8 | 100.0 | 100.0 | 98.0
p7 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
Total | 96.7 | 85.0 | 94.0 | 97.4 | 94.0 | 87.5 | 95.3
(c) Confusion table (LANL on the left, MuLDAS at the top)
LANL | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
1 | 60919 | 748 | 69 | 28 | 17 | 72 | 61853
2 | 592 | 4890 | 58 | 28 | 13 | 52 | 5633
3 | 24 | 33 | 8179 | 28 | 7 | 11 | 8282
4 | 1260 | 63 | 183 | 3233 | 14 | 18 | 4771
5 | 24 | 11 | 180 | 1 | 864 | 191 | 1271
6 | 169 | 6 | 32 | 1 | 4 | 2404 | 2616
Total | 62988 | 5751 | 8701 | 3319 | 919 | 2748 | 84426
(d) Number of gene segments per genotype after outlier filtering (O <= 2.0)
Gene | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
3utr | 0 | 0 | 0 | 0 | 0 | 0 | 0
5utr | 1472 | 236 | 387 | 0 | 0 | 152 | 2247
arfp | 2790 | 304 | 578 | 143 | 3 | 396 | 4214
core | 4081 | 456 | 692 | 261 | 15 | 476 | 5981
e1 | 15682 | 1673 | 2508 | 886 | 78 | 339 | 21166
e2 | 18793 | 1545 | 2043 | 0 | 8 | 340 | 22729
ns2 | 1877 | 11 | 16 | 0 | 1 | 38 | 1943
ns3 | 2763 | 64 | 269 | 0 | 4 | 45 | 3145
ns4a | 565 | 17 | 156 | 0 | 37 | 21 | 796
ns4b | 743 | 13 | 33 | 0 | 3 | 42 | 834
ns5a | 5470 | 51 | 220 | 0 | 1 | 53 | 5795
ns5b | 3169 | 448 | 481 | 4 | 2 | 317 | 4421
okamoto | 2913 | 470 | 429 | 403 | 117 | 244 | 4576
p7 | 2142 | 9 | 10 | 0 | 1 | 34 | 2196
Total | 62460 | 5297 | 7822 | 1697 | 270 | 2497 | 80043
(e) Genotype prediction concordance (%) with LANL gold standard after outlier filtering (O <= 2.0)
Gene | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
3utr |  |  |  |  |  |  |
5utr | 89.3 | 96.2 | 97.9 |  |  | 82.2 | 91.0
arfp | 99.8 | 99.3 | 99.7 | 99.3 | 100.0 | 99.2 | 99.7
core | 99.6 | 98.9 | 99.6 | 96.9 | 100.0 | 98.9 | 99.4
e1 | 96.3 | 92.8 | 89.6 | 100.0 | 100.0 | 99.4 | 95.5
e2 | 93.9 | 65.1 | 93.6 |  | 87.5 | 37.9 | 91.1
ns2 | 100.0 | 100.0 | 100.0 |  | 100.0 | 100.0 | 100.0
ns3 | 100.0 | 100.0 | 100.0 |  | 100.0 | 100.0 | 100.0
ns4a | 100.0 | 94.1 | 100.0 |  | 100.0 | 95.2 | 99.7
ns4b | 100.0 | 84.6 | 100.0 |  | 100.0 | 97.6 | 99.6
ns5a | 100.0 | 98.0 | 100.0 |  | 100.0 | 100.0 | 100.0
ns5b | 99.9 | 85.3 | 96.3 | 75.0 | 100.0 | 99.7 | 98.0
okamoto | 99.8 | 85.7 | 95.3 | 99.8 | 100.0 | 100.0 | 98.0
p7 | 100.0 | 100.0 | 100.0 |  | 100.0 | 100.0 | 100.0
Total | 96.9 | 84.7 | 94.4 | 99.4 | 99.6 | 89.9 | 95.7
(f) Confusion table (LANL on the left, MuLDAS at the top) after outlier filtering (O <= 2.0)
LANL | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
1 | 60546 | 721 | 54 | 9 |  | 32 | 61362
2 | 500 | 4484 | 10 | 2 | 1 | 21 | 5018
3 | 15 | 28 | 7381 |  |  | 5 | 7429
4 | 1238 | 50 | 172 | 1686 |  | 9 | 3155
5 | 18 | 11 | 177 |  | 269 | 184 | 659
6 | 143 | 3 | 28 |  |  | 2246 | 2420
Total | 62460 | 5297 | 7822 | 1697 | 270 | 2497 | 80043
Benchmark results for HCV protein sequences
(a) Number of gene segments per genotype
Gene | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
arfp | 159 | 2 | 8 | 5 | 2 | 5 | 181
core | 2941 | 323 | 516 | 221 | 33 | 410 | 4444
e1 | 13524 | 1384 | 1690 | 871 | 115 | 332 | 17916
e2 | 14538 | 1226 | 1656 | 440 | 107 | 147 | 18114
ns2 | 1859 | 415 | 104 | 74 | 10 | 129 | 2591
ns3 | 2686 | 63 | 270 | 22 | 85 | 55 | 3181
ns4a | 536 | 18 | 197 | 26 | 88 | 40 | 905
ns4b | 675 | 13 | 39 | 20 | 95 | 43 | 885
ns5a | 5035 | 51 | 259 | 20 | 1 | 53 | 5419
ns5b | 2649 | 504 | 491 | 395 | 122 | 315 | 4476
okamoto | 2402 | 456 | 397 | 357 | 119 | 231 | 3962
p7 | 1135 | 39 | 19 | 20 | 1 | 43 | 1257
Total | 48139 | 4494 | 5646 | 2471 | 778 | 1803 | 63331
(b) Genotype prediction concordance (%) with LANL gold standard
Gene | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
arfp | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
core | 98.6 | 98.5 | 96.9 | 88.7 | 81.8 | 99.0 | 97.8
e1 | 95.7 | 93.4 | 95.0 | 80.0 | 99.1 | 99.1 | 94.8
e2 | 93.6 | 58.2 | 90.9 | 98.0 | 11.2 | 89.8 | 90.6
ns2 | 65.7 | 20.2 | 17.3 | 28.4 | 10.0 | 35.7 | 53.7
ns3 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 83.6 | 99.7
ns4a | 99.6 | 88.9 | 100.0 | 76.9 | 100.0 | 100.0 | 98.9
ns4b | 100.0 | 84.6 | 100.0 | 100.0 | 100.0 | 95.3 | 99.5
ns5a | 100.0 | 98.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
ns5b | 99.9 | 89.3 | 99.0 | 99.7 | 98.4 | 99.7 | 98.5
okamoto | 99.8 | 88.2 | 99.2 | 100.0 | 100.0 | 99.6 | 98.4
p7 | 100.0 | 89.7 | 84.2 | 100.0 | 100.0 | 100.0 | 99.4
Total | 95.4 | 76.5 | 93.9 | 89.2 | 85.5 | 93.5 | 93.5
(c) Confusion table (LANL on the left, MuLDAS at the top)
LANL | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
1 | 45941 | 823 | 56 | 7 | 76 | 90 | 46993
2 | 280 | 3437 | 74 | 9 | 15 | 8 | 3823
3 | 205 | 62 | 5299 | 71 | 12 | 4 | 5653
4 | 1606 | 16 | 184 | 2203 | 8 | 1 | 4018
5 | 101 | 142 | 27 | 179 | 665 | 15 | 1129
6 | 6 | 14 | 6 | 2 | 2 | 1685 | 1715
Total | 48139 | 4494 | 5646 | 2471 | 778 | 1803 | 63331
(d) Number of gene segments per genotype after outlier filtering (O <= 2.0)
Gene | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
arfp | 153 | 2 | 4 | 1 | 0 | 5 | 165
core | 2917 | 313 | 504 | 206 | 24 | 391 | 4355
e1 | 13516 | 1374 | 1687 | 602 | 28 | 327 | 17534
e2 | 14316 | 1084 | 894 | 0 | 2 | 114 | 16410
ns2 | 923 | 10 | 12 | 0 | 1 | 38 | 984
ns3 | 2676 | 63 | 236 | 0 | 21 | 36 | 3032
ns4a | 533 | 15 | 193 | 0 | 40 | 30 | 811
ns4b | 670 | 13 | 28 | 0 | 22 | 41 | 774
ns5a | 4993 | 47 | 249 | 0 | 1 | 53 | 5343
ns5b | 2603 | 415 | 376 | 4 | 5 | 257 | 3660
okamoto | 2400 | 455 | 397 | 353 | 118 | 230 | 3953
p7 | 1133 | 8 | 13 | 0 | 1 | 32 | 1187
Total | 46833 | 3799 | 4593 | 1166 | 263 | 1554 | 58208
(e) Genotype prediction concordance (%) with LANL gold standard after outlier filtering (O <= 2.0)
Gene | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
arfp | 100.0 | 100.0 | 100.0 | 100.0 |  | 100.0 | 100.0
core | 98.8 | 98.4 | 97.0 | 90.8 | 83.3 | 99.2 | 98.1
e1 | 95.7 | 93.3 | 95.0 | 100.0 | 100.0 | 99.4 | 95.7
e2 | 94.3 | 59.8 | 92.7 |  | 50.0 | 93.9 | 91.9
ns2 | 100.0 | 100.0 | 100.0 |  | 100.0 | 100.0 | 100.0
ns3 | 100.0 | 100.0 | 100.0 |  | 100.0 | 97.2 | 99.9
ns4a | 99.8 | 93.3 | 100.0 |  | 100.0 | 100.0 | 99.8
ns4b | 100.0 | 84.6 | 100.0 |  | 100.0 | 97.6 | 99.6
ns5a | 100.0 | 97.9 | 100.0 |  | 100.0 | 100.0 | 100.0
ns5b | 99.9 | 88.2 | 99.2 | 100.0 | 100.0 | 100.0 | 98.5
okamoto | 99.8 | 88.1 | 99.2 | 100.0 | 100.0 | 100.0 | 98.5
p7 | 100.0 | 100.0 | 100.0 |  | 100.0 | 100.0 | 100.0
Total | 96.9 | 83.2 | 96.3 | 98.4 | 98.1 | 99.1 | 96.1
(f) Confusion table (LANL on the left, MuLDAS at the top) after outlier filtering (O <= 2.0)
LANL | 1 | 2 | 3 | 4 | 5 | 6 | Subtotal
1 | 45388 | 528 | 19 |  | 3 | 6 | 45944
2 | 232 | 3159 | 2 | 1 |  |  | 3394
3 | 20 | 51 | 4423 | 18 |  | 3 | 4515
4 | 1165 | 14 | 130 | 1147 | 2 | 1 | 2459
5 | 26 | 46 | 17 |  | 258 | 4 | 351
6 | 2 | 1 | 2 |  |  | 1540 | 1545
Total | 46833 | 3799 | 4593 | 1166 | 263 | 1554 | 58208
(Nucleotide and protein sequences were downloaded from GenBank and GenPept, respectively; genotype/subtype labels were downloaded from LANL.)