How to interpret the MuLDAS output (HIV-1 as an example)

MuLDAS output consists of a summary table at the top, followed by detailed results for each gene from both the major and the nested analyses.

Gene by Gene Subtype Prediction Summary

The query sequence is aligned to the RefSeq genome sequence (NC_001802) using BLASTN to identify the genes it covers. In this case, pol, env, gag, vif, ltr, etc. are located within the query, and the extent of each is reported. For each genic region, separate predictions are made by two different analyses: major and nested. The best MAP (maximum a posteriori) subtype for each analysis is reported. Clicking the best subtype navigates to the corresponding detailed result section.
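The gene-coverage step above can be sketched as an interval overlap between the aligned reference range and the gene annotations. This is only an illustration: the gene coordinates are placeholders (not the NC_001802 annotation), and the function name `covered_genes` and the `min_fraction` threshold are assumptions, not the rule MuLDAS actually applies.

```python
# Hypothetical gene coordinates on the reference (1-based, inclusive).
# Placeholders for illustration only, not the real NC_001802 annotation.
GENES = {
    "gag": (100, 400),
    "pol": (350, 900),
    "vif": (880, 1000),
    "env": (1100, 1600),
}


def covered_genes(hit_start, hit_end, genes=GENES, min_fraction=0.5):
    """Return genes overlapped by the aligned reference range.

    Each result is (gene, overlap_start, overlap_end, fraction_covered).
    min_fraction is an assumed cutoff for reporting a gene as covered.
    """
    results = []
    for name, (g_start, g_end) in genes.items():
        start = max(hit_start, g_start)
        end = min(hit_end, g_end)
        if start > end:
            continue  # no overlap with this gene
        fraction = (end - start + 1) / (g_end - g_start + 1)
        if fraction >= min_fraction:
            results.append((name, start, end, round(fraction, 2)))
    return results


# A query whose BLASTN alignment spans reference positions 300..950.
hits = covered_genes(300, 950)
for gene, start, end, fraction in hits:
    print(gene, start, end, fraction)
```

Here the alignment range stands in for the BLASTN hit coordinates on the reference; a real pipeline would read them from the BLASTN output.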

Detailed result for each gene

The posterior probability of the query belonging to each subtype is shown in the left panel. The highest one, the maximum a posteriori (MAP) estimate, is highlighted in red (A in this case). Please note the very low probabilities for the other subtypes. LANL defines sub-subtypes such as A1, A2, F1, and F2. The multiple alignment files downloaded from LANL contain the degenerate A and F subtypes as well as these sub-subtypes. As the distinction between these sub-subtypes is not feasible with the current method, we collapse the sub-subtypes into A or F. The right panel shows the leave-one-out cross-validation result for the reference sequences (430 in this case), here with no misclassification. At the lower right corner, download links are offered for the GGobi XML and distance matrix files. GGobi is a tool for dynamic and interactive visualization of high-dimensional data. A static 3D plot of the principal coordinates is also displayed.
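The two ideas in this paragraph, collapsing sub-subtype labels and picking the MAP subtype, can be sketched as follows. The posterior values are made up for illustration, and the helper name `collapse_label` is hypothetical; this is not the server's own code.

```python
def collapse_label(label):
    """Map sub-subtype labels such as 'A1' or 'F2' to the parent subtype."""
    if len(label) == 2 and label[0] in "AF" and label[1].isdigit():
        return label[0]
    return label


# Reference labels as they might appear in a downloaded alignment.
ref_labels = ["A1", "A2", "A", "B", "F1", "F2", "C"]
collapsed = [collapse_label(x) for x in ref_labels]

# Illustrative posterior probabilities for one gene (invented numbers).
posteriors = {"A": 0.97, "B": 0.01, "C": 0.01, "F": 0.01}

# The MAP estimate is simply the subtype with the highest posterior.
best = max(posteriors, key=posteriors.get)
print(collapsed, best)
```

Collapsing is applied to the reference labels before classification, so the classifier only ever sees A and F, never A1 or F2.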

Nested analysis

As subtype A is reported as the best among the major group, the CRFs that partly originate from A are collected, and the MDS and LDA steps are repeated as shown below.
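Selecting the reference groups for the nested step can be sketched as a simple filter. The table below is a small illustrative subset with parent subtypes read off the CRF names, and the function name `nested_groups` is hypothetical. Keeping the best subtype itself in the nested set is an assumption, suggested by the nested result still reporting A.

```python
# Illustrative subset of CRFs with parent subtypes taken from their names.
# A real run would use the complete CRF table.
CRF_PARENTS = {
    "CRF01_AE": {"A", "E"},
    "CRF02_AG": {"A", "G"},
    "CRF03_AB": {"A", "B"},
    "CRF07_BC": {"B", "C"},
}


def nested_groups(best_subtype, crf_parents=CRF_PARENTS):
    """Return the best major subtype plus every CRF partly derived from it."""
    crfs = sorted(
        name for name, parents in crf_parents.items() if best_subtype in parents
    )
    return [best_subtype] + crfs


groups = nested_groups("A")
print(groups)
```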

In this case, MDS was run in 5-dimensional space, and the cross-validation result is somewhat poorer than in the major analysis. Even so, the subtype is consistently identified as A. The outlier-ness analysis also indicates that the query lies well inside the subtype A cluster.
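For readers unfamiliar with leave-one-out cross-validation, the procedure can be sketched as below. To keep the example dependency-free, a nearest-centroid classifier is used as a simplified stand-in for LDA, and the coordinates are toy 2-dimensional points rather than real MDS output. All function names are hypothetical.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length coordinate lists."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(train_X, train_y, x):
    """Assign x to the class with the nearest centroid (stand-in for LDA)."""
    groups = {}
    for p, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(p)
    return min(groups, key=lambda g: sq_dist(centroid(groups[g]), x))


def loocv_errors(X, y):
    """Hold out each sample in turn; return (index, true, predicted) misses."""
    errors = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = predict(train_X, train_y, X[i])
        if pred != y[i]:
            errors.append((i, y[i], pred))
    return errors


# Toy coordinates standing in for MDS output of reference sequences.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y = ["A", "A", "A", "B", "B", "B"]
errors = loocv_errors(X, y)
print(len(errors), "misclassified of", len(X))
```

An empty error list corresponds to the "no misclassification" case described for the major analysis; a poorer nested result would show up as entries in this list.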

Copyrighted by Bio-Data Mining Lab, Department of Bioinformatics and Life Sciences, Soongsil University, Seoul, Korea