The 'Strength of Expression-Based Association' (SEBA) scoring method
to short-list potential markers for any condition of interest

Shodhaka Life Sciences Pvt. Ltd. (1 August 2017)

A frequently used procedure for short-listing differentially expressed genes is to focus on genes with a high fold change that also meet a minimal threshold for statistical significance of the difference in expression levels (P<0.05), and then to short-list them further via functional analysis. This popular approach keeps the focus on well-studied genes and does not prioritize the genes and transcripts that are consistently associated with a condition of interest. The reason is that not all genes have been studied equally, and the functional information available in databases is limited for many genes in many contexts. Hence, a less-studied gene that is the most consistently differentially expressed may never be picked up for further consideration.

Our group observed that, in certain datasets, mRNAs that are highly up-regulated in the cancer condition sometimes show a contradiction in one or more of the disease samples: their expression is lower than the expression level in one or more of the control samples. Such a molecule may not be a reliable candidate marker. Many of the genes with the highest fold change may also not have very significant P-values.

Hence, we developed a new scoring method, which we call the 'Strength of Expression-Based Association' (SEBA) scoring method, to make better use of RNA-seq data. This method allows a hierarchical listing of mRNAs or genes based on consistency in differential transcription. Equal weight is given to each of the two commonly used parameters: (a) significance of the difference in expression levels, and (b) fold change. Then, apart from combining the values for these two traditional parameters, a new level of stringency is included: the extent to which individual samples agree with the general trend observed. This scoring system can help researchers rank transcripts by the 'consistency of difference in their relative quantity'. Hence, it is possible to present lists of up- and down-regulated transcripts sorted by how reproducibly they are differentially transcribed in the condition of interest compared to control conditions. The SEBA scores may be useful indicators of the reproducibility, and hence reliability, of differential gene expression patterns. Thus, these scores can be used to depict the strength of expression-based association for any condition of interest.
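The kind of contradiction described above can be illustrated with a small Python sketch. The function name and the expression values are hypothetical and chosen only for illustration; they do not come from any real dataset. The sketch shows a transcript with a large average fold change in which one disease sample still falls below a control sample.

```python
from statistics import mean


def contradicting_samples(disease, control):
    """Return disease-sample values that are lower than at least one control value.

    This is the up-regulated case: ideally every disease value would be
    higher than every control value.
    """
    highest_control = max(control)
    return [value for value in disease if value < highest_control]


# Hypothetical expression values (e.g. TPM) for one transcript.
disease = [50.0, 400.0, 900.0, 3.0]
control = [5.0, 6.0, 8.0, 7.0]

fold_change = mean(disease) / mean(control)
print(f"fold change: {fold_change:.1f}")
print("contradicting disease samples:", contradicting_samples(disease, control))
```

Despite the high average fold change, the last disease sample is below the highest control value, so the transcript is not consistently up-regulated across samples.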

Description of the scoring process:

The SEBA score depends on the P-value, the fold change and the number of contradictions across groups. A transcript that is down-regulated in the condition of interest with a P-value <0.05 is considered most reliable, with 0% contradictions, if the highest TPM, FPKM or RPKM value among the condition-of-interest samples is lower than the lowest value among the normal samples. For each such down-regulated transcript, the number of normal samples that contradict this expectation, i.e., normal samples whose expression value is lower than the highest value among the condition-of-interest samples, is noted. Across all transcripts, these contradiction counts are listed and scaled from 0 to 40, where '40' equates to 0% contradictions and '0' represents the highest number of contradictions observed. The process is reversed for up-regulated transcripts. Similarly, the log-fold-change values obtained for each gene by comparing expression in the condition of interest with the control samples are scaled from 0 to 30, where 30 represents the highest fold change. The same process is repeated for P-values, where 30 represents the lowest P-value. Finally, these scaled values are summed into a cumulative score (a maximum of 100 per transcript), which forms the 'Strength of Expression-Based Association (SEBA) score'. Up- and down-regulated transcripts are then arranged hierarchically based on their SEBA scores.
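The scoring steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual SEBA implementation. The text does not specify how values are mapped onto the 0-30 scales, so the sketch assumes linear min-max scaling of the absolute log fold change and of -log10(P). The function names, dictionary keys and example values are hypothetical. P-values are assumed to be greater than zero, and the function is meant to be called separately on the up-regulated and the down-regulated transcripts.

```python
import math


def count_contradictions(condition, normal, direction):
    """Count normal samples that contradict the expected direction of change."""
    if direction == "down":
        # Expected: every condition value lies below every normal value.
        highest_condition = max(condition)
        return sum(1 for value in normal if value < highest_condition)
    # Up-regulated transcripts: the comparison is reversed.
    lowest_condition = min(condition)
    return sum(1 for value in normal if value > lowest_condition)


def min_max_scale(values, top):
    """Rescale linearly so the smallest value maps to 0 and the largest to `top`."""
    low, high = min(values), max(values)
    if high == low:
        # Assumption: identical values all receive the full score.
        return [float(top)] * len(values)
    return [top * (v - low) / (high - low) for v in values]


def seba_scores(transcripts):
    """Return {name: score}, where each score is at most 100."""
    counts = []
    for t in transcripts:
        direction = "up" if t["log_fc"] > 0 else "down"
        counts.append(count_contradictions(t["condition"], t["normal"], direction))
    worst = max(counts)
    # 40 points for zero contradictions, 0 points for the highest count observed.
    consistency = [40.0 if worst == 0 else 40.0 * (1 - c / worst) for c in counts]
    # Assumption: scale the magnitude of the log fold change (30 = highest).
    fold = min_max_scale([abs(t["log_fc"]) for t in transcripts], 30)
    # Assumption: scale -log10(P), so the lowest P-value receives 30.
    signif = min_max_scale([-math.log10(t["p_value"]) for t in transcripts], 30)
    return {
        t["name"]: consistency[i] + fold[i] + signif[i]
        for i, t in enumerate(transcripts)
    }


# Hypothetical down-regulated transcripts, for illustration only.
example = [
    {"name": "A", "condition": [1, 2, 3], "normal": [10, 12, 15],
     "log_fc": -2.5, "p_value": 0.001},
    {"name": "B", "condition": [1, 2, 11], "normal": [10, 12, 15],
     "log_fc": -1.5, "p_value": 0.01},
    {"name": "C", "condition": [5, 13, 14], "normal": [10, 12, 15],
     "log_fc": -0.5, "p_value": 0.04},
]
scores = seba_scores(example)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 1))
```

In this example, transcript A has no contradictions, the largest fold change and the lowest P-value, so it ranks first. Transcript C has the most contradictions and the weakest fold change and significance, so it ranks last.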

The scoring method is currently being improved and the resulting new algorithm is expected to become a reliable tool to discover new and promising biomarkers for various diseases.

Note: The method was previously known as the 'Strength of Association (StA)' scoring method.


copyright © Dr. Kshitish Acharya K; all rights reserved