Description: |
The development of RNA sequencing (RNAseq) and the corresponding emergence of public datasets have opened new avenues for transcriptional marker discovery. Long non-coding RNAs (lncRNAs) constitute an emerging class of transcripts with potential for high tissue specificity and function. Using a dedicated bioinformatics pipeline, we propose to construct a cell-specific catalogue of unannotated lncRNAs and to identify the strongest cell markers. This pipeline combines ab initio transcript identification, pseudoalignment and new methods, such as a specific k-mer approach for naive quantification of expression across numerous RNAseq datasets. As an application model, we focused on mesenchymal stem cells (MSCs), a type of adult multipotent stem cell derived from diverse tissues. Although frequently used in clinical settings, these cells are not extensively characterised. Our pipeline highlighted several lncRNAs with high specificity for MSCs. In silico functional prediction indicated that each candidate is associated with a specific state of MSC biology. Together, these results outline an approach for harnessing lncRNAs as cell markers, identify several candidates as potential actors in MSC biology and point to promising directions for future experimental investigation. |
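The idea of quantifying expression through transcript-specific k-mers can be illustrated with a minimal sketch. It assumes, for illustration only, that a transcript's "specific" k-mers are those absent from every other transcript in a reference set, and that a read counts toward a transcript if it shares at least one such k-mer. The function names (`specific_kmers`, `naive_quantify`), the value of `k` and the per-million normalisation are hypothetical choices, not the pipeline's actual implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_kmers(transcripts: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """For each transcript, keep only the k-mers that occur in no other transcript."""
    per_tx = {name: kmers(seq, k) for name, seq in transcripts.items()}
    counts: Counter = Counter()
    for ks in per_tx.values():
        counts.update(ks)  # each transcript contributes each distinct k-mer once
    return {name: {km for km in ks if counts[km] == 1} for name, ks in per_tx.items()}


def naive_quantify(reads: Iterable[str], spec: Dict[str, Set[str]], k: int) -> Dict[str, float]:
    """Count reads sharing at least one specific k-mer with each transcript,
    normalised to hits per million reads (hypothetical normalisation)."""
    hits = {name: 0 for name in spec}
    n_reads = 0
    for read in reads:
        n_reads += 1
        read_kmers = kmers(read, k)
        for name, ks in spec.items():
            if read_kmers & ks:  # any overlap with this transcript's specific k-mers
                hits[name] += 1
    if n_reads == 0:
        return {name: 0.0 for name in spec}
    return {name: h * 1e6 / n_reads for name, h in hits.items()}


# Toy usage: two hypothetical lncRNAs sharing the prefix "ACGTACG".
transcripts = {"lnc_A": "ACGTACGTTTGACC", "lnc_B": "ACGTACGTGGCATA"}
spec = specific_kmers(transcripts, k=5)
reads = ["TTTGACC", "GGCATA", "ACGTACG", "TTTGA"]
print(naive_quantify(reads, spec, k=5))
# {'lnc_A': 500000.0, 'lnc_B': 250000.0}
```

The read "ACGTACG" contains only k-mers shared by both transcripts, so it is assigned to neither; this is what makes the quantification "specific". Reverse complements, sequencing errors and library-size effects beyond the simple per-million scaling are ignored here for brevity.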