Using a Solver Over the String Pattern Domain to Analyze Gene Promoter Sequences.

Authors: Rigotti, Christophe; Mitašiūnaitė, Ieva; Besson, Jérémy; Meyniel, Laurène; Boulicaut, Jean-François; Gandrillon, Olivier
Source: Inductive Databases & Constraint-based Data Mining; 2010, p407-423, 17p
Abstract: This chapter illustrates how inductive querying techniques can be used to support knowledge discovery from genomic data. More precisely, it presents a data mining scenario to discover putative transcription factor binding sites in gene promoter sequences. We do not provide technical details about the constraint-based data mining algorithms used, which have been previously described. Our contribution is to provide an abstract description of the scenario, its concrete instantiation and also a typical execution on real data. Our main extraction algorithm is a complete solver dedicated to the string pattern domain: it computes string patterns that satisfy a given conjunction of primitive constraints. We also discuss the processing steps necessary to turn it into a useful tool. In particular, we introduce a parameter tuning strategy, an appropriate measure to rank the patterns, and the post-processing approaches that can be and have been applied. [ABSTRACT FROM AUTHOR]
Database: Complementary Index