Grit V1.0.0 Documentation
Description
Predict transcription factor binding sites for orthologous genes using mixed Student's t-test statistics.
Release
Source code 1.0.0 08/14/2022
Binary Windows, Mac OS, Linux
Fork me on GitHub
https://github.com/thua45/grit
Install
Go to the source folder and run:
g++ main.cpp cdflib.cpp grit.cpp -std=c++11 -o grit
A binary named grit will be produced in the folder.
Under Windows, try this command instead:
g++ main.cpp cdflib.cpp grit.cpp -std=c++11 -static -o grit.exe
Requirement
Minimum 16 GB RAM. If you install from the source code, the g++ compiler is also required.
Usage
grit -m motif -i homoseq -b bgseq [-n sampling_times] [-z sampling_size] [-s rs] [-t pvalue] [-p pscore] -o output
Options
-m PWMs for transcription factors
-i putative promoter sequences for orthologous genes
-b background sequences
-n sampling times, i.e. how many times to sample from the background sequences, default = 10
-z sampling size, default = 200
-s raw score, 1 for RS1, 2 for RS2, default = 1
-t p-value threshold; TFBS with a p-value less than the threshold will be reported, default = 0.05
-p p-score threshold; TFBS with a p-score less than the threshold will be reported, default = 0
-o output file name
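When many runs are scripted, the command line described above can be assembled programmatically. The following is a minimal sketch in Python, not part of Grit itself: the helper name build_grit_command is hypothetical, the grit binary is assumed to be on the PATH, and the default values simply mirror those listed in the options.

```python
def build_grit_command(motif, homoseq, bgseq, output,
                       sampling_times=10, sampling_size=200,
                       raw_score=1, pvalue=0.05, pscore=0):
    """Return the argument list for one grit run (hypothetical helper).

    Defaults follow the documented option defaults; nothing is executed here.
    """
    if raw_score not in (1, 2):
        raise ValueError("raw_score must be 1 (RS1) or 2 (RS2)")
    return [
        "grit",
        "-m", motif,                  # PWMs for transcription factors
        "-i", homoseq,                # promoter sequences of orthologous genes
        "-b", bgseq,                  # background sequences
        "-n", str(sampling_times),    # sampling times
        "-z", str(sampling_size),     # sampling size
        "-s", str(raw_score),         # raw score: 1 = RS1, 2 = RS2
        "-t", str(pvalue),            # p-value threshold
        "-p", str(pscore),            # p-score threshold
        "-o", output,                 # output file name
    ]


if __name__ == "__main__":
    cmd = build_grit_command("Jaspar-2019.txt", "homoseq-v100-part1.txt",
                             "bgseq-rdm20000.txt", "result-v100-part1.txt")
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) to launch the run.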
Data
motif file: Jaspar-2019.txt, Jaspar-2020.txt
promoter seq file: homoseq-v100-part1.txt, homoseq-v100-part2.txt, homoseq-v100-part3.txt, homoseq-v100-part4.txt, homoseq-v100-part5.txt, homoseq-v100-part6.txt, homoseq-v100-part7.txt, homoseq-v100-part8.txt
background seq file: bgseq-rdm2000.txt
results (Jaspar-2019): result-v100-part1.txt, result-v100-part2.txt, result-v100-part3.txt, result-v100-part4.txt, result-v100-part5.txt, result-v100-part6.txt, result-v100-part7.txt, result-v100-part8.txt
results (Jaspar-2020): result-v100-j20-part1.txt, result-v100-j20-part2.txt, result-v100-j20-part3.txt, result-v100-j20-part4.txt, result-v100-j20-part5.txt, result-v100-j20-part6.txt, result-v100-j20-part7.txt, result-v100-j20-part8.txt
External Links
Grit Online: Search Grit results online
Flaver: Mining transcription factors using weighted rank correlation statistics
Dataset for GSEA
GSEA datasets created from Grit results (Jaspar-2020): Grit_FDR_E-12_Cutoff_2_GSEA, Grit_FDR_E-9_Cutoff_2_GSEA, Grit_FDR_E-6_Cutoff_2_GSEA.txt
Important: before running these datasets with GSEA, set the max_size parameter to 5000 or higher.
Example
An example run looks like this:
grit -m Jaspar-2019.txt -i homoseq-v100-part1.txt -b bgseq-rdm20000.txt -n 10 -z 200 -s 1 -t 0.05 -p 0 -o result-v100-part1.txt
This command takes three input files: Jaspar-2019.txt, homoseq-v100-part1.txt, and bgseq-rdm20000.txt. When the run finishes, it produces an output file named result-v100-part1.txt.
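The promoter sequences are distributed in eight parts, so the example run is typically repeated once per part. The sketch below generates one command per part using the file names from the Data section; the function name batch_commands and the output_pattern parameter are hypothetical conveniences, and the option values are copied from the example run rather than prescribed by Grit.

```python
def batch_commands(motif="Jaspar-2019.txt",
                   bgseq="bgseq-rdm20000.txt",
                   output_pattern="result-v100-part{n}.txt",
                   parts=range(1, 9)):
    """Build one grit command per promoter sequence part (hypothetical helper)."""
    commands = []
    for n in parts:
        commands.append([
            "grit",
            "-m", motif,
            "-i", f"homoseq-v100-part{n}.txt",   # input part n
            "-b", bgseq,
            "-n", "10", "-z", "200", "-s", "1",  # values from the example run
            "-t", "0.05", "-p", "0",
            "-o", output_pattern.format(n=n),    # matching output part n
        ])
    return commands


if __name__ == "__main__":
    # Print the commands only; nothing is executed here.
    for cmd in batch_commands():
        print(" ".join(cmd))
```

To execute the runs, each list can be passed to subprocess.run(cmd, check=True). For the Jaspar-2020 motifs, calling batch_commands(motif="Jaspar-2020.txt", output_pattern="result-v100-j20-part{n}.txt") yields output names matching the result files listed under Data.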
Tools
Convert Jaspar Motif to Grit Motif: jaspar2motif.py
Assessment
Benchmark using ReMap datasets: Benchmark-ReMap.xlsx
Benchmark using publicly available datasets: Benchmark-ChIP-Atlas.xlsx
References
Tinghua Huang, Hong Xiao, Qi Tian, Zhen He, Min Yao. Identification of upstream transcription factor binding sites in orthologous genes using mixed Student's t-test statistics. PLoS Computational Biology, 2022
Tinghua Huang, Xinmiao Huang, Binyu Wang, Hao He, Zhiqiang Du, Min Yao, and Xuejun Gao. Flaver: mining transcription factors in genome-wide transcriptome profiling data using weighted rank correlation statistics
Contact
Dr. Tinghua Huang, thua45@126.com
Dr. Min Yao, minyao@yangtzeu.edu.cn