Grit Documentation
Description
Predict transcription factor binding sites (TFBS) in orthologous
genes using mixed Student's t-test statistics.
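As background on the kind of comparison a Student's t-test makes, the sketch below computes a classical two-sample Student's t statistic (pooled variance) between motif scores in orthologue promoters and scores in one background sample. It is only an illustration under that assumption. The "mixed" statistic Grit actually uses is defined in the reference paper and may differ, and the function name `student_t` and the score lists are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Classical two-sample Student's t statistic with pooled variance.

    Illustrative only: this is NOT the mixed statistic from the Grit paper,
    just the basic comparison of two groups of scores.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / se


# Hypothetical motif scores: orthologue promoters vs. one background sample.
ortholog_scores = [8.1, 7.9, 8.4, 7.6, 8.0]
background_scores = [6.2, 6.8, 5.9, 6.5, 6.1]
t = student_t(ortholog_scores, background_scores)
df = len(ortholog_scores) + len(background_scores) - 2
print(f"t = {t:.3f}, df = {df}")
```

A large positive t means the orthologue scores sit well above the background scores relative to their spread; turning t into a p-value needs the t distribution CDF with `df` degrees of freedom, which is not shown here.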
Release
Source code 1.0.1 04/18/2023
Source code 1.0.0 08/14/2022
Document
Binaries: Windows, Mac OS, Linux
Source code on GitHub: https://github.com/thua45/grit
Install
Go to the source folder and run the following command; a binary named grit will be produced in the folder:
g++ main.cpp cdflib.cpp grit.cpp -std=c++11 -lpthread -o grit
On Windows, use this command instead:
g++ main.cpp cdflib.cpp grit.cpp -std=c++11 -static -lpthread -o grit.exe
Requirement
At least 32 GB of RAM. If you install from the source code, the g++ compiler is also required.
Usage
grit -m motif -i homoseq -b bgseq [-n sampling_times] [-z sampling_size] [-s rs] [-t pvalue] [-p pscore] [-c cpus] [-u seed] -o output
Options
-m motif file with the position weight matrices (PWMs) of transcription factors
-i putative promoter sequences of orthologous genes
-b background sequences
-n sampling times: how many times to sample from the background sequences, default = 10
-z sampling size, default = 200
-s raw score: 1 for RS1, 2 for RS2, default = 1
-t p-value threshold: TFBS with a p-value less than this threshold are reported, default = 0.05
-p p-score threshold: TFBS with a p-score less than this threshold are reported, default = 0
-c number of CPUs used for multithreading
-u seed number: a value < 0 uses a random seed, default = -1
-o output file name
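The -n, -z and -u options together control how the background samples are drawn. The sketch below is a minimal illustration, assuming each of the n samples is an independent draw of z background sequences without replacement and that a negative seed means "seed from the system". Grit's actual sampling scheme may differ, and `draw_background_samples` is a hypothetical helper, not part of Grit.

```python
import random


def draw_background_samples(background, n=10, z=200, seed=-1):
    """Hypothetical helper: draw n samples of z background sequences each.

    Defaults mirror the documented -n (10), -z (200) and -u (-1) defaults.
    Assumption: sampling is without replacement within a sample, and the
    samples are drawn independently of each other.
    """
    if z > len(background):
        raise ValueError("sampling size exceeds the number of background sequences")
    # seed < 0 -> generator seeded from the system (differs between runs);
    # seed >= 0 -> reproducible draws.
    rng = random.Random() if seed < 0 else random.Random(seed)
    return [rng.sample(background, z) for _ in range(n)]


# Toy background set standing in for the sequences of a file such as bgseq-rdm2000.txt.
background = [f"bg_seq_{i}" for i in range(2000)]
samples = draw_background_samples(background, n=10, z=200, seed=42)
print(len(samples), len(samples[0]))
```

Fixing the seed makes repeated runs draw the same samples, which is useful when comparing results across parameter settings.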
Data
motif file: Jaspar-2019.txt, Jaspar-2020.txt
promoter seq file: homoseq-v100-part1.txt, homoseq-v100-part2.txt, homoseq-v100-part3.txt, homoseq-v100-part4.txt, homoseq-v100-part5.txt, homoseq-v100-part6.txt, homoseq-v100-part7.txt, homoseq-v100-part8.txt
background seq file: bgseq-rdm2000.txt, bgseq-rdm20000.txt
results (Jaspar-2019): result-v100-part1.txt, result-v100-part2.txt, result-v100-part3.txt, result-v100-part4.txt, result-v100-part5.txt, result-v100-part6.txt, result-v100-part7.txt, result-v100-part8.txt
results (Jaspar-2020): result-v100-j20-part1.txt, result-v100-j20-part2.txt, result-v100-j20-part3.txt, result-v100-j20-part4.txt, result-v100-j20-part5.txt, result-v100-j20-part6.txt, result-v100-j20-part7.txt, result-v100-j20-part8.txt
Data for Arabidopsis_thaliana.TAIR10
motif file: Jaspar-Plant-2022.txt
promoter seq file: homoseq-v56.txt
background seq file: bgseq-rdm20000.txt
results (Jaspar-2022): result-v56-j22.txt
External Links
Grit Online: search Grit results online
Flaver: Mining transcription factor using weighted rank correlation statistics
Dataset for GSEA
GSEA datasets created from Grit results (Jaspar-2020): Grit_FDR_E-12_Cutoff_2_GSEA, Grit_FDR_E-9_Cutoff_2_GSEA, Grit_FDR_E-6_Cutoff_2_GSEA.txt
Important: before running these datasets with GSEA, set the max_size parameter to 5000 or higher.
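To check whether a given max_size keeps every gene set, one can count the genes in the largest set. The sketch below is minimal and assumes the GSEA dataset files use the tab-separated GMT layout (set name, description, then one gene per field); if the files use another layout, the parsing would need to change. `largest_gene_set` and the example set names are hypothetical.

```python
def largest_gene_set(lines):
    """Return (set_name, gene_count) for the largest set in GMT-style lines.

    Assumes each line is tab-separated: set name, description, then genes.
    Lines with no genes are skipped; empty trailing fields are ignored.
    """
    best_name, best_size = None, 0
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        genes = [g for g in fields[2:] if g.strip()]
        if len(genes) > best_size:
            best_name, best_size = fields[0], len(genes)
    return best_name, best_size


# Hypothetical GMT lines; for a real file: with open(path) as fh: largest_gene_set(fh)
example_lines = [
    "MOTIF_A\tna\tGENE1\tGENE2\tGENE3\n",
    "MOTIF_B\tna\tGENE4\n",
]
name, size = largest_gene_set(example_lines)
MAX_SIZE = 5000  # the max_size value recommended for the Grit GSEA datasets
if size > MAX_SIZE:
    print(f"{name} has {size} genes; raise max_size above {MAX_SIZE}")
else:
    print(f"largest set {name} has {size} genes; max_size = {MAX_SIZE} keeps it")
```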
Example
An example run looks like this:
grit -m Jaspar-2019.txt -i homoseq-v100-part1.txt -b bgseq-rdm20000.txt -n 10 -z 200 -s 1 -t 0.05 -p 0 -c 8 -o result-v100-part1.txt
This command takes three input files: Jaspar-2019.txt, homoseq-v100-part1.txt and bgseq-rdm20000.txt. When the run finishes, it produces an output file named result-v100-part1.txt.
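The promoter sequences listed under Data come in eight parts (homoseq-v100-part1.txt to homoseq-v100-part8.txt), so the example run is typically repeated once per part. The minimal Python sketch below builds the same command for each part and runs the parts one after another. It assumes the grit binary can be found on the PATH (or is passed as a path such as "./grit") and that the input files are in the working directory; `build_grit_command` and `run_all_parts` are hypothetical helpers.

```python
import shutil
import subprocess


def build_grit_command(part, motif="Jaspar-2019.txt", bg="bgseq-rdm20000.txt",
                       cpus=8, grit="grit"):
    """Build the argument list for one homoseq-v100 part, mirroring the example run."""
    return [
        grit,
        "-m", motif,
        "-i", f"homoseq-v100-part{part}.txt",
        "-b", bg,
        "-n", "10", "-z", "200", "-s", "1",
        "-t", "0.05", "-p", "0",
        "-c", str(cpus),
        "-o", f"result-v100-part{part}.txt",
    ]


def run_all_parts(parts=range(1, 9), grit="grit"):
    """Run grit on each part in turn, stopping at the first failure."""
    exe = shutil.which(grit)  # also accepts an explicit path such as "./grit"
    if exe is None:
        raise FileNotFoundError(f"cannot find the {grit!r} binary")
    for part in parts:
        cmd = build_grit_command(part, grit=exe)
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Calling `run_all_parts()` (or `run_all_parts(grit="./grit")` when the binary sits in the current folder) produces result-v100-part1.txt to result-v100-part8.txt. The parts run sequentially because each run already uses several CPUs via -c and, per the requirements above, a large amount of RAM.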
Tools
Convert Jaspar Motif to Grit Motif: jaspar2motif.py
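For context on what a motif file holds, the sketch below parses one matrix in the JASPAR text format (a ">" header line followed by A, C, G and T count rows) and converts it into a log-odds position weight matrix, using a pseudocount and a uniform background. This is a generic PWM construction for illustration only. It does not reproduce the Grit motif format written by jaspar2motif.py or the RS1/RS2 raw scores, and the function names and toy motif are hypothetical.

```python
import math
import re


def parse_jaspar(text):
    """Parse one JASPAR-format matrix into (header, {base: [counts]})."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].lstrip(">")
    counts = {}
    for ln in lines[1:]:
        base, values = ln.split(None, 1)  # e.g. "A", "[ 8 0 0 ]"
        counts[base] = [float(v) for v in re.findall(r"[\d.]+", values)]
    return header, counts


def to_log_odds(counts, pseudocount=0.25, background=0.25):
    """Convert counts to log2-odds weights against a uniform background."""
    pwm = {b: [] for b in "ACGT"}
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        for b in "ACGT":
            p = (counts[b][i] + pseudocount) / total
            pwm[b].append(math.log2(p / background))
    return pwm


def score_window(pwm, window):
    """Sum per-position weights; assumes the window holds only A/C/G/T."""
    return sum(pwm[base][i] for i, base in enumerate(window.upper()))


toy = """
>TOY0001.1 toy_motif
A [ 8 0 0 ]
C [ 0 8 0 ]
G [ 0 0 8 ]
T [ 0 0 0 ]
"""
header, counts = parse_jaspar(toy)
pwm = to_log_odds(counts)
print(header, score_window(pwm, "ACG"), score_window(pwm, "TTT"))
```

The pseudocount keeps unseen bases from producing log(0); the window matching the consensus (ACG) scores far above a mismatching one.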
Assessment
Benchmark using ReMap datasets: Benchmark-ReMap.xlsx
Benchmark using publicly available datasets: Benchmark-ChIP-Atlas.xlsx
References
Tinghua Huang, Hong Xiao, Qi Tian, Zhen He, Min Yao. Identification of upstream transcription factor binding sites in orthologous genes using mixed Student's t-test statistics. PLoS Computational Biology, 2022.
Tinghua Huang, Xinmiao Huang, Binyu Wang, Hao He, Zhiqiang Du, Min Yao, Xuejun Gao. Flaver: mining transcription factors in genome-wide transcriptome profiling data using weighted rank correlation statistics.
Contact
Dr. Tinghua Huang, thua45@126.com
Dr. Min Yao, minyao@yangtzeu.edu.cn
Dr. Jianwu Wang, wjw19802013@163.com