Grit V1.0.0 Documentation

 

Description

Predict transcription factor binding sites for orthologous genes using mixed Student's t-test statistics.

 

Release

Source code    1.0.0       08/14/2022

Binary     Windows, Mac OS, Linux

 

Fork me on Github

https://github.com/thua45/grit

 

Install

Go to the source folder and run g++ main.cpp cdflib.cpp grit.cpp -std=c++11 -o grit; a binary named grit will be produced in that folder.

On Windows, use g++ main.cpp cdflib.cpp grit.cpp -std=c++11 -static -o grit.exe instead.
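
The two build commands differ only in the -static flag and the output name. A minimal sketch of choosing between them in a build script follows; the helper name compile_command is hypothetical and not part of Grit, and the sketch only assembles the command without running it.

```python
import platform

# Source files listed in the install instructions above.
SOURCES = ["main.cpp", "cdflib.cpp", "grit.cpp"]


def compile_command(system=None):
    """Return the g++ argument list for the given OS name (hypothetical helper).

    `system` uses the values returned by platform.system(),
    e.g. "Windows", "Linux", or "Darwin".
    """
    system = system or platform.system()
    cmd = ["g++", *SOURCES, "-std=c++11"]
    if system == "Windows":
        # Windows build: static linking and an .exe output name.
        cmd += ["-static", "-o", "grit.exe"]
    else:
        cmd += ["-o", "grit"]
    return cmd


print(" ".join(compile_command()))
```

To actually compile, the returned list could be passed to subprocess.run(cmd, check=True) from inside the source folder.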

 

Requirement

Minimum 16 GB RAM. If you install from the source code, the g++ compiler is also required.

 

Usage

grit -m motif -i homoseq -b bgseq [-n sampling_times] [-z sampling_size] [-s rs] [-t pvalue] [-p pscore] -o output

 

Options

-m PWMs for transcription factors

-i putative promoter sequences for orthologous genes

-b background sequences

-n sampling times, the number of times to sample from the background sequences, default = 10

-z sampling size, default = 200

-s raw score, 1 for RS1, 2 for RS2, default = 1

-t p-value threshold, TFBSs with a p-value less than the p-value threshold will be reported, default = 0.05

-p p-score threshold, TFBSs with a p-score less than the p-score threshold will be reported, default = 0

-o output, output file name
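
The options above map directly onto a command line. As a rough sketch, a small Python helper could assemble the arguments and fill in the documented defaults. The function build_grit_command and its parameter names are hypothetical and not part of Grit; the file names are taken from the Example section of this document.

```python
import shlex


def build_grit_command(motif, homoseq, bgseq, output,
                       sampling_times=10, sampling_size=200,
                       raw_score=1, pvalue=0.05, pscore=0, grit="grit"):
    """Return the argument list for one grit run (hypothetical helper).

    Defaults mirror the documented option defaults:
    -n 10, -z 200, -s 1, -t 0.05, -p 0.
    """
    if raw_score not in (1, 2):
        raise ValueError("-s must be 1 (RS1) or 2 (RS2)")
    return [
        grit,
        "-m", motif,
        "-i", homoseq,
        "-b", bgseq,
        "-n", str(sampling_times),
        "-z", str(sampling_size),
        "-s", str(raw_score),
        "-t", str(pvalue),
        "-p", str(pscore),
        "-o", output,
    ]


cmd = build_grit_command("Jaspar-2019.txt", "homoseq-v100-part1.txt",
                         "bgseq-rdm20000.txt", "result-v100-part1.txt")
print(shlex.join(cmd))
```

The helper only builds the command; passing the list to subprocess.run(cmd, check=True) would execute it, assuming the grit binary is on the PATH.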

 

Data

motif file: Jaspar-2019.txt, Jaspar-2020.txt

promoter seq file: homoseq-v100-part1.txt, homoseq-v100-part2.txt, homoseq-v100-part3.txt, homoseq-v100-part4.txt, homoseq-v100-part5.txt, homoseq-v100-part6.txt, homoseq-v100-part7.txt, homoseq-v100-part8.txt

background seq file: bgseq-rdm2000.txt

results (Jaspar-2019): result-v100-part1.txt, result-v100-part2.txt, result-v100-part3.txt, result-v100-part4.txt, result-v100-part5.txt, result-v100-part6.txt, result-v100-part7.txt, result-v100-part8.txt

results (Jaspar-2020): result-v100-j20-part1.txt, result-v100-j20-part2.txt, result-v100-j20-part3.txt, result-v100-j20-part4.txt, result-v100-j20-part5.txt, result-v100-j20-part6.txt, result-v100-j20-part7.txt, result-v100-j20-part8.txt

 

External Links

Grit Online: Search Grit result online

Flaver: Mining transcription factor using weighted rank correlation statistics

 

Dataset for GSEA

GSEA datasets created from Grit results (Jaspar-2020): Grit_FDR_E-12_Cutoff_2_GSEA, Grit_FDR_E-9_Cutoff_2_GSEA, Grit_FDR_E-6_Cutoff_2_GSEA.txt

Important: before running these datasets with GSEA, set the max_size parameter to 5000 or higher.

 

Example

An example run looks like: grit -m Jaspar-2019.txt -i homoseq-v100-part1.txt -b bgseq-rdm20000.txt -n 10 -z 200 -s 1 -t 0.05 -p 0 -o result-v100-part1.txt

This command takes three input files: Jaspar-2019.txt, homoseq-v100-part1.txt, and bgseq-rdm20000.txt. When the run finishes, it produces an output file named result-v100-part1.txt.
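
Because the promoter sequences are split into eight parts, the same command has to be repeated for each part. The sketch below generates those eight command lines, assuming the file naming shown in the Data section. The function part_commands is hypothetical, and the background file name is copied from the example command above.

```python
import shlex

PARTS = range(1, 9)  # homoseq-v100-part1.txt ... homoseq-v100-part8.txt


def part_commands(motif="Jaspar-2019.txt",
                  bgseq="bgseq-rdm20000.txt",
                  result_prefix="result-v100"):
    """Return one grit argument list per promoter part (hypothetical helper)."""
    commands = []
    for part in PARTS:
        commands.append([
            "grit",
            "-m", motif,
            "-i", f"homoseq-v100-part{part}.txt",
            "-b", bgseq,
            "-n", "10", "-z", "200", "-s", "1", "-t", "0.05", "-p", "0",
            "-o", f"{result_prefix}-part{part}.txt",
        ])
    return commands


for cmd in part_commands():
    # Printed only; use subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

For the Jaspar-2020 motifs, calling part_commands("Jaspar-2020.txt", result_prefix="result-v100-j20") would produce output names matching the result-v100-j20-partN.txt files listed under Data.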

 

Tools

Convert Jaspar Motif to Grit Motif: jaspar2motif.py

 

Assessment

Benchmark using ReMap datasets: Benchmark-ReMap.xlsx

Benchmark using publicly available datasets: Benchmark-ChIP-Atlas.xlsx

 

References

Tinghua Huang, Hong Xiao, Qi Tian, Zhen He, Min Yao. Identification of upstream transcription factor binding sites in orthologous genes using mixed Student's t-test statistics. PLoS Computational Biology, 2022

Tinghua Huang, Xinmiao Huang, Binyu Wang, Hao He, Zhiqiang Du, Min Yao, and Xuejun Gao. Flaver: mining transcription factors in genome-wide transcriptome profiling data using weighted rank correlation statistics

 

Contact

Dr. Tinghua Huang, thua45@126.com

Dr. Min Yao, minyao@yangtzeu.edu.cn