Help on gtftk Unix commands
Main parser arguments of gtftk
Getting help with -h
The -h argument can be used to get a synopsis for implemented commands.
$ gtftk -h
Usage: gtftk [-h] [-b] [-p] [-u] [-s] [-d] [-v] [-l] [-i] ...
A toolbox to handle GTF files.
Example:
gtftk get_example -f chromInfo -o simple.chromInfo ;
gtftk get_example | gtftk feature_size -t mature_rna | gtftk nb_exons |\
gtftk intron_sizes | gtftk exon_sizes | gtftk convergent -u 24 -d 24 -c simple.chromInfo | \
gtftk divergent -u 101 -d 10 -c simple.chromInfo | \
gtftk overlapping -u 0 -d 0 -t transcript -c simple.chromInfo -a | \
gtftk select_by_key -k feature -v transcript | gtftk tabulate -k "*" -b -x
Type 'gtftk sub-command -h' for more information.
Main command arguments:
-h, --help show this help message and exit
-b, --bash-comp Get a script to activate bash completion. (default: False)
-p, --plugin-tests Display bats tests for all plugin. (default: False)
-u, --plugin-tests-no-conn Display bats tests for plugins not relying on server conn. (default: False)
-s, --system-info Display some info about the system. (default: False)
-d, --plugin-path Print plugin path (default: False)
-v, --version show program's version number and exit
-l, --list-plugins Get the list of plugins. (default: False)
-i, --update-plugins Read the ~/.gtftk folder and update the plugin list. (default: False)
Available sub-commands/plugins:
------- editing --------
add_prefix Add a prefix or suffix to target values.
del_attr Delete attributes in the target gtf file.
discretize_key Create a new key through discretization of a numeric key.
join_attr Join attributes from a tabulated file.
join_multi_file Join attributes from mutiple files.
merge_attr Merge a set of attributes into a destination attribute.
----- information ------
add_exon_nb Add exon number transcript-wise.
apropos Search in all command description files those related to a user-defined keyword.
count Count the number of features in the gtf file.
count_key_values Count the number values for a set of keys.
feature_size Compute the size of features enclosed in the GTF.
get_attr_list Get the list of attributes from a GTF file.
get_attr_value_list Get the list of values observed for an attributes.
get_example Get example files including GTF.
get_feature_list Get the list of features enclosed in the GTF.
nb_exons Count the number of exons by transcript.
nb_transcripts Count the number of transcript per gene.
retrieve Retrieve a GTF file from ensembl.
seqid_list Returns the chromosome list.
tss_dist Computes the distance between TSS of gene transcripts.
------ selection -------
random_list Select a random list of genes or transcripts.
random_tx Select randomly up to m transcript for each gene.
rm_dup_tss If several transcripts of a gene share the same TSS, select only one representative.
select_by_go Select lines from a GTF file using a Gene Ontology ID.
select_by_intron_size Select transcripts by intron size.
select_by_key Select lines from a GTF based on attributes and values.
select_by_loc Select transcript/gene overlapping a genomic feature.
select_by_max_exon_nb For each gene select the transcript with the highest number of exons.
select_by_nb_exon Select transcripts based on the number of exons.
select_by_numeric_value Select lines from a GTF file based on a boolean test on numeric values.
select_by_regexp Select lines from a GTF file based on a regexp.
select_by_tx_size Select transcript based on their size (i.e size of mature/spliced transcript).
select_most_5p_tx Select the most 5' transcript of each gene.
short_long Get the shortest or longest transcript of each gene
------ conversion ------
bed_to_gtf Convert a bed file to a gtf but with lots of empty fields...
convert Convert a GTF to various format including bed.
convert_ensembl Convert the GTF file to ensembl format. Essentially add 'transcript'/'gene' features.
tabulate Convert a GTF to tabulated format.
------ annotation ------
closest_genes Find the n closest genes for each transcript.
convergent Find transcripts with convergent tts.
divergent Find transcripts with divergent promoters.
exon_sizes Add a new key to transcript features containing a comma-separated list of exon sizes.
intron_sizes Add a new key to transcript features containing a comma-separated list of intron sizes.
overlapping Find (non)overlapping transcripts.
tss_numbering Add the tss number to each transcript (5'->3').
------ ologram ------
ologram Statistics on bed file intersections with genomic features.
ologram_merge_runs Merge ologram runs, treating each as a superbatch.
ologram_merge_stats Build a heatmap from several ologram output files (tsv).
ologram_modl_treeify Build a tree representation from an OLOGRAM-MODL multiple combinations result files (tsv).
------- sequence -------
get_feat_seq Get feature sequence (e.g exon, UTR...).
get_tx_seq Get transcript sequences in fasta format.
----- coordinates ------
get_5p_3p_coords Get the 5p or 3p coordinate for each feature. TSS or TTS for a transcript.
intergenic Extract intergenic regions.
intronic Extract intronic regions.
midpoints Get the midpoint coordinates for the requested feature.
shift Transpose coordinates.
splicing_site Compute the locations of donor and acceptor splice sites.
------- coverage -------
coverage Compute bigwig coverage in body, promoter, tts...
mk_matrix Compute a coverage matrix (see profile).
profile Create coverage profile using a bigWig as input.
----- miscellaneous ----
bigwig_to_bed Convert a bigwig to a BED3 format.
col_from_tab Select columns from a tabulated file based on their names.
control_list Returns a list of gene matched for expression based on reference values.
get_ceas_records Convert a CEAS sqlite file back into a flat file.
great_reg_domains Attempt to compute labeled regions using GREAT 'association rule'
------------------------
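As a quick check of an installation, the main parser flags listed above can be run one after another. This is a minimal sketch that uses only flags shown in the synopsis; the `gtftk_status` variable is a hypothetical helper added for illustration, and the output of each call depends on the installed version.

```shell
# Run the informational flags only when gtftk is actually on the PATH.
if command -v gtftk >/dev/null 2>&1; then
    gtftk -v                  # program version
    gtftk -s                  # information about the system
    gtftk -d                  # plugin path
    gtftk -l                  # list of plugins
    gtftk select_by_key -h    # detailed help for one sub-command
    gtftk_status="available"
else
    echo "gtftk not found in PATH" >&2
    gtftk_status="missing"
fi
```

Any other sub-command name from the list can replace `select_by_key` in the last call.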
Activating Bash completion
The commands below activate bash completion for gtftk.
# Use the -b argument of gtftk
# This will produce a script that you
# should store in your .bashrc
gtftk -b
Alternatively, append the script directly to your .bashrc:
echo "" >> ~/.bashrc
gtftk -b >> ~/.bashrc
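Appending to ~/.bashrc repeatedly duplicates the completion script. One possible variant, sketched here under assumptions, stores the script in a separate file and adds a single source line to ~/.bashrc only if it is missing. The file name `~/.gtftk_completion.bash` is a hypothetical choice, not something gtftk requires.

```shell
# Hypothetical location for the generated completion script.
completion_file="${HOME}/.gtftk_completion.bash"

if command -v gtftk >/dev/null 2>&1; then
    # Regenerate the script so it matches the currently installed plugins.
    gtftk -b > "${completion_file}"
    # Line that loads the script when a new shell starts.
    source_line="[ -f \"${completion_file}\" ] && . \"${completion_file}\""
    # Append the line to ~/.bashrc only if it is not already present.
    grep -qxF -- "${source_line}" "${HOME}/.bashrc" 2>/dev/null \
        || printf '%s\n' "${source_line}" >> "${HOME}/.bashrc"
else
    echo "gtftk not found in PATH; skipping completion setup" >&2
fi
```

Running the snippet again after updating plugins refreshes the completion script without touching ~/.bashrc a second time.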
Getting the list of functional tests
The list of implemented tests can be obtained with the -p/--plugin-tests argument. These tests can be run with bats (Bash Automated Testing System).
# gtftk --plugin-tests
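A minimal sketch of how the tests might be saved and run, assuming the output of --plugin-tests is a valid bats file and that bats is installed. The file name `gtftk_tests.bats` is hypothetical.

```shell
tests_file="gtftk_tests.bats"   # hypothetical file name

# Both tools are needed: gtftk prints the tests, bats runs them.
if command -v gtftk >/dev/null 2>&1 && command -v bats >/dev/null 2>&1; then
    gtftk --plugin-tests > "${tests_file}"
    bats "${tests_file}"
else
    echo "gtftk and bats are both required; skipping" >&2
fi
```

Replacing --plugin-tests with -u/--plugin-tests-no-conn should restrict the run to tests for plugins that do not rely on a server connection, as described in the main parser help.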
Command-wide arguments
Description: The following arguments are available in almost all gtftk commands:
-h, --help: Argument list and details.
-i, --inputfile: The input file (may be <stdin>).
-o, --outputfile: The output file (may be <stdout>).
-D, --no-date: Do not add date to output file names.
-C, --add-chr: Add 'chr' to chromosome names before printing output.
-V, --verbosity: Increase output verbosity (takes a value from 0 to 4).
-K, --tmp-dir: Keep all temporary files in this folder.
-L, --logger-file: Store the values of all command line arguments in a file.
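The arguments above can be combined in a single call. The sketch below reuses the `get_example` and `select_by_key` commands from the main help example and assumes that both accept the command-wide arguments as listed. The file names `simple.gtf` and `transcripts_chr.gtf` are hypothetical.

```shell
in_gtf="simple.gtf"             # hypothetical input file name
out_gtf="transcripts_chr.gtf"   # hypothetical output file name

if command -v gtftk >/dev/null 2>&1; then
    # Write the example GTF to a file with -o.
    gtftk get_example -o "${in_gtf}"
    # Keep transcript lines, add 'chr' to chromosome names (-C),
    # and set verbosity to 2 (-V).
    gtftk select_by_key -i "${in_gtf}" -k feature -v transcript \
        -C -V 2 -o "${out_gtf}"
    # Same selection, reading from stdin and writing to stdout.
    gtftk get_example | gtftk select_by_key -k feature -v transcript | head -n 5
else
    echo "gtftk not found in PATH; nothing to run" >&2
fi
```

Note that `-v` in `select_by_key` is that command's own value argument, while the uppercase `-V` is the command-wide verbosity level.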