Help on gtftk Unix commands

Main parser arguments of gtftk

Getting help with -h

The -h argument can be used to get a synopsis for implemented commands.

$ gtftk -h
  Usage: gtftk [-h] [-b] [-p] [-u] [-s] [-d] [-v] [-l] [-i]  ...

  A toolbox to handle GTF files.

  Example:

  gtftk get_example -f chromInfo -o simple.chromInfo ;
  gtftk get_example  | gtftk feature_size -t mature_rna | gtftk nb_exons |\
  gtftk intron_sizes | gtftk exon_sizes | gtftk convergent -u 24 -d 24  -c simple.chromInfo | \
  gtftk divergent -u 101 -d 10  -c simple.chromInfo  | \
  gtftk overlapping -u 0 -d 0 -t transcript -c simple.chromInfo -a |  \
  gtftk select_by_key -k feature -v transcript |   gtftk tabulate -k "*" -b -x

  Type 'gtftk sub-command -h' for more information.

    

Main command arguments:
 -h, --help                   show this help message and exit
 -b, --bash-comp              Get a script to activate bash completion. (default: False)
 -p, --plugin-tests           Display bats tests for all plugin. (default: False)
 -u, --plugin-tests-no-conn   Display bats tests for plugins not relying on server conn. (default: False)
 -s, --system-info            Display some info about the system. (default: False)
 -d, --plugin-path            Print plugin path (default: False)
 -v, --version                show program's version number and exit
 -l, --list-plugins           Get the list of plugins. (default: False)
 -i, --update-plugins         Read the ~/.gtftk folder and update the plugin list. (default: False)

Available sub-commands/plugins:
 
  
------- editing --------

   add_prefix                 Add a prefix or suffix to target values.
   del_attr                   Delete attributes in the target gtf file.
   discretize_key             Create a new key through discretization of a numeric key.
   join_attr                  Join attributes from a tabulated file.
   join_multi_file            Join attributes from mutiple files.
   merge_attr                 Merge a set of attributes into a destination attribute.
  
----- information ------

   add_exon_nb                Add exon number transcript-wise.
   apropos                    Search in all command description files those related to a user-defined keyword.
   count                      Count the number of features in the gtf file.
   count_key_values           Count the number values for a set of keys.
   feature_size               Compute the size of features enclosed in the GTF.
   get_attr_list              Get the list of attributes from a GTF file.
   get_attr_value_list        Get the list of values observed for an attributes.
   get_example                Get example files including GTF.
   get_feature_list           Get the list of features enclosed in the GTF.
   nb_exons                   Count the number of exons by transcript.
   nb_transcripts             Count the number of transcript per gene.
   retrieve                   Retrieve a GTF file from ensembl.
   seqid_list                 Returns the chromosome list.
   tss_dist                   Computes the distance between TSS of gene transcripts.
  
------ selection -------

   random_list                Select a random list of genes or transcripts.
   random_tx                  Select randomly up to m transcript for each gene.
   rm_dup_tss                 If several transcripts of a gene share the same TSS, select only one representative.
   select_by_go               Select lines from a GTF file using a Gene Ontology ID.
   select_by_intron_size      Select transcripts by intron size.
   select_by_key              Select lines from a GTF based on attributes and values.
   select_by_loc              Select transcript/gene overlapping a genomic feature.
   select_by_max_exon_nb      For each gene select the transcript with the highest number of exons.
   select_by_nb_exon          Select transcripts based on the number of exons.
   select_by_numeric_value    Select lines from a GTF file based on a boolean test on numeric values.
   select_by_regexp           Select lines from a GTF file based on a regexp.
   select_by_tx_size          Select transcript based on their size (i.e size of mature/spliced transcript).
   select_most_5p_tx          Select the most 5' transcript of each gene.
   short_long                 Get the shortest or longest transcript of each gene
  
------ conversion ------

   bed_to_gtf                 Convert a bed file to a gtf but with lots of empty fields...
   convert                    Convert a GTF to various format including bed.
   convert_ensembl            Convert the GTF file to ensembl format. Essentially add 'transcript'/'gene' features.
   tabulate                   Convert a GTF to tabulated format.
  
------ annotation ------

   closest_genes              Find the n closest genes for each transcript.
   convergent                 Find transcripts with convergent tts.
   divergent                  Find transcripts with divergent promoters.
   exon_sizes                 Add a new key to transcript features containing a comma-separated list of exon sizes.
   intron_sizes               Add a new key to transcript features containing a comma-separated list of intron sizes.
   overlapping                Find (non)overlapping transcripts.
   tss_numbering              Add the tss number to each transcript (5'->3').
  
------ ologram ------

   ologram                    Statistics on bed file intersections with genomic features.
   ologram_merge_runs         Merge ologram runs, treating each as a superbatch.
   ologram_merge_stats        Build a heatmap from several ologram output files (tsv).
   ologram_modl_treeify       Build a tree representation from an OLOGRAM-MODL multiple combinations result files (tsv).
  
------- sequence -------

   get_feat_seq               Get feature sequence (e.g exon, UTR...).
   get_tx_seq                 Get transcript sequences in fasta format.
  
----- coordinates ------

   get_5p_3p_coords           Get the 5p or 3p coordinate for each feature. TSS or TTS for a transcript.
   intergenic                 Extract intergenic regions.
   intronic                   Extract intronic regions.
   midpoints                  Get the midpoint coordinates for the requested feature.
   shift                      Transpose coordinates.
   splicing_site              Compute the locations of donor and acceptor splice sites.
  
------- coverage -------

   coverage                   Compute bigwig coverage in body, promoter, tts...
   mk_matrix                  Compute a coverage matrix (see profile).
   profile                    Create coverage profile using a bigWig as input.
  
----- miscellaneous ----

   bigwig_to_bed              Convert a bigwig to a BED3 format.
   col_from_tab               Select columns from a tabulated file based on their names.
   control_list               Returns a list of gene matched for expression based on reference values.
   get_ceas_records           Convert a CEAS sqlite file back into a flat file.
   great_reg_domains          Attempt to compute labeled regions using GREAT 'association rule'

------------------------
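As the help message notes, each sub-command also accepts -h. Below is a minimal sketch for printing the help of several sub-commands in one go. The function name show_gtftk_help is hypothetical. The sub-command names in the usage line are taken from the list above.

```shell
# Hypothetical helper: print 'gtftk <cmd> -h' for every sub-command given
# as an argument, each preceded by a header line.
show_gtftk_help() {
  local cmd
  for cmd in "$@"; do
    echo "===== gtftk $cmd ====="
    gtftk "$cmd" -h
  done
}

# Usage (requires gtftk to be installed):
# show_gtftk_help nb_exons intron_sizes select_by_key
```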

Activating Bash completion

The commands below activate bash completion for gtftk.

# The -b argument of gtftk prints a
# completion script that you should
# store in your ~/.bashrc
gtftk -b

Alternatively, append the script directly to your ~/.bashrc:

echo "" >> ~/.bashrc
gtftk -b >> ~/.bashrc
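Running the command above twice appends the completion script twice. Below is a minimal sketch that appends it only once, wrapping it between marker lines. The function name install_gtftk_completion and the marker text are hypothetical. The sketch only assumes that `gtftk -b` prints the script on stdout, as described above.

```shell
# Hypothetical helper: append the output of 'gtftk -b' to a bashrc file
# (default: ~/.bashrc) unless a previous run already added it.
install_gtftk_completion() {
  local rc="${1:-$HOME/.bashrc}"
  local marker="# >>> gtftk bash completion >>>"
  local script
  # Skip if the marker line is already present.
  if [ -f "$rc" ] && grep -qF "$marker" "$rc"; then
    echo "gtftk completion already installed in $rc"
    return 0
  fi
  # Capture the script first so a failing 'gtftk -b' leaves rc untouched.
  script="$(gtftk -b)" || return 1
  {
    echo ""
    echo "$marker"
    printf '%s\n' "$script"
    echo "# <<< gtftk bash completion <<<"
  } >> "$rc"
}

# Usage (requires gtftk to be installed):
# install_gtftk_completion && source ~/.bashrc
```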

Getting the list of functional tests

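The list of implemented tests can be obtained with the -p/--plugin-tests argument (or with -u/--plugin-tests-no-conn to restrict it to plugins that do not rely on a server connection). These tests can be run using bats (Bash Automated Testing System).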

gtftk --plugin-tests
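Below is a minimal sketch that saves these tests to a file and runs them with bats when it is available. It assumes that -p and -u print a complete .bats file on stdout. The function name run_gtftk_tests and the default file name gtftk_tests.bats are hypothetical.

```shell
# Hypothetical helper: write gtftk's bats tests to a file, then run them.
# Arg 1: output .bats file. Arg 2: --plugin-tests (default) or
# --plugin-tests-no-conn (plugins not relying on a server connection).
run_gtftk_tests() {
  local out="${1:-gtftk_tests.bats}"
  local flag="${2:---plugin-tests}"
  gtftk "$flag" > "$out" || return 1
  if ! command -v bats > /dev/null 2>&1; then
    echo "bats not found; tests were written to $out" >&2
    return 2
  fi
  bats "$out"
}

# Usage (requires gtftk and bats to be installed):
# run_gtftk_tests gtftk_tests.bats --plugin-tests-no-conn
```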

Command-wide arguments

The following arguments are available in almost all gtftk commands:

  • -h, --help: Argument list and details.

  • -i, --inputfile: The input file (may be <stdin>).

  • -o, --outputfile: The output file (may be <stdout>).

  • -D, --no-date: Do not add the date to output file names.

  • -C, --add-chr: Add ‘chr’ to chromosome names before printing output.

  • -V, --verbosity: Increase output verbosity (values from 0 to 4).

  • -K, --tmp-dir: Keep all temporary files in this folder.

  • -L, --logger-file: Store the values of all command-line arguments in a file.
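Below is a minimal sketch of a sub-command invocation that combines several of these arguments. Only the flags listed above are used. The function name run_with_common_args, the default temporary folder gtftk_tmp and the log file name are hypothetical.

```shell
# Hypothetical helper: run a gtftk sub-command with common arguments.
# Arg 1: sub-command, arg 2: input file, arg 3: output file,
# arg 4: folder for kept temporary files (created if missing).
run_with_common_args() {
  local subcmd="$1" in="$2" out="$3"
  local tmpdir="${4:-gtftk_tmp}"
  mkdir -p "$tmpdir"
  # -V 1 : low verbosity (accepted values: 0 to 4)
  # -K   : keep all temporary files in $tmpdir
  # -L   : store the command-line arguments in a log file
  gtftk "$subcmd" -i "$in" -o "$out" -V 1 -K "$tmpdir" -L "${out}.args.log"
}

# Usage (requires gtftk to be installed):
# gtftk get_example > simple.gtf
# run_with_common_args nb_exons simple.gtf simple_nb_exons.gtf
```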