Commands from section ‘miscellaneous’

In this section we will require the following datasets:

$ gtftk get_example -q -d mini_real -f '*'
$ gtftk get_example -q -d simple -f '*'

control_list

Description: Returns a list of genes matched for expression based on reference values. Given a reference gene list (or, more generally, a list of IDs), this command tries to extract a set of other genes/IDs with matched signal/expression. The --reference-gene-file contains the list of reference IDs, while the --in-file contains a gene/signal pair for every gene.
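To make the idea of "matched for signal" concrete, here is a minimal Python sketch. It is not the algorithm used by control_list, which is not described here. It assumes a simple greedy strategy: for each reference ID, pick the still-unused non-reference ID whose signal is closest. The function name `pick_matched_controls` and the toy values are hypothetical.

```python
def pick_matched_controls(signal, reference_ids):
    """Greedy illustration: one control per reference ID, chosen by closest signal.

    signal: dict mapping ID -> signal/expression value (all genes).
    reference_ids: list of reference IDs; IDs absent from `signal` are ignored.
    """
    reference_set = set(reference_ids)
    reference = [g for g in reference_ids if g in signal]
    # Candidates are all IDs that are not part of the reference list.
    candidates = {g: v for g, v in signal.items() if g not in reference_set}
    controls = []
    for ref in reference:
        if not candidates:
            break  # no more non-reference IDs to draw from
        # Closest signal wins; ties are broken by ID to keep the result stable.
        best = min(candidates, key=lambda g: (abs(candidates[g] - signal[ref]), g))
        controls.append(best)
        del candidates[best]  # each control is used at most once
    return controls


# Toy values, invented for the illustration.
signal = {"g1": 10.0, "g2": 11.0, "g3": 250.0, "g4": 240.0, "g5": 0.5, "g6": 12.0}
print(pick_matched_controls(signal, ["g1", "g3"]))  # ['g2', 'g4']
```

In this sketch the control set has the same size as the reference set whenever enough candidates exist, which is the property that makes the two signal distributions comparable.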

Example:

$ gtftk control_list -i mini_real_counts_ENCFF630HEX.tsv -r mini_real_control_1.txt -D -V 2 -s -l -p 1 -ju -if example_13.png -pf png
 |-- 10:52-INFO-control_list : 0 duplicate lines have been deleted in reference file.
 |-- 10:52-INFO-control_list : Found 50 genes of the reference in the provided signal file
 |-- 10:52-INFO-control_list : All reference genes were found.
 |-- 10:52-INFO-control_list : Searching for genes with matched signal.
 |-- 10:52-INFO-control_list : Preparing a dataframe for plotting.
 |-- 10:52-INFO-control_list : Saving diagram to file : example_13.png
 |-- 10:52-INFO-control_list : Be patient. This may be long for large datasets.
_images/example_13.png

Arguments:

$ gtftk control_list -h
  Usage: gtftk control_list --in-file TXT --reference-gene-file TXT [--out-dir DIR] [--log2] [--pseudo-count pseudo_count] [-pw page_width] [-ph page_height] [-pf {pdf,png}] [-dpi dpi] [--skip-first] [--rug] [--jitter] [-if user_img_file] [-c set_colors] [-h] [-V [verbosity]] [-D] [-C] [-K tmp_dir] [-A] [-L logger_file] [-W write_message_to_file]

  Description: 

     Based on a reference gene list (or more generally IDs) this command tries to extract a set of
     other genes/IDs matched for signal/expression. The --reference-gene-file contains the list of
     reference IDs while the --in-file contains a tuple gene/signal for all genes.

  Notes:
     *  --infile is a two columns tabulated file. The first column contains the list of ids
     (including reference IDs) and the second column contains the expression/signal values. This
     file should contain no header.
     *  Think about discarding any unwanted IDs from --infile before calling control_list.

optional arguments:
 --in-file, -i                A two columns tab-file. See notes. (default: None)
 --reference-gene-file, -r    The file containing the reference gene list (1 column, transcript ids). No header. (default: None)
 --out-dir, -o                Name of the output directory. (default: control_list)
 --log2, -l                   If selected, data will be log transformed. (default: False)
 --pseudo-count, -p           The value for a pseudo-count to be added. (default: 0)
 -pw, --page-width            Output pdf file width (e.g. 7 inches). (default: None)
 -ph, --page-height           Output file height (e.g. 5 inches). (default: None)
 -pf, --page-format           Output file format. (default: pdf)
 -dpi, --dpi                  Dpi to use. (default: 300)
 --skip-first, -s             Indicates that infile has a header. (default: False)
 --rug, -u                    Add rugs to the diagram. (default: False)
 --jitter, -j                 Add jittered points. (default: False)
 -if, --user-img-file         Provide an alternative path for the image. (default: None)
 -c, --set-colors             Colors for the two sets (comma-separated). (default: #b2df8a,#6a3d9a)

Command-wise optional arguments:
 -h, --help                   Show this help message and exit.
 -V, --verbosity              Set output verbosity ([0-3]). (default: 0)
 -D, --no-date                Do not add date to output file names. (default: False)
 -C, --add-chr                Add 'chr' to chromosome names before printing output. (default: False)
 -K, --tmp-dir                Keep all temporary files into this folder. (default: None)
 -A, --keep-all               Try to keep all temporary files even if process does not terminate normally. (default: False)
 -L, --logger-file            Stores the arguments passed to the command into a file. (default: None)
 -W, --write-message-to-file  Store all messages in a file. (default: None)
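The notes above ask for a headerless, two-column tabulated file and suggest discarding unwanted IDs beforehand. A minimal Python sketch of that preparation step follows. The helper name `write_signal_table`, the IDs, and the values are hypothetical and only illustrate the expected layout.

```python
import csv
import io


def write_signal_table(rows, unwanted_ids, handle):
    """Write (id, signal) pairs as a headerless, two-column tab-separated table.

    Rows whose ID is in `unwanted_ids` are skipped. Returns the number of rows written.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    kept = 0
    for gene_id, value in rows:
        if gene_id in unwanted_ids:
            continue  # drop IDs that should not take part in the matching
        writer.writerow([gene_id, value])
        kept += 1
    return kept


# Invented values; in practice `handle` would be a file opened for writing.
rows = [("G0001", 12.5), ("G0002", 0.0), ("G0003", 87.1)]
buffer = io.StringIO()
n = write_signal_table(rows, {"G0002"}, buffer)
print(buffer.getvalue(), end="")
```

If the table already carries a header line, the --skip-first flag documented above is the alternative to stripping it.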

col_from_tab

Description: Select columns from a tabulated file based on their names.

Example:

$ gtftk select_by_key -t -i simple.gtf | gtftk tabulate -k '*' -x | gtftk col_from_tab -c transcript_id,gene_id
transcript_id	gene_id
G0001T002	G0001
G0001T001	G0001
G0002T001	G0002
G0003T001	G0003
G0004T002	G0004
G0004T001	G0004
G0005T001	G0005
G0006T001	G0006
G0006T002	G0006
G0007T001	G0007
G0007T002	G0007
G0008T001	G0008
G0009T002	G0009
G0009T001	G0009
G0010T001	G0010
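For readers who want the same kind of selection outside the command line, the following Python sketch picks columns by name from tab-separated lines. It is an illustration only, not the implementation of col_from_tab. The function `select_columns`, its `invert` parameter (loosely mirroring --invert-match), and the `start` values in the toy table are assumptions.

```python
def select_columns(lines, columns, sep="\t", invert=False):
    """Return the selected columns of a delimited table as a list of joined lines.

    lines: iterable of text lines, the first one being the header.
    columns: column names to keep (or to drop when invert=True).
    """
    rows = [line.rstrip("\n").split(sep) for line in lines]
    header = rows[0]
    if invert:
        # Keep every column that was NOT requested, in file order.
        indexes = [i for i, name in enumerate(header) if name not in columns]
    else:
        # Keep requested columns in the requested order; ValueError if one is missing.
        indexes = [header.index(name) for name in columns]
    return [sep.join(row[i] for i in indexes) for row in rows]


# Toy table; the 'start' values are invented for the illustration.
table = [
    "gene_id\ttranscript_id\tstart",
    "G0001\tG0001T002\t125",
    "G0001\tG0001T001\t125",
]
for line in select_columns(table, ["transcript_id", "gene_id"]):
    print(line)
```

The sketch splits on the separator directly rather than using a CSV parser, so quoted fields are passed through untouched.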

Arguments:

$ gtftk col_from_tab -h
  Usage: gtftk col_from_tab [-i TXT] [-o TXT] -c columns [-n] [-u] [-s SEP] [-r OUT_SEP] [-m MORE_COL] [-H] [-h] [-V [verbosity]] [-D] [-C] [-K tmp_dir] [-A] [-L logger_file] [-W write_message_to_file]

  Description: 

     Select columns from a tabulated file based on their names.

Arguments:
 -i, --inputfile              The tabulated file. Default to STDIN (default: <stdin>)
 -o, --outputfile             Output file. (default: <stdout>)
 -c, --columns                The list (csv) of column names. (default: None)
 -n, --invert-match           Not/invert match. (default: False)
 -u, --unique                 Write non-redundant lines. (default: False)
 -s, --separator              The separator of input columns. (default: )
 -r, --output-separator       The separator to be used for separating output columns. (default: )
 -m, --more-col               Add a named (last) column with a given value (e.g. -m col_name:value). (default: None)
 -H, --no-header              Don't print the header line. (default: False)

Command-wise optional arguments:
 -h, --help                   Show this help message and exit.
 -V, --verbosity              Set output verbosity ([0-3]). (default: 0)
 -D, --no-date                Do not add date to output file names. (default: False)
 -C, --add-chr                Add 'chr' to chromosome names before printing output. (default: False)
 -K, --tmp-dir                Keep all temporary files into this folder. (default: None)
 -A, --keep-all               Try to keep all temporary files even if process does not terminate normally. (default: False)
 -L, --logger-file            Stores the arguments passed to the command into a file. (default: None)
 -W, --write-message-to-file  Store all messages in a file. (default: None)