Commands from section ‘miscellaneous’
In this section we will require the following datasets:
$ gtftk get_example -q -d mini_real -f '*'
$ gtftk get_example -q -d simple -f '*'
control_list
Description: Returns a list of genes matched for expression against a set of reference values. Given a reference gene list (or, more generally, a list of IDs), this command tries to extract a set of other genes/IDs with matched signal/expression. The --reference-gene-file contains the list of reference IDs, while the --in-file contains gene/signal pairs for all genes.
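The description does not say how the matched set is chosen. As an illustration only, here is a minimal Python sketch of one plausible approach: greedy nearest-signal matching without replacement, applied after the optional pseudo-count (-p) and log2 transform (-l). The helper names `transform` and `match_controls`, the greedy strategy and the order of the two transforms are assumptions, not gtftk's actual implementation.

```python
import math
from typing import Dict, List


def transform(value: float, log2: bool = False, pseudo_count: float = 0.0) -> float:
    """Add the pseudo-count, then optionally take log2 (cf. -p and -l)."""
    shifted = value + pseudo_count
    # math.log2 raises ValueError when shifted <= 0, which is why a pseudo-count helps.
    return math.log2(shifted) if log2 else shifted


def match_controls(signal: Dict[str, float], reference_ids: List[str],
                   log2: bool = False, pseudo_count: float = 0.0) -> Dict[str, str]:
    """Greedily pair each reference ID with the closest-signal non-reference ID.

    signal: ID -> value for all genes (the content of --in-file).
    reference_ids: IDs read from --reference-gene-file.
    Each candidate control is used at most once (matching without replacement).
    """
    scores = {k: transform(v, log2, pseudo_count) for k, v in signal.items()}
    refs = set(reference_ids)
    pool = [k for k in scores if k not in refs]  # candidate controls
    matches: Dict[str, str] = {}
    for ref in reference_ids:
        if ref not in scores or not pool:
            continue  # reference absent from the signal file, or no candidates left
        # Pick the remaining candidate whose (transformed) signal is closest.
        best = min(pool, key=lambda k: abs(scores[k] - scores[ref]))
        matches[ref] = best
        pool.remove(best)
    return matches
```

For example, with `signal = {"refA": 10, "refB": 100, "g1": 12, "g2": 95, "g3": 50, "g4": 9}`, `match_controls(signal, ["refA", "refB"])` returns `{"refA": "g4", "refB": "g2"}`. Greedy matching depends on the order of the reference list; an optimal assignment method would remove that dependence at a higher computational cost.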
Example:
$ gtftk control_list -i mini_real_counts_ENCFF630HEX.tsv -r mini_real_control_1.txt -D -V 2 -s -l -p 1 -ju -if example_13.png -pf png
|-- 10:52-INFO-control_list : 0 duplicate lines have been deleted in reference file.
|-- 10:52-INFO-control_list : Found 50 genes of the reference in the provided signal file
|-- 10:52-INFO-control_list : All reference genes were found.
|-- 10:52-INFO-control_list : Searching for genes with matched signal.
|-- 10:52-INFO-control_list : Preparing a dataframe for plotting.
|-- 10:52-INFO-control_list : Saving diagram to file : example_13.png
|-- 10:52-INFO-control_list : Be patient. This may be long for large datasets.
Arguments:
$ gtftk control_list -h
Usage: gtftk control_list --in-file TXT --reference-gene-file TXT [--out-dir DIR] [--log2] [--pseudo-count pseudo_count] [-pw page_width] [-ph page_height] [-pf {pdf,png}] [-dpi dpi] [--skip-first] [--rug] [--jitter] [-if user_img_file] [-c set_colors] [-h] [-V [verbosity]] [-D] [-C] [-K tmp_dir] [-A] [-L logger_file] [-W write_message_to_file]
Description:
Based on a reference gene list (or more generally IDs) this command tries to extract a set of
other genes/IDs matched for signal/expression. The --reference-gene-file contains the list of
reference IDs while the --in-file contains gene/signal pairs for all genes.
Notes:
* --in-file is a two-column, tab-delimited file. The first column contains the list of IDs
(including reference IDs) and the second column contains the expression/signal values. This
file should contain no header (use --skip-first if it does).
* Consider discarding any unwanted IDs from --in-file before calling control_list.
optional arguments:
--in-file, -i A two-column, tab-delimited file. See notes. (default: None)
--reference-gene-file, -r The file containing the reference gene list (1 column, transcript ids). No header. (default: None)
--out-dir, -o Name of the output directory. (default: control_list)
--log2, -l If selected, data will be log2-transformed. (default: False)
--pseudo-count, -p The value for a pseudo-count to be added. (default: 0)
-pw, --page-width Output file width (e.g. 7 inches). (default: None)
-ph, --page-height Output file height (e.g. 5 inches). (default: None)
-pf, --page-format Output file format. (default: pdf)
-dpi, --dpi Dpi to use. (default: 300)
--skip-first, -s Indicates that --in-file has a header. (default: False)
--rug, -u Add rugs to the diagram. (default: False)
--jitter, -j Add jittered points. (default: False)
-if, --user-img-file Provide an alternative path for the image. (default: None)
-c, --set-colors Colors for the two sets (comma-separated). (default: #b2df8a,#6a3d9a)
Command-wise optional arguments:
-h, --help Show this help message and exit.
-V, --verbosity Set output verbosity ([0-3]). (default: 0)
-D, --no-date Do not add date to output file names. (default: False)
-C, --add-chr Add 'chr' to chromosome names before printing output. (default: False)
-K, --tmp-dir Keep all temporary files in this folder. (default: None)
-A, --keep-all Try to keep all temporary files even if the process does not terminate normally. (default: False)
-L, --logger-file Store the arguments passed to the command in a file. (default: None)
-W, --write-message-to-file Store all messages in a file. (default: None)
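The notes in the help text ask for a headerless, two-column, tab-delimited --in-file and suggest discarding unwanted IDs beforehand. A minimal Python sketch of that preparation step follows, assuming the source is a tab-delimited table with a header line; the function name `make_control_list_input` and the column names in the demo are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO


def make_control_list_input(src: TextIO, dst: TextIO, id_col: str,
                            value_col: str, drop_ids: Iterable[str] = ()) -> None:
    """Write a headerless two-column (ID, signal) tab-delimited table."""
    reader = csv.DictReader(src, delimiter="\t")  # consumes the header line
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    drop = set(drop_ids)
    for row in reader:
        if row[id_col] in drop:
            continue  # discard unwanted IDs before running control_list
        writer.writerow([row[id_col], row[value_col]])  # no header is written


if __name__ == "__main__":
    # Hypothetical count table with a header and an extra column.
    table = io.StringIO("gene\tcount\tlength\nG1\t5\t100\nG2\t7\t200\nG3\t0\t300\n")
    out = io.StringIO()
    make_control_list_input(table, out, "gene", "count", drop_ids={"G3"})
    print(out.getvalue(), end="")
```

The resulting file can then be passed to control_list with -i, without --skip-first.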
col_from_tab
Description: Select columns from a tabulated file based on their names.
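To make name-based selection concrete, here is a minimal Python sketch of the same idea: read the header line, map the requested names to positions, and output only those fields. It assumes a tab-delimited input with a header, emits columns in the requested order and fails on unknown names; these behaviours, and the name `select_columns`, are assumptions rather than a description of gtftk's code.

```python
from typing import Iterable, Iterator, List


def select_columns(lines: Iterable[str], columns: List[str],
                   sep: str = "\t", out_sep: str = "\t") -> Iterator[str]:
    """Yield the header and each data line restricted to the named columns."""
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return  # empty input: nothing to select
    header = first.rstrip("\n").split(sep)
    missing = [c for c in columns if c not in header]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    # Map each requested name to its position; output keeps the requested order.
    idx = [header.index(c) for c in columns]
    yield out_sep.join(columns)
    for line in it:
        fields = line.rstrip("\n").split(sep)
        yield out_sep.join(fields[i] for i in idx)
```

Passing `sys.stdin` as `lines` and printing each yielded line roughly mimics `gtftk col_from_tab -c transcript_id,gene_id` reading from a pipe.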
Example:
$ gtftk select_by_key -t -i simple.gtf | gtftk tabulate -k '*' -x | gtftk col_from_tab -c transcript_id,gene_id
transcript_id gene_id
G0001T002 G0001
G0001T001 G0001
G0002T001 G0002
G0003T001 G0003
G0004T002 G0004
G0004T001 G0004
G0005T001 G0005
G0006T001 G0006
G0006T002 G0006
G0007T001 G0007
G0007T002 G0007
G0008T001 G0008
G0009T002 G0009
G0009T001 G0009
G0010T001 G0010
Arguments:
$ gtftk col_from_tab -h
Usage: gtftk col_from_tab [-i TXT] [-o TXT] -c columns [-n] [-u] [-s SEP] [-r OUT_SEP] [-m MORE_COL] [-H] [-h] [-V [verbosity]] [-D] [-C] [-K tmp_dir] [-A] [-L logger_file] [-W write_message_to_file]
Description:
Select columns from a tabulated file based on their names.
Arguments:
-i, --inputfile The tabulated file. Defaults to STDIN. (default: <stdin>)
-o, --outputfile Output file. (default: <stdout>)
-c, --columns The list (csv) of column names. (default: None)
-n, --invert-match Invert the match (keep all columns except those listed). (default: False)
-u, --unique Write non-redundant lines. (default: False)
-s, --separator The separator of input columns. (default: \t)
-r, --output-separator The separator to be used for separating output columns. (default: \t)
-m, --more-col Add a named (last) column with a given value (e.g. -m col_name:value). (default: None)
-H, --no-header Don't print the header line. (default: False)
Command-wise optional arguments:
-h, --help Show this help message and exit.
-V, --verbosity Set output verbosity ([0-3]). (default: 0)
-D, --no-date Do not add date to output file names. (default: False)
-C, --add-chr Add 'chr' to chromosome names before printing output. (default: False)
-K, --tmp-dir Keep all temporary files in this folder. (default: None)
-A, --keep-all Try to keep all temporary files even if the process does not terminate normally. (default: False)
-L, --logger-file Store the arguments passed to the command in a file. (default: None)
-W, --write-message-to-file Store all messages in a file. (default: None)
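The flags -n, -u, -m and -H can be read as simple transformations of the selected table. The sketch below illustrates one interpretation: -n keeps the complement of the listed columns in file order, -u drops repeated output lines (keeping the first), -m col_name:value appends a constant column, and -H omits the header. The function `col_from_tab_options` is hypothetical, and these semantics are inferred from the help text, not taken from gtftk's source.

```python
from typing import List, Optional


def col_from_tab_options(header: List[str], rows: List[List[str]], columns: List[str],
                         invert: bool = False, unique: bool = False,
                         more_col: Optional[str] = None,
                         no_header: bool = False) -> List[List[str]]:
    """Illustrate -n, -u, -m and -H on an already-split table (hypothetical helper)."""
    if invert:
        # -n: keep every column whose name is NOT listed, in file order.
        keep = [i for i, name in enumerate(header) if name not in columns]
    else:
        keep = [header.index(name) for name in columns]
    out_header = [header[i] for i in keep]
    out_rows = [[row[i] for i in keep] for row in rows]
    if more_col is not None:
        # -m col_name:value: append one named column with the same value on every line.
        name, value = more_col.split(":", 1)
        out_header.append(name)
        out_rows = [r + [value] for r in out_rows]
    if unique:
        # -u: drop repeated output lines, keeping the first occurrence.
        seen = set()
        deduped = []
        for r in out_rows:
            key = tuple(r)
            if key not in seen:
                seen.add(key)
                deduped.append(r)
        out_rows = deduped
    # -H: omit the header line.
    return out_rows if no_header else [out_header] + out_rows
```

With the example above, selecting only gene_id with `unique=True` would collapse the transcript-level rows to one line per gene.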