# Creating a program computing intron sizes

## Objective

The final program you write should do the following:

• Load a GTF file (e.g. this one).
• Compute the size of introns.
• Draw the distribution of intron sizes.
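
The first two steps above can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes the standard GTF layout (tab-separated, feature type in column 3, 1-based inclusive start and end in columns 4 and 5, attributes in column 9) and that every exon line carries a transcript_id attribute. The function name `intron_sizes` and the sample lines are hypothetical, made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical GTF lines, made up for illustration (tab-separated).
GTF_LINES = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tsrc\texon\t301\t400\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tsrc\texon\t451\t500\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tsrc\tCDS\t120\t200\t.\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    'chr1\tsrc\texon\t1000\t1100\t.\t-\t.\tgene_id "g2"; transcript_id "tx2";',
]

TX_ID = re.compile(r'transcript_id "([^"]+)"')


def intron_sizes(lines):
    """Return intron sizes, computed per transcript from exon coordinates."""
    exons = defaultdict(list)
    for line in lines:
        if line.startswith('#'):
            continue  # skip header/comment lines
        fields = line.rstrip('\n').split('\t')
        if len(fields) < 9 or fields[2] != 'exon':
            continue  # keep exon features only
        hit = TX_ID.search(fields[8])
        if hit:
            exons[hit.group(1)].append((int(fields[3]), int(fields[4])))
    sizes = []
    for coords in exons.values():
        coords.sort()
        # An intron is the gap between two consecutive exons:
        # next exon start - previous exon end - 1 (coordinates are inclusive).
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            sizes.append(next_start - prev_end - 1)
    return sizes


print(intron_sizes(GTF_LINES))
## [100, 50]
```

The returned list can then be handed to a histogram function, for instance matplotlib's `hist()`, whose `color` parameter can take the value supplied on the command line.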

## Constraints

• Where needed and possible, use regular expressions (e.g. to extract the transcript name).
• Use an argument parser so that the program can be called from the command line.
• The color of the diagram should be exposed as an argument of the parser.

In the regular expression below, the parentheses in ([^"]+) capture a motif (here, a succession of one or more characters other than '"'). The captured motif can be retrieved with the group() method of the match object.

import re
a_string = 'cds_id "cds1"; transcript_id "tx1"; gene_id "g1";'
# Group 1 captures the text between the quotes that follow transcript_id
hit = re.search('transcript_id "([^"]+)"', a_string)
if hit:
    print(hit.group(1))
## tx1
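
The same capture-group idea extends to several groups at once. The sketch below is an illustration rather than a required part of the exercise: it uses `re.findall()` with two groups to collect every key/value pair of the attribute string into a dictionary. The helper name `parse_attributes` is hypothetical.

```python
import re


def parse_attributes(attr_string):
    """Return a dict mapping each attribute key to its quoted value."""
    # Two capture groups: the key (word characters) and the quoted value.
    # With several groups, findall() returns a list of tuples.
    pairs = re.findall(r'(\w+) "([^"]+)"', attr_string)
    return dict(pairs)


a_string = 'cds_id "cds1"; transcript_id "tx1"; gene_id "g1";'
attributes = parse_attributes(a_string)
print(attributes['transcript_id'])
## tx1
```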

The argument parser can be used as follows:

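# Declare an argument parser
import argparse
DESCRIPTION = "The program description."
parser = argparse.ArgumentParser(description=DESCRIPTION)
# The option names are an assumption: the start of this call was missing
parser.add_argument('-i', '--input-file',
                    help="The input file.",
                    default=None,
                    type=argparse.FileType('r'),
                    required=True)
# Parse the command line, then convert the namespace to a dictionary
args = parser.parse_args()
args = dict(args.__dict__)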
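
To meet the constraint that the diagram color be a parser argument, a second option can be declared alongside the input file. The sketch below shows one possible layout. The option names `--input-file` and `--color`, the default 'blue', the function name `make_parser` and the file name are all assumptions. The input is kept as a plain path string, and `parse_args()` receives an explicit list so that the example runs without a real command line.

```python
import argparse


def make_parser():
    """Build a parser with an input file option and a color option."""
    parser = argparse.ArgumentParser(
        description="Plot the distribution of intron sizes.")
    # Input path kept as a plain string here (assumption, for simplicity)
    parser.add_argument('-i', '--input-file', required=True,
                        help="The input GTF file.")
    # Color forwarded to the plotting call; 'blue' is an arbitrary default
    parser.add_argument('-c', '--color', default='blue',
                        help="The color of the diagram.")
    return parser


# Parse an explicit argument list instead of sys.argv
args = make_parser().parse_args(['-i', 'annotation.gtf', '--color', 'red'])
print(args.input_file, args.color)
## annotation.gtf red
```

From a shell, the equivalent call would look like `python intron_sizes.py -i annotation.gtf --color red`, where the script name is hypothetical.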