2dRNA Supplementary Material

Source Code

2dRNA is implemented in Python and depends on no additional packages. To use it, download the archive and run it as follows:

    tar vxf 2dRNA.tar.gz && cd 2dRNA
    python main.py <command> [options...]
    

User Guide

There are 4 commands:
  • expand: Expand possible base pairs based on the given predictions. The input is a DCA result, and the output is the expanded set of base pairs.
  • filter: Remove isolated and impossible base pairs from the given input. The input is a DCA-like result, and the output is the cleaned set of base pairs.
  • predict: Combine the filter and expand subcommands to predict the secondary structure from the given inputs.
  • bench: Calculate benchmark statistics for a test set. The inputs are a prediction and the ground truth, and the outputs are the sensitivity, PPV, and MCC.
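The descriptions above do not say exactly how filter and expand work internally. As a rough illustration only, the Python sketch below makes three assumptions. First, an "impossible" pair is one that is non-canonical (anything other than A-U, G-C, or G-U) or closes a hairpin loop shorter than 3 nucleotides. Second, an "isolated" pair has no stacked neighbour (i-1, j+1) or (i+1, j-1). Third, expansion grows each seed pair into a helix while the neighbouring bases can pair. The names `can_pair`, `filter_pairs`, `expand_pairs`, and `MIN_LOOP` are hypothetical. This is not 2dRNA's own code.

```python
# Hypothetical sketch of filter/expand; NOT the 2dRNA implementation.
# Positions are assumed to be 1-based, as in the example input files.

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def can_pair(seq, i, j):
    """True if positions i < j form a Watson-Crick or G-U pair with a long enough loop."""
    return j - i > MIN_LOOP and (seq[i - 1], seq[j - 1]) in CANONICAL


def filter_pairs(seq, pairs):
    """Drop non-canonical pairs, then drop pairs with no stacked neighbour."""
    kept = {(i, j) for i, j in pairs if can_pair(seq, i, j)}
    return sorted(
        (i, j) for i, j in kept
        if (i - 1, j + 1) in kept or (i + 1, j - 1) in kept
    )


def expand_pairs(seq, pairs):
    """Grow each seed pair outward and inward while neighbouring bases can pair."""
    expanded = set(pairs)
    for i, j in pairs:
        # extend the helix outward: (i-1, j+1), (i-2, j+2), ...
        a, b = i - 1, j + 1
        while a >= 1 and b <= len(seq) and can_pair(seq, a, b):
            expanded.add((a, b))
            a, b = a - 1, b + 1
        # extend the helix inward: (i+1, j-1), (i+2, j-2), ...
        a, b = i + 1, j - 1
        while a < b and can_pair(seq, a, b):
            expanded.add((a, b))
            a, b = a + 1, b - 1
    return sorted(expanded)
```

Chaining the two steps as `expand_pairs(seq, filter_pairs(seq, pairs))` loosely mirrors what the predict command is described as doing. The sketch does not resolve conflicts, so a base may end up in more than one pair.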
For the expand command, the options are as follows:
  • --help, -h: Show help messages.
  • --seq: The RNA sequence in a plain-text file. It should contain only the characters A, U, G, and C, which represent adenine, uracil, guanine, and cytosine, respectively. This option is required. An example sequence file:
    GGCCUUAUGCACGGGAAAUACGCAUAUCAGUGAGGAUUCGUCCGAGAUUGUGUUUUUGCUGGUGUAAAUCAGCAGUUCCCCUGCAUAAGGCU
        
  • --di: The input DI file, which contains 3 columns: the index i, the index j, and the DI value. This option is required. An example DI file:
    23 50 0.108853
    27 47 0.0943285
    34 43 0.0942708
    82 88 0.0857733
    26 48 0.0853497
    21 52 0.077596
    24 49 0.0767274
    20 53 0.0649661
    22 51 0.0546647
    28 46 0.0534889
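The two input files described above can be read with a few lines of Python. This minimal sketch makes three assumptions: the sequence may be wrapped across several lines, lowercase letters may be upper-cased, and DI indices are 1-based. `parse_sequence` and `parse_di` are hypothetical helpers, not functions exported by 2dRNA.

```python
# Hypothetical input parsers; 2dRNA's own parsing may differ.

VALID_BASES = set("AUGC")


def parse_sequence(text):
    """Join a (possibly wrapped) sequence and reject characters other than A, U, G, C."""
    seq = "".join(text.split()).upper()
    bad = set(seq) - VALID_BASES
    if bad:
        raise ValueError(f"invalid characters in sequence: {sorted(bad)}")
    return seq


def parse_di(text):
    """Parse 3-column DI lines (i, j, DI) into tuples sorted by DI, highest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), int(fields[1]), float(fields[2])))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

To read from disk, pass the file contents, for example `parse_di(open(path).read())`.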
    
For the filter command, there are 5 options:
  • -h, --help: Show help messages.
  • --di: The input DI file, which is the same as above. This option is required.
  • --seq: The RNA sequence file, which is the same as above. This option is required.
  • --top-n: The number of top-ranked DI pairs that 2dRNA keeps for the prediction. It can be a string such as "0.2L" (the RNA sequence length * 0.2), "L/3" (the RNA sequence length / 3), or "40" (an absolute count). It defaults to "0.2L".
  • --pre-clean: Boolean flag. If set, the top-N DI values are kept after the non-standard DI pairs have been removed. This option is experimental; to understand its usage, please check the source code.
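The three --top-n formats can be turned into a pair count as in the sketch below, where L is the sequence length. The function name `parse_top_n` is hypothetical. Truncating fractional results with `int()` is an assumption, and 2dRNA may round differently.

```python
import re


def parse_top_n(spec, length):
    """Convert a --top-n string ("0.2L", "L/3", "40") into an integer pair count."""
    spec = spec.strip().replace(" ", "").upper()
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\*?L", spec)   # fraction of L, e.g. "0.2L"
    if m:
        return int(float(m.group(1)) * length)        # truncated toward zero (assumption)
    m = re.fullmatch(r"L/(\d+(?:\.\d+)?)", spec)      # L divided by a number, e.g. "L/3"
    if m:
        return int(length / float(m.group(1)))
    if spec.isdigit():                                # absolute count, e.g. "40"
        return int(spec)
    raise ValueError(f"unrecognised top-N specification: {spec!r}")
```

For a 92-nt RNA, "0.2L" gives 18 pairs (92 * 0.2 = 18.4, truncated) and "L/3" gives 30.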
For the predict command, there are 5 options:
  • -h, --help: Show help messages.
  • --di: The input DI file, same as above. This option is required.
  • --seq: The sequence file of the RNA to be predicted. This option is also required.
  • --top-n: Exactly the same as in the filter subcommand; it defaults to "0.2L".
  • --pre-clean: Boolean flag, as in the filter subcommand.
For the bench command, the following options are available:
  • -h, --help: Show help message.
  • --predict: DI values or predicted base pairs in a plain-text file with 4 columns (index i, base i, index j, base j), for example:
    1   C      71   G
    2   G      70   U
    3   C      69   G
    4   U      68   A
    5   U      67   A
    6   C      66   G
    7   A      65   U
    8   U      64   A
    9   A      63   U
    
  • --native: The native base-pair file, which contains 2 columns (index i, index j), for example:
    23 50
    27 47
    34 43
    82 88
    26 48
    
  • --top-n: The length of the given RNA.
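The bench statistics can be computed from the two files above as in this sketch. It assumes that the predicted pairs are read from columns 1 and 3 of the --predict file and that the true negatives are all remaining candidate pairs among L(L-1)/2, where L is the sequence length. Whether bench uses its --top-n value as L in this way is also an assumption. `parse_pairs` and `benchmark` are hypothetical names, not 2dRNA functions.

```python
import math


def parse_pairs(text, columns):
    """Read (i, j) pairs from the given 0-based columns, stored with i < j."""
    pairs = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) > max(columns):
            i, j = int(fields[columns[0]]), int(fields[columns[1]])
            pairs.add((min(i, j), max(i, j)))
    return pairs


def benchmark(predicted, native, length):
    """Return (sensitivity, PPV, MCC) for predicted vs native base pairs."""
    tp = len(predicted & native)
    fp = len(predicted - native)
    fn = len(native - predicted)
    tn = length * (length - 1) // 2 - tp - fp - fn  # every other candidate pair
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, ppv, mcc
```

The 4-column --predict file maps to `parse_pairs(text, (0, 2))` and the 2-column --native file to `parse_pairs(text, (0, 1))`. Many RNA secondary-structure benchmarks instead approximate MCC as sqrt(PPV * sensitivity). This section does not state which definition 2dRNA uses.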

Example Usage

   tar vxf 2dRNA.tar.gz && cd 2dRNA
   cd examples/
   bash 1Y26.sh
   bash 2XQD_Y.sh
   

Data sets

Address: 1037 Luoyu Road, Wuhan, China ©2010 Huazhong University of Science & Technology