#+title: percent nucleotide identity threshold (pnit?)

* input
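a csv file. the first row and the first column hold sequence names;
every other cell is the % identity between the sequence named by its
row and the sequence named by its column. in the example below only
the lower triangle is filled in.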

#+name: input-table-example
|      | seq1 | seq2 | seq3 |
| seq1 |      |      |      |
| seq2 | 0.9  |      |      |
| seq3 | 0.32 | 0.11 |      |

this shows ~seq2~ is 90% identical to ~seq1~, and ~seq3~ is 32% and 11%
identical to ~seq1~ and ~seq2~, respectively.

the csv file would look like this:
#+name: input-csv-example
#+begin_src text
  ,seq1,seq2,seq3
  seq1,,,
  seq2,0.9,,
  seq3,0.32,0.11,
#+end_src
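
as a rough illustration of how this layout can be read, here is a
minimal python sketch. it is not the code in ~pairwise.pl~; the
function name ~read_pairs~ is made up for this example, and it
assumes empty cells simply mean "no value".

#+begin_src python
  import csv
  import io


  def read_pairs(handle):
      """Yield (row_name, column_name, identity) for every non-empty cell."""
      reader = csv.reader(handle)
      # The first row holds the column names; its first cell is empty.
      columns = [name.strip() for name in next(reader)[1:]]
      for row in reader:
          if not row:
              continue  # skip blank lines
          row_name = row[0].strip()
          for column_name, cell in zip(columns, row[1:]):
              cell = cell.strip()
              if cell:  # assumption: an empty cell means "no value"
                  yield row_name, column_name, float(cell)


  # Same content as the csv example above.
  example = """,seq1,seq2,seq3
  seq1,,,
  seq2,0.9,,
  seq3,0.32,0.11,
  """.replace("\n  ", "\n")

  pairs = list(read_pairs(io.StringIO(example)))
  for pair in pairs:
      print(pair)
#+end_src

each non-empty cell becomes one (row name, column name, value) triple,
which is the shape the thresholding step needs.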

* output
a csv file with two columns. each row is a pair of names whose
% identity is at least as large as the given threshold.

given [[input-table-example][the example input table]], at a threshold of 32%, we should get:
#+name: output-table-example-32
| seq2 | seq1 |
| seq3 | seq1 |

or, in csv:
#+name: output-csv-example
#+begin_src text
  seq2,seq1
  seq3,seq1
#+end_src
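
the filtering step can be sketched in python as follows. this is only
an illustration under stated assumptions, not the ~pairwise.pl~
implementation: ~filter_pairs~ is a hypothetical name, and the
threshold is assumed to be on the same scale as the values in the
file (so 32% is passed as ~0.32~ for the example table).

#+begin_src python
  import csv
  import io


  def filter_pairs(in_handle, out_handle, threshold):
      """Write "row,column" for every cell whose value is >= threshold."""
      reader = csv.reader(in_handle)
      columns = [name.strip() for name in next(reader)[1:]]
      writer = csv.writer(out_handle, lineterminator="\n")
      for row in reader:
          if not row:
              continue  # skip blank lines
          for column_name, cell in zip(columns, row[1:]):
              # assumption: empty cells carry no value and are ignored
              if cell.strip() and float(cell) >= threshold:
                  writer.writerow([row[0].strip(), column_name])


  # Same content as the input csv example.
  example = ",seq1,seq2,seq3\nseq1,,,\nseq2,0.9,,\nseq3,0.32,0.11,\n"

  out = io.StringIO()
  filter_pairs(io.StringIO(example), out, 0.32)
  result = out.getvalue()
  print(result, end="")
#+end_src

the comparison is inclusive (~>=~), which is why ~seq3,seq1~ at exactly
0.32 is kept while ~seq3,seq2~ at 0.11 is dropped.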

* runners
#+name: process
#+begin_src shell :results file :file n-401-94.csv :var threshold=94.0 filename="inputs/n-401.csv"
  guix shell perl -- ./pairwise.pl "$threshold" "$filename"
#+end_src

#+RESULTS: process
[[file:n-401-94.csv]]

#+call: process[:file n-402-90.5.csv](threshold=90.5, filename="n-402.csv")

#+RESULTS:
[[file:n-402-90.5.csv]]

#+call: process[:file n-402-93.5.csv](threshold=93.5, filename="n-402.csv")

#+RESULTS:
[[file:n-402-93.5.csv]]