Diffstat (limited to 'README.org')
-rw-r--r-- README.org 59
1 files changed, 59 insertions, 0 deletions
diff --git a/README.org b/README.org
new file mode 100644
index 0000000..8fb8941
--- /dev/null
+++ b/README.org
@@ -0,0 +1,59 @@
+#+title: percent nucleotide identity threshold (pnit?)
+
+* input
+csv file. the first row and first column hold sequence names; every
+other cell holds the % identity between the sequence named by its row
+and the sequence named by its column. only the lower triangle is filled in.
+
+#+name: input-table-example
+| | seq1 | seq2 | seq3 |
+| seq1 | | | |
+| seq2 | 0.9 | | |
+| seq3 | 0.32 | 0.11 | |
+
+this shows ~seq2~ is 90% identical to ~seq1~, and ~seq3~ is 32% and 11%
+identical to ~seq1~ and ~seq2~, respectively.
+
+the csv file would look like this:
+#+name: input-csv-example
+#+begin_src text
+ ,seq1,seq2,seq3
+ seq1,,,
+ seq2,0.9,,
+ seq3,0.32,0.11,
+#+end_src
+
+* output
+csv file with two columns. each row is a pair of names whose % identity
+is at least as large as a given threshold.
+
+given [[input-table-example][the example input table]], at a threshold of 32%, we should get:
+#+name: output-table-example-32
+| seq2 | seq1 |
+| seq3 | seq1 |
+
+or, in csv:
+#+name: output-csv-example
+#+begin_src text
+ seq2,seq1
+ seq3,seq1
+#+end_src
+
+* runners
+#+name: process
+#+begin_src shell :results file :file n-401-94.csv :var threshold=94.0 filename="inputs/n-401.csv"
+ guix shell perl -- ./pairwise.pl "$threshold" "$filename"
+#+end_src
+
+#+RESULTS: process
+[[file:n-401-94.csv]]
+
+#+call: process[:file n-402-90.5.csv](threshold=90.5, filename="n-402.csv")
+
+#+RESULTS:
+[[file:n-402-90.5.csv]]
+
+#+call: process[:file n-402-93.5.csv](threshold=93.5, filename="n-402.csv")
+
+#+RESULTS:
+[[file:n-402-93.5.csv]]
+
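
note on the README above: the input and output sections describe a simple filter over a lower-triangular identity matrix. ~pairwise.pl~ itself is not part of this diff, so the following is only a minimal Python sketch of the described behaviour, not the Perl implementation. the function name ~pairs_at_least~ is hypothetical, and the sketch assumes the threshold is given on the same scale as the cells (0.32 for 32% in the README example; the runners pass values such as 94.0, so the real inputs may use a percent scale instead).

```python
import csv
import io


def pairs_at_least(csv_text, threshold):
    """Return (row_name, column_name) pairs whose identity is >= threshold.

    Hypothetical helper: assumes the layout from the README, with names in
    the first row and first column and values only in the lower triangle.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Header row: an empty corner cell followed by the column names.
    column_names = [name.strip() for name in rows[0][1:]]
    pairs = []
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        row_name = row[0].strip()
        for column_name, cell in zip(column_names, row[1:]):
            cell = cell.strip()
            if cell == "":
                continue  # diagonal and upper triangle are left empty
            # Assumption: threshold uses the same scale as the cell values.
            if float(cell) >= threshold:
                pairs.append((row_name, column_name))
    return pairs


# The example input from the README.
example = """\
,seq1,seq2,seq3
seq1,,,
seq2,0.9,,
seq3,0.32,0.11,
"""

for row_name, column_name in pairs_at_least(example, 0.32):
    print(f"{row_name},{column_name}")
```

with the README's example input and a threshold of 0.32 this prints ~seq2,seq1~ and ~seq3,seq1~, matching the example output table; the 0.11 pair is dropped because it falls below the threshold.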