Diffstat (limited to 'README.org')
-rw-r--r-- | README.org | 59 |
1 files changed, 59 insertions, 0 deletions
diff --git a/README.org b/README.org
new file mode 100644
index 0000000..8fb8941
--- /dev/null
+++ b/README.org
@@ -0,0 +1,59 @@
+#+title: percent nucleotide identity threshold (pnit?)
+
+* input
+csv file; the first row and first column are sequence names, and every
+other cell is the % identity between that cell's row and column names.
+
+#+name: input-table-example
+|      | seq1 | seq2 | seq3 |
+| seq1 |      |      |      |
+| seq2 | 0.9  |      |      |
+| seq3 | 0.32 | 0.11 |      |
+
+this shows ~seq2~ is 90% identical to ~seq1~, and ~seq3~ is 32% and 11%
+identical to ~seq1~ and ~seq2~, respectively.
+
+the csv file would look like this:
+#+name: input-csv-example
+#+begin_src text
+  ,seq1,seq2,seq3
+  seq1,,,
+  seq2,0.9,,
+  seq3,0.32,0.11,
+#+end_src
+
+* output
+csv file with two columns; each row is a pair of names whose % identity
+is at least as large as a given threshold.
+
+given [[input-table-example][the example input table]], at a threshold of 32%, we should get:
+#+name: output-table-example-32
+| seq2 | seq1 |
+| seq3 | seq1 |
+
+or, in csv:
+#+name: output-csv-example
+#+begin_src text
+  seq2,seq1
+  seq3,seq1
+#+end_src
+
+* runners
+#+name: process
+#+begin_src shell :results file :file n-401-94.csv :var threshold=94.0 filename="inputs/n-401.csv"
+  guix shell perl -- ./pairwise.pl $threshold $filename
+#+end_src
+
+#+RESULTS: process
+[[file:n-401-94.csv]]
+
+#+call: process[:file n-402-90.5.csv](threshold=90.5, filename="n-402.csv")
+
+#+RESULTS:
+[[file:n-402-90.5.csv]]
+
+#+call: process[:file n-402-93.5.csv](threshold=93.5, filename="n-402.csv")
+
+#+RESULTS:
+[[file:n-402-93.5.csv]]
+
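Below is a minimal sketch of the thresholding rule the README describes. It can be used to check results by hand. It is not the repository's pairwise.pl, whose source isn't shown here, and the function name `pnit_pairs` is hypothetical. The sketch makes three assumptions:

- the input is plain, unquoted CSV in the input layout described above;
- cells hold fractions (0.9 = 90%);
- the threshold is given in percent, as the 94.0 in the runners suggests.

Under these assumptions, each non-empty cell is compared against threshold / 100 and kept when it is at least that large.

```shell
# pnit_pairs THRESHOLD FILE
# Hypothetical stand-in for pairwise.pl, not the repository's code.
# Prints "row,column" for every non-empty cell whose identity is at
# least THRESHOLD percent. Assumes cells are fractions (0.9 = 90%) and
# fields are plain, unquoted CSV. FILE may be "-" to read stdin.
pnit_pairs() {
  awk -F, -v pct="$1" '
    # header row: remember column names (field 1 is the empty corner)
    NR == 1 { for (j = 2; j <= NF; j++) name[j] = $j; next }
    # data rows: field 1 is the row name, the rest are identities
    {
      for (j = 2; j <= NF; j++)
        if ($j != "" && $j + 0 >= pct / 100)
          print $1 "," name[j]
    }
  ' "$2"
}
```

You can pipe the example input through it at 32:

```shell
printf ',seq1,seq2,seq3\nseq1,,,\nseq2,0.9,,\nseq3,0.32,0.11,\n' | pnit_pairs 32 -
```

This should print `seq2,seq1` and `seq3,seq1`, matching the output example. The 0.11 cell falls below the cutoff.

The sketch divides the threshold by 100 rather than multiplying each cell by 100. For a threshold such as 32 or 90.5, `pct / 100` rounds to the same double as the decimal written in the cell, so a cell sitting exactly at the cutoff is kept. Multiplying instead can miss such a cell: 0.29 * 100, for example, evaluates to 28.999999999999996.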