
input

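csv file. the first row and the first column are names; every other cell is the identity between the sequence named in its row and the sequence named in its column, written as a fraction (0.9 means 90%). in the example, only the cells below the diagonal are filled in.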

      seq1  seq2  seq3
seq1
seq2  0.9
seq3  0.32  0.11

this shows that seq2 is 90% identical to seq1, and that seq3 is 32% identical to seq1 and 11% identical to seq2.

the csv file would look like this:

,seq1,seq2,seq3
seq1,,,
seq2,0.9,,
seq3,0.32,0.11,
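
as an illustration of how this layout can be read, here is a minimal python sketch. it is not the code of pairwise.pl; the function name read_identity_pairs is made up for this example, and it assumes that empty cells simply carry no value.

```python
import csv
import io


def read_identity_pairs(csv_text):
    """Return (row_name, column_name, identity) for every non-empty cell."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    column_names = rows[0][1:]  # header row, minus the empty corner cell
    pairs = []
    for row in rows[1:]:
        if not row:
            continue  # tolerate blank lines
        row_name, cells = row[0], row[1:]
        for column_name, cell in zip(column_names, cells):
            if cell.strip():  # assumption: an empty cell means "no value"
                pairs.append((row_name, column_name, float(cell)))
    return pairs


example = """,seq1,seq2,seq3
seq1,,,
seq2,0.9,,
seq3,0.32,0.11,
"""

print(read_identity_pairs(example))
# [('seq2', 'seq1', 0.9), ('seq3', 'seq1', 0.32), ('seq3', 'seq2', 0.11)]
```

each tuple lists the row name first and the column name second, which matches the order used in the output section.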

output

csv file with two columns. each row is a pair of names whose identity is at least as large as a given threshold.

given the example input table, at a threshold of 32%, we should get:

seq2 seq1
seq3 seq1

or, in csv:

seq2,seq1
seq3,seq1
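
the selection rule can be sketched in python as follows. this is only an illustration under stated assumptions, not the code of pairwise.pl: the names pairs_at_or_above and to_csv are hypothetical, the pairs are typed in by hand from the example table, and the threshold is given as a fraction (0.32 for 32%) to match the values in the table. whether pairwise.pl expects the threshold as 32 or 0.32 is not stated here.

```python
import csv
import io


def pairs_at_or_above(pairs, threshold):
    """Keep the (row_name, column_name) pairs whose identity is >= threshold."""
    return [(row, col) for row, col, identity in pairs if identity >= threshold]


def to_csv(name_pairs):
    """Render the pairs as two-column csv text, one pair per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(name_pairs)
    return buffer.getvalue()


# the three filled cells of the example input table
example_pairs = [
    ("seq2", "seq1", 0.9),
    ("seq3", "seq1", 0.32),
    ("seq3", "seq2", 0.11),
]

print(to_csv(pairs_at_or_above(example_pairs, 0.32)), end="")
# seq2,seq1
# seq3,seq1
```

the comparison is inclusive, so the seq3/seq1 pair at exactly 0.32 is kept, while seq3/seq2 at 0.11 is dropped.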

runners

guix shell perl -- ./pairwise.pl $threshold $filename

file:n-401-94.csv

file:n-402-90.5.csv

file:n-402-93.5.csv