From 122e63a1af0b6b637443bd6cfde1a984a948bc97 Mon Sep 17 00:00:00 2001
From: Brian Cully
Date: Thu, 7 Jul 2022 15:08:08 -0400
Subject: Update notes with latest emails and babel for translating.

---
 NOTES.org | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 181 insertions(+), 4 deletions(-)

diff --git a/NOTES.org b/NOTES.org
index ba7f44b..b75fcb1 100644
--- a/NOTES.org
+++ b/NOTES.org
@@ -1,14 +1,16 @@
-* This is the phi6 genome:
+* POC
+
+** This is the phi6 genome:
 [[file:phi6 RefWT_from Lele.txt]]
 
-* CSV file
+** CSV file
 [[file:phi6 wt protein start stops.csv]]
 
 This is a CSV file with three columns: protein name, start nucleotide, ending nucleotide
 
 These numbers are inclusive. Everything else in the genome that’s not in at least one of those ranges (there’s one nucleotide overlaps between some reading frames) isn’t protein-coding.
 
-* Standard genetic code
+** Standard genetic code
 [[file:Genetic-Code-Amino-Acid-Codon-Chart-sidebyside-03.png]]
 
 The standard genetic code that you’ve used for some of my class projects applies, we will be using the single capital letter abbreviations for
@@ -16,7 +18,7 @@ amino acids. Because of this please use lowercase “a, c, g, t” for nucleoti
 to “t” in your head) and has the single letter amino acids. The three stop codons (taa, tag, tga) should all code for the same thing — could be
 “STOP” could be an asterisk… you can have some creative control here :-)
 
-* Test
+** Test
 As a test that our coordinates are correct, can you spit out the protein sequence from each of those proteins? Each will start with a M (one with a
 V, it’s an “alternate start codon) and should stop with a stop. Please send me that as a text file.
 
@@ -33,8 +35,183 @@ input 7500g
 
 output: a7500g P7 S34T
 
+-- p7 is protein number from first email
+-- S is orig aa
+-- 34 is amino acid index inside p7
+-- T is new aa
+
+-- say non-coding instead "P7 S34T" if P can't be calculated
+
 (sometimes the variant nucleotide will be in a protein-coding region but won’t change the called amino acid, this is normal and fine so we’ll see, for
 example, “S34S”
 
 Thanks!
 SD
+
+* mail 2
+
+Here's data from Mansha -- I can help reformat as you wish. I will come and talk to you about it after shopping and getting dinner started...
+SD
+
+
+-------------------------------------------------------------------------------------------------------------------------
+From: Mansha Seth-Pasricha
+Sent: Friday, July 1, 2022 11:59 PM
+To: Siobain Duffy
+Subject: Re: Brian's ready for variant calls
+
+Sounds good. For 1 lineage there will be 13 text files. So sending 1 for starters- T10 ancestor. But the excel file has all 13 lineages- T10 and
+evolved pops. Please use the middle sheet “v.v. Low stringency” for the data in the excel file. I could not name the sheet as freq 0.01, so it’s a
+strange name.
+
+
+
+Thanks,
+
+Mansha
+
+
+
+From: Siobain Duffy
+Date: Friday, July 1, 2022 at 5:55 PM
+To: Mansha Seth-Pasricha
+Subject: Re: Brian's ready for variant calls
+
+send both excel and text file for one set of lineages and we can see which works better?
+
+-------------------------------------------------------------------------------------------------------------------------
+
+From: Mansha Seth-Pasricha
+Sent: Friday, July 1, 2022 4:37 PM
+To: Siobain Duffy ; Mansha Pasricha
+Subject: Re: Brian's ready for variant calls
+
+
+
+That’s awesome. Does he need the excel files with the variants organized per ancestor and evolved pops (the ones I’ve been showing you) or
+the txt formats of the SNP output from varscan?
+
+
+
+As far UCSC goes, if we decide to go that route, we’ll basically be submitting it as a reference file with annotated ORF’s. Something like they
+already have for human genome/other organismal genomes that folks BLAT against. I think we can still hold on to this thought.
+
+
+
+Thanks,
+
+Mansha
+
+
+
+
+
+
+
+From: Siobain Duffy
+Date: Friday, July 1, 2022 at 3:52 PM
+To: Mansha Pasricha , Mansha Seth-Pasricha
+Subject: Brian's ready for variant calls
+
+He's got the phi6 concatenated genome parsed properly, and if you give me your data I can return your amino acid (or called as intergenic)
+changes returned over the weekend.
+
+
+
+I know you're jazzed about the idea of the UCSC browser for phi6, but minimally you can use Brian's calls as something to check against?
+
+SD
+
+[[file:1T_copy_trimmed_WTRef_bow_sorted.bam.snp]]
+
+[[file:T10_Varscan_copy.xlsx]]
+
+[[file:t10-varscan.csv]]
+
+* Mail 3
+You've made Mansha very very happy. 4 other files to similarly treat 🙂
+
+
+-------------------------------------------------------------------------------------------------------------------------
+From: Mansha Seth-Pasricha
+Sent: Thursday, July 7, 2022 1:53 PM
+To: Siobain Duffy
+Subject: Re: T10 varscan results!
+
+Yes! Yes! Yes!! In case you can’t tell, I am jumping with joy. Thanks so much to Brian.
+
+
+
+Here you go on the other 4. Same thing- look in the low stringency tab.
+
+
+
+I found a visualizing software that aligns seqs and draws out the comparisons. I’ll see how/if I can get that to work.
+
+
+
+
+
+Thanks,
+
+Mansha
+
+
+
+
+
+From: Siobain Duffy
+Date: Thursday, July 7, 2022 at 12:42 PM
+To: Mansha Seth-Pasricha
+Subject: T10 varscan results!
+
+Want to send more excel files? Brian understands how to interpret them now.
+
+SD
+
+[[file:E8A_Varscan.xlsx]]
+[[file:E8G_Varscan.xlsx]]
+[[file:E8K_Varscan.xlsx]]
+[[file:T9_Varscan.xlsx]]
+
+[[file:E8A_Varscan.csv]]
+[[file:E8G_Varscan.csv]]
+[[file:E8K_Varscan.csv]]
+[[file:T9_Varscan.csv]]
+
+#+name: mail-3-files
+| E8A_Varscan.csv |
+| E8G_Varscan.csv |
+| E8K_Varscan.csv |
+| T9_Varscan.csv |
+
+
+* Sample runs
+Protein runs:
+#+begin_src shell
+  ./codon2aa.pl 'phi6 RefWT_from Lele.txt' 'phi6 wt protein start stops.csv'
+#+end_src
+
+
+Full conversion:
+#+begin_src shell
+  ./varscan2codon.pl 'phi6 RefWT_from Lele.txt' 'phi6 wt protein start stops.csv' t10-varscan.csv
+#+end_src
+
+Iter:
+#+name: iter
+#+begin_src shell :stdin mail-3-files
+  for i in $(cat); do
+      res=$(basename $i .csv).res.csv
+      ./varscan2codon.pl 'phi6 RefWT_from Lele.txt' 'phi6 wt protein start stops.csv' $i > $res
+      echo file:$res
+  done
+#+end_src
+
+#+call: iter()
+
+#+RESULTS:
+| file:E8A_Varscan.res.csv |
+| file:E8G_Varscan.res.csv |
+| file:E8K_Varscan.res.csv |
+| file:T9_Varscan.res.csv |
-- 
cgit v1.2.3
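
Reviewer note: the notes in this patch specify the annotation format ("a7500g P7 S34T", or "non-coding" when the position falls outside every protein range) but the patch does not include `varscan2codon.pl` itself. The following is a minimal Python sketch of that rule as the notes describe it, not the repository's Perl implementation. The function name `annotate`, the toy genome, the protein name `P1`, and the choice to report every protein whose range contains the position (the notes mention one-nucleotide overlaps) are all assumptions made for illustration. Coordinates are 1-based and inclusive, as stated for the CSV file.

```python
# Standard genetic code with lowercase nucleotides and single-letter amino
# acids; "*" marks the three stop codons (taa, tag, tga). The amino acid
# string follows the conventional t, c, a, g ordering of the codon table.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def annotate(genome, proteins, position, new_base):
    """Describe a single-nucleotide variant in the 'a7500g P7 S34T' style.

    genome   -- lowercase nucleotide string
    proteins -- list of (name, start, end), 1-based inclusive coordinates
    position -- 1-based genome position of the variant
    new_base -- the variant nucleotide, lowercase

    Assumes every protein range covers a whole number of codons.
    """
    old_base = genome[position - 1]
    call = f"{old_base}{position}{new_base}"
    hits = []
    for name, start, end in proteins:
        if not start <= position <= end:
            continue
        offset = position - start          # 0-based offset into the reading frame
        frame = offset % 3                 # position of the variant inside its codon
        codon_start = start - 1 + offset - frame   # 0-based index of the codon's first base
        codon = genome[codon_start:codon_start + 3]
        mutated = codon[:frame] + new_base + codon[frame + 1:]
        aa_index = offset // 3 + 1         # 1-based amino acid index within the protein
        hits.append(f"{name} {CODON_TABLE[codon]}{aa_index}{CODON_TABLE[mutated]}")
    return f"{call} {' '.join(hits) if hits else 'non-coding'}"


# Toy data (hypothetical, not the phi6 genome): P1 spans atg tct taa.
toy_genome = "ccatgtcttaagg"
toy_proteins = [("P1", 3, 11)]
print(annotate(toy_genome, toy_proteins, 6, "a"))   # tct -> act changes the amino acid
print(annotate(toy_genome, toy_proteins, 8, "c"))   # tct -> tcc is a silent change
print(annotate(toy_genome, toy_proteins, 1, "t"))   # outside every range
```

A silent change is still reported with the same amino acid on both sides, which matches the "S34S" case described in the first email.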