#+title: Molecular Evolution Simulator for Amino Acids
#+STARTUP: content hideblocks

* Rules
- If the nucleotide change causes an amino acid change (nonsynonymous), mark the change as lethal and use the previous sequence for further mutations.
- If the nucleotide change doesn’t modify the amino acid (synonymous), keep the change and proceed from the mutated sequence.

- Students must translate the codons to amino acids themselves (possibly only the changed ones).
- Students mark lethality, then move on to the next round.
- Print the results at the end.
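
The lethality rule above can be sketched in JavaScript as follows. This is a minimal sketch, not the simulator’s actual code: =CODON_TO_AA= is a trimmed, hypothetical lookup table holding only the codons used here, and =translate=, =isLethal=, and =survivor= are illustrative names.

#+begin_src js
  // Hypothetical, trimmed lookup table: only the codons used in this sketch.
  const CODON_TO_AA = {
    GCT: 'Ala', GCC: 'Ala',
    GAT: 'Asp', GAA: 'Glu',
    TGG: 'Trp', TGA: 'STOP',
  };

  // Translate a DNA string (length a multiple of 3) into amino acid names.
  function translate(dna) {
    const aminoAcids = [];
    for (let i = 0; i < dna.length; i += 3) {
      aminoAcids.push(CODON_TO_AA[dna.slice(i, i + 3)]);
    }
    return aminoAcids;
  }

  // A mutation is lethal (nonsynonymous) if any amino acid differs.
  function isLethal(previous, mutated) {
    const before = translate(previous);
    const after = translate(mutated);
    return before.some((aa, i) => aa !== after[i]);
  }

  // Sequence that feeds the next round: revert on a lethal mutation.
  function survivor(previous, mutated) {
    return isLethal(previous, mutated) ? previous : mutated;
  }
#+end_src

With this table, mutating =GCT= to =GCC= keeps =Ala=, so the mutated sequence survives, while mutating =GAT= to =GAA= changes =Asp= to =Glu=, so the previous sequence is carried forward instead.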

** Text from Siobain
#+begin_quote
Translate your DNA sequence into amino acids.  If the mutation is synonymous (despite the mutation there is no change in amino acid), then allow the mutation to survive to all subsequent rounds of mutation.  If it is nonsynonymous, then the mutation was lethal, and you should mark the “lethal” box on the left.  Revert to the last functional sequence (the ancestral one or one that has accumulated some synonymous mutation(s)) and use that to go to the next round of mutation.

Continue to mutate for 10 rounds, though not all 10 of your sequences will survive.  Then insert the pdf you get from the webpage into this word document on the next page and answer the questions about your mutated sequences.
#+end_quote

** Inverse codon table
:PROPERTIES:
:header-args: :noweb yes
:END:

The JavaScript code is primarily concerned with translating codons into amino acids, so I want a JavaScript lookup table (really just a plain object) that maps a given codon to its amino acid. First I need a table that describes those translations, which I can then manipulate in code to produce that object.

[[https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables#Inverse_DNA_codon_table][This table on Wikipedia]] is a good starting point, so I copied it into the table below. I also removed the redundant combined rows, such as =Gln or Glu=, because each amino acid in them already has its own, more specific row.

#+name: amino-acid-to-codon
| Amino acid | Codon                   |
|------------+-------------------------|
| Ala        | GCT GCC GCA GCG         |
| Arg        | CGT CGC CGA CGG AGA AGG |
| Asn        | AAT AAC                 |
| Asp        | GAT GAC                 |
| Cys        | TGT TGC                 |
| Gln        | CAA CAG                 |
| Glu        | GAA GAG                 |
| Gly        | GGT GGC GGA GGG         |
| His        | CAT CAC                 |
| Ile        | ATT ATC ATA             |
| Leu        | CTT CTC CTA CTG TTA TTG |
| Lys        | AAA AAG                 |
| Met        | ATG                     |
| Phe        | TTT TTC                 |
| Pro        | CCT CCC CCA CCG         |
| Ser        | TCT TCC TCA TCG AGT AGC |
| Thr        | ACT ACC ACA ACG         |
| Trp        | TGG                     |
| Tyr        | TAT TAC                 |
| Val        | GTT GTC GTA GTG         |
| STOP       | TAA TGA TAG             |

Now I take the raw table and turn each row into a cons cell of the form ~(amino-acid . (list-of-codons))~, which is more natural in lisp. Since the =Codon= column above is a single string, I need to split it on whitespace into individual codons.

#+name: aa-table-to-form
#+begin_src elisp :var raw-data=amino-acid-to-codon
  (mapcar (lambda (kvp)
            (cons (car kvp) (split-string (cadr kvp))))
          raw-data)
#+end_src

#+RESULTS: aa-table-to-form
| Ala  | GCT | GCC | GCA | GCG |     |     |
| Arg  | CGT | CGC | CGA | CGG | AGA | AGG |
| Asn  | AAT | AAC |     |     |     |     |
| Asp  | GAT | GAC |     |     |     |     |
| Cys  | TGT | TGC |     |     |     |     |
| Gln  | CAA | CAG |     |     |     |     |
| Glu  | GAA | GAG |     |     |     |     |
| Gly  | GGT | GGC | GGA | GGG |     |     |
| His  | CAT | CAC |     |     |     |     |
| Ile  | ATT | ATC | ATA |     |     |     |
| Leu  | CTT | CTC | CTA | CTG | TTA | TTG |
| Lys  | AAA | AAG |     |     |     |     |
| Met  | ATG |     |     |     |     |     |
| Phe  | TTT | TTC |     |     |     |     |
| Pro  | CCT | CCC | CCA | CCG |     |     |
| Ser  | TCT | TCC | TCA | TCG | AGT | AGC |
| Thr  | ACT | ACC | ACA | ACG |     |     |
| Trp  | TGG |     |     |     |     |     |
| Tyr  | TAT | TAC |     |     |     |     |
| Val  | GTT | GTC | GTA | GTG |     |     |
| STOP | TAA | TGA | TAG |     |     |     |

The last step before producing usable output is to invert the data: from a list of ~(amino-acid . (list-of-codons))~ into a flat alist ~((codon . amino-acid) (codon . amino-acid) …)~, because the final target of this manipulation is a =json=-style object of the form ~{ codon: amino-acid }~.

#+name: aa-table-inverted
#+begin_src elisp :var raw-data=amino-acid-to-codon
  ;; Expand each (amino-acid . codons) pair into one (codon . amino-acid)
  ;; pair per codon, then flatten the per-amino-acid lists into one alist.
  (apply #'append
         (mapcar (lambda (aa-to-codons)
                   (mapcar (lambda (codon)
                             (cons codon (car aa-to-codons)))
                           (cdr aa-to-codons)))
                 <<aa-table-to-form>>))
#+end_src

#+RESULTS: aa-table-inverted
: ((GCT . Ala) (GCC . Ala) (GCA . Ala) (GCG . Ala) (CGT . Arg) (CGC . Arg) (CGA . Arg) (CGG . Arg) (AGA . Arg) (AGG . Arg) (AAT . Asn) (AAC . Asn) (GAT . Asp) (GAC . Asp) (TGT . Cys) (TGC . Cys) (CAA . Gln) (CAG . Gln) (GAA . Glu) (GAG . Glu) (GGT . Gly) (GGC . Gly) (GGA . Gly) (GGG . Gly) (CAT . His) (CAC . His) (ATT . Ile) (ATC . Ile) (ATA . Ile) (CTT . Leu) (CTC . Leu) (CTA . Leu) (CTG . Leu) (TTA . Leu) (TTG . Leu) (AAA . Lys) (AAG . Lys) (ATG . Met) (TTT . Phe) (TTC . Phe) (CCT . Pro) (CCC . Pro) (CCA . Pro) (CCG . Pro) (TCT . Ser) (TCC . Ser) (TCA . Ser) (TCG . Ser) (AGT . Ser) (AGC . Ser) (ACT . Thr) (ACC . Thr) (ACA . Thr) (ACG . Thr) (TGG . Trp) (TAT . Tyr) (TAC . Tyr) (GTT . Val) (GTC . Val) (GTA . Val) (GTG . Val) (TAA . STOP) (TGA . STOP) (TAG . STOP))
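
For comparison, the same inversion can be sketched directly in JavaScript. This is only an illustration of the data transformation, not part of the org pipeline: =AMINO_ACID_TO_CODONS= mirrors the table above, and =invertCodonTable= is a hypothetical helper name.

#+begin_src js
  // Same data as the amino-acid-to-codon table, codons separated by spaces.
  const AMINO_ACID_TO_CODONS = {
    Ala: 'GCT GCC GCA GCG',
    Arg: 'CGT CGC CGA CGG AGA AGG',
    Asn: 'AAT AAC',
    Asp: 'GAT GAC',
    Cys: 'TGT TGC',
    Gln: 'CAA CAG',
    Glu: 'GAA GAG',
    Gly: 'GGT GGC GGA GGG',
    His: 'CAT CAC',
    Ile: 'ATT ATC ATA',
    Leu: 'CTT CTC CTA CTG TTA TTG',
    Lys: 'AAA AAG',
    Met: 'ATG',
    Phe: 'TTT TTC',
    Pro: 'CCT CCC CCA CCG',
    Ser: 'TCT TCC TCA TCG AGT AGC',
    Thr: 'ACT ACC ACA ACG',
    Trp: 'TGG',
    Tyr: 'TAT TAC',
    Val: 'GTT GTC GTA GTG',
    STOP: 'TAA TGA TAG',
  };

  // Invert { aminoAcid: 'codon codon …' } into { codon: aminoAcid }.
  function invertCodonTable(aaToCodons) {
    const codonToAa = {};
    for (const [aminoAcid, codons] of Object.entries(aaToCodons)) {
      for (const codon of codons.split(/\s+/)) {
        codonToAa[codon] = aminoAcid;
      }
    }
    return codonToAa;
  }

  const CODON_TO_AA = invertCodonTable(AMINO_ACID_TO_CODONS);
#+end_src

A quick sanity check on the result is that it has 64 keys, one for every possible three-letter codon over the four nucleotides.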

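Now that the lisp data are organized correctly, it’s a simple matter of translating the =sexp= into =json=-like text with some string manipulation. Strictly speaking, the single quotes and the trailing comma after the last entry make the output a JavaScript object literal rather than valid =json=, which is fine as long as it is used in JavaScript source rather than parsed as =json=.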

#+name: tbl-to-json
#+begin_src elisp :var raw-data=amino-acid-to-codon
  ;; One "'codon': 'amino-acid'," line per pair, wrapped in braces.
  (let ((json-map (mapcar (lambda (kvp) (format "'%s': '%s'," (car kvp) (cdr kvp)))
                          <<aa-table-inverted>>)))
    (format "{\n%s\n}" (string-join json-map "\n")))
#+end_src

#+RESULTS: tbl-to-json
#+begin_example
{
'GCT': 'Ala',
'GCC': 'Ala',
'GCA': 'Ala',
'GCG': 'Ala',
'CGT': 'Arg',
'CGC': 'Arg',
'CGA': 'Arg',
'CGG': 'Arg',
'AGA': 'Arg',
'AGG': 'Arg',
'AAT': 'Asn',
'AAC': 'Asn',
'GAT': 'Asp',
'GAC': 'Asp',
'TGT': 'Cys',
'TGC': 'Cys',
'CAA': 'Gln',
'CAG': 'Gln',
'GAA': 'Glu',
'GAG': 'Glu',
'GGT': 'Gly',
'GGC': 'Gly',
'GGA': 'Gly',
'GGG': 'Gly',
'CAT': 'His',
'CAC': 'His',
'ATT': 'Ile',
'ATC': 'Ile',
'ATA': 'Ile',
'CTT': 'Leu',
'CTC': 'Leu',
'CTA': 'Leu',
'CTG': 'Leu',
'TTA': 'Leu',
'TTG': 'Leu',
'AAA': 'Lys',
'AAG': 'Lys',
'ATG': 'Met',
'TTT': 'Phe',
'TTC': 'Phe',
'CCT': 'Pro',
'CCC': 'Pro',
'CCA': 'Pro',
'CCG': 'Pro',
'TCT': 'Ser',
'TCC': 'Ser',
'TCA': 'Ser',
'TCG': 'Ser',
'AGT': 'Ser',
'AGC': 'Ser',
'ACT': 'Thr',
'ACC': 'Thr',
'ACA': 'Thr',
'ACG': 'Thr',
'TGG': 'Trp',
'TAT': 'Tyr',
'TAC': 'Tyr',
'GTT': 'Val',
'GTC': 'Val',
'GTA': 'Val',
'GTG': 'Val',
'TAA': 'STOP',
'TGA': 'STOP',
'TAG': 'STOP',
}
#+end_example

Finally, I need the complete amino acid list for a selector, so generate one from the initial table.

#+name: tbl-to-aa-list
#+begin_src elisp :var raw-data=amino-acid-to-codon :results raw
  (let ((aa-strings (mapcar (lambda (aalist)
                              (format "'%s'" (car aalist)))
                            raw-data)))
    (format "[%s]" (string-join aa-strings ", ")))
#+end_src

#+RESULTS: tbl-to-aa-list
['Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Gln', 'Glu', 'Gly', 'His', 'Ile', 'Leu', 'Lys', 'Met', 'Phe', 'Pro', 'Ser', 'Thr', 'Trp', 'Tyr', 'Val', 'STOP']
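
One way the list might feed a selector is sketched below. This is an assumption about the page, not its actual code: the real simulator may build DOM nodes instead of markup strings, and =aminoAcidOptions= is a hypothetical helper name.

#+begin_src js
  // Amino acid names in the same order as the generated list above.
  const AMINO_ACIDS = [
    'Ala', 'Arg', 'Asn', 'Asp', 'Cys', 'Gln', 'Glu', 'Gly', 'His', 'Ile', 'Leu',
    'Lys', 'Met', 'Phe', 'Pro', 'Ser', 'Thr', 'Trp', 'Tyr', 'Val', 'STOP',
  ];

  // Build the <option> markup for one selector; `selected` may be omitted.
  function aminoAcidOptions(aminoAcids, selected) {
    return aminoAcids
      .map((aa) => {
        const flag = aa === selected ? ' selected' : '';
        return `<option value="${aa}"${flag}>${aa}</option>`;
      })
      .join('\n');
  }
#+end_src

Calling =aminoAcidOptions(AMINO_ACIDS, 'Met')= yields one =<option>= line per amino acid, with only the =Met= entry flagged as selected.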

* Work steps
1. Group nucleotides by codon.
2. Add an amino acid selection area to each codon group.
3. Fill in the first genome’s amino acids on startup.
4. Use the existing infrastructure to do the codon mutation; after mutation, the codon group needs a place to select/display the amino acid.
5. Verify the student’s amino acid selection.
6. Have the student mark lethality.
7. Clone either the current or the previous genome to the next genome.
8. Go back to step 4.
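
Steps 1, 5, and 7 can be sketched as small pure functions. This is a rough sketch under assumptions: genomes are plain strings, =CODON_TO_AA= is a trimmed, hypothetical lookup table, and the function names are illustrative rather than taken from the existing infrastructure.

#+begin_src js
  // Hypothetical, trimmed lookup table: only the codons used in this sketch.
  const CODON_TO_AA = { ATG: 'Met', GCT: 'Ala', GCC: 'Ala', GAT: 'Asp', GAA: 'Glu' };

  // Step 1: split a genome string into consecutive three-letter codons.
  // Trailing nucleotides that do not fill a whole codon are dropped.
  function groupByCodon(genome) {
    const codons = [];
    for (let i = 0; i + 3 <= genome.length; i += 3) {
      codons.push(genome.slice(i, i + 3));
    }
    return codons;
  }

  // Step 5: indices of codon groups whose selected amino acid is wrong.
  function wrongSelections(genome, selections) {
    return groupByCodon(genome)
      .map((codon, i) => (CODON_TO_AA[codon] === selections[i] ? -1 : i))
      .filter((i) => i !== -1);
  }

  // Step 7: clone the previous genome after a lethal mutation,
  // otherwise the current one.
  function nextGenome(previous, current, markedLethal) {
    return markedLethal ? previous : current;
  }
#+end_src

An empty array from =wrongSelections= means every selection matches the lookup table, so the student can move on to marking lethality.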

* Misc
** Cartesian product fun in lisp
I don’t know why I did this when I knew I was just going to have to scrape a table anyway, but it was a fun exercise, and I don’t want to throw it away.
#+BEGIN_SRC elisp
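  ;; `builder' calls itself through its own variable, so this relies on
  ;; dynamic binding (no `:lexical t' header argument on this block).
  (let* ((builder (lambda (acc depth list)
                    (if (= depth 0)
                        (string-join acc)
                      (mapcar (lambda (e)
                                (funcall builder
                                         (cons e acc) (1- depth) list))
                              list))))
         (codons (flatten-list (funcall builder nil 3 '("A" "C" "T" "G")))))
    ;; Emit one "'codon': ," line per codon, leaving the value to fill in.
    (string-join (mapcar (lambda (c) (format "'%s': ," c))
                         codons)
                 "\n"))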
#+END_SRC
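
The same Cartesian product can be sketched in JavaScript. This is just an illustrative equivalent with a hypothetical =codonProduct= helper; it produces the same 64 codons as the elisp version, though in a different order (here the last letter varies fastest).

#+begin_src js
  // Build every word of the given length over the alphabet by repeatedly
  // extending each prefix with each letter.
  function codonProduct(alphabet, length) {
    let words = [''];
    for (let i = 0; i < length; i++) {
      words = words.flatMap((prefix) => alphabet.map((letter) => prefix + letter));
    }
    return words;
  }

  const ALL_CODONS = codonProduct(['A', 'C', 'T', 'G'], 3);
#+end_src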