The birth of evolutionary material, which is the seed of evolution, has enabled life to innovate and survive in nature through interaction with the external environment, and has provided the basis for the emergence of diverse species.
The information that forms the basis of life is stored in a molecule called DNA, and the protein molecules that make up living things are produced according to the information recorded there. Along the way, molecules such as tRNA emerged, which both carry information and take part in protein synthesis. This information had to be encoded and transmitted as a code, but transmitting it accurately through the innumerable sources of noise in the surrounding environment was very difficult.
The genetic code is written with four letters, U, C, G, and A, which are the bases of nucleotides. The proteins they code for are built from 20 amino acids, which can likewise be represented by a 20-letter alphabet. Each amino acid is encoded by a triplet of bases called a codon. Since each of the three positions in a codon can hold any of the four bases, there are 4 × 4 × 4 = 64 possible codons.
|Genetic Code from Wikipedia|
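The count of 64 can be checked by simply listing every triplet. The following is a minimal sketch; the names `BASES` and `codons` are chosen here for illustration only.

```python
from itertools import product

BASES = "UCGA"  # the four RNA bases, in the order used in the text

# Every ordered triplet of bases is one codon: 4 * 4 * 4 = 64 in total.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))  # 64
print(codons[:4])   # ['UUU', 'UUC', 'UUG', 'UUA']
```

The listing only enumerates the codons; which amino acid each one stands for is given by the table in the figure.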
The important question here is why almost all organisms share this genetic code. The simplest hypothesis is that the 21 meanings (20 amino acids and one stop signal) were assigned at random to the 64 possible codons. The problem with this hypothesis is that the arrangement of codons in the genetic code is not random. Because certain codons are linked to particular amino acids, it is hard to believe that the encoding arose purely by chance.
If so, how can the origin of the genetic code be explained? Consider some hypotheses. First, the genetic code may have arisen randomly and then become fixed. For example, ribozymes, which are presumed to have played a role similar to tRNA in the primordial environment, have different affinities for different amino acids, which suggests that various codons appeared as a result of their mutations. Once enough peptides were being coded for, any significant random change would have threatened survival, so the code settled into a "frozen" state. Another promising hypothesis is that, after first arising by chance, the genetic code evolved continuously by optimizing a fitness function and minimizing errors. In this sense, it is not so different from the deep learning that is widely used nowadays.
On this view, natural selection shaped the genetic code to minimize the effect of mutations. Recently, some studies have hypothesized that codons longer than triplets, such as quadruplets, were used first and later shortened to triplets, since the longer the codon, the greater its redundancy and the more error-resistant it is. Of course, longer codons are less efficient at carrying information, so the current triplet appears to have been favored by natural selection.
|The codon size reduction hypothesis of the triplet genetic code origin from PLoS One|
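The trade-off between redundancy and efficiency can be made concrete with a rough count in bits. This is only a back-of-the-envelope sketch, not the model from the cited paper: it assumes 21 equally likely meanings (20 amino acids plus one stop signal) and four equally likely bases, and the function name `redundancy_bits` is invented here.

```python
import math

N_MEANINGS = 21  # assumption: 20 amino acids + one stop signal, equally likely


def redundancy_bits(codon_length, n_bases=4, n_meanings=N_MEANINGS):
    """Bits one codon can carry minus the bits needed to pick one meaning."""
    capacity = codon_length * math.log2(n_bases)  # 2 bits per base
    needed = math.log2(n_meanings)                # about 4.39 bits
    return capacity - needed


for length in (2, 3, 4):
    # codon length, number of distinct codons, spare bits per codon
    print(length, 4 ** length, round(redundancy_bits(length), 2))
```

Under these assumptions a doublet falls short (16 codons cannot cover 21 meanings), a triplet leaves roughly 1.6 spare bits per codon, and a quadruplet roughly 3.6. The extra two bits are what buy error resistance, at the cost of one more base per amino acid.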
When the genetic code is viewed as an information channel, the noise (error) that inevitably arises through interaction with the external environment poses an essential question for living systems: how can the genetic code transmit and translate information accurately and efficiently while tolerating such noise?
To answer this question, T. Tlusty presented a "rate-distortion" model in 2010. According to his model, the genetic code emerged from the interplay of three conflicting evolutionary forces: the need for amino acid diversity, error tolerance, and minimal resource cost. In this process, the code appeared at the point where the mapping between codons and amino acids was no longer random.
|Molecular codes as noisy information channels from the paper by T. Tlusty|
As another explanation, J. Jee et al. applied game theory to the relationship between natural selection and information channels. Models based on game theory interpret the structure of the code as a defense against RNA strings that enter the cell and use the genetic code "deceptively". For example, in the RNA world of the primitive Earth, the genetic code may have arisen as a mechanism to protect against alteration by ancient viruses.
Whatever the model, these hypotheses are easy to understand in terms of coding theory. From the viewpoint of an organism that encodes and decodes information, the early decoding machinery would have transmitted information poorly and with a high error rate. To solve this problem, living systems would have added redundancy to the message, much as we did when developing telecommunication technology. As life evolved, however, it gradually removed redundancy and arrived at a smaller, more efficient code: the triplet genetic code now found in almost all organisms. The current genetic code can therefore be said to balance low error against message efficiency as the result of evolution over billions of years, much as we tune machine learning models by trading off cost against error.
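The telecommunication analogy can be illustrated with the simplest error-correcting scheme, a repetition code with majority voting. This is a textbook toy, not a model of the genetic code itself: it assumes each copy of a symbol is corrupted independently with probability `p`, and the function name `majority_error` is made up for this sketch.

```python
from math import comb


def majority_error(p, n):
    """Probability that a majority vote over n noisy copies is wrong.

    Assumes each copy is corrupted independently with probability p
    and that n is odd, so the vote can never tie.
    """
    k_min = n // 2 + 1  # smallest number of bad copies that flips the vote
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


p = 0.1  # assumed per-copy error rate, chosen only for illustration
for n in (1, 3, 5):
    # copies sent, fraction of the message that is payload, residual error
    print(n, 1 / n, majority_error(p, n))
```

With a 10% per-copy error rate, sending three copies brings the residual error down to about 2.8% and five copies to under 1%, but the useful fraction of the message falls to a third and a fifth. That is the same tension between error tolerance and efficiency that the text describes for long versus short codons.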
What can information-asymmetric games tell us about the context of Crick's ‘frozen accident’?
Codon Size Reduction as the Origin of the Triplet Genetic Code
A rate-distortion scenario for the emergence and evolution of noisy molecular codes
A model for the emergence of the genetic code as a transition in a noisy information channel