As discussed in the previous blog “What is mRNA?”, mRNA is an intermediate stage in converting DNA code to a protein. mRNA is transcribed from a reference DNA gene sequence and is then translated into a protein by a ribosome in the rough endoplasmic reticulum. So how do we know the mRNA sequence codes for the protein we want, for example the COVID spike protein?
How do we derive an mRNA sequence?
Many of the methods for deriving and producing an mRNA sequence are proprietary technologies that companies have invested a lot of time, money and research in perfecting. As such, the companies have not shared their intellectual property publicly.
Researchers most likely isolated the region of the COVID-19 RNA sequence coding for the spike protein, as well as reverse-engineering the COVID-19 spike protein based on the publicly available literature. The method used to analyse the RNA sequence relied on computational models trained on previous DNA and RNA sequences coding for proteins, which allows researchers to trial RNA sequences and predict the protein produced. This likely required an understanding of the protein structure, where reverse engineering the COVID-19 spike protein can isolate peptides that correspond to a certain mRNA sequence. As discussed in the previous blog “What is mRNA?”, each three-base triplet, or codon, is associated with a particular amino acid.
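To make the codon idea concrete, here is a minimal sketch of how an mRNA sequence can be read three bases at a time and translated into amino acids using the standard genetic code. The function name `translate` and the short example sequence are illustrative assumptions, not the tooling or sequence used by any vaccine developer.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon.
    DNA-style input (T instead of U) is accepted for convenience.
    """
    mrna = mrna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: the ribosome releases the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequence for illustration only: five codons followed by a stop codon.
print(translate("AUGUUUGUUUUUCUUUAA"))  # MFVFL
```

Because several codons map to the same amino acid, many different mRNA sequences can code for the same protein, which is one reason the predicted protein has to be checked rather than assumed.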
Researchers can reverse engineer a protein by protein sequencing, a process that breaks up a polypeptide chain into smaller polypeptides or single peptides in a controlled manner using various enzymes. The smaller peptides can be identified using mass spectrometry [1]. Mass spectrometry is a method that measures the mass-to-charge ratio of ions, which acts like a fingerprint. The mass-to-charge ratio can be used to identify an individual protein or group of proteins, and it depends on the molecular structure and the elements the molecule contains. Once enough peptides are sequenced, the overlapping segments can be re-assembled computationally. The re-assembled segments give the protein sequence, from which a predicted DNA and RNA sequence can be derived and compared to the COVID-19 RNA sequence. New peptide and protein sequencing platforms are being developed; however, they mostly remain theoretical. Two notable methods are biological nanopores, which can struggle to determine individual amino acids [2], and single-molecule Edman fluorosequencing for peptide labelling [3].
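The computational re-assembly of overlapping segments can be sketched with a simple greedy approach: repeatedly merge the two fragments that share the longest overlap. This is only an illustration under simplifying assumptions (error-free fragments, no repeated regions); the names `overlap` and `assemble` and the toy fragments are hypothetical, and real assembly pipelines are considerably more involved.

```python
def overlap(a, b, min_len=2):
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments, min_len=2):
    """Greedily merge fragments that overlap by at least `min_len` residues."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments that are fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Toy peptide fragments (one-letter amino acid codes) for illustration only.
print(assemble(["MFVFLV", "FLVLLP", "LLPLVS"]))  # ['MFVFLVLLPLVS']
```

The more the fragments overlap, the more confident the re-assembly, which is why a good number of peptides need to be sequenced first.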
These predicted protein sequences and structures can be compared to nuclear magnetic resonance (NMR) analysis of the COVID-19 spike protein as further validation [4]. As mentioned in the previous blog “What is mRNA?”, the tertiary protein structure self-assembles, and minor changes in the sequence can have dramatic effects on the protein structure.
Confirming the derived mRNA sequence
There are numerous methods to confirm that the mRNA sequence codes for the protein, but they often focus on these three principles [5].
- Sequence-dependent vs. sequence-independent methods
- Local and global similarities
- Superimposed structure
The sequence-dependent and sequence-independent methods compare the DNA and mRNA against the known COVID-19 sequence for the spike protein. The sequence-independent methods can compare the size and length of the DNA or protein chains using electrophoresis or SDS-PAGE [6].
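A sequence-dependent comparison can be as simple as checking, position by position, how closely a candidate sequence matches the reference. The sketch below assumes the two sequences are already aligned and of equal length; the function name `percent_identity` and the toy sequences are hypothetical, and real comparisons typically use alignment tools that also handle insertions and deletions.

```python
def percent_identity(candidate, reference):
    """Compare two pre-aligned, equal-length sequences.

    Returns the percentage of matching positions and the 0-based
    positions where the candidate differs from the reference.
    """
    if not reference or len(candidate) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = [
        i for i, (c, r) in enumerate(zip(candidate, reference)) if c != r
    ]
    identity = 100.0 * (len(reference) - len(mismatches)) / len(reference)
    return identity, mismatches


# Toy sequences for illustration only.
reference = "MFVFLVLLPL"
candidate = "MFVFAVLLPL"  # one substitution at position 4
print(percent_identity(candidate, reference))  # (90.0, [4])
```

Listing the mismatch positions matters as much as the overall score, since a single change in a critical region can alter the protein.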
The local and global similarities can be analysed using nuclear magnetic resonance (NMR), comparing the overall 3D structure for similarities as well as analysing local changes in the arrangement and charges in regions of interest.
The superimposed structures are important to confirm the protein is not a mirror image and is rotated correctly. An example is our hands, which are mirror images of each other but cannot be superimposed on top of each other. If the synthesized protein is a mirror image of the viral spike protein, or otherwise cannot be superimposed on it, the wrong antibodies may be produced and they may not recognise the COVID-19 virus spike proteins.
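The hand analogy can be checked numerically. One simple way, sketched here as an illustration rather than as the method used in practice, is the signed volume of four points: rotating a structure keeps the sign, while mirroring flips it, so no rotation can turn one form into the other. The four points stand in for atoms around a chiral centre, and the function names are hypothetical.

```python
def signed_volume(p0, p1, p2, p3):
    """Signed volume of the tetrahedron p0, p1, p2, p3.

    Computed as a . (b x c) / 6, where a, b, c are the edges from p0.
    The sign encodes handedness.
    """
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return sum(a[i] * cross[i] for i in range(3)) / 6.0


def mirror_x(points):
    """Reflect points through the y-z plane (a mirror image)."""
    return [(-x, y, z) for x, y, z in points]


def rotate_z_90(points):
    """Rotate points by 90 degrees about the z axis (a proper rotation)."""
    return [(-y, x, z) for x, y, z in points]


# Toy coordinates for illustration only.
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(signed_volume(*points))               # positive
print(signed_volume(*rotate_z_90(points)))  # still positive after rotation
print(signed_volume(*mirror_x(points)))     # negative for the mirror image
```

A candidate structure whose handedness sign differs from the reference cannot be superimposed on it, however it is rotated.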
Confirming the above can be a lengthy process but, once achieved, it forms the basis for knowing the mRNA reliably codes for the spike protein. These studies and approvals are required before clinical trials, which focus on the production and level of mRNA required to elicit an immune response. However, we might not want a perfect replica of the COVID spike protein…
We may not want a perfect replica of the COVID spike protein, as a given spike protein may only be associated with a single variant of COVID. Therefore, if the synthesized spike protein is less specific to one variant, we may be able to elicit an immune response to COVID variants that evolve over time. This could mean isolating a fundamental backbone common to all COVID spike proteins.
What benefits do mRNA vaccines have over protein vaccines?
The major benefit when comparing mRNA manufacture to protein manufacture is that only a DNA template and RNA polymerase are required [7]. This is almost a one-step reaction, where no additional cell machinery is required. Protein production, by comparison, often requires cells to mass-produce the protein and requires additional purification steps.
The validation of the mRNA strands can be achieved using chromatography, which uses light absorbance to quantify the amount of mRNA present as well as the purity of the mRNA. In regards to eliciting an immune response to train the immune system, proteins and mRNA function in a similar way, presenting the protein to the immune system on the surface of cells. The only difference is that mRNA vaccines use the ribosomes in our own cells to produce the COVID spike protein to present to our bodies and, as mentioned before, the mRNA never enters the nucleus, so it does not affect our DNA.
Deriving an mRNA sequence requires a lot of processes and repeated testing to confirm the mRNA codes for the protein of interest, in this case the COVID spike protein. We know the mRNA reliably produces the COVID spike protein because of the rigorous testing using mass spectrometry, RNA sequence comparison, NMR, electrophoresis and antibody testing. The main reason we use mRNA over protein vaccines is that mRNA is manufactured using only a DNA template and RNA polymerase, which makes it easier to scale for mass-production than protein manufacture.
Hopefully, this brief overview of the tests involved in validating the mRNA and COVID spike protein provides some clarity to an otherwise obscure subject.
1. Alfaro, J. A., Bohländer, P., Dai, M., Filius, M., Howard, C. J., van Kooten, X. F., … Joo, C. (2021). The emerging landscape of single-molecule protein sequencing technologies. Nature Methods, Vol. 18, pp. 604–617. https://doi.org/10.1038/s41592-021-01143-1
2. Hu, Z. L., Huo, M. Z., Ying, Y. L., & Long, Y. T. (2021). Biological Nanopore Approach for Single-Molecule Protein Sequencing. Angewandte Chemie – International Edition, Vol. 60, pp. 14738–14749. https://doi.org/10.1002/anie.202013462
3. Swaminathan, J., Boulgakov, A. A., Hernandez, E. T., Bardo, A. M., Bachman, J. L., Marotta, J., … Marcotte, E. M. (2018). Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures. Nature Biotechnology, 36(11), 1076–1091. https://doi.org/10.1038/nbt.4278
4. Cavalli, A., Salvatella, X., Dobson, C. M., & Vendruscolo, M. (2007). Protein structure determination from NMR chemical shifts. Proceedings of the National Academy of Sciences of the United States of America, 104(23), 9615–9620. https://doi.org/10.1073/pnas.0610313104
5. Kufareva, I., & Abagyan, R. (2012). Methods of protein structure comparison. Methods in Molecular Biology, 857, 231–257. https://doi.org/10.1007/978-1-61779-588-6_10
6. ThermoFisher Scientific (2021). Overview of Electrophoresis. [Accessed 19/10/2021]. https://www.thermofisher.com/uk/en/home/life-science/protein-biology/protein-biology-learning-center/protein-biology-resource-library/pierce-protein-methods/overview-electrophoresis.html
7. Rosa, S. S., Prazeres, D. M. F., Azevedo, A. M., & Marques, M. P. C. (2021). mRNA vaccines manufacturing: Challenges and bottlenecks. Vaccine, Vol. 39, pp. 2190–2200. https://doi.org/10.1016/j.vaccine.2021.03.038