The SARS-CoV-2 coronavirus has a remarkably large RNA genome of approximately 30,000 bases that encodes 29 proteins, including four primary proteins: the envelope (E), membrane (M), spike (S) and nucleocapsid (N) proteins. A major question is how an RNA of this length is compacted and packaged into a virus particle with a specific, corona-like shape. Packaging of the RNA genome is carried out by the highly abundant N protein, which binds at multiple places along the viral RNA chain, protecting it from degradation and condensing it into a small volume to form the mature virion. Once the virion infects a cell, using the specificity of the S protein that protrudes from the virus surface, the N protein is released from the viral RNA to allow viral transcription and replication.
The SARS-CoV-2 nucleocapsid protein N contains two structured RNA-binding domains, the N-terminal domain (NTD) and the C-terminal domain (CTD), which bind at multiple places on the long viral RNA genome.
Our data support a model for N-RNA binding in which strong interactions between the NTD and single-stranded RNA attach the NTD to multiple specific sites on the RNA, while weak interactions at the NTD’s secondary face and at the CTD promote nonspecific interactions that result in compaction of the RNA. Genome compaction through multiple dynamic interactions would be especially relevant for coronaviruses, which have extremely large single-stranded RNA genomes. There are multiple binding sites on the RNA and, as we show here, multiple binding sites on the N protein: some lead to binding initiation, while others lead to genome packaging. Our high-resolution techniques can differentiate between these two kinds of interaction.
Using a multidisciplinary approach that includes NMR, fluorescence anisotropy, RNA gel shift assays, RNA structure prediction and protein mapping, we examine binding of the folded domains of the N protein to a 1000-base region of viral genomic RNA and to short 14-base model oligonucleotides designed to mimic single-stranded RNA (ssRNA) and paired RNA (dsRNA). We find that a primary binding domain on the protein binds strongly to ssRNA, and we assign this interaction as the primary driving force for N-RNA binding. We also identify a secondary RNA-binding face that competes with the well-characterized primary face, as well as other regions on the protein that bind weakly but cause condensation of the virus particle. We can specifically identify which interactions lead to genome compaction.
This work is fundamental to the scientific community’s understanding of essential protein-RNA interactions of SARS-CoV-2 and sheds light on how different interactions and RNA-binding sites on N relate to the versatility of N-RNA complexes in viral function.