Your genome is the set of instructions for making and maintaining you, written in DNA. The human genome has about 3.2 billion letters (nucleobases) and contains around 20,000 genes, which are the instructions for making proteins. Genes, however, make up only about 1-5% of your genome. The DNA between the genes used to be called junk DNA because it was not thought to be important. We now know that much of it helps regulate the genes and the genome, for example by switching genes on and off at the right time. The function of a large part of the genome is still unknown, so some of it is still labelled junk DNA.

Genome sequencing is the process by which scientists work out the order of the DNA nucleotides (bases) in a genome. On its own the sequence tells us little, because it still has to be decoded before we can read and understand it. It is nevertheless an important first step in finding genes. Scientists also study the entire genome sequence to understand how the genome works as a whole, such as how genes work together to direct the growth, development and maintenance of an entire organism, and they study the parts of the genome outside the genes to uncover more about junk DNA.

Genome sequencing proceeds as follows. First, DNA is collected from donors. Machines then sequence the DNA in small chunks, each called a read. There are two approaches to this. The first, the clone-by-clone approach (slower but reliable), involves breaking the genome up into relatively large chunks called clones (about 150,000 base pairs long), then cutting each clone into smaller, overlapping pieces that are the right size for sequencing (about 500 base pairs each). The second, the whole-genome shotgun method (faster), involves breaking the whole genome up into small pieces and sequencing those pieces directly. In both cases the sequence is then reconstructed using mapping software: the reads from the sequencing machine are matched to a ‘reference genome sequence’, and the software finds where each read belongs on the genome.

The reference sequence is used by scientists worldwide. It is a representative example of a human genome sequence, made up of DNA sequences from 13 anonymous donors, so it does not belong to any single person. It was the result of the original Human Genome Project, a 13-year, publicly funded project that began in 1990 with the objective of determining the DNA sequence of the entire euchromatic human genome within 15 years. The position of most of our genes on the reference is known, so the next step is to identify the differences between an individual’s genome and the reference.
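To make the idea of matching reads to a reference concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not how real mapping software works: the 20-letter reference is invented, the reads contain no sequencing errors, and the function names `shotgun_reads` and `map_read` are hypothetical. Real tools use indexed, error-tolerant alignment rather than exact string search.

```python
def shotgun_reads(genome, read_length, step):
    """Cut a sequence into overlapping reads.

    A simplified, error-free stand-in for what a sequencing machine produces.
    """
    last_start = len(genome) - read_length
    return [genome[i:i + read_length] for i in range(0, last_start + 1, step)]


def map_read(reference, read):
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Invented toy reference; a real one has about 3.2 billion letters.
reference = "ATGCGTACGTTAGCCTAGGA"

reads = shotgun_reads(reference, read_length=8, step=4)
placements = {read: map_read(reference, read) for read in reads}

for read, positions in placements.items():
    print(read, "maps to position(s)", positions)
```

Short reads can match in more than one place (here `map_read(reference, "GT")` finds two positions), which is one reason the pieces need to be long enough and to overlap.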

Every person has millions of differences from the reference sequence. These differences are called variants. The human genome is mostly the same in all people: variants account for only about 0.1 per cent of each person’s DNA, yet they contribute to differences in appearance and health. People who are closely related have more similar DNA. A variant might be a single changed letter, or a string of letters that is missing or in a different place. Most variants are completely harmless; they are the reason we are different from each other. Some, however, can cause a genetic disease, even when only one letter has changed. Scientists use a range of software to filter the millions of differences down to just a few that could be harmful. Any change that is likely to be the cause of someone’s symptoms or disease is reported back to the NHS, which confirms the result in its own laboratories. The findings and any implications are then discussed with the patient. If it is not clear that a change is causing disease, it is sent to researchers for further analysis.
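The scale of these differences, and what a single-letter variant looks like, can be sketched in a few lines of Python. The arithmetic assumes the commonly quoted figure that roughly one letter in a thousand differs between people, which is consistent with the "millions of differences" mentioned above. The two short sequences are invented, and the helper `single_letter_variants` is a hypothetical name. It only handles substitutions in sequences of equal length, not missing or moved strings of letters.

```python
GENOME_LENGTH = 3_200_000_000  # letters in the human genome (figure from the text)

# Assumption: about one letter in a thousand (0.1 per cent) differs.
expected_variants = GENOME_LENGTH // 1000
print(f"Roughly {expected_variants:,} letters differ")


def single_letter_variants(reference, sample):
    """List (position, reference_letter, sample_letter) for every mismatch.

    Both sequences must have the same length; insertions, deletions and
    rearrangements are outside the scope of this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref_letter, sample_letter)
        for position, (ref_letter, sample_letter) in enumerate(zip(reference, sample))
        if ref_letter != sample_letter
    ]


# Invented toy sequences for illustration only.
reference = "ATGCGTACGTTAGCCTAGGA"
sample = "ATGCGTACGATAGCCTAGGC"

variants = single_letter_variants(reference, sample)
print(variants)
```

Filtering software then has the much harder job of deciding which of those differences, if any, could be harmful.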

Looking at the genome of a person affected by a rare disease can help find which DNA changes might be causing the problem. In cancer, the tumour cells have developed a different genome from the healthy cells, so comparing the normal and cancer genomes may give clues about ways to treat the cancer. When the genome sequences of patients with the same condition are compared, it is possible to see patterns. These patterns can be combined with health information. Once this is done, we may be able to link particular patterns to whether people are likely to become ill and, if so, how severe their illness is likely to be.
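The comparison of normal and cancer genomes can be pictured as a subtraction: find the differences each genome has from the reference, then keep only those that appear in the tumour but not in the healthy cells. The Python sketch below illustrates that idea under heavy simplification. The sequences are invented, only single-letter changes are considered, and `single_letter_differences` is a hypothetical helper rather than part of any real analysis pipeline.

```python
def single_letter_differences(reference, sample):
    """Return the set of (position, reference_letter, sample_letter) mismatches."""
    return {
        (position, ref_letter, sample_letter)
        for position, (ref_letter, sample_letter) in enumerate(zip(reference, sample))
        if ref_letter != sample_letter
    }


# Invented toy sequences of equal length.
reference = "ATGCGTACGTTAGCCTAGGA"
normal = "ATGCGTACGATAGCCTAGGA"  # one inherited variant, at position 9
tumour = "ATGCGTACGATAGCGTAGGA"  # the same variant plus a new change at position 14

normal_variants = single_letter_differences(reference, normal)
tumour_variants = single_letter_differences(reference, tumour)

# Changes present only in the tumour are candidates for having arisen in the cancer.
tumour_only = sorted(tumour_variants - normal_variants)
print(tumour_only)
```

The variant shared by both samples drops out, leaving only the change that is specific to the tumour.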

In the future of personalised medicine, whole-genome sequence data may also become an important tool to guide therapeutic intervention. For some patients, knowing more about their genome may mean that a particular treatment can be recommended, or that a treatment could even be developed specifically for them.

About the author