22
these cells started out with exactly the same genetic material as one another. And we do mean exactly – this has to be the case, because they came from just one starter cell, that zygote. So the cells have become completely different even though they came from one cell with just one blueprint.
23
What’s much more interesting is the exploration of how cells use the same genetic information in different ways. Perhaps even more important is how the cells remember and keep on doing it. Cells in our bone marrow keep on producing blood cells, cells in our liver keep on producing liver cells. Why does this happen?
24
It’s a perfectly reasonable suggestion – cells could simply lose genetic material they aren’t going to use. As they differentiate, cells could jettison hundreds of genes they no longer need. There could of course be a slightly less drastic variation on this – maybe the cells shut down genes they aren’t using. And maybe they do this so effectively that these genes can never ever be switched on again in that cell, i.e. the genes are irreversibly inactivated. The key experiments that examined these eminently reasonable hypotheses – loss of genes, or irreversible inactivation – involved an ugly toad and an elegant man.
25
The work has its origins in experiments performed many decades ago in England by John Gurdon, first in Oxford and subsequently Cambridge.
26
John Gurdon used non-fertilised toad eggs in his work.
26
jelly-like mass develop into tadpoles
26
The eggs John Gurdon worked on were a little like these, but hadn’t been exposed to sperm.
26
There were good reasons why he chose to use toad eggs in his experiments. The eggs of amphibians are generally very big, are laid in large numbers outside the body and are see-through. All these features make amphibians a very handy experimental species in developmental biology, as the eggs are technically relatively easy to handle. Certainly a lot better than a human egg, which is hard to obtain, very fragile to handle, is not transparent and is so small that we need a microscope just to see it.
27
John Gurdon worked on the African clawed toad (Xenopus laevis, to give it its official title), one of those John Malkovich ugly-handsome animals, and investigated what happens to cells as they develop and differentiate and age. He wanted to see if a tissue cell from an adult toad still contained all the genetic material it had started with, or if it had lost or irreversibly inactivated some as the cell became more specialised. The way he did this was to take a nucleus from the cell of an adult toad and insert it into an unfertilised egg that had had its own nucleus removed. This technique is called somatic cell nuclear transfer (SCNT), and will come up over and over again. ‘Somatic’ comes from the Greek word for ‘body’.
27
After he’d performed the SCNT, John Gurdon kept the eggs in a suitable environment (much like a child with a tank of frogspawn) and waited to see if any of these cultured eggs hatched into little toad tadpoles.
The experiments were designed to test the following hypothesis: ‘As cells become more specialised (differentiated) they undergo an irreversible loss/inactivation of genetic material.’ There were two possible outcomes to these experiments:
Either
The hypothesis was correct and the ‘adult’ nucleus has lost some of the original blueprint for creating a new individual. Under these circumstances an adult nucleus will never be able to replace the nucleus in an egg and so will never generate a new healthy toad, with all its varied and differentiated tissues.
Or
The hypothesis was wrong, and new toads can be created by removing the nucleus from an egg and replacing it with one from adult tissues.
28
Other researchers had started to look at this before John Gurdon decided to tackle the problem – two scientists called Briggs and King using a different amphibian, the frog Rana pipiens. In 1952 they transplanted the nuclei from cells at a very early stage of development into an egg lacking its own original nucleus and they obtained viable frogs. This demonstrated that it was technically possible to transfer a nucleus from another cell into an ‘empty’ egg without killing the cell. However, Briggs and King then published a second paper using the same system but transferring a nucleus from a more developed cell type and this time they couldn’t create any frogs. The difference in the cells used for the nuclei in the two papers seems astonishingly minor – just one day older and no froglets. This supported the hypothesis that some sort of irreversible inactivation event had taken place as the cells differentiated. A lesser man than John Gurdon might have been put off by this. Instead he spent over a decade working on the problem.
29
And that was the genius of John Gurdon’s work. When he performed his experiments what he was attempting was exceptionally challenging with the technology of the time. If he failed to generate toads from the adult nuclei this could simply mean his technique had something wrong with it. No matter how many times he did the experiment without getting any toads, this wouldn’t actually prove the hypothesis. But if he did generate live toads from eggs where the original nucleus had been replaced by the adult nucleus he would have disproved the hypothesis. He would have demonstrated beyond doubt that when cells differentiate, their genetic material isn’t irreversibly lost or changed. The beauty of this approach is that just one such toad would topple the entire theory – and topple it he did.
29
John Gurdon is incredibly generous in his acknowledgement of the collegiate nature of scientific research, and the benefits he obtained from being in dynamic laboratories and universities. He was lucky to start his work in a well set-up laboratory which had a new piece of equipment which produced ultraviolet light. This enabled him to kill off the original nuclei of the recipient eggs without causing too much damage, and also ‘softened up’ the cell so that he could use tiny glass hypodermic needles to inject donor nuclei. Other workers in the lab had, in some unrelated research, developed a strain of toads which had a mutation with an easily detectable, but non-damaging effect. Like almost all mutations this was carried in the nucleus, not the cytoplasm. The cytoplasm is the thick liquid inside cells, in which the nucleus sits. So John Gurdon used eggs from one strain and donor nuclei from the mutated strain. This way he would be able to show unequivocally that any resulting toads had been coded for by the donor nuclei, and weren’t just the result of experimental error, as could happen if a few recipient nuclei had been left over after treatment.
30
John Gurdon spent around fifteen years, starting in the late 1950s, demonstrating that nuclei from specialised cells are in fact able to create whole animals if placed in the right environment, i.e. an unfertilised egg4. The more differentiated/specialised the donor cell was, the less successful the process in terms of numbers of animals, but that’s the beauty of disproving a hypothesis – we might need a lot of toad eggs to start with but we don’t need to end up with many live toads to make our case. Just one non-murderous doctor will do it, remember?
30
So John Gurdon showed us that although there is something in cells that can keep specific genes turned on or switched off in different cell types, whatever this something is, it can’t be loss or permanent inactivation of genetic material, because if he put an adult nucleus into the right environment – in this case an ‘empty’ unfertilised egg – it forgot all about this memory of which cell type it came from. It went back to being a naive nucleus from an embryo and started the whole developmental process again.
30
Epigenetics is the ‘something’ in these cells. The epigenetic system controls how the genes in DNA are used, in some cases for hundreds of cell division cycles, and the effects are inherited when cells divide. Epigenetic modifications to the essential blueprint exist over and above the genetic code, on top of it, and program cells for decades. But under the right circumstances, this layer of epigenetic information can be removed to reveal the same shiny DNA sequence that was always there. That’s what happened when John Gurdon placed the nuclei from fully differentiated cells into the unfertilised egg cells.
32
Waddington presented his metaphorical epigenetic landscape in 1957 to exemplify concepts of developmental biology5. The landscape merits quite a bit of discussion. As you can see, there is a ball at the top of a hill. As the ball rolls down the hill, it can roll into one of several troughs towards the bottom of the hill. Visually this immediately suggests various things to us, because we have all at some point in our childhood rolled balls down hills, or stairs, or something.
33
What do we immediately understand when we see the image of Waddington’s landscape? We know that once a ball has reached the bottom it is likely to stay there unless we do something to it. We know that to get the ball back up to the top will be harder than rolling it down the hill in the first place. We also know that to roll the ball out of one trough and into another will be hard. It might even be easier to roll it part or all of the way back up and then direct it into a new trough, than to try and roll it directly from one trough to another. This is especially true if the two troughs we’re interested in are separated by more than one hillock.
33
This image is incredibly powerful in helping to visualise what might be happening during cellular development. The ball at the top of the hill is the zygote, the single cell that results from the fusion of one egg and one sperm. As the various cells of the body begin to differentiate (become more specialised), each cell is like a ball that has rolled further down the hill and headed into one of the troughs. Once it has gone as far as it can go, it’s going to stay there. Unless something extraordinarily dramatic happens, that cell is never going to turn into another cell type (jump across to another trough). Nor is it going to move back up to the top of the hill and then roll down again to give rise to all sorts of different cell types.
34
John Gurdon’s experiments had shown that sometimes, if he pushed hard enough, he could move a cell from the very bottom of a trough at the bottom of the hill, right the way back up to the top. From there it can roll down and become any other cell type once more.
38
for most purposes we don’t need to go as far as this stage for cloning to be useful for humans. What we need are cells that have the potential to turn into lots of other cell types. These are the cells that are known as stem cells, and they are metaphorically near the top of Waddington’s epigenetic landscape. The reason we need such cells lies in the nature of the diseases that are major problems in the developed world.
39
In the rich parts of our planet the diseases that kill most of us are chronic. They take a long time to develop and often they take a long time to kill us when they do.
39
But the heart is different. Cardiomyocytes are referred to as ‘terminally differentiated’ – they have gone right to the bottom of Waddington’s hill and are stuck in a particular trough. Unlike bone marrow or liver, the heart doesn’t have an accessible reservoir of less specialised cells (cardiac stem cells) that could turn into new cardiomyocytes. So, the long-term problem that follows a heart attack is that our bodies can’t make new cardiac muscle cells. The body does the only thing it can and replaces the dead cardiomyocytes with connective tissue, and the heart never beats in quite the same way it did before.
39
Similar things happen in so many diseases – the insulin-secreting cells that are lost when teenagers develop type 1 diabetes, the brain cells that are lost in Alzheimer’s disease, the cartilage producing cells that disappear during osteoarthritis – the list goes on and on. It would be great if we could replace these with new cells, identical to our own. This way we wouldn’t have to deal with all the rejection issues that make organ transplants such a challenge, or with the lack of availability of donors. Using stem cells in this way is referred to as therapeutic cloning; creating cells identical to a specific individual in order to treat a disease.
40
For over 40 years we’ve known that in theory this could be possible. John Gurdon’s work and all that followed after him showed that adult cells contain the blueprints for all the cells of the body if we can only find the correct way of accessing them. John Gurdon had taken nuclei from adult toads, put them into toad eggs and been able to push those nuclei all the way back up Waddington’s landscape and create new animals. The adult nuclei had been – and this word is critical – reprogrammed. Ian Wilmut and Keith Campbell had done pretty much the same thing with sheep. The important common feature to recognise here is that in each case the reprogramming only worked when the adult nucleus was placed inside an unfertilised egg. It was the egg that was really important. We can’t clone an animal by taking an adult nucleus and putting it into some other cell type.
40
When we’re first taught about cells in school it’s almost as if the nucleus is all powerful and the rest of the cell – the cytoplasm – is a bag of liquid that doesn’t really do much. Nothing could be further from the truth, and this is especially the case for the egg, because the toads and Dolly have taught us that the cytoplasm of the egg is absolutely key. Something, or some things, in that egg cytoplasm actively reprogrammed the adult nucleus that the experimenters injected into it. These unknown factors moved a nucleus from the bottom of one of Waddington’s troughs right back to the top of the landscape.
42
Remember that ball at the top of Waddington’s landscape. In cellular terms it’s the zygote and it’s referred to as totipotent, that is, it has the potential to form every cell in the body, including the placenta. Of course, zygotes by definition are rather limited in number and most scientists working in very early development use cells from a bit later, the famous embryonic stem (ES) cells. These are created as a result of normal developmental pathways. The zygote divides a few times to create a bundle of cells called the blastocyst. Although the blastocyst typically has fewer than 150 cells, it’s already an early embryo with two distinct compartments. There’s an outer layer called the trophectoderm, which will eventually form the placenta and other extra-embryonic tissues, and an inner cell mass (ICM).
42
Figure 2.1 shows what the blastocyst looks like. The drawing is in two dimensions but in reality the blastocyst is a three-dimensional structure, so the actual shape is that of a tennis ball that’s had a golf ball glued inside it.
42
The cells of the ICM can be grown in the lab in culture dishes. They’re fiddly to maintain and require specialised culture conditions and careful handling, but do it right and they reward us by dividing a limitless number of times and staying the same as the parent cell. These are the ES cells and as their full name suggests, they can form every cell of the embryo and ultimately of the mature animal. They aren’t totipotent – they can’t make placenta – so they are called pluripotent because they make pretty much anything else.
42
Figure 2.1 A diagram of the mammalian blastocyst. The cells of the trophectoderm will give rise to the placenta. During normal development, the cells of the Inner Cell Mass (ICM) will give rise to the tissues of the embryo. Under laboratory conditions, the cells of the ICM can be grown in culture as pluripotent embryonic stem (ES) cells.
43
These ES cells have been invaluable for understanding what’s important for keeping cells in a pluripotent state. Over the years a number of leading scientists including Azim Surani in Cambridge, Austin Smith in Edinburgh, Rudolf Jaenisch in Boston and Shinya Yamanaka in Kyoto have devoted huge amounts of time to identifying the genes and proteins expressed (switched on) in ES cells. They particularly tried to identify genes that keep the ES cells in a pluripotent state. These genes are extraordinarily important because ES cells seem to be very prone to turn into other cell types in culture if you don’t keep the conditions just right. Just a small change in culture conditions, for example, and a culture dish full of one-time ES cells can differentiate into cardiomyocytes and do what heart cells do best: they beat along in time with one another. A slightly different change in conditions, such as altering the delicate balance of chemicals in the culture fluid, can divert the ES cells away from the cardiac route and start the development of cells that give rise to the neurons in our brains.
43
Scientists working on ES cells identified a whole slew of genes that were important for keeping the cells pluripotent. The functions of the various genes they identified weren’t necessarily identical. Some were important for self-renewal, i.e. one ES dividing to form two ES cells, whereas others were required to stop the cells from differentiating1
8
DNA.
Sometimes, when we read about biology, we could be forgiven for thinking that those three letters explain everything. Here, for example, are just a few of the statements made on 26 June 2000, when researchers announced that the human genome had been sequenced1:
Today we are learning the language in which God created life.
US President Bill Clinton
We now have the possibility of achieving all we ever hoped for from medicine.
UK Science Minister Lord Sainsbury
Mapping the human genome has been compared with putting a man on the moon, but I believe it is more than that. This is the outstanding achievement not only of our lifetime, but in terms of human history.
Michael Dexter, The Wellcome Trust
10
From these quotations, and many others like them, we might well think that researchers could have relaxed a bit after June 2000 because most human health and disease problems could now be sorted out really easily. After all, we had the blueprint for humankind. All we needed to do was get a bit better at understanding this set of instructions, so we could fill in a few details.
Unfortunately, these statements have proved at best premature. The reality is rather different.
10
We talk about DNA as if it’s a template, like a mould for a car part in a factory. In the factory, molten metal or plastic gets poured into the mould thousands of times and, unless something goes wrong in the process, out pop thousands of identical car parts.
10
But DNA isn’t really like that. It’s more like a script. Think of Romeo and Juliet, for example. In 1936 George Cukor directed Leslie Howard and Norma Shearer in a film version. Sixty years later Baz Luhrmann directed Leonardo DiCaprio and Claire Danes in another movie version of this play. Both productions used Shakespeare’s script, yet the two movies are entirely different. Identical starting points, different outcomes.
10
That’s what happens when cells read the genetic code that’s in DNA. The same script can result in different productions.
13
Scientists have known for some time that genetics plays a strong role in determining if a person will develop this illness. We know this because if one of a pair of identical twins has schizophrenia, there is a 50 per cent chance that their twin will also have the condition. This is much higher than the 1 per cent risk in the general population.
14
Here’s a third case study. A small child, less than three years old, is abused and neglected by his or her parents. Eventually, the state intervenes and the child is taken away from the biological parents and placed with foster or adoptive parents. These new carers love and cherish the child, doing everything they can to create a secure home, full of affection. The child stays with these new parents throughout the rest of its childhood and adolescence, and into young adulthood.
Sometimes everything works out well for this person. They grow up into a happy, stable individual indistinguishable from all their peers who had normal, non-abusive childhoods. But often, tragically, it doesn’t work out this way. Children who have suffered from abuse or neglect in their early years grow up with a substantially higher risk of adult mental health problems than the general population. All too often the child grows up into an adult at high risk of depression, self-harm, drug abuse and suicide.
15
Why is it so difficult to override the effects of early childhood exposure to neglect or abuse?
15
But these stories are linked at a very fundamental biological level. They are all examples of epigenetics. Epigenetics is the new discipline that is revolutionising biology. Whenever two genetically identical individuals are non-identical in some way we can measure, this is called epigenetics. When a change in environment has biological consequences that last long after the event itself has vanished into distant memory, we are seeing an epigenetic effect in action.
15
When scientists talk about epigenetics they are referring to all the cases where the genetic code alone isn’t enough to describe what’s happening – there must be something else going on as well.
16
This is one of the ways that epigenetics is described scientifically, where things which are genetically identical can actually appear quite different to one another. But there has to be a mechanism that brings out this mismatch between the genetic script and the final outcome. These epigenetic effects must be caused by some sort of physical change, some alterations in the vast array of molecules that make up the cells of every living organism. This leads us to the other way of viewing epigenetics – the molecular description. In this model, epigenetics can be defined as the set of modifications to our genetic material that change the ways genes are switched on or off, but which don’t alter the genes themselves.
16
Although it may seem confusing that the word ‘epigenetics’ can have two different meanings, it’s just because we are describing the same event at two different levels. It’s a bit like looking at the pictures in old newspapers with a magnifying glass, and seeing that they are made up of dots. If we didn’t have a magnifying glass we might have thought that each picture was just made in one solid piece and we’d probably never have been able to work out how so many new images could be created each day. On the other hand, if all we ever did was look through the magnifying glass, all we would see would be dots, and we’d never see the incredible image that they formed together and which we’d see if we could only step back and look at the big picture.
16
The revolution that has happened very recently in biology is that for the first time we are actually starting to understand how amazing epigenetic phenomena are caused. We’re no longer just seeing the large image, we can now also analyse the individual dots that created it.
16
Crucially, this means that we are finally starting to unravel the missing link between nature and nurture; how our environment talks to us and alters us, sometimes forever.
17
The ‘epi’ in epigenetics is derived from Greek and means at, on, to, upon, over or beside. The DNA in our cells is not some pure, unadulterated molecule. Small chemical groups can be added at specific regions of DNA. Our DNA is also smothered in special proteins. These proteins can themselves be covered with additional small chemicals. None of these molecular amendments changes the underlying genetic code. But adding these chemical groups to the DNA, or to the associated proteins, or removing them, changes the expression of nearby genes. These changes in gene expression alter the functions of cells, and the very nature of the cells themselves. Sometimes, if these patterns of chemical modifications are put on or taken off at a critical period in development, the pattern can be set for the rest of our lives, even if we live to be over a hundred years of age.
17
There’s no debate that the DNA blueprint is a starting point. A very important starting point and absolutely necessary, without a doubt. But it isn’t a sufficient explanation for all the sometimes wonderful, sometimes awful, complexity of life. If the DNA sequence was all that mattered, identical twins would always be absolutely identical in every way. Babies born to malnourished mothers would gain weight as easily as other babies who had a healthier start in life. And as we shall see in Chapter 1, we would all look like big amorphous blobs, because all the cells in our bodies would be completely identical.
63
The field is now possibly at risk of swinging a bit too far in the opposite direction, with hard-line epigeneticists almost minimising the significance of the DNA code. The truth is, of course, somewhere in between.
63
In the Introduction, we described DNA as a script. In the theatre, if a script is lousy then even a wonderful director and a terrific cast won’t be able to create a great production. On the other hand, we have probably all suffered through terrible productions of our favourite plays. Even if the script is perfect, the final outcome can be awful if the interpretation is poor. In the same way, genetics and epigenetics work intimately together to create the miracles that are us and every organic thing around us.
64
DNA is the fundamental information source in our cells, their basic blueprint. DNA itself isn’t the real business end of things, in the sense that it doesn’t carry out all the thousands of activities required just to keep us alive. That job is mainly performed by the proteins. It’s proteins that carry oxygen around our bloodstream, that turn chips and burgers into sugars and other nutrients that can be absorbed from our guts and used to power our brains, that contract our muscles so we can turn the pages of this book. But DNA is what carries the codes for all these proteins.
64
If DNA is a code, then it must contain symbols that can be read. It must act like a language. This is indeed exactly what the DNA code does. It might seem odd when we think how complicated we humans are, but our DNA is a language with only four letters. These letters are known as bases, and their full names are adenine, cytosine, guanine and thymine. They are abbreviated to A, C, G and T. It’s worth remembering C, cytosine, in particular, because this is the most important of all the bases in epigenetics.
64
One of the easiest ways to visualise DNA mentally is as a zip. It’s not a perfect analogy, but it will get us started. Of course, one of the most obvious things that we know about a zip is that it is formed of two strips facing each other. This is also true of DNA. The four bases of DNA are the teeth on the zip. The bases on each side of the zip can link up to each other chemically and hold the zip together. Two bases facing each other and joined up like this are known as a base-pair. The fabric strips that the teeth are stitched on to on a zip are the DNA backbones. There are always two backbones facing each other, like the two sides of the zip, and DNA is therefore referred to as double-stranded. The two sides of the zip are basically twisted around to form a spiral structure – the famous double helix. Figure 3.1 is a stylised representation of what the DNA double helix looks like.
65
The analogy will only get us so far, however, and that’s because the teeth of the DNA zip aren’t all equivalent. If one of the teeth is an A base, it can only link up with a T base on the opposite strand. Similarly, if there is a G base on one strand, it can only link up with a C on the other one. This is known as the base-pairing principle. If an A tried to link with a C on the opposite strand it would throw the whole shape of the DNA out of kilter, a bit like a faulty tooth on a zip.
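The base-pairing principle is simple enough to write down as a lookup table. The sketch below is only an illustration: the names `PAIRS`, `complement_strand` and `find_mismatches` are made up for this example, and it ignores the fact that the two real strands run in opposite chemical directions. It shows how one side of the zip dictates the other, and how a faulty tooth can be spotted.

```python
# Pairing rules from the text: A links only with T, and G links only with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand):
    """Return the bases that would face `strand` on the other side of the zip."""
    return "".join(PAIRS[base] for base in strand)


def find_mismatches(strand, opposite):
    """Return the positions where two facing bases break the pairing rule."""
    if len(strand) != len(opposite):
        raise ValueError("the two strands must be the same length")
    return [i for i, (a, b) in enumerate(zip(strand, opposite)) if PAIRS[a] != b]


top = "ACGT"
print(complement_strand(top))        # TGCA
print(find_mismatches(top, "CGCA"))  # [0], because the A at position 0 faces a C
```

A correctly paired double strand gives an empty list of mismatches, which is the code's version of a zip that does up properly.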
66
The majority of cell types reproduce by first copying their entire DNA, and then dividing it equally between two daughter cells. This DNA replication is essential. Without it, daughter cells could end up with no DNA, which in most cases would render them completely useless, like a computer that’s lost its operating software.
66
It’s the copying of DNA before each cell division that shows why the base-pairing principle is so important. Hundreds of scientists have spent their entire careers working out the details of how DNA gets faithfully copied. Here’s the gist of it. The two strands of DNA are pulled apart and then the huge number of proteins involved in the copying (known as the replication complex) get to work.
66
Figure 3.2 shows in principle what happens. The replication complex moves along each single strand of DNA, and builds up a new strand facing it. The complex recognises a specific base – base C for example – and always puts a G in the opposite position on the strand that it’s building. That’s why the base-pairing principle is so important. Because C has to pair up with G, and A has to pair up with T, the cells can use the existing DNA as a template to make the new strands. Each daughter cell ends up with a new perfect copy of the DNA, in which one of the strands came from the original DNA molecule and the other was newly synthesised.
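The copying step described here can be sketched in a few lines. This is a toy model under stated assumptions, not a description of the real replication complex: the function names are hypothetical, a double helix is represented as a simple pair of strings, and strand direction is ignored. What it does show is why each daughter molecule ends up with one old strand and one new one.

```python
# Pairing rules from the text: C always faces G, and A always faces T.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def build_facing_strand(template):
    """Build a new strand by placing the matching base opposite each template base."""
    return "".join(PAIRS[base] for base in template)


def replicate(double_helix):
    """Pull the two strands apart and use each one as a template.

    `double_helix` is a tuple (strand, facing_strand). Each daughter is
    returned as (old_strand, newly_built_strand).
    """
    strand_1, strand_2 = double_helix
    daughter_1 = (strand_1, build_facing_strand(strand_1))
    daughter_2 = (strand_2, build_facing_strand(strand_2))
    return daughter_1, daughter_2


parent = ("ACGGT", "TGCCA")
daughter_1, daughter_2 = replicate(parent)
print(daughter_1)  # ('ACGGT', 'TGCCA')
print(daughter_2)  # ('TGCCA', 'ACGGT')
```

Both daughters carry the same pair of strands as the parent, and in each one the first strand is the inherited original and the second is the fresh copy.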
67
Even in nature, in a system which has evolved over billions of years, nothing is perfect and occasionally the replication machinery makes a mistake. It might try to insert a T where a C should really go. When this happens the error is almost always repaired very quickly by another set of proteins that can recognise that this has happened, take out the wrong base and put in the right one. This is the DNA repair machinery, and one of the reasons it’s able to act is because when the wrong bases pair up, it recognises that the DNA ‘zip’ isn’t done up properly.
67
The cell puts a huge amount of energy into keeping the DNA copies completely faithful to the original template. This makes sense if we go back to our model of DNA as a script.
67
Figure 3.2 The first stage in replication of DNA is the separation of the two strands of the double helix. The bases on each separated backbone act as the template for the creation of a new strand. This ensures that the two new double-stranded DNA molecules have exactly the same base sequence as the parent molecule. Each new double helix of DNA has one backbone that was originally part of the parent molecule (in black) and one freshly synthesised backbone (in white).
68
a script needs to be reproduced faithfully. It can be the same with our DNA – one inappropriate change (a mutation) can have devastating effects. This is particularly true if the mutation is present in an egg or a sperm, as this can ultimately lead to the birth of an individual in whom all the cells carry the mutation. Some mutations have devastating clinical effects. These range from children who age so prematurely that a ten-year-old has the body of a person of 70, to women who are pretty much predestined to develop aggressive and difficult to treat breast cancer before they are 40 years of age. Thankfully, these sorts of genetic mutations and conditions are relatively rare compared with the types of diseases that afflict most people.
68
The 50,000,000,000,000 or so cells in a human body are all the result of perfect replication of DNA, time after time after time, whenever cells divide after the formation of that single-cell zygote from Chapter 1. This is all the more impressive when we realise just how much DNA has to be reproduced each time one cell divides to form two daughter cells. Each cell contains six billion base-pairs of DNA (half originally came from your father and half from your mother). This sequence of six billion base-pairs is what we call the genome. So every single cell division in the human body was the result of copying 6,000,000,000 base-pairs of DNA. Using the same type of calculation as in Chapter 1, if we count one base-pair every second without stopping, it would take a mere 190 years to count all the base-pairs in the genome of a cell. When we consider that a baby is born just nine months after the creation of the single-celled zygote, we can see that our cells must be able to replicate DNA really fast.
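The 190-year figure is easy to check. The short calculation below assumes a year of 365.25 days, which is a choice made for this example and is not stated in the text.

```python
BASE_PAIRS_PER_CELL = 6_000_000_000         # six billion base-pairs, as in the text
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25    # assumption: an average year of 365.25 days

# Counting one base-pair per second, without stopping.
years_to_count = BASE_PAIRS_PER_CELL / SECONDS_PER_YEAR
print(round(years_to_count))  # 190
```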
68
The three billion base-pairs we inherit from each parent aren’t formed of one long string of DNA. They are arranged into smaller bundles, which are the chromosomes. We’ll delve deeper into these in Chapter 9.
69
Let’s go back to the more fundamental question of what these six billion base-pairs of DNA actually do, and how the script works. More specifically, how can a code that only has four letters (A, C, G and T) create the thousands and thousands of different proteins found in our cells? The answer is surprisingly elegant. It could be described as the modular paradigm of molecular biology but it’s probably far more useful to think of it as Lego.
69
Lego used to have a great advertising slogan ‘It’s a new toy every day’, and it was very accurate. A large box of Lego contains a limited number of designs, essentially a fairly small range of bricks of certain shapes, sizes and colours. Yet it’s possible to use these bricks to create models of everything from ducks to houses, and from planes to hippos. Proteins are rather like that. The ‘bricks’ in proteins are quite small molecules called amino acids, and there are twenty standard amino acids (different Lego bricks) in our cells. But these twenty amino acids can be joined together in an incredible array of combinations, in all sorts of orders and lengths, to create an enormous number of proteins.
69
That still leaves the problem of how even as few as twenty amino acids can be encoded by just four bases in DNA. The way this works is that the cell machinery ‘reads’ DNA in blocks of three base-pairs at a time. Each block of three is known as a codon and may be AAA, or GCG or any other combination of A, C, G and T. From just four bases it’s possible to create sixty-four different codons, more than enough for the twenty amino acids. Some amino acids are coded for by more than one codon. For example, the amino acid called lysine is coded for by AAA and AAG. A few codons don’t code for amino acids at all. Instead they act as signals to tell the cellular machinery that it’s at the end of a protein-coding sequence. These are referred to as stop codons.
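The codon arithmetic can be sketched in a few lines. This is an illustration rather than a full genetic code table: the lysine codons come from the text, while the three stop codons are not named in the text and are taken from the standard genetic code.

```python
from itertools import product

BASES = "ACGT"

# Every possible block of three bases (a codon).
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons))  # 4 * 4 * 4 = 64

# Assignments mentioned in the text.
LYSINE_CODONS = {"AAA", "AAG"}

# Not listed in the text: the stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# 64 codons for 20 amino acids leaves room for synonyms and stop signals.
print(len(codons) - len(STOP_CODONS))  # 61 codons left to share among 20 amino acids
```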
70
How exactly does the DNA in our chromosomes act as a script for producing proteins? It does it through an intermediary molecule called messenger RNA (mRNA). mRNA is very like DNA although it does differ in a few significant details. Its backbone is slightly different from DNA (hence RNA, which stands for ribonucleic acid rather than deoxyribonucleic acid); it is single-stranded (only one backbone); it replaces the T base with a very similar but slightly different one called U (we don’t need to go into the reason it does this here). When a particular DNA stretch is ‘read’ so that a protein can be produced using that bit of script, a huge complex of proteins unzips the right piece of DNA and makes mRNA copies. The complex uses the base-pairing principle to make perfect mRNA copies. The mRNA molecules are then used as temporary templates at specialised structures in the cell that produce protein. These read the three-letter codon code and stitch together the right amino acids to form the longer protein chains. There is of course a lot more to it than all this, but that’s probably sufficient detail.
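The two steps, copying DNA into mRNA and then reading the mRNA three letters at a time, can be sketched as a toy program. Everything here is a simplification we have assumed for illustration: the template strand is made up, the function names are ours, and the codon table holds only five entries from the standard genetic code rather than all sixty-four.

```python
# Base-pairing rules used when copying DNA into mRNA (A pairs with U in RNA).
DNA_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# A deliberately tiny slice of the standard codon table, enough for the demo.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "AAA": "Lys",
    "AAG": "Lys",
    "GCG": "Ala",
    "UAA": None,   # stop codon
}

def transcribe(template_strand):
    """Build the mRNA that pairs, base by base, with a DNA template strand."""
    return "".join(DNA_TO_MRNA[base] for base in template_strand)

def translate(mrna):
    """Read the mRNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein

template = "TACTTTTTCCGCATT"   # hypothetical template strand
mrna = transcribe(template)
print(mrna)                    # AUGAAAAAGGCGUAA
print(translate(mrna))         # ['Met', 'Lys', 'Lys', 'Ala']
```

Note that the mRNA contains U wherever the matching DNA script would have a T, and that the stop codon ends the protein without adding an amino acid.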
70
An analogy from everyday life may be useful here. The process of moving from DNA to mRNA to protein is a bit like controlling an image from a digital photograph. Let’s say we take a photograph on a digital camera of the most amazing thing in the world. We want other people to have access to the image, but we don’t want them to be able to change the original in any way. The raw data file from the camera is like the DNA blueprint. We copy it into another format, that can’t be changed very much – a PDF maybe – and then we email out thousands of copies of this PDF, to everyone who asks for it. The PDF is the messenger RNA. If people want to, they can print paper copies from this PDF, as many as they want, and these paper copies are the proteins. So everyone in the world can print the image, but there is only one original file.
71
Why so complicated, why not just have a direct mechanism? There are a number of good reasons that evolution has favoured this indirect method. One of them is to prevent damage to the script, the original image file. When DNA is unzipped it is relatively susceptible to damage and that’s something that cells have evolved to avoid. The indirect way in which DNA codes for proteins minimises the period of time for which a particular stretch of DNA is open and vulnerable. The other reason this indirect method has been favoured by evolution is that it allows a lot of control over the amount of a specific protein that’s produced, and this creates flexibility.
71
Epigenetics is one of the mechanisms a cell uses to control the amount of a particular protein that is produced, especially by controlling how many mRNA copies are made from the original template.
72
The last few paragraphs have all been about how genes encode proteins. How many genes are there in our cells? This seems like a simple question but oddly enough there is no agreed figure on this. This is because scientists can’t agree on how to define a gene. It used to be quite straightforward – a gene was a stretch of DNA that encoded a protein. We now know that this is far too simplistic. However, it’s certainly true to say that all proteins are encoded by genes, even if not all genes encode proteins. There are about 20,000 to 24,000 protein-encoding genes in our DNA, a much lower estimate than the 100,000 that scientists thought was a good guess just ten years ago1.
73
Most genes in human cells have quite a similar structure. There’s a region at the beginning called the promoter, which binds the protein complexes that copy the DNA to form mRNA. The protein complexes move along through what’s known as the body of the gene, making a long mRNA strand, until they finally fall off at the end of the gene.
73
Imagine a gene body that is 3,000 base-pairs long, a perfectly sensible length for a gene. The mRNA, being single-stranded, will be 3,000 bases long. Each amino acid is encoded by a codon composed of three bases, so we would predict that this mRNA will encode a protein that is 1,000 amino acids long. But, perhaps unexpectedly, what we find is that the protein is usually considerably shorter than this.
73
If the sequence of a gene is typed out it looks like a long string of combinations of the letters A, C, G and T. But if we analyse this with the right software, we find that we can divide that long string into two types of sequences. The first type is called an exon (for expressed region) and an exon can code for a run of amino acids. The second type is called an intron (for intragenic region). This doesn’t code for a run of amino acids. Instead it contains lots of the ‘stop’ codons that signal that the protein should come to an end.
73
When the mRNA is first copied from the DNA it contains the whole run of exons and introns. Once this long RNA molecule has been created, another multi-sub-unit protein complex comes along. It removes all the intron sequences and then joins up the exons to create an mRNA that codes for a continuous run of amino acids. This editing process is called splicing.
73
This again seems extremely complicated, but there’s a very good reason that this complex mechanism has been favoured by evolution. It’s because it enables a cell to use a relatively small number of genes to create a much bigger number of proteins. The way this works is shown in Figure 3.3.
74
The initial mRNA contains all the exons and all the introns. Then it’s spliced to remove the introns. But during this splicing some of the exons may also be removed. Some exons will be retained in the final mRNA, others will be skipped over. The various proteins that this creates may have quite similar functions, or they may differ dramatically. The cell can express different proteins depending on what that cell has to do at a particular time, or because of different signals that it receives. If we define a gene as something that encodes a protein, this mechanism means that just 20,000 or so genes can code for far more than just 20,000 proteins.
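A toy model makes the counting argument concrete. In the sketch below the gene, its three exons and the letters they carry are all invented, and the final count is only an upper bound under the assumption that any non-empty subset of exons could be kept; real cells use a much more restricted set of splice patterns.

```python
from itertools import combinations

# Hypothetical pre-mRNA: alternating exons (kept) and introns (removed).
# The letters stand in for stretches of amino acids.
pre_mrna = [
    ("exon", "DE"), ("intron", "xxxx"),
    ("exon", "PI"), ("intron", "xxxx"),
    ("exon", "CT"),
]

def splice(segments, skipped_exons=()):
    """Drop every intron, plus any exon whose position among the exons is skipped."""
    exons = [text for kind, text in segments if kind == "exon"]
    return "".join(text for i, text in enumerate(exons) if i not in skipped_exons)

print(splice(pre_mrna))        # DEPICT
print(splice(pre_mrna, {1}))   # DECT (middle exon skipped)

# Upper bound on distinct products if any non-empty subset of exons could be kept.
n_exons = sum(1 for kind, _ in pre_mrna if kind == "exon")
variants = {
    splice(pre_mrna, set(range(n_exons)) - set(kept))
    for size in range(1, n_exons + 1)
    for kept in combinations(range(n_exons), size)
}
print(len(variants))           # 7
```

Even this three-exon gene could, in principle, give seven different messages, which is why the number of proteins can far exceed the number of genes.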
74
Figure 3.3 The DNA molecule is shown at the very top of this diagram. The exons, which code for stretches of amino acids, are shown in the dark boxes. The introns, which don’t code for amino acid sequences, are represented by the white boxes. When the DNA is first copied into RNA, indicated by the first arrow, the RNA contains both the exons and the introns. The cellular machinery then removes some or all of the introns (the process known as splicing). The final messenger RNA molecules can thereby code for a variety of proteins from the same gene, as represented by the various words shown in the diagram. For simplicity, all the introns and exons have been drawn as the same size, but in reality they can vary widely.
75
Whenever we describe the genome we talk about it in very two-dimensional terms, almost like a railway track. Peter Fraser’s laboratory at the Babraham Institute outside Cambridge has published some extraordinary work showing it’s probably nothing like this at all. He works on the genes that code for the proteins required to make haemoglobin, the pigment in red blood cells that carries oxygen all around the body. There are a number of different proteins needed to create the final pigment, and they lie on different chromosomes. Doctor Fraser has shown that in cells that produce large amounts of haemoglobin, these chromosome regions become floppy and loop out like tentacles sticking out of the body of an octopus. These floppy regions mingle together in a small area of the cell nucleus, waving about until they can find each other. By doing this, there is an increased chance that all the proteins needed to create the functional haemoglobin pigment will be expressed together at the same time2.
379
- Schoenfelder et al. (2010), Nat Genet. 42: 53–61.
75
Each cell in our body contains 6,000,000,000 base-pairs. About 120,000,000 of these code for proteins. One hundred and twenty million sounds like a lot, but it’s actually only 2 per cent of the total amount. So although we think of proteins as being the most important things our cells produce, about 98 per cent of our genome doesn’t code for protein.
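The percentages follow directly from the two figures quoted above, as this short check shows (the constant names are ours):

```python
TOTAL_BASE_PAIRS = 6_000_000_000
PROTEIN_CODING_BASE_PAIRS = 120_000_000

coding_fraction = PROTEIN_CODING_BASE_PAIRS / TOTAL_BASE_PAIRS
print(f"{coding_fraction:.0%} coding, {1 - coding_fraction:.0%} non-coding")
# 2% coding, 98% non-coding
```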
75
Until recently, the reason that we have so much DNA when so little of it leads to a protein was a complete mystery. In the last ten years we’ve finally started to get a grip on this, and once again it’s connected with regulating gene expression through epigenetic mechanisms. It’s now time to move on to the molecular biology of epigenetics.
6
Chapter 4: Life As We Know It Now
77
The cells of the retina express a different set of genes from the cells in the bladder, for example. But how do the different cell types switch different sets of genes on or off?
78
If the DNA stays the same in different cell types in one individual, how can the incredibly precise patterns of gene expression be transmitted down through the generations of cell division?
78
Our analogy of actors reading a script is again useful. Baz Luhrmann hands Leonardo DiCaprio Shakespeare’s script for Romeo and Juliet, on which the director has written or typed various notes – directions, camera placements and lots of additional technical information. Whenever Leo’s copy of the script is photocopied, Baz Luhrmann’s additional information is copied along with it. Claire Danes also has the script for Romeo and Juliet. The notes on her copy are different from those on her co-star’s, but will also survive photocopying. That’s how epigenetic regulation of gene expression occurs – different cells have the same DNA blueprint (the original author’s script) but carry varied molecular modifications (the shooting script) which can be transmitted from mother cell to daughter cell during cell division.
78
These modifications to DNA don’t change the essential nature of the A, C, G and T alphabet of our genetic script, our blueprint. When a gene is switched on and copied to make mRNA, that mRNA has exactly the same sequence, controlled by the base-pairing rules, irrespective of whether or not the gene is carrying an epigenetic addition. Similarly, when the DNA is copied to form new chromosomes for cell division, the same A, C, G and T sequences are copied.
79
Since epigenetic modifications don’t change what a gene codes for, what do they do? Basically, they can dramatically change how well a gene is expressed, or if it is expressed at all. Epigenetic modifications can also be passed on when a cell divides, so this provides a mechanism for how control of gene expression stays consistent from mother cell to daughter cell. That’s why skin stem cells only give rise to more skin cells, not to any other cell type.
80
The first epigenetic modification to be identified was DNA methylation. Methylation means the addition of a methyl group to another chemical, in this case DNA. A methyl group is very small. It’s just one carbon atom linked to three hydrogen atoms. Chemists describe atoms and molecules by their ‘molecular weight’, where the atom of each element has a different weight. The average molecular weight of a base-pair is around 600 Da (the Da stands for Daltons, the unit that is used for molecular weight). A methyl group only weighs 15 Da. By adding a methyl group the weight of the base-pair is only increased by 2.5 per cent. A bit like sticking a grape on a tennis ball.
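The grape-on-a-tennis-ball comparison rests on a simple sum. The sketch below uses atomic weights rounded to whole Daltons, which is our simplification, together with the 600 Da average quoted in the text.

```python
# Approximate atomic weights in Daltons (rounded to whole numbers).
ATOMIC_WEIGHT_DA = {"C": 12, "H": 1}

# A methyl group is one carbon linked to three hydrogens.
methyl_da = ATOMIC_WEIGHT_DA["C"] + 3 * ATOMIC_WEIGHT_DA["H"]
BASE_PAIR_DA = 600  # average figure from the text

increase = methyl_da / BASE_PAIR_DA
print(methyl_da)          # 15
print(f"{increase:.1%}")  # 2.5%
```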
80
Figure 4.1 shows what DNA methylation looks like chemically. The base shown is C – cytosine. It’s the only one of the four DNA bases that gets methylated, to form 5-methylcytosine. The ‘5’ refers to the position on the ring where the methyl is added, not to the number of methyl groups; there’s always only one of these. This methylation reaction is carried out in our cells, and those of most other organisms, by one of three enzymes called DNMT1, DNMT3A or DNMT3B. DNMT stands for DNA methyltransferase. The DNMTs are examples of epigenetic ‘writers’ – enzymes that create the epigenetic code. Most of the time these enzymes will only add a methyl group to a C that is followed by a G. C followed by G is known as CpG.
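Finding the places where methylation could occur is just a matter of scanning for a C followed by a G. A minimal sketch, using a made-up sequence and a helper name of our own:

```python
def cpg_positions(sequence):
    """Return the index of every C that is immediately followed by a G."""
    sequence = sequence.upper()
    return [i for i in range(len(sequence) - 1) if sequence[i:i + 2] == "CG"]

example = "ATCGGCGTACCGA"       # made-up sequence for illustration
print(cpg_positions(example))   # [2, 5, 10]
```

The C at index 9 is not reported because it is followed by another C rather than a G.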
80
Figure 4.1 The chemical structures of the DNA base cytosine and its epigenetically modified form, 5-methylcytosine. C: carbon; H: hydrogen; N: nitrogen; O: oxygen. For simplicity, some carbon atoms have not been explicitly shown, but are present where there is a junction of two lines.
81
This CpG methylation is an epigenetic modification, which is also known as an epigenetic mark. The chemical group is ‘stuck onto’ DNA but doesn’t actually alter the underlying genetic sequence. The C has been decorated rather than changed. Given that the modification is so small, it’s perhaps surprising that it will come up over and over again in this book, and in any discussion of epigenetics. This is because methylation of DNA has profound effects on how genes are expressed, and ultimately on cellular, tissue and whole-body functions.
81
In the early 1980s it was shown that if you injected DNA into mammalian cells, the amount of methylation on the injected DNA affected how well it was transcribed into RNA. The more methylated the injected DNA was, the less transcription occurred1. In other words, high levels of DNA methylation were associated with genes that were switched off. However, it wasn’t clear how significant this was for the genes normally found in the nuclei of cells, rather than ones that were injected into cells.
380
- Kruczek and Doerfler (1982), EMBO J. 1:409–14.
81
The key work in establishing the importance of methylation in mammalian cells came out of the laboratory of Adrian Bird, who has spent most of his scientific career in Edinburgh, Conrad Waddington’s old stomping ground. Professor Bird is a Fellow of the Royal Society and a former Governor of the Wellcome Trust, the enormously influential independent funding agency in UK science.
82
In 1985 Adrian Bird published a key paper in Cell showing that most CpG motifs were not randomly distributed throughout the genome. Instead the majority of CpG pairs were concentrated just upstream of certain genes, in the promoter region2. Promoters are the stretches of the genome where the DNA transcription complexes bind and start copying DNA to form RNA. Regions where there is a high concentration of CpG motifs are called CpG islands.
82
In about 60 per cent of the genes that code for proteins, the promoters lie within CpG islands. When these genes are active, the levels of methylation in the CpG island are low. The CpG islands tend to be highly methylated only when the genes are switched off. Different cell types express different genes, so unsurprisingly the patterns of CpG island methylation are also different across different cell types.
82
For quite some time there was considerable debate about what this association meant. It was the old cause or effect debate. One interpretation was that DNA methylation was essentially a historical modification – genes were repressed by some unknown mechanism and then the DNA became methylated. In this model, DNA methylation was just a downstream consequence of gene repression. The other interpretation was that the CpG island became methylated, and it was this methylation that switched the gene off. In this model the epigenetic modification actually causes the change in gene expression. Although there is still the occasional argument about this among competing labs, the vast majority of scientists in this field now believe that the data generated in the quarter of a century since Adrian Bird’s paper are consistent with the second, causal model. Under most circumstances, methylation of the CpG island at the start of a gene turns that gene off.
83
Adrian Bird went on to investigate how DNA methylation switches genes off. He showed that when DNA is methylated, it binds a protein called MeCP2 (Methyl CpG binding protein 2)3. However, this protein won’t bind to unmethylated CpG motifs, which is pretty amazing when we look back at Figure 4.1 and think how similar the methylated and unmethylated forms of cytosine really are. The enzymes that add the methyl group to DNA have been described as writers of the epigenetic code. MeCP2 doesn’t add any modifications to DNA. Its role is to enable the cell to interpret the modifications on a DNA region. MeCP2 is an example of a ‘reader’ of the epigenetic code.
83
Once MeCP2 binds to 5-methylcytosine in a gene promoter it seems to do a number of things. It attracts other proteins that also help to switch the gene off4. It may also stop the DNA transcription machinery from binding to the gene promoter, and this prevents the mRNA messenger molecule from being produced5. Where genes and their promoters are very heavily methylated, binding of MeCP2 seems to be part of a process where that region of a chromosome gets shut down almost permanently. The DNA becomes incredibly tightly coiled up and the gene transcription machinery can’t get access to the base-pairs to make mRNA copies.
380
- Bird et al. (1985), Cell 40: 91–99.
- Lewis et al. (1992), Cell 69: 905–14.
- Nan et al. (1998), Nature 393: 386–9.
- For a recent review of the actions of MeCP2, see Adkins and Georgel (2011), Biochem Cell Biol. 89: 1–11.
- Guy et al. (2007), Science 315: 1143–7.
- The most important papers from the Allis lab in 1996 were: Brownell et al. (1996), Cell 84: 843–51; Vettese-Dadey et al. (1996), EMBO J. 15: 2508–18; Kuo et al. (1996), Nature 383: 269–72.
83
This is one of the reasons why DNA methylation is so important. Remember those 85-year-old neurons in the brains of senior citizens? For over eight decades DNA methylation has kept certain regions of the genome incredibly tightly compacted and so the neuron has kept certain genes completely repressed. This is why our brain cells never produce haemoglobin, for example, or digestive enzymes.
84
But what about the other situation, the example of skin stem cells dividing very frequently but always just creating new skin cells, rather than some other cell type such as bone? In this situation, the pattern of DNA methylation is passed from mother cell to daughter cells. When the two strands of the DNA double helix separate, each gets copied using the base-pairing principle, as we saw in Chapter 3. Figure 4.2 illustrates what happens when this replication occurs in a region where the CpG is methylated on the C.
84
Figure 4.2 This schematic shows how DNA methylation patterns can be preserved when DNA is replicated. The methyl group is represented by the black circle. Following separation of the parent DNA double helix in step 1, and replication of the DNA strands in step 2, the new strands are ‘checked’ by the DNA methyltransferase 1 (DNMT1) enzyme. DNMT1 can recognise that a methyl group at a cytosine motif on one strand of a DNA molecule is not matched on the newly synthesised strand. DNMT1 transfers a methyl group to the cytosine on the new strand (step 3). This only occurs where a C and a G are next to each other in a CpG motif. This process ensures that the DNA methylation patterns are maintained following DNA replication and cell division.
84
DNMT1 can recognise if a CpG motif is only methylated on one strand. When DNMT1 detects this imbalance, it replaces the ‘missing’ methylation on the newly copied strand. The daughter cells will therefore end up with the same DNA methylation patterns as the parent cell. As a consequence, they will repress the same genes as the parent cell and the skin cells will stay as skin cells.
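The copying of methylation marks can be modelled as a toy program. This is a sketch under our own simplifying assumptions, not a description of the real enzyme: the sequence is invented, methyl marks are stored as positions along the top strand, and the C on the bottom strand of a CpG is taken to sit opposite the G, one position along.

```python
def replicate(marks):
    """Semi-conservative copy: each daughter keeps one old strand with its
    methyl marks and gains one new, unmarked strand."""
    daughter_a = {"top": set(marks["top"]), "bottom": set()}      # old top, new bottom
    daughter_b = {"top": set(), "bottom": set(marks["bottom"])}   # new top, old bottom
    return daughter_a, daughter_b

def maintain(sequence, marks):
    """Toy DNMT1: where a CpG is methylated on one strand only, copy the
    mark to the partner C on the other strand."""
    top, bottom = set(marks["top"]), set(marks["bottom"])
    for i in range(len(sequence) - 1):
        if sequence[i:i + 2] == "CG":
            # Top-strand C sits at i; the bottom-strand C pairs with the G at i + 1.
            if i in top or (i + 1) in bottom:
                top.add(i)
                bottom.add(i + 1)
    return {"top": top, "bottom": bottom}

sequence = "ATCGGCGTACCGA"                     # made-up top strand
parent = {"top": {2, 10}, "bottom": {3, 11}}   # CpGs at 2 and 10 methylated, CpG at 5 not

for daughter in replicate(parent):
    restored = maintain(sequence, daughter)
    print(restored == parent)                  # True for both daughters
```

The unmethylated CpG at position 5 stays unmethylated in both daughters, so the pattern that is inherited includes the gaps as well as the marks.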
89
Neurons are very different from skin cells. If both cell types use DNA methylation to switch off certain genes, and to keep them switched off, they must be using the methylation at different sets of genes. Otherwise they would all be expressing the same genes, to the same extent, and they would inevitably then be the same types of cells instead of being neurons and skin cells.
89
The solution to how two cell types can use the same mechanism to create such different outcomes lies in how DNA methylation gets targeted to different regions of the genome in different cell types. This takes us into the second great area of molecular epigenetics. Proteins.
90
DNA is often described as if it’s a naked molecule, i.e. DNA and nothing else. If we visualise it at all in our minds, a DNA double helix probably looks like a very long twisty railway track. This is pretty much how we described it in the previous chapter. But in reality it’s actually nothing like that, and many of the great breakthroughs in epigenetics came about when scientists began to appreciate this fully.
90
DNA is intimately associated with proteins, and in particular with proteins called histones. At the moment most attention in epigenetics and gene regulation is focused on four particular histone proteins called H2A, H2B, H3 and H4. These histones have a structure known as ‘globular’, as they are folded into compact ball-like shapes. However, each also has a loose floppy chain of amino acids that sticks out of the ball, which is called the histone tail. Two copies of each of these four histone proteins come together to form a tight structure called the histone octamer (so called because it’s formed of eight individual histones).
90
It might be easiest to think of this octamer as eight ping-pong balls stacked on top of each other in two layers. DNA coils tightly around this protein stack like a long liquorice whip around marshmallows, to form a structure called the nucleosome. One hundred and forty-seven base-pairs of DNA coil around each octamer. Figure 4.3 is a very simplified representation of the structure of a nucleosome, where the white strand is DNA and the grey wiggles are the histone tails.
90
If we had read anything about histones even just fifteen years ago, they would probably have been described as ‘packaging proteins’, and left at that. It’s certainly true that DNA has to be packaged. The nucleus of a cell is usually only about 10 microns in diameter – that’s 1/100th of a millimetre – and if the DNA in a cell was just left all floppy and loose it could stretch for 2 metres. The DNA is curled tightly around the histone octamers and these are all stacked closely on top of each other.
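The scale of the packaging problem is clearer as a ratio. This quick check uses only the two figures quoted in the text, with constant names of our own:

```python
DNA_LENGTH_M = 2.0           # stretched-out DNA in one cell, from the text
NUCLEUS_DIAMETER_M = 10e-6   # 10 microns, from the text

ratio = DNA_LENGTH_M / NUCLEUS_DIAMETER_M
print(f"DNA is about {ratio:,.0f} times longer than the nucleus is wide")
# DNA is about 200,000 times longer than the nucleus is wide
```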
91
Certain regions of our chromosomes have an extreme form of that sort of structure almost all the time. These tend to be regions that don’t really code for any genes. Instead, they are structural regions such as the very ends of chromosomes, or areas that are important for separating chromosomes after DNA has been duplicated for cell division.
91
Figure 4.3 The histone octamer (2 molecules each of histones H2A, H2B, H3 and H4) stacked tightly together, and with DNA wrapped around it, forms the basic unit of chromatin called the nucleosome.
91
The regions of DNA that are really heavily methylated also have this hyper-condensed structure and the methylation is very important in establishing this configuration. It’s one of the mechanisms used to keep certain genes switched off for decades in long-lived cell types such as neurons.
91
But what about those regions that aren’t screwed down tight, where there are genes that are switched on or have the potential to be switched on? This is where the histones really come into play. There is so much more to histones than just acting as a molecular reel for wrapping DNA around. If DNA methylation represents the semi-permanent additional notes on our script of Romeo and Juliet, histone modifications are the more tentative additions. They may be like pencil marks, that survive a few rounds of photocopying but eventually fade out. They may be even more transient, like Post-It notes, used very temporarily.
92
A substantial number of the breakthroughs in this field have come from the lab of Professor David Allis at Rockefeller University in New York.
92
In a remarkable flurry of papers in 1996, he and his colleagues showed that histone proteins were chemically modified in cells, and that this modification increased expression of genes near a specific modified nucleosome8.
92
The histone modification that David Allis identified was called acetylation. This is the addition of a chemical group called an acetyl, in this case to a specific amino acid named lysine on the floppy tail of one of the histones. Figure 4.4 shows the structures of lysine and acetyl-lysine, and we can again see that the modification is relatively small. Like DNA methylation, lysine acetylation is an epigenetic mechanism for altering gene expression which doesn’t change the underlying gene sequence.
92
Figure 4.4 The chemical structures of the amino acid lysine and its epigenetically modified form, acetyl-lysine. C: carbon; H: hydrogen; N: nitrogen; O: oxygen. For simplicity, some carbon atoms have not been explicitly shown, but are present where there is a junction of two lines.
93
So back in 1996 there was a nice simple story. DNA methylation turned genes off and histone acetylation turned genes on. But gene expression is much more subtle than genes being either on or off. Gene expression is rarely an on-off toggle switch; it’s much more like the volume dial on a traditional radio. So perhaps it was unsurprising that there turned out to be more than one histone modification. In fact, more than 50 different epigenetic modifications to histone proteins have been identified since David Allis’s initial work, both by him and by a large number of other laboratories9. These modifications all alter gene expression but not always in the same way. Some histone modifications push gene expression up, others drive it down. The pattern of modifications is referred to as a histone code10. The problem that epigeneticists face is that this is a code that is extraordinarily difficult to read.
380
- A useful review by one of the leading researchers in the field is Kouzarides, T. (2007) Cell 128: 693–705.
- Jenuwein and Allis (2001), Science 293: 1074–80.
93
Imagine a chromosome as the trunk of a very big Christmas tree. The branches sticking out all over the tree are the histone tails and these can be decorated with epigenetic modifications. We pick up the purple baubles and we put one, two or three purple baubles on some of the branches. We also have green icicle decorations and we can put either one or two of these on some branches, some of which already have purple baubles on them. Then we pick up the red stars but are told we can’t put these on a branch if the adjacent branch has any purple baubles. The gold snowflakes and green icicles can’t be present on the same branch. And so it goes on, with increasingly complex rules and patterns. Eventually, we’ve used all our decorations and we wind the lights around the tree. The bulbs represent individual genes. By a magical piece of software programming, the brightness of each bulb is determined by the precise conformation of the decorations surrounding it. The likelihood is that we would really struggle to predict the brightness of most of the bulbs because the pattern of Christmas decorations is so complicated.
94
That’s where scientists currently are in terms of predicting how all the various histone modification combinations work together to influence gene expression. It’s reasonably clear in many cases what individual modifications can do, but it’s not yet possible to make accurate predictions from complex combinations.
94
There are major efforts being made to learn how to understand this code, with multiple labs throughout the world collaborating or competing in the use of the fastest and most complex technologies to address this problem. The reason for this is that although we may not be able to read the code properly yet, we know enough about it to understand that it’s extremely important.
95
Some of the key evidence comes from developmental biology, the field from which so many great epigenetic investigators have emerged. As we have already described, the single-celled zygote divides, and very quickly daughter cells start to take on discrete functions. The first noticeable event is that the cells of the early embryo split into the inner cell mass (ICM) and the trophectoderm. The ICM cells in particular start to differentiate to form an increasing number of different cell types. This rolling of the cells down the epigenetic landscape is, to quite a large degree, a self-perpetuating system.
95
The key concept to grasp at this stage is the way that waves of gene expression and epigenetic modifications follow on from each other. A useful analogy for this is the game of Mousetrap, first produced in the early 1960s and still on sale today. Players have to build an insanely complex mouse trap during the course of the game. The trap is activated at one end by the simple act of releasing a ball. This ball passes down and through all sorts of contraptions including a slide, a kicking boot, a flight of steps and a man jumping off a diving board. As long as the pieces have been put together properly, the whole ridiculous cascade operates perfectly, and the toy mice get caught under a net. If one of the pieces is just slightly mis-aligned, the crazy sequence judders to a halt and the trap doesn’t work.
96
The developing embryo is like Mousetrap. The zygote is pre-loaded with certain proteins, mainly from the egg cytoplasm. These egg-derived proteins move into the nucleus and bind to target genes, which we’ll call Boots (in honour of Mousetrap), and regulate their expression. They also attract a select few epigenetic enzymes to the Boots genes. These epigenetic enzymes may also have been ‘donated’ from the egg cytoplasm and they set up longer-lasting modifications to the DNA and histone proteins of chromatin, also influencing how these Boots genes are switched on or off. The Boots proteins bind to the Divers genes, and switch these on. Some of these Divers genes may themselves encode epigenetic enzymes, which will form complexes on members of the Slides family of genes, and so on. The genetic and epigenetic proteins work together in a seamless orderly procession, just like the events in Mousetrap once the ball has been released. Sometimes a cell will express a little more or a little less of a key factor, one whose expression is on a finely balanced threshold. This has the potential to alter the developmental path that the cell takes, as if twenty Mousetrap games had been connected up. Slight deviations in how the pieces were fitted together, or how the ball rolled at critical moments, would trigger one trap and not another.
The names in our analogy are made up, but we can apply this to a real example. One of the key proteins in the very earliest stages of embryonic development is Oct4. Oct4 protein binds to certain key genes, and also attracts a specific epigenetic enzyme. This enzyme modifies the chromatin and alters the regulation of those genes. Both Oct4 and the epigenetic enzyme with which it works are essential for development of the early embryo. If either is absent, the zygote can’t even develop as far as creating an ICM.
The patterns of gene expression in the early embryo eventually feed back on themselves. When certain proteins are expressed, they can bind to the Oct4 promoter and switch off expression of this gene. Under normal circumstances, somatic cells just don’t express Oct4. It would be too dangerous for them to do so because Oct4 could disrupt the normal patterns of gene expression in differentiated cells, and make them more like stem cells.
This is exactly the effect that Shinya Yamanaka exploited when he used Oct4 as a reprogramming factor. By artificially creating very high levels of Oct4 in differentiated cells, he was able to ‘fool’ the cells into acting like early developmental cells. Even the epigenetic modifications were reset – that’s how powerful this gene is.
Normal development has yielded important evidence of the significance of epigenetic modifications in controlling cell fate. Cases where development goes awry have also shown us how important epigenetics can be.
For example, a 2010 publication in Nature Genetics identified the mutations that cause a rare disease called Kabuki syndrome. Kabuki syndrome is a complex developmental disorder with a range of symptoms that include mental retardation, short stature, facial abnormalities and cleft palate. The paper showed that Kabuki syndrome is caused by mutations in a gene called MLL2 [11]. The MLL2 protein is an epigenetic writer that adds methyl groups to a specific lysine amino acid at position 4 on histone H3. Patients with this mutation are unable to write their epigenetic code properly, and this leads to their symptoms.
Human diseases can also be caused by mutations in enzymes that remove epigenetic modifications, i.e. ‘erasers’ of the epigenetic code. Mutations in a gene called PHF8, which encodes an enzyme that removes methyl groups from a lysine at position 20 on histone H4, cause a syndrome of mental retardation and cleft palate [12]. In these cases, the patient’s cells put epigenetic modifications on without problems, but don’t remove them properly.
11. Ng et al. (2010), Nat Genet. 42: 790–3.
12. Laumonnier et al. (2005), J Med Genet. 42: 780–
It’s interesting that although the MLL2 and PHF8 proteins have different roles, the clinical symptoms caused by mutations in these genes overlap. Both lead to cleft palate and mental retardation. Both of these symptoms are classically considered as reflecting problems during development. Epigenetic pathways are important throughout life, but seem to be particularly significant during development.
In addition to these histone writers and erasers there are over 100 proteins that act as ‘readers’ of this histone code by binding to epigenetic marks. These readers attract other proteins and build up complexes that switch on or turn off gene expression. This is similar to the way that MeCP2 helps turn off expression of genes that are carrying DNA methylation.
Histone modifications are different to DNA methylation in a very important way. DNA methylation is a very stable epigenetic change. Once a DNA region has become methylated it will tend to stay methylated under most conditions. That’s why this epigenetic modification is so important for keeping neurons as neurons, and why there are no teeth in our eyeballs. Although DNA methylation can be removed in cells, this is usually only under very specific circumstances and it’s quite unusual for this to happen.
Most histone modifications are much more plastic than this. A specific modification can be put on a histone at a particular gene, removed and then later put back on again. This happens in response to all sorts of stimuli from outside the cell nucleus. The stimuli can vary enormously. In some cell types the histone code may change in response to hormones. These include insulin signalling to our muscle cells, or oestrogen affecting the cells of the breast during the menstrual cycle. In the brain the histone code can change in response to addictive drugs such as cocaine, whereas in the cells lining the gut, the pattern of epigenetic modifications will alter depending on the amounts of fatty acids produced by the bacteria in our intestines. These changes in the histone code are one of the key ways in which nurture (the environment) interacts with nature (our genes) to create the complexity of every higher organism on earth.
Histone modifications also allow cells to ‘try out’ particular patterns of gene expression, especially during development. Genes become temporarily inactivated when repressive histone modifications (those which drive gene expression down) are established on the histones near those genes. If there is an advantage to the cell in those genes being switched off, the histone modifications may last long enough to lead to DNA methylation. The histone modifications attract reader proteins that build up complexes of other proteins on the nucleosome. In some cases the complexes may include DNMT3A or DNMT3B, two of the enzymes that deposit methyl groups on CpG DNA motifs. Under these circumstances, the DNMT3A or 3B can ‘reach across’ from the complex on the histone and methylate the adjacent DNA. If enough DNA methylation takes place, expression of the gene will shut down. In extreme circumstances the whole chromosome region may become hyper-compacted and inactivated for multiple cell divisions, or for decades in a non-dividing cell like a neuron.
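The difference between a plastic histone mark and a stable DNA methylation mark can be sketched as a toy program. This is only an illustration under loud assumptions: the `Gene` class, the idea of counting 'rounds', and the `METHYLATION_THRESHOLD` of three rounds are all invented here, and the model deliberately ignores the rare cases where DNA methylation is removed.

```python
from dataclasses import dataclass

# Hypothetical threshold: how many consecutive rounds a repressive histone
# mark must persist before a DNMT3A/3B-containing complex methylates the DNA.
METHYLATION_THRESHOLD = 3


@dataclass
class Gene:
    repressive_mark: bool = False  # plastic: can be put on and taken off
    rounds_marked: int = 0         # consecutive rounds the mark has lasted
    dna_methylated: bool = False   # stable: never removed in this toy model

    def step(self, mark_present: bool) -> None:
        """Advance one round with the histone mark present or absent."""
        self.repressive_mark = mark_present
        # The counter resets as soon as the histone mark is removed.
        self.rounds_marked = self.rounds_marked + 1 if mark_present else 0
        if self.rounds_marked >= METHYLATION_THRESHOLD:
            self.dna_methylated = True

    @property
    def expressed(self) -> bool:
        """The gene is on only if neither kind of repressive mark is present."""
        return not (self.repressive_mark or self.dna_methylated)
```

A gene that carries the histone mark for one round and then loses it is expressed again straight away. A gene that carries it for three rounds in a row becomes methylated and stays switched off even after the histone mark has gone, which is the 'try it out first, then lock it in' behaviour described above.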
Why have organisms evolved such complex patterns of histone modifications to regulate gene expression? The systems seem particularly complex when you contrast them with the fairly all-or-nothing effects of DNA methylation. One of the reasons is probably that the complexity allows sophisticated fine-tuning of gene expression. Because of this, cells and organisms can adapt their gene expression appropriately in response to changes in their environment, such as availability of nutrients or exposure to viruses. But as we shall see in the next chapter, this fine-tuning can result in some very strange consequences indeed.