Tree of Life Project Aims for Every Twig and Leaf - Comments

Comment 1 by Jos Gibbons

There are more possible trees for just 25 species than there are stars in the universe.

Does anyone know an approximate formula for the number of possible trees for N species with N large? Or, if not a good approximate formula for that number, one for its logarithm or something like that? (Compare Stirling's approximation N! ≈ sqrt(2πN) (N/e)^N, whose logarithmic form is sometimes made even worse as an approximation by dropping the ln sqrt(2πN) term from ln N!, giving N! ≈ (N/e)^N, which is way off. Interpret "something like that" as double logarithm etc.)
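To make the Stirling comparison concrete, here is a minimal Python sketch that compares ln N! with both forms of the approximation. The function names are illustrative only, and the exact value is taken from the standard library's log-gamma function (ln N! = ln Γ(N+1)).

```python
import math


def log_factorial_exact(n: int) -> float:
    """ln(n!) computed as ln Gamma(n + 1)."""
    return math.lgamma(n + 1)


def log_stirling(n: int) -> float:
    """ln of sqrt(2*pi*n) * (n/e)**n, i.e. Stirling with the square-root factor kept."""
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)


def log_stirling_crude(n: int) -> float:
    """ln of (n/e)**n, i.e. Stirling with the square-root factor dropped."""
    return n * math.log(n) - n


if __name__ == "__main__":
    for n in (10, 25, 100):
        exact = log_factorial_exact(n)
        print(n, exact, exact - log_stirling(n), exact - log_stirling_crude(n))
```

The gap between ln N! and the crude form is roughly 0.5 ln(2πN), so the crude estimate of N! itself is too small by a factor of about sqrt(2πN), while the full form is off only by a factor close to 1.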

Mon, 04 Jun 2012 21:43:54 UTC | #945579

Comment 2 by zengardener

An exciting and very worthwhile project that will never be complete.

Mon, 04 Jun 2012 21:52:30 UTC | #945582

Comment 3 by Opisthokont

The exact formula for the number of possible trees for N species is N! x (N-1)! / 2(N-1).

Mon, 04 Jun 2012 22:02:54 UTC | #945586

Comment 4 by God fearing Atheist

Comment 1 by Jos Gibbons :

Does anyone know an approximate formula for the number of possible trees for N species with N large?

My first stab is an algorithm (assuming a binary tree):-

1) Start with one species (LUCA), and N-1 unassigned species.

2) Assign a species as a branch of each existing branch, and decrement the unassigned species.

In this case, there is one branch which forks in two, and there are N-2 unassigned species.

3) In general goto 2. If there are 2 assigned species, there are two branches from which the third species can branch, making 2 trees of 3 species, and N-3 left to assign.

Each tree of 3 species can have a branch at 3 places, giving 6 trees of 4 species in total.

Each tree of 4 species can have a branch at 4 places, giving 24 trees of 5 species in total.

Each tree of 5 species can have a branch at 5 places, giving 120 trees of 6 species in total.

Therefore N species give 1 * 2 * 3 * ... * (N-1) = (N-1)! trees.

This is probably taught in Computer Science 101, so I'm about to Google it ...
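For comparison, here is a minimal Python sketch of the usual textbook way of building labelled rooted binary trees. It is not the algorithm described above: instead of attaching each new species only at an existing tip, it attaches it on any edge, including a new edge above the root. Trees are represented as nested frozensets, and the names insert_leaf and all_rooted_trees are hypothetical helpers chosen for illustration.

```python
def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` on one edge of `tree`.

    A tree is either a leaf label (a string) or a frozenset of two subtrees,
    so the left/right order of children is ignored.
    """
    # Attach on the edge leading into this node (or above the root).
    yield frozenset([tree, leaf])
    if isinstance(tree, frozenset):
        first, second = tuple(tree)
        # Attach somewhere inside the first child, keeping the second as is.
        for new_first in insert_leaf(first, leaf):
            yield frozenset([new_first, second])
        # Attach somewhere inside the second child, keeping the first as is.
        for new_second in insert_leaf(second, leaf):
            yield frozenset([first, new_second])


def all_rooted_trees(labels):
    """Return the set of all rooted binary trees on the given leaf labels."""
    trees = {labels[0]}
    for leaf in labels[1:]:
        trees = {new for old in trees for new in insert_leaf(old, leaf)}
    return trees


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, len(all_rooted_trees("ABCDEF"[:n])))
```

Because a rooted binary tree with k leaves offers 2k - 1 attachment points under this scheme, the counts grow as 1, 3, 15, 105, ..., that is, as the product of odd numbers 1 * 3 * 5 * ... * (2N - 3).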

Mon, 04 Jun 2012 22:18:51 UTC | #945588

Comment 5 by God fearing Atheist

Google:- link

Binary tree with n+1 leaves = n-th Catalan number = (2n)! / ((n+1)! n!)
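A short Python sketch of that formula, for anyone who wants to see the numbers. The function name catalan is illustrative; it uses the equivalent form C(2n, n) / (n + 1). Note that the Catalan numbers count the shapes of ordered binary trees with n + 1 leaves, without regard to which species sits on which leaf.

```python
import math


def catalan(n: int) -> int:
    """n-th Catalan number, (2n)! / ((n+1)! n!), via the binomial coefficient."""
    return math.comb(2 * n, n) // (n + 1)


if __name__ == "__main__":
    for n in range(11):
        # Print n, the Catalan number, and n! for a growth comparison.
        print(n, catalan(n), math.factorial(n))
```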

Mon, 04 Jun 2012 22:32:08 UTC | #945590

Comment 6 by God fearing Atheist

Comment 4 by God fearing Atheist :

Oops, that algorithm produces duplicate trees! The Catalan numbers don't go up as fast.

Mon, 04 Jun 2012 22:49:38 UTC | #945592

Comment 7 by RobertJames

I'd say impossible, but I could be wrong. To my understanding, the Darwinian tree of life idea, though attractive, is not generally regarded as a good way to represent it, because it now appears that cross-species swapping of genetic information is far more common than previously realised, certainly at the bacterial level, but also, it seems, at plant and animal levels. DNA wedged into a genome through bacterial action throws a real spanner in the works. They would probably have to exclude bacteria from the tree in order for it to make any sense, but their exclusion would leave gaping holes. One thing is for certain: it won't look much like a tree; more like a Jackson Pollock on LSD. But what an achievement if they can pull it off.

Tue, 05 Jun 2012 00:10:04 UTC | #945611

Comment 8 by Michael Gray

Any formula that assumes that species form a tree is bound to fail, as the true structure is more like a ragged net, or rete.

Viz.: some species split and then recombine in some way later down the track, forming a closed polygon. This cannot be visualised by a pure tree structure.

I fear this effort is doomed, if they don't take this basic fact into account.

Tue, 05 Jun 2012 05:40:04 UTC | #945643

Comment 9 by Alan4discussion

@Op Now Dr. Katz and a number of other colleagues are doing something new. They are drawing a tree of life that includes every known species. A tree, in other words, with about two million branches.

I wish them luck with that one!!!

Botanists, for example, have numerous arguments about what constitutes a genus, species, sub-species, variety or cultivar, with frequent diversity over an extended habitat range!

Then there is the feature of ring species in animals. -

http://rationalwiki.org/wiki/Ring_species - Ring species are a midpoint in the process of speciation. It happens when a species has split into several populations, some of which cannot interbreed, but some of which can breed with two or more populations that cannot breed with each other. Or A can breed with B, B can breed with C, but A cannot breed with C.

The standard example of ring species is the circumpolar species "ring" of gulls of genus Larus. The range of these gulls forms a ring around the North Pole; there are seven populations and each population can breed with the previous and next, but the first and last cannot interbreed. This example is actually far more complicated, as again is typical in the real world - there are several other taxonomically unclear examples which belong in the same superspecies complex, such as L. michahellis, L. hyperboreus and L. cachinnans.

Diversification and adaptation is an on-going process, with species only being gradually separated into separate non-interbreeding populations, by geographical separation or natural selection by different life-styles.

...and that's before we even look at horizontal genetic exchanges in microbes. - http://www.sci.sdsu.edu/~smaloy/MicrobialGenetics/topics/genetic-exchange/exchange/exchange.html

Tue, 05 Jun 2012 11:14:10 UTC | #945664

Comment 10 by gos

I'd like to try and answer some of the possible technical difficulties raised above as well as I can (I'm an interested layman).

It is my understanding that the project is a tree of life for all organisms currently alive, which organises them according to relative relatedness.

Comment 8 by Michael Gray :

Viz.: some species split and then recombine in some way later down the track, forming a closed polygon. This cannot be visualised by a pure tree structure.

If we understand that the branches separating the end points of the tree are not intended to be illustrative of any evolutionary "path," but merely indicators of how related species are, and that we are solely dealing with species currently alive, I don't see that this is a problem. A species is either in a split state at any moment, in which case it will be represented as two end points, or it will have recombined, in which case the split will not be represented in the tree. This is not a failing unless the tree is supposed to represent extinct species as well, which I don't think that it is supposed to.

If there are past species represented in the tree as well (one would be tempted to stick in any species that one has DNA for, as well as educated guesses for dinosaurs, et al.) then the assumption of a strict tree form would give an error for this scenario. The modern species would be shown as a descendant of one of the "split" species, and the other "split" species would be shown as an evolutionary dead end. This would be an error, but it's good to keep in mind that it would be a purely local error, and have absolutely no effect on the accuracy of the rest of the tree. Such errors could thus potentially be spotted by other means than the main algorithm and corrected one by one.

Comment 9 by Alan4discussion :

Botanists, for example, have numerous arguments about what constitutes a genus, species, sub-species, variety or cultivar, with frequent diversity over an extended habitat range!

Isn't this really a moot point? Arguments over what should be defined as a genus/species/subspecies/etc. are arguments over where lines should be drawn between groups of organisms, not how the groups should be connected. At worst, this will lead to disagreements over whether certain end points in the tree are split too fine or too coarse (so to speak), and this only happens when we demand that we use already contentious terms such as "species" for each end point, rather than agreeing that they stand for groups of organisms that are coherently genetically differentiable from other groups, and that we are only using the term "species" as a placeholder.

Then there is the feature of ring species in animals. -

End points representing ring species clearly call for a slightly different look than end points representing more "traditional" species, but this is a local "blurriness" that has more to do with representation of end points than any real problem for the project. A ring species is clearly a group of organisms that is coherently genetically differentiable from other groups (i.e. they are all more closely related to one another, even the ones that can't interbreed, than they are to all other organisms).

I don't know enough about horizontal genetic exchange to comment on it. It would obviously create technical problems in using an algorithm based on genetic difference to calculate relatedness (because you'd have to be able to differentiate between genetic material from an ancestor and material donated horizontally), but I'm curious as to whether the actual "parent" of an individual is thrown into doubt.

Tue, 05 Jun 2012 14:32:35 UTC | #945693

Comment 11 by Alan4discussion

Comment 10 by gos

There are some points which need clarification:

If we understand that the branches separating the end points of the tree are not intended to be illustrative of any evolutionary "path," but merely indicators of how related species are,

They are both indicators of the evolutionary path and an indication of the relatedness of present species.

and that we are solely dealing with species currently alive, I don't see that this is a problem.

No! The tree of life goes all the way back to LUCA and beyond. - http://en.wikipedia.org/wiki/Last_universal_ancestor - There is a "tree of life" illustrated on this link, showing the primary branches.

A species is either in a split state at any moment, in which case it will be represented as two end points, or it will have recombined, in which case the split will not be represented in the tree.

While the tree is good for illustrative purposes, considering ALL genera and species as simple branches is an over-simplification. Some are more like the crests or fans of a "cristate" branch - see pictures - (http://toptropicals.com/catalog/uid/euphorbia_lactea.htm) where it has spread out but has not separated into separate branches. It is when selection or geographical separation, opens up gaps, that, in time, the variations progress to become separate species.

This is not a failing unless the tree is supposed to represent extinct species as well, which I don't think that it is supposed to.

It must represent the on-going diversification including past and extinct species, as all species are "transitional species" which are continuing to evolve. Extinct species would be represented by dead end branches. Present species would be branches or spreading branches according to environmental selection pressures.

If there are past species represented in the tree as well (one would be tempted to stick in any species that one has DNA for, as well as educated guesses for dinosaurs, et al.) then the assumption of a strict tree form would give an error for this scenario. The modern species would be shown as a descendant of one of the "split" species, and the other "split" species would be shown as an evolutionary dead end. This would be an error, but it's good to keep in mind that it would be a purely local error, and have absolutely no effect on the accuracy of the rest of the tree. Such errors could thus potentially be spotted by other means than the main algorithm and corrected one by one.

There is no preserved DNA beyond a certain point, so many inferences are made from fossils. Dinosaur DNA is pure Hollywood!

Furthermore, due to degradation of the DNA molecules, a process which correlates loosely with factors such as time, temperature and presence of free water, upper limits exist beyond which no DNA is deemed likely to survive. Current estimates suggest that in optimal environments, i.e. environments which are very cold, such as permafrost or ice, an upper limit of around 1 million years exists. - http://en.wikipedia.org/wiki/Ancient_DNA

There is much that is not known. We have not yet discovered all the present-day species on Earth, let alone all the past ones.

End points representing ring species clearly call for a slightly different look than end points representing more "traditional" species, but this is a local "blurriness" that has more to do with representation of end points than any real problem for the project.

This was one of the issues with pre-Darwin biologists. They had the continuum of life fitted into distinct little boxes of categories, based on classifying individual collected samples. Genetics looks at whole populations, which gives a much more diverse picture.

Tue, 05 Jun 2012 15:45:32 UTC | #945704

Comment 12 by Alan4discussion

Comment 10 by gos

I don't know enough about horizontal genetic exchange to comment on it. It would obviously create technical problems in using an algorithm based on genetic difference to calculate relatedness (because you'd have to be able to differentiate between genetic material from an ancestor and material donated horizontally), but I'm curious as to whether the actual "parent" of an individual is thrown into doubt.

This is non-sexual reproduction with a multitude of parental sources, all of which had ancestors with a multitude of parental sources. The majority of simple life forms have not yet been discovered or classified. This makes some studies very difficult!

A universal common ancestor is at least 10 to the power of 2860 times more probable than having multiple ancestors…
A model with a single common ancestor but allowing for some gene swapping among species was... 10 to the power of 3489 times more probable than the best multi-ancestor model... - http://en.wikipedia.org/wiki/Last_universal_ancestor

Tue, 05 Jun 2012 15:55:44 UTC | #945705

Comment 13 by Opisthokont

Aargh! The equation I posted had an error: for some reason the caret was not printed. The actual equation is N! x (N-1)! / 2 xx (N-1), where 'xx' means exponentiation. That sounds like it reduces the number by a lot, but in fact the double factorials quickly swamp the exponential.
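As a rough illustration, the corrected expression can be evaluated exactly with Python's arbitrary-precision integers. This is a sketch with hypothetical function names, not anything from the article. As far as I can tell, N! x (N-1)! / 2^(N-1) matches the standard count of ranked trees (labelled histories), where the order of the branching events matters; the second function gives the usual count of rooted binary tree shapes with labelled tips, 1 * 3 * 5 * ... * (2N - 3), for comparison.

```python
import math


def labelled_histories(n: int) -> int:
    """N! * (N-1)! / 2**(N-1); the division is always exact."""
    return math.factorial(n) * math.factorial(n - 1) // 2 ** (n - 1)


def rooted_topologies(n: int) -> int:
    """Product of the odd numbers 1 * 3 * 5 * ... * (2N - 3)."""
    result = 1
    for k in range(3, 2 * n - 2, 2):
        result *= k
    return result


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 25):
        print(n, rooted_topologies(n), labelled_histories(n))
```

For 25 species both counts come out far above the commonly quoted rough estimates of the number of stars in the observable universe (around 10^22 to 10^24), which fits the claim quoted in Comment 1.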

Tue, 05 Jun 2012 16:42:23 UTC | #945714

Comment 14 by gos

Comment 11 by Alan4discussion :

one would be tempted to stick in any species that one has DNA for, as well as educated guesses for dinosaurs, et al.

There is no preserved DNA beyond a certain point, so many inferences are made from fossils. Dinosaur DNA is pure Hollywood!

Yes, I know. This is why I said "any species that one has DNA for, as well as educated guesses for dinosaurs" (emphasis added).

Tue, 05 Jun 2012 18:24:30 UTC | #945731

Comment 15 by Alan4discussion

Comment 10 by gos

Arguments over what should be defined as a genera/species/subspecies/etc. are arguments over where lines should be drawn between groups of organisms, not how the groups should be connected. At worst, this will lead to disagreements whether certain end points in the tree are split too fine or too coarse (so to speak), and this only happens when we demand that we use already contentious terms such as "species" for each end point, rather than agreeing that they stand for groups of organisms that are coherently genetically differentiable from other groups, and that we are only using the term "species" as a placeholder.

Science, as we know, is revised in the light of new evidence. A lot of the problems with the classification of species and genera arise from earlier classifications, based on phenotypic features, where later examination of the DNA has shown these to be in error. As many species have not yet had their DNA examined, there is quite a lot of room for disagreement about precise relationships between related organisms, and the status (genus, species, sub-species, variety, hybrid etc.) of specific individual specimens.

and that we are only using the term "species" as a placeholder

Indeed so! Evolving living things do not fit exactly into defined classifications. The example I gave of "ring species" illustrates the :-

Problem of definition
The problem, then, is whether to quantify the whole ring as a single species (despite the fact that not all individuals can interbreed) or to classify each population as a distinct species (despite the fact that it can interbreed with its near neighbours). Ring species illustrate that the species concept is not as clear-cut as it is often thought to be. - http://en.wikipedia.org/wiki/Ring_species#Other_examples

Comment 14 by gos - Yes, I know. This is why I said "any species that one has DNA for, as well as educated guesses for dinosaurs" (emphasis added).

I agree, but wished to emphasize and clarify the point that DNA can only be used for recent species.

Tue, 05 Jun 2012 23:17:14 UTC | #945775

Comment 16 by mildcat

I am home.

Wed, 06 Jun 2012 04:19:27 UTC | #945801

Comment 17 by gos

Comment 11 by Alan4discussion :

It must represent the on-going diversification including past and extinct species, as all species are "transitional species" which are continuing to evolve.

This is a non-sequitur. It is true that "all species are "transitional species" which are continuing to evolve," but it does not follow that the project being described must represent on-going diversification.

An idealised "tree of life" like that which you are describing, with all known species, both living and extinct, an extrapolated LUCA, and lots of other information which could be encoded in such a diagram, including, but not limited to: the distance from the LUCA indicating at what time the species lived, the thickness of the lines indicating genetic diversity within a "species" (up to "cristate" branch thickness), local non-treelike parts to indicate recombining species and possibly horizontal gene transfer; would be awesome (and I mean that in the sense of awe-inspiring).

However, I read the article carefully, and there isn't really anything in it to indicate that they are trying to do more than the less ambitious (but still quite ambitious) project of drawing up a tree to illustrate the relatedness of currently living species.

Wed, 06 Jun 2012 09:51:36 UTC | #945833

Comment 18 by Alan4discussion

@OP Now Dr. Katz and a number of other colleagues are doing something new. They are drawing a tree of life that includes every known species. A tree, in other words, with about two million branches.

“I think it is an amazing step forward for our community if it can be pulled off,” said Robert P. Guralnick, an expert on evolutionary trees at the University of Colorado who is not part of the project.

I also think it would be an amazing step - IF IT CAN BE PULLED OFF! I also think two million branches is a gross under-estimate.

Comment 17 by gos

Comment 11 by Alan4discussion :

It must represent the on-going diversification including past and extinct species, as all species are "transitional species" which are continuing to evolve.

This is a non-sequitur. It is true that "all species are "transitional species" which are continuing to evolve," but it does not follow that the project being described must represent on-going diversification.

I am not sure where you get the non-sequitur idea from. It certainly must include and track past diversification to date!

However, I read the article carefully, and there isn't really anything in it to indicate that they are trying to do more than the less ambitious (but still quite ambitious) project of drawing up a tree to illustrate the relatedness of currently living species.

This is not possible without identifying the past branching which led to the present separate, but related species and genera. Once you get back to fossils, it is very difficult to tell extinct side-branches from branches leading to modern species. I am not suggesting speculation on future branching, although such trends ARE illustrated in examples like the ring-species of gulls I linked earlier.

The modern definition of classification groups depends upon each species in the group evolving from a single ancestral type with the basic group characteristics - plants all share an ancestor with simple plant characteristics, but the ancestor they share with fungi is neither distinctly plant or fungus, so they have been designated into different Kingdoms. Based upon this criteria, many zoologists think that the Animal Kingdom should be splintered into at least two Kingdoms. The Protista and the Monera are often "made up" of multiple Kingdoms in advanced books on the subjects.
Keep in mind, like all aspects of classification, this fits into the convenience of human labeling, which doesn't always comfortably fit what the real organisms are doing. - http://faculty.fmcc.suny.edu/mcdarby/animals&plantsbook/History/02-Explaining-Life-Classification.htm

I have read and participated in arguments about the relationships within limited family groups of plant genera, species, subspecies and varieties. These can take years even for a very limited area which is only drawing a local branch structure for a particular family. (http://faculty.fmcc.suny.edu/mcdarby/animals&plantsbook/History/02-Explaining-Life-Classification.htm) -
I do not think those who embarked on this project have any idea of the scale of the task they have selected.

http://www.sciencedaily.com/releases/1998/08/980825080732.htm - Now, for the first time, a team of researchers from the University of Georgia has made a direct estimate of the total number of bacteria on Earth -- and the number makes the globe's human population look downright puny.

The group, led by microbiologist William B. Whitman, estimates the number to be five million trillion trillion -- that's a five with 30 zeroes after it. Look at it this way. If each bacterium were a penny, the stack would reach a trillion light years. These almost incomprehensible numbers give only a sketch of the vast pervasiveness of bacteria in the natural world.
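The penny comparison can be sanity-checked with a few lines of Python. This is only a sketch: the coin thickness (about 1.52 mm, a US cent) and the length of a light year (about 9.46 x 10^15 m) are my assumptions and are not stated in the quoted article.

```python
BACTERIA_COUNT = 5e30            # "a five with 30 zeroes after it"
PENNY_THICKNESS_M = 1.52e-3      # assumed thickness of a US cent, in metres
METRES_PER_LIGHT_YEAR = 9.4607e15  # assumed length of a light year, in metres


def stack_height_light_years(count: float,
                             thickness_m: float = PENNY_THICKNESS_M) -> float:
    """Height, in light years, of a stack of `count` coins."""
    return count * thickness_m / METRES_PER_LIGHT_YEAR


if __name__ == "__main__":
    print(stack_height_light_years(BACTERIA_COUNT))
```

Under those assumptions the stack comes out a little under 10^12 light years, so "a trillion light years" is the right order of magnitude.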

The study could open new areas of inquiry, especially about the rate of mutations and how bacteria operate in nature. The new numbers also point out once again that events that are extremely rare in the laboratory could occur frequently in nature. In the meantime, despite the new estimate of total bacteria, researchers have their hands full just listing the number of bacterial species.

I don't think anyone is going to track the horizontal gene exchange in all of these any time soon!

@OP link - Some scientists are reserving judgment on the project until they can actually see the tree on a computer screen. Roderic D. M. Page, a professor of taxonomy at the University of Glasgow, called the Open Tree of Life team “first class,” but added: “Displaying large trees is a hard problem that has so far resisted solution.”

Wed, 06 Jun 2012 15:28:54 UTC | #945881

Comment 19 by Alan4discussion

@OP link - Some scientists are reserving judgment on the project until they can actually see the tree on a computer screen. Roderic D. M. Page, a professor of taxonomy at the University of Glasgow, called the Open Tree of Life team “first class,” but added: “Displaying large trees is a hard problem that has so far resisted solution.”

While it is possible to display the ancestry of SELECTED species as the trunk and branches of a tree, there is no reason to believe (other than in the earliest stages after abiogenesis) that life millions of years in the past was significantly less diverse than at present.

Therefore any comprehensive tree display would simply be a spherical tangle of branches with insufficient space in the centre to display the earlier diversity!
It is not possible to have a large succession of "Russian-doll" type concentric hollow spheres where the volume of the individual inner spheres is equal to the volume of the individual outer enclosing ones. (The outer ones would thin down to nothing with microscopically short branches).

@OP Now Dr. Katz and a number of other colleagues are doing something new. They are drawing a tree of life that includes every known species.

The error is in this objective. To be intelligible, lines of evolution have to be presented separately to have space on the diagram to illustrate numerous branches.

Thu, 07 Jun 2012 10:14:05 UTC | #946098