Shomarka Omar Y. Keita, M.D., D.Phil. (SOYKeita@yahoo.com)


        Genetic and other biological data have become prominent in the exploration of historical topics of varying time depths. Geneticists are writing an increasing number of papers in which they engage in the construction of “historical” narratives using genetics, or use genetic data in “historical”/chronological frameworks derived from other evidence. These works, at least in theory, encompass the integration and/or reconciliation of evidence from various subjects: historical linguistics, archaeology, ethnology, and history. Such studies are invariably challenging and problematic because of the effort needed to fully control not only new data and interpretations/conceptualizations, but also the older literature of multiple disciplines, and the terminology of those disciplines. They also require some familiarity with the paradigms and methods of conceptualization in other disciplines, and with the debates within fields like history and archaeology about interpretations and evidence. Incomplete integration or understanding may result in shortcomings which hurt the overall effort and call into question the final interpretations.  In this paper cases are explored in which broader and more effective use of non-genetic data and interdisciplinary collaboration may lead to more interpretive possibilities, data exploration, and hypotheses, while reducing potential errors. Some attention will be given to conceptualizing how various kinds of evidence should be ranked in exploring biological population histories, and how these may relate to culture and political histories.

Population studies that integrate genetics and aspects of culture in order to examine historical topics, or that use history or archaeology to help explain geographical patterns of genetic variation have become prominent in the literature.  This body of work includes, but is not restricted to the exploration of the relationship between patterns of genetic and linguistic differentiation, genetic variation in politically complex societies, and the possible association of genetic variation and the spread of cultural innovations such as the practice of plant and animal domestication.  In all of these examples geography and migration are major background themes, because genes and aspects of culture are seen as moving across space, but not necessarily together.  Such work requires the effective use of historical linguistics, archaeology, ethnology, and written and oral historical texts in the context of population genetics data and theory. The concept of population history actually forces us to consider an integrated biocultural view for various time depths, but the holistic approach is admittedly difficult, and this is not always appreciated by non-historians in writing what is sometimes called “genetic” history.   Studies limited solely to linguistics, ethnology, or genetics, may successfully restrict explanations to the parameters of their own research traditions—and meet the expected canons of those disciplines.  However interdisciplinary studies have to satisfy the canons of multiple disciplines and attend to the issue of tension between the fields, at least in theory. Genes, language, and culture are not intrinsically related or linked, and their association is casual, not causal; at the level of population they may or may not travel together and may change relationships over time.

          These kinds of studies engender numerous difficulties in themselves due to the absence of equal data quality from multiple disciplines.  However, they are made more complicated because of the difficulty of keeping abreast of the developments in many disciplines: not only points of contention and various views of a topic, but also major re-interpretations of evidence and the nuanced opinions found within specialized fields. This problem is shared by researchers and readers alike, and it could be argued that all such published work should give some review of previous ideas and research, and of the associated sociology of knowledge when appropriate.

          Commonly agreed-upon criteria to evaluate multidisciplinary studies still await development. Until that time assessments must use the canons of the involved disciplines. Using standard philosophical perspectives will also help: broadly speaking, explanation in science is based on natural laws or principles, but in history on the role of contingent unique events, not amenable to ‘natural laws’ in the conventional sense.  Population “histories” have a double standard to meet, namely adequacy in terms of biological theory, and the credibility and fairness required by historians, who are expected to acknowledge and avoid bias in descriptions, interpretations and explanations, or to admit its existence, and to produce interpretations that are not just “credible” but also “fair”, “balanced” and “not misleading” (McCullagh 2000, 2004). Apparently most historians do care and argue about these issues, although in the context of postmodernist thought and practice the idea has arisen that general interpretations only express a personal point of view. Most certainly no scientist would espouse such a perspective in doing what might be called the work of classic science; however, when such work, specifically genetics, is used as part of history, social science, and perhaps medicine, a new element enters into the enterprise, which has now become synthetic.  The credibility of a causal explanation should be supported by objective evidence which supports one narrative more than another. In some cases this may not be possible, and this should be admitted as well.

               In the most general terms historical works can be evaluated by examining how well they explicitly present and discuss alternative explanations, avoid generalizing from incomplete information, and show proper consideration for all available information or differing views of the same evidence. Broadly speaking, according to McCullagh (2000), historical writing can be biased in four ways, via: 1) the misinterpretation of evidence, which could include an incomplete review or misreading of data, 2) the presentation of unfair or unbalanced accounts which may be true at some level, but otherwise omit significant information, 3) a general description of the past that implies facts that are known to be false, and 4) a selectivity in the presentation of causal explanations, where not all of the important causes are given to the reader, hence leaving a false impression. These shortcomings can occur by mistake, in which case they are simply unjustifiable or wrong. Interdisciplinary work may be more prone to these kinds of errors. For McCullagh (2000) they amount to bias when they occur because the scholar wants the particular outcome produced by these failures. To this can be added outcomes that are expected or subconsciously produced due to embedded cultural beliefs. These may not always be recognizable. It is possible, however, to produce credible accounts that are not comprehensive as long as this is explained; these have to be judged with this in mind.

          Geneticists working in isolation as “historians” have a heavy burden due to all of the issues mentioned above. The pattern of genes across geographical and/or ethnocultural spaces cannot be used as the sole evidence for constructing a narrative of “population” history, or “genetic history.” One theoretical issue, not often confronted, is the ranking of data in terms of what primarily determines the framework of a narrative (Keita et al. 2010). Should genetic data be used to construct the narrative, i.e. construct its framework, with other data being fit into patterns of genetic findings? Or should the genetic data be subsumed to historical frameworks determined by texts, archaeology, linguistics, etc? Is “reciprocal illumination” possible without falling into circular reasoning, and what are the criteria of its limits? When do genetic data become “primary”? It is clear that some concepts, facts, or “factoids” could lead to erroneous narratives, and some of these are rooted in bad concepts or conceptual schema. Historical explanations and general interpretations of the past are expected to be fair and not misleading (McCullagh 2000), even if there is no claim that they are comprehensive (McCullagh 2004). Here I suggest that better narratives are likely to result when teams work in what might be called historical genetics.

          Studies of African-based or related topics provide interesting cases with which to explore the utility of good collaboration in multidisciplinary work.  One historical anecdote serves to illustrate this. Petrus Camper, the famous 18th century Dutch anatomist and inventor of the facial angle, criticized contemporary artists for their renditions of the African Magus in nativity paintings.  Camper noted that the depictions were simply Europeans painted black or dark brown, and that this was morphologically incorrect in terms of the bone structure. He pointed out that the physiognomy, in terms of nose shape and facial profile, was incorrect. However, Camper overstated his case. He apparently based his criticism on his knowledge of the physiognomy of certain enslaved Africans from the forest belt of West Africa, who primarily possessed the so-called “Negro type” anatomical conformation, which is a stereotype and was often caricatured. Even in Camper’s time it was known by some Europeans that there was also variation in the anatomical “complex” associated with dark skin and frizzy hair.  There are dark-skinned Africans with physiognomies that are similar to those of “Europeans” in having narrow noses, orthognathism, etc., but the trait complexes in question are best described with anatomical terms, and not in group terms from the racial paradigm, which implies that individuals with a given morphological complex all have a common “origin.”  Had Camper known about the range of variation even in West Africa he perhaps would have made other criticisms. This anatomical complex, composed of a narrow nose, orthognathism, and a narrow face, no matter where found, was once interpreted as being necessarily of “Caucasian” or “Caucasoid” origin via migration or gene flow in a model of “racial” origins (Keita and Kittles 1997), which assumed a linkage of these traits, versus their being the products of evolution.
In other words they were “historicized” as coming from a particular source population as opposed to being the product of “natural history.” In the tropical African context Hiernaux (1975) suggested the term “Elongated African” for this physiognomy, and those close to it, a phrase that emphasizes a relatively narrower nose and face that evolved in a hot-dry climate but has some variation. The author would prefer that the phrase “elongated African trend” be used; its counterpart would be the “broad African trend.”  Some migration and admixture are not denied as a part of African biohistory in some regions, but most of the gene flow occurred so long ago that it has been reworked by African selection pressures and circumstances, and constitutes a part of a genuinely African biological history (Hiernaux 1975). Overlap in a range of biological traits between biogeographical Africans and non-Africans should be expected based on evolutionary theory and the concept of serial founder effect.

             The primary purpose here is not criticism, but an exploration of cases, because in general multidisciplinary cooperation has not been maximized, and African participation has been minimal. While older literature is featured for examples, this is not a review. While some critique is implicit in this discussion it is only in the context of an examination of data or ideas in order to increase hypothesis generation, and not for criticism’s own sake.  Topical “cases” are presented in an effort to illustrate the value of having a multidisciplinary team. The analyses are intended to illustrate the range of possible explanations by integrating various lines of evidence. It is important to be vigilant against answering one discipline’s question with another’s data.

 Genetics and Bantu speakers

 The PN2/M2 biallelic lineage in part maps to the distribution of the Bantu language family, as does haplotype IV of the TaqI49a,f RFLP system, which in Africa and adjacent regions apparently marks the same clade (see al-Zahery et al. 2003, Underhill personal communication). The spread of this family is frequently identified with the distribution of these variants in nearly a causal fashion. In other words M2 is said to be a marker of the Bantu expansion, which some earlier writers even thought had gone into West Africa (see e.g. Guthrie 1962).

          However, haplotype IV/M2 is found in very high frequencies in Africa west of the Cameroons, from Nigeria to the Atlantic, reaching a frequency of ~80% in a sample from Senegal. Just as interesting is its reported frequency in one study of Egypt (27%) and Nubia (39%) (Lucotte and Mercier 2003).  There are no Bantu speakers in these regions and no evidence that they were ever there. Hence the “Bantu expansion”, a problematic concept especially as often conceived, in any case cannot be used to explain its presence there. Furthermore, the Bantu expansion should not be conceived as having been a mass movement of a single people, analogous to an mfecane, or the migration of the Banu Hilal.

            Archaeology and historical linguistics help explore possible credible explanations.  The M2/haplotype IV marker is found at great frequencies in Niger-Congo speakers in general. It is likely that M2 existed in the early ancestral family (proto-Niger-Congo) and got distributed into all of its branches as the family differentiated through space and time. This explanation does not work for Egypt and Nubia since languages spoken there belong to other families. However, archaeological data indicate a Late Pleistocene recolonization of the eastern Sahara after a probable population hiatus between 50,000 and 15,000 years ago (Wendorf and Schild 2001). The peoples involved can be expected to have been highly diverse. This marker may have entered the Nile Valley with the mid-Holocene Saharan population migrations (Hassan 1988), which contributed to the peopling of the valley.

            Another possibility is that Nilo-Saharan—to which Nubian belongs—and Niger Congo form a larger language phylum called Kongo-Saharan (Gregersen 1972) or Niger-Saharan (Blench 1995) whose earliest speakers shared a biohistorical heritage that included the M2/RFLP IV marker.  This could also explain the substantial frequencies in Egyptian Nubians and NC speakers as a whole, and not just Bantu speakers.

           However, there is a caveat: from a strictly biological point of view it is important to note that M2 likely emerged before any of the language families, based on standard estimates of their ages. It is wrong to treat the marker and the languages in effect as having a basal or causal relationship. This would also be true of Afroasiatic. On a more interesting and even intriguing level, it is worth observing that the P2 marker, which is ancestral to both M2 and M35 (or 215/M35), and is therefore older than either, is father to the male clades whose bearers are speakers of the three different language macrofamilies in Africa, one of them being Afroasiatic. This is intriguing because it is not known what ancestral language the father of these two lineages spoke. Was it a language that went extinct? Was there an early proto-African language family that led to the current language families? Most western linguists would say “no” to this second question at the time of this writing. In any case the simplistic racialized gene-language maps that were once drawn as a validation, in my opinion, of a preconception fall apart when the Y chromosome lineages are examined against language families. The majority of the Afroasiatic speaking males in Africa are connected to speakers of other families via their common P2 “father”, and are therefore genealogically connected in a way that they are not related to Indo-European speakers. This will be surprising to those thinking of northern African peoples in terms of those who most resemble Europeans or Near Eastern neighbors. The E haplogroup places their male affiliation in Africa. A critical narrative to explain the mtDNA profiles has not yet been developed.

           A multidisciplinary approach clearly can help to avoid overgeneralizations with regard to Bantu speakers. Misinterpretations of the evidence can skew the resulting narratives.  This in turn could lead to poor study designs in future work. Another issue is the use of Bantu as a euphemism for “Negro” (Robertson and Bradley 2000) from the old unscientific racial schema, which seems to be how some geneticists and morphologists are using the term; while this issue is beyond the scope of this essay it deserves mentioning given the emphasis placed on the sequencing of a “Bantu” genome, which strictly speaking would mean looking at genes that were thought to have arisen at the time the Bantu linguistic branch emerged.

Genes, food production and Afroasiatic

          The spread of animal and plant domestication is now frequently argued to be associated with the dispersal of language families, both via demic diffusion, the actual migration of expanding populations over space (Ammerman and Cavalli-Sforza 1984), instead of cultural diffusion.  Such movement would have implications for spatial patterns of genetic variation. This claim for a coterminous spread of genes, language [families] and/or food production is most successfully made by reconciling data and interpretations from genetics, historical linguistics, and archaeology.   When sufficient evidence is available a multidisciplinary approach requires testing all models.

             In terms of Africa one interesting case regards the Afroasiatic language phylum or family. The spread of domesticated wheat, barley, goats and sheep into northern Africa from southwest Asia (the Near East) and its relation to the Afroasiatic language family is a good example. A minority argue that Afroasiatic is Asian in origin (which requires opposing standard linguistic evidence), the majority that it is African. Either position on origins influences any subsequent interpretations. Unfortunately both models are not usually presented in published work; an imbalanced picture is presented. Most genetic studies and/or interpretations have proceeded from models in which either ancestral Afroasiatic or a later branch, e.g. Berber (Arredi et al. 2004), is posited to have come from the Near East with the spread of food production, without the consideration of alternatives, which is problematic when not simply wrong. Generally population migration, rather than cultural diffusion, is given the primary responsibility as the mechanism of change. Here the ramifications of taking into account the views of mainstream linguists, as well as of archaeology, in the examination of the genetic data are explored. The issue is the construction of the most credible narrative, one that is balanced and fair.

             The study of individual Afroasiatic languages, and then later the family itself, formerly called Hamito-Semitic, or Semito-Hamitic, has a long history. The great interest, at least in part, is because this family gave rise to the Semitic languages found in the texts used in Judaism, Christianity and Islam.  Ancient Egyptian is the other member that received substantial attention.  Afroasiatic has another distinction related to the study of African peoples, because it is connected to a theory of biological and cultural origins called the “Hamitic hypothesis” promulgated by C.G. Seligman (see Sanders 1969). This thesis posited that Southwest Asian peoples called “Hamites”, distant relations of Semites, came into Africa bringing a new language family, pastoralism, “superior” culture, and unique (and “superior”) biology marked by narrower noses and faces, straighter hair, and lighter skin color, although curiously not the skills of plant agriculture.  Nearly all instances of culture in Africa deemed to be noteworthy from a western perspective were attributed ultimately to contact with Hamites.  Certain body builds and physiognomies when not found among “Hamitic” speakers were said to be due to “Hamitic” (read “Caucasian”) admixture. Thus many African societies by this myth were rendered non-African; aspects of this way of thinking continue.

          The Hamitic hypothesis was overturned largely by Greenberg, and abandoned by Africanist historians. The majority of historical linguists who study the family read the evidence as supporting the view that Afroasiatic is a family of African origin and history, with one branch, Semitic, having had an ancestor that was carried to Asia (the Near East). They generally agree, based on evidence, that the proto-language was spoken by hunters and gatherers, not seed farmers or pastoralists (i.e. food producers), which is a very important observation.  The principle of greatest diversity indicates Africa to be the geographic cradle, and the concept of “least moves” indicates that the Horn of Africa or southeastern Sahara is the more specific locale of the ancestor. It is generally true that the place of greatest diversity of an ‘evolving’ population or language, all other things being equal, is its likely place of origin. Care has to be taken to understand that, in theory, if a broad sample of a population migrated, the level of diversity, or something close to it, might travel with it. Of equal interest is that Omotic, the branch of Afroasiatic believed to be the least derived in its relationship to the ancestral proto-family, is found only in the Horn of Africa, specifically Ethiopia; if this branch is not considered to be in the family it becomes the nearest relative to the group and still anchors the proto-family in Africa. Studies involving genetics, Afroasiatic, and food production should always examine hypotheses based on the different models of geographical origins, and the internal structuring of a language family.

             What happens when genetic and linguistic data are considered in terms of the African origins model?   The recent spread of Arabic and the presence of Semitic languages in Ethiopia have to be ignored, based on the standard linguistic interpretations.  The Y chromosome data are used because they are the most complete for the greatest number of members of the family.  The core distribution of the family runs from eastern Africa north to Egypt and west along the Mediterranean littoral. The subfamilies spoken or once spoken here are Cushitic (Kenya, Tanzania, Somalia, Ethiopia, and Sudan), ancient Egyptian in the Egyptian Nile Valley, and Berber, spoken from Siwa Oasis in Egypt west to Morocco, and south into Mali and Mauritania.  Collating the data from various studies indicates that the TaqI 49 a,f haplotype V is the predominant variant in this region overall; it has notably lower frequencies in the Near East (Table I).  This is exactly what would be expected for an African model of origins, assuming that the language family was initially and primarily spread by the migration of humans, and not by cultural diffusion.  Furthermore, genetic studies that were not concerned with language confirm this. For example, Underhill et al. state that the M35 lineage was carried into the Near East from Africa in the “Mesolithic”; this M35 in Africa corresponds to the TaqI 49 a,f haplotype V.   It is far more likely that pre-proto-Semitic speakers went into the Near East at this time, where Semitic languages have long since dominated, but the predominant haplotypes in most Semitic speaking populations are variants VII and VIII in the TaqI 49 a,f system (al-Zahery), which are equivalent to M89 associated lineages.  The most parsimonious explanation is that ancestral Semitic was adopted by those bearing haplotypes VII and VIII.
(Incidentally it is not argued here that there has not been some two-way migration between Africa and Asia, but in this instance the predominant migration was from Africa; the underived M35 lineage is found primarily in East Africa, and most of the variation in derived and related clades is in Africa.)

                Wheat and barley agriculture in Africa is first attested in northern Egypt around 5200 BC dating to some 2000 years after its emergence in Asia (the Near East); very significantly it appears as another item in a foraging strategy, not as a total change in subsistence pattern (Wetterstrom 1993).  This is not consistent with a mass migration hypothesis involving farmers from Asia (the Near East) who would have come as settler colonists.  By this time Afroasiatic had already begun to differentiate or was already differentiated at some level. Among the earliest agriculturalists in Asia (the Near East) were speakers of common Semitic based on vocabulary reconstructions (Diakonoff 1998). If there had been mass migration of rapidly multiplying farmers into Africa those who came to be known as the ancient “Egyptians” would have been Semitic speakers. Curiously not even the terms for sheep, goat, wheat, or barley in ancient Egyptian are Semitic loan words; this is another interesting finding that has historical implications. This example reveals the value in examining older and newer linguistic work in relation to genetic studies, as well as archaeological data.           

             Afroasiatic and its speakers offer other fascinating findings which invite a critical historical enquiry.  Chadic speakers living primarily in the central Sahara and northern Nigeria have a very high frequency of an M89 lineage of an R1 sub-clade. These Chadic speakers phenotypically are generally very dark skinned, with frizzy hair and the facial conformation often thought of as representative of “the African”, a point that will be revisited later in relationship to the classification of Chadic within Afroasiatic. Curiously the frequency of this lineage is far greater in some saharo-tropical population samples (over 90%) than it is in the Near East, its putative place of origin. The frequencies of R1 are not nearly as high in Nile Valley Egypt and other parts of Supra-saharan Africa. What are the historical explanations for this lineage and its distribution, or what could they be? Some writers have postulated that the presence of the lineage, identified by the V88 marker and called R-V88, corresponds to bearers of the Chadic family of Afroasiatic spreading into the central Sahara (Cruciani et al. 2010); for some this would be reminiscent of the Hamitic hypothesis, but it should not be confused with that project because of the relatively low frequency of this clade in other branches of Afroasiatic.  Interpretations of Y chromosome data including STR variability from Equatorial Guinea suggest that the V88 marker may have arisen on an R1 background south of the Chad basin and spread north instead of the reverse (Gonzalez et al. 2012). It may very well be that the language family spread independently of the lineage, with the lineage being in the region before the Chadic language family.
The presence of an R sub-lineage could be simply conceptualized as a Paleolithic “back migration” without any necessary cultural implications; in this scenario it becomes a part of the biological ancestry of regional peoples subject to local adaptive evolutionary forces. Its distribution in the circum-Saharan areas may reflect later movements of Saharan peoples during mid-holocene droughts.  Another explanation is that some areas of the central Sahara and Nigeria were settled in part by Assyrian refugees in the last millennium BC (see Lange 2011), an interpretation of evidence that has not gained wide acceptance among historians.

            It is of some historical interest and still very revealing that although some linguists in the early twentieth century thought that Chadic was Afroasiatic, others greatly resisted its classification into the family (Newman 1980). This was not based on linguistic data but rather on the fact that the Chadic speakers were “black” or “blacker” than other members of the family (Ruhlen 1991). This illustrates the problem, leaving alone the bias for a moment, of having expectations that all connections or relationships would be between entities perceived as being uniform in all respects. This issue is made even more ironic when it is understood that some groups of sampled Chadic speakers have a higher percentage of a “Eurasian” Y lineage than do others who were categorically seen as more akin to Europeans.

Genes, ethno-population origins and identity

             The subject of “origins” is complex because of the term’s various meanings. The “origin” of a population has multiple dimensions. These include language, biology, and culture. As noted these are not causally related or linked, but sometimes may “travel” together at least over the short term; assumptions about their association cannot automatically be made. Another issue is related to the question of origins in terms of time.  How far back into the past can a specific ethnicity/ethnic group as known in the ethnographic present be traced? It is important to remember that the external biological traits of a socioculturally defined group can change based on intermarriage if social rules permit and cultural values are not exclusionary based on external phenotype; gene flow connotes changes in ancestry (unseen genetic variation can obviously also change). Likewise the cultural traits of a group defined biologically (at any level) can change by the adoption of new languages and customs. Of course the definitions can also change over time and space. All of these increase the work for the population biologist interested in the dimensions of history or the reconstruction of genealogy. Research on the origins and current genetic profiles of groups with particular ethnic designations can illustrate the benefits of better multidisciplinary cooperation.  A case in point is that of the Fula, who are perhaps best known in West African history as Islamic reformers in the 19th century.

              The Fula, variously called Fulani, Fulbe, Pula, Peul, have sometimes been seen as an “anomaly” in tropical West Africa, due to the physiognomy and skin color of the most distinctive subgroups, which on average have been noted to differ from the West African stereotype. Many have narrow noses and faces and “copper” colored skin.  Fulani communities can be found in the Sahel belt from Senegal to Sudan, with southern extensions into the Cameroons. In a recent report the “origin” of the Fulani is said to be “unknown” with “tradition” relating them to “Hiksos and Nubians”, and their language is said to be related to Berber, an Afroasiatic family (Rosa et al. 2004). However, current research places the group’s “homeland” in West Africa (Keita et al. 2010, McIntosh and Scheinfeldt 2012), and the Fulani language in the Niger-Congo family (Ruhlen 1991). The Hyksos (‘Hiksos’) invaded Egypt from the Near East in about 1800 BC and likely were Semitic speakers. They were defeated and eventually returned to Asia. There is no evidence that they reached western Africa or penetrated into the populations of the Sahara or Sahel, and there is nothing about the Fulani that would seem to be primary evidence of a specific Hyksos origin. The Nubians, who live in southern Egypt and Sudan, speak languages in the Nilo-Saharan family, and while there are populations that speak languages in this family far to the west of the Nile Valley, there is nothing specifically Nubian that would make for convincing evidence that the Fulani have a specific Nubian origin. In short neither Hyksos nor Nubians can be shown to have likely been the ancestral group at the root of the historical origins of the Fulani. In terms of historiography these accounts are not justified.

             The Balanta are another example: a population with some mtDNA signatures that are unexpected in the Senegambia. Specifically, they have haplogroups that are called “Eurasian.”  This group from the Senegambia region is said to have a tradition that they were related to “Sudanese” “from whom they could have separated 2000 years ago with the first spread of kushites (sic) migrations…” Another hypothesis is offered that derives them from “a Bantu branch which separated in the Pleistocene near the Nile, following camite (sic) invasions.”   Many West African peoples have traditions of origins from the east; some of these may reflect the influence of Islamic groups, or migrations related to climatic events.  The word “Sudan” is Arabic and would suggest a period of interactions later than that of the ancient ‘kushites’, who were not Arabs or Arabic speakers. Sudan is also a generic term, the last part of the phrase “Bilad as Sudan” (Land of the “Blacks”), and in African historiography “western Sudan” refers to the Sahel belt of West Africa.  Current research also suggests that the ‘kushites’ of the Nile Valley spoke a language from the Nilo-Saharan family. Furthermore, the Balanta do not speak a Bantu language, and this family is not thought to date to the Pleistocene, but rather to be of late mid-Holocene date at the earliest. The Niger-Congo (or Niger-Kordofanian) family does have a branch in the Sudan, Kordofanian, but it is not linguistically close to the Balanta language. Current research does not speak of ‘camitic’ invasions, and it is not certain to which ‘invasions’ the authors refer.  ‘Camitic’ is a synonym for Hamitic, a term no longer in common usage in modern African studies because the theories associated with it have been proven wrong. There is little reason to think that the Balanta as a people have come from the Sudan.
This does not deny migration, since examples of long-distance migration in Africa exist, such as Arabs coming into Africa, Fulani migrating across the Sahel, and West Africans making the Hajj and not returning. The traditions that invoke long-distance migration to explain origins may have some reality, but not in the literal terms in which they are expressed; it has been established that they generally cannot be accepted at face value, especially if they use terms that are chronologically dissonant and the events are very remote in time. Specialists in the historiographic use of oral traditions have developed methods to both interrogate and validate them.

             The ideas reported about the Balanta and Fulani are based on older literature and set up the investigators to give “unfair” accounts.  A non-West African origin for these peoples is not supportable based on the current research of specialist historians.  The distributions of these peoples, written and oral texts from the regions, linguistics, and archaeology invite a priori a holistic approach based on more recent research. It is important to note that the authors do not say that they necessarily accept the reported views, but unfortunately they do not present alternatives, which makes matters difficult for readers.  It is not possible to know which ideas are of historical interest only, as opposed to being presented as currently viable hypotheses.

               Another example of the origins issue involves the conflation of material culture similarities with identities, and the use of non-standard linguistic analyses (see Arnaiz-Villena et al. 1999).  Genetic similarities in the circum-Mediterranean region have been related to population movements that also have linguistic correlates.  It is argued that many languages of the Mediterranean region are related in a family or grouping called Usko-Mediterranean (UM), which includes Etruscan, Berber, Egyptian and Basque. Early Nile Valley populations from the Badarian culture were suggested to have migrated to Spain based on pottery similarities. Other movements of ancient Saharans to Europe were postulated. The UM grouping is not a part of current language classifications, which does not mean that it should be discounted out of hand, but its unorthodoxy should be acknowledged, and its validity demonstrated by the totality of current linguistic methodology. It is not inconceivable that there were early movements back and forth between Mediterranean Europe and Africa, but the thesis of deep cultural unity implied by the Usko-Mediterranean hypothesis is not well supported. This case illustrates how collaboration would have strengthened or altered the hypotheses: specialists would have contributed the standard archaeological and linguistic interpretations.

Genetics and the Northern Nile Valley: Egypt

          An example of the utility of historical resources to explain spatial genetic variation can be found in studies of Egypt, a country not only at the crossroads of three continents but with a well studied history based on texts, archaeology, and climatology. Egypt flanks the Nile, and the Nile Valley can be conceptualized as a linear oasis in the eastern Sahara desert; it has had numerous settlements dotted along its length since the mid-Holocene.

              Patterns of Y chromosome and mtDNA genetic variation in the Egyptian Nile Valley corridor both show opposing clines for different haplotypes (see Lucotte and Mercier 2003 for the Y chromosome study, and Krings et al. 1999 for the mtDNA analysis). One Y chromosome study using the TaqI 49a,f probe found geographical sub-structuring in Egypt of the most prevalent variants, which are haplotypes V, IV, and XI. Haplotype V, called “northern”, decreases in frequency from north (northern Egypt) to south (Egyptian Nubia), and IV and XI (called “southern”) increase in the opposite direction. Upper (southern) Egypt had intermediate frequencies that were not overall statistically different from those in Nubia, located immediately south of Upper Egypt (Lucotte and Mercier 2003). In another study mtDNA haplotypes designated “northern” and “southern” were also found to be distributed differentially (Krings et al. 1999). The geographical labeling of variants clearly implies a model of explanation that posits two fairly distinct populations that have created a zone of intergradation in the Nile Valley by bidirectional gene flow.

            The observed distribution of these haplotypes has been suggested to be related to three ancient military campaigns: the invasion and/or occupation of Nubia by Egypt on two occasions (Middle Kingdom ~2000-1600 BC, and New Kingdom ~1550-1300 BC), the last for 500 years, and in turn that of Egypt by Nubia (750 BC) for less than 100 years.  Later migrations from Europe and the Near East during the Graeco-Roman, Islamic, and Ottoman periods may also have made a significant contribution. In both studies the spatial distribution is attributed to movements caused by military conflicts; no alternative explanations were offered.

             While the military explanations are somewhat attractive as accounts of the pattern of the most common variants, they are likely to be inadequate (Keita 2005). At these dates population sizes in the Nile Valley were large, and unless there was a specific policy of population replacement or of marrying local women, soldiers are not likely to have had so large a genetic impact as indicated by the Y chromosome data. The findings do encourage us to dig into non-genetic data to understand social practices that can affect gene frequencies. It is known that in the late period Alexander encouraged his troops to marry local [Egyptian] women. There is no historical evidence suggesting that foreign troops gave their daughters or sisters in marriage to those being occupied in any consistent fashion, which makes it less likely that the military incursions explain the mtDNA patterns.  Historiographically, the claim based on mtDNA implies evidence that does not exist; there has also been an incomplete enumeration of the reasonable possible causes for the patterns.

            A holistic approach that considers varying time depths will augment or supplant this explanation.  The primary issue concerns the “peopling” of the Nile Valley, and how this happened.  This means examining data that can plumb different time depths. The evidence can come from archaeology and various texts, in addition to the biological data. Is the north-south description of variants valid for all markers? Is it possible that the model positing the interaction of idealised discrete groups is false, and that a model that treats populations more as processes is more accurate? Archaeological data support the post-glacial re-peopling of the eastern Sahara after the long late Pleistocene hiatus (Wendorf and Schild 1980, 2001), probably from multiple directions. These Saharans were mobile and interacted, and it is hard to envision that there was no gene flow between populations. Evidence indicates Holocene migration to the Nile Valley from the Sahara, contributing heavily to the population resident there (Wendorf and Schild 1980, Hassan 1988), and a cultural basis for interactions with the Sudan. Archaeological and textual evidence supports ongoing interactions between Nubian and Egyptian populations during and after the rise of the Egyptian state, before the three military invasions (see for example Williams 1986). In summary, there are pre-Middle Kingdom factors that should be invoked to explain the distribution of the most prevalent alleles.

          There is always the issue of “back migrations to Africa” at various times. A distinction has to be made between Pleistocene or early Holocene migrations of individuals or groups before the emergence of current linguistic families or ethnic groups, and those of known or reasonably known historical entities. Movements during the Paleolithic can be seen as the settlement and resettlement of humans who cannot be shown to have a likely connection with later known populations or peoples. Given that there is evidence for modern humans in northern Africa before there was migration out of Africa and any back-migration, one question is: what happened to those people? Supra-Saharan Africa becomes fascinating in terms of its human biogeographical history, and even more so when culture is considered. The mtDNA of Berber speakers and those in Egypt (now Arabic speaking) is highly variable, and includes a majority of lineages interpreted as Asian or Eurasian. (It is to be noted that Rosa et al. (2004) found some such lineages in the region of Guinea, another interesting historical question.) Yet the Y chromosome data in northern Africa, viewed overall, and especially if the impact of later migrations associated with polygamy is taken into account, clearly show a preponderance of African lineages from the E haplogroup. What are the historical scenarios which could explain this? The linguistic data do not show ancient loan words datable to before the Roman period, suggesting that if there were a large number of early speakers of non-African languages, they were bioculturally assimilated. Their genes and culture were reworked in an African environment, and their biocultural identity emerged in African environments of the Saharan and Supra-Saharan regions.

Discussion and Conclusions

         These cases collectively illustrate the value of more aggressive collaboration with a range of specialists from other fields.  The examples could be easily multiplied. Multidisciplinary teams will help improve the accuracy of research design and interpretations. In research in which patterns of genetic variation are used as the primary data to drive interpretations, specialists from other fields can help raise alternative hypotheses to such singularly based interpretations.  In work that derives a framework from non-genetic data, geneticists can help devise models that will account for known facts. The reciprocal illumination afforded by such give and take will increase the value of this work so that future studies have a firmer foundation.


  1. Ammerman, A.J. & Cavalli-Sforza, L.L. The Neolithic Transition and the Genetics of Populations in Europe. (Princeton University Press, Princeton, 1984).
  2. Arnaiz-Villena, A., Martinez-Laso, J., & Alonso-Garcia, J. Iberia: population genetics, anthropology, and linguistics. Hum. Biol. 71, 725-743 (1999).
  3. Arredi, B, Poloni ES, Paracchini S, et al. A predominantly neolithic origin for Y-chromosomal DNA variation in North Africa. Am J Hum Genet. 75(2), 338-45 (2004).
  4. Blench, R. Is Niger-Congo simply a branch of Nilo-Saharan? In Proceedings: Fifth Nilo-saharan Linguistics Colloquium, Nice, 1992, 83-130.  Eds Nicolai, R. and Rottland, F. Rudiger Koppe (Koln 1995).
  5. Cruciani F., Trombetta, B., Selitto, D, et al. Human Y chromosome haplogroup R-V88: a paternal genetic record of early mid-Holocene trans-Saharan connections and the spread of Chadic languages. Eur J Hum Gen 18, 800-807 (2010)
  6. Diakonoff, I. The earliest Semitic society: linguistic evidence. J Semit Stud 43, 209-219 (1998).
  7. Gonzalez, M., Gomes, V., Lopez-Parra, A.M., et al. The genetic landscape of Equatorial Guinea and the origin and migration routes of the Y chromosome haplogroup R-V88.  Eur J Hum Gen  doi.10.1038 (2012)
  8. Gould, S. Petrus Camper’s angle. Natural History. August, 12-18 (1987).
  9. Gregerson, E.A. Kongo-Saharan. J. Afr. Ling. 11(1), 69–89 (1972).
  10. Guthrie, M. Bantu origins: a tentative new hypothesis. J. Afr. Lang. 1, 9-21. (1962)
  11. Hassan, F.  The predynastic of Egypt. J.World Prehis. 2 (2), 150-180 (1988).
  12. Hiernaux, J. The People of Africa. (Charles Scribner's Sons, New York, 1975).
  13. Keita, S.O.Y.  History in the interpretation of the pattern of p49a,f  TaqI RFLP Y-chromosome variation in Egypt: a consideration of multiple lines of evidence. Am J Hum Biol 17, 559-567 (2005).
  14. Keita, S.O.Y., Jackson, F., Borgelin, L., Maglo, K. Letter to the editor: commentary on the Fulani: history, genetics and linguistics, an adjunct to Hassan et al. 2008. AJPA 141, 665-667 (2010).
  15. Krings, M., Salem, A.H., Bauer, K., et al. MtDNA analysis of Nile River Valley populations:  a genetic corridor or a barrier to migration?  Am. J.  Hum. Genet. 64, 1166-1176 (1999). 
  16. Lange, D. The founding of Kanem by Assyrian refugees ca 600 BCE: documentary, linguistic, and archaeological evidence. Boston University Working Papers in African Studies. No. 265. (2011)
  17. Lucotte, G, & Mercier, G. Y-chromosome haplotypes in Egypt.  Am J Phys Anthropol 121, 63-66 (2003). 
  18. McCullagh, C. B. Bias in historical description, interpretation, and explanation. History and Theory 39, 39-66 (2000).
  19. McCullagh, C. B. What do historians argue about? History and Theory 43, 18-38. (2004)
  20. McIntosh, S., Scheinfeldt, L.B. It’s getting better all the time: comparative perspectives from Oceania and West Africa on genetic analysis and archaeology. Afr Arch Rev 29, 131-170  (2012).
  21. Newman P. The Classification of Chadic within Afroasiatic. (Leiden, University Press, 1980).
  22. Robertson J., Bradley R.A. A new paradigm: The African early iron age without Bantu migrations. History in Africa 27, 287-323 (2000).
  23. Rosa, A., Brehm, A., Kivisild T., Metspalu, E., Villems, R.  MtDNA of West Africa Guineans: Towards a better understanding of the Senegambia region. Ann. Hum. Gen. 68, 340-352 (2004).
  24. Ruhlen, M. A Guide to the World's Languages: Volume 1, Classification. (Stanford University Press, Stanford, 1991).
  25. Sanders, E. The Hamitic hypothesis: Its origin and function in time perspective.   J.  Afr. Hist. 10, 521-32 (1969)
  26. Semino, O., Santachiara-Benerecetti, A.S., Falaschi, F., Cavalli-Sforza, L.L., & Underhill, P. Ethiopians and Khoisan share the deepest clades of the human Y-chromosome phylogeny. Am. J. Hum. Genet. 70, 265-268 (2002).
  27. Wendorf, F. & Schild, R. Prehistory of the Eastern Sahara.  (Academic Press, New York, 1980). 
  28. Wendorf, F., Schild, R., & Associates. Holocene Settlement of the Egyptian Sahara. Vol I. The Archaeology of Nabta Playa. (Plenum, New York, 2001).
  29. Wetterstrom, W. Foraging and farming in Egypt: the transition from hunting and gathering to horticulture in the Nile Valley. In The Archaeology of Africa: Food, Metals and Towns, 165-226. Eds Shaw, T., Sinclair, P., Andah, B., Okpoko, A. (Routledge, New York, 1993).
  30. al-Zahery, N., Semino O, Benuzzi, et al. Y chromosome and mtDNA polymorphisms in Iraq, a crossroad of the early human dispersal and of post-Neolithic migrations.  Mol. Phyl. Evol. 28(3), 458-472 (2003). 

© W. Montague Cobb Research Laboratory
