Tracing the Roots of the Koraga: The Discovery of a Lost Ancestor in India's Genetic Story


In the lush coastal districts of South India, the Koraga people have lived for generations on the margins. Historically labelled as "untouchables," a regressive stereotype that persists today through practices like ajalu, they are among the region's underprivileged communities, their lives defined by subsistence: weaving baskets, gathering forest produce, and working as daily wage labourers. For centuries, social exclusion kept them genetically isolated, as they married only within their own community. This isolation, a source of historical hardship, has unwittingly turned the Koraga into living vaults of ancient history. Now their genes are telling a story that challenges the accepted account of the ancestral origins of over a billion people.

Graphical abstract of the Reich et al. 2009 paper that described Ancestral North Indian (ANI) and Ancestral South Indian (ASI) genetic components

Graphical abstract of Kerdoncuff et al. 2025


For years, the prevailing narrative of Indian ancestry was a relatively simple trio of sources: indigenous Ancient Ancestral South Indians (AASI), related to Andamanese hunter-gatherers; farmers from the Iranian Plateau; and pastoralists from the Pontic-Caspian Steppe. But a landmark study of the Koraga, published in the European Journal of Human Genetics, reveals this picture was incomplete. The research, led by Dr. M.S. Mustak and Dr. Ranajit Das, uncovers a fourth, previously unknown ancestral source: the “Proto-Dravidians.” The study was driven by the foundational work of lead author Dr. Jaison Sequeira, who conceptualised the genetic modelling strategy based on the linguistic affinity between the Koraga, Oraon, and Brahui languages. To bridge the gap between genes and language, the team included Prof. George van Driem, an eminent linguist and Professor Emeritus at the University of Bern, ensuring the findings were interpreted through a robust linguistic lens.

A Genetic Island in a Vast Ocean 

The first clue to the Koraga’s uniqueness came from their position on the genetic map. Advanced statistical analyses like Principal Component Analysis (PCA) showed the Koraga clustering not with their geographical neighbours in South India, but with populations like the Gauḍ and Bāgdi from northern and eastern India. They formed a distinct genetic island, having "drifted away" from the main Indian population cline.

PCA plot based on GenomeAsia100K data, which is merged with newly generated Koraga data. The plot shows population structure in the Indian cline, i.e. Ancestral North Indian (ANI) and Ancestral South Indian (ASI). Red ellipses represent clusters which also include North Dravidian language communities sensu lato: Brahui, Koraga, Gauḍ, Tānti, Bāgdi, Oraon, Kōṭa, Birhor and Mahar. Kōṭa and Mahar are found in the ANI-ASI cline as well. Grey ellipses represent clusters that have drifted away from the ANI-ASI cline, which includes both Dravidian and other groups: Burusho, Kannaḍa and coastal populations of Karnāṭaka and Kerala, Pāniyā, Koṇḍa Reddy, Kamar and one Birhor sample.
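For readers curious how a PCA plot like this separates populations, here is a minimal sketch in plain Python. It is not the study's pipeline: the two populations, their allele frequencies, and the sample sizes are simulated assumptions, and real analyses use dedicated software on hundreds of thousands of markers. The idea is only that individuals from populations with different allele frequencies land on opposite sides of the first principal component.

```python
import random

def standardize(genotypes):
    """Centre each SNP column at 2p and scale by sqrt(2p(1-p)); drop monomorphic SNPs."""
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)  # sample allele frequency
        if 0 < p < 1:
            sd = (2 * p * (1 - p)) ** 0.5
            columns.append([(g - 2 * p) / sd for g in col])
    # Transpose back to individuals x SNPs.
    return [list(ind) for ind in zip(*columns)]

def first_pc(genotypes, iterations=100):
    """Leading principal-component direction via power iteration on the Gram matrix."""
    z = standardize(genotypes)
    n = len(z)
    gram = [[sum(a * b for a, b in zip(z[i], z[k])) for k in range(n)] for i in range(n)]
    v = [1.0] + [0.0] * (n - 1)  # simple deterministic starting vector
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v  # one score per individual (unit length)

def simulate(freqs, n_individuals, rng):
    """Hypothetical diploid genotypes (0, 1 or 2 copies) drawn from allele frequencies."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n_individuals)]

rng = random.Random(0)
n_snps = 300
freq_a = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]  # simulated population A
freq_b = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]  # simulated population B
genotypes = simulate(freq_a, 20, rng) + simulate(freq_b, 20, rng)

scores = first_pc(genotypes)
print("Population A, first three PC1 scores:", [round(s, 3) for s in scores[:3]])
print("Population B, first three PC1 scores:", [round(s, 3) for s in scores[20:23]])
```

A population that has experienced strong drift, as described for the Koraga, shifts its allele frequencies away from everyone else, which is what pulls it off the main cline in such a plot.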

Further tests confirmed their profound isolation. Outgroup f3 statistics showed that the Koraga share slightly more genetic drift with northwestern populations than with their geographic neighbours, possibly because of deep shared ancestry. The Koraga have undergone a dramatic "population-specific drift," a tell-tale sign of a community that has been genetically sealed off for a very long time. This isolation came at a cost. The study identified a "founder event" (a sharp reduction in population) that occurred between 750 and 1,020 years ago, with an intensity approximately five times stronger than that observed in Ashkenazi Jews.
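The outgroup f3 statistic mentioned above can be illustrated with a small sketch. In its simplest form it averages, over many markers, the product of the allele-frequency differences between an outgroup and each of two test populations; a larger value means the two populations share more drift since splitting from the outgroup. The frequencies and population labels below are invented for illustration, and the sample-size corrections used by real software are left out.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Mean over SNPs of (o - a) * (o - b), computed from per-SNP allele frequencies."""
    products = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at four SNPs (not data from the study).
outgroup = [0.5, 0.5, 0.5, 0.5]
target = [0.9, 0.1, 0.8, 0.2]    # the population of interest
related = [0.8, 0.2, 0.9, 0.1]   # shares much of the target's drift
distant = [0.6, 0.5, 0.4, 0.5]   # shares little of it

f3_related = outgroup_f3(outgroup, target, related)
f3_distant = outgroup_f3(outgroup, target, distant)
print("f3(outgroup; target, related) =", round(f3_related, 4))
print("f3(outgroup; target, distant) =", round(f3_distant, 4))
```

Ranking many candidate populations by this value is how one asks which groups the target shares the most drift with.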


The researchers link this event to a period of social upheaval under dynasties like the Kadamba and Hoysaḷa, when the imposition of a rigid caste system may have forced tribes like the Koraga into extreme seclusion. This bottleneck also amplified rare genetic variants, explaining the tribe's high prevalence of disorders like Loeys-Dietz syndrome, Cockayne syndrome, congenital blindness, and deafness, which contribute to lower life expectancy.

The Linguistic Bridge to an Ancient Past

The Koraga’s unique North Dravidian language was the second piece of the puzzle. Despite being surrounded by Tulu speakers, they preserved their own tongue. The study found a deep genetic link between the Koraga and two other isolated North Dravidian-speaking tribes: the Brahui of Pakistan and the Oraon of eastern India.

Using ALDER analysis to date this connection, the researchers made a stunning discovery: these three tribes, now separated by thousands of kilometres, last shared a common ancestor around 4,400 years ago (c. 2,370 BC)—the height of the Indus Valley Civilisation's Mature Harappan period. This suggests they are scattered remnants of a once-widespread Proto-Dravidian population.

Unveiling the Fourth Ancestor

The core of the research was to pinpoint the exact nature of the Koraga’s ancestry. Using sophisticated tools like qpAdm and f4-ratio tests, the team modelled the Koraga genome. They found it to be a blend of Ancient Ancestral South Indian (AASI) related groups, a small component from the ancient "Indus Periphery" people, and a significant portion—about 25-30%—linked to Early Neolithic farmers from sites like Ganj Dareh in the Zagros Mountains of Iran, dating back 10,000 years.
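To give a feel for the f4-ratio test named above, here is a minimal sketch assuming the textbook setup: a target X is a mixture of two sources B and C, A is a population related to B that took no part in the mixing, and O is an outgroup. The share of B-related ancestry is then estimated as f4(A, O; X, C) / f4(A, O; B, C). All frequencies and labels below are hypothetical, and this is not the study's actual model.

```python
def f4(w, x, y, z):
    """Mean over SNPs of (w - x) * (y - z), from per-SNP allele frequencies."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z)) / len(w)

def f4_ratio(pop_a, outgroup, target, source_b, source_c):
    """Estimated fraction of source_b-related ancestry in target."""
    return f4(pop_a, outgroup, target, source_c) / f4(pop_a, outgroup, source_b, source_c)

# Hypothetical allele frequencies at four SNPs (not data from the study).
pop_a = [0.9, 0.2, 0.7, 0.4]
outgroup = [0.5, 0.5, 0.5, 0.5]
source_b = [0.8, 0.1, 0.9, 0.3]
source_c = [0.4, 0.6, 0.5, 0.5]

# Build a target that is exactly 30% source_b and 70% source_c.
true_share = 0.3
target = [true_share * b + (1 - true_share) * c for b, c in zip(source_b, source_c)]

estimate = f4_ratio(pop_a, outgroup, target, source_b, source_c)
print("Estimated source_b share:", round(estimate, 3))
```

qpAdm generalises this idea to several sources at once and reports whether a proposed set of sources fits the data at all, which is what the modelling in this section relies on.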

Crucially, this Iran Neolithic ancestry in the Koraga was distinct. It wasn't just a subset of the previously known "Iranian farmer-related" source. It represented a separate, deep-rooted branch. The admixture graphs consistently positioned the Koraga as an ancestral source for later populations.


To test this, the team tried to model the ancestry of modern Dravidian-speaking groups. Models that included only AASI, Steppe, and Iran Neolithic sources failed. But when they added the Koraga as a fourth source, the models succeeded. The Koraga-like component (coloured orange in the study's figures) was essential to explaining the genetic makeup of much of modern India. This component, the researchers propose, is the genetic signature of the Proto-Dravidians.

A New Map for Indian Ancestry

This discovery redraws the map of Indian prehistory. The study suggests that a distinct "Proto-Dravidian" population emerged in the region between the Iranian Plateau and the Indus Valley no later than 4,400 years ago. Their descendants, carrying this Koraga-like ancestry, dispersed across the subcontinent.

As they moved south, they formed the Dravidian-speaking communities we know today. Those who remained in the north were largely absorbed by later arrivals, including Indo-European-speaking Steppe pastoralists, adopting new languages but retaining a foundational layer of this ancient ancestry. The Koraga, isolated by social forces, became a frozen snapshot of that foundational population.

The research provides a powerful genetic corroboration of the "Elamo-Dravidian" linguistic hypothesis, which posits a deep link between the Elamite language of ancient Iran and the Dravidian languages of India. The time depth of the shared ancestry between the Koraga and the 10,000-year-old Ganj Dareh sample coincides perfectly with the proposed timeline for this linguistic phylum.

A Legacy Reclaimed

The story of the Koraga is no longer just one of social marginalisation. It is a story of deep time and human migration. They are not a peripheral people, but central narrators of India's past. Their genes reveal that the "Proto-Dravidian" ancestry is a fundamental, fourth pillar supporting the vast and intricate structure of the Indian population, present in most modern groups except for the most isolated tribal communities.

For the Koraga, this research is a bittersweet validation. The very social structures that oppressed them also preserved their unique genetic heritage, a heritage that turns out to be a missing piece in the grand puzzle of Indian civilisation. Their story is a profound reminder that the deepest histories are often held not in monuments or texts, but in the genes and the fading words of the most marginalised among us.

Additional reading:

Sequeira JJ, Vinuthalakshmi K, Das R, van Driem G and Mustak MS (2024) The maternal U1 haplogroup in the Koraga tribe as a correlate of their North Dravidian linguistic affinity. Front. Genet. 14:1303628. doi: 10.3389/fgene.2023.1303628

Article by Bindya and Jaison, Mangalore University


Relevant Notes (authors' responses to some of the questions raised on social media):

1. What is the proportion of Proto-Dravidian ancestry in Dravidians already showing 30-50% Iran_Neolithic component?

qpAdm analysis reveals that genetically plausible models for Dravidian populations require a 13-23% ancestral contribution from Koraga. This specific component is not present in the Irula (used as an AASI proxy), Iran Neolithic, or IVC Periphery sources. It appears to have been retained only in the Koraga, likely due to their prolonged isolation. We identify this unique component as the Proto-Dravidian ancestry.

This finding is supported by an admixture plot, which shows approximately 20% of a Koraga-like ancestry in modern Indian populations, in addition to other components like Indigenous Tribal and British-like ancestry. This indicates a 20% genetic similarity between Koraga and modern Indian populations, independently supporting our estimate of 13-23% Proto-Dravidian ancestry.

Furthermore, when we model the Koraga population itself, the analysis requires about 10% of a Middle Eastern ancestry, in addition to Onge-like and IVC Periphery components. This requirement further corroborates our central finding, as it suggests the Koraga preserve a distinct West Eurasian-related lineage that aligns with the proposed Proto-Dravidian component.

2. What is the justification for the date estimate (4,400 years before present)?

The date estimate of ~4,400 years before present is derived from ALDER analysis, which infers admixture events by measuring Linkage Disequilibrium (LD) decay. This analysis revealed a prolonged period of admixture, spanning from approximately 6,000 to 2,800 years ago, marking the initial formation of the 'Proto-Dravidian' ancestral component through major mixing events between Iranian-related and AASI populations. The median date of ~4,400 BP is interpreted as the peak period of divergence and population structuring. This represents the time when the ancestral populations of deep, isolated branches like Brahui, Oraon, and Koraga began to separate from the core continuum. Therefore, we consider the ~4,400 BP date to be the genetic signature of the pivotal period when the Proto-Dravidian community was fragmenting and spreading.

We anticipate that more ancient DNA from the region between the Iranian Plateau and the Indus Valley will further solidify this model. We are also currently exploring other isolated populations on the southwest coast that could produce similar Koraga-like signals. Based on our preliminary analyses, we are confident that the 'Proto-Dravidian' component is not speculative but a statistically well-defined ancestral population.
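The logic of dating by LD decay, as described in this answer, can be sketched as follows. After admixture, the LD signal between two markers decays roughly as exp(-n * d), where d is their genetic distance in Morgans and n is the number of generations since mixing, so the slope of log(LD) against distance gives n. The curve below is synthetic, the 28-year generation time is an assumption that the article does not state, and ALDER itself fits a weighted-LD curve with extra terms and resampling-based error estimates, so this is only a simplified illustration.

```python
import math

GENERATION_YEARS = 28  # assumed generation time; not taken from the article

def fit_generations(distances_morgans, ld_values):
    """Estimate generations since admixture from the slope of log(LD) vs distance."""
    logs = [math.log(v) for v in ld_values]
    mean_d = sum(distances_morgans) / len(distances_morgans)
    mean_l = sum(logs) / len(logs)
    cov = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances_morgans, logs))
    var = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -cov / var  # LD ~ exp(-n * d), so the slope is -n

# Synthetic, noise-free decay curve for a hypothetical event 157 generations ago.
true_generations = 157
distances = [0.005 + 0.001 * i for i in range(30)]  # 0.5 cM to 3.4 cM
curve = [0.01 * math.exp(-true_generations * d) for d in distances]

estimated_generations = fit_generations(distances, curve)
estimated_years = estimated_generations * GENERATION_YEARS
print("Generations:", round(estimated_generations, 1))
print("Years before present:", round(estimated_years))
```

Under this assumed generation time, about 157 generations corresponds to roughly 4,400 years; a different generation time would shift the date proportionally.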

3. But the ~1000 year old "founder effect" should dilute this signal. How can Koraga be a good model for reconstructing ancestry?

This is a valid concern. A strong founder effect can indeed complicate genetic analysis. We addressed this potential issue in several ways: First, we proactively identified a subset of Koraga individuals with higher levels of Identity-By-Descent (IBD) sharing with the Brahui, labeling them Koraga2. Our reasoning was that this subset might preserve a stronger signal of the shared ancestral lineage. Despite showing a more intense and recent founder event in our ASCEND analysis, Koraga2's fundamental ancestral characteristics were indistinguishable from the other Koraga individuals (Koraga1). 

Furthermore, the allele frequency-based tests we employed, such as f-statistics and qpAdm, are specifically designed to detect non-random, shared evolutionary history. They are robust to the random noise introduced by recent, population-specific genetic drift, such as a founder effect. These methods look for systematic correlations in allele frequencies across thousands of independent genomic markers, which are generated by deep shared ancestry, not recent random sampling. Moreover, our results were consistent; Koraga produced statistically robust fits across multiple, independent qpAdm models. If the Koraga signal were merely an artifact of recent drift, it would not consistently fit into these complex statistical models.

4. A founder event like that could also mean that seafarers arrived around 1000 years ago and formed Koraga. What is the antiquity of the Koraga population?

Our analysis with RELATE reveals two distinct drops in effective population size: one ~1000 years ago (the known founder event) and a much earlier one ~3000 years ago, shared with the Oraon, another North Dravidian speaker. This provides direct genomic evidence of a shared population bottleneck three millennia before the proposed "seafarer" event.

Folklore recounts a clash between a Koraga chieftain and the Kadamba rulers, a dynasty that existed between the 4th and 7th centuries CE. If the folklore is true, this places the Koraga community in the region roughly 300 to 600 years before the 10th-century founder event, which contradicts a recent arrival.

Our previous study (Sequeira et al. 2024) found a U1-specific maternal founder lineage within the strongly matrilineal Koraga community, indicating a deep, localised maternal history rather than a recent influx.

Lastly, we performed ADMIXTURE analysis with only X chromosomes (unpublished). The ancestry proportions were not significantly different from the autosomal data. All of this suggests that the admixture events shaping the Koraga were not recent and involved a more balanced contribution from both sexes, consistent with a long-standing population.

5. Birhor is a Munda-speaking tribe. How did it fit in the LD decay model?

Similar to Singh et al. (2025), we find that allele frequency-based tests fail to identify the genetic relationship between populations that diverged in the deeper past - such as Brahui and Oraon. In our own frequency-based tests, we too observe similar results for Koraga with Brahui and Oraon.

However, in our IBD sharing analysis - a haplotype-based method - we observe a differential pattern of long and short segment sharing between these groups. Divergent populations share shorter segments scattered throughout the genome, while recently related groups share longer segments, often concentrated in a few chromosomes. We observe the former pattern between Koraga–Brahui and Koraga–Oraon. Oraon shows very few long segments shared with Koraga, which we believe is due to a different admixture history compared to Brahui.
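The link between segment length and age described above follows from recombination: a segment inherited by two people from a common ancestor g generations back has passed through about 2g meioses, so its length is roughly exponentially distributed with a mean of 100 / (2g) centimorgans. The sketch below applies that standard approximation; the generation counts are hypothetical (about 1,000 and 4,400 years at an assumed 28 years per generation) and are not figures from the study.

```python
import math

def expected_ibd_length_cm(generations):
    """Mean length (cM) of an IBD segment from an ancestor this many generations back."""
    return 100.0 / (2 * generations)

def prob_longer_than(generations, threshold_cm):
    """Probability that such a segment exceeds threshold_cm (exponential model)."""
    return math.exp(-2 * generations * threshold_cm / 100.0)

# Hypothetical ages: about 36 and 157 generations.
for generations in (36, 157):
    mean_cm = expected_ibd_length_cm(generations)
    p_long = prob_longer_than(generations, 2.0)
    print(generations, "generations: mean", round(mean_cm, 2),
          "cM; P(length > 2 cM) =", round(p_long, 4))
```

Older connections therefore show up mostly as short segments, while segments of several centimorgans point to more recent shared ancestors or gene flow.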

Now, to address the Birhor question: Koraga and Birhor do show significant LD decay and IBD sharing. However, the primary cause is not recent mixing between them, but rather their shared deep ancestry. Like Koraga, Birhor is an inbred tribe (with a notable proportion of East Asian ancestry). If both groups arose from a common ancestor, the chance of them retaining longer ancestral IBD segments is higher compared to more admixed groups, due to their reduced genetic diversity and prolonged isolation.

This pattern supports the view that Koraga and Birhor preserve an older ancestral connection, whereas the long IBD segments between Birhor and Oraon reflect recent gene flow.

6. Do the Urali Kurumans show the same genetic links as the Koraga?

Two papers (Sylvester et al. 2019 and Palanichamy et al. 2015) have hinted at a genetic link between South Indian tribes and the Iranian plateau. One included the Koraga tribe, while the other focused exclusively on the Urali Kuruman tribe, an ethnic group scattered across southern India.

Although the Urali Kurumans show no direct cultural or historical links with the Koraga, they also carry a high proportion of the U1 mitochondrial haplogroup. However, in Sequeira et al. (2024), our phylogenetic analysis revealed that the U1 lineage in the Koraga and Urali Kuruman diverged as far back as the Last Glacial Maximum. Interestingly, the U1 lineage in Urali Kuruman clusters with U1 haplotypes found in Iran, the Caucasus, and the Middle East - a different cluster from that of the Koraga. Furthermore, the R30 haplogroup, which is abundant in Urali Kuruman, is absent in the Koraga, suggesting these populations evolved separately.

Focused studies on these ancient tribes are essential to untangle the mysteries of their origin, migration, and settlement. We would greatly value obtaining Urali Kuruman autosomal data to explore their genetic relationships further.
