STR-based genetic structure of the Berber population of Bejaia (Northern Algeria) and its relationships to various ethnic groups

Patterns of genetic variation in human populations have been described for decades. However, North Africa has received little attention and Algeria, in particular, is poorly studied. Here we genotyped a Berber-speaking population from Algeria using the 15 short tandem repeat (STR) loci D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818 and FGA included in the commercially available AmpFℓSTR Identifiler kit. Altogether, 150 unrelated North Algerian individuals were sampled across 10 administrative regions or towns of the Bejaia Wilaya (administrative district). All STR loci met Hardy–Weinberg equilibrium expectations after Bonferroni correction, and the Berber-speaking population of Bejaia presented a high level of observed heterozygosity for the 15 STR system (>0.7). Genetic parameters of forensic interest, such as the combined power of discrimination (PD) and the combined probability of exclusion (PE), were higher than 0.999, suggesting that this set of STRs can be used for forensic studies. Our results were also compared to those published for 42 other human populations analyzed with the same set of loci. The Bejaia sample clustered with several North African populations, but some geographically close populations, including the Berber-speaking Mozabites from Algeria, were closer to Near-Eastern populations. While we were able to detect some genetic structure among samples, it was not correlated with language (Berber-speaking versus Arab-speaking) or with geography (east versus west). In other words, no significant genetic differences were found between the Berber-speaking and the Arab-speaking populations of North Africa. The genetic closeness of European, North African and Near-Eastern populations suggests that North Africa should be integrated into models aiming to reconstruct the demographic history of Europe.
Similarly, the genetic proximity to sub-Saharan Africa is a reminder of the links that connect all African regions.


Introduction
Global patterns of genetic diversity are becoming increasingly important for reconstructing the demographic history of human populations. While some regions have received significant attention, others, like North Africa, have generally been less sampled and less studied. This is the case for Algeria, despite its geographical position linking the Mediterranean area and sub-Saharan Africa. Today the Algerian population is composed of two main linguistic groups, the Berber- and the Arab-speaking populations, and it is usually considered that the majority of Algerians descend from Berbers and Arabs (Taïeb, 2004). However, the history of Algeria and North Africa is rather complex. For instance, the Berber-speaking region of Bejaia has witnessed many successive invasions and conquests that caused major cultural, linguistic and religious reshuffles, the most important of which is probably the Arab conquest that started in the seventh century. Chronologically, the region was subjected to the influence of the Romans (33 BC), the Vandals (429 AD), the Byzantines (533 AD), the Arabs (647 AD), the Spanish (1510 AD), the Ottomans (1555 AD) and the French (1832 AD) (Cote, 1991; Laporte, 2004). In addition to these migrations, there have been internal reshuffles, with the introduction of Jewish and sub-Saharan African populations. After the fall of Andalusia (1610 AD), many of its expelled inhabitants established settlements in Bejaia (see Gaid, 2008).
Thus, while Berbers are likely to be the most ancient inhabitants of the region, gene flow, immigration and language switching may have obscured the relationships between neighboring or distant populations. Genetic data could therefore be useful to identify connections between populations that speak different languages today, within Algeria or at a wider geographical scale. For instance, Henn et al. (2012), using genomic data, estimated that North African populations are likely of Berber origin, with substantial shared ancestry with the Near East and, to a lesser extent, with eastern and western sub-Saharan Africa and Europe. Although the number of studies on North Africa is relatively limited, important studies using various markers have contributed to the anthropogenetic characterization of North African Berber populations. These studies have focused on the GM immunoglobulin allotypic system (Coudray et al., 2004; Coudray et al., 2006), mitochondrial DNA (Fadhlaoui-Zid et al., 2004; Ennafaa et al., 2009; Coudray et al., 2009), the Y chromosome (Arredi et al., 2004), autosomal microsatellites (STR) (Bosch et al., 2000; Bosch et al., 2001; Coudray et al., 2006; Coudray et al., 2007a; Khodjet-El-Khil et al., 2008; El Ossmani, 2010; Khodjet-El-Khil et al., 2012; Gaibar et al., 2012), SNPs (Henn et al., 2012) and Alu sequences (Gonzalez-Pérez et al., 2003). Very few studies have been carried out on Algerian Berber populations (Bosch et al., 2001; Achilli et al., 2005; Lefevre-Witier et al., 2006; Coudray et al., 2009; Pereira et al., 2010; Bekada et al., 2013). The present study is part of a wider project on the anthropogenetic characterization of Algerian populations. In this paper, we used 15 independent autosomal STR loci to genotype a sample of 150 individuals from the Berber-speaking population of the Bejaia wilaya, in order to provide data on allele frequency distributions and forensic parameters.
The allele frequencies were used, through multidimensional scaling (MDS) and tree analysis (UPGMA), to assess the relationships between the Bejaia population and 42 other populations from North Africa, sub-Saharan Africa, the Middle East, Europe, Asia and South America. An analysis of molecular variance (AMOVA) was performed to assess the genetic structure of 17 populations (including Bejaia). A STRUCTURE analysis was also conducted.

Materials and methods

2.1. Population
Buccal swab samples were collected from unrelated healthy Berber-speaking donors (n = 150 individuals, 300 gametes) from the Bejaia area in North Algeria (Fig. 1).

Electrophoresis and genotyping
DNA fragments were separated by multi-capillary electrophoresis on an ABI Prism 3130xl Genetic Analyzer, using the ABI GeneScan 500 LIZ internal size standard. Fragment sizes were determined with GeneMapper® v3.2 software, and alleles were assigned by comparison with the allelic ladder supplied by the manufacturer (Applied Biosystems, Foster City, CA).

RelPair analysis
To detect intra-population pairs of close relatives, we used the program RelPair Version 2.01 (Epstein et al., 2000). Each population was separately analyzed following the suggested settings of Pemberton et al. (2013), namely with a critical value set to 100 and a genotyping error rate of 0.008. When related individuals were identified, one of them was discarded from the analysis. In order to minimize the number of individuals removed, we preferentially

Statistical and phylogenetic analysis
Allele frequencies, expected (He) and observed (Ho) heterozygosity (Nei, 1987) and the exact test of Hardy-Weinberg equilibrium (Levene, 1949; Guo and Thompson, 1992) were computed using Arlequin version 3.5.1.2 (Excoffier and Lischer, 2010). To determine the genetic relationships of our sample with other ethnic groups, we compared it to 42 populations from Europe, Asia, America and Africa typed for homologous microsatellite loci (Table 1). Pairwise uncorrected Fst distances between the 43 populations were used to perform a standard non-metric MDS using Statistica 8.0 (StatSoft, 2008) and to infer a UPGMA tree using POPTREE2 (Takezaki et al., 2010), available at: http://www.med.kagawa-u.ac.jp/~genomelb/takezaki/poptree2/index.html. Tree robustness was evaluated with 1000 bootstrap replicates (Felsenstein, 1985). The UPGMA method was preferred to neighbor-joining (NJ) because it yielded higher bootstrap support. Note that the trees were used only as a graphical representation of the computed genetic distances. They cannot be seen as a reliable representation of the relationships between populations, because such trees ignore gene flow, which is a crucial feature of human populations (Barbujani and Chikhi, 2007).
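To make the tree-building step more concrete, the sketch below implements plain UPGMA on a symmetric distance matrix such as the pairwise Fst matrix described above. It is a minimal illustration, not the POPTREE2 implementation: the input format (a dict keyed by frozenset pairs of population names), the function name `upgma` and the Newick output with four-decimal branch lengths are assumptions made for this example, and the bootstrap step (resampling loci and rebuilding the tree 1000 times) is omitted.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA tree and return it as a Newick string.

    labels: population names.
    dist: dict mapping frozenset({a, b}) to the distance between a and b,
          for every pair of labels (e.g. pairwise Fst values).
    """
    # Active clusters: key -> (Newick subtree, number of leaves, node height).
    clusters = {lab: (lab, 1, 0.0) for lab in labels}
    d = dict(dist)  # copy so the caller's matrix is not modified
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2.0  # ultrametric: new node at half the distance
        tree_a, n_a, h_a = clusters.pop(a)
        tree_b, n_b, h_b = clusters.pop(b)
        merged = f"({tree_a}:{height - h_a:.4f},{tree_b}:{height - h_b:.4f})"
        # Distance from the merged cluster to each remaining cluster is the
        # leaf-count-weighted mean of the distances of its two parts.
        for c in clusters:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[merged] = (merged, n_a + n_b, height)
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    # Hypothetical Fst values for four made-up populations.
    fst = {
        frozenset(("A", "B")): 0.02, frozenset(("A", "C")): 0.06,
        frozenset(("A", "D")): 0.10, frozenset(("B", "C")): 0.06,
        frozenset(("B", "D")): 0.10, frozenset(("C", "D")): 0.08,
    }
    print(upgma(["A", "B", "C", "D"], fst))
    # prints (D:0.0467,(C:0.0300,(A:0.0100,B:0.0100):0.0200):0.0167);
```

Because UPGMA assumes an ultrametric tree and no gene flow, its output is best read, as noted above, as a picture of the distance matrix rather than as a population history.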
The MDS and tree analyses were performed on all 15 loci (including those with missing data) and again after removing the loci with missing data (i.e. D16S539, D2S1338 and D19S433).
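The sketch below illustrates how the two versions of the distance matrix could differ. It uses a simple Nei-style estimator, Fst = (HT − HS)/HT, computed per locus from allele frequencies and averaged over loci; this estimator is a stand-in chosen for brevity, not necessarily the one used by Arlequin or POPTREE2. It also assumes, hypothetically, that "missing data" means a locus not typed in some comparison populations, so the full analysis uses the loci shared by each pair, whereas the reduced analysis keeps only loci typed in every population. All function names are illustrative.

```python
def locus_fst(p1, p2):
    """Nei-style Fst = (HT - HS) / HT for one locus.

    p1, p2: dicts mapping allele -> frequency in each population.
    """
    alleles = set(p1) | set(p2)
    # HS: mean within-population expected heterozygosity.
    hs = sum(1 - sum(f ** 2 for f in p.values()) for p in (p1, p2)) / 2
    # HT: expected heterozygosity of the pooled (averaged) frequencies.
    p_bar = {a: (p1.get(a, 0.0) + p2.get(a, 0.0)) / 2 for a in alleles}
    ht = 1 - sum(f ** 2 for f in p_bar.values())
    return (ht - hs) / ht if ht > 0 else 0.0  # monomorphic for one allele: no differentiation


def pairwise_fst(pop1, pop2, loci=None):
    """Mean per-locus Fst over loci typed in both populations (optionally a subset).

    pop1, pop2: dicts mapping locus -> allele-frequency dict; an absent locus
    means no data for that population.
    """
    shared = set(pop1) & set(pop2)
    if loci is not None:
        shared &= set(loci)
    if not shared:
        raise ValueError("no shared loci")
    return sum(locus_fst(pop1[loc], pop2[loc]) for loc in shared) / len(shared)


def complete_loci(populations):
    """Loci typed in every population, i.e. the set with no missing data."""
    return set.intersection(*(set(p) for p in populations.values()))


if __name__ == "__main__":
    # Hypothetical allele frequencies; population B lacks D16S539.
    pops = {
        "A": {"TH01": {"6": 0.5, "9": 0.5}, "TPOX": {"8": 1.0}, "D16S539": {"11": 1.0}},
        "B": {"TH01": {"6": 0.5, "9": 0.5}, "TPOX": {"11": 1.0}},
        "C": {"TH01": {"6": 0.5, "9": 0.5}, "TPOX": {"8": 1.0}, "D16S539": {"12": 1.0}},
    }
    print(pairwise_fst(pops["A"], pops["C"]))                             # all shared loci
    print(pairwise_fst(pops["A"], pops["C"], loci=complete_loci(pops)))   # complete loci only
```

In this toy example the A–C distance is driven entirely by D16S539, so dropping the incompletely typed locus changes the distance from about 0.33 to 0, which is why results from the two locus sets are worth comparing.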
The significance of the discrimination between groups in the MDS plot was assessed using a one-way ANOVA followed by the unequal N HSD (Honestly Significant Difference) test, as implemented in Statistica 8.0 (StatSoft, 2008). Homogeneity of variances was checked using Levene's and Cochran's tests. When required, equality of variances was achieved by dividing the data by the standard deviation and comparing the standardized data.
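As an illustration of the group-discrimination test, the sketch below computes the one-way ANOVA F statistic on, for example, the coordinates of populations along one MDS axis, together with Levene's W, which is the ANOVA F applied to absolute deviations from the group means. These are hypothetical stand-alone helpers, not the Statistica procedures; p-values and the unequal N HSD post hoc test are not computed (SciPy's `scipy.stats.f_oneway` and `scipy.stats.levene` return p-values, although `levene` centres on the median by default).

```python
from statistics import fmean


def anova_f(*groups):
    """One-way ANOVA: return (F, df_between, df_within)."""
    values = [x for g in groups for x in g]
    grand = fmean(values)
    k, n = len(groups), len(values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


def levene_w(*groups):
    """Levene's W: the ANOVA F computed on |x - group mean|."""
    deviations = [[abs(x - fmean(g)) for x in g] for g in groups]
    return anova_f(*deviations)[0]


if __name__ == "__main__":
    # Hypothetical MDS axis-1 coordinates for two groups of populations.
    group_1 = [1.0, 2.0, 3.0]
    group_2 = [4.0, 5.0, 6.0]
    print(anova_f(group_1, group_2))   # (13.5, 1, 4)
    print(levene_w(group_1, group_2))  # 0.0: both groups have the same spread
```

The F value would then be compared with an F distribution with (df_between, df_within) degrees of freedom to obtain a p-value.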

Results
Observed heterozygosity (Ho), expected heterozygosity (He) and Hardy-Weinberg equilibrium tests (Ph), estimated on the 116 individuals of the Bejaia population, are given in
The power of discrimination (PD), the probability of paternity exclusion (PE) and the polymorphic information content (PIC) are displayed in Table 2.
The standard non-metric multidimensional scaling (MDS) based on Fst distances (15 loci) split the 17 North African populations, including Bejaia (Table 1), into two main groups that were significantly discriminated (Fig. 2A)
respectively. For this grouping, significant differences were observed for 8 of the 15 loci (Table 3): D8S1179, D7S820, CSF1PO, D3S1358, TH01, D16S539, TPOX and FGA.
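The per-locus parameters reported above can be computed from raw genotypes with standard formulas. The sketch below assumes genotypes are given as pairs of allele labels; it uses Nei's unbiased estimator for He, the usual PIC formula, PD = 1 − Σ(genotype frequency)², and the commonly used exclusion formula PE = h²(1 − 2hH²), with h the observed heterozygosity and H = 1 − h. Whether these exact formulas underlie Table 2 is an assumption, and the function names are hypothetical. Combined values are obtained as 1 − Π(1 − x) over loci, which assumes the loci are independent.

```python
from collections import Counter
from itertools import combinations
from math import prod


def locus_stats(genotypes):
    """Forensic parameters for one locus.

    genotypes: list of (allele1, allele2) tuples, one per individual.
    """
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    p = [c / (2 * n) for c in allele_counts.values()]  # allele frequencies
    sum_p2 = sum(x * x for x in p)
    ho = sum(a != b for a, b in genotypes) / n  # observed heterozygosity
    he = (2 * n / (2 * n - 1)) * (1 - sum_p2)   # Nei's unbiased gene diversity
    # Polymorphic information content.
    pic = 1 - sum_p2 - sum(2 * (pi * pj) ** 2 for pi, pj in combinations(p, 2))
    # Power of discrimination from observed (unordered) genotype frequencies.
    geno_counts = Counter(tuple(sorted(g)) for g in genotypes)
    pd = 1 - sum((c / n) ** 2 for c in geno_counts.values())
    # Probability of exclusion: h^2 * (1 - 2 h H^2), h = Ho, H = 1 - Ho.
    pe = ho ** 2 * (1 - 2 * ho * (1 - ho) ** 2)
    return {"Ho": ho, "He": he, "PIC": pic, "PD": pd, "PE": pe}


def combined(values):
    """Combined PD or PE over independent loci: 1 - prod(1 - x)."""
    return 1 - prod(1 - x for x in values)


if __name__ == "__main__":
    # Hypothetical genotypes of four individuals at one locus.
    stats = locus_stats([("12", "13"), ("12", "12"), ("13", "14"), ("12", "13")])
    print(stats)
    print(combined([stats["PD"]] * 15))  # if 15 loci had this PD
```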
However, for all three grouping schemes (spoken language, geographical location and cluster affiliation), AMOVA revealed highly significant differences between populations within groups for all loci (Supplementary Table 2).

Table 3

Discussion
These results constitute the first data reported on the genetic diversity of the Bejaia population.
The 15 STR loci were highly polymorphic, with a significant proportion (40%) of rare alleles. The genetic heterogeneity of North African populations, with varying degrees of affinity with the Middle East, Europe and sub-Saharan Africa, has been suggested by studies using mtDNA (Plaza et al., 2003; Coudray et al., 2009), Y-chromosome DNA (Arredi et al., 2004; Capelli et al., 2006), STR markers (Capelli et al., 2006; El Ossmani et al., 2010) and SNPs (Botigué et al., 2013; Henn et al., 2012). Some studies have described a west-to-east gradient ranging from Western Sahara to the Middle East, which we could not detect in the present study. However, most studies showing that genetic distances are correlated with geographical distances (Ramachandran et al., 2005; Lao et al., 2008) were performed at a large geographic scale.
Studies carried out at smaller scales (within regions such as North Africa) are likely to be more influenced by population relocation and isolation (Ramachandran et al., 2005).
No significant genetic differences were found in this study between the Berber- and the Arab- (Bosch et al., 2000), or that these populations were genetically very similar when they met.
Our results show that language boundaries are not correlated with genetic distances among North African populations, probably because Arabisation is recent in the region.
However, this is not necessarily a general rule, since several authors have found correlations between language boundaries and genetic differentiation (Barbujani and Sokal, 1990; Chen et al., 1995).
Table 3) analyses. This may be due to the low number of markers used in our study (15 STRs) and/or the sample size of the populations analyzed. As demonstrated by Pritchard et al. (2000), the accuracy of inferences improves with sample size, number of loci and degree of divergence between populations. Our results are in agreement with previously reported observations (Bosch et al., 2000; Khodjet-El-Khil et al., 2008).
Altogether, our results suggest that the language spoken today may not reflect the history of the populations, with several Arab-speaking populations being Berbers who shifted their language after the Arab conquest. Another possibility is that genetic drift in some of them has led to significant differences in allele frequencies, which blurred the historical relationships. Also, admixture and gene flow between Arab-speaking and Berber-speaking populations may have contributed to the present-day situation, in which linguistic and genetic distances are less correlated than they perhaps were in the past.
In this study, we do not wish to make strong statements and draw conclusions on these issues.
Our aim was to identify useful markers for forensic studies and to quantify genetic diversity in North Africa. Addressing these issues would also require more complex and advanced statistical methods than those used here. In particular, it would be interesting to better understand the relationships between North Africa and the Andalusians of Moroccan origin who settled around Bejaia after the fall of Andalusia in 1610 (see Gaid, 2008). Similarly, it would be interesting to quantify the impact of the various invaders of the Bejaia region throughout history. Historical texts and the genetic closeness of the Bejaia population to its neighbours found here suggest that these contributions were probably limited, but it would still be interesting to quantify them using genomic approaches and inferential methods such as Approximate Bayesian Computation (ABC; Beaumont, 2010).