Repetitive population turnovers are shown in 100 ancient genomes


ADMIXTURE: identifying ancient Danish genomes using allele-frequency-free inference, with application to deep Eurasian population structure

The ADMIXTURE results presented in this study represent subsets of individuals from the full ADMIXTURE runs in3 where 1,593 ancient individuals were analysed (n = 1,492 imputed, n = 101 pseudo-haploid, n = 71 excluded as close relatives or with a contamination estimate >5%; HO dataset). Figure 1c represents 176 ancient Danish genomes after excluding three close relatives (Supplementary Data 1 and 4).

To infer genetic relatedness between the study individuals, we used the allele-frequency-free inference method introduced previously88. Three relatedness estimators were calculated for each pair of individuals using the site-frequency-spectrum approach. The 2D-SFS was selected the highest likelihood across ten replicates, using the realSFS method90 implemented in the ANGSD package. We used a set of 1,191,529 transversion SNPs to find out what kind of person they are. Previously established cut-offs89 for the KING-robust estimator were applied to assign individual pairs to first-, second- or third-degree relationships. The relative pair of a parent–offspring relationship can be inferred by using the R0 and R1 ratios. Individual pairs with fewer than 20,000 sites contributing to the estimators were excluded.

We used ADMIXTOOLS2 to investigate deep Eurasian population structure with qpGraph. For these analyses, pairwise f2-statistics were pre-computed from pseudo-haploid genotypes in the ‘1000G’ dataset using the ‘extract_f2’ function with ‘afProd=TRUE’. The Upper Palaeolithic European individual Kostenki 14, who was treated as a separate population, was grouped into populations using their memberships in the genetic clusters inferred from IBD sharing. We used a semi-automatic iterative approach to fit the graph.

The following criteria must be met to potentially exclude individuals: (i) the contamination estimate is greater than 5%; Supplementary Note 1; and (ii) the coverage is less than 0.1 There were 1,492 individuals that passed all filters, and213 that didn’t, which was used in most of the downstream analyses.

D-statistics was obtained using pseudo-haploid genotypics at transversion SNPs that were used to estimate populations using their membership in the genetic clusters inferred from IBD sharing. We used the qPDstat function in the ADMIXTOOLS 2 R package to calculate the D-statistics.

We performed genetic clustering of the ancient individuals using hierarchical community detection on a network of pairwise IBD-sharing similarities103. To facilitate the detection of clusters at a finer scale, we ran IBDseq (v.r1206) on a dataset restricting to ancient samples only, and applied more lenient filters of imputation INFO score > 0.5, and minimum IBD segment length of 1 cM. We constructed a weighted network of individuals using the igraph104 package in R and the fraction of the genome shared between them as weights. We then performed iterative community detection on this network using the Leiden algorithm105 implemented in the leidenAlg R package (v1.01; https://github.com/kharchenkolab/leidenAlg). We used a resolution parameter of r = 0.5 as the starting value for each level of community detection. We split the network into the different communities if more than one community was detected. The resolution parameter was incremented in steps of 0.2 until a maximum value of 3 was reached. The initial clustering was completed when no more communities were detected at the highest resolution parameter, across all subcommunities. To convert the resulting hierarchy into a final clustering, we simplified the initial clustering by collapsing nodes into single clusters on the basis of observed spatiotemporal annotations of the samples. We note that the obtained clusters should not be interpreted as ‘populations’ in the sense of a local community of individuals, but rather as sets of individuals with shared ancestry. Although this approach is an oversimplification of the complex spatiotemporally structured populations investigated here, the obtained clusters nevertheless captured real effects, grouping individuals within restricted spatiotemporal ranges and/or archaeological contexts and recapitulating known relationships between clusters.

For the 100 sample ages in this study we use midpoint estimates of the calibrated and reservoir corrected probability distribution of the radiocarbon age (Supplementary Data 1; further 14C dates, associated isotopic measurements, calibrations and reservoir corrections are accessible in ref. In addition, 4). Focusing on estimating the interval between the two major population turnovers, we established a precise chronology using 81 radiocarbon dates from 64 Danish sites relevant to this particular interval (Supplementary Note 3). The high precision curve, combined with the results from the radiocarbon dates, is a result of a Bayesian approach. All models and data calibrations were performed using OxCal v4.485,86,87,88 and the calibration dataset from Reimer et al.89. The length of time between the first appearance of a Steppe-related farmer and that of a Anatolian farmer was determined using a phase known as the trapezoidal phase. We corrected the reservoir effect on bones with significantly increased isotope values (δ13C, −18.00 and δ15N, +12.00) directly in the models using previously defined reservoir ages as input and calculated the diet reconstruction estimates for the individual in 14C years based on the collagen isotope values (Supplementary Note 3 and Supplementary Figs. For a similar method, see refs. 93,93. For combining radiocarbon dates related to the same individual we used the R_Combine() function.

The linkage-disequilibrium based method DATES 44 was used to determine the mixture time for the FBC-associated individuals. We used hunter-gatherer individuals and early farmer individuals to estimate time for each individual from both Sweden andDenmark.

Strontium isotope analyses can provide a proxy for individual mobility102,103,104. The 87Sr/86Sr ratio in specific skeletal elements may reflect the local geological signature obtained through diet by the individual during early childhood and it will usually remain unchanged during life and after death105. We restrict our observations and interpretations of Sr variation to patterns that are only relative to our data due to ongoing controversies over the exact use of geographically-defined baseline values. Supplementary Data 2 contains the data from the 87Sr/86Sr ratios in teeth and petrous bones that were measured at the University of North Carolina- Chapel Hill. Supplementary note 5 has more details.

We used a high-resolution pollen diagram from Lake Hjby, Northwest Zealand108 to calculate the changes in vegetation cover during the 5000– 2,400 cal. period. The landscape-reconstruction algorithm is being used. The LRA was previously applied at low temporal resolution regional scale. This method is the first time that it has been used in a local scale to look at a pollen record of the Iron Age and Neolithic periods. There were 60 samples between 6,900 and 4,400 cal. bp were included and the temporal resolution between samples is approximately 40 years. Regional vegetation was estimated with the model REVEALS109 based on pollen data from six other lakes on Zealand (see Supplementary Fig. 6.1 Local scale vegetation is calculated and regional pollen rain is calculated for Hjby S. 25 wind pollinated species were estimated for average pollen productivity. The reconstructed cover for plant species were then combined into four land cover categories, crops (only cereals), grassland (all other herbs), secondary forest (Betula and Corylus) and primary forest (all other trees). The vegetation reconstruction from Højby Sø is used to illustrate the vegetation development at the Mesolithic/Neolithic transition in eastern Denmark. Supplementary Note 6 is an explanation of more details.