This README.txt file was produced by Bornhofen E. on 06 December 2022.

1. GENERAL

- Title: "Single-nucleotide polymorphism (SNP) markers for the NORFAB panel of faba bean (Vicia faba L.) accessions using single primer enrichment technology (SPET)"

- List of investigators for correspondence

  - Corresponding Investigator 1
    Name: Stig Uggerhøj Andersen
    Institution: Aarhus University, Aarhus, Denmark
    Email: sua@mbg.au.dk

  - Corresponding Investigator 2
    Name: Murukarthick Jayakodi
    Institution: IPK, Gatersleben, Germany
    Email: jayakodi@ipk-gatersleben.de

  - Corresponding Investigator 3
    Name: Elesandro Bornhofen
    Institution: Aarhus University, Aarhus, Denmark
    Email: bornhofen@qgg.au.dk

- Date of data collection: 2021/2022

2. INFORMATION ABOUT THE ACCOMPANYING FILE: NORFAB_chrAll_qual20_depth3_2snp_miss50_maf0.01_imputed_GD.csv

- Latest update: 29 September 2022

- File content overview
  The file contains genotypes of 196 faba bean accessions for 323007 biallelic SNP markers. The data set was obtained by filtering the raw VCF file, which contained 1.86 M variants, according to the following criteria: positions with an overall quality (QUAL) score or a mean mapping quality (MQ) lower than 20 were discarded; genotypes supported by 3 or fewer reads were set to missing; markers with more than 50% missing values were then removed. Finally, multiallelic and monomorphic sites, indels, and variants in close proximity to indels (within 10 bp) were discarded, followed by a minor allele frequency (MAF) filter of 0.01. VCF handling and all filtering steps were performed using BCFtools 1.15.1 (https://doi.org/10.1093/gigascience/giab008). The remaining missing genotypes were imputed using Beagle 5.2 (https://doi.org/10.1016/j.ajhg.2018.07.015).

- Missing data codes: No missing data present.
- Dimensions: 196 x 323008 (one row per accession; one column of accession names followed by the 323007 SNPs)

- Orientation of the allele effect: Nucleotides were converted to numbers using the numericalization function of the GAPIT3 R package (https://zzlab.net/GAPIT/; accessed: 29/09/2022). Therefore, the sign of the allelic effect depends on the alphabetical order of the nucleotides. For example, if at a given site the reference allele is "G" and the alternative allele is "T", then "T" is the counted allele (the one to which the sign of the effect refers): it is coded as 2 for the homozygous "1/1", 0 for the homozygous "0/0", and 1 for the heterozygote.

3. INFORMATION ABOUT THE ACCOMPANYING FILE: NORFAB_chrAll_qual20_depth3_2snp_miss50_maf0.01_imputed_GM.csv

- Latest update: 29 September 2022

- File content overview
  The file contains the genetic map, i.e., the physical position of each SNP in the genome. It has three columns: the first contains the SNP names (in the same order as in the genotype file described above), and the second and third contain the chromosome and the base-pair position, respectively.

- Missing data codes: No missing data present.

- Dimensions: 323007 x 3

4. INFORMATION ABOUT THE ACCOMPANYING FILE: core_SPET_chr_imput_SNPs.vcf_lifted.vcf.gz

- Latest update: 29 September 2022

- File content overview
  The file contains imputed raw SNPs discovered using SPET data from 197 Core accessions and 7 other genotypes. The SNP positions are based on the original chromosomes. The tools described above were used for SNP calling and imputation.

- Missing data codes: No missing data present.

- Dimensions: 204 x 1081031

5. FINAL RELEVANT INFORMATION

Note that the original chromosome 1 was split into two parts, referred to here as chromosomes 1 and 2. Consequently, the codes of the remaining chromosomes were shifted up by one, and there are now seven chromosomes in total. The GD and GM CSV files described here are ready to be used for GWAS with the GAPIT3 package; only the response (phenotype) vector needs to be provided.
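The filtering criteria listed for the GD file can be restated as a per-site decision rule. The sketch below is only an illustration of those criteria, assuming a hypothetical, already-parsed site record (the Site class and passes_filters function are invented for this example); the actual filtering was done with BCFtools, and whether the MAF cut-off is inclusive is an assumption here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Genotype = Optional[Tuple[int, int]]  # (0, 1) = heterozygote; None = missing


@dataclass
class Site:
    """Hypothetical, simplified view of one VCF record (not a BCFtools structure)."""
    ref: str
    alts: List[str]
    qual: float
    mq: float                 # mean mapping quality (MQ)
    genotypes: List[Genotype]
    depths: List[int]         # number of reads supporting each genotype
    near_indel: bool = False  # True if the site lies within 10 bp of an indel


def passes_filters(site: Site, min_qual: float = 20, min_mq: float = 20,
                   max_low_depth: int = 3, max_missing: float = 0.5,
                   min_maf: float = 0.01) -> bool:
    # 1. Site quality: discard if QUAL or MQ is below 20.
    if site.qual < min_qual or site.mq < min_mq:
        return False
    # 2. Set genotypes supported by 3 or fewer reads to missing.
    gts = [None if dp <= max_low_depth else gt
           for gt, dp in zip(site.genotypes, site.depths)]
    called = [gt for gt in gts if gt is not None]
    # 3. Missingness: tolerate up to 50% missing genotypes.
    if not called or (len(gts) - len(called)) / len(gts) > max_missing:
        return False
    # 4. Keep biallelic SNPs only (single-base REF and ALT), away from indels.
    is_snp = len(site.ref) == 1 and len(site.alts) == 1 and len(site.alts[0]) == 1
    if not is_snp or site.near_indel:
        return False
    # 5. MAF filter; monomorphic sites (MAF = 0) are removed here as well.
    alt_freq = sum(sum(gt) for gt in called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq) >= min_maf
```

Imputation with Beagle happens only after a site has passed all of these checks, so the decision rule never sees imputed genotypes.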
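The coding rule given under "Orientation of the allele effect" can be sketched as a small function. This is a minimal illustration of the rule as this README states it (the alphabetically later nucleotide is the counted allele), not GAPIT's own implementation; the function name numericalize and the VCF-style "0/1" genotype strings are assumptions.

```python
def numericalize(ref: str, alt: str, gt: str) -> int:
    """Hypothetical helper: code a VCF genotype as 0, 1 or 2.

    The counted allele is the alphabetically later of REF and ALT, following
    the rule described in this README (e.g. REF "G", ALT "T" -> "T" is counted).
    """
    counted = max(ref, alt)                  # alphabetically later nucleotide
    alleles = (ref, alt)                     # index 0 = REF, 1 = ALT
    first, second = gt.replace("|", "/").split("/")
    # Number of copies of the counted allele in this genotype.
    return sum(alleles[int(i)] == counted for i in (first, second))
```

With REF "G" and ALT "T", the genotypes "0/0", "0/1" and "1/1" become 0, 1 and 2, as in the example above. If REF were "T" and ALT "G", the coding would flip, so "0/0" would become 2.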
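Because the GM file must list the SNPs in the same order as the GD columns, a quick consistency check before running GAPIT3 can catch mismatched files. The sketch below assumes that both CSV files start with a header row and that the first GD column holds the accession names (consistent with the stated 196 x 323008 and 323007 x 3 dimensions, but not confirmed here); check_gd_gm is a hypothetical helper.

```python
import csv
import io
from typing import TextIO, Tuple


def check_gd_gm(gd_file: TextIO, gm_file: TextIO) -> Tuple[int, int]:
    """Hypothetical check: return (n_accessions, n_snps) if GD and GM agree."""
    gd = csv.reader(gd_file)
    gd_snps = next(gd)[1:]  # assumption: header = accession-name label + SNP names
    n_accessions = 0
    for row in gd:
        if len(row) != len(gd_snps) + 1:
            raise ValueError(f"accession {row[0]!r} has {len(row) - 1} genotypes")
        n_accessions += 1

    gm = csv.reader(gm_file)
    next(gm)  # assumption: GM starts with a header row
    gm_snps = []
    for row in gm:
        if len(row) != 3:  # SNP name, chromosome, position
            raise ValueError(f"GM row for {row[0]!r} does not have 3 columns")
        gm_snps.append(row[0])

    if gm_snps != gd_snps:
        raise ValueError("SNP names or order differ between GD and GM")
    return n_accessions, len(gd_snps)


if __name__ == "__main__":
    # Tiny in-memory example; for the real files, pass open file handles instead.
    gd_demo = io.StringIO("taxa,snp1,snp2\nA,0,2\nB,1,1\n")
    gm_demo = io.StringIO("SNP,Chromosome,Position\nsnp1,1,100\nsnp2,3,250\n")
    print(check_gd_gm(gd_demo, gm_demo))
```

For the accompanying files, a successful check should report 196 accessions and 323007 SNPs if the header assumptions hold.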