An XL-MS standard mimicking complex biological samples

Our XL-MS standard is based on human proteins or protein fragments produced in E. coli (see Supplementary Data 1 for a full list and SDS–PAGE quality control). It is designed to provide a controlled set of allowed and prohibited protein contacts. Since this standard is intended for method development and benchmarking at the liquid chromatography (LC)–MS level, the structural and physicochemical basis of protein contact formation is not relevant. Taking advantage of this fact, we deliberately did not consider biological PPIs. Instead, we divided the proteins randomly into 32 interaction groups of 8 proteins each and cross-linked each group separately with disuccinimidyl sulfoxide (DSSO) under conditions that maximize cross-link formation (Fig. 1). Specifically, proteins from the same interaction group were mixed pairwise in all possible combinations, incubated for 20 min at 50 °C to induce physical contacts, cross-linked and combined into a pooled sample for digestion, strong cation exchange fractionation and LC–MS analysis. Because of the pairwise mixing scheme and the reliability of heat-induced protein contact formation, each protein is interlinked to 7 predefined interactors, allowing up to 896 unique protein pairs (we refer to them as PPIs for the remainder of this paper because, at the LC–MS level, they fulfill the same function as biological PPIs). Since we can clearly define which PPIs can and cannot form, this standard serves as a bona fide ground truth for calculating an empirical FDR at the PPI level.

Fig. 1: Schematic workflow of the construction of the XL-MS standard. Proteins were allocated into 32 interaction groups with 8 proteins each. Within the interaction groups, proteins were cross-linked pairwise in all possible combinations, resulting in 28 PPIs per interaction group and 896 PPIs in total. All cross-linked samples were merged before digestion.
Created with BioRender.com.

The full analytical standard gives rise to 23,895 tryptic peptides (allowing up to three missed cleavages, a minimum peptide length of 6 amino acids and a peptide mass of 500–6,000 Da). Outside of the His-tag, two peptides were shared between three proteins, and 397 peptides (1.66%) were shared between two proteins. At the peptide/ResPair level, heat-induced PPI formation has the additional advantage that a wide array of protein conformations and binding interfaces is formed, increasing the probability that all possible lysine–lysine contacts are stochastically sampled. Therefore, any interlinks between proteins within the same interaction group that a cross-link search engine may identify are highly likely to be true-positive hits, as demonstrated by simulations19 (Extended Data Fig. 1). Any identified interlinks between proteins from different groups must be false positives. We can also consider intralinks within proteins as true positives: provided that the search space is sufficiently large, they are unlikely to be false positives, as previously shown in experimental datasets2,12,20 and confirmed in Extended Data Fig. 1. While this degree of certainty does not meet the requirements of a bona fide ground truth for individual cross-links, it is sufficient to make our analytical standard suitable as an experimental benchmark for FDR validation at the CSM and ResPair levels.

The analytical standard was split into four batches, each containing eight randomly allocated interaction groups (Supplementary Data 1). Batch 1 was set aside to guide the development of Scout (see ‘Developing an ANN-based cross-link search engine’). Batch 2 was used for internal testing to optimize our LC–MS method.
Batches 3 and 4 were combined and used for benchmarking Scout and published XL-MS search engines (see ‘Benchmarking Scout on independent XL-MS standards’).

Developing an ANN-based cross-link search engine

Our well-controlled XL-MS standard yields datasets that can aid the development of algorithms for XL-MS data analysis. As an example, we introduce Scout, a cross-link search engine for identifying peptides cross-linked with cleavable reagents. Scout is intuitive software operated through a graphical user interface; it relies on ANNs to generate discriminant functions that score and rank identifications using several quality metrics optimized at the CSM, ResPair and PPI levels. Scout enables multitier FDR filtering at all of these levels. Scout’s workflow is shown in Fig. 2 and described in detail in Supplementary Notes 1 and 2.

Fig. 2: Schematic representation of the cross-link identification workflow employed by Scout. Scout requires mass spectrometry raw data (MS2 spectra) and a protein sequence database as input. Cross-links are identified in two search steps, ion pair doublet searching and fast CSM searching, both of which are described in Supplementary Note 1. The shortlisted peptide pair candidates for each MS2 spectrum are then subjected to refined spectrum scoring based on a set of sensitive quality metrics described in Supplementary Note 2. Finally, the results are filtered according to a user-defined FDR using a machine learning-based discriminant function at each tier of identification: CSMs, ResPairs and PPIs (Supplementary Note 3). The final output is presented through a graphical user interface (GUI), providing a user-friendly display of the identified cross-linked peptides and their associated metrics.

We used the batch 1 XL-MS dataset to guide the development of the ANNs (Supplementary Notes 3 and 4).
Batch 1 comprises 1,409,900 MS2 spectra, which is similar to proteome-wide XL-MS studies in intact human cells (for example, 1,150,447 MS2 spectra from HEK293T cells; see ‘Data Availability’ statement). This increases our confidence that the batch 1 dataset can mimic a proteome-wide XL-MS experiment with regard to MS2-level complexity.

Benchmarking Scout on independent XL-MS standards

In addition to aiding software tool development, the datasets derived from our standard can serve as a ground truth: for each detected cross-link, we know whether it is allowed (interlinks and intralinks within one interaction group) or not allowed (interlinks between groups, or intralinks of proteins not present in the respective group) according to our mixing scheme (Fig. 1). This information enables us to calculate an empirical FDR similar to a target-decoy FDR15, which offers the opportunity for unbiased benchmarking of XL-MS search engines.

We compared Scout with the following widely adopted XL-MS search engines that are compatible with MS-cleavable cross-linkers: MaxLynx, MSAnnika, XlinkX PD, MeroX and xiSEARCH with the xiFDR module. We benchmarked these tools with two standard datasets (batches 3 and 4) that were used neither during Scout development nor during LC–MS method optimization. To avoid assigning a protein to the wrong interaction group because of nonunique peptides within the dataset, we allowed more than one possible group per peptide if the peptide was shared between homologous proteins.

We varied the search space size, using a small 540-protein database and a large 4,000-protein database, both comprising the proteins present in our standard and randomly selected human entrapment proteins from SwissProt with <85% sequence identity to our recombinant proteins (Supplementary Table 1).
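The allowed/prohibited logic of the mixing scheme can be expressed compactly. The sketch below is a minimal Python illustration, not the evaluation code used in this study; the protein accessions, the random grouping helper `build_groups` and all function names are hypothetical. It enumerates the pairwise mixing scheme (C(8, 2) = 28 pairs per group, 896 for 32 groups) and scores reported cross-links against it, letting a peptide shared between proteins match any of its candidate groups. Proteins absent from the group mapping (entrapment sequences, or proteins from batches not in the sample) never match, so their cross-links count as false positives.

```python
from itertools import combinations
import random

def build_groups(proteins, group_size=8, seed=0):
    """Randomly partition proteins into interaction groups of `group_size`."""
    shuffled = list(proteins)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]

def allowed_pairs(groups):
    """Every pairwise combination within a group is an allowed PPI."""
    return {frozenset(pair) for group in groups for pair in combinations(group, 2)}

def is_allowed(cands_a, cands_b, group_of):
    """True if some candidate protein of peptide A and some candidate protein of
    peptide B share an interaction group. Intralinks (same protein) pass only if
    the protein is in `group_of`; unknown proteins are ignored, i.e. never match."""
    groups_a = {group_of[p] for p in cands_a if p in group_of}
    groups_b = {group_of[p] for p in cands_b if p in group_of}
    return not groups_a.isdisjoint(groups_b)

def empirical_fdr(crosslinks, group_of):
    """Fraction of reported cross-links that violate the mixing scheme."""
    if not crosslinks:
        return 0.0
    n_false = sum(not is_allowed(a, b, group_of) for a, b in crosslinks)
    return n_false / len(crosslinks)

# Hypothetical accessions for 256 proteins, i.e. 32 groups x 8 proteins.
proteins = [f"P{i:03d}" for i in range(256)]
groups = build_groups(proteins)
group_of = {p: gid for gid, group in enumerate(groups) for p in group}
print(len(groups), len(allowed_pairs(groups)))  # 32 896
```

Each cross-link is passed as a pair of candidate-protein sets, so a unique peptide is simply a one-element set; this keeps the shared-peptide rule and the intralink rule in a single membership test.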
We selected the same search parameters for each software tool wherever possible and set a 1% FDR cutoff within each tool (referred to as software-defined FDR), with separate FDR filtering for inter- and intralinks. If a software tool did not provide software-defined FDRs at all identification levels (CSM, ResPair, PPI), we took the software-defined FDR-filtered identifications from the next lower level and aggregated them to the higher level without any postprocessing filters or score cutoffs (referred to as ‘post hoc aggregated results’). A full description of the search settings is provided in Supplementary Table 2. For clearer visualization, results are presented separately for interlinks (Figs. 3 and 4) and intralinks (Extended Data Fig. 2).

Fig. 3: Benchmarking Scout against other XL-MS search engines for interlink identifications. Number of identified interprotein CSMs, ResPairs and PPIs and the empirically determined FDR at a software-defined 1% FDR cutoff using a 540-protein or 4,000-protein database and identical search parameters, including K as the only cross-linking site. Framed bars mark post hoc aggregated results, that is, cases in which CSMs were aggregated to unique ResPairs or unique ResPairs to unique PPIs because the search engines do not control FDR at these levels (MeroX and MaxLynx do not report FDR-controlled ResPairs; MeroX, MaxLynx and MSAnnika do not report FDR-controlled PPIs). Blue bars show true-positive identifications; yellow bars show false-positive identifications that violate the mixing scheme of our XL-MS standard.

Fig. 4: Benchmarking of XlinkX PD, xiSEARCH/xiFDR and software processing times. a–c, Interprotein CSM, ResPair and PPI identifications when comparing Scout and XlinkX PD. True-positive identifications by XlinkX PD are shown in light brown (540-protein database) and dark brown (4,000-protein database). The Scout numbers (blue diamonds) are the same as in Fig. 3.
In addition to applying our standard search parameters, XlinkX PD identifications were postprocessed using a static score cutoff (‘default’) (a), score cutoffs derived from the highest-scoring CSM-level decoy in every analysis (‘dynamic’) (b) and score cutoffs set to filter XlinkX PD results to 1% empirical FDR (c). The XlinkX score cutoffs are displayed below the bars. Both Scout and XlinkX PD considered K as the only cross-linking site. d, Interprotein CSM, ResPair and PPI identifications when comparing Scout and xiSEARCH. For Scout, results were filtered at 1% software-defined FDR on all levels. For xiSEARCH/xiFDR, following the developer’s recommendation, a 1% software-defined FDR was applied only at the PPI level using boost between proteins (xiFDR); the resulting PPIs are reported together with their corresponding CSMs and ResPairs. Scout and xiSEARCH were each run using their default parameters, with KSTY as the possible reaction sites for the cross-linking reagent. In a–d, the framed percentages indicate the final empirical FDR, and yellow bars show false-positive identifications that violate the mixing scheme of our XL-MS standard. e, Processing time in minutes (min) using different search engines on the benchmarking dataset with a 1% software-defined FDR cutoff on a computer with 512 GB RAM and dual Intel(R) Xeon(R) Gold 6136 CPUs operating at 3.00 GHz. xiSEARCH/xiFDR did not run to completion on this hardware setup when using the full benchmarking dataset. Therefore, a separate Scout versus xiSEARCH speed comparison using only four RAW files was performed and is shown in Extended Data Fig. 3b.

Comparing the identification numbers in Fig. 3 and Extended Data Fig. 2, all search engines identified 1–2 orders of magnitude more intralinks than interlinks (at the CSM and ResPair levels), which is similar to our previously reported numbers for DSSO cross-linking of human cell lysates2.
For Scout, the fraction of interlinks was ~5% at the CSM level and ~11% at the ResPair level. Interlink identifications by XlinkX PD and xiSEARCH/xiFDR had to be analyzed separately, as explained below.

For intralinks, all search engines properly controlled the FDR below 1% at the CSM and ResPair levels, except for MeroX (FDR > 20%). Since intralinks are used for investigating protein structure rather than PPIs, the most relevant information comes from the ResPair level, at which Scout shows the best overall performance (Extended Data Fig. 2).

For interlinks, Scout outperformed MaxLynx, MSAnnika and MeroX at the CSM and ResPair levels, yielding the smallest empirical FDR and the highest true-positive identification numbers, irrespective of the database size (Fig. 3). At the PPI level, MSAnnika reports more true-positive PPIs than Scout, but at a substantially higher empirical FDR (12–15%; Fig. 3, right panels), conceivably because MSAnnika does not include a dedicated FDR control at the PPI level. This emphasizes the importance of controlling FDR at all identification levels15. Even at this inflated FDR, MSAnnika identifies only around 300 true PPIs, substantially fewer than the 448 PPIs (16 interaction groups × 28 PPIs) that are theoretically present in the benchmarking dataset (batches 3 and 4) according to our mixing scheme (Fig. 1). Scout, at 1% empirical FDR, identifies 195 PPIs, that is, 43.5% of the theoretical maximum. Such incomplete PPI coverage is also consistently observed in biological proteome-wide XL-MS datasets, where it results from the relative sparsity and low abundance of cross-linked peptides compared to linear peptides21.

The interlink comparison of Scout and XlinkX PD was done separately, because the XlinkX PD output is substantially affected by postprocessing settings that are not part of our standard search parameters. In particular, the FDR in XlinkX PD is strongly influenced by heuristic score cutoffs, which were shown to depend on dataset and search parameters22.
Recent studies made differing recommendations for a static minimum XlinkX score10,23,24, whereas the XlinkX PD developers indicate that a dynamic score cutoff based on the score of the best CSM-level decoy hit in each analysis may be preferable (Methods). We tested both options, using either a static XlinkX score cutoff of 60 or a dynamic score cutoff (Fig. 4a,b). We also evaluated how the identification numbers change when the XlinkX score cutoff is increased until 1% empirical FDR is reached (Fig. 4c). In terms of identification numbers, Scout outperformed XlinkX PD on CSMs and ResPairs, whereas XlinkX PD identified more true-positive PPIs in most settings. However, in terms of confidence, XlinkX PD strongly depends on manually setting suitable score cutoffs during postprocessing, whereas Scout maintains an empirical FDR < 2% at all identification levels (Fig. 3). Furthermore, XlinkX PD identifications are more susceptible to entrapment and less sensitive in large database searches. At 1% empirical FDR, XlinkX PD reports more PPIs than Scout when searching against the 540-protein database, but Scout outperforms XlinkX PD in the 4,000-protein database search (Fig. 4c). These results suggest that Scout is more robust and better suited for identifying PPIs against large sequence databases.

The comparisons of Scout to xiSEARCH/xiFDR were also performed separately, because searches against the small 540-protein database with xiSEARCH/xiFDR did not run to completion after several weeks when using our standardized settings and server equipment. This is in line with previous reports that proteome-wide applications of xiSEARCH/xiFDR on in-house computers are highly restricted10. To still compare Scout to xiSEARCH/xiFDR, a subset of the benchmarking dataset was run on a computer cluster using the 540-protein database and developer-recommended parameters (Methods).
Importantly, these parameters included accepting KSTY as cross-linking sites. Therefore, in this case Scout was also run with KSTY specificity; all other parameters were the same as in Fig. 3. Scout still reported more correct identifications at all levels when filtering the data with a software-defined 1% or 5% FDR cutoff (Fig. 4d and Extended Data Fig. 3a).

We also compared the data processing times of all tested software by searching our benchmarking dataset (batches 3 and 4) on a computer equipped with 512 GB RAM and dual Intel(R) Xeon(R) Gold 6136 CPUs operating at 3.00 GHz. Scout was substantially faster than the other tools in both small and large database searches and showed the smallest slowdown with increasing database size (Fig. 4e). To compare the processing times of xiSEARCH and Scout on the same computational setup, we limited the searches to four RAW files of the benchmarking dataset and used the default parameters for Scout and xiSEARCH (Extended Data Fig. 3b). Scout processed the data >200 times faster than xiSEARCH. Importantly, while this high-capacity server was needed to meet the RAM demands of some of the tested tools, Scout operates efficiently with a small memory footprint and is well suited for desktop PCs with as little as 16 GB RAM (see also Extended Data Fig. 4). Thus, Scout provides high sensitivity, specificity and speed at all three levels of XL-MS identifications, irrespective of the search space.

Next, we tested how enlarging the search space with entrapment sequences affects the overall performance of Scout when using standard parameters and a 1% software-defined all-level FDR cutoff. Scout maintains a low empirical FDR at all levels for most tested database sizes (Extended Data Fig. 4a). The number of identified interprotein CSMs, ResPairs and PPIs decreases, as expected.
For example, moving from an entrapment set four times larger than the number of experimentally available proteins to one 160 times larger reduces the PPI identifications by 23%. However, our subsequent analyses indicate that this decrease is less drastic than for other search engines (see Supplementary Table 3 and related discussion below). Meanwhile, searching 55 RAW files against 540 protein entries takes only 2 h on a desktop computer (Intel Core i7 2.90 GHz, 16 GB RAM). When the same RAW files are searched against a ~38-fold larger database (20,622 protein entries), Scout shows an acceptable 4.7-fold increase in processing time, to ~9.5 h, demonstrating that it operates efficiently even in large search spaces and is compatible with a standard desktop PC (Extended Data Fig. 4b).

We further evaluated Scout’s performance on a published small-scale dataset10. Here, a synthetic peptide main library was cross-linked with DSSO according to a mixing scheme, which allows FDR assessments at the CSM and ResPair levels10. We compared Scout to the best-performing software reported in the original publication (MSAnnika). Using the same search parameters as above and the database(s) provided in the original publication, we obtained highly similar ResPair-level results for both tools (Fig. 5a). For the nonoverlapping identifications, however, Scout achieves a lower empirical FDR.

Fig. 5: Performance of Scout on published XL-MS benchmarking datasets from Matzinger et al. using synthetic peptides and Lenz et al. using fractionated E. coli lysate. a, Overlap of ResPairs identified by Scout and MSAnnika and the true FDR of Scout-specific (left), shared (middle) and MSAnnika-specific (right) identifications using the DSSO main library from Matzinger et al.10 and our standard search parameters, which are similar to the ones reported in the original publication.
b, Scout’s true-positive (blue) and false-positive (yellow) ResPair-level identifications from the DSSO main library spiked 1:5 into tryptic HEK peptides when searched against increasingly large databases. The software-defined FDR cutoff was set to 1%; empirical FDR and operating times are indicated above the bars. c, Overlap of PPI-level identifications from Scout (left) and xiSEARCH (right) using the PPI benchmarking dataset by Lenz et al.12 and a separate 1% software-defined FDR cutoff at the PPI level. Scout was operated with standard parameters, and xiSEARCH identifications were retrieved from the original publication. Empirical FDR was determined using the procedure suggested in the original publication12. d, Performance of Scout and xiSEARCH in identifying intra- and interprotein CSMs, interprotein CSMs only, interprotein PepPairs (peptide pairs) and PPIs when setting an all-level software-defined FDR cutoff of 1%. Scout was operated with standard parameters, and xiSEARCH identifications were retrieved from the original publication. The empirical FDR was calculated as described by Lenz et al. and is indicated above the bars.

In addition to generating a peptide library, the authors mixed their synthetic peptides with non-cross-linked tryptic HEK cell peptides at a 1:5 ratio to generate a sample with a realistic background of linear peptides, as one would expect from complex interactomics experiments10. From this sample, Scout identifies more unique true-positive ResPairs than any of the search engines reported in the original paper10 and maintains the lowest FDR and highest identification numbers in searches against databases with 671–20,334 protein entries (Supplementary Table 3). Furthermore, Scout maintains its efficiency and speed in a scenario with higher entrapment: increasing the search space shows that larger databases, as expected, reduce the number of ResPair identifications but do not substantially affect the FDR or processing time (Fig. 5b), confirming the results obtained with our own XL-MS standard (Extended Data Fig. 4).

Because the dataset from Matzinger et al.10 provides only limited PPI-level information, we next turned to the dataset from Lenz et al.12, which aims to provide a PPI-level quasi-ground truth based on the abundance of proteins in cross-linked size exclusion chromatography fractions of E. coli lysate. As this dataset was used to advance xiSEARCH/xiFDR, we compared this software’s performance to Scout. Scout identifies more PPIs than the xiSEARCH result reported in the original publication (Fig. 5c). At all identification levels, both software tools show a low empirical FDR (following the definition from the original publication), but Scout provides more identifications (Fig. 5d).

Testing Scout on biological XL-MS data

Finally, we assessed Scout’s performance on proteome-wide XL-MS data of biological samples by comparing it to MSAnnika (see also Supplementary Note 5). We first performed an entrapment experiment, searching a published XL-MS dataset from intact human mitochondria cross-linked with the enrichable, MS-cleavable Azide-A-DSBSO cross-linker25 against a database containing equal numbers of human mitochondrial and E. coli proteins (Fig. 6a). Confirming the trends observed in the benchmarking with our standard datasets, Scout identifies the most CSM and ResPair hits, while providing fewer, but more stringently FDR-controlled, PPI identifications. To illustrate the effect of Scout’s PPI-FDR filter, we also report Scout PPI identifications aggregated from the ResPair level (that is, the same approach as used for MSAnnika). The aggregated results of both search engines are highly similar in terms of identification number and FDR, showing that the stringent PPI-FDR control is a direct consequence of Scout’s dedicated filter.

Fig. 6: Application of Scout and MSAnnika to biological proteome-wide XL-MS datasets. a, Entrapment database search on a published dataset of Azide-A-DSBSO cross-linked human mitochondria25. The data were searched against 2,000 random human mitochondrial proteins sampled from a linear peptide search on the XL-MS data, supplemented with 2,000 random E. coli BL21 protein sequences. Interspecies cross-links and E. coli cross-links were considered false. Percentages indicate the resulting empirical FDR. b, Evaluation of PPIs identified from a HEK cell Azide-A-DSBSO XL-MS dataset. Brown, light blue and dark blue correspond to different STRING confidence score ranges. Yellow represents identifications that could not be found in STRING or that are considered impossible because they match the Negatome database. In a and b, PPI-level results for Scout were determined either using the PPI-FDR filter (Scout) or by aggregation of ResPairs to unique protein pairs (Scout*). The second approach was also used for MSAnnika. c, ResPair interlinks per PPI identified with MSAnnika and PPI-FDR-controlled Scout on the Azide-A-DSBSO HEK dataset. d,e, Cα–Cα distances of ResPair interlinks identified by Scout (blue) and MSAnnika (brown) when mapped on AlphaFold-Multimer models of their identified PPIs. For each PPI, the model with the highest cross-link satisfaction was used for analysis. Shown are all interlink Cα–Cα distances that can be mapped on AlphaFold-Multimer models with a model confidence of at least 0.5 (d), as well as the spread of interlink Cα–Cα distances for different ranges of AlphaFold-Multimer model confidence (e). In both cases, only interlinks between residues with a pLDDT score above 50 (indicating an ordered protein region) are considered. Boxes in e range from the first to the third quartile, with the median indicated as a horizontal line. Whiskers represent 1.5 times the interquartile range.
The violin plot shows the full data distribution, including minima and maxima.

Furthermore, we generated a deep XL-MS dataset from HEK293T cells cross-linked with Azide-A-DSBSO. As no ground truth information is available for this sample, we compared the identified PPIs to STRING26 and the Negatome database of noninteracting proteins27 (Fig. 6b). PPI-FDR-controlled Scout identified fewer PPIs than MSAnnika and ResPair-aggregated Scout, but the PPI-FDR filter slightly increased the fraction of medium-to-high confidence PPIs in STRING and reduced the number of Negatome hits. To begin to understand why applying the Scout PPI-FDR reduces the number of PPI identifications, we analyzed how many ResPair interlinks support the PPIs found with Scout and MSAnnika. We found that the additional hits in MSAnnika mainly arise from PPIs supported by a single ResPair interlink (Fig. 6c), suggesting that Scout’s PPI-FDR filter mainly removes PPIs arising from single observations.

To assess the information contained in the ResPair-level identifications, we mapped the Scout and MSAnnika interlinks on AlphaFold-Multimer models of the PPIs identified by the two search engines (Fig. 6d,e). The structural accuracy of both search engines is highly similar, with MSAnnika performing marginally (2.4%) better on models with a confidence score >0.5 (Fig. 6e). However, Scout finds approximately 40% more ResPair interlinks, suggesting that it can provide richer information for structural biology applications.
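The structural check behind Fig. 6d,e can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the analysis pipeline used here: the dictionary-based model format, the function names and the 35 Å maximum Cα–Cα span are hypothetical placeholders (the applicable distance limit depends on the cross-linker and is not specified in this section). The sketch keeps only interlinks between residues with pLDDT above 50, picks the model with the highest cross-link satisfaction for one PPI, and discards that model if its confidence is below 0.5.

```python
import math

# Hypothetical model format: {"confidence": float,
#   "residues": {(chain, residue_number): ((x, y, z) Cα coordinates in Å, pLDDT)}}

def mapped_distances(residues, interlinks, min_plddt=50.0):
    """Cα–Cα distances (Å) for interlinks whose two residues both have pLDDT > min_plddt."""
    distances = []
    for res_a, res_b in interlinks:
        if res_a not in residues or res_b not in residues:
            continue  # residue not covered by this model
        (xyz_a, plddt_a), (xyz_b, plddt_b) = residues[res_a], residues[res_b]
        if plddt_a > min_plddt and plddt_b > min_plddt:
            distances.append(math.dist(xyz_a, xyz_b))
    return distances

def satisfaction(distances, max_distance=35.0):
    """Fraction of distances within an assumed (hypothetical) maximum span."""
    if not distances:
        return 0.0
    return sum(d <= max_distance for d in distances) / len(distances)

def best_model_distances(models, interlinks, min_confidence=0.5, max_distance=35.0):
    """Pick the model with the highest cross-link satisfaction for one PPI and return
    its distances, or [] if that model's confidence is below min_confidence.
    `models` must be non-empty."""
    best = max(models, key=lambda m: satisfaction(
        mapped_distances(m["residues"], interlinks), max_distance))
    if best["confidence"] < min_confidence:
        return []
    return mapped_distances(best["residues"], interlinks)

# Toy example: two candidate models for one PPI with two interlinks.
links = [(("A", 10), ("B", 20)), (("A", 10), ("B", 30))]
m1 = {"confidence": 0.8, "residues": {("A", 10): ((0, 0, 0), 90),
                                      ("B", 20): ((10, 0, 0), 85),
                                      ("B", 30): ((50, 0, 0), 80)}}
m2 = {"confidence": 0.7, "residues": {("A", 10): ((0, 0, 0), 90),
                                      ("B", 20): ((20, 0, 0), 85),
                                      ("B", 30): ((30, 0, 0), 80)}}
print(best_model_distances([m1, m2], links))  # [20.0, 30.0]
```

In the toy example, m1 satisfies only one of its two links (10 Å and 50 Å), whereas m2 satisfies both, so m2 is selected even though its model confidence is lower.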