The global geography of artificial intelligence in life science research

Dimension one: productivity

We begin our assessment of productivity by documenting an exponential increase in global AI life science publications (Fig. 1), quantified by a 20% annual growth rate since 2010.

Fig. 1: Evolution of the AI research enterprise in the life sciences. Yearly counts of articles (n = 397,967) with AI-related keywords in titles or abstracts from 2000 to 2022. Growth refers to the compound annual growth rate (CAGR 2010–2022). Source data are provided as a Source Data file.

Continuing with the first dimension of our atlas, we show a geographical concentration of AI life science research in the US (101,195 articles), followed by China (73,129 articles), together accounting for about 44% of cumulative productivity between 2000 and 2022 (Fig. 2). Of note, 2020 marks the first year in which China surpassed the US in the number of publications per year in our dataset (see dynamic online graph for details). In terms of cumulative productivity, there is a marked gap between the US, China, and the next tier of countries, which is led by the United Kingdom (21,215 articles), Germany (18,759 articles), Japan (15,263 articles), Canada (12,578 articles), India (12,560 articles), and South Korea (12,264 articles). Select countries, like India, show differences between their productivity in life science journal publications and their productivity in computer science conference publications with a life science focus. We provide a table showing all countries’ individual productivity statistics in the Supplementary Material S2. While the regions of Asia, Europe, Northern America, and Oceania all tangibly contribute research, countries in Africa and Latin America show moderate-to-low involvement in the AI life science research enterprise. These data underscore two concerns: an almost bipolar geographic concentration of AI research productivity, led by the US and China, and the limited involvement of countries from Africa and Latin America in AI life science research.

Fig. 2: Geography of the AI life science research enterprise in terms of productivity. Counts of AI-focused life science articles by country, cumulated for the years 2000 to 2022 (n = 397,967). Source data are provided as a Source Data file.

We next consider whether the observed geographic concentration goes hand in hand with a concentration in research topics and underlying capabilities, which may contribute to productivity advantages of some countries over others. In a first step, we assign articles to content categories available from the OpenAlex database. We provide further details on the categorization in the Methods and in the Supplementary Material (S3). We focus our analysis on the 40 most frequent content categories in our dataset, representing, on average, two-thirds of AI life science research across the 40 most productive countries. These 40 countries collectively account for 96% of global productivity in our data. To examine the resulting content-by-country (40 × 40) data matrix, we create a heatmap visualization in Fig. 3. The individual cells of the heatmap contain the share of a country’s publications for a specific content category relative to all publications by the same country. This share, expressing nations’ research foci, also defines the heatmap’s color, with darker shading representing less focus and lighter shading greater focus. The heatmap first indicates that many fields still stand to gain from further AI applications, as shown by the broad space covered by darker coloring across world regions. Looking at the most productive AI life science research categories, such as computer vision, computational biology, neuroscience, internal medicine, statistics, radiology, and surgery, there is a global focus rather than geographic specialization. Thus, topic specialization does not appear to be driving the concentration of productivity visible in Fig. 2.

Fig. 3: Heatmap of relative country focus with respect to publication topics. The horizontal axis lists the 40 most productive countries grouped by geographic region. The vertical axis depicts the underlying publication topics in descending order (computer vision being the most frequently researched topic). The color scheme of the heatmap reflects the percentage share of country-specific productivity for a given publication topic (n = 397,967). Source data are provided as a Source Data file.

Extending our productivity stratification for content, we assess the extent to which countries generally conduct clinical research with the application of AI. Clinical research is of particular interest because it reflects research with potential applications that more directly benefit human health. To identify clinical research, we rely on a search strategy proposed by Haynes and colleagues [24,25], further described in the Methods. Overall, AI-focused clinical research accounts for about 20% of the articles included in our sample. Figure 4 depicts the geographic distribution across the 30 most productive countries, which together account for 94% of global production of clinical AI research. The primary vertical axis shows the share of a country’s clinical research articles relative to all clinical research articles globally (blue bars), while the secondary vertical axis shows the share of a country’s clinical research articles relative to all AI life science articles published by that country (orange bars). Comparable to general productivity, we observe that the US and China account for about 45% of AI clinical research, with several countries from all world regions, except for Africa and Latin America, contributing tangibly to the clinical AI research enterprise. Consistent with the content analysis presented in Fig. 3, we also find that many countries devote 15–20% of their research efforts to AI clinical research.

Fig. 4: Clinical AI research across countries. The share of a country’s clinical research relative to global clinical research production (primary y-axis) and relative to all publications within the same country (secondary y-axis) for the 30 most productive countries in terms of clinical articles (n = 67,167). Source data are provided as a Source Data file.

Dimension two: quality-adjusted productivity

Next, we examine whether the geographic concentration we observe in the number of publications is accompanied by a concentration in quality. Scientific progress tends to be driven by research of unusual rather than average quality [26,27], traditionally motivating dedicated examinations of the right-hand tail of the research quality distribution [28].

To adjust for quality with a field-normalized approach, we use external rankings of journals and conferences. For journal articles, we consider articles published in one of the top three journals within a given journal category according to Clarivate’s Journal Citation Report. For conference proceedings, we consider articles published in a proceedings publication of conferences ranked “A*”, according to the CORE conference ranking [29]. For journal publications, this approach classifies about 8% of the research as appearing in high-quality outlets, and for conference proceedings publications about 6% (Supplementary Material S4).

We find that the US, Australia, and several European countries contribute the largest shares of research in high-quality outlets over the period 2000–2022 (Fig. 5). Compared with general productivity, China and other Asian countries, as well as countries in Latin America, rank in the midfield to lower end of the quality-adjusted productivity distribution. Africa, meanwhile, remains largely absent from this mapping due to overall low productivity, including in top-ranked outlets.
A notable exception is Kenya, which has international collaborators on two-thirds of its publications placed in high-ranking outlets, while, for example, one-third of South African publications have international collaborators. We discuss the role of internationally collaborated research in a separate section below.

Fig. 5: Geography of the AI life science research enterprise in terms of quality-adjusted productivity. Percentage shares of AI-focused life science articles published in high-ranked outlets by country, cumulated for the years 2000 to 2022 (n = 31,837). The analysis is limited to countries with at least 100 publications. Source data are provided as a Source Data file.

Moving the analysis from the country level to the level of world regions, we seek to examine the consistency with which regions can contribute to AI-focused life science research published in high-ranking outlets. Figure 6 depicts relatively stable proportions of research that separate the regions into two groups. On the one hand, there is the group of Northern America, Europe, and Oceania, which consistently places about 10% of its published research in high-ranking outlets. On the other hand, there is a group consisting of Asia, Latin America, and Africa, which publishes about 5% of its papers in these top-ranked outlets. Europe and Asia have shown opposite trends in recent years, with Europe gradually decreasing and Asia gradually increasing their respective shares of publications in high-quality outlets.

Fig. 6: Geography of the AI life science research enterprise in terms of quality-adjusted productivity. Percentage shares of AI life science articles published in high-ranked outlets by geographic region and per year (n = 32,010). Source data are provided as a Source Data file.

Dimension three: relevance

To assess the third dimension of the atlas, we examine geographic variance in the relevance of the produced research.
We conceptualize relevance as the extent to which focal publications inform (a) scientific progress (scientific relevance) and (b) clinical application (clinical relevance). We operationalize relevance via forward citations to the AI life science articles in our sample. As our econometric approach, we employ negative binomial regression models to account for the overdispersion of citation measures. We regress citation counts on dummy variables representing the six geographic regions, setting the most productive region, Asia, as the base category. We control for the publication year to account for the time a given article had to accrue citations. Figure 7 shows incidence rate ratios (IRRs) obtained from the negative binomial regression models. These ratios can be interpreted as percentage changes in the dependent variable, citations, given a one-unit change in the independent dummy variables, i.e., given the geography of the focal articles across the six world regions.

Fig. 7: Geography of the relevance of AI life science articles in terms of forward citations in life science research (Figs. 7A, D) and clinical research (Figs. 7B, C). All panels depict incidence rate ratios (IRRs) with error bars for 95% confidence intervals obtained from negative binomial regressions of citations on dummy variables for the geography of the research, with the most productive region, Asia, serving as the base category. A (n = 397,965) and B (n = 397,965) show unadjusted estimates (only accounting for publication year), whereas C (n = 393,722) and D (n = 375,033) also include controls for quality variation across publishing outlets. Because publishing outlets with all-zero outcomes for the dependent variable (i.e., outlets publishing research that is not cited) are automatically dropped from the analyses with quality controls, the sample sizes are smaller in C and D.
Source data are provided as a Source Data file.

Scientific relevance

We assess an article’s scientific relevance as the number of forward citations an article receives from general life science research articles. We find that AI-focused life science research produced in the world regions of Africa, Oceania, Europe, and Northern America receives about 10% (95% confidence interval (CI) 6%–15%), 26% (95% CI 23%–29%), 20% (95% CI 19%–22%), and 40% (95% CI 38%–42%) more forward citations in general life science articles, respectively, than research created in Asia (Fig. 7A). Research produced in Latin America, in comparison, receives fewer forward citations than research from Asia.

To adjust for the quality of the underlying research, we next include dummy variables for each journal and conference proceedings outlet in our regression model (i.e., outlet fixed effects). The inclusion of fixed effects adjusts for any geographical variance in research tied to the outlet, including quality ranking and subject matter published. In this adjusted model, with the exception of Africa and Latin America, world regions are no longer statistically different in terms of forward citations in downstream life sciences research (Fig. 7D). In other words, the citation differences between geographic regions appear to be largely explained by regional differences in research quality, which is consistent with the geographic variance in research quality shown in Fig. 5.

Clinical relevance

Ultimately, AI is expected to transform medicine. We therefore seek to analyze the influence of AI life science research on clinically applied research. Figure 7B shows that the regions of Oceania, Europe, and Northern America receive a citation premium from downstream clinical research articles (about 13% (95% CI 8%–17%), 26% (95% CI 24%–29%), and 55% (95% CI 52%–57%), respectively), compared to research generated in Asia, analogous to the scientific relevance dimension (Fig. 7A).
The greater number of clinical citations to AI life science articles from these three regions again appears to be explained by our approximation of the underlying research quality (Fig. 7C).

Overall, the findings in Fig. 7 indicate that the differences in scientific and clinical relevance are driven by differences in quality rather than geographic bias in citation patterns. In other words, the cumulative knowledge-building process in the AI research enterprise appears to be largely unbiased with respect to the geographic location of the knowledge-creating researchers.

International collaborations

Lastly, we return to the argument that scientific progress is driven by collaborating on the best ideas, irrespective of the ideas’ geography. We analyze international collaborations in our dataset and define articles as international if at least two authors on the author byline are affiliated with institutions from different countries. We focus this analysis on the relevance dimension, because it is the best proxy for what kind of research informs the advancement of the global research enterprise.

We again estimate negative binomial regression models with citations as dependent variables and a dummy variable for international collaboration as the core independent variable. In the analysis of a potential citation differential between research from international versus national collaborations, we control for three factors. We include dummy variables for the lead author country to account for regional variance in international collaboration. We control for the number of co-authors because larger author teams are more likely to include a co-author from another country and team size has been shown to correlate with citations [7].
Additionally, we control for the publication year of a focal article to account for the time it had to accrue citations.

We find that articles stemming from international rather than national collaborations receive, on average, 21% (95% CI 20%–22%) more citations by general life science articles and 7% (95% CI 6%–8%) more citations by clinical life science articles (Fig. 8A). Of note, international collaborations also tend to publish 35% more frequently in high-ranking research outlets than national collaborations, on average.

Fig. 8: Characteristics of international collaborations. The effect of international collaboration on scientific and clinical relevance (A); share of international collaborations over time (B); share of international collaborations by region (C). Incidence rate ratios (IRRs) with error bars for 95% confidence intervals obtained from negative binomial regressions of citations (n = 397,949) and clinical citations (n = 397,887) on a dummy variable for international collaboration, accounting for country of lead author, team size, and publication year (A). Percentage share of articles with at least two authors affiliated in different countries (n = 397,965) (B). Percentage share of articles with at least two authors affiliated in different countries by geographic region (n = 397,965) (C). Source data are provided as a Source Data file.

Despite the apparent benefits of collaborating across borders, the share of internationally collaborated research has remained relatively low over time, at less than 20%, and has stagnated (Fig. 8B). However, the extent to which regions engage in international collaboration varies. Figure 8C shows the share of publications that stem from international collaboration by region of the lead author. While African lead authors coauthor 36% of their publications with at least one collaborator from a different country, Asian lead authors do so for only 16% of their articles.
Oceania (32%), Europe (27%), and Latin America (23%) fall in between, whereas Northern America also tends to emphasize national over international collaborations (18%).

To further contextualize this cross-regional variance in international collaborations, our final analysis characterizes the dyadic relationships between regions that engage in international collaborations. Figure 9 presents an alluvial diagram to show patterns of international collaboration, including, by construction, only the articles identified as international. We count each occurrence of a difference in geographic location separately and sum international collaborations to the regional level. In other words, if a lead author’s country of affiliation is the US and the lead author collaborates with co-authors from China and Germany, then we depict two lines in the alluvial diagram, one from Northern America to Asia and one from Northern America to Europe. The vertical bars on the left depict the sum of outgoing international collaborations from lead authors affiliated in the respective region, while the vertical bars on the right depict the incoming collaborations for non-lead authors from the respective region.

Fig. 9: Alluvial diagram of international collaborations. Number of dyadic collaborations between authors from different countries, aggregated to the regional level. Dyadic collaborations are counted as co-authorships between a publication’s lead author (last author or first author otherwise) and any other author on the author byline that is from a different country. Only international dyads are considered (n = 105,258 dyads). Source data are provided as a Source Data file.

Overall, Fig. 9 shows that Europe engages most frequently in international collaborations, both from an outgoing perspective (lead authors) and from an incoming perspective (other authors), represented by the blue vertical bars on both sides of the diagram.
European researchers most frequently collaborate with colleagues from the same geographic region, followed by Northern America. But Europe also appears to play an important role in partnering with African and Latin American researchers. Oceania’s international collaborations appear most pronounced with Asia. Africa collaborates frequently with European and Asian researchers. Latin America appears more varied in its international collaboration patterns, but also appears to collaborate most frequently with researchers based in Europe. Northern American lead authors tend to co-author mostly with colleagues from the same region, followed by collaborations with Asia and Europe.
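
The dyad counting behind Fig. 9 can be illustrated with a minimal Python sketch. It is not the authors' code. The `REGION_OF` lookup, the function names, and the two-letter country codes are hypothetical, and the sketch assumes one country per author and one dyad per co-author whose country differs from the lead author's.

```python
from collections import Counter

# Hypothetical country-to-region lookup, for illustration only.
REGION_OF = {
    "US": "Northern America",
    "CA": "Northern America",
    "CN": "Asia",
    "DE": "Europe",
}


def is_international(author_countries):
    """True if at least two authors are affiliated in different countries."""
    return len(set(author_countries)) >= 2


def regional_dyads(lead_country, coauthor_countries):
    """Count lead-to-co-author dyads that cross a country border, by region pair.

    Assumption: every co-author from a different country adds one dyad.
    """
    dyads = Counter()
    for country in coauthor_countries:
        if country != lead_country:  # keep international dyads only
            dyads[(REGION_OF[lead_country], REGION_OF[country])] += 1
    return dyads


# The example from the text: a US lead author with co-authors from China and Germany.
example = regional_dyads("US", ["CN", "DE", "US"])
```

Here `example` holds one dyad from Northern America to Asia and one from Northern America to Europe, matching the two lines described for the alluvial diagram. A US lead author with a Canadian co-author would yield a dyad within Northern America, which is how same-region international collaborations arise.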

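
As a reading aid for the incidence rate ratios reported in Figs. 7 and 8, the following Python sketch shows the standard conversion between a negative binomial coefficient, an IRR, and a percentage difference relative to the base category. The function names and numeric inputs are illustrative assumptions, not values taken from the authors' Source Data.

```python
import math


def coef_to_irr(beta):
    """A negative binomial coefficient lives on the log scale; exp() gives the IRR."""
    return math.exp(beta)


def irr_to_percent(irr):
    """Percentage difference in expected citations versus the base category."""
    return (irr - 1.0) * 100.0


# Illustrative only: an IRR of 1.40 would read as about 40% more citations
# than the base category, and an IRR below 1 as fewer citations.
premium = irr_to_percent(1.40)
deficit = irr_to_percent(0.90)
```

An IRR of exactly 1 means no difference from the base category, which is why confidence intervals that include 1 indicate regions that are not statistically distinguishable from Asia in the adjusted models.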