Histones and histone variant families in prokaryotes

Identifying prokaryotic histones

To find histones in archaea and bacteria, we used the protein annotation database InterPro [30,31]. InterPro classifies all proteins within the UniProt database into families, domains, and important sites based on their sequences. For histones, InterPro contains the histone-fold homologous superfamily, which serves as a comprehensive category for proteins exhibiting the histone fold. We retrieved all sequences that are part of this superfamily. To validate that these sequences indeed contain a histone fold and to gain insight into their potential function, we predicted monomer and multimer structures with AlphaFold2 [27,28]. The majority of our predictions have high confidence scores and deep multiple-sequence alignments (Supplementary Figs. 1–4). A general explanation of how to use AlphaFold2 and interpret the confidence values is provided in the Supplementary Information. We identified a total of 5823 histones in prokaryotes, 25% of which are from bacteria. Half of the 5823 histones have not been previously identified. We refer to all of these proteins as “histones” because they are all predicted to feature the characteristic histone fold structure (Fig. 1a). However, it is not clear whether they all function as global genome organizers. Histones form dimers in solution, whereby the α2 helices cross each other and the α1 and α3 helices are positioned on opposite faces of the dimer (Fig. 1b). The dimer represents the smallest functional unit and can bind 30 base pairs of DNA at its α1 face. Eukaryotic histones form nucleosome structures by linking dimers consecutively through their α3 helices and the last 8 C-terminal residues of their α2 helices, and possess long N-terminal tails (Fig. 1c, d). These tails are generally lacking in archaeal nucleosomal histones, although we find several nucleosomal histones in the Asgard archaea phylum that have long disordered N-terminal tails (Supplementary Fig. 5).
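As a rough illustration of how the per-residue confidence of a predicted model can be inspected: AlphaFold2 stores the pLDDT of each residue in the B-factor column of its PDB-format output. The sketch below is not the authors' pipeline. The function names are illustrative, and the default cutoff of 70 is taken from the threshold quoted in the legend of Fig. 3.

```python
from statistics import mean

def ca_plddt(pdb_text: str) -> dict[tuple[str, int], float]:
    """Return {(chain, residue number): pLDDT} read from the CA atoms of a PDB file."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, chain ID in 22,
        # residue number in 23-26, B-factor (pLDDT in AlphaFold2 models) in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores

def summarize(pdb_text: str, cutoff: float = 70.0) -> tuple[float, float]:
    """Return (mean pLDDT, fraction of residues with pLDDT above `cutoff`)."""
    values = list(ca_plddt(pdb_text).values())
    confident = sum(v > cutoff for v in values)
    return mean(values), confident / len(values)
```

Given the text of a model file (for example read from a hypothetical `prediction.pdb`), `summarize` returns the mean pLDDT and the fraction of residues above the cutoff, which is one simple way to flag predictions that deserve manual review.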
Most new histones differ significantly from conventional nucleosomal histones in sequence and quaternary structure. We manually reviewed every prediction and its confidence values. We grouped histones together if they are predicted to form similar quaternary structures and if the predicted interface for this multimer has low predicted aligned errors (<10 Å) or if the histones share similar additional domains. Histones that are similar in sequence to the recently described bacterial histone Bd0055/HBb were classified into the group “bacterial dimer histones” as they lack unique quaternary structures. In this manner, we subdivided all prokaryotic histones into 17 categories, 13 of which have not been identified before (Supplementary Table 2). To visualize the sequence space of prokaryotic histones, we clustered all histone sequences based on their all-against-all pairwise sequence similarity in a two-dimensional space using the CLANS software [32] (Fig. 2a). The histone categories determined by structure prediction are visualized in the CLANS map; they overlap well with the sequence similarity-based clusters.

Fig. 1: The conventional histone protein forms nucleosomes. a The histone fold (PDB: 1A7W [9]). The N-terminal α1, central α2, and C-terminal α3 helices are colored blue, orange, and green, respectively. Linkers L1 and L2 connect the α1 and α3 helices to the central α2 helix. b The histone dimer binds DNA at its α1 face (PDB: 5T5K [13]). c The eukaryotic H3-H4 tetramer (PDB: 1AOI [2]). Only the core histone fold is visualized. d The eukaryotic octameric nucleosome (PDB: 1AOI [2]).

Fig. 2: Alternative histones are diverse and found across prokaryotes. a Clustered sequence space of prokaryotic histones. Clustering was performed with CLANS. The color of each line indicates the sequence similarity between the two sequences; sequences that are connected by darker lines are more similar than those connected by lighter lines.
Clusters are colored based on the histone category to which they belong as determined by the AlphaFold2 predictions. For a short description of each histone category, see Supplementary Table 2. b Cladogram of archaea showing the distribution of nucleosomal (Nuc), face-to-face (FtF), coiled-coil (CC), and Methanococcales (Mc) histones across different phyla. The cladogram is based on GTDB version 207. For reference, Methanobacteriota A contains the Methanopyri and Methanococci classes; Methanobacteriota B contains the Thermococci and the Methanofastidiosa classes.

Based on sequence and structure similarity, we have identified two major prokaryotic histone families: the nucleosomal histones and the α3-truncated histones, referred to from now on as α3 histones. The α3 family comprises five histone categories, each of which is likely related to a different function as they differ in predicted quaternary structure and the presence of additional domains: face-to-face, bacterial dimer, ZZ, Rab GTPase, and phage histones. These categories are discussed in more detail below. α3 histones are defined by a truncated α3 helix, which consists of only 3 to 4 amino acids, compared to the 10-amino-acid-long nucleosomal α3 helix. Furthermore, their α2 helix is 4 to 5 amino acids shorter than that of nucleosomal histones. α3 histones are related in structure and sequence, as the five categories that make up the family cluster close together in the CLANS map (Fig. 2a). While nucleosomal histones exist exclusively in archaea, about 40% of α3 histones are from bacteria (Supplementary Fig. 6). However, they are not well conserved within the bacterial domain as only 1.15% of bacterial proteomes in UniProt contain an α3 histone, in contrast to archaea where α3 histones are found in almost all phyla. While nucleosomal and α3 histones represent the two major families, only 65% of histones belong to either of these two families.
The remaining histones are part of minor, highly diverse histone categories. Some minor histones appear to have dramatically changed their architectural properties, possibly bridging DNA instead of wrapping it, based on their predicted multimer structures. Other minor histones appear to have lost their DNA-binding ability, as they lack identifiable DNA-binding residues and have instead gained transmembrane domains (Supplementary Fig. 7). In the subsequent sections, we go into further detail about some of these histones, focusing on the most prominent histone families, the α3 histones, and histones that likely bridge DNA.

Face-to-face histones

Face-to-face (FtF) histones make up the largest subcategory of α3 histones (83%) and are, after nucleosomal histones, the largest group of histones in prokaryotes. FtF histones are found in the majority of archaeal phyla (Fig. 2b). In bacteria, FtF histones are predominantly found in the phyla Spirochaetota, Planctomycetota, Bdellovibrionota (class Bacteriovoracia), and Myxococcota (Supplementary Figs. 6 and 8). FtF histones are defined by their predicted tetramer structure (Fig. 3a). In the tetramer, two dimers interact in a manner similar to nucleosomal histones, via their α3 helices and the last ~8 amino acids of their α2 helix. Unlike nucleosomal histones, both sides of the dimer interact, forming a torus. To gain more insight into important residues, we aligned all FtF histones, constructed an HMM profile, and visualized this profile as an HMM logo (Supplementary Fig. 9). The HMM logo shows which residues are strongly conserved between FtF histones and are probably of functional importance. The most strongly conserved residues are found at the C-terminus of the protein and include residues 48, 52, 54, 56, and 61 in the HMM profile (Supplementary Fig. 9). These conserved residues correspond to R41, N45, R47, T49, and D54 for FtF histone D4GZE0 from the archaeal model organism Haloferax volcanii (Fig. 3b, c).
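The correspondence between HMM-profile positions and D4GZE0 residue numbers can be kept as a small lookup table. The sketch below uses only the five pairs stated in the text; the names are illustrative. For these five positions the two numbering schemes differ by a constant offset, but that offset need not hold elsewhere in the alignment, where profile insertions and deletions can shift the numbering.

```python
# Pairs stated in the text: HMM-profile position -> (residue, position in D4GZE0).
HMM_TO_D4GZE0 = {
    48: ("R", 41),
    52: ("N", 45),
    54: ("R", 47),
    56: ("T", 49),
    61: ("D", 54),
}

def offsets(mapping):
    """Offset between HMM-profile and sequence numbering for each listed pair."""
    return {hmm: hmm - seq for hmm, (_, seq) in mapping.items()}

def to_sequence_label(hmm_pos, mapping=HMM_TO_D4GZE0):
    """Translate an HMM-profile position into a residue label such as 'R41'."""
    residue, seq_pos = mapping[hmm_pos]
    return f"{residue}{seq_pos}"

print(offsets(HMM_TO_D4GZE0))
print([to_sequence_label(p) for p in sorted(HMM_TO_D4GZE0)])
```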
Residues R47, T49, and D54 form an RxTxxxxD motif which is also present in nucleosomal histones from archaea (Supplementary Fig. 10). The arginine and the aspartate are responsible for structuring the L2 loop, while the threonine, located in the L2 loop, is involved in DNA binding. Compared to nucleosomal histones, the FtF histone HMM profile lacks a strongly conserved DNA-binding arginine or lysine at position 55 (RKTxxxxD) (Supplementary Fig. 9). Residues R41 and N45 are responsible for the dimer-dimer interactions at the dyad. In the predicted D4GZE0 structure, N45 is located at the ‘front’ of the dimer-dimer interface and can form hydrogen bonds with R47 of the opposing dimer. R41 is located further back within the dimer-dimer interface and forms salt bridges with the carboxyl group of residue 55 of the opposing dimer (Fig. 3c). On the α1 helix, a lysine is strongly conserved at position 16 (Supplementary Fig. 9). This residue corresponds to K11 for FtF histone D4GZE0 (Fig. 3b). Conserved lysines can also be found at positions 12, 14, and 20 in the HMM profile, although with lower frequency. As these lysines are solvent-exposed, conserved, and positively charged, they are likely involved in DNA binding.

Fig. 3: The face-to-face (FtF) histones form a unique tetramer structure. a The homotetramer of FtF histone D4GZE0 from Haloferax volcanii as predicted by AlphaFold2. Each residue is colored by its predicted local distance difference test (pLDDT) value. AlphaFold2 is confident in the local structure if the pLDDT is >70. b The conserved RxTxxxxD motif, DNA-binding residue K11, and tetramerization residue N45 of FtF histone D4GZE0. K11, N45, R47, T49, and D54 relate to K16, N52, R54, T56, and D61 in the FtF HMM logo (Supplementary Fig. 9). c The ‘back’ of the dimer-dimer interface of FtF histone D4GZE0. R41 relates to R48 in the FtF HMM logo (Supplementary Fig. 9). d Crystal structure of FtF histone HTkC from Thermococcus kodakarensis (PDB: 9F2C).
e Our proposed model for how FtF histones bind and wrap DNA.

To confirm the predicted tetramer structure of FtF histones, we purified and crystallized the FtF histone from Thermococcus kodakarensis, which we refer to as HTkC. We solved the crystal structure at a resolution of 1.84 Å (PDB: 9F2C) via molecular replacement using the predicted structure as a search model (Fig. 3d). The asymmetric unit contains two histone dimers that assemble into a torus-shaped tetramer. With a Cα RMSD of 0.652 Å, the HTkC crystal structure is virtually identical to the AlphaFold prediction (Supplementary Fig. 11), highlighting AlphaFold’s accuracy in predicting histone structures.

To gain more insight into the possible function of FtF histones, we examined transcriptome data from Halobacteria, Thermococci, and Leptospirales, three well-studied taxa where FtF histones are common. Halobacteria contain one FtF histone on their chromosome and additional ones on their plasmids. The chromosomal FtF histones in H. volcanii and Halobacterium salinarum, HVO0196 and VNG2273H, respectively, are among the top 2% of the highest expressed genes across three out of four transcriptome datasets (Supplementary Fig. 12a–d). As the FtF histone is highly expressed, it is a likely candidate to be the unknown protein that causes the nucleosome-like organization of chromatin in Halobacteria. Electron microscopy images of the chromosomal fibers of H. salinarum show beads-on-a-string-like structures [33]. These beads were estimated to have a diameter of 8.1 ± 0.6 nm, somewhat smaller than the 11 nm of eukaryotic nucleosomes. Micrococcal nuclease digestion of crosslinked H. volcanii chromatin showed protected DNA fragments of 50 to 60 base pairs [34], suggesting that this unknown protein binds 50 to 60 base pairs of DNA. The expression of this unknown protein is expected to be high as the H.
volcanii genome is estimated to contain 14.2 nucleosomes per kilobase, 2.7 times higher than the 5.2 nucleosomes per kilobase in Saccharomyces cerevisiae [34]. Not all organisms with FtF histones show such high expression levels. The FtF histone, HTkC, in T. kodakarensis is part of the top 7% of the highest expressed genes, with an expression level that is 34 times lower than that of hypernucleosome histone HTkA (Supplementary Fig. 12e). The transcriptome of the related organism Thermococcus onnurineus was measured in three different conditions: in yeast extract-peptone-sulfur (YPS), modified minimal-CO (MMC), and modified minimal-formate (MMF) media (Supplementary Fig. 12f–h). In YPS and MMC, the FtF histone shows a two to three times lower expression level than the other two nucleosomal-like histones, B6YSY3 and B6YXB0. In MMF, however, the FtF histone is expressed twice as highly as the nucleosomal histones, indicating that environmental factors might play a role in regulating the expression of FtF histones in hypernucleosome-containing archaea. Similar to Halobacteria, the FtF histone in the pathogenic bacterium Leptospira interrogans serovar Lai is among the top 2% of the highest expressed genes (Supplementary Fig. 12i). As we find no other known NAPs among the top 10% of expressed genes, the FtF histone might be the main architectural protein in Leptospirales. The FtF histone is essential to Leptospira interrogans serovar Lai, as attempts to delete the gene from its genome have failed [25]. The histone is found in both pathogenic and free-living saprophytic Leptospirales and always contains an N-terminal tail. Tails are common in bacterial FtF histones as opposed to archaeal FtF histones, which generally lack tails (Supplementary Fig. 13). There is no conserved tail sequence across bacteria; however, the majority of the tails are positively charged.

Based on the aforementioned studies on H. volcanii, we propose a model for the binding of FtF histones to DNA.
If the FtF histone is indeed the main architectural protein of Halobacteria, it binds 50 to 60 base pairs of DNA and forms beads-on-a-string-like structures with diameters of 8.1 ± 0.6 nm. To fit 50 base pairs, the DNA would have to wrap around the tetramer, similarly to nucleosomal histones (Fig. 3e). Furthermore, the diameter of this DNA-wrapped tetramer would be 8.3 nm, similar to the observed beads-on-a-string-like structures. However, one possible issue with this model is steric hindrance caused by the bending of DNA molecules close to each other.

Minor α3 histones

In addition to FtF histones, the α3 histone family contains four smaller categories: bacterial dimers, ZZ histones, Rab GTPase histones, and phage histones. The bacterial dimers are predominantly found in Bdellovibrionota (class Bdellovibrionia), Elusimicrobiota, Spirochaetota (class Spirochaetia), Planctomycetota, Myxococcota_A, and Chlamydiota (Supplementary Figs. 6b and 8). AlphaFold does not produce a confident multimer prediction larger than dimers for these histones, hence the name ‘bacterial dimers’ (Supplementary Fig. 14a). Closer inspection of the HMM logo shows that bacterial dimers lack the conserved residues R48 and N52, which facilitate the dimer-dimer interaction in FtF histones (Supplementary Fig. 15). Similar to nucleosomal histones, we find the RKTxxxxD motif in bacterial dimers (Supplementary Fig. 14b). They also contain conserved lysines on their α1 helix at positions 11, 13, and 17, which possibly bind DNA. K17 is also conserved in nucleosomal histones, where it is an arginine (R20), while K13 is conserved in FtF histones (K16) (Supplementary Figs. 9, 10, and 15). The bacterial dimer HBb (locus name: Bd0055) in Bdellovibrio bacteriovorus HD100 is highly expressed, being part of the top 6% of highest expressed genes during its growth phase with an absolute expression level similar to IHF, HU, and SMC (Supplementary Fig. 16a).
Furthermore, proteomics data show that HBb is highly abundant in the attack phase [25]. Based on crystal structures, in vitro characterization, and molecular dynamics data, we have recently demonstrated that HBb bends DNA, similar to members of the HU/IHF protein family [26].

ZZ histones, which are closely related to bacterial dimers, are predominantly found in Proteobacteria (Supplementary Fig. 8). They consist of a ZZ-type zinc finger domain at their N-terminus and a bacterial-dimer-like histone domain at their C-terminus (Fig. 4a). The ZZ-domain contains two conserved zinc-binding sites, a C4 and a C2H2 site, and is structurally similar to the eukaryotic ZZ-domains of HERC2 (a ubiquitin protein ligase), p300 (a histone acetyltransferase), and ZZZ3 (a histone H3 reader) (Fig. 4b–c and Supplementary Figs. 17–18). These eukaryotic proteins all participate in the post-translational modifications of histones. Intriguingly, these eukaryotic ZZ-domains bind to the tail of histone H3, potentially aiding their localization to nucleosomes [35,36,37]. If the function of the ZZ-domain is conserved, it may bind H3-like tails in bacteria too. In fact, H3-like positively charged tails are found in the majority of bacterial FtF histones (Supplementary Figs. 13, 19 and 20). However, the binding pocket residues of the eukaryotic ZZ-domain are not conserved in the bacterial variant (Supplementary Fig. 21). Given that only 50% of the proteomes that have a ZZ histone also contain a second histone and that bacterial histone tails lack a conserved sequence, it seems unlikely that the ZZ-domain binds to tails of other histones. Transcriptome data from B. bacteriovorus HD100 reveal very low expression of the ZZ histone during both attack and growth phases (top 43 to 70% of the highest expressed genes) (Supplementary Fig. 16).
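Expression levels in this section are reported as a position in the ranking of all genes ("top N% of the highest expressed genes"). The sketch below shows one plausible way to compute such a figure. It is an assumption about the definition, not the authors' stated procedure, and the gene names and values are hypothetical.

```python
def top_percent(expression: dict[str, float], gene: str) -> float:
    """Percentage of genes expressed at least as highly as `gene` (itself included)."""
    level = expression[gene]
    at_or_above = sum(1 for value in expression.values() if value >= level)
    return 100.0 * at_or_above / len(expression)

# Hypothetical expression values in arbitrary units, for illustration only.
toy = {"geneA": 950.0, "geneB": 400.0, "geneC": 120.0, "geneD": 15.0, "geneE": 3.0}
print(top_percent(toy, "geneA"))  # 20.0
print(top_percent(toy, "geneD"))  # 80.0
```

Under this definition a small value means high expression: a gene in the "top 2%" is out-expressed by almost no other gene, whereas one in the "top 70%" sits in the lower part of the ranking.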
Similarly, proteomics data indicate a low abundance of the ZZ histone during the attack phase as well as in the host-independent strain HID13, which lacks the attack phase [25].

Fig. 4: ZZ-type zinc finger histones can possibly bind two zinc ions. a The ZZ-histone D0LYE7 from Haliangium ochraceum SMP-2 as predicted by AlphaFold2. ZZ-histones contain a ZZ-type zinc finger domain at the N-terminus and an α3 histone fold at the C-terminus. Each residue is colored by its pLDDT value. b The ZZ domain of D0LYE7. This domain contains two zinc-binding motifs, a C4 motif (C20, C23, C51, C53) and a C2H2 motif (C36, C41, H59, H65). These residues correspond to C10, C13, C41, and C43 for the C4 motif and C26, C31, H49, and H55 for the C2H2 motif in the HMM profile (Supplementary Fig. 18). Each residue is colored by its pLDDT value. c The ZZ domain of HERC2 (gray, PDB: 6WW4 [35]) aligned to D0LYE7.

The last two minor α3 histone types are the Rab GTPase and phage histones. Rab GTPase histones are found exclusively in Lokiarchaeota. They contain an FtF-like histone at their C-terminus and a Rab GTPase domain at their N-terminus (Fig. 5a). Rab GTPases are a subfamily of Ras GTPases involved in regulating membrane trafficking pathways. The closest homologs of these Rab GTPase domains are the small Rab GTPases from eukaryotes. GTPase-related histones, which are also found in eukaryotes, include the eukaryotic Ras activator Son of Sevenless (SOS) that contains a double histone domain [38]. This histone domain binds to lipids and regulates SOS activity to activate Ras GTPases [39,40]. Whether the archaeal Rab GTPase histones similarly bind to lipids is unclear as the histone domain in SOS has no detectable sequence similarity with the Rab GTPase histones. Structurally, the SOS histone domain is more similar to the H2A/H2B heterodimer than to the Rab GTPase histones. The last minor α3 histone type, the phage histone, is found in prokaryotic dsDNA virus metagenomes and bacterial metagenomes.
In the viral metagenomes, we find tail proteins, suggesting that these bacteriophages are part of the Caudovirales order. Since some bacterial metagenomes also contain the histone, the bacteriophage might be a prophage. The phage histones contain an α3 histone fold at their N-terminus and an alpha-helical domain at their C-terminus. In the tetramer prediction, the C-terminal domains form a tetramerization domain (Fig. 5b). The histone domain lacks the RxTxxxxD motif and shows low sequence identity with other histone types (Supplementary Fig. 22). It contains two conserved residues that possibly bind DNA, K10 and K49, which correspond to the DNA-binding residues K20 and K63 in the nucleosomal HMM logo (Supplementary Fig. 10). Interestingly, one of the most highly conserved residues is an arginine on the side of the histone dimer (Supplementary Fig. 23). As the phage histone is predicted to form a tetramer structure through its C-terminal domain instead of through the histone folds, it may bridge two DNA duplexes.

Fig. 5: α3 histones from bacteriophages or with eukaryotic-like domains. a The homodimer of Rab GTPase histone A0A0F8XJF6 as predicted by AlphaFold2. Rab GTPase histones contain a small Rab GTPase domain on the N-terminus and an FtF-like histone fold at the C-terminus. Each residue is colored by its pLDDT value. b The homotetramer of phage histone A0A2E7QIQ9 as predicted by AlphaFold2. Phage histones contain an α3 histone fold at the N-terminus and an α-helix which functions as the tetramerization domain at the C-terminus.

DNA bridging histones

Among the 17 prokaryotic histone categories, the predicted quaternary structures of four suggest potential DNA-bridging capabilities. These are the Methanococcales, coiled-coil, RdgC, and the aforementioned phage histones. The Methanococcales (Mc) histones are exclusively found in the Methanococcales order and contain a tetramerization domain on the C-terminus which facilitates DNA bridging.
This has been experimentally confirmed for the Mc histone MJ1647 from Methanocaldococcus jannaschii [19]. Coiled-coil (CC) histones are more widely distributed throughout archaea, being found in Aenigmatarchaeota, Altiarchaeota, B1Sed10-29, EX4484-52, Iainarchaeota, Methanobacteriota(_B), Micrarchaeota, Nanoarchaeota, Nanohaloarchaeota, and Undinarchaeota (Fig. 6a). CC histones have a long α-helix at the C-terminus of the histone fold, which is predicted to form a coiled-coil in the tetramer (Fig. 6b). The tetramer structure of CC histones bears structural resemblance to that of Mc histones, where the two dimers are positioned opposite each other and interact through their C-terminal domains. However, despite these structural parallels, there is little sequence similarity between CC and Mc histones. Based on the HMM logo, we identified four possible DNA-binding residues on the α1 helix face of CC histones: R32, K35, R45, and K49 (Supplementary Fig. 24). Two of these, R32 and K35, correspond to the DNA-binding residues R20 and K23 in the nucleosomal HMM logo (Supplementary Fig. 10). A common feature of CC histones is their highly negatively charged tails (Supplementary Figs. 26 and 27). These tails can be present at the C- and/or N-terminus and are predicted by AlphaFold to be disordered. Notably, the CC histone of the model archaeon Methanothermus fervidus lacks these tails. The function of the tails remains unclear; however, as they are highly negatively charged, they might act as intramolecular inhibitors by occluding the DNA-binding α1 face of the histone. Transcriptome data show that the CC histone of Methanobrevibacter smithii PS is expressed at a low level, ranking in the top 53% of the highest expressed genes (Supplementary Fig. 28).

Fig. 6: DNA-bridging coiled-coil (CC) histones are widely found in archaea. a Phylogenetic tree of CC histones. Clades are colored by phylum as they are assigned in the GTDB database (v207). The tree was generated with RAxML-NG.
1000 bootstraps were performed and used to calculate the transfer bootstrap expectation values (TBE). b The homotetramer of CC histone E3GZL0 from Methanothermus fervidus as predicted by AlphaFold2. CC histones contain a long α-helix on the C-terminus. Each residue is colored by its pLDDT value. c The homotetramer of RdgC histone D4GVY1 from Haloferax volcanii as predicted by AlphaFold2. RdgC histones form very similar tetramer structures to coiled-coil histones despite low sequence identity. Each residue is colored by its pLDDT value.

The predicted tetramer structure of CC histones suggests that they bridge DNA by binding two separate DNA duplexes at opposing histone dimers (Supplementary Fig. 25a). To test this hypothesis, we purified the CC histone from M. fervidus, which we refer to as HMfC, and performed a DNA-bridging assay (Supplementary Fig. 25b). We observe an increase in DNA-bridging activity with increasing HMfC concentrations, confirming that CC histones bridge DNA. This represents a major divergence from the DNA-binding mode of conventional histones and highlights both the diversity of prokaryotic histones and the utility of AlphaFold in providing accurate preliminary insights into the DNA-binding properties of histones.

RdgC histones are similar to CC histones in that they contain a large C-terminal helix which forms a tetrameric coiled-coil helical bundle (Fig. 6c). However, they share only 15 to 20% sequence identity with CC histones. RdgC histones are present in certain Bacillus and Halobacteria species and are encoded within multi-gene operons. The gene for the RdgC histone is always the first gene in its operon, followed by an RdgC-like protein and an unknown transmembrane (TM) protein (Supplementary Fig. 29). The conservation of this operon structure across different organisms suggests functional coupling among these proteins. The RdgC-like protein is structurally similar to RdgC, yet they share only 18% sequence identity (Supplementary Fig. 30).
RdgC is found in Proteobacteria and forms a ring structure as a dimer, through which it might bind DNA [41]. Functionally, it is involved in recombination and is thought to modulate RecA activity [42]. The RdgC-like protein differs from RdgC as it contains two additional small domains, one on the C-terminus and the other on the N-terminus. The TM protein is found exclusively within the context of the RdgC histone. It consists of four domains: a winged helix domain, two unknown domains, and a transmembrane domain (Supplementary Fig. 31). Identifying functional residues in RdgC histones is challenging due to significant variability in sequences across species (Supplementary Fig. 32). One of the few conserved DNA-binding residues is K34, which corresponds to the DNA-binding residue K23 in the nucleosomal HMM logo (Supplementary Fig. 10). Although the sequence variation is high, the α1 face is always positively charged, suggesting they all bind DNA. Transcriptome data from H. volcanii and Bacillus cereus strain ATCC 10987 show low expression of the RdgC histone, ranking it within the top 14 to 37% of highest expressed genes (Supplementary Figs. 12a, b, 33).

IHF-related histones

Genes encoding nucleoid-associated proteins are generally organized as single-gene operons [43]. This pattern holds for model histones HMfA, HMfB, HTkA, and HTkB, as well as for the previously discussed α3 and CC histones. However, as described earlier, some histones consistently occur within multi-gene operons across different organisms. One such example is what we refer to as IHF-related or IHF histones (Fig. 7a). IHF histones are rare in bacterial metagenomes, with only 19 IHF histone homologs in the UniProt database. Characterized by a structured C-terminal tail that dimerizes, IHF histones likely function as dimers, as AlphaFold does not confidently predict larger homo-oligomer structures. They appear in metagenomes from the phyla Omnitrophota, Wallbacteria, CG03, and Elusimicrobiota.
In all cases, a gene encoding an integration host factor-like (IHF-like) protein is present within the same operon as the genes for IHF histones, hence the name ‘IHF histones’ (Fig. 7b and c). The conserved co-occurrence of these proteins suggests that they are associated via an unknown function.

Fig. 7: IHF histones are functionally related to IHF. a The homodimer of IHF histone A0A358AGI2 from Candidatus Omnitrophica as predicted by AlphaFold2. Each residue is colored by its pLDDT value. b Gene cluster comparison of bacterial metagenomes that contain the IHF histone. The organism and its genome ID are noted on the left. The IHF histone, IHF-like, and topoisomerase VI-like genes are colored green, orange, and blue, respectively. c The monomer of IHF-like A0A358AGI6 from Candidatus Omnitrophica as predicted by AlphaFold2. Each residue is colored by its pLDDT value. Residue R45 is highlighted in green. d The RxTxxxxD motif and the “sprocket” R32 of IHF histone A0A358AGI2. R32, R65, T67, and D72 relate to R24, R57, T59, and D64 in the IHF histone HMM logo (Supplementary Fig. 34).

Both IHF histones and IHF-like proteins exhibit low sequence similarity across organisms. The most strongly conserved residues in the histone fold are the RxTxxxxD motif and the “sprocket” R24 (Fig. 7d and Supplementary Fig. 34). The typical residues in the α1 helix facilitating DNA binding are not conserved, except the hydrophobic residues involved in the packing of the dimer. Thus, it remains unclear whether IHF histones bind DNA similarly to nucleosomal histones. The C-tail contains two α-helices that dimerize in a handshake motif. As the tail lacks strongly conserved positively charged residues, it likely does not bind to DNA. The IHF-like protein shows weak sequence similarity to both HUα/β and IHFα/β. However, the IHF-like protein might function similarly to IHFβ based on the strong conservation of R45, a residue conserved in IHFβ but not in HU or IHFα (Supplementary Fig. 35) [44]. Furthermore, we propose that the IHF-like protein can bend DNA, similar to IHF, based on the conservation of intercalating and DNA-binding residues on the beta arms (Supplementary Fig. 36).
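The RxTxxxxD motif used throughout this section to compare histone categories (and its nucleosomal RKTxxxxD variant) can be written as a regular expression, which makes it easy to check whether a given sequence contains it. The sketch below is illustrative only and is not the HMM-based analysis used in the paper. The function name and the test peptide are made up.

```python
import re

# RxTxxxxD: Arg, any residue, Thr, any four residues, Asp.
RXTXXXXD = re.compile(r"R.T.{4}D")
# RKTxxxxD: the variant with a lysine directly after the arginine.
RKTXXXXD = re.compile(r"RKT.{4}D")

def find_motif(sequence: str, pattern: re.Pattern = RXTXXXXD) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for every match, overlaps included."""
    hits = []
    # Wrapping the pattern in a lookahead lets overlapping occurrences be reported too.
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    for match in lookahead.finditer(sequence.upper()):
        hits.append((match.start() + 1, match.group(1)))
    return hits

# Made-up peptide for illustration; not a real histone sequence.
toy = "MAELPRATGSLLDKAARKTVEEADL"
print(find_motif(toy))            # [(6, 'RATGSLLD'), (17, 'RKTVEEAD')]
print(find_motif(toy, RKTXXXXD))  # [(17, 'RKTVEEAD')]
```

A plain pattern match only tests for the presence of the motif. Unlike an HMM logo, it says nothing about how strongly each position is conserved across a family, nor whether the match lies in the L2 loop.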
