DNA RESEARCH

Towards molecular evolutionary epigenomics with an expanded nucleotide code involving methylated bases
Yoshida S, Uchiyama I, Fukuyo M, Kato M, Rao DN, Konno M, Fujiwara SI, Azuma T, Kobayashi I and Kishino H
In molecular evolution analyses, genomic DNA sequence information is usually represented in the form of 4 bases (ATGC). However, research since the turn of the century has revealed the importance of epigenetic genome modifications, such as DNA base methylation, which can now be decoded using advanced sequencing technologies. Here we provide an integrated framework for analyzing molecular evolution of nucleotide substitution, methylation, and demethylation using an expanded nucleotide code that incorporates different types of methylated bases. As a first attempt, we analysed substitution rates between bases, both unmethylated and methylated ones. As the model methylomes, we chose those of Helicobacter pylori, a unicellular bacterium with the largest known repertoire of sequence-specific DNA methyltransferases. We found that the demethylation rates are remarkably high while the methylation rates are comparable with the substitution rates between unmethylated bases. We found that the ribosomal proteins known for sequence conservation showed high methylation and demethylation frequencies, whereas the genes for DNA methyltransferases themselves showed low methylation and demethylation frequencies compared to base substitution. This study represents the first step toward molecular evolutionary epigenomics, which, we expect, will contribute to understanding epigenome evolution.
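The idea of an expanded nucleotide code can be illustrated with a small sketch. The following is not the authors' model; it only tallies the kinds of change the abstract distinguishes (base substitution, methylation, demethylation) between two aligned sequences. The lowercase symbols `a` and `c` for a methylated adenine and a methylated cytosine are hypothetical placeholders, and the sketch assumes the first sequence is the ancestral state.

```python
from collections import Counter

# Hypothetical expanded code: uppercase = unmethylated base,
# lowercase = a methylated form of the same base (assumption for illustration).
UNMETHYLATED_OF = {"a": "A", "c": "C"}


def classify_change(ancestral, derived):
    """Classify one aligned site as unchanged, substitution, methylation or demethylation."""
    if ancestral == derived:
        return "unchanged"
    base_before = UNMETHYLATED_OF.get(ancestral, ancestral)
    base_after = UNMETHYLATED_OF.get(derived, derived)
    if base_before != base_after:
        return "substitution"
    # Same underlying base, so only the methylation state differs.
    return "methylation" if derived in UNMETHYLATED_OF else "demethylation"


def count_changes(ancestral_seq, derived_seq):
    """Count change classes across two aligned, equal-length sequences."""
    if len(ancestral_seq) != len(derived_seq):
        raise ValueError("sequences must be aligned to the same length")
    return Counter(classify_change(x, y) for x, y in zip(ancestral_seq, derived_seq))


if __name__ == "__main__":
    print(count_changes("GaTCc", "GATcA"))
```

Estimating rates from such counts would additionally require a phylogeny and a substitution model over the expanded alphabet, which this sketch does not attempt.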
Localization of the origin of transfer for Salmonella genomic island 4 from Salmonella enterica serovar I 4,[5],12:i:-
Neupane DP, Bearson BL and Bearson SMD
Salmonella enterica serovar I 4,[5],12:i:- (serovar I 4,[5],12:i:-) is one of the most frequent multidrug-resistant (MDR) Salmonella serovars associated with food-animal production globally, and strains often contain Salmonella genomic island-4 (SGI-4), an integrative conjugative element (ICE) encoding metal tolerance for copper, silver, and arsenic. Horizontal gene transfer (HGT) of SGI-4 from serovar I 4,[5],12:i:- to recipient bacteria results in enhanced metal tolerance for the transconjugants; however, the origin of transfer (oriT) for SGI-4 mobilization is unknown. In this study, the oriT within SGI-4 of MDR serovar I 4,[5],12:i:- strain USDA15WA-1 was identified by (i) cloning an internal region of SGI-4 into a non-mobilizable plasmid and demonstrating HGT to a bacterial recipient, and (ii) deleting the predicted oriT region of SGI-4 from strain USDA15WA-1 and abolishing SGI-4 transfer. Sequence similarity to oriTSGI-4 was identified in other Enterobacteriaceae, and conjugation of SGI-4 occurred from USDA15WA-1 to Salmonella serovars from Serogroups C-E as well as Escherichia coli and Citrobacter. Localization of the SGI-4 oriT enhances our understanding of a DNA region involved in HGT of an ICE in a frequent MDR Salmonella serovar, thereby providing a model to investigate HGT of SGI-4 and dissemination of metal tolerance genes in the food-animal production environment.
Full-length hybrid transcriptome of the olfactory rosette in Senegalese sole (Solea senegalensis): an essential genomic resource for improving reproduction on farms
Torres-Sabino D, Blanco-Hortas A, Villamayor PR, Rasines I, Martín I, Bouza C, Robledo D and Martínez P
Senegalese sole is a promising European aquaculture species whose main challenge is that captive-born males (F1) are unable to reproduce in farms, hindering breeding programs. This issue is hypothesized to stem from chemical communication through the olfactory system. Although significant advancement in genomic resources has been made recently, scarce information exists on the genomic basis of olfaction, a special sensory system for demersal species like flatfish, which could play a prominent role in reproduction, social and environmental interactions. A full-length transcriptome of the olfactory rosettes including females, males, juveniles and adults, of both F1 and wild origins, was generated at the isoform-level by combining Oxford Nanopore long-read and Illumina short-read sequencing. A total of 20,670 actively expressed transcripts were identified: 13,941 known transcripts, 5,758 novel transcripts from known genes, and 971 from novel genes. Given the important role of olfaction in reproductive behaviour, we comparatively examined the expression and functional enrichment of the olfactory receptor gene families (OlfC, OR, ORA, and TAAR). Our comprehensive olfactory transcriptome of Senegalese sole provides a foundation for delving into the functional basis of this complex organ in teleosts and flatfish. Furthermore, it provides a valuable resource for addressing reproductive management challenges in Senegalese sole aquaculture.
MedakaBase as a unified genomic resource platform for medaka fish biology
Morikami K, Tanizawa Y, Yagura M, Sakamoto M, Kawamoto S, Nakamura Y, Yamaguchi K, Shigenobu S, Naruse K, Ansai S and Kuraku S
Medaka, a group of small, mostly freshwater fishes in the teleost order Beloniformes, includes the rice fish Oryzias latipes, a useful model organism studied in diverse biological fields. Chromosome-scale genome sequences of the Hd-rR strain of this species were obtained in 2007, and an improved version has facilitated various genome-wide studies. However, despite its widespread utility, omics data for O. latipes are dispersed across various public databases and lack a unified platform. To address this, the medaka section of the National Bioresource Project (NBRP) of Japan established a genome informatics team in 2022 tasked with providing various in silico solutions for bench biologists. This initiative led to the launch of MedakaBase (https://medakabase.nbrp.jp), a web server that enables gene-oriented analysis including exhaustive sequence similarity searches. MedakaBase also provides on-demand browsing of diverse genome-wide datasets, including tissue-specific transcriptomes and intraspecific genomic variations, integrated with gene models from different sources. Additionally, the platform offers gene models optimized for single-cell transcriptome analysis, which often requires coverage of the 3' untranslated region (UTR) of transcripts. Currently, MedakaBase provides genome-wide data for seven Oryzias species, including original data for O. mekongensis and O. luzonensis produced by the NBRP team. This article outlines technical details behind the data provided by MedakaBase.
Genome assembly and insights into globally invasive Red-vented Bulbul (Pycnonotus cafer)
Puthumana MA, Bisht MS, Singh M and Sharma VK
The Red-vented Bulbul (Pycnonotus cafer) of the Pycnonotidae family is one of the most invasive tropical passerine bird species. We sequenced the genome and transcriptome of P. cafer to explore the genomic basis of invasiveness and assembled a 1.03 Gb genome containing 15,533 protein-coding genes, with an N50 of 3.04 Mb and 97.2% BUSCO completeness. Our study constructed the mitogenome and 18S rRNA marker gene of P. cafer for the first time. Further, we investigated the demographic history and identified recent genetic bottlenecks the species experienced. We established the phylogenetic position of P. cafer and examined the gene family evolution along with orthologous gene clustering to provide clues on the invasive characteristics of P. cafer. Our study thus serves as a significant resource for future studies in invasion genomics and the possible management of this bird species in alien ranges.
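Because N50 is quoted as a contiguity measure in this and several other abstracts in the issue, a minimal sketch of how it is conventionally computed may help: sort sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum covers half the assembly.
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_contigs = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths, total 300
    print(n50(toy_contigs))
```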
Chromosome-scale genomes of two wild flowering cherries (Cerasus itosakura and Cerasus jamasakura) provide insights into structural evolution in Cerasus
Fujiwara K, Toyoda A, Katsuki T, Sato Y, Biswa BB, Kishida T, Tsuruta M, Nakamura Y, Mochizuki T, Kimura N, Kawamoto S, Ohta T, Nonomura KI, Niki H, Yano H, Umehara K, Suzuki C and Koide T
Flowering cherries (genus Cerasus) are iconic trees in Japan, celebrated for their cultural and ecological significance. Despite their prominence, high-quality genomic resources for wild Cerasus species have been limited. Here, we report chromosome-level genome assemblies of two representative Japanese cherries: Cerasus itosakura, a progenitor of the widely cultivated C. ×yedoensis "Somei-yoshino," and Cerasus jamasakura, a traditionally popular wild species endemic to Japan. Using deep PacBio long-read and Illumina short-read sequencing, combined with reference-guided scaffolding based on the near-complete C. speciosa genome, we generated assemblies of 259.1 Mbp (C. itosakura) and 312.6 Mbp (C. jamasakura), both with >98% BUSCO completeness. Consistent with their natural histories, C. itosakura showed low heterozygosity, while C. jamasakura displayed high genomic diversity. Comparative genomic analyses revealed structural variations, including large chromosomal inversions. Notably, the availability of both the previously published C. speciosa genome and our new C. itosakura genome enabled the reconstruction of proxy haplotypes for both parental lineages of "Somei-yoshino." Comparison with the phased genome of "Somei-yoshino" revealed genomic discrepancies, suggesting that the cultivar may have arisen from genetically distinct or admixed individuals, and may also reflect intraspecific diversity. Our results offer genomic foundations for evolutionary and breeding studies in Cerasus and Prunus.
Life inside a bag: multiomics insights into the bagworm species Eumeta crameri
Chakraborty A, Mahajan S, Prasoodanan P K V, Khamkar AS and Sharma VK
Bagworms are commonly known for the well-organized case or bag that surrounds them, constructed from their silk and plant materials. To understand the genetic basis of these unique characteristics in bagworms, we performed multiomics analyses of a bagworm species, Eumeta crameri. The genome and transcriptome sequencing of E. crameri were used to construct the nuclear genome with a size of 668.2 Mb, N50 value of 6.6 Mb, and 13,554 coding genes, which was further assembled into 31 pseudochromosomes. The mitochondrial genome had a size of 15.6 Kb. We established the phylogenetic position of E. crameri with respect to 54 other insect species. The comparative analyses of E. crameri with other Lepidopterans revealed the adaptive evolution of genes related to primary metabolic pathways, defense, molting, metamorphosis, and silk formation in the bagworm species. We also showed the ultrafine nature of the E. crameri silk fibres. Further, we performed the gut microbiome sequencing for E. crameri and constructed a gut microbial gene catalogue, which revealed the unique composition of the gut microbiome and its significance for host metabolism and defense. Together, the results provide multifaceted insights into the biological processes that support the well-organized holometabolous metamorphosis inside the bags of E. crameri.
Strain-level dissection of complex rhizoplane and soil bacterial communities using single-cell genomics and metagenomics
Kifushi M, Nishikawa Y, Hosokawa M, Anai T and Takeyama H
Root exudates shape root-associated microbial communities that differ from those in soil. Notably, specific microorganisms colonize the root surface (rhizoplane) and strongly associate with plants. Although retrieving microbial genomes from soil and root-associated environments remains challenging, single amplified genomes (SAGs) and metagenome-assembled genomes (MAGs) are essential for studying these microbiomes. This study compared SAGs and MAGs constructed from short-read metagenomes of the same soil samples to clarify their advantages and limitations in soil and root-associated microbiomes, and to deepen insights into microbial dynamics in the rhizoplane. We demonstrated that SAGs are better suited than MAGs for expanding the microbial tree of life in soil and rhizoplane environments, due to their greater gene content, broader taxonomic coverage, and higher sequence resolution of quality genomes. Metagenomic analysis provided sufficient coverage in the rhizoplane but was limited in soil. Additionally, integrating SAGs with metagenomic reads enabled strain-level analysis of microbial dynamics in the rhizoplane. Furthermore, SAGs provided insights into plasmid-host associations and dynamics, which MAGs failed to capture. Our study highlights the effectiveness of single-cell genomics in expanding microbial genome catalogues in soil and rhizosphere environments. Integrating high-resolution SAGs with comprehensive rhizoplane metagenomes offers a robust approach to elucidating microbial dynamics around plant roots.
Patterns and drivers of genome-wide codon usage bias in the fungal order Sordariales
Hensen N, Hiltunen Thorén M and Johannesson H
Here we present a study on amino acid composition, codon usage bias (CUB), and levels of selection driving codon usage in Sordariales fungi. We found that GC-ending codons are used more often than AT-ending codons in all Sordariales genomes, but the strength of CUB differs amongst families. The families Podosporaceae and Sordariaceae contain relatively low genome-wide levels of CUB, while the highest levels of CUB are found in Chaetomiaceae and the "BLLNS"-group, a monophyletic group of five other Sordariales families. Based on genomic clustering, we show that Podosporaceae and Sordariaceae are more similar to each other than either of them is to any of the other groups. Comparatively, the Chaetomiaceae and BLLNS show increased natural selection driving use of specific codons, resulting in higher genome-wide CUB. We hypothesize that the higher levels of CUB in Chaetomiaceae genomes might have been caused by ecological niche specialization, versus the more generalist nature of many Sordariaceae and Podosporaceae species.
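The comparison of GC-ending and AT-ending codons can be made concrete with a short sketch that computes the fraction of codons whose third position is G or C (often called GC3). This is a generic illustration, not the CUB statistic used in the study; the function name and the toy coding sequence are assumptions, and stop codons are counted like any other codon for simplicity.

```python
def gc3_fraction(cds):
    """Fraction of complete, unambiguous codons in a coding sequence that end in G or C."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Skip codons containing ambiguity symbols such as N.
    codons = [codon for codon in codons if set(codon) <= set("ACGT")]
    if not codons:
        raise ValueError("no complete unambiguous codons found")
    gc_ending = sum(1 for codon in codons if codon[2] in "GC")
    return gc_ending / len(codons)


if __name__ == "__main__":
    toy_cds = "ATGGCCAAGTTTTAA"  # hypothetical sequence: ATG GCC AAG TTT TAA
    print(gc3_fraction(toy_cds))
```

A value above 0.5 indicates that GC-ending codons are used more often than AT-ending ones in the sequence examined.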
Nuclear genome sequencing reveals the highly intron-rich architecture of the chlorarachniophyte alga Amorphochlora amoebiformis
Aoki D, Saiki H, Yamamoto K, Suzuki S and Hirakawa Y
Chlorarachniophyte algae possess complex plastids derived from endosymbiosis between a cercozoan protist and a green alga. As evidence of this event, a remnant nucleus of the endosymbiont, the nucleomorph, is present in the plastid intermembrane space. Chlorarachniophytes are excellent models to study genome evolution via endosymbiosis. Although the three organelle genomes of mitochondrion, plastid, and nucleomorph have been sequenced in several chlorarachniophyte species, nuclear genome information is currently limited to Bigelowiella natans. To gain insights into the genome diversity and evolution of chlorarachniophytes, we sequenced the nuclear genome of another chlorarachniophyte, Amorphochlora amoebiformis. Its size is approximately 214 Mb, which is more than twice that of B. natans. Remarkably, three-quarters of the nuclear genome consists of spliceosomal introns, indicating its highly intron-rich structure compared to other known eukaryotic genomes. Single nucleotide polymorphism analysis revealed that A. amoebiformis possessed a diploid nuclear genome, unlike the haploid genome of B. natans. Additionally, we identified organellar DNA fragments within the nuclear genome, suggesting recent DNA migration from the three organelles to the nucleus. Overall, our findings reveal that chlorarachniophyte nuclear genomes differ substantially in size, structure, and ploidy across species, and provide evidence of ongoing endosymbiotic gene transfer.
Whole-genome sequencing of wild and ancestral Dura provides insight into the untapped genomic information of undomesticated oil palm (Elaeis guineensis Jacq.)
Aditama R, Siregar HA, Tanjung ZA, Dinarti D, Ardie SW, Suwarno WB, Suprianto E, Utomo C, Liwang T and Sudarsono S
Oil palm (Elaeis guineensis Jacq.) is a globally important crop, and its genetic improvement benefits from comprehensive genome sequencing. Here, we report the whole-genome sequencing and annotation of two key genetic resources: the wild (Eg-DCM) and ancestral (Eg-DBG) Dura accessions, using a combination of short- and long-read sequencing technologies. De novo assembly followed by polishing, proximity ligation, and reference-guided scaffolding yielded high-quality assemblies with ungapped lengths of 1.71 Gb (Eg-DBG) and 1.48 Gb (Eg-DCM). Eg-DCM and Eg-DBG genomes exhibited high completeness, with over 97% of Benchmarking Universal Single-Copy Orthologs (BUSCOs) recovered across the Eukaryota, Viridiplantae, and Embryophyta datasets. Repetitive elements, particularly retrotransposons, dominated both genomes, accounting for 46.10% of Eg-DBG and 43.85% of Eg-DCM. Gene prediction initially identified 61,256 (Eg-DBG) and 53,985 (Eg-DCM) genes, which were refined into high-confidence gene sets of 39,263 and 35,298, respectively. Additionally, 1,760 and 1,684 putative resistance (R) genes were identified in Eg-DCM and Eg-DBG, with similar class distributions. The five major R gene classes comprise KIN, RLK, RLP, CNL and CK. With further research, the assembled whole-genome sequences and the annotated genes of Eg-DBG and Eg-DCM offer valuable insights into the untapped genomic information of undomesticated accessions, with implications for future breeding and crop improvement efforts in oil palm.
The chromosome-level genome of Chinese indicine cattle breed provides insights into bovine adaptation and immunity
Ge F, Guo Y, Xu L, Low WY, Ma H, Li Q, Wang Z, Zhu B, Xu L, Gao X, Zhang L, Gao H, Li J and Chen Y
Genomic research is currently undergoing a paradigm shift from reliance on a single reference sequence to the use of breed-specific genomes. Chinese indicine cattle (Bos taurus indicus), characterized by their notable tick resistance and heat tolerance, display more extensive genetic diversity than taurine cattle. Here, we generated a chromosome-level genome assembly of Chinese indicine cattle, achieving a contig N50 of 90.92 Mb and an overall size of 2.91 Gb, utilizing PacBio HiFi sequencing complemented by Hi-C sequencing technology. The assembly is characterized by near-complete chromosomes, telomeres, and fewer gaps. Utilizing this high-quality assembly, we explored the phylogenetic relationship and speciation time. The gene family and selection signature analyses identified candidate genes and biosynthetic pathways potentially contributing to disease immunity and thermotolerance of indicine cattle. Altogether, this study enriches the bovine pangenome repository and advances our understanding of the complex evolutionary patterns and distinctive adaptation traits of Chinese indicine cattle.
Diversity, evolution, and transcription of endogenous retroviruses in Chiroptera genomes
Zhou ZJ, Xiao Y, Fang J, Yao YX, Yang CH, Dacheux L, Luo DS, Qiu Y and Ge XY
Bats (Chiroptera) are a taxonomic group of immense biological and ecological importance. They are primary reservoirs and carriers of various zoonotic viruses. Endogenous retroviruses (ERVs) originate from ancient retroviruses invading the host, and ERV-derived sequences can function as regulatory elements which influence gene expression and contribute to both physiological and pathological processes. However, ERVs and ERV-like elements (ERVLEs) carried by bats have not been fully characterized. In this study, we systematically explored the ERVs in 61 bat species and identified 10,352 bat-ERV and 5,884 bat-ERVLE sequences, which covered 3 major virus genera and included 7 groups related to human ERVs in the subfamily Orthoretrovirinae. In particular, a relatively intact endogenous deltaretrovirus sequence was identified in Molossus molossus. Additionally, 358 bat-ERVs and 33 bat-ERVLEs were identified as recombinants. The integration time of bat-ERVs was estimated to be concentrated in the last 10 to 40 million years, indicating their role in shaping the bat genome during the long-term co-evolution of virus and host. Furthermore, carnivorous bats tended to have more relatively complete and younger ERVs compared to herbivorous bats. According to bat transcriptomes, we found that 1,385 bat-ERVs and 197 bat-ERVLEs had transcriptional potential in 20 different tissues of 25 bats, implying that bat-ERVs harboured actively expressed genes with potential functions. In summary, we comprehensively characterized bat-ERVs in terms of their evolution, types and potential functions, providing foundational data and a new perspective for further research on bat-ERVs.
Chromosome-scale genome assembly of Sauvagesia rhodoleuca (Ochnaceae) provides insights into its genome evolution and demographic history
Xiao TW, Wang XF, Wang ZF and Yan HF
Sauvagesia rhodoleuca is an endangered species endemic to southern China. Due to human activities, only 6 fragmented populations remain in Guangdong and Guangxi. Despite considerable conservation efforts, its demographic history and evolution remain poorly understood, particularly from a genomic perspective. To address this, we assembled a chromosome-scale genome of S. rhodoleuca using Nanopore long-read sequencing, DNA short-read sequencing, RNA-seq, and Hi-C data. A total of 290.37 Mb of assembled sequences, accounting for 99.76% of the genome, were successfully anchored to 19 pseudo-chromosomes, achieving a BUSCO completeness of 98.40% and a long terminal repeat assembly index of 17.28. Genome annotation identified 26,758 protein-coding genes and 369 tRNA genes. Demographic analysis revealed a sharp decline in the effective population size of S. rhodoleuca beginning approximately 1 million years ago. Whole-genome duplication (WGD) analysis revealed that S. rhodoleuca experienced a whole-genome triplication (WGT) followed by a more recent WGD after diverging from the Rhizophoraceae. Genes retained from WGT and WGD events played key roles in the development and survival of S. rhodoleuca, as indicated by Gene Ontology analysis. The high-quality genome of S. rhodoleuca provides insights into its genomic characteristics and evolutionary history, offering a valuable resource for conservation and genetic management.
Can classical statistics and deep learning converge on explainable, causally driven target discovery?
Chen L
Understanding the molecular causes of complex diseases remains one of the most pressing challenges in biomedicine. Despite large-scale genome-wide association studies mapping thousands of risk loci, identifying which genetic variants truly drive disease remains difficult. Traditional statistical genetics has laid a strong foundation for variant discovery, but it often struggles to capture nonlinear interactions and cannot fully integrate the breadth of the interconnected multi-omics data. In recent years, deep learning approaches have shown promise in bridging these gaps: modelling high-order genetic interactions, uncovering latent biological structure, and enabling multi-layered data integration. However, most current deep learning models for genomics remain exploratory in nature, and issues such as susceptibility to overfitting, difficulties in interpretability, and the general lack of standardized evaluation frameworks have limited their widespread adoption for genomics research. In this review, we explore how traditional statistical and deep learning methods can be applied to uncover causal mechanisms in complex disease. We critically compare these two frameworks for their advantages and limitations in detecting genetic associations and prioritizing causal associations. Towards the end, we propose a future direction centred around hybrid models that blend the scalability of deep learning with the inferential power of statistical genetics. Our goal is to guide researchers in developing next-generation computational tools to uncover the molecular basis of complex diseases and accelerate the translation of genetic findings into effective treatments.
Genome survey and evolutionary analysis of 8 Lamprotula species: SSR profiling, mitochondrial characterization, and population dynamics inference
Jiang M, Liu Q, Jiang C, Zhan M, Wen H, Shu F, Xie L, Liu T, Ren C, Tang W and Liu K
Freshwater bivalves are vital to aquatic ecosystems but face severe global threats. Understanding their genomic traits and evolution is crucial for effective conservation. This study conducted whole-genome sequencing on 8 Lamprotula species. These 8 species exhibited high genomic complexity, characterized by large genomes (1.89 to 2.65 Gb), high heterozygosity (>0.8), and high repeat content (>60%), estimated by k-mer analysis. Genome assemblies showed that L. caveata had the largest genome, while L. polysticta had the smallest. Furthermore, the assembled genome sizes of these 8 species exhibited an average increase of 22.58% compared to k-mer analysis estimates, largely due to their high heterozygosity. The mitochondrial genomes of these 8 species ranged in size from 15.69 kb to 17.13 kb, with GC contents varying from 36.36% to 40.77%. Phylogenetic analysis indicated early divergence of L. leai and L. caveata from the other 6 species. Pairwise Sequentially Markovian Coalescent analysis revealed population bottlenecks over the past million years, with L. rochechouarti showing more significant population size fluctuations during the Pleistocene Glacial Epoch. In summary, this study provides comprehensive genomic insights into 8 Lamprotula species, highlighting their high genomic complexity and evolutionary divergence, thereby establishing a crucial foundation for future conservation and genetic research efforts.
Satellite DNAs rising from the transposon graveyards
Šatović-Vukšić E, Majcen P and Plohl M
Repetitive DNA sequences, such as transposable elements (TEs) and satellite DNA (satDNA), spread and diversify within host genomes, impacting genome biology in numerous ways. In the first part of this review, we emphasize the evolutionary importance of satDNAs and TEs, providing a short summary of their roles and the mechanisms by which they influence the structure and function of genomes. We also discuss the broad, complex, and extensive relationships between TEs and satDNAs. Following that, we bring together different mechanisms for the generation of satDNA from TEs, as it has been demonstrated that almost any part of any type of TE can undergo tandemization and produce novel satDNAs. Importantly, we here present a hypothesis that would explain the existence of particular types of monomers, namely composite satDNA monomers which display multiple subsequent stretches of similarity to various TEs, for which an explanation has been lacking so far. We propose that even highly shuffled and degraded TE remnants residing in heterochromatin 'TE graveyards' can give rise to new satDNA sequence monomers, transforming these genomic loci into DNA 'recycling yards'. Furthermore, we emphasize important evolutionary questions regarding the causes, mechanisms, and frequency of these occurrences.
Reference-based chromosome-scale assembly of Japanese barley (Hordeum vulgare ssp. vulgare) cultivar Hayakiso 2
Tanaka T, Haraguchi Y, Todoroki T, Saisho D, Abiko T and Kai H
Current advances in next-generation sequencing (NGS) technology and assembly programs permit the construction of chromosome-level genome assemblies in various plants. In contrast to resequencing, the genome sequences provide comprehensive annotation data useful for plant genetics and breeding. Herein, we constructed a reference-based genome assembly of winter barley (H. vulgare ssp. vulgare) cv. 'Hayakiso 2' using long- and short-read NGS data and barley reference genome sequences from 'Morex'. We constructed 'Hayakiso 2' genome sequences covering 4.3 Gbp with 55,477 genes. Comparative genomics revealed that 14,106 genes had orthologs in two barley datasets, wheat (A, B, and D homoeologs, respectively), and rice. From the gene ontology analysis, 2,494 genes with orthologs in wheat and rice but not in the two barley datasets included agriculturally important genes, such as those annotated with 'response to biotic and abiotic stress' and 'metabolic process'. Phylogenetic analysis using 76 pangenome data indicated that 'Hayakiso 2' was clustered into Japanese-type genomes with unique alleles. 'Hayakiso 2' genome sequences showed known genes related to flowering and facilitated barley breeding through the development of various markers related to agronomically important alleles such as tolerance to various types of biotic and abiotic stress. Therefore, 'Hayakiso 2' genome sequences will be used for further barley breeding.
Dynamic integration and evolutionary trajectory of endogenous IHHNV elements in crustacean genomes
Zhong X, Yuan J, Zhang X, Li S, Liu C, Si S, Hu J, Prachumwat A, Sritunyalucksana K and Li F
Endogenous viral elements (EVEs) serve as molecular fossils that record the ancient co-evolutionary arms race between viruses and their hosts. In this study, by analyzing 105 host crustacean genomes, we identified 252 infectious hypodermal and haematopoietic necrosis virus-derived EVEs (IHHNV-EVEs), which include 183 ancient and 6 recently inserted EVEs. These IHHNV-EVEs are widely distributed among Decapoda, Thoracica, and Isopoda, with some of them exhibiting a syntenic distribution relative to neighboring host sequences, suggesting that IHHNV or its ancestor is a potential pathogen of these species, with a long-term dynamic interaction over their evolutionary history. An expansion of IHHNV-EVEs was observed in decapod genomes, reflecting a reinforced arms race between decapods and IHHNV. Notably, we found that nearly all recent IHHNV-EVEs were laboratory contaminants, except for a single authentic integration in Penaeus monodon that persists intact across 16 samples from the 2 populations. These temporal dynamics (ancient genomic stabilization versus modern colonization activity) highlight that EVEs serve as dual archives: historical records of past conflicts and active participants in current evolutionary battles. Our findings redefine viral genomic colonization as a continuum, where ancient EVE fixation coexists with persistent integration processes, providing new insights into host-virus co-evolutionary trajectories.
Nearly T2T, phased genome assemblies of corals reveal haplotype diversity and the evolutionary process of gene expansion
Takeuchi T, Suzuki Y, Shoguchi E, Fujie M, Kawamitsu M, Shinzato C, Satoh N and Myers EW
Gene family expansion illustrates a critical aspect of evolutionary adaptation. However, the mechanisms by which gene family expansions emerge and are maintained in the genome remain unclear. Here, we report de novo, nearly telomere-to-telomere (T2T), haplotype-phased genome assemblies of 2 coral species, Acropora tenuis and Acropora digitifera. By comparing haplotypes within a single individual and across species, we identified genomic regions spanning several megabases with highly disordered gene arrangements, termed non-syntenic regions (nSRs). In these nSRs, there are clusters of genes that emerged by lineage-specific gene family expansion. The gene repertoire within nSRs exhibits significant sequence diversity and distinct expression patterns, suggesting functional diversification. We propose that lineage-specific gene family expansion in nSRs occurs through recurrent tandem duplications mediated by non-allelic homologous recombination (NAHR) events, with nSRs serving as reservoirs for a diverse gene repertoire advantageous for survival. The nearly T2T-phased genomes provide new insights into the remarkable flexibility of genome organization and the evolution of gene family expansions.
Thin-diaPASEF: diaPASEF for maximizing proteome coverage in single-shot proteomics
Konno R, Ishikawa M, Nakajima D, Inukai K, Ohara O and Kawashima Y
Proteomics using mass spectrometry (MS) has significantly advanced, offering deep insights into complex proteomes. The timsTOF MS platform with its parallel accumulation-serial fragmentation (PASEF) technology has achieved high scan speeds and high-quality spectra. Bruker's timsTOF HT, which features TIMS-XR technology, offers an improved dynamic range and analysis depth, supporting high sample loadings. Moreover, various improvements to the data-independent acquisition method based on the PASEF technology (diaPASEF) have been reported. Despite these advancements, most high-level deep proteomic reports are based on the Orbitrap Astral and Orbitrap Exploris 480, and analytical systems using timsTOF MS still require improvement. Here, Bruker's timsTOF HT was used to validate and optimize key diaPASEF parameters, leading to the development of a Thin-diaPASEF method. This method provides high quantitative accuracy and consistency. In our validation, 9,400 proteins were identified in a single shot from HEK cells (strictly controlled protein false discovery rate <1%), the highest number analysed by the timsTOF MS series using standard human cultured cells. Furthermore, by combining Thin-diaPASEF with an improved Lycopersicon esculentum lectin method, over 5,000 proteins were identified from plasma in a 24-samples-per-day analysis, and we succeeded in constructing a system with high proteome coverage that can be used for biomarker discovery.