Research data articles within Nature Communications

Featured

  • Article
    | Open Access

    Extracting scientific data from published research is a complex task requiring specialised tools. Here the authors present a scheme based on large language models to automate the retrieval of information from text in a flexible and accessible manner.

    • John Dagdelen
    • , Alexander Dunn
    •  & Anubhav Jain
  • Article
    | Open Access

    In this work, the authors report the NMRlipids Databank, which promotes decentralised sharing of biomolecular molecular dynamics (MD) simulation data through an overlay design. Programmatic access enables analyses of rare phenomena and advances the training of machine learning models.

    • Anne M. Kiirikki
    • , Hanne S. Antila
    •  & O. H. Samuli Ollila
  • Article
    | Open Access

    Authors of scientific papers are generally discouraged from citing works that had no direct influence on their research. This paper uses simulations to show that such rhetorical citations may have underappreciated effects on the scientific community, such as deconcentrating attention away from already highly cited papers.

    • Honglin Bao
    •  & Misha Teplitskiy
  • Article
    | Open Access

    Accurate benchmarking of small variant calling is critical for the continued improvement of human genome sequencing. Here, the authors show that current approaches are biased towards certain variant representations and develop a new approach to ensure consistent and accurate benchmarking, regardless of the original variant representations.

    • Tim Dunn
    •  & Satish Narayanasamy
  • Article
    | Open Access

    Over their careers, medicinal chemists develop a gut feeling for what is a promising molecule. Here, the authors use machine learning models to learn this intuition and show that it can be successfully applied in several drug discovery scenarios.

    • Oh-Hyeon Choung
    • , Riccardo Vianello
    •  & José Jiménez-Luna
  • Article
    | Open Access

    Rare Mendelian disorders pose a major diagnostic challenge, but evaluation of automated tools that aim to uncover causal genes is limited. Here, the authors present a computational pipeline that simulates realistic clinical datasets to address this deficit.

    • Emily Alsentzer
    • , Samuel G. Finlayson
    •  & Isaac S. Kohane
  • Article
    | Open Access

    Aberrant coagulation and thrombosis are associated with severe SARS-CoV-2 infection. Here, the authors show that the E protein is associated with coagulation disorders in COVID-19 patients and could directly enhance platelet activation and thrombosis through a CD36/p38 MAPK/NF-κB signaling axis.

    • Zihan Tang
    • , Yanyan Xu
    •  & Tingting Liu
  • Article
    | Open Access

    Linking microscale cellular structures to macroscale features of the brain is required to fully understand its structure and function. Here, the authors present a resource which combines multi-contrast microscopy and MRI of a single whole macaque brain to facilitate multimodal analyses.

    • Amy F. D. Howard
    • , Istvan N. Huszar
    •  & Karla L. Miller
  • Comment
    | Open Access

    While there are a growing number of human pluripotent stem cell repositories, genetic diversity remains limited in most collections and studies. Here, we discuss the importance of incorporating diverse ancestries in these models to improve equity and accelerate biological discovery.

    • Sulagna Ghosh
    • , Ralda Nehme
    •  & Lindy E. Barrett
  • Article
    | Open Access

    There is a broad range of research available on the relationship between food security and mental health. Here the authors carry out a systematic mapping of evidence on food security and nutrition related to mental health and identify trends in themes, setting, and study design over the 20-year period studied.

    • Thalia M. Sparling
    • , Megan Deeney
    •  & Suneetha Kadiyala
  • Article
    | Open Access

    This paper describes the ‘4DN Data Portal’ that hosts data generated by the 4D Nucleome network, including Hi-C and other chromatin conformation capture assays, as well as various sequencing-based and imaging-based assays. Raw data have been uniformly processed to increase comparability and the portal is implemented with visualization tools to browse the data without download.

    • Sarah B. Reiff
    • , Andrew J. Schroeder
    •  & Peter J. Park
  • Article
    | Open Access

    Transparent data sharing is central to scientific progress, but limited for human sequencing data because of patient privacy concerns. Here, the authors propose an approach that removes certain types of genetic information in sequencing data, without affecting count-based downstream analyses.

    • Christoph Ziegenhain
    •  & Rickard Sandberg
  • Article
    | Open Access

    Tree rings are a crucial archive for Common Era climate reconstructions, but the degree to which methodological decisions influence outcomes is not well known. Here, the authors show how different approaches taken by 15 different groups influence the ensemble temperature reconstruction from the same data.

    • Ulf Büntgen
    • , Kathy Allen
    •  & Jan Esper
  • Article
    | Open Access

    Forecasts of COVID-19 mortality have been critical inputs into a range of policies, and decision-makers need information about their predictive performance. Here, the authors gather a panel of global epidemiological models and assess their predictive performance across time and space.

    • Joseph Friedman
    • , Patrick Liu
    •  & Emmanuela Gakidou
  • Article
    | Open Access

    Sarcomas are morphologically heterogeneous tumours, which makes their classification challenging. Here the authors develop a classifier using DNA methylation data from several soft tissue and bone sarcoma subtypes, which has the potential to improve classification for research and clinical purposes.

    • Christian Koelsche
    • , Daniel Schrimpf
    •  & Andreas von Deimling
  • Article
    | Open Access

    Working with cancer genomes from multiple projects can increase investigative power, but the quality of sequences can vary. Here, the authors present a framework for comparing whole genome sequencing quality to help researchers guide downstream analyses and exclude poor-quality samples.

    • Justin P. Whalley
    • , Ivo Buchhalter
    •  & Ivo G. Gut
  • Article
    | Open Access

    With the generation of large pan-cancer whole-exome and whole-genome sequencing projects, a question remains about how comparable these datasets are. Here, using The Cancer Genome Atlas samples analysed as part of the Pan-Cancer Analysis of Whole Genomes project, the authors explore the concordance of mutations called by whole exome sequencing and whole genome sequencing techniques.

    • Matthew H. Bailey
    • , William U. Meyerson
    •  & Christian von Mering
  • Article
    | Open Access

    Schulz et al. systematically benchmark performance scaling with increasingly sophisticated prediction algorithms and with increasing sample size in reference machine-learning and biomedical datasets. Complicated nonlinear intervariable relationships remain largely inaccessible for predicting key phenotypes from typical brain scans.

    • Marc-Andre Schulz
    • , B. T. Thomas Yeo
    •  & Danilo Bzdok
  • Article
    | Open Access

    Soil organism biodiversity contributes to ecosystem function, but biodiversity and function have not been equivalently studied across the globe. Here the authors identify locations, environment types, and taxonomic groups for which there is currently a lack of biodiversity and ecosystem function data in the existing literature.

    • Carlos A. Guerra
    • , Anna Heintz-Buschart
    •  & Nico Eisenhauer
  • Perspective
    | Open Access

    Questions of causality are ubiquitous in Earth system sciences and beyond, yet correlation techniques still prevail. This Perspective provides an overview of causal inference methods, identifies promising applications and methodological challenges, and initiates a causality benchmark platform.

    • Jakob Runge
    • , Sebastian Bathiany
    •  & Jakob Zscheischler
  • Comment
    | Open Access

    In research studies, the need for additional samples to obtain sufficient statistical power often has to be balanced against the experimental costs. One approach to this end is to sequentially collect data until you have sufficient measurements, e.g., until the p-value drops below 0.05. I show that this approach is common, yet unadjusted sequential sampling leads to severe statistical issues, such as an inflated rate of false positive findings. As a consequence, the results of such studies are untrustworthy. I identify the statistical methods that can be implemented in order to account for sequential sampling.

    • Casper Albers
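
The inflation described in the Comment above can be illustrated with a small simulation. This is a minimal sketch and not the author's own analysis: it assumes a two-sided z-test with known unit variance, a true null hypothesis (mean 0), and hypothetical settings (2000 simulated experiments, at most 100 observations, a test after every 10th observation). The function names are illustrative only.

```python
import math
import random


def z_test_p_value(samples):
    """Two-sided p-value for H0: mean = 0, assuming known unit variance."""
    n = len(samples)
    z = sum(samples) / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_experiments, n_max, peek_every, rng, alpha=0.05):
    """Share of null experiments that are declared significant.

    peek_every == n_max gives a fixed-sample design (one test at the end);
    smaller values test repeatedly and stop at the first p < alpha.
    """
    hits = 0
    for _ in range(n_experiments):
        samples = []
        for i in range(1, n_max + 1):
            samples.append(rng.gauss(0.0, 1.0))  # the null is true: mean 0
            if i % peek_every == 0 and z_test_p_value(samples) < alpha:
                hits += 1
                break
    return hits / n_experiments


rng = random.Random(0)  # fixed seed so the run is repeatable
fixed = false_positive_rate(2000, 100, peek_every=100, rng=rng)
sequential = false_positive_rate(2000, 100, peek_every=10, rng=rng)
print(f"fixed n: {fixed:.3f}, peeking every 10 observations: {sequential:.3f}")
```

Under these assumptions the fixed-sample rate should stay close to the nominal 0.05, while the unadjusted sequential rate should come out clearly higher, because every additional look gives a true null another chance to cross the threshold.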
  • Article
    | Open Access

    Biomedical image analysis challenges have increased in the last ten years, but common practices have not been established yet. Here the authors analyze 150 recent challenges and demonstrate that outcome varies based on the metrics used and that limited information reporting hampers reproducibility.

    • Lena Maier-Hein
    • , Matthias Eisenmann
    •  & Annette Kopp-Schneider
  • Article
    | Open Access

    Data sharing is recognized as a way to promote scientific collaboration and reproducibility, but some are concerned over whether research based on shared data can achieve high impact. Here, the authors show that neuroimaging papers using shared data are no less likely to appear in top-ranked journals.

    • Michael P. Milham
    • , R. Cameron Craddock
    •  & Arno Klein