Epoch 1️⃣7️⃣: This week in ML (+ Bioinformatics 🧬 and Astronomy 🌌)

Neural Geometric Level of Detail, Guided Diffusion, datasheet for CheXpert

May 14, 2021

Neural Geometric Level of Detail

Neural signed distance functions (SDFs) are emerging as an effective representation for 3D shapes. SDFs encode 3D surfaces with a function of position that returns the closest distance to a surface. State-of-the-art methods typically encode the SDF with a large, fixed-size neural network to approximate complex shapes with implicit surfaces. The authors introduce an efficient neural representation that, for the first time, enables real-time rendering of high-fidelity neural SDFs while achieving state-of-the-art geometry reconstruction quality. They represent implicit surfaces using an octree-based feature volume that adaptively fits shapes with multiple discrete levels of detail (LODs) and enables continuous LOD through SDF interpolation.
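To make the SDF idea concrete, here is a minimal sketch in plain Python. It uses an analytic sphere in place of a neural network, a basic sphere-tracing loop as one common way to render an SDF, and a linear blend between two LOD predictions. The function names, the sphere, and the choice of linear interpolation are illustrative assumptions, not the authors' implementation.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere.

    Negative inside, zero on the surface, positive outside.
    The analytic sphere stands in for a learned SDF (assumption).
    """
    return math.dist(p, center) - radius


def blend_lods(d_coarse, d_fine, lod):
    """Blend SDF values from two adjacent discrete LODs.

    `lod` is a continuous level; its fractional part is the blend weight.
    Linear interpolation is assumed here for illustration.
    """
    t = lod - math.floor(lod)
    return (1.0 - t) * d_coarse + t * d_fine


def sphere_trace(sdf, origin, direction, max_steps=64, eps=1e-4):
    """March along a unit-length ray, stepping by the SDF value each time.

    Returns the ray parameter t at the first hit, or None if no hit
    is found within max_steps.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t
        t += dist
    return None
```

Because the SDF value is a safe step size (no surface can be closer than that distance), the ray can advance quickly through empty space, which is part of why SDFs are attractive for rendering.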

Guided Diffusion

The authors show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. They achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, they further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for sample quality using gradients from a classifier. The authors have released their code.
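As a rough illustration of classifier guidance, the sketch below works in one dimension. It assumes each reverse diffusion step samples from a Gaussian with mean `mu` and variance `sigma2`, and that guidance shifts the mean by `scale * sigma2 * grad`, where `grad` is the gradient of the classifier's log-probability for the target class. The toy logistic classifier and all names are hypothetical; this is not the released code.

```python
import math
import random


def log_prob_grad(x, w=2.0, b=0.0):
    """Gradient w.r.t. x of log p(y=1 | x) for a toy logistic classifier.

    p(y=1 | x) = sigmoid(w * x + b), so d/dx log p = w * (1 - p).
    The classifier and its parameters are assumptions for illustration.
    """
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return w * (1.0 - p)


def guided_mean(mu, sigma2, grad, scale):
    """Shift the Gaussian mean along the classifier gradient.

    scale = 0 recovers the unguided step; larger values push samples
    harder toward the target class at the cost of diversity.
    """
    return mu + scale * sigma2 * grad


def sample_guided(x_t, mu, sigma2, scale, rng):
    """Draw one sample from the guided Gaussian for the next step."""
    grad = log_prob_grad(x_t)
    return rng.gauss(guided_mean(mu, sigma2, grad, scale), math.sqrt(sigma2))


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_guided(x_t=0.0, mu=0.0, sigma2=0.5, scale=2.0, rng=rng))
```

The `scale` parameter is the knob behind the diversity-versus-quality trade-off described above: the variance of the step is unchanged, but the mean moves further toward regions the classifier considers likely for the target class.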

ML + Bioinformatics 🧬

Big Data: Astronomical or Genomical?
Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed other Big Data domains. Projecting to the year 2025, the authors compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Their estimates show that genomics is a “four-headed beast”—it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and analysis. They discuss aspects of new technologies that will need to be developed to rise up and meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the “genomical” challenges of the next decade.
Structured dataset documentation: a datasheet for CheXpert
Abstract: Billions of X-ray images are taken worldwide each year. Machine learning, and deep learning in particular, has shown potential to help radiologists triage and diagnose images. Following the structured format of Datasheets for Datasets, this paper expands on the original CheXpert paper and other sources to show the critical role played by radiologists in the creation of reliable labels and to describe the different aspects of the dataset composition in detail. Such structured documentation intends to increase the awareness in the machine learning and medical communities of the strengths, applications, and evolution of CheXpert, thereby advancing the field of medical image analysis. Another objective of this paper is to put forward this dataset datasheet as an example to the community of how to create detailed and structured descriptions of datasets. The authors believe that clearly documenting the creation process, the contents, and applications of datasets accelerates the creation of useful and reliable models.

Astroinformatics 🌌

Machine Learning the Fates of Dark Matter Subhalos: A Fuzzy Crystal Ball
Abstract: The evolution of a dark matter halo in a dark-matter-only simulation is governed purely by Newtonian gravity, making it a clean testbed for determining which halo properties drive its fate. Using machine learning, the authors predict the survival, mass loss, final position, and merging time of subhalos within a cosmological N-body simulation, focusing on which instantaneous initial features of the halo, interaction, and environment matter most. Survival is well predicted, with their model achieving 96.5% accuracy using only 3 model inputs from the initial interaction. However, the mass loss, final location, and merging times are much more stochastic processes, with significant margins of error between the true and predicted quantities for much of their sample.
A one-armed CNN for exoplanet detection from lightcurves
Abstract: The authors propose Genesis, a simplified one-armed Convolutional Neural Network (CNN) for exoplanet detection, and compare it to the more complex two-armed CNN Astronet. They also examine how Monte Carlo cross-validation affects estimates of exoplanet detection performance, and they double the input resolution to assess its effect on performance. They conclude by arguing that further exploration of shallower CNN architectures may help improve the generalizability of CNN-based exoplanet detection across surveys.
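For readers unfamiliar with Monte Carlo cross-validation, the idea is to repeat a random train/test split many times and summarize the scores, rather than rely on a single fixed split. The sketch below shows one way to do this with the Python standard library. The function names, the default of 10 rounds, and the 20% test fraction are assumptions for illustration and are not taken from the paper.

```python
import random
import statistics


def monte_carlo_splits(n_samples, n_rounds=10, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.

    Unlike k-fold, each round draws a fresh random split, so a sample
    may land in the test set in several rounds or in none.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(n_samples * test_fraction)))
    indices = list(range(n_samples))
    for _ in range(n_rounds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]


def monte_carlo_cv(n_samples, fit_and_score, **split_kwargs):
    """Run fit_and_score(train, test) on every split.

    Returns the mean score and its sample standard deviation
    (0.0 when fewer than two rounds were run).
    """
    scores = [
        fit_and_score(train, test)
        for train, test in monte_carlo_splits(n_samples, **split_kwargs)
    ]
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return statistics.mean(scores), spread
```

Here `fit_and_score` is a hypothetical callback that trains a model on the training indices and returns a metric on the test indices. The spread across rounds gives a sense of how much a reported detection score depends on the particular split.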