Epoch 8️⃣ : This week in ML (+ Bioinformatics 🧬 and Astronomy 🌌)
Generality in Transformers, VSF, MedT, Earth has a new layer 🤔
The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption. In this paper, Sharan Narang, Hyung Won Chung, et al. comprehensively evaluate many of these modifications in a shared experimental setting that covers most of the common uses of the Transformer in natural language processing. Surprisingly, they find that most modifications do not meaningfully improve performance. Link to the code.
Robotic fabric manipulation has applications in home robotics, textiles, senior care, and surgery. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. Ryan Hoque, Daniel Seita, et al. extend their earlier work on VisuoSpatial Foresight (VSF), which learns visual dynamics on domain randomized RGB images and depth maps simultaneously and completely in simulation. In this earlier work, they evaluated VSF on multi-step fabric smoothing and folding tasks against 5 baseline methods in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at train or test time. Link to the code.
ML + Bioinformatics 🧬
Jeya Maria Jose Valanarasu et al. study the feasibility of using Transformer-based network architectures for medical image segmentation. Most existing Transformer-based architectures proposed for vision applications require large-scale datasets to train properly. Medical imaging datasets, however, contain relatively few samples, which makes it difficult to train Transformers efficiently for medical applications. To address this, they propose a Gated Axial-Attention model, which extends existing architectures by introducing an additional control mechanism in the self-attention module. To train the model effectively on medical images, they also propose a Local-Global training strategy (LoGo) that further improves performance: the model operates on the whole image to learn global features and on patches to learn local features. Link to the code.
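To make the local-global idea concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function names, the 2 x 2 patch grid, and the choice to merge the two branches by element-wise addition are assumptions made for illustration, and both branches are assumed to return an output of the same shape as their input.

```python
from typing import Callable, List

Image = List[List[float]]
Branch = Callable[[Image], Image]


def split_into_patches(image: Image, grid: int) -> List[List[Image]]:
    """Cut a square image into a grid x grid array of equally sized patches."""
    size = len(image)
    if size % grid != 0:
        raise ValueError("image size must be divisible by the grid size")
    step = size // grid
    return [
        [
            [row[c * step:(c + 1) * step] for row in image[r * step:(r + 1) * step]]
            for c in range(grid)
        ]
        for r in range(grid)
    ]


def stitch_patches(patches: List[List[Image]]) -> Image:
    """Reassemble a grid of patches into one image, keeping each patch in place."""
    stitched: Image = []
    for patch_row in patches:
        for y in range(len(patch_row[0])):
            stitched.append([value for patch in patch_row for value in patch[y]])
    return stitched


def local_global_forward(
    image: Image, global_branch: Branch, local_branch: Branch, grid: int = 2
) -> Image:
    """Run one branch on the whole image and one on each patch, then merge.

    Merging by addition is an assumption of this sketch.
    """
    global_out = global_branch(image)
    patches = split_into_patches(image, grid)
    local_out = stitch_patches(
        [[local_branch(patch) for patch in patch_row] for patch_row in patches]
    )
    return [
        [g + l for g, l in zip(g_row, l_row)]
        for g_row, l_row in zip(global_out, local_out)
    ]
```

In a real model the two branches would be neural networks; here any shape-preserving function can stand in for them, which is enough to show how the whole-image and per-patch paths fit together.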
COVID-19 Prognosis via Self-Supervised Representation Learning and Multi-Image Prediction
Due to the relative scarcity of COVID-19 patient data, existing solutions leverage supervised pretraining on related non-COVID images, but this is limited by the differences between the pretraining data and the target COVID-19 patient data. In this paper, Anuroop Sriram, Matthew Muckley, et al. use self-supervised learning based on the momentum contrast (MoCo) method in the pretraining phase to learn more general image representations to use for downstream tasks. Link to the code.
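For readers unfamiliar with momentum contrast, the sketch below shows its three core pieces in plain Python: a slowly moving key encoder, a contrastive (InfoNCE) loss, and a queue of past keys used as negatives. It is an illustration under stated assumptions rather than the authors' implementation: encoder parameters are flattened into a list of floats, the function names are hypothetical, and the defaults (momentum 0.999, temperature 0.07) are the ones commonly quoted for MoCo, not values taken from this paper.

```python
import math
from typing import List

Vector = List[float]


def momentum_update(query_params: Vector, key_params: Vector, m: float = 0.999) -> Vector:
    """Move the key encoder slowly towards the query encoder.

    Each parameter becomes m * key + (1 - m) * query.
    """
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(
    query: Vector, positive_key: Vector, queue: List[Vector], temperature: float = 0.07
) -> float:
    """Contrastive loss: the query should match its positive key, not the queued keys."""
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, key) / temperature for key in queue]
    # Numerically stable log-sum-exp over all logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    # Cross-entropy with the positive key (index 0) as the correct class.
    return log_sum_exp - logits[0]


def update_queue(queue: List[Vector], new_keys: List[Vector], max_size: int) -> List[Vector]:
    """Append the newest keys and drop the oldest ones beyond max_size (max_size > 0)."""
    return (queue + new_keys)[-max_size:]
```

The loss is small when the query is close to its positive key and far from everything in the queue, which is what pushes the encoder towards representations that transfer to downstream tasks.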
Astroinformatics 🌌
Scientists Detect Signs of a Hidden Structure Inside Earth's Core
Traditionally we've been taught the Earth has four main layers: the crust, the mantle, the outer core, and the inner core. Our knowledge of what lies beneath Earth's crust has been inferred mostly from what volcanoes have divulged and seismic waves have whispered. The authors combined recent travel-time data from the International Seismological Centre with the Neighborhood Algorithm (NA), a derivative-free direct-search method, to test the idea that Earth's inner core may have two distinct layers by examining an ensemble of models that satisfactorily fit the data.
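The flavor of a neighborhood-style direct search can be shown with a toy one-parameter example, where each sample's nearest-neighbor (Voronoi) cell is simply an interval. This is a simplified sketch, not the authors' code or a full implementation of the NA: the function name, the misfit function you pass in, and all defaults are hypothetical. At each step it keeps the best-fitting samples and draws new samples uniformly inside their cells, without using any derivatives, and it keeps every sample so that an ensemble of models is available at the end.

```python
import random
from typing import Callable, List


def neighbourhood_search_1d(
    misfit: Callable[[float], float],
    lower: float,
    upper: float,
    n_samples: int = 8,
    n_resample: int = 2,
    n_iterations: int = 10,
    seed: int = 0,
) -> List[float]:
    """Toy 1-D neighborhood search; returns all sampled models, best fit first."""
    rng = random.Random(seed)
    models = [rng.uniform(lower, upper) for _ in range(n_samples)]
    for _ in range(n_iterations):
        ordered = sorted(models)
        best = sorted(models, key=misfit)[:n_resample]
        new_models = []
        for centre in best:
            i = ordered.index(centre)
            # In 1-D a Voronoi cell ends halfway to each neighbouring sample.
            left = lower if i == 0 else 0.5 * (ordered[i - 1] + centre)
            right = upper if i == len(ordered) - 1 else 0.5 * (centre + ordered[i + 1])
            for _ in range(n_samples // n_resample):
                new_models.append(rng.uniform(left, right))
        models.extend(new_models)
    return sorted(models, key=misfit)
```

Calling `neighbourhood_search_1d(lambda x: (x - 0.3) ** 2, 0.0, 1.0)` returns every model tried, sorted from best to worst fit, so the well-fitting part of the ensemble can be inspected rather than a single optimum.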
A novel stellar spectrum denoising method based on deep Bayesian modeling
Spectrum denoising is an important procedure for large-scale spectroscopic surveys. This work by Xin Kang, Shiyuan He, and Yanxia Zhang proposes a novel stellar spectrum denoising method based on deep Bayesian modeling. Their model consists of a prior distribution for each stellar subclass, a spectrum generator, and a flow-based noise model. The method takes the noise correlation structure into account and is not susceptible to strong sky emission lines or cosmic rays. It also handles spectra with missing flux values naturally, without ad-hoc imputation.