Epoch 1️⃣6️⃣: This week in ML (+ Bioinformatics 🧬 and Astronomy 🌌)

MLP-Mixer, Do You Even Need Attention?, Scrambler Networks, and much more...

May 07, 2021

MLP-Mixer: An all-MLP Architecture for Vision

Abstract: Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper, the authors show that while convolutions and attention are both sufficient for good performance, neither of them is necessary. They present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-Mixer contains two types of layers: one with MLPs applied independently to image patches (i.e. "mixing" the per-location features), and one with MLPs applied across patches (i.e. "mixing" spatial information). Yannic Kilcher’s Video.
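
To make the two kinds of mixing concrete, here is a minimal NumPy sketch of a single Mixer-style block. The residual connections, layer normalization, and GELU activation are not spelled out in the abstract above and are included here as assumptions; the helper names (`init_mlp`, `mixer_block`) and all sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize over the last (channel) axis; learned scale and shift are omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron acting on the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2

def init_mlp(rng, dim, hidden):
    # Hypothetical initializer: small random weights, zero biases.
    return (rng.normal(0.0, 0.02, (dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.02, (hidden, dim)), np.zeros(dim))

def mixer_block(x, token_params, channel_params):
    # x has shape (num_patches, channels).
    # Token mixing: transpose so the MLP acts across patches, one channel at a time.
    y = layer_norm(x).T                  # (channels, num_patches)
    x = x + mlp(y, *token_params).T      # back to (num_patches, channels)
    # Channel mixing: the MLP acts on each patch's feature vector independently.
    x = x + mlp(layer_norm(x), *channel_params)
    return x

rng = np.random.default_rng(0)
num_patches, channels = 16, 32           # illustrative sizes only
x = rng.normal(size=(num_patches, channels))
token_params = init_mlp(rng, num_patches, 64)
channel_params = init_mlp(rng, channels, 128)
out = mixer_block(x, token_params, channel_params)
print(out.shape)  # (16, 32)
```

The only difference between the two steps is the transpose: the same kind of MLP mixes spatial information when applied along the patch axis and per-location features when applied along the channel axis.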

Do You Even Need Attention?

Abstract: The strong performance of vision transformers on image classification and other vision tasks is often attributed to the design of their multi-head attention layers. However, the extent to which attention is responsible for this strong performance remains unclear. In this short report, the author asks: is the attention layer even necessary? Specifically, what happens if the attention layer in a vision transformer is replaced with a feed-forward layer applied over the patch dimension? The resulting architecture is simply a series of feed-forward layers applied over the patch and feature dimensions in an alternating fashion.

ML + Bioinformatics 🧬

Artificial Intelligence System Reduces False-Positive Findings in the Interpretation of Breast Ultrasound Exams

Ultrasound is an important imaging modality for the detection and characterization of breast cancer. Though consistently shown to detect mammographically occult cancers, especially in women with dense breasts, breast ultrasound has been noted to have high false-positive rates. In this work, the authors present an artificial intelligence (AI) system that achieves radiologist-level accuracy in identifying breast cancer in ultrasound images. On a test set consisting of 44,755 exams, the AI system achieved an area under the receiver operating characteristic curve (AUROC) of 0.976. In a reader study, the AI system achieved a higher AUROC than the average of ten board-certified breast radiologists (AUROC: 0.962 AI, 0.924±0.02 radiologists). With the help of the AI, radiologists decreased their false-positive rates by 37.4% and reduced the number of requested biopsies by 27.8%, while maintaining the same level of sensitivity.
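
For readers less familiar with the headline metric, the sketch below computes AUROC directly from its pairwise definition: the probability that a randomly chosen positive exam receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores are made-up toy values, not data from the study, and the `auroc` helper is hypothetical.

```python
def auroc(labels, scores):
    # Split the scores by ground-truth label (1 = malignant, 0 = benign).
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative exam")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example (hypothetical values).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of exams, so for a test set of the size reported above a rank-based implementation from an established library would be the practical choice.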

Interpreting Neural Networks for Biological Sequences by Learning Stochastic Masks

Sequence-based neural networks can learn to make accurate predictions from large biological datasets, but model interpretation remains challenging. Many existing feature attribution methods are optimized for continuous rather than discrete input patterns and assess individual feature importance in isolation, making them ill-suited for interpreting non-linear interactions in molecular sequences. Building on work in computer vision and natural language processing, the authors developed an approach based on deep generative modeling (Scrambler networks) in which the most salient sequence positions are identified with learned input masks. Scramblers learn to generate Position-Specific Scoring Matrices (PSSMs) where unimportant nucleotides or residues are ‘scrambled’ by raising their entropy. They apply Scramblers to interpret the effects of genetic variants, uncover non-linear interactions between cis-regulatory elements, explain binding specificity for protein-protein interactions, and identify structural determinants of de novo designed proteins. They show that interpretation based on a generative model allows for efficient attribution across large datasets and results in high-quality explanations, often outperforming state-of-the-art methods.
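
The idea of "scrambling by raising entropy" can be illustrated with a toy example. The sketch below is a simplification and not the authors' parameterization: it assumes a per-position importance score in [0, 1] (which a Scrambler would learn, but which is hand-picked here) and mixes the observed nucleotide with a uniform background accordingly. The function names are hypothetical.

```python
import math

ALPHABET = "ACGT"

def scrambled_pssm(sequence, importance, background=None):
    # importance[i] = 1 keeps position i as observed (one-hot column);
    # importance[i] = 0 replaces it with the background distribution.
    if background is None:
        background = {c: 1.0 / len(ALPHABET) for c in ALPHABET}
    pssm = []
    for base, s in zip(sequence, importance):
        column = {c: (1.0 - s) * background[c] + (s if c == base else 0.0)
                  for c in ALPHABET}
        pssm.append(column)
    return pssm

def entropy_bits(column):
    # Shannon entropy of one PSSM column, in bits.
    return -sum(p * math.log2(p) for p in column.values() if p > 0)

# Hand-picked importance scores for a 4-nucleotide toy sequence.
pssm = scrambled_pssm("ACGT", [1.0, 0.0, 0.5, 1.0])
entropies = [entropy_bits(col) for col in pssm]
print([round(h, 3) for h in entropies])
```

Positions with importance 1 stay at zero entropy, while a position with importance 0 reaches the maximum of 2 bits for a four-letter alphabet, so downstream predictions can only depend on the positions that were left unscrambled.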

Astroinformatics 🌌

Development of Convolutional Neural Networks for an Electron-Tracking Compton Camera

The electron-tracking Compton camera, a complete Compton camera that tracks Compton-scattered electrons with a gas micro time projection chamber, is expected to open up MeV gamma-ray astronomy. The technical challenge in achieving a point spread function of several degrees is the precise determination of the electron-recoil direction and the scattering position from track images. The authors attempted to reconstruct these parameters using convolutional neural networks. Two network models were designed to predict the recoil direction and the scattering position. These models achieved an angular resolution of 41 degrees and a position resolution of 2.1 mm on simulated 75 keV electron data in argon-based gas at a pressure of 2 atm.
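
As a rough illustration of what an angular figure like the one above measures, the sketch below computes the angle between a predicted and a true recoil direction. How the authors aggregate per-event errors into a single resolution value is not stated here, so this covers only the per-event error; the function name and the example vectors are hypothetical.

```python
import math

def angular_error_deg(predicted, true):
    # Angle in degrees between two direction vectors (need not be unit length).
    dot = sum(p * t for p, t in zip(predicted, true))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_t = math.sqrt(sum(t * t for t in true))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_angle))

# Hypothetical example: prediction perpendicular to the true direction.
print(angular_error_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
```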

Anomaly detection in Hyper Suprime-Cam galaxy images with generative adversarial networks

The problem of anomaly detection in astronomical surveys is becoming increasingly important as data sets grow in size. The authors present the results of an unsupervised anomaly detection method using a Wasserstein generative adversarial network (WGAN) on nearly one million optical galaxy images in the Hyper Suprime-Cam (HSC) survey. The WGAN learns to generate realistic HSC-like galaxies that follow the distribution of the data set; anomalous images are defined based on a poor reconstruction by the generator and outlying features learned by the discriminator. They find that the discriminator is more attuned to potentially interesting anomalies than the generator, so they use the discriminator-selected images to construct a high-anomaly sample of ∼13,000 objects. Code available here.
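
The two ingredients of the anomaly definition above can be combined into a single score in several ways. The sketch below shows one common formulation in the style of earlier GAN-based anomaly detectors: a weighted sum of a pixel-level residual (generator) and a feature-level residual (discriminator). Whether this matches the authors' exact scoring is an assumption, and the function names, the weight, and the flattened toy inputs are all hypothetical.

```python
def mean_abs_diff(a, b):
    # Mean absolute difference between two equally sized flat lists.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def anomaly_score(image, reconstruction, feats_image, feats_reconstruction, weight=0.5):
    # Generator term: how poorly the generator reconstructs the image.
    generator_score = mean_abs_diff(image, reconstruction)
    # Discriminator term: how far apart the two images lie in the
    # discriminator's intermediate feature space.
    discriminator_score = mean_abs_diff(feats_image, feats_reconstruction)
    return (1.0 - weight) * generator_score + weight * discriminator_score

# Toy flattened "images" and feature vectors (made-up numbers).
score = anomaly_score([0.0, 1.0], [0.0, 0.5], [1.0, 1.0], [1.0, 3.0])
print(score)  # 0.625
```

Keeping the two terms separate also makes it easy to rank objects by the discriminator term alone, which is the selection the authors report as more attuned to interesting anomalies.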