Triage with Random Forests: Machine Learning for Transient Classification

In the coming decade, astronomers plan to discover thousands of rare, poorly understood, exotic transients. However, telling these apart from the torrent of “normal” flashes quickly enough for useful follow-up observations will pose a daunting challenge. Thankfully, new algorithms powered by machine learning techniques may be able to triage for us.

Transient Triage

Though it may appear tranquil and unchanging to the casual observer, the night sky, on careful inspection, is actually crackling with small, slow flashes. Thanks to modern cameras and computers, astronomers have grown increasingly attentive and now catch more of these flashes than ever before. The community flags about 20,000 so-called “transients” each year, and the rate is only expected to grow in the next decade.

A large number of astronomical events, such as thermonuclear and core-collapse supernovae, produce roughly similar-looking flashes, so simply spotting one does not reveal much useful information. To better study the underlying physics powering each transient, astronomers must revisit each with different types of detectors. Unfortunately, the staggering pace of discovery is too fast for astronomers to follow up thoroughly on every transient. Faced with finite telescope time, astronomers need to play a constant game of triage: which transients could uncover something interesting with additional follow-up, and which are just more run-of-the-mill supernovae that we can allow to fade unwatched without worry of missing something exciting? An increasingly promising way to decide is to cede the choice to a machine learning algorithm.

FLEET and the Forest

A 2D scatterplot where each point represents a single archival transient. The X axis marks the probability assigned by FLEET 2.0 of a transient corresponding to a superluminous supernova, and the Y axis marks the same but from FLEET 1.0. True known superluminous supernovae are shown in blue, and in general, they either lie in the upper right corner, or further along the X axis than Y.

A comparison of the old FLEET 1.0 algorithm and the improved FLEET 2.0. Here, both random forests were asked to consider archived observations of many previous transients. FLEET 2.0 generally outperformed its predecessor and uncovered tens of transients that may have actually been superluminous supernovae but went undiagnosed before fading. [Gomez et al. 2023a]

A pair of recent articles in the Astrophysical Journal led by Sebastian Gomez (Space Telescope Science Institute) details the performance and recent upgrades of one such algorithm. Named “Finding Luminous and Exotic Extragalactic Transients,” or FLEET, this random-forest classifier takes in the first few days of observations of a transient and metadata about its host galaxy, then outputs the probability that the transient is a certain type of astronomical event.
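To make the idea of a random-forest classifier concrete, here is a minimal sketch in plain Python. It is not FLEET's code: the four features are hypothetical stand-ins for early light-curve and host-galaxy measurements, the data are synthetic, and the tree depth, forest size, and all function names are assumptions chosen for illustration. Each tree is grown on a bootstrap resample of the training set and considers only a random subset of features at each split; the forest's output probability is the average of the trees' leaf values.

```python
import random

def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow one tree; a leaf is the fraction of rare-class labels it holds."""
    if depth == max_depth or len(set(labels)) == 1:
        return sum(labels) / len(labels)
    n_features = len(rows[0])
    # Only a random subset of features is considered at each split.
    candidates = rng.sample(range(n_features), max(1, int(n_features ** 0.5)))
    best = None
    for f in candidates:
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # no usable split among the candidate features
        return sum(labels) / len(labels)
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       rng, depth + 1, max_depth),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       rng, depth + 1, max_depth))

def tree_predict(node, row):
    """Walk from the root to a leaf and return the leaf's rare-class fraction."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def fit_forest(rows, labels, n_trees=15, seed=0):
    """Train each tree on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng))
    return forest

def forest_probability(forest, row):
    """Average the per-tree leaf fractions into one probability."""
    return sum(tree_predict(tree, row) for tree in forest) / len(forest)

def make_toy_sample(n_rare=30, n_common=90, seed=1):
    """Synthetic data: features 0 and 1 separate the classes, 2 and 3 are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, count in ((1, n_rare), (0, n_common)):
        shift = 3.0 if label == 1 else 0.0
        for _ in range(count):
            rows.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0),
                         rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            labels.append(label)
    return rows, labels

rows, labels = make_toy_sample()
forest = fit_forest(rows, labels)
p_rare_like = forest_probability(forest, [3.0, 3.0, 0.0, 0.0])
p_common_like = forest_probability(forest, [0.0, 0.0, 0.0, 0.0])
print(f"rare-like candidate:   P(rare) = {p_rare_like:.2f}")
print(f"common-like candidate: P(rare) = {p_common_like:.2f}")
```

In a triage setting, a probability like this could be used to rank new candidates so that limited follow-up time goes to the highest-scoring ones first.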

Gomez and collaborators were particularly interested in two types of rare explosions: superluminous supernovae and tidal disruption events. The community has only ever observed a handful of each, though more are likely hiding in the constant stream of transient discoveries. By training FLEET to latch onto subtle differences between transients and quickly identify the underlying event, the team could prioritize follow-up resources to target promising candidates and expand their so-far sparse catalogs.

A histogram with redshift on the X axis and number of TDEs/year on the Y. Below a redshift of 1, LSST should find 10^4 TDEs/year, of which FLEET would recover roughly 1,000.

The expected number of tidal disruption events the Vera C. Rubin Observatory will observe each year during its LSST program, and the number of those that FLEET is expected to confidently flag as candidates worthy of follow-up. Note that astronomers have so far observed fewer than 100 tidal disruption events. [Gomez et al. 2023a]

Since its original release in 2020, FLEET has been responsible for flagging 41% of all recorded superluminous supernovae. Even more exciting than its already impressive track record, however, is its future potential. The team made sure that their algorithm could plug into the data streams of future surveys, like the upcoming Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) and the Roman High Latitude Time Domain Survey, the former of which is expected to observe, but not immediately recognize, up to 10,000 tidal disruption events each year. Using FLEET, the community could extract up to 2,000 of these events each year for further study. Considering that our current understanding of these chaotic processes is built on fewer than 100 observations, this would revolutionize the field in ways we can’t yet predict.
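For a sense of scale, the figures quoted above can be combined in a few lines of arithmetic. This is only a back-of-the-envelope sketch: it takes the upper-end numbers from the paragraph at face value and treats 100 as an upper bound on the tidal disruption events observed so far, so the growth factor it prints is a lower bound for that scenario. The variable names are illustrative.

```python
# Upper-end figures quoted in the text; all are approximate.
lsst_tdes_per_year = 10_000       # TDEs LSST may observe each year
fleet_recovered_per_year = 2_000  # of those, the number FLEET could flag
known_tdes_upper_bound = 100      # "fewer than 100" observed so far

recovered_fraction = fleet_recovered_per_year / lsst_tdes_per_year
yearly_growth_factor = fleet_recovered_per_year / known_tdes_upper_bound

print(f"Fraction of LSST TDEs flagged: {recovered_fraction:.0%}")
print(f"One year of flagged TDEs vs. current sample: at least {yearly_growth_factor:.0f}x")
```

Under these assumptions, a single year of flagged candidates would be at least twenty times larger than the entire sample available today.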

Citations

“Identifying Tidal Disruption Events with an Expansion of the FLEET Machine-learning Algorithm,” Sebastian Gomez et al 2023 ApJ 949 113. doi:10.3847/1538-4357/acc535

“The First Two Years of FLEET: An Active Search for Superluminous Supernovae,” Sebastian Gomez et al 2023 ApJ 949 114. doi:10.3847/1538-4357/acc536