Using Machine Learning to Find Planets


How do we find the signals of exoplanets lurking in the vast quantity of data that comes out of a mission like Kepler or the Transiting Exoplanet Survey Satellite (TESS)? A new study has some suggestions for how best to get computers to do the heavy lifting for us.

Managing a Mess of Data

Two common false positives — grazing eclipsing binaries (left) and background eclipsing binaries (right) — can mimic the signal of a transiting planet. [NASA/Ames Research Center]

Recent years have seen a boom in exoplanet research — in large part due to the enormous data sets produced by transiting exoplanet missions like Kepler and, now, TESS. But the >3,000 confirmed Kepler planets weren’t all just magically apparent in the data! Instead, the discovery of planets is the result of careful classification of transit-like signals amid a sea of false positives from things like stellar eclipses and instrumental noise.

Given the number of light curves that need classifying, we can use any automated help we can get. Enter machine learning, a process by which computers can be trained to identify patterns and make decisions. Using a tool called deep learning, scientists have already shown that machines can do a pretty good job of automatically classifying Kepler transit signals as either exoplanets or false positives. But can we do even better?

Local (left) and global (right) views of the light curves (cyan) and centroids (maroon) for an example confirmed planet (top) and background eclipsing binary (bottom). [Ansdell et al. 2018]

The 2018 NASA Frontier Development Lab provided an excellent opportunity to find out. This eight-week research incubator was aimed at applying cutting-edge machine-learning algorithms to challenges in the space sciences. As part of this lab, two machine-learning experts were paired with two space-science researchers to try to improve machine-learning models for exoplanet transit classification. The results are presented in a new publication led by scientist Megan Ansdell (Center for Integrative Planetary Science, UC Berkeley).

Insider Knowledge

Ansdell and collaborators started with a basic machine-learning model that classified signals based on straightforward local and global views of the light curves. To improve upon it, they added scientific domain knowledge — information or insight that might not be generally known, but can be provided by a domain expert. 
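To make the idea of "local" and "global" views concrete, here is a minimal sketch of one common way to build them: fold the light curve on the candidate's period and median-bin it, once over the full orbit and once in a narrow window around the transit. This is an illustration, not the authors' code. The function name `fold_and_bin`, the bin counts, the window width, and the synthetic light curve are all assumptions made for the example.

```python
import numpy as np

def fold_and_bin(time, flux, period, t0, n_bins, half_width):
    """Phase-fold a light curve on `period` and median-bin it.

    The view spans [-half_width, +half_width] (same units as `time`),
    centred on the transit midpoint `t0`. Empty bins are left as NaN.
    """
    # Phase in [-period/2, period/2), with the transit at phase 0.
    phase = (time - t0 + 0.5 * period) % period - 0.5 * period
    edges = np.linspace(-half_width, half_width, n_bins + 1)
    idx = np.digitize(phase, edges) - 1  # points outside the view get -1 or n_bins
    binned = np.full(n_bins, np.nan)
    for i in range(n_bins):
        in_bin = flux[idx == i]
        if in_bin.size:
            binned[i] = np.median(in_bin)
    return binned

# Synthetic box-shaped transit (all values here are made up for illustration).
rng = np.random.default_rng(0)
time = np.arange(0.0, 90.0, 0.02)                 # days
period, t0, duration, depth = 3.217, 1.0, 0.2, 0.01
true_phase = (time - t0 + 0.5 * period) % period - 0.5 * period
flux = 1.0 - depth * (np.abs(true_phase) < duration / 2)
flux = flux + rng.normal(0.0, 1e-4, time.size)

# Global view: the whole orbit. Local view: a zoom on the transit itself.
global_view = fold_and_bin(time, flux, period, t0, n_bins=201, half_width=period / 2)
local_view = fold_and_bin(time, flux, period, t0, n_bins=51, half_width=2 * duration)
```

The global view lets a classifier see features elsewhere in the orbit (a secondary eclipse, for instance), while the local view preserves the detailed shape of the dip.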

Recall (top; the fraction of true planets recovered) and precision (bottom; the fraction of signals classified as planets that are true planets) of the Exonet model, as a function of MES (multiple event statistic), a measure of the signal-to-noise of candidate transits. [Ansdell et al. 2018]

In particular, the authors used their knowledge of what types of false positives might come up. To help distinguish background eclipsing binary stars from planet transit signals, the team included data with each light curve showing how the centroids (the pixel positions of the center of light) moved over time. To help the model identify false positives like giant-star eclipsing binaries, the authors fed in known stellar parameters with the light curves.
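Why do centroids give away a background eclipsing binary? A toy calculation helps. The sketch below computes a flux-weighted centroid for a tiny pixel image containing a target star and a fainter neighbor, then dims the neighbor as if it were being eclipsed. The image size, pixel positions, fluxes, and helper names are invented for illustration and are not taken from the paper.

```python
import numpy as np

def flux_weighted_centroid(image):
    """Return (row, col) of the flux-weighted centre of light of a 2-D image."""
    image = np.asarray(image, dtype=float)
    rows, cols = np.indices(image.shape)
    total = image.sum()
    return (rows * image).sum() / total, (cols * image).sum() / total

def make_image(target_flux, background_flux):
    """Hypothetical 5x5 cutout: target at pixel (2, 2), neighbour at (2, 4)."""
    image = np.zeros((5, 5))
    image[2, 2] = target_flux
    image[2, 4] = background_flux
    return image

# Out of eclipse: both stars at full brightness.
out_row, out_col = flux_weighted_centroid(make_image(100.0, 20.0))
# In eclipse: the background star loses half its light.
in_row, in_col = flux_weighted_centroid(make_image(100.0, 10.0))

# The centre of light slides toward the target star during the dip.
shift = in_col - out_col
print(f"column centroid: {out_col:.3f} -> {in_col:.3f} (shift {shift:+.3f} pixels)")
```

When the dip comes from a neighboring star rather than the target, the center of light moves in step with the dip, and that correlated motion is a signal a classifier can learn from.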

More Planets to Come

How did Ansdell and collaborators do? Using their modified model, “Exonet”, a computer can classify a Kepler data set with 97.5% accuracy and 98% average precision. That means that 97.5% of its classifications — exoplanet or false-positive — are correct, and an average of 98% of transits classified as planets are true planets. Not bad, for a machine!
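For readers who want to see how such scores are computed, here is a small sketch using the standard definitions of accuracy, precision, and recall at a single decision threshold. The labels are made up, and the function name `classification_metrics` is just for this example. Note that "average precision" is usually computed by averaging precision across many decision thresholds, which this single-threshold sketch does not attempt.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (True/1 = planet).

    Assumes at least one true planet and at least one predicted planet,
    so that neither denominator is zero.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)      # planets correctly called planets
    fp = np.sum(~y_true & y_pred)     # false positives called planets
    fn = np.sum(y_true & ~y_pred)     # planets that were missed
    tn = np.sum(~y_true & ~y_pred)    # false positives correctly rejected
    return {
        "accuracy": (tp + tn) / y_true.size,  # fraction of all calls that are right
        "precision": tp / (tp + fp),          # fraction of "planet" calls that are right
        "recall": tp / (tp + fn),             # fraction of true planets recovered
    }

# Made-up labels for eight candidate signals (1 = planet, 0 = false positive).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```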

One of the added benefits of the authors’ model is that it is ideal for generalization — for example, from Kepler to TESS data. The authors are currently working on a study using Exonet to classify simulated TESS data. And yesterday’s first public data release from TESS has provided plenty of fresh data to work with in the future!


“Scientific Domain Knowledge Improves Exoplanet Transit Classification with Deep Learning,” Megan Ansdell et al 2018 ApJL 869 L7. doi:10.3847/2041-8213/aaf23b