Filling In the Blanks with Machine Learning

The Gaia satellite is making the highest-precision measurements yet of stellar movement. Specifically, it is measuring the positions, motions across the sky, and parallaxes of stars in every direction. This is incredibly useful information to have, but it doesn’t provide a complete picture of a star’s motion. For instance, how do stars move along our line of sight?

A map showing the predicted paths of 40,000 stars over the next 400,000 years. The predicted paths are based on Gaia’s early third data release. The plotted stars are within 100 parsecs of the solar system. [ESA/Gaia/DPAC]

Not On All Dimensions

Currently, Gaia returns “5D astrometry” for most of the stars it observes, which consists of two position coordinates, two proper motions (the components of a star’s motion across the plane of the sky), and a parallax (the apparent shift in a star’s position as the Earth orbits the Sun, which gives its distance). A lot of science has been done with these data already, such as probing the origins of the stellar streams surrounding the Milky Way.
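The two sky-plane motions in 5D astrometry are angular rates, so turning them into a physical speed requires the distance implied by the parallax. The sketch below shows the standard conversion, in which the speed in km/s is 4.74047 times the proper motion in milliarcseconds per year divided by the parallax in milliarcseconds. The function name and inputs are hypothetical, and the simple distance = 1/parallax inversion is an assumption that is only reasonable for stars with small fractional parallax errors.

```python
import math

# Converts (mas/yr) / mas, i.e. au/yr, into km/s.
KM_S_PER_AU_YR = 4.74047


def tangential_velocity(pm_ra_cosdec, pm_dec, parallax):
    """Return (distance in pc, sky-plane speed in km/s).

    pm_ra_cosdec, pm_dec: proper motion components in mas/yr.
    parallax: parallax in mas; must be positive for this naive inversion.
    """
    if parallax <= 0:
        raise ValueError("naive inversion needs a positive parallax")
    distance_pc = 1000.0 / parallax            # 1 mas of parallax <-> 1000 pc
    total_pm = math.hypot(pm_ra_cosdec, pm_dec)  # combine both components
    v_tan = KM_S_PER_AU_YR * total_pm / parallax
    return distance_pc, v_tan


# Illustrative star: 10 mas parallax, proper motion of (60, 80) mas/yr.
distance, speed = tangential_velocity(60.0, 80.0, 10.0)
print(f"distance = {distance:.1f} pc, tangential speed = {speed:.2f} km/s")
```

Nothing in this calculation constrains the motion toward or away from us, which is exactly the missing sixth dimension.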

However, a majority of the astrometry provided by Gaia is restricted to the plane of the sky, which means we’re missing how stars move along our line of sight — that is, whether they’re moving towards or away from us. This puts some fascinating science out of reach, like mapping out subtle stellar and dark matter structures in the Milky Way. Spectroscopic surveys can give us line-of-sight velocities, but they are time-consuming and limited in the volume of space they can cover. Additionally, not many spectroscopic surveys overlap with the region being observed by Gaia.

All’s not lost though! A recent study led by Adriana Dropulic (Princeton University) shows how machine learning could be used to predict line-of-sight velocities for stars with 5D astrometry from Gaia.

The true, predicted, and error-sampled distributions of different stellar velocity components. The error-sampled distributions trace the true distributions more closely than the predicted distributions. [Adapted from Dropulic et al. 2021]

Training a Neural Network

To develop their machine learning technique, Dropulic and collaborators started with a publicly available catalog of mock Gaia data, which included line-of-sight velocities. They also added stars similar to those in Gaia Enceladus, a prominent stellar structure that emerges when velocities are plotted. The mock catalog ended up covering the space within roughly five kiloparsecs (or 16,000 light-years) of the Sun and contained roughly 75 million stars. The catalog was then separated into training, validation, and test sets, with the training set resembling the real Gaia sample that has line-of-sight velocities, or “6D” information.

Dropulic and collaborators then trained a neural network to predict the line-of-sight velocity of a given star and the uncertainty associated with that velocity. An important caveat of this method is that it does not aim to make a near-perfect guess at the line-of-sight velocity for a single star. Rather, the goal is to get a reasonable estimate of the velocity distribution of a whole group of stars.
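The article does not say how the network is taught to report an uncertainty alongside each velocity. One common approach, assumed here purely for illustration, is to treat each prediction as a Gaussian and minimize the negative log-likelihood of the true velocity, which penalizes both a wrong velocity and an overconfident uncertainty. The function name below is hypothetical and the numbers are made up.

```python
import numpy as np


def gaussian_nll(v_true, v_pred, sigma_pred):
    """Mean Gaussian negative log-likelihood over a batch of stars.

    v_true: true line-of-sight velocities (km/s).
    v_pred: predicted velocities (km/s).
    sigma_pred: predicted uncertainties (km/s), must be positive.
    """
    v_true = np.asarray(v_true, dtype=float)
    v_pred = np.asarray(v_pred, dtype=float)
    sigma_pred = np.asarray(sigma_pred, dtype=float)
    if np.any(sigma_pred <= 0):
        raise ValueError("predicted uncertainties must be positive")
    log_term = 0.5 * np.log(2.0 * np.pi * sigma_pred**2)  # cost of wide errors
    chi_term = 0.5 * ((v_true - v_pred) / sigma_pred) ** 2  # cost of misses
    return float(np.mean(log_term + chi_term))


# A prediction that misses by 10 km/s: claiming a 1 km/s uncertainty is
# punished far more heavily than admitting a 10 km/s uncertainty.
overconfident = gaussian_nll([10.0], [0.0], [1.0])
honest = gaussian_nll([10.0], [0.0], [10.0])
print(overconfident > honest)
```

Under a loss like this, a network that cannot pin down a star's velocity is rewarded for saying so, which matches the stated goal of describing populations rather than individual stars.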

Moving On from Mock Data

An advantage of having the network output an uncertainty was that Dropulic and collaborators were able to construct “error-sampled” velocity distributions, which are produced by averaging multiple velocity and uncertainty predictions for a single star and repeating for the entire distribution. These error-sampled velocity distributions ended up being closer to the true distributions than the predicted distributions.
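The article does not spell out the sampling procedure. One plausible reading, assumed in the sketch below, is to draw several velocities for each star from a Gaussian centered on the predicted velocity with a width equal to the predicted uncertainty, then pool the draws from all stars. The toy data are entirely synthetic: each true velocity is split into a part the stand-in "network" can predict and a part it cannot, and all names and numbers are hypothetical.

```python
import numpy as np


def error_sample(v_pred, sigma_pred, n_draws=10, rng=None):
    """Pool n_draws Gaussian draws per star into one velocity sample."""
    rng = np.random.default_rng() if rng is None else rng
    v_pred = np.asarray(v_pred, dtype=float)
    sigma_pred = np.asarray(sigma_pred, dtype=float)
    # Each row is one draw for every star; broadcasting pairs each star's
    # predicted velocity with its own predicted uncertainty.
    draws = rng.normal(loc=v_pred, scale=sigma_pred,
                       size=(n_draws, v_pred.size))
    return draws.ravel()


rng = np.random.default_rng(0)
n_stars = 20_000

# Synthetic population: a predictable component plus unpredictable scatter.
predictable = rng.normal(0.0, 20.0, n_stars)
scatter = rng.normal(0.0, 30.0, n_stars)
v_true = predictable + scatter

# Idealized stand-in for the network: it recovers the predictable part and
# reports the size of the scatter as its uncertainty.
v_pred = predictable
sigma_pred = np.full(n_stars, 30.0)

v_sampled = error_sample(v_pred, sigma_pred, n_draws=10, rng=rng)

print(f"true width:          {v_true.std():.1f} km/s")
print(f"predicted width:     {v_pred.std():.1f} km/s")
print(f"error-sampled width: {v_sampled.std():.1f} km/s")
```

In this toy setup the raw predictions form a distribution that is too narrow, because the best single guess for each star ignores the scatter. Folding the predicted uncertainties back in restores the missing width, which illustrates why an error-sampled distribution can sit closer to the truth.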

The next step in this work is to train the neural network on a real catalog of Gaia data and also explore more distant regions of the Milky Way. The upcoming third Gaia data release will contain roughly 30 million stars with full 6D astrometry, so it won’t be long until this machine learning method can be put into practice!


“Machine Learning the Sixth Dimension: Stellar Radial Velocities from 5D Phase-space Correlations,” Adriana Dropulic et al. 2021 ApJL 915 L14. doi:10.3847/2041-8213/ac09ef