Data: Insights and Challenges in a Time of Abundance

One of the most rapidly evolving elements of astronomy research is how we handle data. With telescopes and computer simulations producing ever vaster quantities of data, how can we process and analyze it all? What tools can we use to turn it into new astronomical discoveries?

The future of astronomy relies on innovation on this front. In a Special Issue of the Astrophysical Journal Supplement Series, 23 papers explore insights and challenges related to astronomical data, presenting new workflows, software instruments, databases, and tutorials that will aid astronomers in generating novel and significant research results.

Here are the broad categories of astronomical data topics touched on in this Special Issue:

Volume renderings from a simulation of a low-metallicity star. This is an example of the data that can be analyzed using cyberhubs, a web-browser-based tool for medium-sized collaborations. [Herwig et al. 2018]

1. Cloud-Based Research Environments for Discovery

Collaborations in astronomy are often large and broadly distributed. As a result, the astronomy community needs infrastructure for accessing large data sets, combining them, and collaboratively processing them to make discoveries. An article by Herwig et al. presents the cyberhubs system, a package that lets medium-sized scientific teams interact collaboratively with data via a web browser. Williams et al. discuss the challenges inherent in reducing a large photometric data set, in their case data from the Panchromatic Hubble Andromeda Treasury (PHAT), on the Amazon Elastic Compute Cloud (EC2), a commercial system of virtual computers that users can rent on demand. Heidorn et al. present Astrolabe, a cyberinfrastructure project of the University of Arizona and the American Astronomical Society that aims to ensure the long-term curation of astronomical data for future reference and use.

2. Software Instruments for Transient Detection, Alerts, and Analysis

Just some of the time-variable sources that are detected and analyzed, and their characteristic timescales for variation. [Narayan et al. 2018]

Given the current boom in time-domain astronomy, the development of tools for studying transient astronomical phenomena is crucial. Necessary tools include not only those that detect transients, but also those that issue alerts for rapid follow-up and those that enable analysis of the large quantities of resulting data. Law et al. discuss realfast, a fast transient search system at the Jansky Very Large Array that will look for transients in real time as data come in, reducing the amount of data that must be stored. Guillochon et al. introduce MOSFiT, a software package that enables rapid comparison of transient data to models. And Narayan et al. present ANTARES, an automated software system that sifts through, characterizes, annotates, and prioritizes transient events for follow-up, allowing the community to be alerted rapidly to transients that warrant additional observations.

In addition to searching for unexpected transient events, time-domain astronomers also study the variability of single sources. He et al. describe a long-term study of magnetic-feature and flare activity of three Sun-like stars with Kepler. As for the Sun itself — studying it in detail produces terabytes-per-hour streams of data that must be captured and analyzed. Denker et al. present the challenges of managing such a stream of high-resolution observations at the GREGOR Solar Telescope, and Boubrahimi et al. explore how best to interpolate between solar data collected from a variety of ground-based and space-based solar observatories every day.

3. Statistical Properties of Data with Uncertainties or Gaps

How do we address the issue of incomplete or uncertain data? Correct application of statistical methods is an important aspect of data reduction. Hogg et al., Vianello, Huppenkothen et al., VanderPlas, Huijse et al., Ma et al., and Aggarwal et al. all present methods for the careful statistical handling of astronomical data, covering topics from an overview of Markov Chain Monte Carlo methods for sampling probability density functions to a look at how we might use statistics to predict solar eruptions.


Blue dots represent the 838 characterized OSSOS discoveries of trans-Neptunian objects from a recent data release. [Bannister et al. 2018]

4. New Database Releases

The production of vast amounts of data isn’t enough; the data must also be compiled in a useful way before the community can analyze it. The regular release of large, updated databases is an important driver of astronomical discovery. In this Special Issue, Bannister et al. present the Outer Solar System Origins Survey (OSSOS), a data release of more than 800 trans-Neptunian objects, and Egeland introduces sunstardb, a database useful for studying stars in analogy to the Sun.

5. Astronomy Data in Publication

The big-data boom raises many important questions in scientific publishing, such as how data will be cited and classified, whether the source code for software instruments will be made available, and what impact these references might have on the future of astronomical publication. Novacescu et al. discuss the policy of data citation, in particular the use of digital object identifiers (DOIs) to refer to data both analyzed and generated by research projects. Frey et al. present an update on the Unified Astronomy Thesaurus, an effort to unite astronomers under a single vocabulary to govern keywords and classification for astronomy research. Allen et al. address the issue of source code availability: can other researchers easily access the software you used, to explore or reproduce your results? Varga examines how metrics based on references or keywords can be used to predict citation impact for scientific articles.

6. Advances in Data Visualization


Screen captures of the WorldWide Telescope user interface. [Rosenfield et al. 2018]

One challenge of astronomy data echoes the challenge inherent in all of science: how can we best communicate and share it? Rosenfield et al. introduce a tool for this, the American Astronomical Society’s WorldWide Telescope (WWT). This project enables terabytes of astronomical images, data, and stories to be viewed and shared among researchers, exhibited in science museums, projected into full-dome immersive planetariums and virtual reality headsets, and taught in classrooms.

The production and management of vast amounts of astronomical data clearly raise many challenges, but they also open up many opportunities. The articles in this Special Issue are meant to provide an introduction to some of the topics currently under consideration, and the conversations will continue to evolve as we adapt to this age of big data.


Special ApJS Issue on Data

Frank Timmes and Leon Golub 2018 ApJS 236 1. doi:10.3847/1538-4365/aab770