Saturday, June 25, 2016

Guest blog post - Weighted Trend Filtering Algorithm and Machine-Learning Template Selection for Time-Series Analysis

This month's exoplanet research blog post comes from a former undergraduate and current colleague of mine, Giri Gopalan. Giri and I have known each other since 2009, when he was a Summer Undergraduate Research Fellow at Caltech. I had already identified a student to work with me that summer, but then Giri showed up at my door. He was so convincing in his confidence, so exuberant in his interest, and so knowledgeable in applied mathematics that I had to take him on as a second summer student, and I'm glad I did. The Kepler telescope hadn't launched yet, and discovering transiting exoplanets with ground-based surveys was de rigueur. The result of our summer working together, which we revisited to finish last year, was recently published in the Publications of the Astronomical Society of the Pacific. Giri took a successful algorithm for filtering ground-based time-series data and explored ways to improve upon it, with mixed results. Giri is on his way to a three-year PhD program at the University of Iceland in the fall of 2016.

###

Photometric time series data have provided a fruitful resource for astronomers in recent times; the curation and analysis of photometric time series have allowed for the detection of transients, perhaps most notably exoplanets. A planet whose orbit carries it across the face of its star along our line of sight produces a characteristic dip, or transit, in the star’s light curve, and by detecting such dips astronomers gain evidence for the existence of an exoplanet.

Unfortunately, photometric time series are subject to noise of two major varieties: white noise and systematic noise. White noise is independent and identically distributed from measurement to measurement, whereas systematics correlate heavily between different stars and time scales (e.g., instrumental measurement errors or seeing conditions that vary on a nightly basis). Such noise complicates the process of detecting transients (e.g., the characteristic depressions or transits from exoplanets), so it is prudent to decontaminate light curves of this noise before analyzing them to look for transients. In practice, systematics tend to dominate noise patterns in comparison to white noise. Our work concerns the implementation and application of a well-known systematic trend filtering methodology by Bakos and Kovacs, the Trend Filtering Algorithm (TFA), and the investigation of ways to improve its performance.
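To make the distinction between the two noise types concrete, here is a minimal simulation sketch in Python with NumPy. Everything in it is an illustrative assumption rather than part of the published analysis: the function name `simulate_light_curves`, the sinusoidal shared trend, and the noise amplitudes are made up for the example.

```python
import numpy as np

def simulate_light_curves(n_stars=5, n_points=500, white_sigma=0.002,
                          systematic_amp=0.01, seed=0):
    """Simulate toy light curves; returns (time, flux), flux is (n_stars, n_points)."""
    rng = np.random.default_rng(seed)
    time = np.linspace(0.0, 10.0, n_points)  # arbitrary units, e.g. days
    # One shared trend standing in for, e.g., seeing that varies night to night
    # (assumed sinusoidal purely for illustration).
    trend = np.sin(2.0 * np.pi * time)
    # Each star responds to the shared trend with its own strength.
    coupling = rng.uniform(0.5, 1.5, size=(n_stars, 1))
    systematics = systematic_amp * coupling * trend
    # White noise: drawn independently for every star and every measurement.
    white = rng.normal(0.0, white_sigma, size=(n_stars, n_points))
    return time, 1.0 + systematics + white

def mean_pairwise_correlation(flux):
    """Average correlation coefficient over all distinct pairs of stars."""
    corr = np.corrcoef(flux)
    off_diagonal = corr[~np.eye(len(corr), dtype=bool)]
    return off_diagonal.mean()

_, with_systematics = simulate_light_curves()
_, white_only = simulate_light_curves(systematic_amp=0.0)
print("with systematics:", mean_pairwise_correlation(with_systematics))
print("white noise only:", mean_pairwise_correlation(white_only))
```

In this toy setup the star-to-star correlation is high when the shared trend is present and close to zero when only white noise remains, which is the property a trend filter relies on.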

The methodology (in the non-reconstructive mode used for our analysis) essentially consists of two major components: the creation of “template light curves,” which are meant to encapsulate typical systematic noise patterns, and the filtering of systematic noise from a particular light curve given this set of templates. TFA assumes that systematic noise is a linear combination of these “basis” vectors, and the residual of the projection of a light curve onto this “basis” is the filtered signal. Hence we wrote MATLAB code that performs this least-squares projection, as well as a weighted least-squares variant (where the weights are the inverse of the measurement uncertainties of the light curve to be filtered, as noted in the thesis of A. Pal), using matrix algebra. An illustration below demonstrates the idea:





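The projection step described above can also be sketched in a few lines of Python with NumPy. This is not the MATLAB code used for the paper; the function name `tfa_filter`, the intercept column that absorbs the mean flux level, and the choice to scale each row by 1/sigma (equivalent to weighting squared residuals by 1/sigma^2) are assumptions made for illustration.

```python
import numpy as np

def tfa_filter(flux, templates, sigma=None):
    """Return the residual of `flux` after projecting onto the templates.

    flux      : shape (n_points,), the light curve to filter
    templates : shape (n_templates, n_points), the template light curves
    sigma     : optional, shape (n_points,), measurement uncertainties
    """
    flux = np.asarray(flux, dtype=float)
    templates = np.asarray(templates, dtype=float)
    # Design matrix: a constant column (assumed here, to absorb the mean level)
    # followed by one column per template.
    design = np.column_stack([np.ones(flux.size), templates.T])
    if sigma is None:
        weights = np.ones(flux.size)  # ordinary least squares
    else:
        # Assumption: scale rows by 1/sigma, so noisier points count for less.
        weights = 1.0 / np.asarray(sigma, dtype=float)
    coeffs, *_ = np.linalg.lstsq(design * weights[:, None], flux * weights,
                                 rcond=None)
    # The filtered signal is whatever the templates cannot explain.
    return flux - design @ coeffs

# Toy usage: a target contaminated by two known trends plus white noise.
rng = np.random.default_rng(1)
n = 300
trend_a = np.sin(np.linspace(0.0, 12.0, n))
trend_b = np.linspace(-1.0, 1.0, n) ** 2
templates = np.vstack([trend_a, trend_b])
sigma = np.full(n, 0.002)
target = 1.0 + 0.01 * trend_a - 0.005 * trend_b + rng.normal(0.0, sigma)
filtered = tfa_filter(target, templates, sigma=sigma)
print("scatter before:", target.std(), "after:", filtered.std())
```

In this toy case the templates are exactly the trends injected into the target, so the filter removes them cleanly. With real data the templates come from other stars in the field, and any true signal that happens to resemble a template is suppressed along with the noise.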
For the template selection procedure, we implemented and tested a version of the hierarchical clustering approach introduced by Kim et al. in 2009.
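As a rough idea of what clustering-based template selection can look like, the sketch below groups light curves by how strongly they correlate and averages each group into one template. It is a generic average-linkage agglomerative clustering written in Python with NumPy. The distance measure (one minus the correlation coefficient), the linkage rule, the fixed number of clusters, and the name `cluster_templates` are all assumptions for illustration, and may differ from both the Kim et al. procedure and the version we tested.

```python
import numpy as np

def cluster_templates(curves, n_clusters):
    """Group light curves by correlation; return (clusters, templates).

    curves     : shape (n_stars, n_points)
    n_clusters : number of groups to stop at (assumed to be chosen by the user)
    """
    curves = np.asarray(curves, dtype=float)
    # Assumed distance: highly correlated curves are "close".
    dist = 1.0 - np.corrcoef(curves)
    clusters = [[i] for i in range(len(curves))]  # start with one star per cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    # One template per cluster: the mean light curve of its members.
    templates = np.array([curves[members].mean(axis=0) for members in clusters])
    return clusters, templates

# Toy usage: three stars share a sinusoidal trend, three share a linear drift.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
trends = [np.sin(2.0 * np.pi * 3.0 * t), t - 0.5]
curves = np.array([trends[i // 3] + rng.normal(0.0, 0.1, t.size)
                   for i in range(6)])
clusters, templates = cluster_templates(curves, n_clusters=2)
print(clusters)
```

The pairwise search is quadratic in the number of clusters at every merge, which is fine for a handful of stars but would call for a library implementation on a full survey field.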

Our overall results were mixed: the analysis of real data (select PTF and 2MASS data) seemed to indicate that overfiltering occurs if a template set is not chosen carefully. On the other hand, simulation studies indicated that the modifications we investigated did not improve exoplanet detection but potentially improved variable detection (“potentially” because the variable detection numbers were very small for all combinations of factors tried in our simulation study).

Nonetheless, it was an extremely gratifying experience to tie up work begun during the summer of 2009 as a Caltech summer undergraduate research fellow. Peter’s guidance and willingness to see this work through were pivotal and have set an example of mentorship that I hope to live up to [Editor's note: Thanks, Giri!]. Moreover, I am very grateful for the feedback I received from the rest of my collaborators. Looking forward, I still think there is much that can be done with this work: first, translating the code into other languages (e.g., R or Python), and second, developing a systematic noise removal methodology that leverages a fully probabilistic (e.g., Bayesian hierarchical) model.