Saturday, October 9, 2021

The Role of Computing in Astronomy

 Cross posted from the MOON newsletter I edit: 

https://sites.google.com/view/georgemasonobservatory/newsletter-archive?authuser=0

BTW, if you want to subscribe to our Observatory Newsletter, sign up here:

https://docs.google.com/forms/d/e/1FAIpQLScbGcQG3o02ihW_ooXt3YmJaxjfPvOc_dbRLq4Rr75yt2FQFg/viewform?usp=sf_link 

I was recently quoted in The Verge, in an article shared widely about file and directory structure mental models, and operating system user data access:


https://www.theverge.com/22684730/students-file-folder-directory-structure-education-gen-z


While the article may have sensationalized the changes in student mental models for file organization, it does highlight how our use of everyday personal computing evolves with time, and how those habits and skills directly impact the education and research that we do in astrophysics. In particular, the article highlights how the pervasive use of “search” has changed how many in our younger generations organize and access data of all kinds, starting with the birth of internet search engines in the late 1990s and continuing over the past two decades with its growing sophistication in operating systems and apps across computers, tablets and phones.

Older generations of computer users have long since internalized the idea of folders and sub-folders, nested to arbitrarily deep levels of file organization.  The concept of files and folders in a hierarchical structure for computing dates back to the ERMA Mark 1 in 1958, and was later used by the Xerox Star in 1981 ( https://en.wikipedia.org/wiki/Directory_(computing) ), and of course back to their real-world paper counterparts before computing.


Interestingly, there is nothing fundamental about organizing data into “directories”; metadata tables and databases, tags, and other types of indexing such as those search engines generate are perfectly valid alternative ways of organizing data.  It’s just that many of us are so used to the concept that we take it as a necessity. At the bit and byte level, operating systems map the data in entirely different ways onto linearly addressable storage devices, be they solid-state drives, the now vanishing technology of spinning magnetic hard drive platters, or the even older and practically extinct floppy and magnetic tape drives.  Even our storage devices have changed dramatically in a human lifetime, so there’s no reason to be surprised that the ways in which we access that data through software will also evolve as computing power advances.
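To make the idea concrete, here is a minimal sketch of a tag-based index built with Python's standard sqlite3 module, in which files are found by what they are rather than by where they live. The file names, tags, and helper functions (build_index, find_by_tags) are all made up for illustration; this is one possible approach, not how any particular operating system does it.

```python
import sqlite3

def build_index(records):
    """Create an in-memory tag index from (filename, tags) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (filename TEXT, tag TEXT)")
    conn.execute("CREATE INDEX idx_tag ON tags (tag)")
    for filename, tags in records:
        conn.executemany(
            "INSERT INTO tags VALUES (?, ?)",
            [(filename, tag) for tag in tags],
        )
    return conn

def find_by_tags(conn, *wanted):
    """Return the sorted filenames that carry every tag in `wanted`."""
    placeholders = ",".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT filename FROM tags WHERE tag IN ({placeholders}) "
        "GROUP BY filename HAVING COUNT(DISTINCT tag) = ? "
        "ORDER BY filename",
        (*wanted, len(wanted)),
    )
    return [row[0] for row in rows]

# Hypothetical files and tags; note that no folder structure is involved.
records = [
    ("m31_r.fits", ["galaxy", "r-band", "2021"]),
    ("m31_spec.fits", ["galaxy", "spectrum", "2021"]),
    ("vega_r.fits", ["star", "r-band", "2020"]),
]
index = build_index(records)
print(find_by_tags(index, "galaxy", "2021"))
print(find_by_tags(index, "r-band"))
```

The same file can show up under any combination of tags, which a single folder hierarchy cannot offer without making copies or links.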


Still, much research software these days, which often finds its way into educational use, relies on the assumption of data organized by files and directories.  Many of our operating systems do too, but anyone who has ever heard of the Windows Registry knows that databases and other forms of data organization are just as prevalent and important for operating systems.


In astronomy, we often store data in “raw” text files, csv files, or for imaging and spectroscopic data, FITS files, a relatively ancient data format ( https://en.wikipedia.org/wiki/FITS) first invented 40 years ago in 1981; at least I am slightly older than the FITS file format!  When the IDL programming language reigned supreme in astronomical data analysis during a bitter battle with the IRAF data analysis tools, it was common for scientists and students to pass around IDL save, or “.sav” files.  Nowadays, as Python’s reign over astronomical computing is in full bloom (with some contenders like Julia, and others with minor or specialized use like R, Matlab, C/C++, Fortran and yes, still IDL), many students and scientists are passing around “npz” files, compressed NumPy archives for storing arrays that are retrieved by key and then indexed with integers. Python has gained such traction and use within astronomy for many reasons, only a few of which are that it is free and thus accessible, and that it is syntactically much simpler, with its abstraction of data types and use of whitespace for nesting.  Astronomy also got heavily invested in Python through such efforts as the astrobetter blog led by Dr Kelle Cruz (https://www.astrobetter.com/ ), as well as the AstroPy set of libraries and utilities, which harkens back to the last generation’s use of IRAF tools and utilities and the IDL astronomy library ( https://idlastro.gsfc.nasa.gov/ ).
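For readers who have not met an npz file, the following is a minimal sketch of writing and reading one with NumPy. The light curve values are made up, and an in-memory buffer stands in for a file on disk (a hypothetical "lightcurve.npz") so the example leaves nothing behind.

```python
import io
import numpy as np

# Made-up light curve values, purely for illustration.
time = np.array([0.0, 1.0, 2.0, 3.0])
flux = np.array([1.00, 0.98, 0.99, 1.01])

# The buffer stands in for a file such as "lightcurve.npz".
buffer = io.BytesIO()
np.savez_compressed(buffer, time=time, flux=flux)

buffer.seek(0)
with np.load(buffer) as archive:
    keys = sorted(archive.files)             # arrays are looked up by name...
    first_flux = float(archive["flux"][0])   # ...and then indexed by integer
    restored_time = archive["time"]

print(keys, first_flux, restored_time)
```

Passing a file name instead of the buffer to np.savez_compressed and np.load gives the on-disk version that people actually share.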


Computing plays a fundamental role in astronomy, from the control of hardware for data collection, to the analysis of that data with advanced Bayesian and frequentist statistical tools, to the implementation of theories in computational models for interpreting that data.  Files and folders are just a small piece of that, and even major archives such as the NASA Exoplanet Archive, SIMBAD, VizieR, NED, MAST, and IRSA increasingly rely on building more and more sophisticated search and data visualization tools for organizing data and giving users efficient access to it.  Imagine if the Gaia data release ( https://www.cosmos.esa.int/web/gaia/earlydr3 ) consisted solely of a set of files and folders, with no way of searching by right ascension and declination, or star name. Search is here to stay, as are the indexed databases that underlie it and organize the data contained therein.
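Searching by position is easy to sketch. The example below is a toy "cone search" in plain Python: it returns every source within a given angular radius of a right ascension and declination. The three-star catalog uses approximate, rounded J2000 coordinates, and the function names are my own; a real archive would use a spatial index over its database rather than the simple loop assumed here.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))

def cone_search(catalog, ra, dec, radius_deg):
    """Return names of sources within radius_deg of (ra, dec), nearest first."""
    hits = []
    for name, src_ra, src_dec in catalog:
        sep = angular_separation_deg(ra, dec, src_ra, src_dec)
        if sep <= radius_deg:
            hits.append((sep, name))
    return [name for sep, name in sorted(hits)]

# (name, RA in degrees, Dec in degrees); approximate, rounded values.
catalog = [
    ("Vega", 279.23, 38.78),
    ("Altair", 297.70, 8.87),
    ("Deneb", 310.36, 45.28),
]

print(cone_search(catalog, 280.0, 39.0, 2.0))
print(cone_search(catalog, 279.23, 38.78, 30.0))
```

The loop checks every source, which is fine for three stars but hopeless for a billion; that gap is exactly what the indexed databases behind the archives are there to close.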


It is interesting to ponder what the future of computing and its role in astronomy will be.  And to figure that out, one need look no further than our current undergraduate and graduate students, and how they organize and access data.  Many organize their data (or not) within the Downloads, Desktop and Documents folders of whatever operating systems they use, and rely increasingly on operating system search, recent file histories and similar tools to access their data (and apps and mail!).  Some use tags, and some use sophisticated networks of nested directories. Regardless, search will clearly play a critical role in the future of computing in astronomy, but to date, searching of files is not at a sophisticated level within programming languages such as Python; if anything, the most basic capabilities resemble the wildcard command line searches of previous generations.  Maybe that is because Python is a programming language built by the previous generation, and many of these sophisticated search tools for data access are themselves built by programmers in languages like Python and other modern OS computing languages.
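As a small illustration of that wildcard style, here is a sketch using Python's standard pathlib module. The directory layout and file names are invented, and are created in a temporary folder so the example is self-contained; find_files is a hypothetical helper of my own.

```python
import tempfile
from pathlib import Path

def find_files(root, pattern="*.fits"):
    """Recursively list files under root whose names match a shell-style wildcard."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob(pattern))

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Invented layout: nested year/month folders holding a few empty files.
    for rel in ("2021/oct/m31_r.fits", "2021/oct/m31_spec.fits",
                "2021/oct/notes.txt", "2020/vega_r.fits"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    all_fits = find_files(root)
    m31_fits = find_files(root, "m31_*.fits")

print(all_fits)
print(m31_fits)
```

The search only knows about file names, so it works only as well as the naming scheme does; nothing in it can answer a question like "which of these are r-band images of galaxies" unless that information was encoded in the name.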


To put my futurist hat on, it’s interesting to imagine a revolution in computing starting with operating systems and the abandonment of files and directories, one in which data is organized in a fundamentally different way, indexed in database-like structures, and accessed in an operating system that is search-centric from the ground up. Whatever the future of computing ends up looking like, whether it involves graphical programming languages, artificial intelligence, or other cutting edge ideas, astronomy will go right along with it, and the next generation of astronomy computing tools will be developed by today’s students.  We’re just getting started on the role of computing in our handheld phones and wearable devices.


This week a student in FOTO, our astronomy club, asked me what the diameter of the Ash telescope dome is. I didn’t know, and suggested we get a measuring tape. Instead, they pulled out their iPhone 13, equipped with LIDAR-like capabilities, and made a decent 3D model of our Observatory with a LIDAR 3D app, on the spot:


 

The answer was 20 feet, as it turned out. It’s crazy to think about how hard generating this kind of 3D model would have been 20 years ago.  And that, in a nutshell, encapsulated for me how our use of technology progresses and how its role in astronomy computing evolves. While I have an iPhone with the Measure app (not as nice as the LIDAR 3D app), I never thought to make the measurement with my phone.  It’s interesting to ponder a day 20-40 years from now or more when Python and AstroPy are considered obsolete tools for astronomy data analysis, and no one uses FITS files, or npz files, or files at all for that matter.

 

 
