Mozilla Science Lab, GitHub and Figshare team up to fix the citation of code in academia
The Mozilla Science Lab, GitHub and Figshare – a repository where academics can upload, share and cite their research materials – are starting to tackle the problem. The trio have developed a system so researchers can easily sync their GitHub releases with a Figshare account. It creates a Digital Object Identifier (DOI) automatically, which can then be referenced and checked by other people.
Discussion of the above article on YCombinator
...it always makes me cringe when privately held companies want to define an "open standard" for scientific citations that (surprise!) relies completely on their proprietary infrastructure. I still remember the case of Mendeley, which promised to build an open repository for research documents, and which is now a subsidiary of Elsevier, an organization that does not really embrace "open science", to put it mildly.
Tool developed at CERN makes software citation easier
Researchers working at CERN have developed a tool that allows source code from the popular software development site GitHub to be preserved and cited through the CERN-hosted online repository Zenodo....
Now, people working on software in GitHub will be able to ensure that their code is not only preserved through Zenodo, but is also provided with a unique digital object identifier (DOI), just like an academic paper.
Webcite
WebCite is an on-demand archiving system for webreferences (cited webpages and websites, or other kinds of Internet-accessible digital objects), which can be used by authors, editors, and publishers of scholarly papers and books, to ensure that cited webmaterial will remain available to readers in the future.
DOIs unambiguously and persistently identify published, trustworthy, citable online scholarly literature. Right?
So DOIs unambiguously and persistently identify published, trustworthy, citable online scholarly literature. Right? Wrong.
The examples above are useful because they help elucidate some misconceptions about the DOI itself, the nature of the DOI registration agencies and, in particular, issues being raised by new RAs and new DOI allocation models.
Thirty-five codes were added to the ASCL in February:
Aladin Lite: Lightweight sky atlas for browsers
ANAigm: Analytic model for attenuation by the intergalactic medium
ARTIST: Adaptable Radiative Transfer Innovations for Submillimeter Telescopes
astroplotlib: Astronomical library of plots
athena: Tree code for second-order correlation functions
BAOlab: Baryon Acoustic Oscillations software
BF_dist: Busy Function fitting
CASSIS: Interactive spectrum analyzer
Commander 2: Bayesian CMB component separation and analysis
CPL: Common Pipeline Library
Darth Fader: Galaxy catalog cleaning method for redshift estimation
DexM: Semi-numerical simulations for very large scales
FAMA: Fast Automatic MOOG Analysis
GalSim: Modular galaxy image simulation toolkit
Glue: Linked data visualizations across multiple files
gyrfalcON: N-body code
HALOFIT: Nonlinear distribution of cosmological mass and galaxies
HydraLens: Gravitational lens model generator
KROME: Chemistry package for astrophysical simulations
libsharp: Library for spherical harmonic transforms
MGHalofit: Modified Gravity extension of Halofit
Munipack: General astronomical image processing software
P2SAD: Particle Phase Space Average Density
PyGFit: Python Galaxy Fitter
PyVO: Python access to the Virtual Observatory
PyWiFeS: Wide Field Spectrograph data reduction pipeline
QUICKCV: Cosmic variance calculator
QuickReduce: Data reduction pipeline for the WIYN One Degree Imager
SPLAT-VO: Spectral Analysis Tool for the Virtual Observatory
SPLAT: Spectral Analysis Tool
TARDIS: Temperature And Radiative Diffusion In Supernovae
UVMULTIFIT: Fitting astronomical radio interferometric data
Vissage: ALMA VO Desktop Viewer
wssa_utils: WSSA 12 micron dust map utilities
XNS: Axisymmetric equilibrium configuration of neutron stars
The ASCL has 779 codes in it now, some of which date back to the 1990s. With the speed at which both the web and code authors (often grad students or post docs) move, links to some code sites are bound to go bad over time. We use a checker regularly to test links to ensure we're not pointing to dead links; when we do find a broken link (defined as one we haven't been able to reach for at least 2 weeks), we look for a new one and, if that doesn't work, email the code author(s) to ask where the code has moved.
We can't always find a good link, and code authors sometimes don't reply to our emails. Currently, eight codes -- 1% of our entries -- have bad links. For half of these, we either cannot find the code author or the code author has not replied to numerous emails.
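The two-week rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not the checker the ASCL actually runs; the function names, the record layout, and the example entries (`code_a`, `code_b`) are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the threshold mirrors the "at least 2 weeks" rule in the text.
BROKEN_AFTER = timedelta(weeks=2)


def is_broken(last_reachable, today, threshold=BROKEN_AFTER):
    """Return True if the link has been unreachable for at least `threshold`."""
    return today - last_reachable >= threshold


def broken_links(last_seen, today):
    """Return the sorted names of entries whose links count as broken."""
    return sorted(name for name, seen in last_seen.items()
                  if is_broken(seen, today))


# Hypothetical records: entry name -> date the link was last reached.
records = {
    "code_a": date(2014, 3, 1),   # reached 4 days before the check
    "code_b": date(2014, 2, 10),  # not reached for 23 days
}

print(broken_links(records, date(2014, 3, 5)))
```

Keeping the date each link was last reached, rather than a simple up/down flag, means a site that is down for a day or two is not reported as broken.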
What else can we do?
I assume that some code authors forget their codes. Having moved on perhaps to another institution and other work, they have neither the time nor the incentive to create a new web home for a code they wrote some years ago. That's understandable, but then the code, a unique solution to a problem, an artifact of astrophysics research, a method used in research, is lost.
We'd like to save the codes (Save the Codes! I may have to put that on glow-in-the-dark pencils); here are a few ideas for authors who no longer want to maintain a site for their codes:
I don't know about option 4, but options 1-3 should take 15 minutes or less. Surely a code is worth that little bit of extra time to make it available to others even if you don't want to be bothered with it anymore.
Please save your code; don't let it go bad!
There are currently 768 codes registered in the ASCL; the percentages of codes hosted on different popular sites are:
GitHub: 4.17%
SourceForge: 3.78%
Code.Google: 1.96%
Bitbucket: 0.52%
That means 11% of codes indexed by the ASCL are hosted on a public site conducive to social programming. That's higher than the 7% from two years ago (by coincidence, almost exactly two years ago) and not unexpected, given the growth of GitHub. Fewer than 1% of ASCL codes were on GitHub two years ago (only 3 at that time -- wow!); now there are 32 hosted on GitHub. For comparison, there were 14 codes on SourceForge two years ago, so while that number has doubled, the growth in use of GitHub is obviously much greater.
Though stored on sites conducive to collaboration, most of these codes are not big collaborations; the majority of codes in the ASCL in these repositories have 4 or fewer authors.
I expect the percentage of codes on such sites to grow as more people use these tools for versioning; I think those who use such tools may also be more open to sharing their codes and advertising them (via links in papers if nothing else), making them easier to find/register in the ASCL, too.
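As a quick check on the figures above, the percentages can be converted back into code counts. This is a sketch that assumes each percentage was computed against the 768 registered codes; the variable names are illustrative.

```python
# Total from the text: 768 codes registered in the ASCL.
TOTAL_CODES = 768

# Percentages as listed in the text.
shares = {
    "GitHub": 4.17,
    "SourceForge": 3.78,
    "Code.Google": 1.96,
    "Bitbucket": 0.52,
}

# Assumption: each share is count / TOTAL_CODES, rounded to two decimals.
counts = {site: round(TOTAL_CODES * pct / 100) for site, pct in shares.items()}
combined = sum(shares.values())

for site, n in counts.items():
    print(f"{site}: {n} codes")
print(f"Combined share of the four listed sites: {combined:.2f}%")
```

The GitHub figure comes out at 32, matching the count given in the text, and SourceForge comes out at about twice the 14 codes of two years earlier. The four listed shares sum to a little under the 11% quoted, so that figure presumably reflects rounding or hosting sites not listed here.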
On Tuesday, January 7, the AAS Working Group on Astronomical Software (WGAS) and the ASCL sponsored a special session on code sharing as a follow-up to the splinter meeting “Astrophysics Code Sharing?” held at AAS 221. We continued the dialogue on ways to improve the transparency and efficiency of research by sharing codes and to mitigate the negative aspects of releasing them.

Before the session started, however, there were a few nerve-wracking moments; weather- and Amtrak-related delays had one of the presenters arriving at AAS at 2:40 AM the day of the session rather than before lunch on Monday, and another getting to AAS after the session had started (!) but before his talk was to begin. So yes! There were minutes to spare!
The standing-room-only session was moderated by Peter Teuben of the University of Maryland, chairman of the ASCL Advisory Committee; Robert Hanisch, STScI, outgoing chair of the WGAS and also a member of the ASCL Advisory Committee, provided closing remarks. Those not in the room were not without news of what was being said in it, as there was much tweeting about the session (#aas223, #astroCodeShare).
Peter started the session by introducing the speakers (present or not) and explaining a bit how the session would work: code case studies would have 2-minute question periods for any clarifications or questions about the cases themselves, and other questions would be deferred until the open discussion period, which was approximately the latter half of the session.
Presentations
A very brief summary of some main points of the sessions, along with their titles, presenters, and links to slides where applicable, is given here.
Discussion
After David's presentation, Peter opened the floor for questions and discussion, and Kelle Cruz from Hunter College was ready! Kelle said that AAS should require code release and then asked whether anyone from the AAS journals was present. There was not.

Kelle then suggested to Daniel Katz that the NSF should take a stronger role in enforcement. Dan said he will see what he can do to get astronomy reviewers training on what to look for, and that he already does this for his area. David Hogg said there aren't any mechanisms for long-term stewardship of software and asked whether the NSF was looking at this. Dan said it is not at this time, and that the NSF generally avoids long-term commitment of funds.
Someone in the back of the room pointed out that protection of code can also lead to the protection of errors, told a sad anecdote to illustrate that point, and commented that code sharing fosters improvements in coding practice. In response to a question about whether it was worthwhile to share very specific code, David answered yes, just post it, that if it's not useful to others, so what? But it just might be! And Benjamin Weiner suggested the code be put in GitHub.
Two questions came from someone else in the back of the room, one on whether export control restrictions (ITAR) would be changing; the second question relayed that PhD students write a code for their thesis but then protect it because, in their perception, the code makes them employable, and did the panel have anything to say about that? Erik Tollerud made the point that people are hired for the skills that went into creating the code, not for a particular code. David replied that he has seen this with data, that proprietary data does sometimes give someone leverage for employment. Dan answered the ITAR question by saying that changes in ITAR were probably not going to happen soon.
Another attendee asked what the cost of making code shareable is and felt that the panelists had swept it under the rug. Ben replied that it's a community problem, the community needs to reward it, and there needs to be a values change. In the meantime, put it out there anyway; clean it up if you can, but put it out. David agreed there are costs, but said the benefits are more substantial: the cost is not very large, and the upside is larger than the downside. Bruce thought it is worth the effort to plan upfront, as that will save time and money later on. This is harder if the code was not initially planned, but one should try to address this when possible.
Nuria Lorente, who was following the session from Australia through Twitter, tweeted that "NOT releasing code also comes at a price, which is often forgotten."
Andrej Prsa from Villanova made a strong appeal to post code to arXiv; he stated that astro-ph should be open to other things besides preprints. Someone else pointed out that arXiv doesn't necessarily agree. David said that he put the documentation for emcee, the MCMC hammer, on arXiv and that gets cited. Erik made the point that additional contributors to a software development project such as Astropy don't get credit if they are not on the author list of the paper uploaded to arXiv. Alberto Accomazzi from ADS mentioned that updating the author list on arXiv was a way to fix that and give others credit, even though that will not be reflected on ADS.
Someone commented on the need for some sort of code sharing infrastructure to help with sharing. David commented that he wants all flowers to bloom, but some flowers are more valuable than others. Erik said that better search engines over time will help, that Astropy is more findable because of better search engines and because more people now link to it. It was mentioned that with more code sharing, finding useful codes may become more difficult as the signal-to-noise ratio goes down.
Alberto Accomazzi brought up the uncertain provenance of code, code that does not have a license, and sometimes no author, attached to it, and stated that it is hard to deal with because it cannot be shared. This was echoed by David, who pointed out that a lack of a license for a code can prevent release. Bruce suggested a licensing workshop would be a good idea, and this idea got traction among attendees. The recent re-licensing of yt was brought up. Dan Katz looks specifically for licensing information when looking at proposals, and it's clear to him that many people don't know what they are doing on this and could use guidance. David suggested that people use BSD or MIT licenses if they know nothing about licensing.
Peter Teuben then brought the discussion to an end and turned the podium over to Robert Hanisch for closing remarks.
Session wrapup
Robert Hanisch reiterated that software sharing is fundamental to the dissemination and validation of research results, and though there are carrots and sticks for software sharing, the sticks are not very strong. He also pointed out that nothing within the funding agencies offers support for software development and that there is a disconnect between national policy and implementation. Journals at best only encourage code release, too; they do not require it. A sociological change is necessary; in the meantime, he hopes those attending will just put codes out there! The benefits outweigh the costs.
He also talked about opportunity for change: as of Sunday, January 5, the Working Group on Astronomical Software has Frossie Economou as its new chair, and over the weekend, the Council of AAS had suggested that the WGAS be elevated from a Working Group to a Division within AAS. He had requested that the Council have the WGAS offer a prize specifically for software, and though the Council did not accept the idea upon presentation, Bob noted that a Division can award prizes independently. Having a Division focused on software will also provide more visibility for it, and on this hopeful note, the session ended.
... though the discussion continues...
My thoughts (just a few)
This is the fourth discussion session the ASCL has arranged; previous sessions include one at AAS 221 and two at the previous two ADASS meetings. (Links to materials or discussion from previous sessions are below.)
I was glad to hear several of the presenters say the concerns people have about releasing their codes are overplayed. I was particularly happy when David said that if people would only go ahead and release their imperfect software, other people would see that released codes are also imperfect and thus feel more emboldened to release their own imperfect work. Yes! Lose the fear, gain the codes! It really doesn't need to be perfect; Nick Barnes, among others, has written eloquently, or amusingly, on this subject already. Astronomical software wants to be free; please release it, let it show!
It was hard for me to stay silent when the need for a code sharing infrastructure was mentioned, not because I disagree with the need -- I believe the need is very great! -- but because the ASCL is trying hard to help with that. I've looked at other similar efforts tried over the years, and either they have started, lived (usually briefly) and in one case, even flowered, and died, or they still exist but are mostly silent and their efforts in code sharing dormant. The ASCL has been around since 1999 and is indexed by ADS, and use of it has been increasing. It's not perfect, but it does work and is actively growing.
I believe that science should be as transparent as possible, that code release (absent ITAR and other truly compelling reasons) even if only for examination, not reuse, is part of this transparency, and that ultimately, code release is better for code authors, especially if the astronomy community works together to make it better for them. Code sharing can make astronomy more efficient, too, which is especially important in the current financial climate.
Finally, I want to thank Peter for moderating the session, Bob for offering closing remarks, and the most excellent Ben, Bruce, Gary, Erik, Dan, and David for presenting at this session and also for not protesting even one time about the innumerable emails they received from me from May on. I also have to thank our wonderful volunteer whose name I did not get, alas, for her great work and for counting the 149 (!) attendees, the AAS for accepting the proposal in the first place, and the amazing people who sent this session literally around the world through their tweets. Thank you!
AAS 221 Astronomy Code Sharing? links
Announcement
Omar Laurino joins Astronomy Code Sharing panel
Brief blog post
Astronomy Computing Today post
Slides used at meeting: Google Doc PDF
ADASS XXIII (2013) links
Announcement
Our eight questions
The eight questions that were discussed/links to discussion notes
Pre-print of proceedings paper
ADASS XXII (2012) links
Birds of a Feather session
Resources used/linked to for ADASS
Pre-print of proceedings paper
Twenty codes were added to the ASCL in July, and eighteen in August.
July:
AstroTaverna: Tool for Scientific Workflows in Astronomy
cosmoxi2d: Two-point galaxy correlation function calculation
CTI Correction Code
DustEM: Dust extinction and emission modelling
ETC++: Advanced Exposure-Time Calculations
FieldInf: Field Inflation exact integration routines
im2shape: Bayesian Galaxy Shape Estimation
ITERA: IDL Tool for Emission-line Ratio Analysis
K3Match: Point matching in 3D space
LENSVIEW: Resolved gravitational lens images modeling
MAH: Minimum Atmospheric Height
Monte Python: Monte Carlo code for CLASS in Python
NEST: Noble Element Simulation Technique
Obit: Radio Astronomy Data Handling
orbfit: Orbit fitting software
phoSim: Photon Simulator
PURIFY: Tools for radio-interferometric imaging
Shapelets: Image Modelling
SIMX: Event simulator
SOPT: Sparse OPTimisation
August:
APPSPACK: Asynchronous Parallel Pattern Search
BASIN: Beowulf Analysis Symbolic INterface
Ceph_code: Cepheid light-curves fitting
ChiantiPy: Python package for the CHIANTI atomic database
CReSyPS: Stellar population synthesis code
CRUSH: Comprehensive Reduction Utility for SHARC-2 (and more...)
GYRE: Stellar oscillation code
JHelioviewer: Visualization software for solar physics data
LensEnt2: Maximum-entropy weak lens reconstruction
LOSSCONE: Capture rates of stars by a supermassive black hole
MapCurvature: Map Projections
MoogStokes: Zeeman polarized radiative transfer
RADLite: Raytracer for infrared line spectra
SMILE: Orbital analysis and Schwarzschild modeling of triaxial stellar systems
SPEX: High-resolution cosmic X-ray spectra analysis
SYN++: Standalone SN spectrum synthesis
SYNAPPS: Forward-modeling of supernova spectroscopy data sets
THELI GUI: Optical, near- & mid-infrared imaging data reduction
Also in August, we added one very cool web resource, the NASA Exoplanet Archive.
Sixteen codes were added to the ASCL in June:
BEHR: Bayesian Estimation of Hardness Ratios
Bessel: Fast Bessel Function Jn(z) Routine for Large n,z
grmonty: Relativistic radiative transport Monte Carlo code
Harmony: Synchrotron Emission Coefficients
LRG DR7 Likelihood Software
MADCOW: Microwave Anisotropy Dataset Computational softWare
MAPPINGS III: Modelling And Prediction in PhotoIonized Nebulae and Gasdynamical Shocks
Pico: Parameters for the Impatient Cosmologist
PROM4: 1D isothermal and isobaric modeler for solar prominences
PROS: Multi-mission X-ray analysis software system
SAC: Sheffield Advanced Code
STF: Structure Finder
Tapir: A web interface for transit/eclipse observability
VHD: Viscous pseudo-Newtonian accretion
Yaxx: Yet another X-ray extractor
I'm behind and am glad my next weekend starts on Thursday and runs for four days; perhaps I'll get caught up! Eight codes added in June are now available and more are coming. Watch
Fifteen codes were added to the ASCL in May:
AdaptaHOP: Subclump finder
ESTER: Evolution STEllaire en Rotation
FITDisk: Cataclysmic Variable Accretion Disk Demonstration Tool
GaussFit: Solving least squares and robust estimation problems
GILDAS: Grenoble Image and Line Data Analysis Software
MapCUMBA: Multi-grid map-making algorithm for CMB experiments
Merger Trees: Formation history of dark matter haloes
Non-Gaussian Realisations
PINOCCHIO: PINpointing Orbit-Crossing Collapsed HIerarchical Objects
PkdGRAV2: Parallel fast-multipole cosmological code
Pressure-Entropy SPH: Pressure-entropy smooth-particle hydrodynamics
pynbody: N-Body/SPH analysis for python
TAU: 1D radiative transfer code for transmission spectroscopy of extrasolar planet atmospheres
TPM: Tree-Particle-Mesh code
YNOGK: Calculating null geodesics in the Kerr spacetime
We also added ExoVis to our web resources and tools page.
ExoVis, the winner of the 2013 Open Exoplanet Catalogue visualization contest, is an exosystem visualizer programmed by Tom Hands, a PhD student at the University of Leicester. It's quite elegant. ExoVis has been added to our list of Web Resources and Tools.
Streams Going Notts: The tidal debris finder comparison project popped up on arXiv recently. This paper, which has been added to our thread for papers of possible interest, discusses testing four codes, S-Tracker, VELOCIraptor (formerly known as the STructure Finder, STF), ROCKSTAR, and HOT6D, to determine how well they find tidal debris in a fully cosmological Milky Way type simulation. The paper compares the algorithms used by the codes and quantifies the findings.