CERN's open source heritage: Building blocks to share
Photo by Jan Huber on Unsplash
The European Organization for Nuclear Research, known as CERN, has long been a driving force in scientific discovery and technological advancement. Beyond its groundbreaking research, CERN has also quietly championed open-source software for decades. But how can CERN's impact on the global open-source community be measured?
"Member states will appreciate that CERN is not only carrying out physics research but also contributing back through open source," says Axel Naumann, Chair of CERN's Open Source Program Office (OSPO). "That said, we've been scratching our heads on how to measure the impact. It's a non-trivial task, and while we lack statistically sound information about our user base, we can look at the code that's been produced for insights."
A treasure hunt
The challenge is to gather as much of that code as possible. With more than 2,200 permanent staff and a constantly changing population of 15,000 to 17,000 researchers, CERN is a nexus of scientific activity.
CERN's community contributes to software projects hosted at CERN, to projects at the universities and institutes that participate in CERN's work, and to projects maintained elsewhere and simply used at CERN. The contributors range from CERN employees to researchers from all around the globe who visit or work with CERN. Because of this complexity, focusing only on CERN-hosted source code would miss many of the most interesting contributions.
The OSPO has engaged a student researcher to explore Software Heritage (SWH), the world's largest open-source software archive. The archive, which houses more than 50 billion software artifacts, each identified by an intrinsic identifier called a SWHID (SoftWare Hash IDentifier), offers unparalleled traceability across the entire software ecosystem. During the 12-month project, the student will be supported by experts from CERN's Scientific Information Service and the OSPO team.
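Part of what makes SWHIDs useful for traceability is that they are intrinsic: the identifier is computed from the artifact's own bytes, so the same file gets the same SWHID wherever it turns up. A minimal sketch for the simplest case, a single file ("content" object), assuming the published SWHID version 1 scheme, in which the hash matches the one Git computes for a blob. The helper `content_swhid` is hypothetical and is not part of the Software Heritage tooling or of the CERN project.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Hypothetical helper: compute a SWHID v1 for a file's raw bytes."""
    # Content SWHIDs reuse Git's blob hashing: SHA-1 over the header
    # "blob <length in bytes>\0" followed by the file's bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    digest = hashlib.sha1(header + data).hexdigest()
    # Prefix: scheme "swh", version 1, object type "cnt" (content).
    return "swh:1:cnt:" + digest


if __name__ == "__main__":
    # The same bytes always yield the same identifier, wherever they are found.
    print(content_swhid(b"hello world\n"))
```

Identifiers for directories, revisions, and releases are built the same way, by hashing a manifest that contains the identifiers of their children. That is what lets a copy or fork of a project be recognized across repositories.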
"This partnership is not only a chance to document CERN's legacy," says Roberto Di Cosmo, Director of Software Heritage, "but also an opportunity to explore how open-source software accelerates scientific discovery and technological development worldwide."
Beyond gathering and consolidating code that has been scattered across various repositories over the years, what does the OSPO team hope to learn? Naumann estimates that only about 10% of the "official" open source code is produced by employees. The remaining 90% comes from visiting researchers who work together, devise solutions, finish projects, publish papers and code, and then move on. Some of these solutions, along with their code, go on to be widely adopted outside CERN, but so far they remain unaccounted for and, essentially, unmapped.
"It's a bit of a treasure hunt," says Naumann, now a Senior Applied Physicist who has been working at CERN for 19 years. "I know of at least one or two of these unmapped projects, but how many others are there? How big is our impact, actually? That's what we're looking for and hoping to find."
This is where Software Heritage comes in. Drawing on its archive and its identifiers (SWHIDs), the project aims to:
- Identify CERN-related projects: Unearth software projects that mention CERN or were developed by CERN-affiliated researchers.
- Track software lineage: Analyze how these projects have evolved over time, including forks, derivatives, and related contributions.
- Measure impact: Quantify the influence of CERN’s open-source software on the global community, its adoption in scientific research, and its broader contributions to technology.
Measuring impact for the future
Beyond CERN, this project also holds broader implications for other institutions and organizations grappling with the challenge of measuring their own open-source contributions.
This collaboration between CERN and Software Heritage is the beginning of a larger quest to map the intersection of science and software. As open-source software becomes an increasingly vital part of technological and scientific progress, understanding its impact is crucial, not only for individual institutions but for society as a whole.
The project goes beyond CERN, Naumann says. "It's a creative solution for a problem that many people, many businesses, and many institutions have: 'How do we measure our open-source impact?' I think we've found a promising lead to answering this question."
CERN will make the project open source, so others can benefit not only from the findings but also from the analytical approach behind them, learning from the team's methods and potentially applying them to their own research.
Stay tuned for updates on the investigation into CERN's open-source contributions.
By Nicole Martinelli, editorial consultant