Title: Software Heritage: a common infrastructure to preserve our Software Commons
Abstract:
Software is at the heart of our digital society and embodies a growing part of
our scientific, technical and organisational knowledge.
Software Heritage is an open non-profit initiative whose mission is to ensure
that this precious body of knowledge will be preserved over time and made
available to all.
We do this for multiple reasons. To preserve the scientific and technological
knowledge embedded in software source code, and contribute to collecting the
history of its development as a human endeavour. To allow better software development
and reuse for society and industry. To foster better science, building the
infrastructure for preserving, sharing and referencing research software, a
stepping stone for reproducibility, and a necessary complement to Open Access.
We do this now, to address the risk of massive loss of source code developed
by the Free and Open Source community, as code hosting sites shut down
when their popularity decreases.
On the one hand, we strive for exhaustiveness, and we have already collected over
10 billion unique source files from over 150 million repositories, and
organised them into a giant Merkle graph, with full deduplication across all
repositories.
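
To illustrate how a Merkle graph yields deduplication across repositories, here is a minimal sketch. It is a toy model, not the archive's actual data model or identifier scheme: the `MerkleStore` class, the `add_repo` helper and the choice of SHA-256 are assumptions made for the example. Each object is stored under the hash of its content, and a directory's content is the list of its children's hashes, so identical files and identical trees are stored only once.

```python
import hashlib


class MerkleStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}  # hex digest -> serialized object

    def _put(self, kind, payload):
        # The key is derived from the content alone, so storing the same
        # content twice always lands on the same key.
        data = kind.encode() + b"\0" + payload
        key = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(key, data)
        return key

    def add_file(self, content):
        return self._put("file", content)

    def add_directory(self, entries):
        # entries maps a name to a child key; sorting makes the id deterministic.
        lines = [f"{name} {key}" for name, key in sorted(entries.items())]
        return self._put("dir", "\n".join(lines).encode())


store = MerkleStore()


def add_repo(files):
    """Store a flat repository given as {file name: bytes}; return its root id."""
    return store.add_directory(
        {name: store.add_file(data) for name, data in files.items()}
    )


original = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
patched = {"README": b"hello\n", "main.c": b"int main(void) { return 1; }\n"}

repo_a = add_repo(original)
repo_b = add_repo(dict(original))  # an exact fork of repo_a
repo_c = add_repo(patched)         # shares README with the other two

print(repo_a == repo_b)    # identical trees get the same root id
print(len(store.objects))  # shared files and trees are stored once
```

In this sketch the fork costs no extra storage, and the patched repository adds only one new file object and one new directory object.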
On the other hand, we care about quality, and we have partnered with UNESCO and
the University of Pisa to develop the SWHAP process for rescuing, curating and
archiving landmark legacy source code.
Now we call on everybody, from individuals to companies, from public to
private entities, to contribute actively to this mission.
Short bio:
After obtaining a PhD in Computer Science at the University of Pisa, Roberto Di Cosmo was associate professor for almost a decade at Ecole Normale Supérieure in Paris, and became a Computer Science full professor at University Paris Diderot in 1999. He is currently on leave at Inria. He has been actively involved in research in theoretical computing, specifically in functional programming, parallel and distributed programming, the semantics of programming languages, type systems, rewriting and linear logic. His main focus is now on the new scientific problems posed by the general adoption of Free Software, with a particular focus on static analysis of large software collections, which was at the core of the European research project Mancoosi. Following the evolution of our society under the impact of IT with great interest, he is a long-term Free Software advocate, contributing to its adoption since 1998 with the best-seller Hijacking the world, seminars, articles and software. He created the Free Software thematic group of Systematic in October 2007, and he has been director of IRILL, a research structure dedicated to Free and Open Source Software quality, since 2010. In 2016, he co-founded Software Heritage, an initiative to build the universal archive of all publicly available source code, which he now directs.