diff --git a/common/modules/ese-research.org b/common/modules/ese-research.org
index 88389d2..cb7009f 100644
--- a/common/modules/ese-research.org
+++ b/common/modules/ese-research.org
@@ -1,100 +1,104 @@
 #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
 #+INCLUDE: "prelude.org" :minlevel 1
 * Software provenance and evolution
 :PROPERTIES:
 :CUSTOM_ID: provenance
 :END:
 ** Software provenance and evolution
 #+BEAMER: \begin{center} \includegraphics[width=0.7\textwidth]{commit-time-distro} \end{center} \vspace{-2mm}
 *** Key findings
 - The number of original commits in public code doubles every ~30 months and has been doing so for 20+ years; original source code files double every ~22 months.
 - It is possible to trace the provenance of source code artifacts at this scale in a compact relational model via the notion of isochrone graphs.
 #+BEAMER: \vspace{-2mm}
 ***
 #+BEGIN_EXPORT latex
 \vspace{-2mm}
 \footnotesize
 \begin{thebibliography}{Foo Bar, 1969}
 \bibitem{Rousseau2020} Rousseau, Di Cosmo, Zacchiroli.\newblock Software Provenance Tracking at the Scale of Public Source Code.\newblock In Empirical Software Engineering, 2020
 \end{thebibliography}
 #+END_EXPORT
 * Software forks
 :PROPERTIES:
 :CUSTOM_ID: forks
 :END:
 ** Software forks
 *** Idea
 - Forks can be detected either via platform metadata (e.g., GitHub keeping track of who clicked "fork" on which repository; the most common approach) or via shared version control system (VCS) history.
 - Thanks to deduplication and platform agnosticism, Software Heritage provides a privileged observation point on the global fork ecosystem in public code.
 *** Research questions
 - What is the right definition of "being a fork"? (methodology)
 - How many forks could we miss by looking only at platform metadata?
 - How many "cross-platform" forks (e.g., GitHub → GitLab) exist in the wild?
 ** Software forks (cont.)
 *** Findings
 - Fork classification: based on platform metadata (“type 1” forks), sharing at least one commit (“type 2”), or sharing a common root directory at some point in VCS history (“type 3”).
 - Up to 16% of forks could be overlooked by considering only GitHub type 1 forks (a potentially significant threat to validity!).
 - Relevant independent development activity can happen on GitLab.com for projects initially just mirrored from GitHub.
 ***
 #+BEGIN_EXPORT latex
 \vspace{-3mm}
 \footnotesize
 \begin{thebibliography}{Foo Bar, 1969}
 \bibitem{Pietri2020} Pietri, Rousseau, Zacchiroli.\newblock Forking Without Clicking: on How to Identify Software Repository Forks.\newblock MSR 2020
 \bibitem{Bhattacharjee2020} Bhattacharjee et al.\newblock An exploratory study to find motives behind cross-platform forks from Software Heritage dataset.\newblock MSR 2020
 \end{thebibliography}
 #+END_EXPORT
 * Diversity, equity, and inclusion
 :PROPERTIES:
 :CUSTOM_ID: diversity
 :END:
 ** Diversity, equity, and inclusion
 *** Idea
 Archived commit metadata contains public information that can be mined to study long-term trends in the diversity, equity, and inclusion (DEI) traits of the global population of public code contributors.
 *** Key findings on the gender gap
 - Male authors contributed 92% of public code commits up to 2019.
 - The ratio of female authors (and their contributions) has grown steadily for 15 years, reaching 10% of yearly contributions for the first time in 2019.
 - The COVID-19 pandemic has reversed the trend.
 ** Diversity, equity, and inclusion (cont.)
 *** Key findings on the geographic gap
 - The early decades of public code were dominated by contributions from North America, followed by a period of alternating dominance between North America and Europe.
 - Since then, geographic diversity has increased steadily, with contributions from Central and South America rising in importance.
 - The trend of increased female contributions is almost worldwide, with the notable exception of specific regions of Asia where it is either slower or flat.
 *** References
 #+BEAMER: \footnotesize
- - Zacchiroli. /Gender differences in public code contributions: a 50-year
-   perspective/. IEEE Software, 2021
- - Rossi and Zacchiroli. /Worldwide gender differences in public code
-   contributions/. ICSE SEIS, 2022
- - Rossi and Zacchiroli. /Geographic diversity in public code
-   contributions/. MSR 2022
+ - Zacchiroli.
+   /Gender differences in public code contributions: a 50-year perspective/.
+   IEEE Software, 2021
+ - Rossi and Zacchiroli.
+   /Worldwide gender differences in public code contributions (and how they have been affected by the COVID-19
+   pandemic)/.
+   ICSE SEIS, 2022
+ - Rossi and Zacchiroli.
+   /Geographic diversity in public code contributions/.
+   MSR 2022