diff --git a/common/images/archive-growth.png b/common/images/archive-growth.png index ea64c43..31ce9b9 120000 --- a/common/images/archive-growth.png +++ b/common/images/archive-growth.png @@ -1 +1 @@ -2020-12-archive-growth.png \ No newline at end of file +2021-01-archive-growth.png \ No newline at end of file diff --git a/common/modules/swh-dig.org b/common/modules/swh-dig.org index da5793a..364a8f9 100644 --- a/common/modules/swh-dig.org +++ b/common/modules/swh-dig.org @@ -1,33 +1,36 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+INCLUDE: "prelude.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" * Deposit Interest Group :PROPERTIES: :CUSTOM_ID: main :END: ** The Source Code Deposit Interest Group (DIG) in a nutshell + :PROPERTIES: + :CUSTOM_ID: overview + :END: *** - Interested in depositing and tracing software source code? + For all those interested in depositing and tracking software source code. 
*** Source code deposit interest group (DIG) :PROPERTIES: :BEAMER_env: block :END: \vspace{-10pt} #+ATTR_LATEX: :environment longtable :align |r|c|c|c|c| |--------------------+-----------+------+-----------+-------| | Benefits / Level | strategic | core | solutions | basic | |--------------------+-----------+------+-----------+-------| | strategic advisory | Y | | | | |--------------------+-----------+------+-----------+-------| | technical advisory | Y | Y | | | |--------------------+-----------+------+-----------+-------| | general assembly | Y | Y | Y | | |--------------------+-----------+------+-----------+-------| | deposit code | Y | Y | Y | Y | |--------------------+-----------+------+-----------+-------| *** Current members \hfill OIN, VMware, DINUM, CNRS, MESRI, University of Paris *** How to join \hfill contact us at mailto:sponsor@softwareheritage.org diff --git a/talks-public/2021-04-07-llw/2021-04-07-llw.org b/talks-public/2021-04-07-llw/2021-04-07-llw.org index 1178e7b..493e23f 100644 --- a/talks-public/2021-04-07-llw/2021-04-07-llw.org +++ b/talks-public/2021-04-07-llw/2021-04-07-llw.org @@ -1,372 +1,376 @@ #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt) #+TITLE: Towards an Open Data and Open Source Code Scanner #+SUBTITLE: for your Open Compliance #+BEAMER_HEADER: \date[7 Apr 2021, LLW]{7 April 2021\\Legal \& Licensing Workshop\\online\\[-2ex]} #+AUTHOR: Stefano Zacchiroli #+DATE: 7 April 2021 #+EMAIL: zack@upsilon.cc #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1 #+INCLUDE: "../../common/modules/169.org" #+BEAMER_HEADER: \institute[Software Heritage]{Software Heritage --- {\tt zack@upsilon.cc, @zacchiro}} #+BEAMER_HEADER: \author{Stefano Zacchiroli} #+LATEX_HEADER_EXTRA: \usepackage{pifont} #+LATEX_HEADER_EXTRA: \usepackage{xspace} #+LATEX_HEADER_EXTRA: \def\OK{\mbox{\ding{51}}\xspace} #+LATEX_HEADER_EXTRA: \def\KO{\mbox{\ding{55}}\xspace} 
+#+LATEX_HEADER: \definecolor{links}{HTML}{2A1B81} +#+LATEX_HEADER: \hypersetup{colorlinks,linkcolor=,urlcolor=links} ** About the speaker #+INCLUDE: "this/zack.org::#bio" :only-contents t * Open Compliance ** Open Compliance # (Open Compliance, /noun/ --- the reason we are gathered here today)\\ # More seriously, here is My own take on a comprehensive definition of our shared interests: #+BEAMER: \begin{definition}[Open Compliance] The *pursuit of compliance* with /license obligations/ and other /best practices/ for the management of open source software components *using only open technology*, such as: _open source_ software, _open data_ information, and _open access_ documentation. #+BEAMER: \end{definition} *** Why - Reduced lock-in risks - Lower total cost of ownership (TCO) - Allow to crowdsource expensive compliance steps (e.g., scanning, curation) - Aligned with the ethos of free/open source software (FOSS) communities *** Long-discussed in FOSS compliance circles. Many well-established collaboration initiatives: Open Source Tooling Group, Open Compliance Program, Double Open, ... ** Reuse is the new rule ... \hfill ... KYSW is coming! *** Reuse is the new rule 80% to 90% of a new application is ... just reuse! \hfill (Sonatype survey, 2017) #+BEAMER: \pause *** Where does reused software come from? :B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .5 :END: #+BEGIN_EXPORT latex \begin{center} \includegraphics[width=.7\linewidth]{myriadsources} \end{center} #+END_EXPORT #+BEAMER: \pause *** Do /you/ know where it comes from? 
:B_block: :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .4 :END: - the software you ship - the software you use - the software you acquire - the software that + has that bug + has that vulnerability *** :B_ignoreheading: :PROPERTIES: :BEAMER_env: ignoreheading :END: #+BEAMER: \pause *** KYSW: Know Your SoftWare \hfill Like KYC in banking, KYSW is now essential all over IT ** Anatomy of a KYSW toolchain #+BEAMER: \begin{center}{\includegraphics[width=0.8\textwidth]{compliance-toolchain}}\end{center} #+BEAMER: {\tiny \vspace{-1mm} source: [[https://upsilon.cc/~zack/talks/2016/2016-01-31-fosdem-compliance.pdf][/A Community Take on the License Compliance Industry/]], Stefano Zacchiroli, FOSDEM 2016, Legal and Policy Issues devroom, https://upsilon.cc/~zack/talks/2016/2016-01-31-fosdem-compliance.pdf #+BEAMER: } *** A *code scanner* is the key ingredient of all KYSW toolchains: it scans a local source code base and compares it to a FOSS knowledge base, summarizing findings. (We will ignore other features for the purpose of this talk.) ** An Open Compliance Source Code Scanner --- Requirements *** Be Open Compliance-...compliant - front-end: open source client, running locally on your code base - back-end: open data knowledge base, remote or self-hosted *** Practical needs - known/unknown information (has this been published before?) - license information - provenance information - scanning granularity: both file-level and snippet-level - knowledge-base coverage: cover all of FOSS #+BEAMER: \pause *** #+BEAMER: \bfseries Claim: we still lack a source code scanning tool that is compliant with Open Compliance principles and addresses industry practical needs. 
* Software Heritage ** Software Heritage, in a nutshell \hfill www.softwareheritage.org #+BEAMER: \transdissolve #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :only-contents t :minlevel 3 ** A principled infrastructure \hfill \url{http://bit.ly/swhpaper} #+latex: \begin{center} #+ATTR_LATEX: :width 0.8\linewidth file:SWH-as-foundation-slim.png #+latex: \end{center} #+BEAMER: \pause *** Technology :PROPERTIES: :BEAMER_col: 0.34 :BEAMER_env: block :END: - transparency and FOSS - replicas all the way down *** Content :PROPERTIES: :BEAMER_col: 0.32 :BEAMER_env: block :END: - intrinsic identifiers - facts and provenance *** Organization :PROPERTIES: :BEAMER_col: 0.33 :BEAMER_env: block :END: - non-profit - mirror network ** An international, non profit initiative\hfill built for the long term :PROPERTIES: :CUSTOM_ID: support :END: *** Sharing the vision :B_block: :PROPERTIES: :CUSTOM_ID: endorsement :BEAMER_COL: .5 :BEAMER_env: block :END: #+LATEX: \begin{center}{\includegraphics[width=\extblockscale{.4\linewidth}]{unesco_logo_en_285}}\end{center} #+LATEX: \vspace{-0.8cm} #+LATEX: \begin{center}\vskip 1em \includegraphics[width=\extblockscale{1.4\linewidth}]{support.pdf}\end{center} #+latex: \small And many more ...\\ #+latex:\mbox{}~~~~~~~\tiny\url{www.softwareheritage.org/support/testimonials} #+BEAMER: \pause *** Donors, members, sponsors :B_block: :PROPERTIES: :CUSTOM_ID: sponsors :BEAMER_COL: .5 :BEAMER_env: block :END: #+LATEX: \begin{center}\includegraphics[width=\extblockscale{.4\linewidth}]{inria-logo-new}\end{center} #+LATEX: \begin{center} # #+LATEX: \includegraphics[width=\extblockscale{.2\linewidth}]{sponsors-levels.pdf} #+LATEX: \colorbox{white}{\includegraphics[width=\extblockscale{1.4\linewidth}]{sponsors.pdf}} #+LATEX: \end{center} # - sponsoring / partnership :: \hfill \url{sponsorship.softwareheritage.org} ** The largest free/open source software archive #+INCLUDE: "../../common/modules/status-extended.org::#archive" 
:only-contents t :minlevel 3 ** Automation, and storage #+BEAMER: \begin{center} #+BEAMER: \mode{\only<1>{\includegraphics[width=\extblockscale{1.1\textwidth}]{swh-dataflow-merkle-listers.pdf}}} #+BEAMER: \only<2-3>{\includegraphics[width=\extblockscale{1.1\textwidth}]{swh-dataflow-merkle.pdf}} #+BEAMER: \end{center} #+BEAMER: \pause #+BEAMER: \pause Full development history *permanently archived* in a *uniform data model*. ** Meet the Software Heritage Identifiers (SWHIDs) \hfill [[https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html][(full spec)]] #+INCLUDE: "../../common/modules/swhid.org::#oneslide" :only-contents t ** "It's +Turtles+ SWHIDs all the way down" *** :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .3 :END: #+BEAMER: \centering \includegraphics[width=\linewidth]{git-merkle/merkle-vertical} *** :PROPERTIES: :BEAMER_env: block :BEAMER_COL: .73 :END: Reference */any/ source code artifact* that has ever been shared---source code file, tree, commit, release, repository state---using the same, standard identifier. #+BEAMER: \end{block} \begin{block}{} Try it out: #+BEAMER: \footnotesize #+BEGIN_SRC $ pip install swh.model $ swh identify /srv/src/linux/kernel/ swh:1:dir:b770a2aed8db52df737f88f18ca6bf39a1582240 #+END_SRC * swh-scanner ** Tech preview: swh-scanner #+BEAMER: \vspace{-1mm} *** Vision swh-scanner is an *open source* and *open data* code scanner for *open compliance* workflows, backed by the *largest archive* of free/open source software source code. #+BEAMER: \pause #+BEAMER: \vspace{-1mm} *** Design - query the Software Heritage archive as source of truth about public code - leverages the Merkle DAG model and SWHIDs for maximum scanning efficiency - e.g., no need to query the back-end for files contained in a known directory - file-level granularity - output: source tree partition into known (= published before) v. 
unknown ** swh-scanner --- Demo #+BEAMER: \scriptsize \vspace{-3mm} #+BEGIN_SRC $ pip install swh.scanner $ swh scanner scan -f json /srv/src/linux/kernel { [...] "/srv/src/linux/kernel/auditsc.c": { "known": true, "swhid": "swh:1:cnt:814406a35db163080bbf937524d63690861ff750" }, "/srv/src/linux/kernel/backtracetest.c": { "known": true, "swhid": "swh:1:cnt:a2a97fa3071b1c7ee6595d61a172f7ccc73ea40b" }, "/srv/src/linux/kernel/bounds.c": { "known": true, "swhid": "swh:1:cnt:9795d75b09b2323306ad6a058a6350a87a251443" }, "/srv/src/linux/kernel/bpf": { "known": true, "swhid": "swh:1:dir:fcd9987804d26274fee1eb6711fac38036ccaee7" }, "/srv/src/linux/kernel/capability.c": { "known": true, "swhid": "swh:1:cnt:1444f3954d750ba685b9423e94522e0243175f90" }, [...] } 0,53s user 0,61s system 145% cpu 1,867 total $ #+END_SRC ** swh-scanner --- Demo (cont.) #+BEAMER: \scriptsize #+BEGIN_SRC $ du -sh --exclude=.git /srv/src/linux 1,1G /srv/src/linux $ time swh scanner scan -f json -x *.git /srv/src/linux { [...] "/srv/src/linux/arch": { "known": true, "swhid": "swh:1:dir:590c329d3548b7d552fc913a51965353f01c9e2f" }, [...] "/srv/src/linux/scripts/kallsyms.c": { "known": true, "swhid": "swh:1:cnt:0096cd9653327584fe62ce56ba158c68875c5067" }, "/srv/src/linux/scripts/kconfig": { "known": false, "swhid": "swh:1:dir:548afc93bd01d2fba0dfcc0fd8c69f4b082ab8c6" }, "/srv/src/linux/scripts/kconfig/.conf.o.cmd": { "known": false, "swhid": "swh:1:cnt:0d8be19e430c082ece6a3803923ad6ecb9e7d413" }, [...] } 20,84s user 1,52s system 103% cpu 21,540 total $ #+END_SRC ** swh-scanner --- Demo (cont.) 
Interactive mode to drill-down and inspect unknown files: #+BEAMER: \footnotesize #+BEGIN_SRC $ swh scanner scan -f sunburst -x *.git /srv/src/linux #+END_SRC #+BEAMER: \begin{center} \includegraphics[width=0.6\linewidth]{swh-scanner-sunburst} \end{center} * Outlook ** swh-scanner --- Checklist *** Open Compliance #+BEAMER: \begin{itemize} #+BEAMER: \item[\OK] front-end: open source client, running locally on your code base #+BEAMER: \item[\OK] back-end: open data knowledge base, remote or self-hosted #+BEAMER: \end{itemize} #+BEAMER: \pause *** Practical needs #+BEAMER: \begin{itemize} #+BEAMER: \item[\OK] known/unknown information (has this been published before?) #+BEAMER: \item[\KO] license information #+BEAMER: \item[\KO] provenance information #+BEAMER: \item[\OK] file-level granularity #+BEAMER: \item[\KO] snippet-level granularity #+BEAMER: \item[\OK] knowledge-base coverage: all of +FOSS+ Software Heritage #+BEAMER: \end{itemize} ** swh-scanner --- Going further *** swh-scanner shows that /it is possible/ to create a source code scanner that is both open source and backed by the most comprehensive open data FOSS archive. #+BEAMER: \pause *** Roadmap swh-scanner is /not a production-ready scanner/. The following features are still missing: - license information \hfill $\to$ in-house scanning + ClearlyDefined - provenance information \hfill $\to$ Software Heritage crawling info - increase granularity to snippet/SLOC Some of these are low-hanging fruits, some require substantial R&D investments. #+BEAMER: \pause *** Feedback welcome - feel free to play with swh-scanner, feedback is very welcome! 
- caveat: intensive use will result in hitting the API rate-limit +** Getting involved + #+INCLUDE: "../../common/modules/swh-dig.org::#overview" :only-contents t :minlevel 3 ** Wrapping up #+BEGIN_EXPORT latex \begin{center} \includegraphics[width=.45\linewidth]{SWH-logo+motto.pdf}\\ \hfill \href{https://www.softwareheritage.org}{www.softwareheritage.org} \hfill \href{https://twitter.com/swheritage}{@swheritage} \hfill~ \end{center} #+END_EXPORT *** - *open compliance* is about FOSS management using /only/ open technology - we still lack a *fully open*---open source, backed by an open data knowledge base---*source code scanner* for open compliance toolchains - *swh-scanner* is a /prototype scanner/ showing that it is possible, today, to develop such a scanner, building on *Software Heritage* as an extensive knowledge base - swh-scanner is not an industry-ready scanner, but might become one; its architecture and components can be *reused elsewhere* #+BEAMER: \vspace{-2mm} *** Contacts [[https://upsilon.cc/~zack/][Stefano Zacchiroli]] / [[mailto:zack@upsilon.cc][zack@upsilon.cc]] / [[https://twitter.com/zacchiro][@zacchiro]] / [[https://mastodon.xyz/@zacchiro][@zacchiro@mastodon.xyz]] * Appendix :B_appendix: :PROPERTIES: :BEAMER_env: appendix :END: ** Complete Corresponding Source (CCS) hosting #+INCLUDE: "../../common/modules/deposit.org::#ccs-compliance" :only-contents t ** Depositing source code to Software Heritage #+INCLUDE: "../../common/modules/deposit.org::#overview" :only-contents t ** Web UI --- Browse the Great Library of Source Code # #+INCLUDE: "../../common/modules/webui.org::#oneslide" :only-contents t #+latex: \centering \vspace{-3mm} #+ATTR_LATEX: :width .9\textwidth file:webui-source-file.png #+latex: \centering \vspace{-2mm} https://archive.softwareheritage.org / ** Web API --- Integrate your tools with the Software Heritage archive #+INCLUDE: "../../common/modules/status-extended.org::#apiintro" :only-contents t