diff --git a/talks-public/2021-04-07-llw/2021-04-07-llw.org b/talks-public/2021-04-07-llw/2021-04-07-llw.org
new file mode 100644
index 0000000..1178e7b
--- /dev/null
+++ b/talks-public/2021-04-07-llw/2021-04-07-llw.org
@@ -0,0 +1,372 @@
+#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
+#+TITLE: Towards an Open Data and Open Source Code Scanner
+#+SUBTITLE: for your Open Compliance
+#+BEAMER_HEADER: \date[7 Apr 2021, LLW]{7 April 2021\\Legal \& Licensing Workshop\\online\\[-2ex]}
+#+AUTHOR: Stefano Zacchiroli
+#+DATE: 7 April 2021
+#+EMAIL: zack@upsilon.cc
+
+#+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
+#+INCLUDE: "../../common/modules/169.org"
+#+BEAMER_HEADER: \institute[Software Heritage]{Software Heritage --- {\tt zack@upsilon.cc, @zacchiro}}
+#+BEAMER_HEADER: \author{Stefano Zacchiroli}
+
+#+LATEX_HEADER_EXTRA: \usepackage{pifont}
+#+LATEX_HEADER_EXTRA: \usepackage{xspace}
+#+LATEX_HEADER_EXTRA: \def\OK{\mbox{\ding{51}}\xspace}
+#+LATEX_HEADER_EXTRA: \def\KO{\mbox{\ding{55}}\xspace}
+
+** About the speaker
+  #+INCLUDE: "this/zack.org::#bio" :only-contents t
+* Open Compliance
+** Open Compliance
+  # (Open Compliance, /noun/ --- the reason we are gathered here today)\\
+  # More seriously, here is
+  My own take on a comprehensive definition of our shared interests:
+  #+BEAMER: \begin{definition}[Open Compliance]
+  The *pursuit of compliance* with /license obligations/ and other /best
+  practices/ for the management of open source software components *using only
+  open technology*, such as: _open source_ software, _open data_ information,
+  and _open access_ documentation.
+  #+BEAMER: \end{definition}
+*** Why
+  - Reduced lock-in risks
+  - Lower total cost of ownership (TCO)
+  - Ability to crowdsource expensive compliance steps (e.g., scanning,
+    curation)
+  - Aligned with the ethos of free/open source software (FOSS) communities
+***
+  Long-discussed in FOSS compliance circles. Many well-established
+  collaboration initiatives: Open Source Tooling Group, Open Compliance
+  Program, Double Open, ...
+
+** Reuse is the new rule ... \hfill ... KYSW is coming!
+*** Reuse is the new rule
+  80% to 90% of a new application is ... just reuse! \hfill (Sonatype survey, 2017)
+  #+BEAMER: \pause
+*** Where does reused software come from? :B_block:
+  :PROPERTIES:
+  :BEAMER_env: block
+  :BEAMER_COL: .5
+  :END:
+  #+BEGIN_EXPORT latex
+  \begin{center}
+  \includegraphics[width=.7\linewidth]{myriadsources}
+  \end{center}
+  #+END_EXPORT
+  #+BEAMER: \pause
+*** Do /you/ know where it comes from? :B_block:
+  :PROPERTIES:
+  :BEAMER_env: block
+  :BEAMER_COL: .4
+  :END:
+  - the software you ship
+  - the software you use
+  - the software you acquire
+  - the software that
+    + has that bug
+    + has that vulnerability
+*** :B_ignoreheading:
+  :PROPERTIES:
+  :BEAMER_env: ignoreheading
+  :END:
+#+BEAMER: \pause
+*** KYSW: Know Your SoftWare
+  \hfill Like KYC in banking, KYSW is now essential all over IT
+
+** Anatomy of a KYSW toolchain
+  #+BEAMER: \begin{center}{\includegraphics[width=0.8\textwidth]{compliance-toolchain}}\end{center}
+  #+BEAMER: {\tiny \vspace{-1mm}
+  source: [[https://upsilon.cc/~zack/talks/2016/2016-01-31-fosdem-compliance.pdf][/A Community Take on the License Compliance Industry/]], Stefano
+  Zacchiroli, FOSDEM 2016, Legal and Policy Issues devroom,
+  https://upsilon.cc/~zack/talks/2016/2016-01-31-fosdem-compliance.pdf
+  #+BEAMER: }
+***
+  A *code scanner* is the key ingredient of all KYSW toolchains: it scans a
+  local source code base and compares it to a FOSS knowledge base,
+  summarizing findings. (We will ignore other features for the purpose of
+  this talk.)
+
+** An Open Compliance Source Code Scanner --- Requirements
+*** Be Open Compliance-...compliant
+  - front-end: open source client, running locally on your code base
+  - back-end: open data knowledge base, remote or self-hosted
+*** Practical needs
+  - known/unknown information (has this been published before?)
+  - license information
+  - provenance information
+  - scanning granularity: both file-level and snippet-level
+  - knowledge-base coverage: cover all of FOSS
+  #+BEAMER: \pause
+***
+  #+BEAMER: \bfseries
+  Claim: we still lack a source code scanning tool that is compliant with
+  Open Compliance principles and addresses the industry's practical needs.
+
+* Software Heritage
+** Software Heritage, in a nutshell \hfill www.softwareheritage.org
+  #+BEAMER: \transdissolve
+  #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :only-contents t :minlevel 3
+
+** A principled infrastructure \hfill \url{http://bit.ly/swhpaper}
+  #+latex: \begin{center}
+  #+ATTR_LATEX: :width 0.8\linewidth
+  file:SWH-as-foundation-slim.png
+  #+latex: \end{center}
+  #+BEAMER: \pause
+*** Technology
+  :PROPERTIES:
+  :BEAMER_col: 0.34
+  :BEAMER_env: block
+  :END:
+  - transparency and FOSS
+  - replicas all the way down
+*** Content
+  :PROPERTIES:
+  :BEAMER_col: 0.32
+  :BEAMER_env: block
+  :END:
+  - intrinsic identifiers
+  - facts and provenance
+*** Organization
+  :PROPERTIES:
+  :BEAMER_col: 0.33
+  :BEAMER_env: block
+  :END:
+  - non-profit
+  - mirror network
+** An international, non-profit initiative\hfill built for the long term
+  :PROPERTIES:
+  :CUSTOM_ID: support
+  :END:
+*** Sharing the vision :B_block:
+  :PROPERTIES:
+  :CUSTOM_ID: endorsement
+  :BEAMER_COL: .5
+  :BEAMER_env: block
+  :END:
+  #+LATEX: \begin{center}{\includegraphics[width=\extblockscale{.4\linewidth}]{unesco_logo_en_285}}\end{center}
+  #+LATEX: \vspace{-0.8cm}
+  #+LATEX: \begin{center}\vskip 1em \includegraphics[width=\extblockscale{1.4\linewidth}]{support.pdf}\end{center}
+  #+latex: \small And many more ...\\
+  #+latex:\mbox{}~~~~~~~\tiny\url{www.softwareheritage.org/support/testimonials}
+#+BEAMER: \pause
+*** Donors, members, sponsors :B_block:
+  :PROPERTIES:
+  :CUSTOM_ID: sponsors
+  :BEAMER_COL: .5
+  :BEAMER_env: block
+  :END:
+  #+LATEX: \begin{center}\includegraphics[width=\extblockscale{.4\linewidth}]{inria-logo-new}\end{center}
+  #+LATEX: \begin{center}
+  # #+LATEX: \includegraphics[width=\extblockscale{.2\linewidth}]{sponsors-levels.pdf}
+  #+LATEX: \colorbox{white}{\includegraphics[width=\extblockscale{1.4\linewidth}]{sponsors.pdf}}
+  #+LATEX: \end{center}
+# - sponsoring / partnership :: \hfill \url{sponsorship.softwareheritage.org}
+** The largest free/open source software archive
+  #+INCLUDE: "../../common/modules/status-extended.org::#archive" :only-contents t :minlevel 3
+** Automation and storage
+  #+BEAMER: \begin{center}
+  #+BEAMER: \mode<beamer>{\only<1>{\includegraphics[width=\extblockscale{1.1\textwidth}]{swh-dataflow-merkle-listers.pdf}}}
+  #+BEAMER: \only<2-3>{\includegraphics[width=\extblockscale{1.1\textwidth}]{swh-dataflow-merkle.pdf}}
+  #+BEAMER: \end{center}
+  #+BEAMER: \pause
+  #+BEAMER: \pause
+  Full development history *permanently archived* in a *uniform data model*.
+** Meet the Software Heritage Identifiers (SWHIDs) \hfill [[https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html][(full spec)]]
+  #+INCLUDE: "../../common/modules/swhid.org::#oneslide" :only-contents t
+** "It's +Turtles+ SWHIDs all the way down"
+***
+  :PROPERTIES:
+  :BEAMER_env: block
+  :BEAMER_COL: .3
+  :END:
+  #+BEAMER: \centering \includegraphics[width=\linewidth]{git-merkle/merkle-vertical}
+***
+  :PROPERTIES:
+  :BEAMER_env: block
+  :BEAMER_COL: .73
+  :END:
+  Reference */any/ source code artifact* that has ever been shared---source
+  code file, tree, commit, release, repository state---using the same,
+  standard identifier.
+  #+BEAMER: \end{block} \begin{block}{}
+  Try it out:
+  #+BEAMER: \footnotesize
+  #+BEGIN_SRC
+  $ pip install swh.model
+  $ swh identify /srv/src/linux/kernel/
+  swh:1:dir:b770a2aed8db52df737f88f18ca6bf39a1582240
+  #+END_SRC
+* swh-scanner
+** Tech preview: swh-scanner
+  #+BEAMER: \vspace{-1mm}
+*** Vision
+  swh-scanner is an *open source* and *open data* code scanner for *open
+  compliance* workflows, backed by the *largest archive* of free/open source
+  software source code.
+
+  #+BEAMER: \pause
+  #+BEAMER: \vspace{-1mm}
+*** Design
+  - query the Software Heritage archive as source of truth about public code
+  - leverage the Merkle DAG model and SWHIDs for maximum scanning efficiency
+    - e.g., no need to query the back-end for files contained in a known
+      directory
+  - file-level granularity
+  - output: source tree partition into known (= published before) v. unknown
+
+** swh-scanner --- Demo
+  #+BEAMER: \scriptsize \vspace{-3mm}
+  #+BEGIN_SRC
+  $ pip install swh.scanner
+
+  $ swh scanner scan -f json /srv/src/linux/kernel
+  {
+    [...]
+    "/srv/src/linux/kernel/auditsc.c": {
+      "known": true,
+      "swhid": "swh:1:cnt:814406a35db163080bbf937524d63690861ff750" },
+    "/srv/src/linux/kernel/backtracetest.c": {
+      "known": true,
+      "swhid": "swh:1:cnt:a2a97fa3071b1c7ee6595d61a172f7ccc73ea40b" },
+    "/srv/src/linux/kernel/bounds.c": {
+      "known": true,
+      "swhid": "swh:1:cnt:9795d75b09b2323306ad6a058a6350a87a251443" },
+    "/srv/src/linux/kernel/bpf": {
+      "known": true,
+      "swhid": "swh:1:dir:fcd9987804d26274fee1eb6711fac38036ccaee7" },
+    "/srv/src/linux/kernel/capability.c": {
+      "known": true,
+      "swhid": "swh:1:cnt:1444f3954d750ba685b9423e94522e0243175f90" },
+    [...]
+  }
+  0,53s user 0,61s system 145% cpu 1,867 total
+  $
+  #+END_SRC
+
+** swh-scanner --- Demo (cont.)
+  #+BEAMER: \scriptsize
+  #+BEGIN_SRC
+  $ du -sh --exclude=.git /srv/src/linux
+  1,1G /srv/src/linux
+
+  $ time swh scanner scan -f json -x *.git /srv/src/linux
+  {
+    [...]
+    "/srv/src/linux/arch": {
+      "known": true,
+      "swhid": "swh:1:dir:590c329d3548b7d552fc913a51965353f01c9e2f" },
+    [...]
+    "/srv/src/linux/scripts/kallsyms.c": {
+      "known": true,
+      "swhid": "swh:1:cnt:0096cd9653327584fe62ce56ba158c68875c5067" },
+    "/srv/src/linux/scripts/kconfig": {
+      "known": false,
+      "swhid": "swh:1:dir:548afc93bd01d2fba0dfcc0fd8c69f4b082ab8c6" },
+    "/srv/src/linux/scripts/kconfig/.conf.o.cmd": {
+      "known": false,
+      "swhid": "swh:1:cnt:0d8be19e430c082ece6a3803923ad6ecb9e7d413" },
+    [...]
+  }
+  20,84s user 1,52s system 103% cpu 21,540 total
+  $
+  #+END_SRC
+
+** swh-scanner --- Demo (cont.)
+  Interactive mode to drill down and inspect unknown files:
+  #+BEAMER: \footnotesize
+  #+BEGIN_SRC
+  $ swh scanner scan -f sunburst -x *.git /srv/src/linux
+  #+END_SRC
+  #+BEAMER: \begin{center} \includegraphics[width=0.6\linewidth]{swh-scanner-sunburst} \end{center}
+
+* Outlook
+** swh-scanner --- Checklist
+*** Open Compliance
+  #+BEAMER: \begin{itemize}
+  #+BEAMER: \item[\OK]
+  front-end: open source client, running locally on your code base
+  #+BEAMER: \item[\OK]
+  back-end: open data knowledge base, remote or self-hosted
+  #+BEAMER: \end{itemize}
+  #+BEAMER: \pause
+*** Practical needs
+  #+BEAMER: \begin{itemize}
+  #+BEAMER: \item[\OK]
+  known/unknown information (has this been published before?)
+  #+BEAMER: \item[\KO]
+  license information
+  #+BEAMER: \item[\KO]
+  provenance information
+  #+BEAMER: \item[\OK]
+  file-level granularity
+  #+BEAMER: \item[\KO]
+  snippet-level granularity
+  #+BEAMER: \item[\OK]
+  knowledge-base coverage: all of +FOSS+ Software Heritage
+  #+BEAMER: \end{itemize}
+
+** swh-scanner --- Going further
+***
+  swh-scanner shows that /it is possible/ to create a source code scanner
+  that is both open source and backed by the most comprehensive open data
+  FOSS archive.
+
+  #+BEAMER: \pause
+*** Roadmap
+  swh-scanner is /not a production-ready scanner/. The following features are
+  still missing:
+  - license information \hfill $\to$ in-house scanning + ClearlyDefined
+  - provenance information \hfill $\to$ Software Heritage crawling info
+  - increase granularity to snippet/SLOC
+  Some of these are low-hanging fruit; others require substantial R&D
+  investment.
+
+  #+BEAMER: \pause
+*** Feedback welcome
+  - feel free to play with swh-scanner; feedback is very welcome!
+  - caveat: intensive use will result in hitting the API rate limit
+
+** Wrapping up
+  #+BEGIN_EXPORT latex
+  \begin{center}
+  \includegraphics[width=.45\linewidth]{SWH-logo+motto.pdf}\\
+  \hfill \href{https://www.softwareheritage.org}{www.softwareheritage.org}
+  \hfill \href{https://twitter.com/swheritage}{@swheritage} \hfill~
+  \end{center}
+  #+END_EXPORT
+***
+  - *open compliance* is about FOSS management using /only/ open technology
+  - we still lack a *fully open*---open source, backed by an open data
+    knowledge base---*source code scanner* for open compliance toolchains
+  - *swh-scanner* is a /prototype scanner/ showing that it is possible,
+    today, to develop such a scanner, building on *Software Heritage* as an
+    extensive knowledge base
+  - swh-scanner is not an industry-ready scanner, but might become one; its
+    architecture and components can be *reused elsewhere*
+  #+BEAMER: \vspace{-2mm}
+*** Contacts
+  [[https://upsilon.cc/~zack/][Stefano Zacchiroli]] / [[mailto:zack@upsilon.cc][zack@upsilon.cc]] / [[https://twitter.com/zacchiro][@zacchiro]] / [[https://mastodon.xyz/@zacchiro][@zacchiro@mastodon.xyz]]
+* Appendix :B_appendix:
+  :PROPERTIES:
+  :BEAMER_env: appendix
+  :END:
+** Complete Corresponding Source (CCS) hosting
+  #+INCLUDE: "../../common/modules/deposit.org::#ccs-compliance" :only-contents t
+
+** Depositing source code to Software Heritage
+  #+INCLUDE: "../../common/modules/deposit.org::#overview" :only-contents t
+
+** Web UI --- Browse the Great Library of Source Code
+  # #+INCLUDE: "../../common/modules/webui.org::#oneslide" :only-contents t
+  #+latex: \centering \vspace{-3mm}
+  #+ATTR_LATEX: :width .9\textwidth
+  file:webui-source-file.png
+
+  #+latex: \centering \vspace{-2mm}
+  https://archive.softwareheritage.org /
+
+** Web API --- Integrate your tools with the Software Heritage archive
+  #+INCLUDE: "../../common/modules/status-extended.org::#apiintro" :only-contents t
+
diff --git a/talks-public/2021-04-07-llw/Makefile b/talks-public/2021-04-07-llw/Makefile
new file mode 100644
index 0000000..68fbee7
--- /dev/null
+++ b/talks-public/2021-04-07-llw/Makefile
@@ -0,0 +1 @@
+include ../Makefile.slides
diff --git a/talks-public/2021-04-07-llw/this/zack.org b/talks-public/2021-04-07-llw/this/zack.org
new file mode 100644
index 0000000..d2e2011
--- /dev/null
+++ b/talks-public/2021-04-07-llw/this/zack.org
@@ -0,0 +1,12 @@
+
+** Short Bio: Stefano Zacchiroli
+  :PROPERTIES:
+  :CUSTOM_ID: bio
+  :END:
+***
+  - Associate Professor of Computer Science, Université de Paris, on leave at
+    Inria
+  - Free/Open Source Software activist (20+ years)
+  - Debian Developer & Former 3x Debian Project Leader
+  - Former Open Source Initiative (OSI) director
+  - Software Heritage co-founder & CTO
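The swh-scanner design described in the slides (compute SWHIDs locally, then stop querying below any directory the archive already knows) can be illustrated with a minimal Python sketch. It is hypothetical and is not swh-scanner's actual code. It assumes that content and directory SWHIDs reuse Git-compatible blob and tree hashing, handles only regular files, symlinks and directories, and replaces the remote Software Heritage back-end with a local `known` set of SWHIDs. The names `Node`, `build` and `scan` are invented for this example; the output dictionary mimics the JSON shape shown in the demo slides.

```python
import hashlib
import os
import stat
from typing import Dict, NamedTuple, Set

DIR_MODE = b"40000"     # Git tree-entry mode for sub-directories
FILE_MODE = b"100644"   # regular, non-executable file
EXEC_MODE = b"100755"   # regular file with the owner-exec bit set
LINK_MODE = b"120000"   # symbolic link


def git_object_sha1(kind: str, body: bytes) -> str:
    """Git object hash: SHA1 over b"<kind> <length>\\0" followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def content_swhid(data: bytes) -> str:
    """Content SWHID: the Git blob hash of the raw bytes."""
    return "swh:1:cnt:" + git_object_sha1("blob", data)


class Node(NamedTuple):
    mode: bytes                   # Git tree-entry mode of this node
    swhid: str                    # swh:1:cnt:... or swh:1:dir:...
    children: Dict[str, "Node"]   # empty for files and symlinks


def build(path: str) -> Node:
    """Hash a local tree bottom-up (regular files, symlinks, directories only)."""
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        # A symlink is hashed as a blob holding its target path.
        return Node(LINK_MODE, content_swhid(os.fsencode(os.readlink(path))), {})
    if stat.S_ISREG(st.st_mode):
        with open(path, "rb") as f:
            data = f.read()
        mode = EXEC_MODE if st.st_mode & stat.S_IXUSR else FILE_MODE
        return Node(mode, content_swhid(data), {})
    if not stat.S_ISDIR(st.st_mode):
        raise ValueError(f"unsupported file type: {path}")

    children = {name: build(os.path.join(path, name)) for name in os.listdir(path)}

    def git_order(name: str) -> bytes:
        # Git sorts tree entries bytewise, comparing directories as "name/".
        key = os.fsencode(name)
        return key + b"/" if children[name].mode == DIR_MODE else key

    body = b""
    for name in sorted(children, key=git_order):
        child = children[name]
        raw_id = bytes.fromhex(child.swhid.rsplit(":", 1)[1])  # 20-byte SHA1
        body += child.mode + b" " + os.fsencode(name) + b"\0" + raw_id
    return Node(DIR_MODE, "swh:1:dir:" + git_object_sha1("tree", body), children)


def scan(root: str, known: Set[str]) -> Dict[str, dict]:
    """Partition a tree into known/unknown nodes, top-down.

    `known` is a local stand-in for the archive back-end (hypothetical).
    Children of a known directory are never looked up: they are implied known.
    """
    result: Dict[str, dict] = {}

    def visit(path: str, node: Node) -> None:
        is_known = node.swhid in known  # one back-end lookup per visited node
        result[path] = {"known": is_known, "swhid": node.swhid}
        if not is_known:
            for name, child in node.children.items():
                visit(os.path.join(path, name), child)

    visit(root, build(root))
    return result
```

With an empty `known` set every node is reported as unknown; adding a directory's SWHID to `known` removes all of its descendants from the output, which is where the savings in back-end queries come from.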