diff --git a/talks-public/2020-swh-team-onboarding/2020-swh-team-onboarding.org b/talks-public/2020-swh-team-onboarding/2020-swh-team-onboarding.org
index e829f8f..9dbfb1a 100644
--- a/talks-public/2020-swh-team-onboarding/2020-swh-team-onboarding.org
+++ b/talks-public/2020-swh-team-onboarding/2020-swh-team-onboarding.org
@@ -1,187 +1,234 @@
 #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
 #+TITLE: Software Heritage
 #+SUBTITLE: Welcome on Board!
 #+BEAMER_HEADER: \date[2020-09-01, Paris]{1 September 2020\\Inria Paris}
 #+AUTHOR: The Software Heritage team
 #+DATE: 1 September 2020
 #+INCLUDE: "../../common/modules/prelude-toc.org" :minlevel 1
 #+INCLUDE: "../../common/modules/169.org"
 # Syntax highlighting setup
 #+LATEX_HEADER_EXTRA: \usepackage{minted}
 #+LaTeX_HEADER_EXTRA: \usemintedstyle{tango}
 #+LaTeX_HEADER_EXTRA: \newminted{sql}{fontsize=\scriptsize}
 #+name: setup-minted
 #+begin_src emacs-lisp :exports results :results silent
 (setq org-latex-listings 'minted)
 (setq org-latex-minted-options '(("fontsize" "\\scriptsize")))
 (setq org-latex-to-pdf-process
       '("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
         "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
         "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"))
 #+end_src
 # End syntax highlighting setup
 * Project overview
 ** Software Heritage in a nutshell \hfill [[https://softwareheritage.org][softwareheritage.org]]
 #+INCLUDE: "../../common/modules/swh-goals-oneslide-vertical.org::#goals" :only-contents t :minlevel 3
 ** An international, non profit initiative\hfill built for the long term
 :PROPERTIES:
 :CUSTOM_ID: support
 :END:
 *** Sharing the vision :B_block:
 :PROPERTIES:
 :CUSTOM_ID: endorsement
 :BEAMER_COL: .5
 :BEAMER_env: block
 :END:
 #+LATEX: \begin{center}{\includegraphics[width=\extblockscale{.4\linewidth}]{unesco_logo_en_285}}\end{center}
 #+LATEX: 
\vspace{-0.8cm}
 #+LATEX: \begin{center}\vskip 1em \includegraphics[width=\extblockscale{1.4\linewidth}]{support.pdf}\end{center}
 #+latex:\mbox{}~~~~~~~\tiny\url{www.softwareheritage.org/support/testimonials}
 *** Donors, members, sponsors :B_block:
 :PROPERTIES:
 :CUSTOM_ID: sponsors
 :BEAMER_COL: .5
 :BEAMER_env: block
 :END:
 #+LATEX: \begin{center}\includegraphics[width=\extblockscale{.4\linewidth}]{inria-logo-new}\end{center}
 #+LATEX: \begin{center}
 #+LATEX: \colorbox{white}{\includegraphics[width=\extblockscale{1.4\linewidth}]{sponsors.pdf}}
 #+latex:\mbox{}~~~~~~~\tiny\url{www.softwareheritage.org/support/sponsors}
 #+LATEX: \end{center}
 ** Status :B_ignoreheading:
 :PROPERTIES:
 :BEAMER_env: ignoreheading
 :END:
 #+INCLUDE: "../../common/modules/status-extended.org::#archivinggoals" :minlevel 2
 #+INCLUDE: "../../common/modules/status-extended.org::#architecture" :minlevel 2 :only-contents t
 #+INCLUDE: "../../common/modules/status-extended.org::#merkletree" :minlevel 2
 #+INCLUDE: "../../common/modules/status-extended.org::#datamodel" :minlevel 2 :only-contents t
 #+INCLUDE: "../../common/modules/status-extended.org::#dagdetailsmall" :minlevel 2 :only-contents t
 #+INCLUDE: "../../common/modules/status-extended.org::#archive" :minlevel 2
 * Software stack
-** to be written
-  TODO (=douardda= in charge of a first draft)
+** Overall architecture
+*** It's just a (big) database
+2 parts:
+- the object storage: swh-objstorage
+  - stores blob objects
+  - content addressable
+  - typically accessed via an HTTP RPC API
+  - multiple backends supported (local FS, S3, Azure, Ceph, ...)
+- the graph storage: swh-storage
+  - stores the Merkle DAG (+ other things)
+  - provides access to the data according to the data model declared in swh.model
+  - typically accessed via an HTTP RPC API
+  - multiple backends supported (PostgreSQL, Cassandra)
+
+** Overall architecture
+*** It's an append-only database
+- we (almost) never modify entries in the main database
+- both storages (obj and graph) are expected to be idempotent
+
+** Overall architecture
+*** With a bit of tooling
+- A frontend interface: swh-web
+  - Provides both the main GUI and a public (REST-like) API
+- A scraping scaffolding
+  - Listers: look for origins to ingest
+  - Loaders: ingest origins
+- An indexing machine
+  - Crunches objects in the archive and generates metadata
+- A metadata storage
+  - Heavily under construction
+  - Important distinction between intrinsic and extrinsic metadata
+
+** Overall architecture
+*** Using a bit of code
+#+BEAMER: \vspace{1mm}
+#+BEAMER: \centering \includegraphics[width=\extblockscale{1.4\linewidth}]{swh-modules-deps-internal}
+
+Actually it's not so big:
+- ~20 kSLOC of Python 3
+- ~80 Python dependencies
+- a bunch of JS
+- ... keep it as simple as possible, but no simpler... 
(almost)
+
+** The big picture
+#+BEAMER: \vspace{1mm}
+#+BEAMER: \centering \includegraphics[height=.9\textheight]{general-architecture}
+https://docs.softwareheritage.org/devel/architecture.html
+
+#+INCLUDE: "../../common/modules/status-extended.org::#swstack" :minlevel 2
 * Development workflow
 ** Starting points
 *** Development documentation
 https://docs.softwareheritage.org/devel/
 - in particular, Developer setup: https://docs.softwareheritage.org/devel/developer-setup.html
   - i.e.: virtualenv + pip + tox
 *** "Software Development" pages on the public wiki
 https://wiki.softwareheritage.org/wiki/Category:Software_development
 (most of these will be covered in the following)
 ** Development forge
 #+BEAMER: \vspace{-2mm}
 *** Phabricator
 https://forge.softwareheritage.org/
 - all development activities happen here
 - take the time to get familiar and become efficient using Phabricator
 #+BEAMER: \vspace{-2mm}
 *** The classics
 - VCS: Git, with repo browsing using Diffusion https://forge.softwareheritage.org/diffusion/
 - Tasks and Bugs: Maniphest https://forge.softwareheritage.org/maniphest/
   - one project tag for each software product, e.g., Git Loader: https://forge.softwareheritage.org/project/view/17/
   - we use task priorities, assignees, and tags (not so much the per-product kanban boards)
   - visibility: all dev tasks are public (they can be made private by moving them to the space "S2: Staff", but it has never happened)
   - you will need one task associated with each planned dev activity
 ** Development forge (cont.)
 *** The classics (cont.)
 - Code review: Differential https://forge.softwareheritage.org/differential/ (more on this later)
 - Communication
   - English
   - day-by-day: in the relevant task on the forge
   - async: swh-devel mailing list https://sympa.inria.fr/sympa/info/swh-devel
   - sync: IRC, #swh-devel channel on FreeNode
   - ad-hoc pokes (in person) or calls (remote), as needed
 ** Code reviews
 code review is not mandatory, but recommended in most cases
 - guidelines https://wiki.softwareheritage.org/wiki/Code_review
 - technical setup: https://wiki.softwareheritage.org/wiki/Code_review_in_Phabricator
 ** QA: linting and testing
 - most of the code we write is Python. Python code is all Python 3
 - code formatting: fully automated via black https://black.readthedocs.io/
 - code linting / static analysis
   - flake8 https://flake8.pycqa.org/ (usually trivial, thanks to black)
   - type checking via mypy http://mypy-lang.org/
     - WIP, type coverage varies depending on the module
     - rule of thumb: always type new code; opportunistically add typing to old code
 - code testing
   - pytest
   - code coverage goal: >= 80% SLOCs of each module
 - do try this +at home+ locally (e.g., before pushing a diff or commit)
 #+BEGIN_EXAMPLE
 $ tox
 #+END_EXAMPLE
 ** Continuous integration
 - Jenkins: https://jenkins.softwareheritage.org/
 - integrated with Phabricator
   1) CI runs on each submitted diff, reporting back results in the diff
   2) CI runs on each landed commit, notifying author in case of failures
   3) CI runs daily on the entire software stack, notifying #swh-devel of failures
 ** Style guidelines
 - Git style guide [[https://wiki.softwareheritage.org/wiki/Git_style_guide][=wiki.softwareheritage.org/wiki/Git_style_guide=]]
 - Python style guide [[https://wiki.softwareheritage.org/wiki/Python_style_guide][=wiki.softwareheritage.org/wiki/Python_style_guide=]]
-  check for adherence to these during code reviews !
+  check for adherence to these during code reviews!
 * Infrastructure
 ** to be written
   TODO (=olasd= in charge of a first draft)
 ** TODO Deployment of SWH Python modules
 Zack: this can go either here or earlier, under development workflow. From the point of view of developers we need to say that =git tag= is enough; from the point of view of the sysadm we can add more details about what happens behind the scenes.
 * AdministraInriatrivia
 ** Onboarding procedure
-*** 
+***
 https://intranet.softwareheritage.org/wiki/Onboarding
-*** 
+***
 - let's walk through it together...
 - a team member will be assigned as your mentor to complete it
 ** Team charter
-*** 
+***
 https://intranet.softwareheritage.org/wiki/Team_charter
 *** Highlights
 - weekly meeting(s)
 - public weekly reporting
 - persistent IRC connection
 ** TODO to be completed
 what else should be added to this section?
 * Appendix :B_appendix:
 :PROPERTIES:
 :BEAMER_env: appendix
 :END:
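The "It's just a (big) database" and "It's an append-only database" slides above describe the object storage as content addressable and idempotent. Those two properties can be sketched as follows, assuming SHA-256 as the object identifier and an in-memory dict as the backend; =ToyObjStorage= and its methods are hypothetical names used for illustration only, not the swh-objstorage API.

#+begin_src python
import hashlib
from typing import Dict


class ToyObjStorage:
    """Toy content-addressable blob store (illustration, not swh-objstorage)."""

    def __init__(self) -> None:
        # Maps hex digest -> blob content; entries are never modified or removed.
        self._blobs: Dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        """Store a blob and return its identifier, derived from the content."""
        obj_id = hashlib.sha256(content).hexdigest()  # assumption: SHA-256 as id
        # setdefault never overwrites, so re-adding the same blob is a no-op.
        self._blobs.setdefault(obj_id, content)
        return obj_id

    def get(self, obj_id: str) -> bytes:
        """Return the blob stored under obj_id (raises KeyError if unknown)."""
        return self._blobs[obj_id]

    def __len__(self) -> int:
        return len(self._blobs)


storage = ToyObjStorage()
first = storage.add(b"hello, archive")
second = storage.add(b"hello, archive")  # same content, added again
assert first == second
assert len(storage) == 1
assert storage.get(first) == b"hello, archive"
#+end_src

In this sketch, adding the same blob twice yields the same identifier and leaves a single stored copy, so a retried write does not change the state of the store.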