#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %10BEAMER_act(Act) %4BEAMER_col(Col) %10BEAMER_extra(Extra) %8BEAMER_opt(Opt)
#+TITLE: Finding a needle in a (source code) haystack
#+AUTHOR: Roberto Di Cosmo
#+DATE: 27 Nov 2015
#+EMAIL: roberto@dicosmo.org
#+DESCRIPTION: Preserving the technological knowledge of mankind
#+KEYWORDS: software heritage legacy preservation knowledge mankind technology
#+BEAMER_HEADER: \title[Finding a needle in a source code haystack]{Finding a needle in a source code haystack\\a challenge\\{\large SeqBio 2015}}
#
# Prelude contains all the information needed to export the main beamer latex source
#
#+INCLUDE: "prelude.org" :minlevel 1
* An old problem
** Code search
*** A natural need
:PROPERTIES:
:BEAMER_act: +-
:END:
- Find the definition of a function/class/procedure/type/structure
- Search for examples of code usage in an archive of source code
- you name it...
*** A natural approach
:PROPERTIES:
:BEAMER_act: +-
:END:
- Regular expressions
*** We have all used /grep/ since the 1970s!
:PROPERTIES:
:BEAMER_act: +-
:END:
\hfill where is the challenge?
* A new challenge
** Size matters!
How do we search in /millions/ of source code files?
*** Google code search
:PROPERTIES:
:BEAMER_act: +-
:END:
opened in 2006, closed in 2011; see https://swtch.com/~rsc/regexp/regexp4.html
*** Debian code search
:PROPERTIES:
:BEAMER_act: +-
:END:
opened in 2013; see http://sources.debian.net/
*** How it works
:PROPERTIES:
:BEAMER_act: +-
:END:
- build an inverted index of the trigrams found in the source files
- translate each regexp into a query over trigrams
- keep only the source files that may match
- run grep on each candidate file (in the cloud)
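The four steps above can be sketched in a few lines of Python. This is a minimal, single-machine illustration under stated assumptions, not the code behind Google or Debian code search. Translating an arbitrary regexp into a trigram query is the hard part (discussed in the article linked above), so the hypothetical `search` function below sidesteps it: the caller passes a `literal` string that every match is assumed to contain. Candidate files are also scanned sequentially rather than in the cloud. All function names are illustrative.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(files):
    """Inverted index: trigram -> set of names of files containing it."""
    index = defaultdict(set)
    for name, content in files.items():
        for t in trigrams(content):
            index[t].add(name)
    return index


def candidates(index, all_files, literal):
    """Files containing every trigram of `literal`.

    ASSUMPTION: `literal` must appear in every match of the regexp; this
    stands in for a real regexp-to-trigram-query translation. The result
    may contain false positives, but never misses a matching file.
    """
    grams = trigrams(literal)
    if not grams:
        # Literal shorter than 3 characters: the index cannot prune anything.
        return set(all_files)
    result = None
    for t in grams:
        postings = index.get(t, set())
        result = postings if result is None else result & postings
    return set(result)  # copy, so the index itself is never mutated


def search(files, index, pattern, literal):
    """Filter files with the index, then run the real regexp on survivors."""
    regex = re.compile(pattern)
    hits = []
    for name in sorted(candidates(index, files, literal)):
        for lineno, line in enumerate(files[name].splitlines(), 1):
            if regex.search(line):
                hits.append((name, lineno, line))
    return hits


if __name__ == "__main__":
    files = {
        "a.c": "int main(void) {\n  return 0;\n}\n",
        "b.py": "def main():\n    pass\n",
        "c.txt": "nothing here\n",
    }
    index = build_index(files)
    print(search(files, index, r"\bmain\s*\(", "main"))
    # [('a.c', 1, 'int main(void) {'), ('b.py', 1, 'def main():')]
```

Here `c.txt` is discarded by the index lookup without ever being scanned. Only the surviving candidates pay the cost of the regexp scan, which is what lets this design scale.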
*** Performance
:PROPERTIES:
:BEAMER_act: +-
:END:
scaled reasonably well up to /1 billion lines of code/
** Size Matters!
*** What about /all the source code/ in the world?
:PROPERTIES:
:BEAMER_act: +-
:END:
#+latex: \begin{center}
#+ATTR_LATEX: :width \extblockscale{.9\linewidth}
file:swh-archive-stats.png
#+latex: \end{center}
- That's roughly /one hundred billion/ single lines of code...
- And it's only the beginning!
* Looking for a new approach
** We need your help
*** Wanna join?
- a post-doc position is open at INRIA in Paris\\
deadline: /7 December 2015/!
- we look forward to future collaborations
*** Contact
\hfill roberto@dicosmo.org\\
\hfill http://www.dicosmo.org
