
converters: Write object_bytes and raw_manifest on revisions and releases
ClosedPublic

Authored by vlorentz on Jan 7 2022, 5:28 PM.

Details

Summary

This allows representing all git objects instead of rejecting
objects that do not fit in our "normal" data model.

This diff is restricted to revisions and releases for now;
a future diff will add directories.
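
As a rough illustration of the idea (not the code in this diff), the sketch below uses only the standard library. It computes a git object id from raw bytes and keeps the original bytes only when re-serializing the parsed object would not reproduce the same id. The helper names `git_object_id` and `manifest_to_keep` are hypothetical, and what exactly gets stored in `raw_manifest` is an assumption here.

```python
import hashlib
from typing import Optional


def git_object_id(object_type: str, body: bytes) -> bytes:
    """Return the SHA-1 that git uses to identify an object.

    Git hashes a header of the form b"<type> <length>\\0" followed by the body.
    """
    header = f"{object_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()


def manifest_to_keep(
    object_type: str, original_body: bytes, reserialized_body: bytes
) -> Optional[bytes]:
    """Hypothetical helper: decide whether the original bytes must be kept.

    If the object parsed into the "normal" data model serializes back to
    bytes with the same id, nothing extra is needed. Otherwise the original
    bytes are returned so they can be recorded alongside the parsed object.
    """
    original_id = git_object_id(object_type, original_body)
    if git_object_id(object_type, reserialized_body) == original_id:
        return None
    return original_body


# The empty blob has a well-known id in git.
print(git_object_id("blob", b"").hex())
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Under this assumption, well-formed objects carry no extra payload, and only the unusual ones keep their original bytes.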

Depends on D6890.

Test Plan

Tests will fail until D6890 is released.

Diff Detail

Repository
rDLDG Git loader
Branch
weird-git
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 25948
Build 40552: Phabricator diff pipeline on jenkins
Build 40551: arc lint + arc unit

Event Timeline

vlorentz published this revision for review. Jan 7 2022, 5:28 PM

Build has FAILED

Patch application report for D6894 (id=25005)

Rebasing onto 0cc96c25ab...

Current branch diff-target is up to date.
Changes applied before test
commit cecc39e23675b249b00d89048ab7ab86bb12a565
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 7 17:27:10 2022 +0100

    converters: Write object_bytes and raw_manifest.
    
    This allows representing all git objects instead of rejecting
    objects that do not fit in our "normal" data model.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/159/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/159/console

Build has FAILED

Patch application report for D6894 (id=25006)

Rebasing onto 0cc96c25ab...

Current branch diff-target is up to date.
Changes applied before test
commit 380504edc486914d5cddec17dab234ddbc55daa8
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 7 17:27:10 2022 +0100

    converters: Write object_bytes and raw_manifest.
    
    This allows representing all git objects instead of rejecting
    objects that do not fit in our "normal" data model.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/160/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/160/console

Should we submit a change to dulwich to get access to, at least, the raw author/committer lines (or even better raw unparsed headers)?

requirements-swh.txt
3

Spurious v

swh/loader/git/converters.py
285

s/Adding/Recording/ ?

Build has FAILED

Patch application report for D6894 (id=25014)

Rebasing onto 0cc96c25ab...

Current branch diff-target is up to date.
Changes applied before test
commit 0ae25b4d7055a369a10e7a8754cccf3c00ab8b22
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 7 17:27:10 2022 +0100

    converters: Write object_bytes and raw_manifest.
    
    This allows representing all git objects instead of rejecting
    objects that do not fit in our "normal" data model.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/161/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/161/console

I don't know; this seems like a very niche feature.

Build has FAILED

Patch application report for D6894 (id=25014)

Rebasing onto 0cc96c25ab...

Current branch diff-target is up to date.
Changes applied before test
commit 0ae25b4d7055a369a10e7a8754cccf3c00ab8b22
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 7 17:27:10 2022 +0100

    converters: Write object_bytes and raw_manifest.
    
    This allows representing all git objects instead of rejecting
    objects that do not fit in our "normal" data model.

Link to build: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/162/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/162/console

Build is green

Patch application report for D6894 (id=25041)

Rebasing onto 4dfd2ed12f...

Current branch diff-target is up to date.
Changes applied before test
commit 96929ff0e50a219b51e183ebcb647be0a4c59239
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 7 17:27:10 2022 +0100

    converters: Write object_bytes and raw_manifest.
    
    This allows representing all git objects instead of rejecting
    objects that do not fit in our "normal" data model.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/171/ for more details.

Actually implement this for commits as well.

Build is green

Patch application report for D6894 (id=25073)

Rebasing onto 64050aa08b...

First, rewinding head to replay your work on top of it...
Applying: converters: Write object_bytes and raw_manifest.
Changes applied before test
commit 85afafeb03e70d8dba097a792467adf21d52c292
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 7 17:27:10 2022 +0100

    converters: Write object_bytes and raw_manifest.
    
    This allows representing all git objects instead of rejecting
    objects that do not fit in our "normal" data model.
    
    This commit is restricted to revisions and releases for now,
    a future commit will add directories.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/173/ for more details.

vlorentz retitled this revision from "converters: Write object_bytes and raw_manifest." to "converters: Write object_bytes and raw_manifest on revisions and releases". Jan 12 2022, 11:56 AM

Build is green

Patch application report for D6894 (id=25082)

Rebasing onto 64050aa08b...

First, rewinding head to replay your work on top of it...
Applying: converters: Write object_bytes and raw_manifest on revisions and releases
Changes applied before test
commit 36ab86c7fc17052b4ad69e50d19f10a326dc8e16
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 7 17:27:10 2022 +0100

    converters: Write object_bytes and raw_manifest on revisions and releases
    
    This allows representing all git objects instead of rejecting
    objects that do not fit in our "normal" data model.
    
    This commit is restricted to revisions and releases for now,
    a future commit will add directories.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/174/ for more details.

This all looks fine to me, thanks!

swh/loader/git/converters.py
142–144

So, we'll be dropping these arguments once we drop them off of storage, correct?

This revision is now accepted and ready to land. Jan 17 2022, 4:22 PM

swh/loader/git/converters.py
142–144

yes

Build is green

Patch application report for D6894 (id=25226)

Rebasing onto 64050aa08b...

Current branch diff-target is up to date.
Changes applied before test
commit 44cc7d083327449594e93e8ad2d01abb6982be82
Author: Valentin Lorentz <vlorentz@softwareheritage.org>
Date:   Fri Jan 7 17:27:10 2022 +0100

    converters: Write object_bytes and raw_manifest on revisions and releases
    
    This allows representing all git objects instead of rejecting
    objects that do not fit in our "normal" data model.
    
    This commit is restricted to revisions and releases for now,
    a future commit will add directories.

See https://jenkins.softwareheritage.org/job/DLDG/job/tests-on-diff/180/ for more details.