docs: remove PostgreSQL local setup
Closed · Public

Authored by seirl on Apr 26 2022, 4:08 PM.

Details

Reviewers
zack
Group Reviewers
Reviewers
Summary

We no longer support exporting the dataset as PostgreSQL dumps. Such
dumps are of little use for big data analysis; we instead encourage
researchers to use big data engines (Hadoop, Hive, Presto...), which all
support the ORC format.

Alternatives for importing the dataset into PostgreSQL include the
orc_fdw foreign data wrapper (https://github.com/HighgoSoftware/orc_fdw/)
or the swh mirroring pipeline, which can spawn a local storage instance
with a PostgreSQL backend.

Diff Detail

Repository
rDDATASET Datasets
Branch
master
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 28840
Build 45070: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 45069: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D7686 (id=27781)

Rebasing onto 230829c69b...

First, rewinding head to replay your work on top of it...
Applying: docs: remove PostgreSQL local setup
Changes applied before test
commit 40c61745d2647f8a77fe946c274f7c2b7b48ad9a
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Tue Apr 26 16:02:46 2022 +0200

    docs: remove PostgreSQL local setup
    
    We no longer support exporting the dataset as PostgreSQL dumps. It's
    pretty much useless for big data analysis; we rather encourage
    researchers to use big data engine (Hadoop, Hive, Presto...) which all
    support the ORC format.
    
    Alternatives to import the dataset on PostgreSQL include
    https://github.com/HighgoSoftware/orc_fdw/ , or using the swh mirroring
    pipeline to spawn a local storage instance with a PostgreSQL backend.

See https://jenkins.softwareheritage.org/job/DDATASET/job/tests-on-diff/137/ for more details.

seirl requested review of this revision. Apr 26 2022, 4:10 PM
This revision is now accepted and ready to land. Apr 26 2022, 4:36 PM