Software Heritage

Oct 22 2018

vlorentz added a comment to T1237: Update mappings to CodeMeta crosswalk table.

Do you mind if I claim this task?

Oct 22 2018, 1:54 PM · Metadata workflow, Indexer
vlorentz closed T1231: Create an origin indexer that lists the most recent revision in HEAD branch as Resolved.

Done in D490.

Oct 22 2018, 12:39 PM · Indexer, Metadata workflow
vlorentz closed T1231: Create an origin indexer that lists the most recent revision in HEAD branch, a subtask of T1228: Create a component that sends a list of revision sha1 to the metadataRevisionIndexer, as Resolved.
Oct 22 2018, 12:39 PM · Indexer, Metadata workflow

Oct 19 2018

ardumont edited subtasks for T359: Indexers: batch content analyzer infrastructure, added: T1227: General improvements of the indexer: Schedule indexer tasks; removed: T991: Indexers: Send range of ids instead of list of ids.
Oct 19 2018, 8:47 AM · Indexer, General
ardumont added a parent task for T1227: General improvements of the indexer: Schedule indexer tasks: T359: Indexers: batch content analyzer infrastructure.
Oct 19 2018, 8:47 AM · Indexer, Scheduling utilities
ardumont removed a parent task for T991: Indexers: Send range of ids instead of list of ids: T359: Indexers: batch content analyzer infrastructure.
Oct 19 2018, 8:47 AM · Indexer
ardumont renamed T359: Indexers: batch content analyzer infrastructure from Indexers: batch blob analyzer infrastructure to Indexers: batch content analyzer infrastructure.
Oct 19 2018, 8:46 AM · Indexer, General
ardumont added a parent task for T991: Indexers: Send range of ids instead of list of ids: T359: Indexers: batch content analyzer infrastructure.
Oct 19 2018, 8:44 AM · Indexer

Oct 18 2018

zack edited projects for T439: Indexers: compute (and maintain up-to-date) the filetype of all blobs, added: Indexer; removed: Developers.
Oct 18 2018, 9:17 PM · Indexer
olasd removed a project from T1277: swh-journal: Create a journal client for listing origin visits: Developers.
Oct 18 2018, 4:51 PM · Indexer, Journal
ardumont triaged T1277: swh-journal: Create a journal client for listing origin visits as Normal priority.
Oct 18 2018, 4:03 PM · Indexer, Journal

Oct 15 2018

vlorentz closed T1256: Add pipeline/workflow capabilities to the indexer orchestrator, a subtask of T1228: Create a component that sends a list of revision sha1 to the metadataRevisionIndexer, as Wontfix.
Oct 15 2018, 6:19 PM · Indexer, Metadata workflow
vlorentz closed T1256: Add pipeline/workflow capabilities to the indexer orchestrator as Wontfix.

Actually, this won't be needed: we can use BaseIndexer.next_step to schedule new Celery tasks.

Oct 15 2018, 6:19 PM · Metadata workflow, Indexer
moranegg added a comment to T834: deploy revision_indexer (at least for codemeta.json).

The RevisionMetadataIndexer will be deployed together with the OriginMetadataIndexer once the workflow is ready; this will be tracked in a new, separate task.

Oct 15 2018, 12:16 PM · Indexer, Metadata workflow
moranegg closed T834: deploy revision_indexer (at least for codemeta.json) as Wontfix.
Oct 15 2018, 12:15 PM · Indexer, Metadata workflow

Oct 12 2018

zack edited Description on Indexer.
Oct 12 2018, 4:02 PM

Oct 11 2018

ardumont created P312 D505/507 test errors.
Oct 11 2018, 10:46 AM · Core & foundations, Indexer

Oct 10 2018

vlorentz added a revision to T1256: Add pipeline/workflow capabilities to the indexer orchestrator: D508: Add tests for the orchestrator.
Oct 10 2018, 3:01 PM · Metadata workflow, Indexer
vlorentz triaged T1256: Add pipeline/workflow capabilities to the indexer orchestrator as Normal priority.
Oct 10 2018, 10:54 AM · Metadata workflow, Indexer

Oct 9 2018

vlorentz added a revision to T1231: Create an origin indexer that lists the most recent revision in HEAD branch: D490: Add OriginIndexer + OriginHeadIndexer.
Oct 9 2018, 9:32 AM · Indexer, Metadata workflow
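Finding "the most recent revision in HEAD branch" (T1231) essentially amounts to following an origin's HEAD branch, through any aliases, to the revision it targets. A minimal Python sketch follows; it is not the D490 code. It assumes a snapshot represented as a plain dict mapping branch names to `{"target_type": ..., "target": ...}` entries, loosely modeled on Software Heritage snapshots, and the `head_revision` function is hypothetical.

```python
from typing import Dict, Optional

# Assumed (hypothetical) layout: branch name -> {"target_type": ..., "target": ...},
# or None for a dangling branch.
Branches = Dict[bytes, Optional[dict]]


def head_revision(branches: Branches, max_depth: int = 10) -> Optional[bytes]:
    """Follow HEAD through alias branches and return the revision id it targets.

    Returns None if HEAD is missing or dangling, if it ends on a non-revision
    object, or if the alias chain is longer than max_depth (e.g. a cycle).
    """
    name = b"HEAD"
    for _ in range(max_depth):
        branch = branches.get(name)
        if branch is None:  # missing or dangling branch
            return None
        if branch["target_type"] == "alias":
            name = branch["target"]  # e.g. HEAD -> refs/heads/master
            continue
        if branch["target_type"] == "revision":
            return branch["target"]
        return None  # releases, contents, etc. are not handled in this sketch
    return None
```

Bounding the number of alias hops keeps a misconfigured HEAD that aliases itself from looping forever.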

Oct 8 2018

vlorentz claimed T1228: Create a component that sends a list of revision sha1 to the metadataRevisionIndexer.
Oct 8 2018, 3:34 PM · Indexer, Metadata workflow

Oct 4 2018

ardumont created P308 indexer: make test ko.
Oct 4 2018, 4:19 PM · Indexer
moranegg closed T1230: Indexers: Improve readme to be more explicit on how to run locally as Resolved by committing rDCIDX68593287782c: docs: refix README path.
Oct 4 2018, 3:58 PM · Indexer, Scheduling utilities
moranegg closed T1230: Indexers: Improve readme to be more explicit on how to run locally, a subtask of T1227: General improvements of the indexer: Schedule indexer tasks, as Resolved.
Oct 4 2018, 3:58 PM · Indexer, Scheduling utilities
moranegg claimed T1230: Indexers: Improve readme to be more explicit on how to run locally.
Oct 4 2018, 12:11 PM · Indexer, Scheduling utilities

Oct 3 2018

moranegg added a parent task for T1228: Create a component that sends a list of revision sha1 to the metadataRevisionIndexer: T1232: Search over intrinsic metadata associated to an origin.
Oct 3 2018, 3:56 PM · Indexer, Metadata workflow
moranegg added a subtask for T1236: Refactor metadata translator to parse different types of files: T1237: Update mappings to CodeMeta crosswalk table.
Oct 3 2018, 3:55 PM · Metadata workflow, Indexer
moranegg added a parent task for T1237: Update mappings to CodeMeta crosswalk table: T1236: Refactor metadata translator to parse different types of files.
Oct 3 2018, 3:55 PM · Metadata workflow, Indexer
moranegg triaged T1237: Update mappings to CodeMeta crosswalk table as Normal priority.
Oct 3 2018, 3:53 PM · Metadata workflow, Indexer
moranegg created T1237: Update mappings to CodeMeta crosswalk table.
Oct 3 2018, 3:53 PM · Metadata workflow, Indexer
moranegg added a parent task for T1236: Refactor metadata translator to parse different types of files: T1235: Refactor metadata detector tool to add easily new file names to detect.
Oct 3 2018, 3:48 PM · Metadata workflow, Indexer
moranegg added a subtask for T1235: Refactor metadata detector tool to add easily new file names to detect: T1236: Refactor metadata translator to parse different types of files.
Oct 3 2018, 3:48 PM · Metadata workflow, Indexer
moranegg triaged T1236: Refactor metadata translator to parse different types of files as Normal priority.
Oct 3 2018, 3:47 PM · Metadata workflow, Indexer
moranegg added a parent task for T1235: Refactor metadata detector tool to add easily new file names to detect: T1232: Search over intrinsic metadata associated to an origin.
Oct 3 2018, 3:43 PM · Metadata workflow, Indexer
moranegg triaged T1235: Refactor metadata detector tool to add easily new file names to detect as Normal priority.
Oct 3 2018, 3:42 PM · Metadata workflow, Indexer
moranegg added a parent task for T1231: Create an origin indexer that lists the most recent revision in HEAD branch: T1232: Search over intrinsic metadata associated to an origin.
Oct 3 2018, 3:20 PM · Indexer, Metadata workflow
vlorentz claimed T1231: Create an origin indexer that lists the most recent revision in HEAD branch.
Oct 3 2018, 3:09 PM · Indexer, Metadata workflow
moranegg triaged T1231: Create an origin indexer that lists the most recent revision in HEAD branch as Normal priority.
Oct 3 2018, 3:07 PM · Indexer, Metadata workflow
vlorentz added a watcher for Indexer: vlorentz.
Oct 3 2018, 3:03 PM
ardumont renamed T1230: Indexers: Improve readme to be more explicit on how to run locally from Indexers: Align readme with other modules to Indexers: Improve readme to be more explicit on how to run locally.
Oct 3 2018, 12:20 PM · Indexer, Scheduling utilities
ardumont triaged T1230: Indexers: Improve readme to be more explicit on how to run locally as Normal priority.
Oct 3 2018, 12:19 PM · Indexer, Scheduling utilities
ardumont triaged T1229: Indexers: Make orchestrators use swh-scheduler for scheduling as Normal priority.
Oct 3 2018, 12:04 PM · Indexer, Scheduling utilities
ardumont updated the task description for T1227: General improvements of the indexer: Schedule indexer tasks.
Oct 3 2018, 12:03 PM · Indexer, Scheduling utilities
moranegg triaged T1228: Create a component that sends a list of revision sha1 to the metadataRevisionIndexer as Normal priority.
Oct 3 2018, 12:03 PM · Indexer, Metadata workflow
ardumont renamed T991: Indexers: Send range of ids instead of list of ids from General improvement of the indexers: Send range of ids instead of raw ids to Indexers: Send range of ids instead of raw ids.
Oct 3 2018, 12:02 PM · Indexer
ardumont added a subtask for T1227: General improvements of the indexer: Schedule indexer tasks: T991: Indexers: Send range of ids instead of list of ids.
Oct 3 2018, 12:00 PM · Indexer, Scheduling utilities
ardumont added a parent task for T991: Indexers: Send range of ids instead of list of ids: T1227: General improvements of the indexer: Schedule indexer tasks.
Oct 3 2018, 12:00 PM · Indexer
ardumont updated the task description for T1227: General improvements of the indexer: Schedule indexer tasks.
Oct 3 2018, 12:00 PM · Indexer, Scheduling utilities
ardumont triaged T1227: General improvements of the indexer: Schedule indexer tasks as Normal priority.
Oct 3 2018, 11:59 AM · Indexer, Scheduling utilities
olasd added a comment to T991: Indexers: Send range of ids instead of list of ids.

As an implementation strategy, I think we can aim at:

Oct 3 2018, 11:31 AM · Indexer
ardumont added a comment to T991: Indexers: Send range of ids instead of list of ids.

That would bring us closer to using the real scheduler (swh-scheduler) [1] instead of the volatile one [2].

Oct 3 2018, 11:27 AM · Indexer
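To make the T991 idea concrete, the sketch below splits the 160-bit sha1 space into contiguous, inclusive ranges, so that a task could carry a `(start, end)` pair instead of an explicit list of content ids. This is a hypothetical Python illustration, not the swh-indexer or swh-scheduler API; `sha1_ranges` and `in_range` are names made up for the example.

```python
from typing import List, Tuple

SHA1_BYTES = 20  # a sha1 digest is 20 bytes (160 bits)


def sha1_ranges(n: int) -> List[Tuple[bytes, bytes]]:
    """Split the whole sha1 space into n contiguous, non-overlapping ranges.

    Each range is an inclusive (start, end) pair of 20-byte big-endian ids.
    (Hypothetical helper for illustration only.)
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    space = 1 << (8 * SHA1_BYTES)  # number of possible sha1 values
    ranges = []
    for i in range(n):
        start = space * i // n
        end = space * (i + 1) // n - 1  # inclusive upper bound
        ranges.append((start.to_bytes(SHA1_BYTES, "big"),
                       end.to_bytes(SHA1_BYTES, "big")))
    return ranges


def in_range(sha1: bytes, start: bytes, end: bytes) -> bool:
    """Check membership in an inclusive range (hypothetical helper)."""
    # For fixed-length big-endian ids, bytewise order equals numeric order.
    return start <= sha1 <= end
```

Because every id is a fixed-length 20-byte big-endian value, plain `bytes` comparison orders ids the same way as their numeric values, which is what makes `start <= sha1 <= end` a valid range check.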

Sep 28 2018

zack renamed T818: indexer DB should not use bytea for mimetype and encoding columns from Indexer mimetype: Use text instead of bytea for mimetype and encoding columns? to indexer DB should not use bytea for mimetype and encoding columns.
Sep 28 2018, 10:34 AM · Storage manager, Indexer

Sep 7 2018

ardumont updated the task description for T713: Index existing contents (mimetype, language, license).
Sep 7 2018, 11:55 PM · Indexer

Sep 6 2018

ardumont added a comment to T713: Index existing contents (mimetype, language, license).

First batch is done: 3.7 billion [1]

Sep 6 2018, 5:41 PM · Indexer

Jun 25 2018

ardumont closed T1094: swh-indexer db replica on azure as Resolved.
Jun 25 2018, 2:05 PM · Indexer, Web app, System administration
ardumont closed T1113: Update streaming replication documentation, a subtask of T1094: swh-indexer db replica on azure, as Resolved.
Jun 25 2018, 2:05 PM · Indexer, Web app, System administration
ardumont closed T1113: Update streaming replication documentation as Resolved.
Jun 25 2018, 2:05 PM · Indexer, Web app, System administration
ardumont updated the task description for T1113: Update streaming replication documentation.
Jun 25 2018, 10:54 AM · Indexer, Web app, System administration
ardumont renamed T1113: Update streaming replication documentation from Update streaming replication to Update streaming replication documentation.
Jun 25 2018, 10:52 AM · Indexer, Web app, System administration
ardumont triaged T1113: Update streaming replication documentation as Normal priority.
Jun 25 2018, 10:13 AM · Indexer, Web app, System administration
ardumont updated subscribers of T1094: swh-indexer db replica on azure.

Status update: replication is up and running.

Jun 25 2018, 10:11 AM · Indexer, Web app, System administration

Jun 15 2018

ardumont added a comment to T1094: swh-indexer db replica on azure.

Status update: the cluster and db are running ;)

Jun 15 2018, 4:31 PM · Indexer, Web app, System administration
ardumont added a comment to T1094: swh-indexer db replica on azure.

about us moving away from pg_logical replication (and targets the page about the new one [2]).

Jun 15 2018, 11:48 AM · Indexer, Web app, System administration
ardumont added a comment to T1094: swh-indexer db replica on azure.

Replying to self here as well.

Jun 15 2018, 11:42 AM · Indexer, Web app, System administration

Jun 14 2018

ardumont closed T782: Web API: make endpoints that expose extracted metadata return *lists* of factual information as Resolved.

Thanks again for your contributions.

Jun 14 2018, 1:58 PM · Indexer, Easy hack, Web app
ardumont added a comment to T1094: swh-indexer db replica on azure.

The wiki documentation about provisioning a node has been updated as well.

Jun 14 2018, 10:33 AM · Indexer, Web app, System administration

Jun 13 2018

ardumont changed the status of T1094: swh-indexer db replica on azure from Open to Work in Progress.
Jun 13 2018, 6:32 PM · Indexer, Web app, System administration
ardumont added a comment to T1094: swh-indexer db replica on azure.

Relatedly, fixed/adapted according to the latest procedure for provisioning a new node:

Jun 13 2018, 6:31 PM · Indexer, Web app, System administration
ardumont claimed T1094: swh-indexer db replica on azure.
Jun 13 2018, 1:03 PM · Indexer, Web app, System administration
ardumont closed T1095: indexer: Remove temporary table usage for read-only queries as Resolved.
Jun 13 2018, 1:02 PM · Indexer, Web app, System administration
ardumont closed T1095: indexer: Remove temporary table usage for read-only queries, a subtask of T1094: swh-indexer db replica on azure, as Resolved.
Jun 13 2018, 1:02 PM · Indexer, Web app, System administration

Jun 11 2018

ardumont added a revision to T1095: indexer: Remove temporary table usage for read-only queries: D343: indexer: Migrate away from temporary table.
Jun 11 2018, 2:34 PM · Indexer, Web app, System administration
ardumont triaged T1095: indexer: Remove temporary table usage for read-only queries as Normal priority.
Jun 11 2018, 11:01 AM · Indexer, Web app, System administration

Jun 8 2018

ardumont renamed T1094: swh-indexer db replica on azure from swh-indexer replica on azure to swh-indexer db replica on azure.
Jun 8 2018, 1:55 PM · Indexer, Web app, System administration

Jun 6 2018

s added a comment to T782: Web API: make endpoints that expose extracted metadata return *lists* of factual information.
In T782#20044, @zack wrote:

So, good news at last! The CLA is ready, here it is: L3. (It lacks a preamble, but it's pretty short anyway without one.)
@s: can you have a look and sign it (using your legal name)?
Happy to answer any questions you might have!

Jun 6 2018, 11:50 PM · Indexer, Easy hack, Web app
zack added a comment to T782: Web API: make endpoints that expose extracted metadata return *lists* of factual information.

So, good news at last! The CLA is ready, here it is: L3. (It lacks a preamble, but it's pretty short anyway without one.)
@s: can you have a look and sign it (using your legal name)?
Happy to answer any questions you might have!

Jun 6 2018, 3:44 PM · Indexer, Easy hack, Web app

Apr 20 2018

s added a comment to T782: Web API: make endpoints that expose extracted metadata return *lists* of factual information.

"zack (Stefano Zacchiroli)" <forge@softwareheritage.org> writes:

Apr 20 2018, 8:23 AM · Indexer, Easy hack, Web app

Apr 18 2018

zack added a comment to T782: Web API: make endpoints that expose extracted metadata return *lists* of factual information.
In T782#18978, @s wrote:

Can I:

  • Merge and push the changes in D301 to swh-indexer master branch?
  • Merge and push the changes in D302 to swh-web master branch?
Apr 18 2018, 2:31 PM · Indexer, Easy hack, Web app

Apr 16 2018

s added a comment to T782: Web API: make endpoints that expose extracted metadata return *lists* of factual information.

@ardumont, can I:

Apr 16 2018, 8:58 AM · Indexer, Easy hack, Web app

Mar 21 2018

ardumont changed the status of T782: Web API: make endpoints that expose extracted metadata return *lists* of factual information from Open to Work in Progress.
Mar 21 2018, 5:10 PM · Indexer, Easy hack, Web app

Mar 17 2018

s added a revision to T782: Web API: make endpoints that expose extracted metadata return *lists* of factual information: D302: swh.web.common.service: Update lookup_content_license.
Mar 17 2018, 9:51 PM · Indexer, Easy hack, Web app

Mar 16 2018

ardumont added a project to T782: Web API: make endpoints that expose extracted metadata return *lists* of factual information: Indexer.
Mar 16 2018, 10:02 AM · Indexer, Easy hack, Web app

Mar 14 2018

ardumont triaged T991: Indexers: Send range of ids instead of list of ids as Normal priority.
Mar 14 2018, 1:22 PM · Indexer
ardumont updated the title for P238 How to load and index data from Draft: How to load and index data to How to load and index data.
Mar 14 2018, 12:23 PM · Indexer
ardumont added a project to P238 How to load and index data: Indexer.
Mar 14 2018, 12:19 PM · Indexer
ardumont edited P236 ~/.config/swh/indexer/mimetype.yml.
Mar 14 2018, 11:49 AM · Indexer
ardumont edited P236 ~/.config/swh/indexer/mimetype.yml.
Mar 14 2018, 10:42 AM · Indexer
ardumont edited P237 ~/.config/swh/indexer/fossology_license.yml.
Mar 14 2018, 10:41 AM · Indexer
ardumont created P237 ~/.config/swh/indexer/fossology_license.yml.
Mar 14 2018, 10:39 AM · Indexer
ardumont created P236 ~/.config/swh/indexer/mimetype.yml.
Mar 14 2018, 10:37 AM · Indexer
ardumont created P235 ~/.config/swh/indexer/orchestrator_text.yml.
Mar 14 2018, 10:36 AM · Indexer
ardumont created P234 ~/.config/swh/indexer/orchestrator.yml.
Mar 14 2018, 10:36 AM · Indexer

Dec 21 2017

moranegg closed T733: add content_metadata logic to storage as Resolved.
Dec 21 2017, 7:24 PM · Metadata workflow, Indexer
moranegg closed T733: add content_metadata logic to storage, a subtask of T715: create indexing strategy for metadata, as Resolved.
Dec 21 2017, 7:24 PM · Metadata workflow, Indexer

Dec 14 2017

ardumont closed T864: Indexers - Find and implement a proper scheduling content messages indexing method as Resolved.
Dec 14 2017, 1:30 PM · Indexer
ardumont closed T864: Indexers - Find and implement a proper scheduling content messages indexing method, a subtask of T713: Index existing contents (mimetype, language, license), as Resolved.
Dec 14 2017, 1:30 PM · Indexer

Dec 13 2017

ardumont added a comment to T864: Indexers - Find and implement a proper scheduling content messages indexing method.

Heads up: the output was too verbose, so I updated that script to show only the last sha1 sent for computation:

Dec 13 2017, 6:43 PM · Indexer
ardumont added a comment to T864: Indexers - Find and implement a proper scheduling content messages indexing method.

After adding the indexer dependency to the scheduler setup (rSPSITEfb6faecaaa928c4ddcbdbc81181bf3ffac2ace4c), this has been rescheduled through:

Dec 13 2017, 12:22 PM · Indexer

Dec 11 2017

olasd added a parent task for T880: Add softwareheritage-indexer backup: T881: PostgreSQL backups based on pg_dump.
Dec 11 2017, 3:52 PM · System administration
ardumont created T880: Add softwareheritage-indexer backup.
Dec 11 2017, 3:45 PM · System administration