
Dec 11 2017

ardumont updated the task description for T867: Separate indexers' database model to its own database - meta task.
Dec 11 2017, 3:41 PM · Storage manager, Indexer

Dec 7 2017

ardumont created T874: Keep indexer table in sync with new contents.
Dec 7 2017, 10:17 PM · Journal, Indexer
ardumont closed T867: Separate indexers' database model to its own database - meta task as Resolved.
Dec 7 2017, 9:53 PM · Storage manager, Indexer
ardumont closed T873: Clean up indexer's data reference in softwareheritage's main db, a subtask of T867: Separate indexers' database model to its own database - meta task, as Resolved.
Dec 7 2017, 9:52 PM · Storage manager, Indexer
ardumont closed T873: Clean up indexer's data reference in softwareheritage's main db as Resolved.
Dec 7 2017, 9:52 PM · Storage manager, Indexer
ardumont added a comment to T873: Clean up indexer's data reference in softwareheritage's main db.

I'll do this tomorrow when I'm fresh.

Dec 7 2017, 9:39 PM · Storage manager, Indexer
ardumont added a comment to T873: Clean up indexer's data reference in softwareheritage's main db.

I'll do this tomorrow when I'm fresh.

Dec 7 2017, 6:00 PM · Storage manager, Indexer
ardumont created T873: Clean up indexer's data reference in softwareheritage's main db.
Dec 7 2017, 5:40 PM · Storage manager, Indexer
ardumont closed T871: Migrate swh-storage api functions relative to indexers to swh-indexer as Resolved.
Dec 7 2017, 5:28 PM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont closed T871: Migrate swh-storage api functions relative to indexers to swh-indexer, a subtask of T867: Separate indexers' database model to its own database - meta task, as Resolved.
Dec 7 2017, 5:28 PM · Storage manager, Indexer
ardumont closed T872: Deploy and restart indexers as Resolved.
Dec 7 2017, 5:28 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont closed T872: Deploy and restart indexers, a subtask of T871: Migrate swh-storage api functions relative to indexers to swh-indexer, as Resolved.
Dec 7 2017, 5:28 PM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont updated the task description for T872: Deploy and restart indexers.
Dec 7 2017, 5:24 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont updated the task description for T872: Deploy and restart indexers.
Dec 7 2017, 5:10 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont updated the task description for T872: Deploy and restart indexers.
Dec 7 2017, 5:10 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont changed the status of T872: Deploy and restart indexers from Open to Work in Progress.
Dec 7 2017, 4:41 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont changed the status of T872: Deploy and restart indexers, a subtask of T871: Migrate swh-storage api functions relative to indexers to swh-indexer, from Open to Work in Progress.
Dec 7 2017, 4:41 PM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont edited parent tasks for T872: Deploy and restart indexers, added: T871: Migrate swh-storage api functions relative to indexers to swh-indexer; removed: T867: Separate indexers' database model to its own database - meta task.
Dec 7 2017, 4:40 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont removed a subtask for T867: Separate indexers' database model to its own database - meta task: T872: Deploy and restart indexers.
Dec 7 2017, 4:40 PM · Storage manager, Indexer
ardumont added a subtask for T871: Migrate swh-storage api functions relative to indexers to swh-indexer: T872: Deploy and restart indexers.
Dec 7 2017, 4:40 PM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont updated the task description for T872: Deploy and restart indexers.
Dec 7 2017, 4:40 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont updated the task description for T871: Migrate swh-storage api functions relative to indexers to swh-indexer.
Dec 7 2017, 4:39 PM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont added projects to T872: Deploy and restart indexers: Core Loader, SWORD deposit.
Dec 7 2017, 3:39 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont updated the task description for T872: Deploy and restart indexers.
Dec 7 2017, 3:07 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont updated the task description for T872: Deploy and restart indexers.
Dec 7 2017, 1:57 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont updated the task description for T872: Deploy and restart indexers.
Dec 7 2017, 12:29 PM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont updated the task description for T871: Migrate swh-storage api functions relative to indexers to swh-indexer.
Dec 7 2017, 12:28 PM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont updated the task description for T871: Migrate swh-storage api functions relative to indexers to swh-indexer.
Dec 7 2017, 11:26 AM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont updated the task description for T871: Migrate swh-storage api functions relative to indexers to swh-indexer.
Dec 7 2017, 10:38 AM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont updated the task description for T871: Migrate swh-storage api functions relative to indexers to swh-indexer.
Dec 7 2017, 10:38 AM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont added projects to T871: Migrate swh-storage api functions relative to indexers to swh-indexer: Core Loader, SWORD deposit.
Dec 7 2017, 10:35 AM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont updated the task description for T872: Deploy and restart indexers.
Dec 7 2017, 9:55 AM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont updated the task description for T872: Deploy and restart indexers.
Dec 7 2017, 9:53 AM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont added projects to T872: Deploy and restart indexers: Web app, Storage manager.
Dec 7 2017, 9:53 AM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont created T872: Deploy and restart indexers.
Dec 7 2017, 9:52 AM · SWORD deposit, Core Loader, Storage manager, Web app, Puppet recipes, Indexer
ardumont renamed T867: Separate indexers' database model to its own database - meta task from Separate indexers' database model to its own database to Separate indexers' database model to its own database - meta task.
Dec 7 2017, 9:43 AM · Storage manager, Indexer
ardumont created T871: Migrate swh-storage api functions relative to indexers to swh-indexer.
Dec 7 2017, 9:42 AM · SWORD deposit, Core Loader, Web app, Development environment, Storage manager, Indexer
ardumont created T870: Create new database for the indexer and populate it.
Dec 7 2017, 9:35 AM · Indexer

Dec 6 2017

Hafis added a comment to T713: Index existing contents (mimetype, language, license).
Dec 6 2017, 11:38 PM · Indexer
ardumont updated the task description for T867: Separate indexers' database model to its own database - meta task.
Dec 6 2017, 12:49 PM · Storage manager, Indexer

Dec 2 2017

ardumont edited P198 improving indexing using derivative form of materialized view.
Dec 2 2017, 1:09 PM · Indexer
ardumont added a comment to T864: Indexers - Find and implement a proper scheduling content messages indexing method.

Thinking more about this.

Dec 2 2017, 1:08 PM · Indexer
ardumont created P198 improving indexing using derivative form of materialized view.
Dec 2 2017, 1:06 PM · Indexer
ardumont renamed T864: Indexers - Find and implement a proper scheduling content messages indexing method from Find and implement a proper scheduling content messages indexing method to Indexers - Find and implement a proper scheduling content messages indexing method.
Dec 2 2017, 12:42 PM · Indexer
ardumont created T867: Separate indexers' database model to its own database - meta task.
Dec 2 2017, 12:40 PM · Storage manager, Indexer

Dec 1 2017

ardumont added a parent task for T817: analyze bogus mimetype values in content_mimetype table: T713: Index existing contents (mimetype, language, license).
Dec 1 2017, 2:15 PM · Archive content, Indexer
ardumont added a subtask for T713: Index existing contents (mimetype, language, license): T817: analyze bogus mimetype values in content_mimetype table.
Dec 1 2017, 2:15 PM · Indexer
ardumont updated the task description for T864: Indexers - Find and implement a proper scheduling content messages indexing method.
Dec 1 2017, 2:06 PM · Indexer
ardumont created P197 errors in indexer when uffizi's disk's full -> OSError: [Errno 28] No space left on device.
Dec 1 2017, 1:52 PM · System administrators, Indexer, Storage manager
ardumont updated the task description for T864: Indexers - Find and implement a proper scheduling content messages indexing method.
Dec 1 2017, 12:21 PM · Indexer
ardumont created T864: Indexers - Find and implement a proper scheduling content messages indexing method.
Dec 1 2017, 12:18 PM · Indexer
ardumont added a parent task for T861: mimetype indexer: edge case makes the indexer fail miserably: T713: Index existing contents (mimetype, language, license).
Dec 1 2017, 11:50 AM · Indexer
ardumont added a subtask for T713: Index existing contents (mimetype, language, license): T861: mimetype indexer: edge case makes the indexer fail miserably.
Dec 1 2017, 11:50 AM · Indexer
ardumont renamed T862: mimetype indexer: work around mimetype detection problem to unstuck indexing workers from mimetype indexer: work around mimetype detection to unstuck indexer to mimetype indexer: work around mimetype detection problem to unstuck indexing workers.
Dec 1 2017, 11:42 AM · Indexer

Nov 29 2017

ardumont added a comment to T862: mimetype indexer: work around mimetype detection problem to unstuck indexing workers.

Developed, Packaged, Deployed, indexers unstuck \m/

Nov 29 2017, 10:39 AM · Indexer
ardumont updated the task description for T862: mimetype indexer: work around mimetype detection problem to unstuck indexing workers.
Nov 29 2017, 10:31 AM · Indexer
ardumont closed T862: mimetype indexer: work around mimetype detection problem to unstuck indexing workers, a subtask of T861: mimetype indexer: edge case makes the indexer fail miserably, as Resolved.
Nov 29 2017, 10:29 AM · Indexer
ardumont closed T862: mimetype indexer: work around mimetype detection problem to unstuck indexing workers as Resolved by committing rDCIDX3e4641b5a168: swh.indexer.mimetype: Work around problem in mimetype detection.
Nov 29 2017, 10:29 AM · Indexer
ardumont created T862: mimetype indexer: work around mimetype detection problem to unstuck indexing workers.
Nov 29 2017, 10:25 AM · Indexer
ardumont added a comment to T861: mimetype indexer: edge case makes the indexer fail miserably.

In the meantime, as my main concern is not the indexer, I'll work around this to avoid stopping the indexers entirely (as some batches can then get stuck in the rescheduling loop).

Nov 29 2017, 10:13 AM · Indexer
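
The kind of workaround described in the comment above can be sketched as follows. This is a minimal illustration, not the code from the actual commit: `index_batch` and the `detect` callable are hypothetical names, and the policy of skipping a failing content (rather than retrying it) is an assumption. The idea is only that one content whose detection raises or returns nothing should not make the whole batch fail and get rescheduled.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def index_batch(
    contents: Iterable[Tuple[str, bytes]],
    detect: Callable[[bytes], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Detect a mimetype per content, isolating failures to single items.

    Returns (results, failed): results maps content id -> mimetype,
    failed lists the ids for which detection raised or returned nothing.
    """
    results: Dict[str, str] = {}
    failed: List[str] = []
    for content_id, data in contents:
        try:
            mimetype = detect(data)
        except Exception:
            # Assumed policy: record the failure and move on, so a single
            # problematic content cannot abort the whole batch.
            failed.append(content_id)
            continue
        if not mimetype:
            # Detection returned no usable value (None or empty string).
            failed.append(content_id)
            continue
        results[content_id] = mimetype
    return results, failed
```

The failed ids are returned separately so they can be inspected later instead of being retried in a loop.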
ardumont added a comment to T861: mimetype indexer: edge case makes the indexer fail miserably.

Sounds like a bug against python3-magic.

Nov 29 2017, 10:09 AM · Indexer
ardumont added a comment to T861: mimetype indexer: edge case makes the indexer fail miserably.

Example Sha1 with that error is '099c7254742e2be54a86d03a3a1826a7b8e757d0':

Nov 29 2017, 9:44 AM · Indexer
ardumont renamed T861: mimetype indexer: edge case makes the indexer fail miserably from mimetype indexer: edge case makes the indexer fails miserably to mimetype indexer: edge case makes the indexer fail miserably.
Nov 29 2017, 9:36 AM · Indexer
ardumont renamed T861: mimetype indexer: edge case makes the indexer fail miserably from mimetype indexer: when no result is returned, indexer fails miserably to mimetype indexer: edge case makes the indexer fails miserably.
Nov 29 2017, 9:31 AM · Indexer
ardumont created T861: mimetype indexer: edge case makes the indexer fail miserably.
Nov 29 2017, 9:03 AM · Indexer

Nov 28 2017

moranegg renamed T834: deploy revision_indexer (at least for codemeta.json) from deploy revision_indexer (for codemeta.json) to deploy revision_indexer (at least for codemeta.json).
Nov 28 2017, 4:06 PM · Indexer, Metadata workflow

Nov 23 2017

ardumont closed T817: analyze bogus mimetype values in content_mimetype table as Resolved.
Nov 23 2017, 12:30 PM · Archive content, Indexer
ardumont closed T854: clean up bogus mimetype values in content_mimetype table, a subtask of T817: analyze bogus mimetype values in content_mimetype table, as Resolved.
Nov 23 2017, 12:11 PM · Archive content, Indexer
ardumont closed T854: clean up bogus mimetype values in content_mimetype table as Resolved.
Nov 23 2017, 12:11 PM · Archive content, Indexer
ardumont closed T849: Fix bogus mimetype values detection in the mimetype indexer implementation as Resolved.
Nov 23 2017, 12:10 PM · Archive content, Indexer
ardumont closed T849: Fix bogus mimetype values detection in the mimetype indexer implementation, a subtask of T850: reschedule indexing of contents with bogus mimetype values, as Resolved.
Nov 23 2017, 12:10 PM · Archive content, Indexer
ardumont closed T849: Fix bogus mimetype values detection in the mimetype indexer implementation, a subtask of T817: analyze bogus mimetype values in content_mimetype table, as Resolved.
Nov 23 2017, 12:10 PM · Archive content, Indexer
ardumont closed T850: reschedule indexing of contents with bogus mimetype values as Resolved.
Nov 23 2017, 12:07 PM · Archive content, Indexer
ardumont closed T850: reschedule indexing of contents with bogus mimetype values, a subtask of T817: analyze bogus mimetype values in content_mimetype table, as Resolved.
Nov 23 2017, 12:07 PM · Archive content, Indexer
ardumont added a comment to T850: reschedule indexing of contents with bogus mimetype values.

The old tool is id 7, the new one is 9:

Nov 23 2017, 12:07 PM · Archive content, Indexer

Nov 22 2017

ardumont changed the status of T850: reschedule indexing of contents with bogus mimetype values from Open to Work in Progress.
Nov 22 2017, 4:01 PM · Archive content, Indexer
ardumont changed the status of T850: reschedule indexing of contents with bogus mimetype values, a subtask of T817: analyze bogus mimetype values in content_mimetype table, from Open to Work in Progress.
Nov 22 2017, 4:01 PM · Archive content, Indexer
ardumont added a comment to T850: reschedule indexing of contents with bogus mimetype values.

Depends on T761

Nov 22 2017, 4:00 PM · Archive content, Indexer
ardumont added a comment to T854: clean up bogus mimetype values in content_mimetype table.

Bogus mimetype values are identified by the following queries:

softwareheritage=> select count(*) from content_mimetype where mimetype LIKE '[%' or mimetype like '' and indexer_configuration_id=7;
 count
-------
 50733
(1 row)
Nov 22 2017, 3:59 PM · Archive content, Indexer
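
One detail of the query above is worth illustrating: in SQL, AND binds more tightly than OR, so as written the `indexer_configuration_id=7` filter applies only to the empty-string condition. Whether that affects the count reported above cannot be told from the feed. The sketch below uses an in-memory SQLite table with made-up rows (the `id` column and all sample data are assumptions; only the table name and the two filtered columns come from the query) to show how the two readings can differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_mimetype ("
    "id TEXT, mimetype TEXT, indexer_configuration_id INTEGER)"
)
# Hypothetical sample rows, for illustration only.
rows = [
    ("a", "[text/plain]", 7),  # bracketed value, tool 7
    ("b", "", 7),              # empty value, tool 7
    ("c", "[text/plain]", 9),  # bracketed value, another tool
    ("d", "text/plain", 7),    # regular value, tool 7
]
conn.executemany("INSERT INTO content_mimetype VALUES (?, ?, ?)", rows)


def count(where: str) -> int:
    """Count rows of content_mimetype matching the given WHERE clause."""
    query = "SELECT count(*) FROM content_mimetype WHERE " + where
    return conn.execute(query).fetchone()[0]


# Same shape as the query in the comment: the tool filter only
# constrains the empty-string branch.
as_written = count(
    "mimetype LIKE '[%' OR mimetype LIKE '' AND indexer_configuration_id = 7"
)

# Explicit parentheses restrict both branches to tool 7.
parenthesized = count(
    "(mimetype LIKE '[%' OR mimetype LIKE '') AND indexer_configuration_id = 7"
)
```

On this sample data the unparenthesized form also counts row `c`, which belongs to another tool, while the parenthesized form does not.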

Nov 20 2017

ardumont closed T853: Index existing contents - Stop language indexing, a subtask of T713: Index existing contents (mimetype, language, license), as Resolved.
Nov 20 2017, 9:02 AM · Indexer
ardumont closed T853: Index existing contents - Stop language indexing as Resolved by committing rSPSITEc3ddbacbde17: data/default: Remove language indexer from azure runtime.
Nov 20 2017, 9:02 AM · Indexer

Nov 17 2017

ardumont closed T851: Make the indexers register themselves to storage when starting as Resolved by committing rDCIDX9e986c3cca66: swh.indexer: Make indexers register tools in prepare method.
Nov 17 2017, 3:09 PM · Indexer

Nov 16 2017

ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

Status:

  • Final listing of bogus values: /srv/storage/space/lists/indexer/mimetype/sha1-with-bogus-values.txt.gz (50733)
  • Queue reached the sane point.
  • Workers stopped.
Nov 16 2017, 9:07 AM · Archive content, Indexer

Nov 15 2017

ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

I am waiting for the queue to drop to 10000, as that will avoid rescheduling the 10000 already done (well, except for the new bogus values :)

Nov 15 2017, 4:55 PM · Archive content, Indexer
ardumont renamed T817: analyze bogus mimetype values in content_mimetype table from analyze bogus mimetype values in content_mimetypes table to analyze bogus mimetype values in content_mimetype table.
Nov 15 2017, 4:24 PM · Archive content, Indexer
ardumont created T854: clean up bogus mimetype values in content_mimetype table.
Nov 15 2017, 4:24 PM · Archive content, Indexer
ardumont updated the task description for T850: reschedule indexing of contents with bogus mimetype values.
Nov 15 2017, 4:23 PM · Archive content, Indexer
ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

There might be other bogus values in the stats that I haven't noticed.

Nov 15 2017, 4:01 PM · Archive content, Indexer
ardumont created T853: Index existing contents - Stop language indexing.
Nov 15 2017, 3:33 PM · Indexer
moranegg closed T715: create indexing strategy for metadata as Resolved.
Nov 15 2017, 1:55 PM · Metadata workflow, Indexer
ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

I don't see how I can easily check this though, since we don't have the sha1 provenance yet.

Nov 15 2017, 12:31 PM · Archive content, Indexer
zack renamed T850: reschedule indexing of contents with bogus mimetype values from Schedule back bogus mimetype values for indexation to reschedule indexing of contents with bogus mimetype values.
Nov 15 2017, 12:25 PM · Archive content, Indexer
ardumont created T851: Make the indexers register themselves to storage when starting.
Nov 15 2017, 12:13 PM · Indexer
ardumont added a subtask for T850: reschedule indexing of contents with bogus mimetype values: T849: Fix bogus mimetype values detection in the mimetype indexer implementation.
Nov 15 2017, 11:44 AM · Archive content, Indexer
ardumont added a parent task for T849: Fix bogus mimetype values detection in the mimetype indexer implementation: T850: reschedule indexing of contents with bogus mimetype values.
Nov 15 2017, 11:44 AM · Archive content, Indexer
ardumont renamed T850: reschedule indexing of contents with bogus mimetype values from Schedule back bogus mimetype values for indexing to Schedule back bogus mimetype values for indexation.
Nov 15 2017, 11:43 AM · Archive content, Indexer
ardumont created T850: reschedule indexing of contents with bogus mimetype values.
Nov 15 2017, 11:42 AM · Archive content, Indexer
ardumont renamed T817: analyze bogus mimetype values in content_mimetype table from bogus mimetype values in content_mimetypes table to analyze bogus mimetype values in content_mimetypes table.
Nov 15 2017, 11:39 AM · Archive content, Indexer
ardumont created T849: Fix bogus mimetype values detection in the mimetype indexer implementation.
Nov 15 2017, 11:38 AM · Archive content, Indexer
ardumont added a comment to T817: analyze bogus mimetype values in content_mimetype table.

Off the top of my head, I would say that I forgot to clean up those bogus values after the initial runs around December 2016.
I don't see how I can easily check this though, since we don't have the sha1 provenance yet.

Nov 15 2017, 11:35 AM · Archive content, Indexer