In T3870#77430, @swh-sentry-integration wrote: Sentry issue: SWH-LOADER-SVN-58
Jan 21 2022
anlambert added a comment to T3870: Analyze svn externals repository ingestion with loader.svn v1.0.
anlambert added a comment to D7009: Consider unauthorized access to origin as a not found visit status.
I think it would be better to add a new visit status, like restricted or unauthorized. What do you think?
anlambert updated the task description for T3870: Analyze svn externals repository ingestion with loader.svn v1.0.
Jan 20 2022
anlambert closed T3403: Use forge URL network location as default lister instance name, a subtask of T3127: Compute and display distribution of origins by forge, as Resolved.
This task has been complete for a while now, closing it.
Is there a reason not to close this task?
Jan 19 2022
Time to switch production database to a new sanitized one.
@ardumont deployed swh-loader-svn v1.0.0 on staging and restarted the loader service.
anlambert committed rDLDSVN11740ebc2993: replay: Prevent removal of external paths overlapping versioned ones (authored by anlambert).
replay: Prevent removal of external paths overlapping versioned ones
anlambert requested review of D6977: replay: Prevent removal of external paths overlapping versioned ones.
This is what I have done to sanitize the production database and plug it into our testbed for testing.
Jan 18 2022
anlambert lowered the priority of T3862: Sanitize WordPress production database from High to Normal.
So turns out the customizer issue was due to the recent upgrade of the wp-extra-file-types plugin.
Rename ra module to replay
In D6962#181131, @vlorentz wrote: Indeed! I have no idea what "ra" means :)
anlambert committed rDLDSVNf6fbbb789715: ra: Send modified objects only to storage after replaying a revision (authored by anlambert).
ra: Send modified objects only to storage after replaying a revision
anlambert committed rDLDSVNa820d7eab8d5: ra: Put externals in cache to avoid exporting them again (authored by anlambert).
ra: Put externals in cache to avoid exporting them again
anlambert committed rDLDSVN473fe145f4b7: ra: Add support for subversion external definitions (authored by anlambert).
ra: Add support for subversion external definitions
anlambert committed rDLDSVNf1913512a5fa: utils: Add a function to parse a subversion external definition (authored by anlambert).
utils: Add a function to parse a subversion external definition
anlambert updated the diff for D6950: ra: Send modified objects only to storage after replaying a revision.
Rebase
Rebase
Fix wrong rebase
Update: Add missing root path check in DirEditor.add_directory
anlambert added a comment to D6950: ra: Send modified objects only to storage after replaying a revision.
Currently the subversion loader is the only one that needs that directory diff feature, so I think we can keep the implementation as it is for the moment, but I will create a task to implement the directory diff feature in swh.model.from_disk.
anlambert updated the diff for D6950: ra: Send modified objects only to storage after replaying a revision.
Rebase
Rebase
Remove not needed pass instruction
lgtm but I'm not sure I understood everything (code wise).
anlambert updated the diff for D6950: ra: Send modified objects only to storage after replaying a revision.
Rebase
Rebase
Rebase
anlambert updated the diff for D6839: utils: Add a function to parse a subversion external definition.
Rebase
anlambert added inline comments to D6961: Empty atom request request should raise a bad request (400).
anlambert updated the diff for D6950: ra: Send modified objects only to storage after replaying a revision.
Rebase
Update: Preserve symlinks when copying an external tree.
- Improve test when all externals got removed from a path
anlambert updated the diff for D6839: utils: Add a function to parse a subversion external definition.
Update: Handle a couple of edge cases found when testing with real world repositories.
Jan 17 2022
I modified the concerned line in the /srv/data/etc/cron/anacrontab file the following way:
It looks like it is not critical if both tables above are missing from the WordPress database: the publications list is still displayed correctly, while the lists of authors and tags appear empty in the teachPress admin dashboard.
There are two tables with the "Index column size too large" issue, wp_teachpress_authors and wp_teachpress_tags.
So it turns out that we cannot perform any operation on tables affected by the "Index column size too large" error, sigh...
Using the --single-transaction=TRUE option of mysqldump helped to get the name of the table affected by the "Index column size too large" error.
Jan 14 2022
anlambert updated the diff for D6950: ra: Send modified objects only to storage after replaying a revision.
Add comment
anlambert added a comment to D6950: ra: Send modified objects only to storage after replaying a revision.
In D6950#180642, @olasd wrote: This looks like an impressive speedup, kudos.
Rather than add this logic on the svn loader only, we could consider either making swh.model.from_disk support incremental computations by keeping track of the ctime / mtime of on-disk data, and by collecting objects for the new loader. This would make this logic reusable by all loaders.
Without changing the swh.model.from_disk logic, we could also just diff the sets of objects between iterations (and rely on the OS cache for the new computation to be vaguely efficient), and only send new ones to the storage.
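To make the second suggestion concrete, here is a minimal sketch of diffing object sets between two iterations. It is not code from swh.model or the loader: the `ObjectKey` type and the `new_objects` helper are hypothetical names, and objects are reduced to (type, id) pairs for illustration.

```python
from typing import Set, Tuple

# Hypothetical stand-in for a model object: (object type, hex identifier).
ObjectKey = Tuple[str, str]


def new_objects(previous: Set[ObjectKey], current: Set[ObjectKey]) -> Set[ObjectKey]:
    """Return the objects seen in the current iteration but not in the previous one."""
    return current - previous


# Objects computed from disk after replaying two successive revisions.
rev1_objects = {("content", "aa"), ("directory", "d1")}
rev2_objects = {("content", "aa"), ("content", "bb"), ("directory", "d2")}

# Only these would need to be sent to the storage for the second revision.
to_send = new_objects(rev1_objects, rev2_objects)
```

The full set is still recomputed at each iteration in this approach, so the gain is on the storage side only.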
anlambert updated the diff for D6950: ra: Send modified objects only to storage after replaying a revision.
Rebase
Rebase
Also filter out empty lines and commented ones when parsing external
definitions after a checkout.
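As an illustration of that filtering step, a minimal sketch follows. It assumes the svn:externals property value is available as a multi-line string and that comment lines start with `#`; the `external_definition_lines` name is hypothetical and is not the loader's actual function.

```python
from typing import List


def external_definition_lines(prop_value: str) -> List[str]:
    """Keep only the lines of an svn:externals value that hold a definition."""
    lines = []
    for line in prop_value.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines that are commented out.
        if not stripped or stripped.startswith("#"):
            continue
        lines.append(stripped)
    return lines


sample = """
# third party code
^/vendor/lib lib

  # disabled for now
https://svn.example.org/repo/trunk tools
"""
definitions = external_definition_lines(sample)
```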
anlambert updated the diff for D6950: ra: Send modified objects only to storage after replaying a revision.
Rebase
Rebase
- Use context manager to create temporary directory for checkout
- Update years in test_loader.py license header
anlambert requested review of D6950: ra: Send modified objects only to storage after replaying a revision.
Rebase
Optimize subversion export operation when dealing with externals: use the
origin URL as export parameter only if we know that some externals are
defined as relative to the repository URL and target a path outside the
repository.
anlambert updated the diff for D6839: utils: Add a function to parse a subversion external definition.
Add a boolean in the tuple returned by the parse function indicating
if the external URL was defined as relative to the repository one
but targets a path outside the repository.
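For readers unfamiliar with that case, a minimal sketch of such a check is given below. It only handles the `^/` form (relative to the repository root) and is not the parse function from D6839; the `targets_outside_repository` name is hypothetical.

```python
import posixpath


def targets_outside_repository(external_url: str) -> bool:
    """Tell whether a '^/'-relative external URL escapes the repository root."""
    if not external_url.startswith("^/"):
        # Absolute URLs and other relative forms are out of scope for this sketch.
        return False
    # Collapse "." and ".." segments of the path relative to the repository root.
    normalized = posixpath.normpath(external_url[2:])
    # A leading ".." after normalization means the path climbs above the root.
    return normalized == ".." or normalized.startswith("../")
```

For instance, `^/../other-repo/trunk` climbs above the repository root, while `^/trunk/lib` stays inside it.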
anlambert triaged T3848: Activate saved origin browse link only when loading data are available in database as Normal priority.
Jan 13 2022
Great, thanks!
anlambert committed rDWAPPS3809abf63da7: Makefile.local: Pass --frozen-lockfile option to yarn install (authored by anlambert).
Makefile.local: Pass --frozen-lockfile option to yarn install
anlambert committed rDWAPPS68ec3d83ab5d: browse/snapshot_context: Fix revisions log link display regression (authored by anlambert).
browse/snapshot_context: Fix revisions log link display regression
anlambert requested review of D6942: browse/snapshot_context: Fix revisions log link display regression.
anlambert committed rDWAPPS491576531af7: common/origin_save: Fix elasticsearch request (authored by anlambert).
common/origin_save: Fix elasticsearch request
Jan 12 2022
anlambert added a comment to T3845: Requesting cooking with email address + "Enter " key returns an error.
This is because the vault modal does not contain a real HTML form so there is no submit event sent when pressing Enter.
anlambert closed T3836: Define and implement an anti-DoS policy for graph visits using the max_edges parameter as Resolved.
Anti-DoS policy has been implemented and deployed. The max_edges thresholds can be easily changed by configuration.
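A rough sketch of what a configurable max_edges threshold could look like is shown below. Everything in it is an assumption made for illustration: the configuration layout, the user categories, the numeric values and the `effective_max_edges` helper are hypothetical and do not describe the deployed policy.

```python
from typing import Dict, Optional

# Hypothetical configuration: key names and values are assumptions.
CONFIG: Dict[str, Dict[str, int]] = {
    "max_edges": {"anonymous": 1000, "authenticated": 10000},
}


def effective_max_edges(
    requested: Optional[int],
    user_kind: str,
    config: Dict[str, Dict[str, int]] = CONFIG,
) -> int:
    """Clamp the max_edges value asked by a client to the configured threshold."""
    limit = config["max_edges"][user_kind]
    if requested is None or requested <= 0:
        # No usable value provided: fall back to the threshold itself.
        return limit
    return min(requested, limit)
```

Changing a threshold then only requires editing the configuration, not the code.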
anlambert closed T3840: "'NoneType' object has no attribute 'split'" on /browse/snapshot/log/ as Resolved.
Fixed and deployed.
anlambert committed rDWAPPS9e7732ccebe3: api/graph: Handle query parameters that might be passed in graph_query (authored by anlambert).
api/graph: Handle query parameters that might be passed in graph_query
anlambert committed rDWAPPSe870ec2b9a40: api/graph: Implement anti-DoS policies for graph visits (authored by anlambert).
api/graph: Implement anti-DoS policies for graph visits
Rebase and ensure path of external root directory is stored in the external paths set.
anlambert updated the diff for D6839: utils: Add a function to parse a subversion external definition.
Rebase
Rebase
anlambert updated the diff for D6919: api/graph: Handle query parameters that might be passed in graph_query.
Use urlunparse.
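For context, a minimal sketch of merging extra query parameters into a URL that may already carry some, using `urlparse` and `urlunparse` from the standard library. The `with_query_params` helper and the example URL are hypothetical and are not taken from the diff.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_query_params(url: str, extra: Dict[str, object]) -> str:
    """Return url with the extra parameters merged into its query string."""
    parsed = urlparse(url)
    # Keep parameters already present in the URL, then add or override.
    params = dict(parse_qsl(parsed.query))
    params.update({key: str(value) for key, value in extra.items()})
    return urlunparse(parsed._replace(query=urlencode(params)))


url = with_query_params(
    "http://graph.example.org/graph/visit/nodes/x?direction=backward",
    {"max_edges": 1000},
)
```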
anlambert added inline comments to D6919: api/graph: Handle query parameters that might be passed in graph_query.
How do the tests check integrity of the results? (ie. there are the right revisions, the right files, etc.)
Does the incremental loader update the remote repo if the svn:externals property was not touched?
It should be faster indeed, looks good to me.
Looks good to me.