Thanks for diving into this.
Jul 3 2020
Jul 2 2020
Jun 23 2020
Jun 3 2020
Jun 2 2020
May 7 2020
After retrying to load the repository manually, the real cause of the failed dump load is the following:
svnadmin: E125005: Invalid property value found in dump stream; consider repairing the source or using the --bypass-prop-validation option when loading. svnadmin: E125005: Property 'svn:log' rejected because it is not encoded in UTF-8
Passing the --bypass-prop-validation option effectively fixes the loading issue.
I think we should use it as we already handle properties decoding errors in the loader implementation.
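Since the loader drives these tools as subprocesses, the fix boils down to passing that flag when loading the dump. A minimal sketch of what that could look like (the helper names and paths are hypothetical, not the actual loader code):

```python
import subprocess

def build_svnadmin_load_cmd(repo_path, bypass_prop_validation=True):
    """Build an `svnadmin load` command line; the flag skips property
    validation so non-UTF-8 svn:log values no longer abort the load."""
    cmd = ["svnadmin", "load"]
    if bypass_prop_validation:
        cmd.append("--bypass-prop-validation")
    cmd.append(repo_path)
    return cmd

def load_dump(repo_path, dump_path):
    """Feed the dump file on stdin, as `svnadmin load repo < dump` would."""
    with open(dump_path, "rb") as dump:
        subprocess.run(build_svnadmin_load_cmd(repo_path), stdin=dump, check=True)
```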
Apr 21 2020
The svn loader now uses HEAD as the branch name (instead of master as in the early days).
Jan 22 2020
Related to T611
Aug 20 2019
Jul 3 2019
May 25 2019
Closing, we do have an SVN loader now: it still has some issues, but the bulk of the job is done.
@anlambert what's the status of ingesting very large SVN repos, now that we have put the loader in production?
Oct 15 2018
Oct 4 2018
Oct 2 2018
Indeed it does not; I need to think more about this...
Oct 1 2018
Thanks for the clarification, I needed it.
In T611#22696, @ardumont wrote: @zack Can you enlighten me as to why we want to store that information at the directory level (and not say at the revision one)?
According to the official documentation (noted as not a smart idea to reference), there has been a breaking format migration from svn 1.5 onwards.
Sep 30 2018
And also make sure the one visit date is the right one:
Sep 28 2018
as in T946#22626:
- 23 origins in error scheduled back [1]
- workers' dashboard log
softwareheritage=> \copy (select o.url, fh.status, fh.stderr from origin o inner join origin_visit ov on o.id=ov.origin inner join fetch_history fh on fh.origin=o.id where o.type='svn' and not fh.status and ov.visit = (select max(visit) from origin_visit where origin=o.id) and stderr like '%Inconsistency. CRLF detected in a converted file%') to '/home/ardumont/data-transit' ;
- new loader svn packaged and deployed
- origins in error scheduled back
- workers logs (kibana dashboard) [1]
All good! Time to close T946.
- loader.svn.tests: Add a scenario around user-defined svn properties
Sep 27 2018
Ok, so going for that fix.
\m/
So here are the results:
I was trying to play with your Python scripts to query the kibana logs, but it's been a while since I last wrote a query for Elasticsearch, and their JSON format is still awful.
Today is not a good day for me ;)
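For my future self, the kind of Elasticsearch query body involved is roughly the following (the index layout and the `message` field name are assumptions, not the actual kibana schema):

```python
import json

def build_error_query(message_fragment, size=50):
    """Build an Elasticsearch query body matching log entries whose
    message contains the given phrase (field name is an assumption)."""
    return {
        "size": size,
        "query": {
            "match_phrase": {
                "message": message_fragment,
            }
        },
    }

# The JSON body that would be POSTed to the _search endpoint:
body = build_error_query("CRLF detected in a converted file")
print(json.dumps(body, indent=2))
```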
Great to have kibana back!
Another try with another svn repo gives me the following output:
I had to reconfigure the new kibana0 (it was banco before) to start parsing those logs again.
This issue is reproduced on 6 repositories (so far, still running locally on some other big repositories).
To my old self, what are those other 5 repositories?
Sep 26 2018
To get some idea of what we can find, below are some examples of svn:externals property values from googlecode svn projects.
https://wow-xlog.googlecode.com/svn/ LibXEvent-1.0 https://wow-xlog.googlecode.com/svn/branches/LibXEvent-1.0/
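svn:externals values come in several historical formats; as a starting point, here is a minimal sketch that only handles the old `<directory> <URL>` form seen in the examples above (the newer URL-first forms and `-rN` revision pegs are deliberately skipped):

```python
def parse_externals(prop_value):
    """Parse an svn:externals property value of the old `<dir> <URL>`
    form into (directory, url) pairs; any other format is skipped."""
    result = []
    for line in prop_value.splitlines():
        parts = line.split()
        # Old format: exactly two tokens, the second being a URL.
        if len(parts) == 2 and parts[1].startswith(("http://", "https://", "svn://")):
            result.append((parts[0], parts[1]))
    return result
```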
Sep 25 2018
Digging deeper to try and improve the result to return (at the moment, an empty string).
I tried to use chardet to detect the encoding in order to decode the bytes.
This fails, as nothing appropriate is found.
By analyzing the repository dump file in emacs
Thinking a bit more about the issue, there might be a way to work around it in our client code instead of hacking in subvertpy.
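The client-side workaround I have in mind is roughly the following: accept the raw bytes and fall back to a lossy decode when they are not valid UTF-8. This is a sketch of the idea, not what subvertpy actually exposes:

```python
def decode_svn_prop(raw):
    """Decode a raw svn property value; fall back to a lossy decode
    when the bytes are not valid UTF-8."""
    if raw is None:
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # chardet found nothing appropriate for these values, so keep
        # the data with replacement characters rather than dropping it.
        return raw.decode("utf-8", errors="replace")
```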
Sep 24 2018
Tracking down the issue in subvertpy source code, the error occurs in the subvertpy._ra C extension module.
More precisely, an exception is raised at line 1068 in file subvertpy/editor.c when Python tries to decode a svn property value from 'utf-8' encoding.
1062 static svn_error_t *py_cb_editor_change_prop(void *dir_baton, const char *name, const svn_string_t *value, apr_pool_t *pool)
1063 {
1064     PyObject *self = (PyObject *)dir_baton, *ret;
1065     PyGILState_STATE state = PyGILState_Ensure();
1066
1067     if (value != NULL) {
1068         ret = PyObject_CallMethod(self, "change_prop", "sz#", name, value->data, value->len);
1069     } else {
1070         ret = PyObject_CallMethod(self, "change_prop", "sO", name, Py_None);
1071     }
1072     CB_CHECK_PYRETVAL(ret);
1073     Py_DECREF(ret);
1074     PyGILState_Release(state);
1075     return NULL;
1076 }
(If you are not interested in the details, no need to look further; it's just a raw extract ;)
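The failure can be reproduced in pure Python: the `s`/`z#` format units make `PyObject_CallMethod` build a str from the raw property bytes by decoding them as UTF-8, which is roughly equivalent to:

```python
# Latin-1 encoded text, as found in old svn:log properties, is not
# valid UTF-8; the decode step in subvertpy's C callback fails the
# same way this does.
latin1_log = "propriété".encode("latin-1")

try:
    latin1_log.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
```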
This one is still true. Still banging my head on this.
Based on my last tests, I was too confident that svnadmin would be able to load a dump containing an arbitrary revision range
(whether generated by svnrdump or rsvndump). So let's put that incremental dump idea on hold for the moment, as it needs
more investigation.
Sep 22 2018
FWIW, this is my main worry about this approach.
Hmm, it seems there are some subtle corner cases where incremental loading will fail ...
...
To be sure, I quickly patched the rsvndump source code and the issue went away, so my assumption seems right.
....
Sep 21 2018
Hmm, it seems there are some subtle corner cases where incremental loading will fail ...
This is what I got for instance, when playing with the Apache Subversion repository by
loading it incrementally (killing rsvndump randomly in order to load what we dumped so far).
Fix blank spaces in readme and rebase
Let's forget my comments about the svn_url parameter drop
Right.
Let's forget my comments about the svn_url parameter drop and let's land it!
Note: I'm willing to rebase all this on @anlambert's current work to improve the loading speed (D433)
Really nice rework!
Amend and improve the svnrepo initialization
Awesome!
Have you checked that the last part results in the same snapshot as the actual svn loader?
That is, do the full loading with the actual svn loader (up to the 20 revisions), take the snapshot, and compare it with the quoted one.
Really nice rework! Implementing fetch_data / store_data really helps to better understand the loader processing.
I added a couple of comments but it's all good for me.
Thinking more about this, I see something missing in the description.
Sep 20 2018
\m/
Thanks for the thorough description!
It's awesome.
So I took some time to dig a little further into that idea of creating a dump file using the
svnrdump command from the official tools that ship with subversion.
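The incremental idea would drive svnrdump over revision ranges, roughly as sketched below (the URL and revision bounds are hypothetical; as noted in the later comments, loading such partial dumps still needs more investigation):

```python
def build_svnrdump_cmd(repo_url, start_rev, end_rev):
    """Build an `svnrdump dump` command covering only [start_rev:end_rev],
    with --incremental so each dump applies on top of the previous one."""
    return [
        "svnrdump", "dump",
        "--revision", "%d:%d" % (start_rev, end_rev),
        "--incremental",
        repo_url,
    ]
```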
- docs: Remove old notes
- docs: Remove old comparison file