Page MenuHomeSoftware Heritage

Add a human-readable 'whatsnew' item to the status dict.
AbandonedPublic

Authored by vlorentz on Nov 20 2018, 3:58 PM.

Details

Reviewers
douardda
ardumont
Group Reviewers
Reviewers
Summary

That's useful for tests, and there's no harm in keeping it.

Diff Detail

Repository
rDLDG Git loader
Branch
eventful-whatsnew
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 2518
Build 3119: tox-on-jenkinsJenkins
Build 3118: arc lint + arc unit

Event Timeline

Have you tried runnning this change (with a loader with remote storage setup).

For example:

python3 -m swh.loader.git.updater --origin-url /path/to/one-repo/you/can/control
cd /path/to/one-repo/you/can/control
# do something
# commit
git commit -am 'Dummy action to commit stuff'
python3 -m swh.loader.git.updater --origin-url /path/to/one-repo/you/can/control

I ask because that status result will get serialized and i'm not sure of the result on whatsnew here (id are bytes).

repr() always returns a string, so it's fine

douardda added a subscriber: douardda.

Also, you justify this change with "That's useful for tests, and there's no harm in keeping it."

First passertion might be acceptable, but it would then make sense to generalize what you do in test_load_unchanged (ie. add the returned status as third argument of the assertion).

Second argument, well I guess you know now ;-)

swh/loader/git/updater.py
486–497

Generally speaking, it's best to have only one exit point in a function/method, as far as possible. In this case, I see no reason not to do so:

 status = {'status': 'uneventful'}
 if self.base_snapshot:
     if self.snapshot['id'] != self.base_snapshot['id']:
         status['status'] = 'eventful'
         status['whatsnew'] = 'snapshot: {} -> {}'.format(
                     self.base_snapshot['id'], self.snapshot['id'])}
 elif self.snapshot['branches']:
         status['status'] = 'eventful'
         status['whatsnew'] = 'new branches: {}'.format(
                     ', '.join(map(repr, self.snapshot['branches'])))}
return status
This revision now requires changes to proceed.Nov 22 2018, 9:38 AM

repr() always returns a string, so it's fine

I was not even on branches, i was speaking about the base_snapshot['id'] which must be bytes, most probably.

Also, my proposed manual test is not the one which will cause issue.
It's more running with a celery instance.

(and i forgot to hit enter yesterday... nice)

repr() always returns a string, so it's fine

I was not even on branches, i was speaking about the base_snapshot['id'] which must be bytes, most probably.

Also, my proposed manual test is not the one which will cause issue.
It's more running with a celery instance.

(and i forgot to hit enter yesterday... nice)

I remember why, i wanted to test something prior to hit enter ;)

In the end, that will work but the log won't be that human readable:

>>> raw_id
b'4\x972t\xcc\xefj\xb4\xdf\xaa\xf8e\x99y/\xa9\xc3\xfeF\x89'
>>> 'hello {}'. format(raw_id)
"hello b'4\\x972t\\xcc\\xefj\\xb4\\xdf\\xaa\\xf8e\\x99y/\\xa9\\xc3\\xfeF\\x89'"

I think you need to hashutil.hash_to_hex the id to keep it human readable.
AFAIR, that's what is/was done with the loaders.

Second argument, well I guess you know now ;-)

?

  • Show the load status on assertion failures.
  • Simplify the logic in load_status.
  • hash_to_hex on snapshot ids.

Second argument, well I guess you know now ;-)

?

This was about

there's no harm in keeping it

which smells like a small yagni. Either it's useful or not. If not then there is harm keeping it: every written line of code needs maintenance, testing (which lines are not, in fact) etc. so there is always a cost (not counting the time taken to write the code at first place).

Once again, I'm against this diff, I just want to point how easy it is add code, and how difficult it is, sometimes, not to write code. One must always ask him/herself whether the written line of code needs to exists, needs to be taken care of for the next 5-15 years.

This revision is now accepted and ready to land.Nov 27 2018, 11:59 AM

which smells like a small yagni. Either it's useful or not. If not then there is harm keeping it: every written line of code needs maintenance, testing (which lines are not, in fact) etc. so there is always a cost (not counting the time taken to write the code at first place).

Once again, I'm against this diff, I just want to point how easy it is add code, and how difficult it is, sometimes, not to write code. One must always ask him/herself whether the written line of code needs to exists, needs to be taken care of for the next 5-15 years.

If you think that's a bad idea, I can abandon this diff; that's also why we have code reviews.