swh/storage/cassandra/storage.py
):
else: | else: | ||||
origins = [orig for orig in origins if url_pattern in orig.url] | origins = [orig for orig in origins if url_pattern in orig.url] | ||||
if with_visit: | if with_visit: | ||||
origins = [orig for orig in origins if orig.next_visit_id > 1] | origins = [orig for orig in origins if orig.next_visit_id > 1] | ||||
return [{"url": orig.url,} for orig in origins[offset : offset + limit]] | return [{"url": orig.url,} for orig in origins[offset : offset + limit]] | ||||
def origin_add(self, origins: Iterable[Origin]) -> List[Dict]: | def origin_add(self, origins: Iterable[Origin]) -> Dict[str, int]: | ||||
vlorentz: It is, but it doesn't matter. If an origin gets inserted twice at the same time, both entries… | |||||
douardda: ok thx
douardda: Now I remember: what may be inconsistent is the number of origins added that is returned in the status dict {"origin:add": n}, but I guess it's not really a problem.
vlorentz: yup. All stats are approximate anyway.
results = [] | # XXX this looks wrong to me; I'm afraid there is a race condition | ||||
for origin in origins: | known_origins = self.origin_get(origins) | ||||
to_add = [origin for origin in origins if origin not in known_origins] | |||||
self.journal_writer.origin_add(to_add) | |||||
for origin in to_add: | |||||
vlorentz: Remove the brackets around the call to `origin_get`
douardda: thx again
ardumont: To repeat what I said on IRC, I think that filtering is probably the root cause of the current test failures...
self.origin_add_one(origin) | self.origin_add_one(origin) | ||||
results.append(origin.to_dict()) | return {"origin:add": len(to_add)} | ||||
return results | |||||
def origin_add_one(self, origin: Origin) -> str: | def origin_add_one(self, origin: Origin) -> str: | ||||
known_origin = self.origin_get_one(origin.to_dict()) | known_origin = self.origin_get_one(origin.to_dict()) | ||||
if known_origin: | if known_origin: | ||||
origin_url = known_origin["url"] | origin_url = known_origin["url"] | ||||
else: | else: | ||||
self.journal_writer.origin_add([origin]) | self.journal_writer.origin_add([origin]) | ||||
It is, but it doesn't matter. If an origin gets inserted twice at the same time, both entries have the same primary key, so only the latter is kept.
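
To make the point above concrete, here is a minimal sketch of the race, assuming a table keyed on the origin URL where a second write to the same key simply overwrites the first. `ToyOriginTable` and this simplified `origin_add` are hypothetical stand-ins written for illustration; they are not the swh.storage or Cassandra driver API. Both callers take their "known origins" snapshot before either one inserts, which is the interleaving the XXX comment worries about.

```python
from typing import Dict, Iterable, List, Set


class ToyOriginTable:
    """Hypothetical stand-in for a table whose primary key is the origin URL."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}

    def insert(self, origin: dict) -> None:
        # Writing an existing primary key overwrites the row (last write wins).
        self.rows[origin["url"]] = origin


def origin_add(
    table: ToyOriginTable, known_urls: Set[str], origins: Iterable[dict]
) -> Dict[str, int]:
    """Simplified version of the filter-then-insert logic under review."""
    # known_urls is the snapshot taken before inserting; it may be stale.
    to_add: List[dict] = [o for o in origins if o["url"] not in known_urls]
    for origin in to_add:
        table.insert(origin)
    return {"origin:add": len(to_add)}


table = ToyOriginTable()
origin = {"url": "https://example.org/repo.git"}

# Simulated race: both callers snapshot the (empty) table first.
snapshot_a = set(table.rows)
snapshot_b = set(table.rows)

stats_a = origin_add(table, snapshot_a, [origin])
stats_b = origin_add(table, snapshot_b, [origin])

total_reported = stats_a["origin:add"] + stats_b["origin:add"]
print(len(table.rows), total_reported)
```

Under these assumptions the table ends up with a single row for the origin, while the two status dicts together report two additions. That matches the discussion: the stored data stays consistent, and only the `origin:add` counter is approximate.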