Changeset View
Standalone View
swh/indexer/sql/40-swh-func.sql
Show First 20 Lines • Show All 340 Lines • ▼ Show 20 Lines | |||||
as $$ | as $$ | ||||
create temporary table tmp_revision_metadata ( | create temporary table tmp_revision_metadata ( | ||||
like revision_metadata including defaults | like revision_metadata including defaults | ||||
) on commit drop; | ) on commit drop; | ||||
$$; | $$; | ||||
comment on function swh_mktemp_revision_metadata() is 'Helper table to add revision metadata'; | comment on function swh_mktemp_revision_metadata() is 'Helper table to add revision metadata'; | ||||
-- create a temporary table for retrieving origin_intrinsic_metadata | |||||
zack: Do we really need this? The temporary table + COPY trick is something we use in heavy-duty… | |||||
Not Done Inline ActionsYes, it's consistent with how we add stuff in the indexer storage. ardumont: Yes, it's consistent with how we add stuff in the indexer storage.
We can diverge from this… | |||||
Done Inline ActionsI copied what was done for revision metadata. I discussed that issue with @ardumont and he agrees this should be removed. Once this Diff is accepted, I'll create a new one to remove temporary tables. vlorentz: I copied what was done for revision metadata. I discussed that issue with @ardumont and he… | |||||
Not Done Inline ActionsWell, i missed to check back the old context prior to saying anything. If we agree it's enough to use simpler patterns here (at least for revision), we can indeed change that. ardumont: Well, i missed to check back the old context prior to saying anything.
We no longer use… | |||||
create or replace function swh_mktemp_origin_intrinsic_metadata() | |||||
returns void | |||||
language sql | |||||
as $$ | |||||
create temporary table tmp_origin_intrinsic_metadata ( | |||||
like origin_intrinsic_metadata including defaults | |||||
) on commit drop; | |||||
$$; | |||||
comment on function swh_mktemp_origin_intrinsic_metadata() is 'Helper table to add origin intrinsic metadata'; | |||||
create or replace function swh_mktemp_indexer_configuration() | create or replace function swh_mktemp_indexer_configuration() | ||||
returns void | returns void | ||||
language sql | language sql | ||||
as $$ | as $$ | ||||
create temporary table tmp_indexer_configuration ( | create temporary table tmp_indexer_configuration ( | ||||
like indexer_configuration including defaults | like indexer_configuration including defaults | ||||
) on commit drop; | ) on commit drop; | ||||
alter table tmp_indexer_configuration drop column id; | alter table tmp_indexer_configuration drop column id; | ||||
Show All 17 Lines | begin | ||||
return query | return query | ||||
select id, tool_name, tool_version, tool_configuration | select id, tool_name, tool_version, tool_configuration | ||||
from tmp_indexer_configuration join indexer_configuration | from tmp_indexer_configuration join indexer_configuration | ||||
using(tool_name, tool_version, tool_configuration); | using(tool_name, tool_version, tool_configuration); | ||||
return; | return; | ||||
end | end | ||||
$$; | $$; | ||||
-- add tmp_origin_intrinsic_metadata entries to origin_intrinsic_metadata, | |||||
-- overwriting duplicates if conflict_update is true, skipping duplicates | |||||
-- otherwise. | |||||
-- | |||||
-- If filtering duplicates is in order, the call to | |||||
-- swh_origin_intrinsic_metadata_missing must take place before calling this | |||||
-- function. | |||||
-- | |||||
-- operates in bulk: 0. swh_mktemp(content_language), 1. COPY to | |||||
-- tmp_origin_intrinsic_metadata, 2. call this function | |||||
create or replace function swh_origin_intrinsic_metadata_add( | |||||
conflict_update boolean) | |||||
returns void | |||||
language plpgsql | |||||
as $$ | |||||
begin | |||||
if conflict_update then | |||||
insert into origin_intrinsic_metadata (origin_id, metadata, indexer_configuration_id, from_revision) | |||||
select origin_id, metadata, indexer_configuration_id, from_revision | |||||
from tmp_origin_intrinsic_metadata | |||||
on conflict(origin_id, indexer_configuration_id) | |||||
do update set metadata = excluded.metadata; | |||||
else | |||||
insert into origin_intrinsic_metadata (origin_id, metadata, indexer_configuration_id, from_revision) | |||||
select origin_id, metadata, indexer_configuration_id, from_revision | |||||
from tmp_origin_intrinsic_metadata | |||||
on conflict(origin_id, indexer_configuration_id) | |||||
Not Done Inline ActionsQuestion about update politics: moranegg: Question about update politics:
when `conflict update` is true, you update origin no matter… | |||||
Done Inline Actions
Yes, that's because the indexer_configuration_id is part of the PK. vlorentz: > but if the tool is different it creates a new entry?
Yes, that's because the… | |||||
Not Done Inline ActionsClarifying the initial attempt: And a conflict is raised only when an entry already exists for the same object (here origin) and the same tool (up to its version). Now, answering you ;)
only if a conflict exist, yes.
yes
yes ardumont: Clarifying the initial attempt:
`conflict_update` to true means update on conflict (in effect… | |||||
do nothing; | |||||
end if; | |||||
return; | |||||
end | |||||
$$; | |||||
comment on function swh_origin_intrinsic_metadata_add(boolean) IS 'Add new origin intrinsic metadata'; |
Do we really need this? The temporary table + COPY trick is something we use in heavy-duty tables, and in particular Merkle DAG nodes, because we add massive amounts of data there, which are potentially duplicated a lot across parallel workers. Metadata are much less heavy-duty than that, and I suppose that simple inserts (with proper ON CONFLICT handling) would be enough. Or am I missing something?
OTOH if it makes the coder more consistent with everything else, I'm not strongly opposed to having this. Just keep in mind that all these temporary tables will make it harder to migrate to a non-SQL implementation of the data model the day we come to that.