For the discussion tomorrow about the entity table vs origin_metadata table.
I also added a draft for the external_metadata table to have it in mind as well.
New day.. new question:
Should we keep more than one translation of origin_metadata?
If so, we should break the table above into two tables:
- origin_metadata
- origin_metadata_translation
Here is the discussion on devel:
15:49:27 morane_  | keeping raw_metadata found with lister/loader/deposit/external_catalog
15:49:57 morane_  | and translating the raw_metadata into the CodeMeta vocabulary
15:50:37 morane_  | with a tool that we keep in indexer_configuration and reference by indexer_configuration_id
15:51:12 morane_  | should this id be part of the PK of the table?
15:51:23 olasd    | no
15:52:09 morane_  | I thought so too, but i'm keeping it as PK in content_metadata and revision_metadata
15:52:25 morane_  | so i'm searching for the why not
15:53:22 olasd    | considering your schema, what would be the point?
15:53:51 olasd    | as far as I can tell you can have however many origin_metadata entries you want per origin
15:54:12 morane_  | yes right
15:54:18 olasd    | which is not the case for content_metadata, which is keyed using the content_id
15:54:22 olasd    | or should be in any case
15:54:50 morane_  | so the PK is the object's id (object= origin_metadata)
15:55:16 morane_  | but the raw_metadata it contains can be translated by different tools
15:55:35 morane_  | where all the rest of the information stays the same
15:56:43 morane_  | (easier to keep only object id as PK anyway, cause we can translate directly when captured or with a delay)
15:58:14 olasd    | if the raw_metadata is the same and gets translated several times, then (considering we want to keep the data normalized) you should make an ancillary table for the translated metadata entries
15:58:49 olasd    | and _that_ ancillary table can be keyed with the pair (origin_metadata_id, indexer_configuration_id)
15:58:55 olasd    | does that make sense?
15:59:47 morane_  | it does, i thought of that but it seemed like adding another metadata table to the mix
16:00:11 olasd    | well
16:00:33 olasd    | do we really want to keep several different translations for the same raw metadata
16:00:40 olasd    | that's the main question, I believe
16:04:58 morane_  | i think we don't, because the most recent translation should be the most accurate translation
16:05:11 morane_  | so we shouldn't break into 2 tables
16:05:52 morane_  | and just update the tool used when updating translation (but this is against keeping everything at any time)
16:06:35 morane_  | on the other hand we can reproduce the same result with an older version for example... ahhhh i don't know
16:09:00 ardumont | > because the most recent translation should be the most accurate: why? how do you determine what's the most accurate?
16:11:00 morane_  | haha you are right
16:12:23 morane_  | i imagine that with a new tool or newer version we improve the translation
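To make the two-table option from the discussion concrete, here is a minimal sketch using Python's sqlite3 module with an in-memory database. Only the table names and the key pair (origin_metadata_id, indexer_configuration_id) come from the discussion; every other column name, the helper functions build_demo and translations_for, and the sample values are assumptions made for illustration, not the actual schema.

```python
import sqlite3

# Hypothetical, simplified schema. Only the table names and the composite
# key (origin_metadata_id, indexer_configuration_id) come from the
# discussion; all other columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE indexer_configuration (
    id           INTEGER PRIMARY KEY,
    tool_name    TEXT NOT NULL,
    tool_version TEXT NOT NULL
);
CREATE TABLE origin_metadata (
    id           INTEGER PRIMARY KEY,  -- the object's own id is the PK
    origin_id    INTEGER NOT NULL,     -- many entries allowed per origin
    raw_metadata TEXT NOT NULL
);
CREATE TABLE origin_metadata_translation (
    origin_metadata_id       INTEGER NOT NULL
        REFERENCES origin_metadata(id),
    indexer_configuration_id INTEGER NOT NULL
        REFERENCES indexer_configuration(id),
    translated_metadata      TEXT NOT NULL,
    -- one translation per (raw metadata entry, tool) pair
    PRIMARY KEY (origin_metadata_id, indexer_configuration_id)
);
"""


def build_demo():
    """Create an in-memory database with one raw entry translated twice."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO indexer_configuration VALUES (?, ?, ?)",
        [(1, "translator", "0.1"), (2, "translator", "0.2")],
    )
    conn.execute(
        "INSERT INTO origin_metadata VALUES (?, ?, ?)",
        (1, 42, '{"name": "demo"}'),
    )
    conn.executemany(
        "INSERT INTO origin_metadata_translation VALUES (?, ?, ?)",
        [(1, 1, '{"title": "demo"}'), (1, 2, '{"name": "demo"}')],
    )
    conn.commit()
    return conn


def translations_for(conn, origin_metadata_id):
    """Return (indexer_configuration_id, translated_metadata) rows."""
    cursor = conn.execute(
        "SELECT indexer_configuration_id, translated_metadata "
        "FROM origin_metadata_translation "
        "WHERE origin_metadata_id = ? "
        "ORDER BY indexer_configuration_id",
        (origin_metadata_id,),
    )
    return cursor.fetchall()
```

With this layout the raw metadata is stored once, and each tool version adds its own row in origin_metadata_translation. Translating the same entry twice with the same tool violates the composite primary key. If we decide to keep only the latest translation, the ancillary table is unnecessary and indexer_configuration_id could be a plain, non-key column on origin_metadata.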