[Tf-curation] TF Curation - meeting in Utrecht -

Durco, Matej Matej.Durco at oeaw.ac.at
Tue May 3 07:08:00 CEST 2016


Dear Florian,

Thank you from my side as well for the extensive feedback.

Davor already answered some of the issues.

I would like to comment on the major issue of the hierarchical setup.
This is indeed a very good point, and there has been a great deal of discussion about it over the last years.
This was (or still is) an issue in the VLO itself as well: although the hierarchy is now represented in the detail view, in the flat structure of the faceted browser "all records sit next to each other".

As Davor said, this is not something that can be solved easily or quickly,
but we (Davor) will raise the issue in Utrecht, to at least discuss possible ways of solving it.

Best,
Matej

From: Ostojic, Davor
Sent: Friday, April 29, 2016 12:21 PM
To: Florian Schiel <schiel at phonetik.uni-muenchen.de>; Durco, Matej <Matej.Durco at oeaw.ac.at>
Cc: Draxler Christoph <draxler at phonetik.uni-muenchen.de>; Nina Pörner <nina.poerner at gmail.com>
Subject: RE: [Tf-curation] TF Curation - meeting in Utrecht

Dear Florian,

Thank you for your feedback. We are trying to improve the application and we really appreciate it.

Regarding the minor points in your email:

>>The module report a missing facet 'rightsHolder' on instances of profile 'media-corpus-profile', but the profile (and the instances I tested) contains the element <Owner> which is linked to concept
>> http://hdl.handle.net/11459/CCR_C-2956_519a4aab-2f76-0fd3-090e-f0d6b81a7dbb
Facet "rightsHolder" is currently mapped only to the following concepts:

· http://www.isocat.org/datcat/DC-6709
· http://hdl.handle.net/11459/CCR_C-6709_cb3572ed-ffd3-04f1-c145-b9c1f26bfc82
· http://purl.org/dc/terms/rightsHolder

The mapping file used in production can be found at [1]. This file also contains descriptions of the facets.
Perhaps we have to update this file with the concept "legalOwner".
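
To illustrate why the facet is reported as missing, here is a minimal Python sketch of the lookup. It is not the curation module's code: the dictionary only restates the three concepts listed above, and the names FACET_CONCEPTS, facets_for_concept and with_extra_concept are made up for this example.

```python
# Concepts currently mapped to the facet, as listed in the email above.
FACET_CONCEPTS = {
    "rightsHolder": {
        "http://www.isocat.org/datcat/DC-6709",
        "http://hdl.handle.net/11459/CCR_C-6709_cb3572ed-ffd3-04f1-c145-b9c1f26bfc82",
        "http://purl.org/dc/terms/rightsHolder",
    },
}

# Concept linked to the <Owner> element of 'media-corpus-profile'.
OWNER_CONCEPT = (
    "http://hdl.handle.net/11459/CCR_C-2956_519a4aab-2f76-0fd3-090e-f0d6b81a7dbb"
)


def facets_for_concept(concept_url, mapping=FACET_CONCEPTS):
    """Return the sorted names of all facets that the concept is mapped to."""
    return sorted(name for name, concepts in mapping.items() if concept_url in concepts)


def with_extra_concept(mapping, facet, concept_url):
    """Return a copy of the mapping with one more concept added to a facet."""
    updated = {name: set(concepts) for name, concepts in mapping.items()}
    updated.setdefault(facet, set()).add(concept_url)
    return updated


# <Owner> is not covered by the current mapping, hence the "missing facet" report.
print(facets_for_concept(OWNER_CONCEPT))

# After a (hypothetical) update of the mapping, the facet would be found.
updated_mapping = with_extra_concept(FACET_CONCEPTS, "rightsHolder", OWNER_CONCEPT)
print(facets_for_concept(OWNER_CONCEPT, updated_mapping))
```

The copy in with_extra_concept keeps the original mapping untouched, so both states can be compared side by side.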


>>In some CMDI instances the module reports broken links in the Resource section, for instance 37 in https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ALC/ALC.2.php?format=cmdi
>>I checked all the URLs listed in this CMDI instance and they all are valid.

I am aware of this problem. It happens because many links are checked concurrently: the response timeout is set to 5 s, but sometimes servers need more time.
I will try to find a better way to do link validation.
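
One possible direction, sketched in Python under stated assumptions: retry a link that timed out with a longer timeout, and report links that never answer as "timeout" rather than "broken". The function names and the timeout steps (5, 15 and 30 seconds) are hypothetical and not taken from the curation module.

```python
import socket
import urllib.error
import urllib.request


def http_probe(url, timeout):
    """Send a HEAD request; return True for a status below 400.

    Raises TimeoutError when the server does not answer in time.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except urllib.error.HTTPError:
        return False  # the server answered, but with an error status
    except socket.timeout:
        raise TimeoutError(url)
    except urllib.error.URLError as error:
        if isinstance(error.reason, socket.timeout):
            raise TimeoutError(url)
        return False  # DNS failure, refused connection, and similar


def classify_link(url, probe=http_probe, timeouts=(5, 15, 30)):
    """Return "ok", "broken" or "timeout" for a single URL.

    A timed-out attempt is repeated with the next, longer timeout, so a slow
    server is not reported as a broken link after the first 5 seconds.
    """
    for timeout in timeouts:
        try:
            return "ok" if probe(url, timeout) else "broken"
        except TimeoutError:
            continue
    return "timeout"
```

The probe is passed in as a parameter so that the retry logic can be tried out without network access, for example with a stand-in function that only answers when given at least 15 seconds.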

Regarding the major issues:

>>1. Since we encourage users to build redundant-free hierarchical MD structures in the CMD framework, would it be possible that the curator module follows hierarchies (if they are there) all the way to the top and add the encountered MD to the MD content of the CMDI?

I was not familiar with this case when I started development, but it sounds like a crucial thing. I will create a new issue for the first (bottom-up) case, but I cannot say when I will have time to implement it.


>>2. What about the top-down way? When analysing a level 1 CMDI, this will contain no MD facets that are specific for the single resources, such as Date. Consequently the curator module should identify ResourceLinks of type 'Metadata', and add all the MD of the linked level 2 CMDIs.
I am afraid this case won't be (and can't be) supported. The web application is meant for curating a single instance or profile. The library in the backend can validate collections or work in batch mode.

>>One possible solution would be to recognize that a CMDI instance has ResourceLinks of type 'Metadata', and if so, treat it differently than CMDI instances that don't.
Could you explain this in a bit more detail, or do you have a concrete proposal?
In the future, if you experience any problems with the curation module, you can create an issue yourself at [2] or write to us directly.


Best,
Davor

[1] https://raw.githubusercontent.com/clarin-eric/VLO/master/vlo-commons/src/main/resources/facetConcepts.xml
[2] https://github.com/clarin-eric/clarin-curation-module/issues



From: Florian Schiel [mailto:schiel at phonetik.uni-muenchen.de]
Sent: Friday, April 29, 2016 9:09 AM
To: Durco, Matej <Matej.Durco at oeaw.ac.at>
Cc: Ostojic, Davor <Davor.Ostojic at oeaw.ac.at>; Draxler Christoph <draxler at phonetik.uni-muenchen.de>; Nina Pörner <nina.poerner at gmail.com>
Subject: Re: [Tf-curation] TF Curation - meeting in Utrecht


Dear Matej, dear Davor,

I tested the curation module interface with different instances and profiles. It works very well!

But when testing CMDI instances nested in a hierarchy I encountered the following (conceptual?) problem:
Each CMDI instance is tested by the module in isolation. Why is this a problem?

Consider for example a 2-level hierarchy of metadata: on the first level (corpus level) the metadata of a complete collection of resources is stored, as in [1]; on the second level (which is linked from the first level as resources of type 'Metadata') the metadata of a single resource is stored, as in [2]. To avoid massive replication, MD that concerns all members of the collection, for example availability, is stored only on the first level.
When analysing a single CMD instance of the second level, we can't find this information in the CMDI. But what we find is a pointer to the upper level, namely the IsPartOf entry in the CMDI header.
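
The bottom-up lookup described here could be sketched in Python as follows. This is only an illustration under stated assumptions: records are plain dictionaries rather than parsed CMDI files, the keys "is_part_of" and "md" as well as the function name effective_metadata are invented for the example, and the metadata values are placeholders, not the real content of [1] and [2].

```python
def effective_metadata(record_id, records):
    """Merge a record's metadata with the metadata of all its ancestors.

    `records` maps an id to {"is_part_of": parent id or None, "md": {...}}.
    Values stored on a lower level win over inherited ones.
    """
    merged = {}
    seen = set()
    current = record_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in IsPartOf chain at {current!r}")
        seen.add(current)
        record = records[current]
        for key, value in record["md"].items():
            merged.setdefault(key, value)  # keep the more specific value
        current = record["is_part_of"]
    return merged


# Placeholder data modelled on the two-level example (values are invented).
records = {
    "ALC": {"is_part_of": None, "md": {"availability": "placeholder-licence"}},
    "ses1009": {"is_part_of": "ALC", "md": {"date": "placeholder-date"}},
}

print(effective_metadata("ses1009", records))
```

Walking upwards costs one lookup per hierarchy level, so it stays cheap even for collections with very many members.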

So, I guess my questions are:

1. Since we encourage users to build redundancy-free hierarchical MD structures in the CMD framework, would it be possible for the curator module to follow hierarchies (if they are there) all the way to the top and add the encountered MD to the MD content of the CMDI?

2. What about the top-down way? When analysing a level 1 CMDI, this will contain no MD facets that are specific for the single resources, such as Date. Consequently the curator module should identify ResourceLinks of type 'Metadata', and add all the MD of the linked level 2 CMDIs.
This may in principle be the right thing to do, but in reality it is of course not feasible, since some collections contain millions of resources.
One possible solution would be to recognize that a CMDI instance has ResourceLinks of type 'Metadata', and if so, treat it differently than CMDI instances that don't.
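
Recognizing such instances could look roughly like the following Python sketch. It assumes that each link is a ResourceProxy element with ResourceType and ResourceRef children, matches elements by local name so that it does not depend on a particular CMDI namespace, and uses a deliberately minimal sample document. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def metadata_links(cmdi_xml):
    """Return the ResourceRef values of all proxies of type 'Metadata'."""
    links = []
    for element in ET.fromstring(cmdi_xml).iter():
        if local_name(element.tag) != "ResourceProxy":
            continue
        resource_type = resource_ref = None
        for child in element:
            name = local_name(child.tag)
            if name == "ResourceType":
                resource_type = (child.text or "").strip()
            elif name == "ResourceRef":
                resource_ref = (child.text or "").strip()
        if resource_type == "Metadata" and resource_ref:
            links.append(resource_ref)
    return links


def is_collection_level(cmdi_xml):
    """True if the instance links to further metadata records."""
    return bool(metadata_links(cmdi_xml))


# Minimal sample document, not a complete or valid CMDI record.
SAMPLE = """<CMD xmlns="http://www.clarin.eu/cmd/">
  <Resources>
    <ResourceProxyList>
      <ResourceProxy id="p1">
        <ResourceType>Metadata</ResourceType>
        <ResourceRef>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ALC/ses1009.2.cmdi.xml</ResourceRef>
      </ResourceProxy>
      <ResourceProxy id="p2">
        <ResourceType>Resource</ResourceType>
        <ResourceRef>https://example.org/audio.wav</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
</CMD>"""

print(is_collection_level(SAMPLE), len(metadata_links(SAMPLE)))
```

An instance flagged this way could then be scored with a different set of expected facets, without ever downloading the linked level 2 records.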

I hope I made myself clear. If not, don't hesitate to ask back.

--------------

Some minor things I noticed:

The module reports a missing facet 'rightsHolder' on instances of profile 'media-corpus-profile', but the profile (and the instances I tested) contains the element <Owner>, which is linked to concept http://hdl.handle.net/11459/CCR_C-2956_519a4aab-2f76-0fd3-090e-f0d6b81a7dbb

In some CMDI instances the module reports broken links in the Resource section, for instance 37 in
https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ALC/ALC.2.php?format=cmdi
I checked all the URLs listed in this CMDI instance and they all are valid.

Finally, it would be very useful for the maintainer to have a list of facet definitions. For instance, I couldn't figure out what exactly 'distributionType' is and to which concept this facet is linked.


Best regards,

Florian


[1] https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ALC/ALC.2.php?format=cmdi
[2] https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ALC/ses1009.2.cmdi.xml



On 28.04.2016 15:27, Durco, Matej wrote:
Dear Florian,

thanks! :)
that would be either me, or my colleague davor.ostojic at oeaw.ac.at

Best,
Matej

From: Florian Schiel [mailto:schiel at phonetik.uni-muenchen.de]
Sent: Thursday, April 28, 2016 2:29 PM
To: Durco, Matej <Matej.Durco at oeaw.ac.at>
Subject: Re: [Tf-curation] TF Curation - meeting in Utrecht

Dear Matej,

this is very useful. Thank you for that!
Since I cannot be at the meeting, to whom (or to which address) can I send (constructive) feedback about the curation module?

Best,

Florian
On 28.04.2016 10:57, Durco, Matej wrote:
Dear all,

During the CLARIN Centre meeting in Utrecht, there is space on the afternoon of 11 May for TF meetings.
We would like to take this opportunity to meet with you (the curation taskforce), to report about the recent developments and discuss possible further steps.
Could you please indicate who of you will be there and could join the meeting?
(It would be either 13:30-15:00 or 15:00/30 - 17:00, depending on coordination with other TFs, especially CMDI)

Meanwhile, you can already have a look at the main progress achieved.
A first version of the curation module has been delivered by my colleague Davor Ostojic, as a task within the CLARIN-PLUS project.
It is described in clarin-trac [1].
There is also a simple web application to try out:
https://clarin.oeaw.ac.at/curate/

Looking forward to any comments/feedback.

Thank you

Best,
Matej

[1] https://trac.clarin.eu/wiki/Curation%20Module

