[Tf-curation] TF Metadata Curation F2F meeting in Vienna - Material

Durco, Matej Matej.Durco at oeaw.ac.at
Mon Jan 29 14:47:12 CET 2018


Dear all,

the main topic of the meeting will be the value normalization of the facets in the VLO (most prominently the ResourceType (or resourceClass) facet).
We (the vlo-dev team) gave it some thought over the last few weeks and compiled some information that should serve as a basis for our discussions.
So if you still have some time, in preparation for the meeting you could have a look at the following documents:

Overview of the different mapping situations/variants:
https://github.com/clarin-eric/VLO-mapping/blob/value-mapping-documentation/doc/ValueMapping.adoc

Updated info on resourceType facet - especially a target controlled vocabulary to normalize against:
https://trac.clarin.eu/wiki/Taskforces/Curation/ValueNormalization/ResourceType

We will explain the details of the envisaged workflow tomorrow, but the basic idea is to work collaboratively on CSV files in a dedicated Git repository.
Two samples of such CSV files are available on github in the VLO-mapping repo:
https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/resourceclass.csv
lists all (~350) values currently encountered in the resourceClass facet in VLO and for some - the more obvious ones - it proposes a normalization value.
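
To make the idea concrete, here is a minimal sketch of how such a value map could be applied. The column names (`value`, `normalizedValue`) and all sample rows are assumptions for illustration only; the actual layout of resourceclass.csv may differ.

```python
import csv
import io

# Hypothetical excerpt; the real resourceclass.csv may use other columns
# and certainly contains other values.
SAMPLE_CSV = """value,normalizedValue
Text,text
text,text
TEXT,text
AudioRecording,audio
something-unmapped,
"""

def load_value_map(csv_text):
    """Return {raw value: normalized value}, skipping rows with no proposal."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["value"]: row["normalizedValue"]
            for row in reader if row["normalizedValue"]}

def normalize(raw, value_map):
    """Return the normalized value, or the raw value unchanged if unmapped."""
    return value_map.get(raw, raw)

value_map = load_value_map(SAMPLE_CSV)
```

Values without a proposed normalization pass through unchanged in this sketch, so an incomplete map never removes information.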

https://github.com/acdh-oeaw/VLO-mapping/blob/master/value-maps/profileName2resourceClass.csv
As we identified earlier, one big issue we have, next to the variability of the terms, is missing coverage.
Around 500,000(!) records have no resource type information at all.
I proposed earlier that the resource type can often be deduced from the profile used.
The above mapping file is an attempt in exactly that direction. Employing what we call "Cross facet mapping" (CFM), it would allow us to introduce a normalized value for resource type based on the profileName. Of course, CFM can be employed in general for any combination of facets (for example, we may want to deduce a value in the genre or subject facet from information in resourceClass), and it allows us to populate multiple "target facets" based on information from one facet.
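
As a rough illustration of CFM, the sketch below fills empty target facets from a value found in a source facet. The rule structure, the profile name and all facet values are hypothetical, and the choice not to overwrite existing values is an assumption, not a decision taken by the vlo-dev team.

```python
# Hypothetical rules: (source facet, source value) -> {target facet: value}.
# One rule may populate several target facets at once.
CFM_RULES = {
    ("profileName", "ExampleTextCorpusProfile"): {"resourceClass": "text"},
    ("resourceClass", "ExampleSongValue"): {"genre": "song", "subject": "music"},
}

def apply_cfm(record, rules):
    """Return a copy of record with empty target facets filled from rules."""
    result = {facet: list(values) for facet, values in record.items()}
    for (source_facet, source_value), targets in rules.items():
        if source_value in record.get(source_facet, []):
            for target_facet, target_value in targets.items():
                # Assumption: never overwrite values the provider supplied.
                if not result.get(target_facet):
                    result[target_facet] = [target_value]
    return result

record = {"profileName": ["ExampleTextCorpusProfile"], "resourceClass": []}
filled = apply_cfm(record, CFM_RULES)
```

Here a record that uses the hypothetical profile but has no resource class ends up with `resourceClass` set to `text`, while a record that already carries a value is left untouched.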

I plan to take these two files as the starting point for a hands-on session, where we could try to populate these mappings in small groups, both to test the workflow and to finally have the mappings needed to get to a nice, clean resourceType/Class facet.

If there are any questions, it is probably best to save them for in-person discussion over the upcoming three days, but if there is anything urgent you can contact me any time.  ;)

See you all tomorrow.

Best,
Matej


From: Durco, Matej
Sent: Thursday, 25 January 2018 11:32
To: Durco, Matej <Matej.Durco at oeaw.ac.at>; tf-curation at lists.clarin.eu; tf-cmdi at lists.clarin.eu; Trognitz, Martina <Martina.Trognitz at oeaw.ac.at>; Sugimoto, Go <Go.Sugimoto at oeaw.ac.at>; Fišer, Darja <Darja.Fiser at ff.uni-lj.si>; Lenardič, Jakob <Jakob.Lenardic at ff.uni-lj.si>; Susanne Haaf <haaf at bbaw.de>; Sauer, Wolfgang <Wolfgang.Sauer at oeaw.ac.at>; Resch, Stefan <Stefan.Resch at oeaw.ac.at>
Subject: RE: TF Metadata Curation F2F meeting in Vienna - Agenda

Dear all,

I realized that I had not shared the attendance spreadsheet with you, which made it hard for you to add yourselves. That is now corrected, so you should be able to edit it. Sorry for the inconvenience.

Best,
Matej

From: Durco, Matej
Sent: Thursday, 25 January 2018 07:57
To: 'Durco, Matej' <Matej.Durco at oeaw.ac.at>; tf-curation at lists.clarin.eu; tf-cmdi at lists.clarin.eu; Trognitz, Martina <Martina.Trognitz at oeaw.ac.at>; Sugimoto, Go <Go.Sugimoto at oeaw.ac.at>; Fišer, Darja <Darja.Fiser at ff.uni-lj.si>; 'Lenardič, Jakob' <Jakob.Lenardic at ff.uni-lj.si>; 'Susanne Haaf' <haaf at bbaw.de>; Sauer, Wolfgang <Wolfgang.Sauer at oeaw.ac.at>; Resch, Stefan <Stefan.Resch at oeaw.ac.at>
Subject: TF Metadata Curation F2F meeting in Vienna - Agenda

Dear all,

finally, here is some detailed information on our meeting on Metadata Curation next week in Vienna.

The meeting venue is the Museumszimmer on the 2nd floor of the main building of the Academy:
Dr. Ignaz Seipel-Platz 2<https://www.google.at/maps/place/Doktor-Ignaz-Seipel-Platz+2,+1010+Wien/@48.2088405,16.3749958,17z/data=!3m1!4b1!4m5!3m4!1s0x476d079fed6ec5d9:0x9c5bf347f74dcf65!8m2!3d48.2088405!4d16.3771845>
(Our offices are around the corner at Sonnenfelsgasse 19.)


Here is the tentative agenda:
https://trac.clarin.eu/wiki/Taskforces/Curation/Meetings/2018-01-30

And for ease of access also copied here:
https://docs.google.com/document/d/1CZw-RyDsRnaJFdEP_1E08YL6FtOmVf8BtE3yhP1gENQ/edit

There will be some more details and pointers to more information by Monday.

There is an attendance scheduling spreadsheet:
https://docs.google.com/spreadsheets/d/1nN_90b13xZC7o-2kTEYl8I1-iwBP105qfOSFgywKY2Y/edit#gid=0
in which I would kindly ask you to indicate when you are arriving and leaving and whether you are joining us for the dinner.

Please don't hesitate to contact me if there is anything else you need to know.

Best regards,
Matej