About GEMET - GEneral Multilingual Environmental Thesaurus

1. GEMET in 2021 – principles and procedures behind updates and further developments

EEA and Eionet - the institutional environmental network of almost 40 European countries - are committed to updating GEMET as a source of common and relevant terminology for the ever-growing environmental agenda. In further developing GEMET, we have established over the years a set of procedures and principles for GEMET updates:

Technical approach and methodology:

The latest version 4.2 is updated with 45 new terms in 25 languages largely based on EEA’s SOER 2020 and the latest work in the area of sustainable finance.

Changes made after the 2004 version of GEMET are documented in the application under history of changes. During the past 17 years, languages were added by mutual agreement with the participating countries. Some of these editions included translations of definitions into the newly added languages. New terms were added in 2012, 2017, 2018 and 2020/21. Since version 3.1 in 2017, each update can be downloaded by interested users for their own maintenance purposes, should they not use the GEMET API.

2. Earlier versions of GEMET

2004 version

This version was an extension of the 2001 version of GEMET. It includes the Czech, Estonian and Polish translations as national contributions, and the entire content is embedded in a modern Internet application. Definitions are available in English, and the Bulgarian, Russian and Slovenian translations are added. The application can be browsed through this GUI and is also available as a web service for those who want to link it to their own application. How to do this is explained under “GEMET web service”. The content of GEMET will need quality assurance of various terms over time. This is part of another initiative which is about to provide GEMET as open content in the WIKI set of Internet services (Wiktionary ...).

2001 version

The 2001 version provided Bulgarian, Russian and Slovenian as new languages. This was achieved through the kind co-operation of National Focal Points and other expert organisations in these countries. These translations are contributions of the countries to EEA's work programme and have been financed through national funds or with additional funds from outside the EEA scheme. A special remark has to be made regarding the Russian version, because Russia is not part of Eionet. The translation of GEMET terms into Russian has been funded by the United Nations Environment Programme (UNEP) and carried out in the International Centre for Scientific and Technical Information under a respective Memorandum of Understanding. The 2001 version of GEMET also incorporates changes provided for the Portuguese and Swedish languages. It also includes the Basque (Euskara) language in the ThesShow browser, which was not possible in the year 2000 version. The EEA highly appreciates all these contributions. The content of GEMET has not been changed, to assure consistency in use between the versions. There are plans to include more European languages in the years to come as well as to perform a thorough evaluation of the content.

First versions

After the distribution of different intermediate versions, GEMET versions 0.5, 1.0 and 1.5 underwent extensive work by CNR and UBA that led to version 2.0. More translations and some corrections formed version 2000. With the aim of completing the coverage of European languages in GEMET, Basque, Bulgarian, Russian and Slovenian could be added to this version. No changes have been made to the term count or the hierarchy. Please find a report on this work in Annex 4.

The remarks and suggestions provided by colleagues from Belgium, Sweden, Portugal, Norway, Austria and France, by the ETC for Nature Conservation, by US EPA and by UNEP GRID Geneva have been taken into account. Most of them have been applied to GEMET, to the extent they were not interfering with:

The resulting version presents:

The following table summarises the structural elements of GEMET (status version 1 of 1997).

| Structure elements          | No.   |
|-----------------------------|-------|
| Super Groups                | 3     |
| Groups                      | 30    |
| Accessory Groups            | 5     |
| Themes                      | 40    |
| Top Terms (TT)              | 109   |
| Narrower Terms (NT)         | 5.189 |
| Total descriptors (TT + NT) | 5.298 |
| Total non-descriptors       | 1.264 |
| Total records               | 6.562 |

Due to the lack of a complete list of equivalents for all the above-mentioned languages, the two alphabetical lists and the multilingual list will be presented in this printed version only with British English as the filing language. Separate lists for the other languages will be available from the EEA on request.

3. Initial development and structure of GEMET

This introduction refers to the 2001 version.

GEMET, the GEneral Multilingual Environmental Thesaurus, has been developed since 1995 as an indexing, retrieval and control tool for the European Topic Centre on Catalogue of Data Sources (ETC/CDS) and the European Environment Agency (EEA), Copenhagen. The work has been carried out through a contract between the EEA and the ETC/CDS, which was led by the Ministry of the Environment of Lower Saxony, included members from Germany, Austria, Italy and Sweden, and benefited from the collaboration of other member countries of the European Union (EU), as well as of UNEP Infoterra.

The basic idea for the development of GEMET was to use the best of the excellent multilingual thesauri available at the time, in order to save time, energy and funds. GEMET was conceived as a “general” thesaurus, aimed at defining a common general language, a core of general terminology for the environment. Specific thesauri and descriptor systems (e.g. on Nature Conservation, on Wastes, on Energy, etc.) have been excluded from the first step of development of the thesaurus and have been taken into account only for their structure and upper level terminology.

GEMET has been compiled by merging the terms of the following multilingual documents:

  1. A selection of the “Umwelt Thesaurus” of Umweltbundesamt (UBA), Berlin, 1995, with more than 2.000 descriptors out of 8.500 in German and English.
  2. The complete “Thesaurus Italiano per l'Ambiente (TIA)” quadrilingual version on CD-ROM of Consiglio Nazionale delle Ricerche (CNR), Rome, 1994, with more than 4.000 descriptors in Italian, English, Dutch and German and a selection of more than 2.000 descriptors of this thesaurus, compiled as a Classification Scheme for the MET of the EEA, 1995 (see the following No. 3).
  3. The complete “Multilingual Environment Thesaurus (MET)” of Nederlands Bureau voor Onderzoek Informatie (NBOI), Amsterdam, developed on the Dutch “Milieu-thesaurus” for the EEA in 1995, with more than 2.300 descriptors in Dutch, Danish, English, French, German, Italian, Norwegian and Spanish.
  4. The complete “EnVoc Thesaurus”, of UNEP Infoterra, 1997 edition, with about 2.000 descriptors in English, French and Spanish, with possibility of access to Arabic, Chinese and Russian.
  5. The complete “Thesaurus de Medio Ambiente” on CD-ROM of Ministerio de Obras Publicas, Transportes y Medio Ambiente (MOPTMA), Madrid, 1995, with more than 2.600 descriptors in Spanish, English, French, German.
  6. The complete “Lexique environnement - Planète”, of the Ministère de l'environnement, Paris, 1995, with more than 5.000 descriptors in French and English.
  7. Descriptors of relevant documents of the EEA, namely “Europe's Environment, The Dobris Assessment”, the “DPSIR Data Flow Scheme”, as well as terminology of ETCs and Eionet, in English.
  8. Descriptors of the “Thesaurus Eurovoc” of the European Parliament, Brussels, 1996, in French, English, Dutch, German, Italian, and Spanish, with possibility of access to Danish, Greek, and Portuguese.

The merging has been performed on both a conceptual and a formal basis. Coinciding concepts in the different thesauri have been identified and scored. As in other multilingual thesauri, e.g. Infoterra EnVoc, a neutral alphanumerical notation allows the identification of a concept independently of the user's language.

The links with the original thesauri are ensured by the respective identifiers or code notations.
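The idea of a language-neutral notation can be illustrated with a minimal sketch. The layout below is an assumption made for illustration only: the notation "c0001", the source code "X-123" and the field names are hypothetical and are not taken from GEMET.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept identified by a neutral notation, independent of language."""
    notation: str                                     # hypothetical neutral identifier
    labels: dict = field(default_factory=dict)        # language code -> term
    source_codes: dict = field(default_factory=dict)  # source thesaurus -> its own code


def label_for(concept, language, fallback="en"):
    """Return the term in the requested language, else in the fallback language."""
    return concept.labels.get(language, concept.labels.get(fallback))


# Illustrative data only; the notation and the source code are made up.
tree = Concept(
    notation="c0001",
    labels={"en": "tree", "de": "Baum", "it": "albero"},
    source_codes={"EnVoc": "X-123"},
)
```

Because every record is keyed by the notation rather than by a term, the same concept can be displayed in any available language, and the `source_codes` mapping keeps the link back to the thesaurus it came from.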

Following the identification of the coinciding concepts, a selection was made by the experts of the National Focal Points of the organisations involved.

The resulting 6.562 terms have been arranged in a classification scheme made of 3 super-groups, 30 groups plus 5 accessory, instrumental groups. Each descriptor has been arranged in a hierarchical structure headed by a Top Term. The level of poly-hierarchy, i.e. the allocation of a descriptor to more than one group, has been kept to a minimum. Further, to allow retrieval of terms that are thematically related but scattered across different groups, a set of 40 themes has been agreed upon with the EEA and each descriptor has been assigned to as many themes as necessary. Thus, the user can access the thesaurus through the group-hierarchical list, through the thematic list or through the alphabetical list. As a complement to the hierarchical “vertical” relations, an exhaustive series of strong “horizontal” relations between terms (RT, Related Terms) has been introduced. A progressive Line Number has been assigned to each descriptor of the systematic list, in order to help the user identify the descriptor in the different lists. The Line Number is merely a neutral identifier for the present version.
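The three access paths described above (by group, by theme, alphabetical) can be sketched as three indexes built from one set of records. This is only an illustration under stated assumptions: the record layout, the group names and the theme names below are placeholders, not GEMET's actual lists.

```python
from collections import defaultdict

# Hypothetical records: (line number, descriptor, group, themes).
RECORDS = [
    (1, "acid rain", "ATMOSPHERE", ["air", "pollution"]),
    (2, "forest", "BIOSPHERE", ["forestry", "natural areas"]),
    (3, "air pollution", "ATMOSPHERE", ["air", "pollution"]),
]


def build_indexes(records):
    """Build the group, theme and alphabetical views over the same descriptors."""
    by_group = defaultdict(list)
    by_theme = defaultdict(list)
    for _line, term, group, themes in records:
        by_group[group].append(term)
        for theme in themes:          # a descriptor may belong to several themes
            by_theme[theme].append(term)
    alphabetical = sorted(term for _line, term, _group, _themes in records)
    return dict(by_group), dict(by_theme), alphabetical


by_group, by_theme, alphabetical = build_indexes(RECORDS)
```

Each descriptor appears once in the group view but as often as needed in the theme view, which is what gives the thesaurus its matrix-like structure.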

The GEMET size, formerly figured at about 200000 descriptors, rose to more than 5.000 in the course of merging, due to the limited overlapping between the different thesauri, to constraints of the selection work carried out by the parental organisations and to a few new additions, mainly from CDS indexing work.

The present version 2001 of GEMET is the result of a close collaboration between CNR and UBA under contract and supervision of the ETC/CDS. It presents 5.298 descriptors, including 109 Top Terms, and 1.264 synonyms in English. The 5.524 terms belonging to the parental thesauri and not included in GEMET constitute an accessory alphabetical list of free terms.

British English has been proposed as the language of choice for the EEA, but the American English equivalents have been added through a collaboration with the US Environmental Protection Agency (EPA).

The present Version 2001 of GEMET provides a complete numerical equivalence (all the descriptors have an equivalent) with the following languages: Basque, Bulgarian, Dutch, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Russian, Slovenian and Spanish. For Danish, Slovak, Swedish and Greek a few descriptors are still missing; this issue is currently being worked on. The semantic equivalence (correct correspondence of meaning between languages) has been separately ensured by the NFP experts for Dutch, French, German, Italian, Norwegian, Portuguese and almost completely for Spanish. Equivalence in Finnish is not yet validated. The translation of GEMET into other languages, both non-EU and non-European, is foreseen in the future.

The need to ensure the internal systematic and linguistic coherence of the thesaurus led the GEMET Working Group to provide all the descriptors with a consistent set of definitions. There are at present more than 4.000 definitions available, which provide a useful glossary function where the semantics of the thesaurus structure might not be fully grasped. The sources of definitions are presented in Annex 1.

GEMET follows the ISO norms on monolingual and multilingual thesauri.

GEMET version 2001 was published on CD-ROM in October 2001. Printed editions and editions in Adobe Acrobat PDF format were also published from 1997 through 2000.

4. Criteria for the allocation of terms to the groups and themes of GEMET

GEMET has two systems for arranging the descriptors:

A classification scheme of 3 super-groups containing 30 groups; there are in addition 5 accessory groups of terms, instrumental to the thesaurus use. The super-groups have been adopted to approach an environmental management perspective and to help the hierarchical structuring of GEMET. The groups reflect a systematic, category- or discipline-oriented perspective. Within the groups, the descriptors are basically allocated in a mono-hierarchical order, but several descriptors needed to be allocated to more than one group or to more than one broader term inside the same group, thus creating a condition of poly-hierarchy.

Hierarchical relationships are either:

generic relationships (the narrower term has all the characteristics of the broader term and at least one additional characteristic)

Example:

trees
    NT  deciduous trees
    NT  conifers

or whole-part relationships (the narrower term must be part of the broader term)

Example:

trees
    NT  tree trunks
    NT  treetops

If both generic and whole-part relationships exist in connection with a term, this results in a poly-dimensional subdivision. For the sake of clarity and taking into consideration that GEMET deals mainly with generic relationships, both relationships are treated as equals in the thesaurus.

Example:

trees
    NT  tree trunks
    NT  treetops
    NT  deciduous trees
    NT  conifers

Hierarchical relationships exist between terms belonging to the same logical categories. Every term can possess several broader terms (poly-hierarchy).

Example:

sulphuric acid
    BT  sulphur compounds
    BT  acid
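The broader-term (BT) and narrower-term (NT) relations from the examples above can be sketched as a simple mapping. This is a minimal illustration, assuming that both generic and whole-part links are stored alike (as the text says they are treated as equals); the function names are hypothetical.

```python
from collections import defaultdict

# BT links taken from the examples above; a term may have several broader terms.
BROADER = {
    "deciduous trees": ["trees"],
    "conifers": ["trees"],
    "tree trunks": ["trees"],
    "treetops": ["trees"],
    "sulphuric acid": ["sulphur compounds", "acid"],
}


def narrower_index(broader):
    """Invert the BT links to obtain the NT list of every broader term."""
    narrower = defaultdict(list)
    for term, parents in broader.items():
        for parent in parents:
            narrower[parent].append(term)
    return dict(narrower)


def is_polyhierarchical(term, broader=BROADER):
    """A term is poly-hierarchical when it has more than one broader term."""
    return len(broader.get(term, [])) > 1


NARROWER = narrower_index(BROADER)
```

Storing only the BT side and deriving NT from it keeps the two directions consistent, at the cost of rebuilding the index after each change.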

A thematic order, containing 40 themes. These themes have been established according to practical considerations, corresponding to the information needs. They have been developed to reflect the EEA activities in order to support the thematic elements of the EEA DPSIR Dataflow Scheme. The list of themes has taken into account all the main topics of the Scheme, of The Dobris Assessment and of other sources, like ETCs (European Topic Centres) and Eionet (Environmental Information and Observation Network). They can be used as checklists when dealing with environmental matters. The themes, being complementary to the groups, give the thesaurus a matrix structure.

The main principles followed for the allocation of descriptors were:

  1. A descriptor is usually allocated to one group.
  2. When necessary, a descriptor can be assigned to more than one group or to more than one broader term inside the same group (poly-hierarchy).
  3. A descriptor can be allocated to more than one theme (“poly-thematic” condition).
  4. A descriptor should be allocated to all the (relevant) themes to which it belongs.
  5. All descriptors belonging to a “Group” of GEMET whose name and content corresponds to a “Theme” will be allocated to that theme.
  6. The non-descriptors (synonyms) are linked to their descriptors.
  7. Descriptors with very general content or those which do not belong exactly to a theme, have been collected in a theme of general character, called “no special theme” (theme: general).
  8. According to the development of GEMET, additional themes might be identified.
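Principles 3, 4, 5 and 7 above can be sketched as a small allocation function. This is an illustration only: the function name, the group and theme names in the usage note, and the `group_theme_map` parameter (pairing a group with the theme of corresponding name and content) are hypothetical.

```python
GENERAL_THEME = "general"  # the "no special theme" of principle 7


def allocate_themes(groups, themes, group_theme_map):
    """Return the sorted list of themes a descriptor is allocated to.

    groups          -- groups the descriptor belongs to
    themes          -- themes assigned directly by the indexer
    group_theme_map -- hypothetical mapping: group -> corresponding theme
    """
    allocated = set(themes)                    # principles 3 and 4: all relevant themes
    for group in groups:                       # principle 5: group implies its theme
        if group in group_theme_map:
            allocated.add(group_theme_map[group])
    if not allocated:                          # principle 7: fall back to "general"
        allocated.add(GENERAL_THEME)
    return sorted(allocated)
```

For instance, `allocate_themes(["WASTES"], ["pollution"], {"WASTES": "waste"})` yields both themes, while a descriptor with no directly assigned theme and no matching group ends up under "general".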

The allocation of a descriptor to more than one theme reflects the relation of this term to different subject fields. A non-descriptor, being synonym to a descriptor, belongs to the same group or theme as the descriptor. The synonym guides the user directly to the preferred term, where s/he will find all the necessary information: the fundamental relations like equivalence, hierarchical and associative relationship, and so on.
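The way a synonym leads to its preferred term can be sketched as a lookup. The entries below are illustrative assumptions, not actual GEMET records, and `resolve` is a hypothetical helper.

```python
# Hypothetical descriptor records: preferred term -> its information.
DESCRIPTORS = {
    "sulphuric acid": {"group": "CHEMISTRY", "themes": ["chemistry"]},
}

# Hypothetical non-descriptors (synonyms): synonym -> preferred term.
NON_DESCRIPTORS = {
    "oil of vitriol": "sulphuric acid",
}


def resolve(term):
    """Return (preferred term, record); synonyms are redirected to their descriptor."""
    preferred = NON_DESCRIPTORS.get(term, term)
    if preferred not in DESCRIPTORS:
        raise KeyError(term)
    return preferred, DESCRIPTORS[preferred]
```

The synonym itself carries no group or theme of its own; it inherits them by pointing at the descriptor.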

Unlike in some parental thesauri, the singular form of terms has been preferred throughout the whole thesaurus; only a limited number of terms have been kept in plural form, to prevent a change of meaning or to follow the rules of the English language. For the non-English languages, the translators are recommended to follow the same criterion used for English, but are left free to adopt a different form if the meaning of the term is at stake. All the complementary numerical forms (singular to plural, plural to singular) of the terms which can have such forms have been entered into the thesaurus file; nevertheless, they will be presented only when they are alphabetically distant from the form presented (e.g. man / men).

The thesaurus has also been analysed for the presence of alternate forms and spelling variants, including the prepositional forms. The analysis was restricted to the English equivalents proposed by the parental thesauri, thus it was not extended to the rest of the terms. All these forms have been entered as non-descriptors (synonyms).

The themes have provided the basis for the work on associative relationships (RT, Related Terms).

Because of the restricted use of hierarchical relationships there was a need for another mechanism to draw attention to other terms which an indexer and a searcher should consider. These are RELATED TERMS of the starting term. Associative relationships between terms are relationships which do not correspond to the criteria of hierarchy or equivalence. Associative relationships can be established between terms belonging to different logical categories. From the numerous possible relationships only those relationships are included in GEMET which are considered useful for indexing and searching.
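Associative relationships can be sketched as an undirected link between two terms. The sketch assumes that RT links are reciprocal, which is the usual thesaurus convention but is not stated explicitly here; the class name and the example terms are hypothetical.

```python
from collections import defaultdict


class RelatedTerms:
    """Store RT links; each link is recorded in both directions (assumption)."""

    def __init__(self):
        self._rt = defaultdict(set)

    def add(self, term_a, term_b):
        if term_a == term_b:
            raise ValueError("a term cannot be related to itself")
        self._rt[term_a].add(term_b)
        self._rt[term_b].add(term_a)

    def related(self, term):
        """Return the related terms in alphabetical order (empty if none)."""
        return sorted(self._rt.get(term, ()))


rt = RelatedTerms()
rt.add("acid rain", "forest damage")   # terms from different logical categories
rt.add("acid rain", "air pollution")
```

Unlike BT/NT links, these links do not need the two terms to share a logical category, so they are kept in a separate structure from the hierarchy.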

Related terms may be of several kinds:

The established principles will produce a better performance of the indexers when preparing input for the CDS and will allow easier access to the data for the users via the descriptors.

The classification scheme of GEMET, by groups, themes and hierarchies, should be considered merely as a means to control the thesaurus terms and the semantic relations between them; in other words, as a way to control the internal coherence of the thesaurus. As such, it is not proposed as a general reference pattern for the organisation of any specific environmental information system, although its structure and comprehensive set of meta-concepts (mainly the themes and the top terms) can be fruitfully used for such a purpose.

Annex: Essential historic references

de Lavieter, L. (Ed.)

Multilingual Environmental Thesaurus. Part 1, English; Part 2, Français; Part 3, Deutsch; Part 4, Nederlands; Part 5, Italiano; Part 6, Norsk; Part 7, Dansk; Part 8, Español.

NBOI, Nederlandse Bureau voor Onderzoek Informatie / EEA-TF - European Environment Agency - Task Force, Amsterdam, November 1995, pp. (English) vi + A-78; B-112; C-56; D-199, total 445.

Felluga, B., Lucke, S., Palmera, M., Plini, P., de Lavieter, L., Deschamps, J., Eds.

Thesaurus per l'ambiente - Versione quadrilingue / Thesaurus for the Environment - Quadrilingual Version / Milieu-thesaurus - Viertalige vertaling / Thesaurus für die Umwelt.

CNR-SIAM & CNR-UPIS. CD-ROM Edition, Milan, 1994.

(Includes the contents of: Felluga, B., de Lavieter, L., Deschamps, J., Lucke, S., Palmera, M., Eds.

Thesaurus per l'ambiente - Versione trilingue per l'Italia / Thesaurus for the Environment - Trilingual Edition for Italy / Milieu-thesaurus - Drietalige vertaling voor Italië

Edizione pilota, Vol. 1/3, pp. i-xiv + 700; Pilot Edition, Vol. 2/3, pp. xv-xxviii + 684; Proefuitgave, Vol. 3/3, pp. xxix-xxxiv+672, total pp. i+xiv+xxviii+xxxiv + 2056, 1991, Roma, CNR-ITBM.)

Felluga, B. (Ed.)

Multilingual Thesaurus for the Environment. Classification Scheme

CNR, Roma, June 1995, pp. 3 + 90.

Ministère de l'environnement.

Lexique environnement - Planète

Tome 1, Liste alphabétique, pp. 83; Tome 2, Liste thématique, pp. iv + 186. Ministère de l'environnement, Paris, Décembre 1995.

MOPTMA, Ministerio de Obras Públicas, Transportes y Medio Ambiente.

Tesauro Multilingüe de Medio Ambiente.

MOPTMA, Madrid, 1995, CD-ROM Edition, 1995.

(Includes the contents of: MOPU, Ministerio de Obras Publicas y Urbanismo.

Tesauro de medio ambiente.

MOPU, Madrid, 1990, pp. xxxii + 319.)

NERI, National Environmental Research Institute.

Guidelines for data collection for the Dobris+3 Report. Final Draft.

NERI, Copenhagen, pp. 186, September 1996.

Petersen, T. Ed.

AAT - Art and Architecture Thesaurus.

Oxford University Press, New York, Vol. 1, pp. xxix + 455; vol. 2, pp. 533; vol. 3, pp. 586; vol. 4, pp. 586; vol. 5, pp. 546, 1994.

Stanners, D. & Bourdeau, Ph., Eds.

Europe's Environment. The Dobris Assessment.

EEA - European Environment Agency, Copenhagen, Version 1995-01, pp. xxvi + 616.

UBA - Umweltbundesamt.

Die Umwelt-CD -UMPLIS.

Umweltbundesamt, Berlin, I-1996, CD-ROM Edition + Benutzerhandbuch pp. 110 + Umweltklassifikation, 1993, pp. iii + 12.

(Includes the contents of: UBA, Umweltbundesamt.

Umwelt-Thesaurus und Umwelt Klassifikation.

Umweltbundesamt, Berlin, 1994, pp. v + 11 + 347 + 495 + 150 + 133 + 9, total 1145.)

UNEP, United Nations Environment Programme - Infoterra.

EnVoc - Multilingual Thesaurus of Environmental Terms.

UNEP, Nairobi, May 1997, pp. xix + 248.