About GEMET - GEneral Multilingual Environmental Thesaurus

Preface to the 2004 version of GEMET

This version of GEMET is an extension of the 2001 version of GEMET. It includes the Czech, Estonian and Polish translations as national contributions, and the entire content is embedded in a modern Internet application. Definitions are available in English, and the Bulgarian, Russian and Slovenian translations have been added. The application can be browsed through this GUI and is also available as a web service for those who want to link it to their own application. How to do this is explained under “GEMET web service”. The content of GEMET will need quality assurance around various terms over time. This is part of another initiative, which is about to provide GEMET as open content in the WIKI set of Internet services (Wiktionary ...).

Please take a look at the text below for information on how GEMET was built, which will inform you about usage and limitations.

Preface of the 2001 version

For the 6th time since 1996 there is a new edition of GEMET, the reference vocabulary of the European Environment Agency (EEA) and its Network (Eionet). The present version of the controlled vocabulary provides Bulgarian, Russian and Slovenian as new languages. This was achieved through the kind co-operation of National Focal Points and other expert organisations in these countries. These translations are contributions of the countries to EEA's work programme and have been financed through national funds or with additional funds from outside the EEA scheme. A special remark has to be made regarding the Russian version, because Russia is not part of Eionet. The translation of GEMET terms into Russian has been funded by the United Nations Environment Programme (UNEP) and carried out in the International Centre for Scientific and Technical Information within a respective Memorandum of Understanding. The 2001 version of GEMET also incorporates changes provided for the Portuguese and Swedish languages. It also sees the inclusion of the Basque (Euskara) language in the ThesShow browser - this inclusion was not possible in the year 2000 version. The EEA highly appreciates all these contributions. The content of GEMET has not been changed, to assure consistency in use between the versions. There are plans to include more European languages in the years to come as well as to perform a thorough evaluation of the content.

1. Introduction to GEMET

GEMET, the GEneral Multilingual Environmental Thesaurus, has been developed as an indexing, retrieval and control tool for the European Topic Centre on Catalogue of Data Sources (ETC/CDS) and the European Environment Agency (EEA), Copenhagen. The work has been carried out through a contract between the EEA and the ETC/CDS, which is led by the Ministry of the Environment of Lower Saxony, includes members from Germany, Austria, Italy and Sweden, and benefits from the collaboration of other member countries of the European Union (EU), as well as of UNEP Infoterra.

The basic idea for the development of GEMET was to use the best of the presently available excellent multilingual thesauri, in order to save time, energy and funds. GEMET was conceived as a “general” thesaurus, aimed to define a common general language, a core of general terminology for the environment. Specific thesauri and descriptor systems (e.g. on Nature Conservation, on Wastes, on Energy, etc.) have been excluded from the first step of development of the thesaurus and have been taken into account only for their structure and upper level terminology.

GEMET has been compiled by merging the terms of the following multilingual documents:

  1. A selection of the “Umwelt Thesaurus” of Umweltbundesamt (UBA), Berlin, 1995, with more than 2.000 descriptors out of 8.500 in German and English.
  2. The complete “Thesaurus Italiano per l'Ambiente (TIA)” quadrilingual version on CD-ROM of Consiglio Nazionale delle Ricerche (CNR), Rome, 1994, with more than 4.000 descriptors in Italian, English, Dutch and German and a selection of more than 2.000 descriptors of this thesaurus, compiled as a Classification Scheme for the MET of the EEA, 1995 (see the following No. 3).
  3. The complete “Multilingual Environment Thesaurus (MET)” of Nederlands Bureau voor Onderzoek Informatie (NBOI), Amsterdam, developed on the Dutch “Milieu-thesaurus” for the EEA in 1995, with more than 2.300 descriptors in Dutch, Danish, English, French, German, Italian, Norwegian and Spanish.
  4. The complete “EnVoc Thesaurus”, of UNEP Infoterra, 1997 edition, with about 2.000 descriptors in English, French and Spanish, with possibility of access to Arabic, Chinese and Russian.
  5. The complete “Thesaurus de Medio Ambiente” on CD-ROM of Ministerio de Obras Publicas, Transportes y Medio Ambiente (MOPTMA), Madrid, 1995, with more than 2.600 descriptors in Spanish, English, French, German.
  6. The complete “Lexique environnement - Planète”, of the Ministère de l'environnement, Paris, 1995, with more than 5.000 descriptors in French and English.
  7. Descriptors of relevant documents of the EEA, namely “Europe's Environment, The Dobris Assessment”, the “DPSIR Data Flow Scheme”, as well as terminology of ETCs and Eionet, in English.
  8. Descriptors of the “Thesaurus Eurovoc” of the European Parliament, Brussels, 1996, in French, English, Dutch, German, Italian, and Spanish, with possibility of access to Danish, Greek, and Portuguese.

The merging has been performed on both a conceptual and a formal basis. Coinciding concepts in the different thesauri have been identified and scored. As in other multilingual thesauri, e.g. Infoterra EnVoc, a neutral alphanumerical notation allows the identification of a concept independently of the user's language.

The links with the original thesauri are ensured by the respective identifiers or code notations.
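
As a rough illustration of how a neutral notation separates a concept from its language-specific labels, the following Python sketch stores labels under a language-independent identifier. The identifiers (C0001, C0002), the sample labels and the helper names label_for and find_concept are hypothetical; they are not taken from GEMET data or software.

```python
# Hypothetical sketch: concept identifiers are language-neutral,
# labels are attached per language code. Sample data only.
concepts = {
    "C0001": {"eng": "forest", "ger": "Wald", "ita": "foresta"},
    "C0002": {"eng": "soil", "ger": "Boden", "ita": "suolo"},
}


def label_for(concept_id, lang, fallback="eng"):
    """Return the label of a concept in `lang`, falling back to `fallback`."""
    labels = concepts[concept_id]
    return labels.get(lang, labels[fallback])


def find_concept(term, lang):
    """Return the neutral identifier whose label in `lang` equals `term`, or None."""
    for concept_id, labels in concepts.items():
        if labels.get(lang) == term:
            return concept_id
    return None
```

A German-speaking user searching for "Wald" and an Italian-speaking user searching for "foresta" would both arrive at the same identifier, which is the point of a notation that does not depend on the user's language.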

Following the identification of the coinciding concepts, a selection was made by the experts of the National Focal Points of the organisations involved.

The resulting 6.562 terms have been arranged in a classification scheme made of 3 super-groups, 30 groups plus 5 accessory, instrumental groups. Each descriptor has been arranged in a hierarchical structure headed by a Top Term. The level of poly-hierarchy, i.e. the allocation of a descriptor to more than one group, has been kept to a minimum. Further, to allow the retrieval of terms that are thematically related but scattered in different groups, a set of 40 themes has been agreed upon with the EEA and each descriptor has been assigned to as many themes as necessary. Thus, the user can access the thesaurus through the group-hierarchical list, through the thematic list or through the alphabetical list. As a complement to the hierarchical “vertical” relations, an exhaustive series of strong “horizontal” relations between terms (RT, Related Terms) has been introduced. A progressive Line Number has been assigned to each descriptor of the systematic list, in order to help the user identify the descriptor in the different lists. The Line Number is merely a neutral identifier for the present version.

The GEMET size, formerly figured at about 2.000 descriptors, rose to more than 5.000 in the course of merging, due to the limited overlap between the different thesauri, to constraints of the selection work carried out by the parental organisations and to a few new additions, mainly from CDS indexing work.

The present version 2001 of GEMET is the result of a close collaboration between CNR and UBA under contract and supervision of the ETC/CDS. It presents 5.298 descriptors, including 109 Top Terms, and 1.264 synonyms in English. The 5.524 terms belonging to the parental thesauri and not included in GEMET, constitute an accessory alphabetical list of free terms.

British English has been proposed as language of choice for the EEA, but the American English equivalents have been added through a collaboration with the US Environmental Protection Agency (EPA).

The present Version 2001 of GEMET provides a complete numerical equivalence (all the descriptors have an equivalent) with the following languages: Basque, Bulgarian, Dutch, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Russian, Slovenian and Spanish. For Danish, Slovak, Swedish and Greek a few descriptors are still missing; this is being worked on. The semantic equivalence (correct correspondence of meaning between languages) has been separately ensured by the NFP experts for Dutch, French, German, Italian, Norwegian, Portuguese and almost completely for Spanish. Equivalence in Finnish is not yet validated. The translation of GEMET into other languages, both extra-EU and extra-European, is foreseen in the future.
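
Numerical equivalence, as described above, can be checked mechanically: a language is complete when every descriptor has a label in it. The Python sketch below is only an illustration under assumed data; the identifiers, the sample labels and the function name missing_equivalents are hypothetical.

```python
def missing_equivalents(labels, languages):
    """For each language, list the concept ids that lack a label in it."""
    return {
        lang: sorted(cid for cid, by_lang in labels.items() if not by_lang.get(lang))
        for lang in languages
    }


# Hypothetical sample data: C0002 has no Danish equivalent yet.
labels = {
    "C0001": {"eng": "forest", "ger": "Wald", "dan": "skov"},
    "C0002": {"eng": "soil", "ger": "Boden"},
}

report = missing_equivalents(labels, ["ger", "dan"])
complete_languages = [lang for lang, gaps in report.items() if not gaps]
```

Note that this only tests whether an equivalent exists. Semantic equivalence, i.e. whether the equivalent has the same meaning, still needs validation by language experts.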

The need to ensure the internal systematic and linguistic coherence of the thesaurus led the GEMET Working Group to foster the endowment of all the descriptors with a consistent set of definitions. There are at present more than 4.000 definitions available, which provide a useful glossary function where the semantics of the thesaurus structure might not be fully grasped. The sources of definitions are presented in Annex 1.

GEMET follows the ISO norms on monolingual and multilingual thesauri.

The thesaurus material, i.e. the terms and their control elements, is managed by the THESmain program, developed by the TBHS, Technisches Büro Hermann Stallbaumer, Vienna, which provides a series of sophisticated functions for handling the poly-hierarchical, poly-thematic and multilingual aspects. The use of GEMET in THESmain is restricted to the developers of the thesaurus, but a user-friendly software program, THESshow (Windows95/98, NT 4.0), is available for the visualisation of GEMET for the lay user. In order to receive ThesShow, please contact the ETC/CDS. From the end of 2001, please contact the EEA's Information Centre.

The thesaurus is part of WinCDS, an MS-Access based data collection tool for the Catalogue of Data Sources, where it is used for indexing. A further software program, GenThes, has been developed for the ETC/CDS by FZI, Forschungszentrum Informatik, Karlsruhe, in order to present GEMET in the Web environment. It functions as a Java application to support expert retrieval on the Internet database WebCDS, the Catalogue of environment-related addresses and data sources.

GEMET, Version 2001, is published on CD-ROM in October 2001.

The printed edition in Adobe-Acrobat readable .PDF format, mainly intended for the Thesaurus users, will replace the 5 volumes of version 1.0, published in July 1997, as well as version 2.0 from August 1999 and version 2000 from September 2000:

Volume 1: Systematic List of Descriptors, in English, containing the allocation of descriptors to groups and the poly-hierarchical relations of the descriptors;
Volume 2: Thematic List of Descriptors, in English, containing the allocation of descriptors to themes;
Volume 3: Alphabetical List of Terms, in English, containing descriptors, definitions, scope notes, synonyms, allocation to groups and themes, top terms, broader terms, narrower terms, related terms;
Volume 4: Concordance, i.e. the Alphabetical List of Descriptors and Non-Descriptors in permuted form;
Volume 5: Multilingual List of Descriptors, with British English as the filing language.

2. Criteria for the allocation of terms to the groups and themes of GEMET

GEMET has two systems for arranging the descriptors:

A classification scheme of 3 super-groups containing 30 groups; there are in addition 5 accessory groups of terms, instrumental to the thesaurus use. The super-groups have been adopted to approach an environmental management perspective and to help the hierarchical structuring of GEMET. The groups reflect a systematic, category- or discipline-oriented perspective. Within the groups, the descriptors are basically allocated in a mono-hierarchical order, but several descriptors needed to be allocated to more than one group or to more than one broader term inside the same group, thus creating a condition of poly-hierarchy.

Hierarchical relationships are either:

generic relationships (the narrower term has all the characteristics of the broader term and at least one additional characteristic)


 NT deciduous trees

or whole-part relationships (the narrower term must be part of the broader term)


 NT tree trunks

If both generic and whole-part relationships exist in connection with a term, this results in a poly-dimensional subdivision. For the sake of clarity and taking into consideration that GEMET deals mainly with generic relationships, both relationships are treated as equals in the thesaurus.


 NT tree trunks
 NT deciduous trees

Hierarchical relationships exist between terms belonging to the same logical categories. Every term can possess several broader terms (poly-hierarchy):


sulphuric acid
 BT sulphur compounds
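
The BT/NT relations described above can be pictured as a pair of reciprocal lookups. The following Python sketch is a minimal illustration and not how THESmain stores the data. The helper names are hypothetical, and the broader terms "trees" and "acids" are assumptions added for the example, since the text names only "sulphur compounds" as a broader term.

```python
from collections import defaultdict

# broader[term] lists the broader terms (BT) of a term;
# narrower[term] lists the narrower terms (NT) of a term.
broader = defaultdict(list)
narrower = defaultdict(list)


def add_bt(term, broader_term):
    """Record a BT link and its reciprocal NT link."""
    broader[term].append(broader_term)
    narrower[broader_term].append(term)


# "trees" and "acids" as broader terms are assumptions for illustration.
add_bt("deciduous trees", "trees")             # generic relationship
add_bt("tree trunks", "trees")                 # whole-part relationship
add_bt("sulphuric acid", "sulphur compounds")  # from the text
add_bt("sulphuric acid", "acids")              # second BT: poly-hierarchy


def is_polyhierarchical(term):
    """True if the term has more than one broader term."""
    return len(broader.get(term, [])) > 1
```

Because generic and whole-part relationships are treated as equals, the sketch keeps a single NT list per term and does not record which kind of relationship each link expresses.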

A thematic order, containing 40 themes. These themes have been established according to practical considerations, corresponding to the information needs. They have been developed to reflect the EEA activities in order to support the thematic elements of the EEA DPSIR Dataflow Scheme. The list of themes has taken into account all the main topics of the Scheme, of The Dobris Assessment and of other sources, like ETCs (European Topic Centres) and Eionet (Environmental Information and Observation Network). They can be used as checklists when dealing with environmental matters. The themes, being complementary to the groups, confer to the thesaurus a matrix structure.

The main principles followed for the allocation of descriptors were:

  1. A descriptor is usually allocated to one group.
  2. When necessary, a descriptor can be assigned to more than one group or to more than one broader term inside the same group (poly-hierarchy).
  3. A descriptor can be allocated to more than one theme (“poly-thematic” condition).
  4. A descriptor should be allocated to all the (relevant) themes to which it belongs.
  5. All descriptors belonging to a “Group” of GEMET whose name and content corresponds to a “Theme” will be allocated to that theme.
  6. The non-descriptors (synonyms) are linked to their descriptors.
  7. Descriptors with very general content or those which do not belong exactly to a theme, have been collected in a theme of general character, called “no special theme” (theme: general).
  8. According to the development of GEMET, additional themes might be identified.

The allocation of a descriptor to more than one theme reflects the relation of this term to different subject fields. A non-descriptor, being a synonym of a descriptor, belongs to the same group or theme as the descriptor. The synonym guides the user directly to the preferred term, where s/he will find all the necessary information: the fundamental relations like equivalence, hierarchical and associative relationships, and so on.
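
The allocation principles above can be sketched as a small record structure: a descriptor carries its group(s) and theme(s), and a non-descriptor simply points to its descriptor. This Python example is illustrative only. The allocation of "waste water" and the synonym "sewage" are hypothetical sample data, while the abbreviations HYD, wat, was and gen are the group and theme abbreviations used in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    name: str
    groups: list  # usually exactly one group (principle 1)
    # Descriptors without a specific theme fall under "gen" (principle 7).
    themes: list = field(default_factory=lambda: ["gen"])


# Hypothetical allocation: one group, two themes (poly-thematic condition).
descriptors = {
    "waste water": Descriptor("waste water", groups=["HYD"], themes=["wat", "was"]),
    "environment": Descriptor("environment", groups=["ENV"]),
}

# Non-descriptors (synonyms) are linked to their descriptor (principle 6).
use = {"sewage": "waste water"}


def resolve(term):
    """Return the Descriptor for a descriptor or non-descriptor, or None."""
    return descriptors.get(use.get(term, term))


def by_theme(theme):
    """Return the names of all descriptors allocated to a theme, sorted."""
    return sorted(d.name for d in descriptors.values() if theme in d.themes)
```

Looking up the synonym returns the record of the preferred term, so the synonym inherits the group and theme allocation of its descriptor without storing it twice.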

Unlike in some parental thesauri, the singular form of terms has been preferred throughout the whole thesaurus; only a limited number of terms have been kept in plural form, to prevent change of meaning or to follow the rules of the English language. For the non-English languages, translators are advised to follow the same criterion used for English, but are free to adopt a different form if the meaning of the term is at stake. All the complementary numerical forms (singular to plural, plural to singular) of the terms which can be endowed with such forms have been entered into the thesaurus file; nevertheless, they will be presented only when they are alphabetically distant from the form presented (e.g. man / men).

The thesaurus has also been analysed for the presence of alternate forms and spelling variants, including the prepositional forms. The analysis was restricted to the English equivalents proposed by the parental thesauri, thus it was not extended to the rest of the terms. All these forms have been entered as non-descriptors (synonyms).

The themes have provided the basis for the work on associative relationships (RT, Related Terms).

Because of the restricted use of hierarchical relationships there was a need for another mechanism to draw attention to other terms which an indexer and a searcher should consider. These are RELATED TERMS of the starting term. Associative relationships between terms are relationships which do not correspond to the criteria of hierarchy or equivalence. Associative relationships can be established between terms belonging to different logical categories. From the numerous possible relationships only those relationships are included in GEMET which are considered useful for indexing and searching.

Related terms may be of several kinds:

The established principles will produce a better performance of the indexers when preparing input for the CDS and will allow easier access to the data for the users via the descriptors.

The classification scheme of GEMET, by groups, themes and hierarchies, should be considered merely as a means to control the thesaurus terms and the semantic relations between them; in other words, as a way to control the internal coherence of the thesaurus. As such, it is not proposed as a general reference pattern for the organisation of any specific environmental information system, although its structure and comprehensive set of meta-concepts (mainly the themes and the top terms) can be fruitfully used for such a purpose.

3. How to use the Thesaurus

To find the appropriate term, the user has several ways to navigate the different parts of GEMET:

1. Have a look at the “Systematic List of Descriptors”, which guides you from the super-groups and groups to the descriptors and their hierarchical relations.
2. Enter the “Thematic List of Descriptors”, which indicates the various themes and the descriptors allocated to these themes in alphabetical order.
3. Consult the “Alphabetic List of Terms”, containing descriptors, definitions, scope notes, synonyms and the allocation of the terms to groups and themes.
4. Use the “Concordance” list, which also presents the internal words of a phrase or compound term as entry terms, indicating the preferred terms.
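
The idea behind the “Concordance” can be sketched as a permuted index in which every word of a compound term becomes an entry pointing to the full term. The Python example below is a simplified illustration, assuming whitespace-separated words and no stop-word handling; the function name build_concordance is hypothetical and the sample terms are those used as examples in section 2.

```python
from collections import defaultdict


def build_concordance(terms):
    """Map every word of every term to the sorted terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for word in term.lower().split():
            index[word].add(term)
    # Sort the entry words alphabetically, as in a printed list.
    return {word: sorted(found) for word, found in sorted(index.items())}


terms = ["deciduous trees", "tree trunks", "sulphuric acid", "sulphur compounds"]
concordance = build_concordance(terms)

for word, found in concordance.items():
    print(f"{word}: {', '.join(found)}")
```

A user who only remembers the word "trunks" would thus still find the entry for "tree trunks".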

In the “Alphabetic List of Terms”, descriptors are presented in bold type; different themes belonging to one descriptor are separated by a semicolon.

In the “Alphabetic List of Terms” and in the “Concordance List”, Non-Descriptors (synonyms) are printed in italics.

The following abbreviations are used in GEMET:

BT: Broader Term
DEF: Definition
NT: Narrower Term
S: Indicates the line number of the term in the “Systematic List of Descriptors”
SN: Scope Note
T: Indicates the line number of the term in the “Thematic List of Descriptors”
TT: Top Term
UF: Used For
USE: Use

Language abbreviations according to ISO standard 639-2:

eng: English
dan: Danish
fin: Finnish
ger: German
dut: Dutch
nor: Norwegian
swe: Swedish
fra: French
gre: Greek
ita: Italian
por: Portuguese
spa: Spanish
hun: Hungarian
slo: Slovak
rus: Russian
bul: Bulgarian
slv: Slovenian
baq: Basque
usa: American English (not ISO)

4. List of Groups

No.* Abbreviation Name of the Super-group/Group

* Neutral number

Supergroup 1 Natural Environment, Anthropic environment
1 ENV ENVIRONMENT (natural environment, anthropic environment)
4 ATM ATMOSPHERE (air, climate)
5 HYD HYDROSPHERE (freshwater, marine water, waters)
6 LIT LITHOSPHERE (soil, geological processes)
7 LAN LAND (landscape, geography)
8 BIO BIOSPHERE (organisms, ecosystems)
9 ANT ANTHROPOSPHERE (built environment, human settlements)
Supergroup 2 Human activities and products, Effects on the environment
12 ENE
13 RSC RESOURCES (utilisation of resources)
Supergroup 3 Social aspects, Environmental policy measures
Accessory Groups

5. List of Themes

* Neutral number

No.* Abbr. Theme Scope Notes
1 adm Administration
2 agr Agriculture
3 air Air air, air pollution (acidification, stratospheric ozone, tropospheric oxidants), air pollution control
4 bio Biology Organisms (also genetically modified organisms), biological properties, processes, biosystems
5 bui Building buildings, built-up area, infrastructure
6 che Chemistry chemical substances, properties and processes
7 cli Climate
8 dyn natural dynamics natural hazards, geophysical processes
9 eco Economics
10 ene Energy energy and power, energy sources and consumption
11 enp Environmental policy Environmental information, e.g. CDS; land cover, remote sensing, environmental impact assessment (EIA), environmental auditing, target setting, environmental expenditures
12 fis Fishery industry, resources
13 fod food, drinking water
14 for Forestry
15 gen General no special theme
16 geo Geography
17 hea human health nutrition, medical aspects, safety
18 hus animal husbandry
19 ind Industry industry, mining, handicraft, technology, technical procedures and equipment
20 inf Information
21 leg Legislation
22 mil military aspects
23 nat natural areas, landscape, ecosystems natural reserves, parks, landforms
24 noi noise, vibrations
25 phy Physics
26 pll Pollution pollution, pollution control, general pollutants (not special substances)
27 prd materials, products, equipments materials, raw materials and products, physical properties and processes, state of matter
28 rad Radiations
29 rec Tourism Recreation and tourism
30 res Research
31 rsc Resources use of resources (not special materials as resources)
32 saf disasters, accidents, risk, safety Contaminated sites, chemical risk, technical hazards, safety control
33 ser trade, services
34 soc social aspects, population social aspects, production, consumption, culture, education, household, labour
35 soi Soil soil, soil pollution, soil pollution control
36 spa Space Interplanetary space
37 tra Transportation, traffic traffic and transportation
38 urb urban environment, urban stress Settlements
39 was Waste waste, waste treatment, waste control
40 wat Water Hydrosphere, water, waters, waste water

6. Presentation of GEMET Version 2001

After the distribution of different intermediate versions, GEMET versions 0.5, 1.0 and 1.5 were subjected to extensive work by CNR and UBA that led to version 2.0. More translations and some corrections formed version 2000. With the aim of completing the European languages in GEMET, Basque, Bulgarian, Russian and Slovenian could be added to this version. No changes have been made regarding the term count or the hierarchy. Please find a report on this work in Annex 4.

The remarks and suggestions provided by the colleagues of Belgium, Sweden, Portugal, Norway, Austria, France, by the ETC for Nature Conservation and by US EPA and by UNEP GRID Geneva have been taken into account. Most of them have been applied to GEMET, to the extent they were not interfering with:

The resulting version presents:

The following table summarises the structural elements of GEMET.

Structure elements             No
Super Groups                    3
Groups                         30
Accessory Groups                5
Themes                         40
Top Terms (TT)                109
Narrower Terms (NT)         5.189
Total descriptors (TT + NT) 5.298
Total non-descriptors       1.264
Total records               6.562
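
The totals in the table follow directly from its other rows. As a quick consistency check, written here in Python with variable names chosen only for this illustration:

```python
top_terms = 109
narrower_terms = 5189
non_descriptors = 1264

total_descriptors = top_terms + narrower_terms        # TT + NT
total_records = total_descriptors + non_descriptors   # descriptors + synonyms

print(total_descriptors)  # 5298
print(total_records)      # 6562
```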

Due to the lack of a complete list of equivalents for all the above mentioned languages, the two alphabetical and the multilingual lists will be presented in this printed version only with British English as the filing language. Separate lists for the other languages will be available from the EEA on request.

7. Essential References

de Lavieter, L. (Ed.)
Multilingual Environmental Thesaurus. Part 1, English; Part 2, Français; Part 3, Deutsch; Part 4, Nederlands; Part 5, Italiano; Part 6, Norsk; Part 7, Dansk; Part 8, Español.
NBOI, Nederlandse Bureau voor Onderzoek Informatie / EEA-TF - European Environment Agency - Task Force, Amsterdam, November 1995, pp. (English) vi + A-78; B-112; C-56; D-199, total 445.
Felluga, B., Lucke, S., Palmera, M., Plini, P., de Lavieter, L., Deschamps, J., Eds.
Thesaurus per l'ambiente - Versione quadrilingue / Thesaurus for the Environment - Quadrilingual Version / Milieu-thesaurus - Viertalige vertaling / Thesaurus für die Umwelt.
CNR-SIAM & CNR-UPIS. CD-ROM Edition, Milan, 1994.
(Includes the contents of: Felluga, B., de Lavieter, L., Deschamps, J., Lucke, S., Palmera, M., Eds.
Thesaurus per l'ambiente - Versione trilingue per l'Italia / Thesaurus for the Environment - Trilingual Edition for Italy / Milieu-thesaurus - Drietalige vertaling voor Italië
Edizione pilota, Vol. 1/3, pp. i-xiv + 700; Pilot Edition, Vol. 2/3, pp. xv-xxviii + 684; Proefuitgave, Vol. 3/3, pp. xxix-xxxiv+672, total pp. i+xiv+xxviii+xxxiv + 2056, 1991, Roma, CNR-ITBM.)
Felluga, B. (Ed.)
Multilingual Thesaurus for the Environment. Classification Scheme
CNR, Roma, June 1995, pp. 3 + 90.
Ministère de l'environnement.
Lexique environnement - Planète
Tome 1, Liste alphabétique, pp. 83; Tome 2, Liste thématique, pp. iv + 186. Ministère de l'environnement, Paris, Décembre 1995.
MOPTMA, Ministerio de Obras Públicas, Transportes y Medio Ambiente.
Tesauro Multilingüe de Medio Ambiente.
MOPTMA, Madrid, 1995, CD-ROM Edition, 1995.
(Includes the contents of: MOPU, Ministerio de Obras Publicas y Urbanismo.
Tesauro de medio ambiente.
MOPU, Madrid, 1990, pp. xxxii + 319.)
NERI, National Environmental Research Institute.
Guidelines for data collection for the Dobris+3 Report. Final Draft.
NERI, Copenhagen, pp. 186, September 1996.
Petersen, T. Ed.
AAT - Art and Architecture Thesaurus.
Oxford University Press, New York, Vol. 1, pp. xxix + 455; vol. 2, pp. 533; vol. 3, pp. 586; vol. 4, pp. 586; vol. 5, pp. 546, 1994.
Stanners, D. & Bourdeau, Ph., Eds.
Europe's Environment. The Dobris Assessment.
EEA - European Environment Agency, Copenhagen, Version 1995-01, pp. xxvi + 616.
UBA - Umweltbundesamt.
Die Umwelt-CD -UMPLIS.
Umweltbundesamt, Berlin, I-1996, CD-ROM Edition + Benutzerhandbuch pp. 110 + Umweltklassifikation, 1993, pp. iii + 12.
(Includes the contents of: UBA, Umweltbundesamt.
Umwelt-Thesaurus und Umwelt Klassifikation.
Umweltbundesamt, Berlin, 1994, pp. v + 11 + 347 + 495 + 150 + 133 + 9, total 1145.)
UNEP, United Nations Environment Programme - Infoterra.
EnVoc - Multilingual Thesaurus of Environmental Terms.
UNEP, Nairobi, May 1997, pp. xix +248.