This version of GEMET is an extension of the 2001 version. It includes the Czech, Estonian and Polish translations as national contributions, and the entire content is embedded in a modern Internet application. Definitions are available in English, and the Bulgarian, Russian and Slovenian translations have been added. The application can be browsed through this GUI and is also available as a web service for those who want to link it to their own application; how to do this is explained under “GEMET web service”. The content of GEMET will need quality assurance for various terms over time. This is part of another initiative, which is about to provide GEMET as open content in the WIKI set of Internet services (Wiktionary ...).
Please take a look at the text below for information on how GEMET was built, which will inform you about usage and limitations.
For the 6th time since 1996 there is a new edition of GEMET, the reference vocabulary of the European Environment Agency (EEA) and its Network (Eionet). The present version of the controlled vocabulary provides Bulgarian, Russian and Slovenian as new languages. This was achieved through the kind co-operation of National Focal Points and other expert organisations in these countries. These translations are contributions of the countries to the EEA's work programme and have been financed through national funds or with additional funds from outside the EEA scheme. A special remark has to be made regarding the Russian version, because Russia is not part of Eionet. The translation of GEMET terms into Russian has been funded by the United Nations Environment Programme (UNEP) and carried out at the International Centre for Scientific and Technical Information under a corresponding Memorandum of Understanding. The 2001 version of GEMET also incorporates changes provided for the Portuguese and Swedish languages. It also includes the Basque (Euskara) language in the ThesShow browser, which was not possible in the year 2000 version. The EEA highly appreciates all these contributions. The content of GEMET has not been changed, in order to ensure consistency in use between the versions. There are plans to include more European languages in the years to come as well as to perform a thorough evaluation of the content.
GEMET, the GEneral Multilingual Environmental Thesaurus, has been developed as an indexing, retrieval and control tool for the European Topic Centre on Catalogue of Data Sources (ETC/CDS) and the European Environment Agency (EEA), Copenhagen. The work has been carried out through a contract between the EEA and the ETC/CDS, which is led by the Ministry of the Environment of Lower Saxony, includes members from Germany, Austria, Italy and Sweden, and benefits from the collaboration of other member countries of the European Union (EU), as well as of UNEP Infoterra.
The basic idea for the development of GEMET was to use the best of the presently available excellent multilingual thesauri, in order to save time, energy and funds. GEMET was conceived as a “general” thesaurus, aimed to define a common general language, a core of general terminology for the environment. Specific thesauri and descriptor systems (e.g. on Nature Conservation, on Wastes, on Energy, etc.) have been excluded from the first step of development of the thesaurus and have been taken into account only for their structure and upper level terminology.
GEMET has been compiled by merging the terms of the following multilingual documents:
The merging has been performed on both a conceptual and a formal basis. Coinciding concepts in the different thesauri have been identified and scored. As in other multilingual thesauri, e.g. Infoterra EnVoc, a neutral alphanumerical notation allows a concept to be identified independently of the user's language.
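The language-independent notation can be pictured as a lookup keyed by a concept identifier rather than by a term. The following Python sketch is illustrative only: the identifiers (`C001`, `C002`), the sample labels and the function names `label_for` and `find_concept` are assumptions made for this example and are not taken from the GEMET data.

```python
from typing import Optional

# Hypothetical concept table: identifiers and labels are invented for illustration.
CONCEPTS = {
    "C001": {"en": "air pollution", "de": "Luftverschmutzung", "fr": "pollution de l'air"},
    "C002": {"en": "soil", "de": "Boden", "fr": "sol"},
}


def label_for(concept_id: str, language: str, fallback: str = "en") -> str:
    """Return the label of a concept in the requested language.

    Falls back to `fallback` when no equivalent exists yet for that language.
    """
    labels = CONCEPTS[concept_id]
    return labels.get(language, labels[fallback])


def find_concept(term: str, language: str) -> Optional[str]:
    """Return the identifier of the concept whose label in `language` equals `term`."""
    for concept_id, labels in CONCEPTS.items():
        if labels.get(language) == term:
            return concept_id
    return None
```

A user searching in German and a user reading in French then meet at the same identifier: `find_concept("Boden", "de")` yields the key that `label_for(..., "fr")` turns back into a French label.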
The links with the original thesauri are ensured by the respective identifiers or code notations.
Following the identification of the coinciding concepts, a selection was made by the experts of the National Focal Points of the organisations involved.
The resulting 6.562 terms have been arranged in a classification scheme made of 3 super-groups, 30 groups plus 5 accessory, instrumental groups. Each descriptor has been arranged in a hierarchical structure headed by a Top Term. The level of poly-hierarchy, i.e. the allocation of a descriptor to more than one group, has been kept to a minimum. Further, to allow thematic retrieval of terms that are thematically related but scattered across different groups, a set of 40 themes has been agreed upon with the EEA and each descriptor has been assigned to as many themes as necessary. Thus, the user can access the thesaurus through the group-hierarchical list, through the thematic list or through the alphabetical list. As a complement to the hierarchical “vertical” relations, an exhaustive series of strong “horizontal” relations between terms (RT, Related Terms) has been introduced. A progressive Line Number has been assigned to each descriptor of the systematic list, to help the user identify the descriptor in the different lists. The Line Number is merely a neutral identifier for the present version.
The GEMET size, originally estimated at about 2.000 descriptors, rose to more than 5.000 in the course of merging, owing to the limited overlap between the different thesauri, to constraints of the selection work carried out by the parental organisations and to a few new additions, mainly from CDS indexing work.
The present version 2001 of GEMET is the result of a close collaboration between CNR and UBA under contract and supervision of the ETC/CDS. It presents 5.298 descriptors, including 109 Top Terms, and 1.264 synonyms in English. The 5.524 terms belonging to the parental thesauri and not included in GEMET, constitute an accessory alphabetical list of free terms.
British English has been proposed as language of choice for the EEA, but the American English equivalents have been added through a collaboration with the US Environmental Protection Agency (EPA).
The present Version 2001 of GEMET provides a complete numerical equivalence (all the descriptors have an equivalent) with the following languages: Basque, Bulgarian, Dutch, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Russian, Slovenian and Spanish. For Danish, Slovak, Swedish and Greek a few descriptors are still missing; this issue is currently being worked on. The semantic equivalence (correct correspondence of meaning between languages) has been separately ensured by the NFP experts for Dutch, French, German, Italian, Norwegian, Portuguese and almost completely for Spanish. Equivalence in Finnish is not yet validated. The translation of GEMET into other languages, both extra-EU and extra-European, is foreseen in the future.
The need to ensure the internal systematic and linguistic coherence of the thesaurus led the GEMET Working Group to promote the provision of a consistent set of definitions for all the descriptors. There are at present more than 4.000 definitions available, which provide a useful glossary function where the semantics of the thesaurus structure might not be fully grasped. The sources of definitions are presented in Annex 1.
GEMET follows the ISO norms on monolingual and multilingual thesauri.
The thesaurus material, i.e. the terms and their control elements, is managed by the THESmain program, developed by the TBHS, Technisches Büro Hermann Stallbaumer, Vienna, which provides a series of sophisticated functions for handling the poly-hierarchical, poly-thematic and multilingual aspects. The use of GEMET in THESmain is restricted to the developers of the thesaurus, but a user-friendly software program, THESshow (Windows95/98, NT 4.0), is available for the visualisation of GEMET for the lay user. In order to receive ThesShow, please contact the ETC/CDS at http://www.mu.niedersachsen.de. From the end of 2001, please contact the EEA's Information Centre via e-mail firstname.lastname@example.org or at http://www.eea.europa.eu/
The thesaurus is part of WinCDS, the MS-Access-based data collection tool for the Catalogue of Data Sources, where it is used for indexing. A further piece of software, GenThes, has been developed for the ETC/CDS by FZI, Forschungszentrum Informatik, Karlsruhe, in order to present GEMET in the Web environment. It functions as a Java application to support expert retrieval on the Internet database WebCDS, the Catalogue of environment-related addresses and data sources (http://www.mu.niedersachsen.de/system/cds).
GEMET, Version 2001, is published on CD-ROM in October 2001.
The printed edition in Adobe Acrobat readable .PDF format, mainly intended for the Thesaurus users, will replace the 5 volumes of version 1.0, published in July 1997, as well as version 2.0 from August 1999 and version 2000 from September 2000.
Volume 1: Systematic List of Descriptors, in English, containing the allocation of descriptors to groups and the poly-hierarchical relations of the descriptors;
Volume 2: Thematic List of Descriptors, in English, containing the allocation of descriptors to themes;
Volume 3: Alphabetical List of Terms, in English, containing descriptors, definitions, scope notes, synonyms, allocation to groups and themes, top terms, broader terms, narrower terms, related terms;
Volume 4: Concordance, i.e. the Alphabetical List of Descriptors and Non-Descriptors in permuted form;
Volume 5: Multilingual List of Descriptors, with British English as the filing language.
GEMET has two systems for arranging the descriptors:
A classification scheme of 3 super-groups containing 30 groups; there are in addition 5 accessory groups of terms, instrumental to the thesaurus use. The super-groups have been adopted to approach an environmental management perspective and to help the hierarchical structuring of GEMET. The groups reflect a systematic, category- or discipline-oriented perspective. Within the groups, the descriptors are basically allocated in a mono-hierarchical order, but several descriptors needed to be allocated to more than one group or to more than one broader term inside the same group, thus creating a condition of poly-hierarchy.
Hierarchical relationships are either:
generic relationships (the narrower term has all the characteristics of the broader term and at least one additional characteristic)
or whole-part relationships (the narrower term must be part of the broader term)
If both generic and whole-part relationships exist in connection with a term, this results in a poly-dimensional subdivision. For the sake of clarity and taking into consideration that GEMET deals mainly with generic relationships, both relationships are treated as equals in the thesaurus.
Hierarchical relationships exist between terms belonging to the same logical categories. Every term can possess several broader terms (poly-hierarchy).
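A poly-hierarchy of this kind can be modelled as a mapping from each term to a list of broader terms. The sketch below is a minimal illustration in Python, not the THESmain implementation: the sample terms and the function names `narrower_index` and `top_terms` are assumptions chosen for this example.

```python
from collections import defaultdict

# Hypothetical broader-term (BT) links; a term may list more than one broader term.
BROADER = {
    "river water": ["freshwater", "surface water"],
    "freshwater": ["water"],
    "surface water": ["water"],
    "water": [],  # a Top Term has no broader term
}


def narrower_index(broader):
    """Invert the BT links to obtain the narrower terms (NT) of every term."""
    narrower = defaultdict(list)
    for term, parents in broader.items():
        for parent in parents:
            narrower[parent].append(term)
    return dict(narrower)


def top_terms(term, broader):
    """Walk every BT path upwards and collect the Top Terms that head them."""
    parents = broader.get(term, [])
    if not parents:
        return {term}
    found = set()
    for parent in parents:
        found |= top_terms(parent, broader)
    return found
```

Because generic and whole-part relationships are treated as equals, a single BT list per term is enough here; a model that needed to tell them apart would have to label each link.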
A thematic order, containing 40 themes. These themes have been established according to practical considerations, corresponding to the information needs. They have been developed to reflect the EEA activities in order to support the thematic elements of the EEA DPSIR Dataflow Scheme. The list of themes has taken into account all the main topics of the Scheme, of The Dobris Assessment and of other sources, like ETCs (European Topic Centres) and Eionet (Environmental Information and Observation Network). They can be used as checklists when dealing with environmental matters. The themes, being complementary to the groups, confer to the thesaurus a matrix structure.
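The matrix structure formed by groups and themes can be sketched as a table keyed by (group, theme) pairs. In the Python example below the group codes (`ATM`, `WAS`) and theme codes (`air`, `pll`, `was`) are taken from the tables in this document, while the descriptor names and the function `build_matrix` are placeholders assumed for illustration.

```python
from collections import defaultdict

# Hypothetical allocations: (descriptor, group code, theme codes).
ALLOCATIONS = [
    ("example term A", "ATM", ["air", "pll"]),
    ("example term B", "WAS", ["was", "pll"]),
    ("example term C", "ATM", ["air"]),
]


def build_matrix(allocations):
    """Map every (group, theme) cell to the descriptors allocated to both."""
    matrix = defaultdict(list)
    for descriptor, group, themes in allocations:
        for theme in themes:
            matrix[(group, theme)].append(descriptor)
    return dict(matrix)
```

Reading the matrix by row gives the systematic (group) view, and reading it by column gives the thematic view of the same descriptors.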
The main principles followed for the allocation of descriptors were:
The allocation of a descriptor to more than one theme reflects the relation of this term to different subject fields. A non-descriptor, being synonym to a descriptor, belongs to the same group or theme as the descriptor. The synonym guides the user directly to the preferred term, where s/he will find all the necessary information: the fundamental relations like equivalence, hierarchical and associative relationship, and so on.
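The way a synonym guides the user to the preferred term can be sketched as a two-step lookup. This is a minimal Python illustration under stated assumptions: the sample synonyms, the record fields and the function name `resolve` are invented for the example and do not reproduce the GEMET file format.

```python
# Hypothetical entries: each non-descriptor points to its descriptor.
USE = {"refuse": "waste", "garbage": "waste"}

# Hypothetical descriptor records carrying the group and theme allocation.
DESCRIPTORS = {"waste": {"group": "WAS", "themes": ["was"]}}


def resolve(term):
    """Return (preferred term, record), following a synonym to its descriptor."""
    preferred = USE.get(term, term)
    if preferred not in DESCRIPTORS:
        raise KeyError(f"unknown term: {term}")
    return preferred, DESCRIPTORS[preferred]
```

Since the group and themes are stored only on the descriptor, a non-descriptor automatically belongs to the same group and themes as the term it points to.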
Unlike in some parental thesauri, the singular form of terms has been preferred throughout the whole thesaurus; only a limited number of terms have been kept in plural form, to prevent change of meaning or to follow the rules of the English language. For the non-English languages, the translators are recommended to follow the same criterion used for English, but are left free to adopt a different form if the meaning of the term is at stake. All the complementary numerical forms (singular to plural, plural to singular) of the terms which can be endowed with such forms have been entered into the thesaurus file; nevertheless, they will be presented only when they are alphabetically distant from the form presented (e.g. man, men).
The thesaurus has also been analysed for the presence of alternate forms and spelling variants, including the prepositional forms. The analysis was restricted to the English equivalents proposed by the parental thesauri, thus it was not extended to the rest of the terms. All these forms have been entered as non-descriptors (synonyms).
The themes have provided the basis for the work on associative relationships (RT, Related Terms).
Because of the restricted use of hierarchical relationships there was a need for another mechanism to draw attention to other terms which an indexer and a searcher should consider. These are RELATED TERMS of the starting term. Associative relationships between terms are relationships which do not correspond to the criteria of hierarchy or equivalence. Associative relationships can be established between terms belonging to different logical categories. From the numerous possible relationships only those relationships are included in GEMET which are considered useful for indexing and searching.
Related terms may be of several kinds:
The established principles will produce a better performance of the indexers when preparing input for the CDS and will allow easier access to the data for the users via the descriptors.
The classification scheme of GEMET, by groups, themes and hierarchies, should be considered merely as a means to control the thesaurus terms and the semantic relations between them; in other words, as a way to control the internal coherence of the thesaurus. As such, it is not proposed as a general reference pattern for the organisation of any specific environmental information system, although its structure and comprehensive set of meta-concepts (mainly the themes and the top terms) can be fruitfully used for such a purpose.
To find the appropriate term, the user has several ways to navigate the different parts of GEMET:
1. Look at the “Systematic List of Descriptors”, which guides you from the super-groups and groups to the descriptors and their hierarchical relations.
2. Enter the “Thematic List of Descriptors”, which shows the various themes and the descriptors allocated to these themes in alphabetical order.
3. Consult the “Alphabetic List of Terms”, containing descriptors, definitions, scope notes, synonyms and the allocation of the terms to groups and themes.
4. Use the “Concordance” list, which also presents the internal words of a phrase or compound term as entry terms and indicates the preferred terms.
In the “Alphabetic List of Terms”, descriptors are presented in bold type; different themes belonging to one descriptor are separated by a semicolon.
In the “Alphabetic List of Terms” and in the “Concordance List”, Non-Descriptors (synonyms) are printed in italics.
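The “Concordance” described above is a permuted index: every word of a compound term becomes an entry word that leads back to the full term and its preferred form. The following Python sketch shows the idea only; the function name `concordance` and the optional `use` mapping are assumptions for this example, and the real list may apply additional rules (such as skipping prepositions) that are not modelled here.

```python
from collections import defaultdict


def concordance(terms, use=None):
    """Build a permuted index in which every word of a term is an entry word.

    `use` optionally maps non-descriptors to their preferred terms.
    Each entry word maps to a sorted list of (term, preferred term) pairs.
    """
    use = use or {}
    index = defaultdict(set)
    for term in terms:
        preferred = use.get(term, term)
        for word in term.lower().split():
            index[word].add((term, preferred))
    return {word: sorted(entries) for word, entries in sorted(index.items())}
```

For instance, calling `concordance(["air pollution", "soil pollution"])` files both terms under the entry word `pollution`, so a user who only remembers the second word of a compound term can still find it.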
The following abbreviations are used in GEMET:
S: Indicates the line number of the term in the “Systematic List of Descriptors”
T: Indicates the line number of the term in the “Thematic List of Descriptors”
Language abbreviations according to ISO standard 639-2:
|usa:||American English (not ISO)|
|Supergroup||1||Natural Environment, Anthropic environment|
|1||ENV||ENVIRONMENT (natural environment, anthropic environment)|
|4||ATM||ATMOSPHERE (air, climate)|
|5||HYD||HYDROSPHERE (freshwater, marine water, waters)|
|6||LIT||LITHOSPHERE (soil, geological processes)|
|7||LAN||LAND (landscape, geography)|
|8||BIO||BIOSPHERE (organisms, ecosystems)|
|9||ANT||ANTHROPOSPHERE (built environment, human settlements)|
|Supergroup||2||Human activities and products, Effects on the environment|
|10||CHE||CHEMISTRY, SUBSTANCES, PROCESSES|
|11||PHY||PHYSICAL ASPECTS, NOISE, VIBRATIONS, RADIATIONS|
|13||RSC||RESOURCES (utilisation of resources)|
|15||AGR||AGRICULTURE, FORESTRY; ANIMAL HUSBANDRY; FISHERY|
|16||IND||INDUSTRY, CRAFTS; TECHNOLOGY; EQUIPMENTS|
|20||WAS||WASTES, POLLUTANTS, POLLUTION|
|Supergroup||3||Social aspects, Environmental policy measures|
|23||LEG||LEGISLATION, NORMS, CONVENTIONS|
|24||ADM||ADMINISTRATION, MANAGEMENT, POLICY, POLITICS, INSTITUTIONS, PLANNING|
|26||INF||INFORMATION, EDUCATION, CULTURE, ENVIRONMENTAL AWARENESS|
|3||air||Air||air, air pollution (acidification, stratospheric ozone, tropospheric oxidants), air pollution control|
|4||bio||Biology||Organisms (also genetically modified organisms), biological properties, processes, biosystems|
|5||bui||Building||buildings, built-up area, infrastructure|
|6||che||Chemistry||chemical substances, properties and processes|
|8||dyn||natural dynamics||natural hazards, geophysical processes|
|10||ene||Energy||energy and power, energy sources and consumption|
|11||enp||Environmental policy||Environmental information, e.g. CDS; land cover, remote sensing, environmental impact assessment (EIA), environmental auditing, target setting, environmental expenditures|
|13||fod||food, drinking water|
|15||gen||General||no special theme|
|17||hea||human health||nutrition, medical aspects, safety|
|19||ind||Industry||industry, mining, handicraft, technology, technical procedures and equipment|
|23||nat||natural areas, landscape, ecosystems||natural reserves, parks, landforms|
|26||pll||Pollution||pollution, pollution control, general pollutants (not special substances)|
|27||prd||materials, products, equipments||materials, raw materials and products, physical properties and processes, state of matter|
|29||rec||Tourism||Recreation and tourism|
|31||rsc||Resources||use of resources (not special materials as resources)|
|32||saf||disasters, accidents, risk, safety||Contaminated sites, chemical risk, technical hazards, safety control|
|34||soc||social aspects, population||social aspects, production, consumption, culture, education, household, labour|
|35||soi||Soil||soil, soil pollution, soil pollution control|
|37||tra||Transportation, traffic||traffic and transportation|
|38||urb||urban environment, urban stress||Settlements|
|39||was||Waste||waste, waste treatment, waste control|
|40||wat||Water||Hydrosphere, water, waters, waste water|
After the distribution of different intermediate versions, GEMET versions 0.5, 1.0 and 1.5 were subjected to extensive work by CNR and UBA that led to version 2.0. More translations and some corrections formed version 2000. With the aim of completing the European languages in GEMET, the Basque, Bulgarian, Russian and Slovenian languages have been added to this version. No changes have been made to the term count or the hierarchy. Please find a report on this work in Annex 4.
The remarks and suggestions provided by the colleagues of Belgium, Sweden, Portugal, Norway, Austria, France, by the ETC for Nature Conservation and by US EPA and by UNEP GRID Geneva have been taken into account. Most of them have been applied to GEMET, to the extent they were not interfering with:
The resulting version presents:
The following table summarises the structural elements of GEMET.
|Top Terms (TT)||109|
|Narrower Terms (NT)||5.189|
|Total descriptors (TT + NT)||5.298|
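The figures quoted in this document can be cross-checked with simple arithmetic. The short Python check below uses only numbers stated in the text (109 Top Terms, 5.189 Narrower Terms, 1.264 English synonyms); the variable names are chosen for this example, and the period used as thousands separator in the text is dropped in the code.

```python
TOP_TERMS = 109
NARROWER_TERMS = 5189
SYNONYMS_EN = 1264

# Total descriptors = Top Terms + Narrower Terms
descriptors = TOP_TERMS + NARROWER_TERMS

# Total terms = descriptors + English synonyms (non-descriptors)
terms = descriptors + SYNONYMS_EN

print(descriptors, terms)  # 5298 6562
```

Both results agree with the totals given in the text: 5.298 descriptors and 6.562 terms.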
Due to the lack of a complete list of equivalents for all the above-mentioned languages, the two alphabetical lists and the multilingual list will be presented in this printed version only with British English as the filing language. Separate lists for the other languages will be available from the EEA on request.