E-SLIDE: A web-based text-mining tool for the construction of landslide catalogues

Dumitriu, M. (1,2), Bernhard, D. (1), Malet, J.-P. (2), Puissant, A. (3), Mathieu, A. (2)

(1) Linguistique, Langues, Parole, EA 1339, Département d’Informatique, Université de Strasbourg, 22 rue Descartes, F-67084 Strasbourg Cedex
(2) Institut de Physique du Globe de Strasbourg, CNRS UMR 7516, Université de Strasbourg, 5 rue Descartes, F-67084 Strasbourg Cedex
(3) Laboratoire Image, Ville, Environnement, CNRS UMR 7362, Université de Strasbourg, 3 rue de l’Argonne, F-67083 Strasbourg Cedex

Landslides are complex natural processes that constitute a serious natural hazard in many countries. The term covers a wide variety of slope movements, such as soil slips, deep-seated slides, mudflows, debris flows and rockfalls. In order to quantify their occurrence and associate them with a triggering event, it is necessary to construct event catalogues with information on the date, spatial location and intensity of the event and, where possible, on the observed damage.
Most landslides that cause damage (whatever the level of severity) are reported in on-line newspapers or in local information media just after the event. This type of inventory can be called a landslide-event inventory, as it is associated with a trigger. An overview of the syntactic structure of landslide event descriptions in the media indicates that most articles are constructed using a similar framework, and that text-mining techniques could therefore be used to automatically retrieve relevant information and store it in a database.
The objective of this work is to present the E-SLIDE service, which aims to mine a series of nearly 400 RSS feeds and internet-based newspapers every day. The service is built around a series of programs called from a Bash script, which parses the web sources and extracts landslide information from them daily.
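The feed-reading step can be illustrated with a minimal Python sketch. It is not the E-SLIDE implementation, which the abstract describes only as a series of programs driven by a Bash script; the function name `parse_rss_items`, the sample feed and its URLs are hypothetical, and the sketch assumes plain RSS 2.0 input.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the sketch runs without network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news</title>
    <item>
      <title>Landslide blocks mountain road</title>
      <link>http://example.org/a1</link>
      <description>A landslide closed the road on Monday.</description>
    </item>
    <item>
      <title>Local football results</title>
      <link>http://example.org/a2</link>
      <description>The home team won.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with its title, link and description."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None for a missing child, hence the fallback.
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], "-", entry["link"])
```

In a daily run, a function of this kind would be applied to the text downloaded from each feed, and the resulting items passed on to the selection and extraction steps.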
After a first automatic selection of the articles containing a series of keywords describing slope movements, the Unitex suite (i.e. a corpus processing system based on automata-oriented technology) is used to extract the event type, the event date, the geographical location of the event, the number of casualties, the possible damage and, if available, photographs of the event. All this information is automatically annotated and stored in a database. Two maps are then created, showing the location of the events and the cumulative number of events per country.
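The keyword pre-selection and the per-country count behind the second map can be sketched as follows. This is an illustration only: it uses plain regular expressions rather than Unitex grammars, the keyword list is taken from the slope-movement types named earlier in this abstract and is not the service's actual list, and the names `find_keywords` and `count_events_by_country` as well as the event records are hypothetical.

```python
import re
from collections import Counter

# Assumed keyword list, taken from the slope-movement types named in the abstract.
KEYWORDS = ["landslide", "mudflow", "debris flow", "rockfall", "soil slip"]

# One case-insensitive pattern matching any keyword, with an optional plural "s".
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")s?\b",
    re.IGNORECASE,
)


def find_keywords(text):
    """Return the sorted, lower-cased set of keywords found in the text."""
    return sorted({m.group(1).lower() for m in KEYWORD_RE.finditer(text)})


def count_events_by_country(events):
    """Count event records per country; each record is a dict with a 'country' key."""
    return Counter(event["country"] for event in events)


if __name__ == "__main__":
    print(find_keywords("Two rockfalls and a Debris flow hit the valley."))
    # Hypothetical event records, as they might look after extraction.
    events = [{"country": "France"}, {"country": "Italy"}, {"country": "France"}]
    print(count_events_by_country(events))
```

An article would be kept for further processing only when `find_keywords` returns a non-empty list. The counts returned by `count_events_by_country` correspond to the quantity displayed on the per-country map.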
The service has been developed for the French Landslide Observatory (OMIV) by the OMIV-EOST team at the University of Strasbourg.