Introduction

This project was developed at the LISIS laboratory (UPEM, Paris) in collaboration with INRA, with Nicola Ricci and Marc Barbier as my tutors throughout the whole process. Cortext is an open platform developed at LISIS that helps researchers process and analyze data. It provides tools such as term extraction, named entity recognition, network mapping, geolocation and more.
The objective of the project is to carry out a digital enquiry into the agroecological turn in Costa Rica, and more broadly in Central America, by setting up consistent and appropriate datasets in order to analyze the production and circulation of knowledge through different social media channels and web pages.
Understanding this objective and putting it in the context of a computer engineering student, my job as an intern is to create a mining program that extracts as much data as possible from the datasets defined with the researchers, and then to process and analyze this data to meet the objective.
During this period of data mining and analysis, after a brief exploration of which social media are most popular in Central America and where the greatest amount of information can be found, it was decided to explore:

  1. The repositories of the universities and research centers of Costa Rica
  2. Facebook, YouTube and Twitter
We chose the Costa Rican universities because they hold the largest amount of the country's academic research. We also want to see the influence of language in each dataset (how many languages there are, and how to separate and treat the data), as well as the time period the data covers. Each source had to be explored differently and required different techniques.
For the search we defined two queries, "agroecología OR agroecologico OR agricultura ecológica OR agro-ecología" and "agroecology OR agroecological OR agriculture ecological OR agro-ecology". These queries return results in French, Italian, Spanish, Catalan, Portuguese and English. All of this data therefore has to be delimited and filtered by country and not just by language (because language alone can obscure the original objective).
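As a rough illustration of why filtering by country matters more than filtering by language, the idea can be sketched in Python as follows. This is only a minimal sketch and not the actual mining scripts: the record fields (`text`, `country`, `lang`), the country codes and the sample records are hypothetical, and simple accent-insensitive substring matching stands in for whatever search each platform really performs.

```python
import unicodedata

# The two queries from the text, split into their OR-separated terms.
SPANISH_TERMS = ["agroecología", "agroecologico", "agricultura ecológica", "agro-ecología"]
ENGLISH_TERMS = ["agroecology", "agroecological", "agriculture ecological", "agro-ecology"]


def normalize(text):
    """Lowercase and strip accents so 'Agroecología' matches 'agroecologia'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_query(text, terms):
    """Return True if any query term occurs in the text (accent-insensitive)."""
    normalized = normalize(text)
    return any(normalize(term) in normalized for term in terms)


def filter_records(records, countries):
    """Keep records that match either query AND come from one of `countries`."""
    all_terms = SPANISH_TERMS + ENGLISH_TERMS
    return [
        record
        for record in records
        if record.get("country") in countries
        and matches_query(record.get("text", ""), all_terms)
    ]


# Hypothetical sample records; real data would come from the mined datasets.
records = [
    {"text": "Taller de Agroecología en Cartago", "country": "CR", "lang": "es"},
    {"text": "Curso de agroecología en Sevilla", "country": "ES", "lang": "es"},
    {"text": "Agroecology field school", "country": "CR", "lang": "en"},
    {"text": "Receta de gallo pinto", "country": "CR", "lang": "es"},
]

kept = filter_records(records, {"CR"})
```

In this sketch the Spanish-language record from Spain is dropped while the English-language record from Costa Rica is kept, which a language-only filter would get the wrong way round.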
For the analysis part, we use Cortext and some of my own scripts, which can be found in the GitLab repository.

You can see more of the analysis for each social media platform here: