Research Projects

ECLADATTA

Keywords: Knowledge Extraction, Knowledge Graph, Joint Extraction Text/Table

Overview: I am the lead coordinator of the ECLADATTA project (2023 - 2026), a French national research project funded by the French National Research Agency (Agence Nationale de la Recherche - ANR) under grant ANR-22-CE23-0020. The consortium members are EURECOM, IRIT and Orange. ECLADATTA stands for ExtraCtion of LAtent knowledge in Documents by conjointly Analyzing Texts and TAbles. The project aims to leverage the complementarity between tables, texts, and knowledge graphs to propose a joint knowledge extraction and reconciliation process. Its overall objective, and its originality, is to propose new methods and develop tools:

  • to assess the relatedness between tables and texts (within documents and across documents) and build on-demand text-table corpora based on a variety of filtering criteria
  • to automatically extract knowledge jointly from tables and related texts
  • to check the consistency of knowledge from tables, texts, and knowledge graphs
  • to refine tables, texts, and general or domain-specific knowledge graphs

More info on the official Website: ECLADATTA

Knowledge

Keywords: Knowledge Graph, Large Language Models, Data Management

Context and Highlights:

  • Creation of Knowledge in 2023
  • Project manager with a research team of approx. 20 people

Overview: I lead a research project at Orange named Knowledge. This project contributes to advances in the scientific and technological state of the art in natural language processing (NLP) and knowledge engineering. These advances are then applied to new value-added service concepts for the Orange Group. The project brings together about 15 to 20 researchers and engineers in the fields of NLP and knowledge engineering.

NORIA

Keywords: Anomaly Detection, Cybersecurity, Knowledge Graph

Context and Highlights:

  • Co-creation of NORIA in 2015
  • Collaboration with EURECOM + co-supervision of a PhD student

Overview: The goal of the NORIA (machine learNing, Ontology and Reasoning for the Identification of Anomalies) research project is to build an innovative pipeline for advanced anomaly detection in the network infrastructure and cybersecurity application domains, with the help of knowledge graphs and neuro-symbolic models.


Past projects

DAGOBAH

Keywords: Tabular Data, Knowledge Extraction, Knowledge Graph, Semantic Annotation

Context and Highlights:

  • Co-creation of DAGOBAH in 2018
  • Project manager with a research team of approx. 14 people
  • Collaboration with EURECOM + co-supervision of a PhD student
  • 1st Prize (Accuracy track) SemTab2022, 1st Prize (Accuracy track) SemTab2021, 3rd Prize SemTab2020

Overview: Within the ever-expanding Web of data, more and more knowledge graphs (KGs) are becoming available. However, these KGs may suffer from inconsistency and incompleteness issues. Hence, one can envision correcting or completing KGs by extracting information from various sources such as tables and texts available on Web pages. Interestingly, tables often constitute a major source of information, since large parts of both companies' internal repositories and Web pages are represented in tabular formats. Additionally, besides KG completion, the automatic interpretation of tables by software agents can enable semantic-driven services to query, manipulate, and process heterogeneous table corpora, such as a dataset search “moving beyond keyword”.

DAGOBAH aims to propose solutions to semantically annotate tables and to exploit these annotations in search and recommendation use cases.

Dataforum

Keywords: Data Management, Data Catalog, Traceability, Knowledge Graph

Context:

SADFC: Semantic Analysis of Digital Forensic Cases

Keywords: Digital Forensics, Timeline Reconstruction, Semantic Web

Context:

  • Project subject of my PhD
  • Co-supervised by Prof. Christophe Nicolle and Dr. Aurélie Bertaux from the University of Burgundy and Prof. Tahar Kechadi from University College Dublin

Overview: The research addresses the reconstruction of events related to a digital incident in the field of computer forensics. It proposes a new approach to building semantically enriched incident timelines from large, heterogeneous data sources, using formally defined operators for analysis. This method aims to assist investigators with automatic analysis tools, resolve data heterogeneity issues, and ensure the reproducibility and credibility of investigation processes. As cybercrime grows, investigators face challenges in processing large, diverse data sets. The approach combines computer forensics and semantic web technologies, using an ontology to represent and analyze events in detail, aiding in the reconstruction of incident timelines.