Research Projects
ECLADATTA
Keywords: Knowledge Extraction, Knowledge Graph, Joint Extraction Text/Table
I am the lead coordinator of the ECLADATTA project (2023-2026), a French national research project funded by the French National Research Agency (Agence Nationale de la Recherche, ANR) under grant ANR-22-CE23-0020. The consortium members are EURECOM, IRIT, and Orange. ECLADATTA stands for ExtraCtion of LAtent knowledge in Documents by conjointly Analyzing Texts and TAbles. The project aims to leverage the complementarity between tables, texts, and knowledge graphs to propose a joint knowledge extraction and reconciliation process. The overall and original objective of ECLADATTA is to propose new methods and to develop tools:
- to assess the relatedness between tables and texts (within documents and across documents) and build on-demand text-table corpora based on a variety of filtering criteria
- to automatically extract knowledge jointly from tables and related texts
- to check the consistency of knowledge from tables, texts, and knowledge graphs
- to refine tables, texts, and general or domain-specific knowledge graphs
More information on the official website: ECLADATTA
Knowledge
Keywords: Knowledge Graph, Large Language Models, Data Management
Context and Highlights:
- Creation of Knowledge in 2023
- Project manager with a research team of approx. 20 people
Overview: I am leading a research project named Knowledge at Orange. This project contributes to advances in the scientific and technological state of the art in natural language processing and knowledge engineering. These breakthroughs are then applied to new value-added service concepts for the Orange Group. The project brings together about 15 to 20 researchers and engineers in the fields of NLP and knowledge engineering.
NORIA
Keywords: Anomaly Detection, Cybersecurity, Knowledge Graph
Context and Highlights:
- Co-creation of NORIA in 2015
- Collaboration with EURECOM + co-supervision of a PhD student
Overview: The goal of the NORIA (machine learNing, Ontology and Reasoning for the Identification of Anomalies) research project is to build an innovative pipeline for advanced anomaly detection in the network infrastructure and cybersecurity application domains, with the help of knowledge graphs and neuro-symbolic models.
Past projects
DAGOBAH
Keywords: Tabular Data, Knowledge Extraction, Knowledge Graph, Semantic Annotation
Context and Highlights:
- Co-creation of DAGOBAH in 2018
- Project manager with a research team of approx. 14 people
- Collaboration with EURECOM + co-supervision of a PhD student
- 1st Prize (Accuracy track) SemTab2022, 1st Prize (Accuracy track) SemTab2021, 3rd Prize SemTab2020
Overview: Within the ever-expanding Web of data, more and more knowledge graphs (KGs) are becoming available. However, these KGs may suffer from inconsistency and incompleteness. Hence, one can envision either correcting or completing KGs by extracting information from various sources, such as the tables and texts available in Web pages. Tables are a major source of information, since large parts of both companies' internal repositories and Web pages are represented in tabular formats. Besides KG completion, the automatic interpretation of tables by software agents can enable semantics-driven services to query, manipulate, and process heterogeneous table corpora, for example a dataset search “moving beyond keyword”.
DAGOBAH aims to propose solutions for semantically annotating tables and for exploiting these annotations in search and recommendation use cases.
- Related papers:
- DAGOBAH: Table and Graph Contexts For Efficient Semantic Annotation of Tabular Data
- A Framework for Automatically Interpreting Tabular Data at Orange
- DAGOBAH: Enhanced Scoring Algorithms for Scalable Annotations of Tabular Data
- DAGOBAH: An End-to-End Context-Free Tabular Data Semantic Annotation System
- DAGOBAH : Un système d’annotation sémantique de données tabulaires indépendant du contexte
- DAGOBAH : Activité de recherche Orange autour de l’annotation sémantique de données tabulaires
- Blogs:
- YouTube channel
Dataforum
Keywords: Data Management, Data Catalog, Traceability, Knowledge Graph
Context:
- Creation of Dataforum in 2015
- Project manager with a research team of approx. 10 people until 2019
Overview: TODO
SADFC: Semantic Analysis of Digital Forensic Cases
Keywords: Digital Forensics, Timeline Reconstruction, Semantic Web
Context:
- Subject of my PhD
- Co-supervised by Prof. Christophe Nicolle and Dr. Aurélie Bertaux (University of Burgundy) and Prof. Tahar Kechadi (University College Dublin)
Overview: Soon
- Related papers:
- An ontology-based approach for the reconstruction and analysis of digital incidents timelines
- A complete formalized knowledge representation model for advanced digital forensics timeline analysis
- Construction, Enrichment and Semantic Analysis of Timelines: Application to Digital Forensics
- Automatic timeline construction and analysis for computer forensics purposes
- De la scène de crime aux connaissances: représentation d’évènements et peuplement d’ontologie appliqués au domaine de la criminalistique informatique
- Reconstruction et analyse sémantique de chronologies cybercriminelles
- Event reconstruction: A state of the art