General information
Dr. Christian Desrosiers
Dr. Sylvie Ratté
Contacts (outliers)
Jose Pasillas
Contacts (LLTF)
Alpa Shah
Funding
Conacyt, Doctoral scholarship
ERRARE
The ERRARE project studies anomalies in all their forms and across all types of data. Its goal is to produce innovative techniques for detecting and interpreting these anomalous data values.
ERRARE: OUTLIERS
Ensemble outlier detection for data quality improvement: a diverse and adaptable approach
Outliers and errors are correlated problems present in most real-world datasets. However, current approaches to these problems work only for specific types of datasets, and they are limited to finding the anomalies rather than classifying them afterwards. We aimed to develop a unified process that adapts the set of anomaly detection techniques to the dataset under study and distinguishes errors from anomalies (which may represent interesting data).
This approach rests on the premise that adaptability is a key factor when evaluating different databases. We therefore propose a research project consisting of three components: a set of anomaly detection techniques of different types (combining accuracy and diversity to improve ensemble performance); the use of different samples produced by diverse techniques (avoiding the bias of relying on a single sample); and improved data quality measurement through the classification of the detected anomalies.
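A minimal sketch of the ensemble idea described above, under several assumptions: the data are one-dimensional numeric values; the three detectors (z-score, interquartile-range distance, and k-nearest-neighbour distance) are illustrative choices, not the project's actual set of techniques; and diversity comes from fitting every detector on different random subsamples. Each detector's scores are min-max normalized and then averaged. All function names are hypothetical. The sketch covers only the detection step; it does not classify detected values as errors or interesting anomalies.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical detector type: takes a reference sample and the full dataset and
# returns one outlier score per data point (higher means more anomalous).
Detector = Callable[[Sequence[float], Sequence[float]], List[float]]


def zscore_scores(reference: Sequence[float], data: Sequence[float]) -> List[float]:
    """Statistical detector: absolute deviation from the sample mean, in standard deviations."""
    mu = statistics.fmean(reference)
    sigma = statistics.pstdev(reference) or 1.0  # avoid division by zero
    return [abs(x - mu) / sigma for x in data]


def iqr_scores(reference: Sequence[float], data: Sequence[float]) -> List[float]:
    """Robust detector: distance outside the interquartile box, in IQR units (0 inside)."""
    q1, _, q3 = statistics.quantiles(reference, n=4)
    iqr = (q3 - q1) or 1.0
    return [max(q1 - x, x - q3, 0.0) / iqr for x in data]


def knn_scores(reference: Sequence[float], data: Sequence[float], k: int = 3) -> List[float]:
    """Distance-based detector: distance to the k-th nearest reference point.

    A point drawn into the reference sample counts itself at distance 0.
    """
    return [sorted(abs(x - r) for r in reference)[k - 1] for x in data]


def minmax(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so that different detectors can be averaged."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def ensemble_scores(data: Sequence[float], detectors: List[Detector],
                    n_rounds: int = 10, sample_frac: float = 0.7,
                    seed: int = 0) -> List[float]:
    """Average normalized scores over all detectors and several random subsamples."""
    rng = random.Random(seed)
    size = min(len(data), max(4, int(sample_frac * len(data))))
    totals = [0.0] * len(data)
    for _ in range(n_rounds):
        sample = rng.sample(list(data), size)  # a different sample each round
        for detect in detectors:
            for i, s in enumerate(minmax(detect(sample, data))):
                totals[i] += s
    return [t / (n_rounds * len(detectors)) for t in totals]


if __name__ == "__main__":
    values = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 35.0, 10.1, 9.7, 10.4]
    scores = ensemble_scores(values, [zscore_scores, iqr_scores, knn_scores])
    for v, s in sorted(zip(values, scores), key=lambda p: -p[1])[:3]:
        print(f"{v:6.1f}  score={s:.3f}")
```

In this example the value 35.0 receives the highest combined score. Averaging normalized scores is only one possible combination rule; taking the maximum, or combining ranks instead of raw scores, are common alternatives.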
ERRARE: LLTF
Abnormal Behavior Detection using Log Linear Tensor Factorization for Security Surveillance
Real-Time Location Systems (RTLS) based on Radio Frequency Identification (RFID) are a popular surveillance method for security. However, in an open and dynamic environment where patterns rarely repeat, it is difficult to implement a model that can analyze the incoming volume of real-time information to detect abnormal events. This incoming data can be represented as a multidimensional array, called a tensor. Latent information extracted with tensor decomposition methods has proved successful in representing such environmental data. Building on this, we propose a robust and scalable model that applies machine learning tools for clustering and prediction to these latent factors and thereby detects anomalies in real time.
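The paragraph above describes extracting latent factors from a tensor of location data and detecting anomalies from them. The following is a minimal sketch of that general idea under several stated assumptions: the tensor is indexed tag × zone × hourly slot and holds event counts (a hypothetical layout); it uses a plain least-squares rank-1 CP (CANDECOMP/PARAFAC) decomposition fitted by alternating least squares, not the log-linear tensor factorization named in the title; and it flags anomalies by the residual between a new observation and the reconstruction from factors fitted on history, rather than by the clustering and prediction models the project applies to the latent factors. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: a 3-way tensor as nested lists, x[tag][zone][slot] = event count.
Tensor = List[List[List[float]]]
Factors = Tuple[List[float], List[float], List[float]]


def _sq_norm(v: List[float]) -> float:
    return sum(t * t for t in v)


def rank1_cp(x: Tensor, n_iter: int = 50) -> Factors:
    """Fit x[i][j][k] ~ a[i] * b[j] * c[k] by alternating least squares.

    Each update solves the least-squares problem for one factor while the
    other two are held fixed. Assumes x is not the all-zero tensor.
    """
    I, J, K = len(x), len(x[0]), len(x[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K
    for _ in range(n_iter):
        d = _sq_norm(b) * _sq_norm(c)
        a = [sum(x[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K)) / d
             for i in range(I)]
        d = _sq_norm(a) * _sq_norm(c)
        b = [sum(x[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K)) / d
             for j in range(J)]
        d = _sq_norm(a) * _sq_norm(b)
        c = [sum(x[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J)) / d
             for k in range(K)]
    return a, b, c


def top_anomalies(x_new: Tensor, factors: Factors, n: int = 3):
    """Rank the cells of a new observation by absolute reconstruction residual."""
    a, b, c = factors
    residuals = [
        ((i, j, k), abs(x_new[i][j][k] - a[i] * b[j] * c[k]))
        for i in range(len(a)) for j in range(len(b)) for k in range(len(c))
    ]
    residuals.sort(key=lambda item: item[1], reverse=True)
    return residuals[:n]


if __name__ == "__main__":
    # Hypothetical history: 3 tags x 3 zones x 4 time slots with a regular pattern.
    tag_w, zone_w, slot_w = [2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [1.0, 1.0, 2.0, 1.0]
    history = [[[t * z * s for s in slot_w] for z in zone_w] for t in tag_w]
    factors = rank1_cp(history)

    # New observation: tag 1 shows 5 extra events in zone 0 during slot 3.
    current = [[row[:] for row in plane] for plane in history]
    current[1][0][3] += 5.0
    for cell, residual in top_anomalies(current, factors, n=2):
        print(cell, round(residual, 3))
```

Because the factors are fitted once on historical data, scoring each incoming tensor reduces to one pass over the new observation plus a sort. Realistic RFID traffic would likely call for a higher rank, or for a Poisson or log-linear model that treats the entries as counts.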