My research revolves around artificial intelligence, machine learning and data mining.
More specifically, I am interested in Social Media Analysis, Popularity Modelling and Prediction and Online Privacy.
Previous interests include knowledge injection into non-supervised learning algorithms, data representation and temporal evolutions.
M.-A. Rizoiu, S. Mishra, Q. Kong, M. Carman, and L. Xie, "SIR-Hawkes: Linking Epidemic Models and Hawkes Processes to Model Diffusions in Finite Populations," in Proc. International Conference on World Wide Web (WWW '18), Lyon, France, 2018.
preprint + SI:
Source code and datasets
Q. Kong, M.-A. Rizoiu, S. Wu, and L. Xie, "Will This Video Go Viral? Explaining and Predicting the Popularity of Youtube Videos," in Proc. International Conference on World Wide Web Companion (WWW '18), Lyon, France, 2018.
preprint + SI:
HIPie public installation, YouTube screencast, source code
M.-A. Rizoiu and L. Xie, "Online Popularity under Promotion: Viral Potential, Forecasting, and the Economics of Time," in Proc. International AAAI Conference on Web and Social Media (ICWSM '17), Montréal, Canada, p. 10, 2017.
preprint + SI:
Source code and dataset
M.-A. Rizoiu, L. Xie, S. Sanner, M. Cebrian, H. Yu, and P. Van Hentenryck, "Expecting to be HIP: Hawkes Intensity Processes for Social Media Popularity," in Proc. International Conference on World Wide Web (WWW '17), Perth, Australia, pp. 735–744, 2017.
preprint + SI:
Source code and dataset, interactive visualization system
S. Mishra, M.-A. Rizoiu, and L. Xie, "Feature Driven and Point Process Approaches for Popularity Prediction," in Proc. International Conference on Information and Knowledge Management (CIKM '16), Indianapolis, USA, pp. 1069–1078, 2016.
preprint + SI:
Presentation page, source code and dataset
M.-A. Rizoiu, L. Xie, T. Caetano, and M. Cebrian, "Evolution of Privacy Loss on Wikipedia," in Proc. International Conference on Web Search and Data Mining (WSDM '16), 2016, pp. 215–224.
preprint + SI:
Y.-M. Kim, J. Velcin, S. Bonnevay, and M.-A. Rizoiu, "Temporal Multinomial Mixture for Instance-Oriented Evolutionary Clustering," in Proc. European Conference on Information Retrieval (ECIR '15), 2015, pp. 593–604.
M.-A. Rizoiu, "Semi-Supervised Structuring of Complex Data," in Proc. Doctoral Consortium of the International Joint Conference on Artificial Intelligence (IJCAI '13), 2013, pp. 3239–3240.
M.-A. Rizoiu, J. Velcin, and S. Lallich, "Structuring typical evolutions using Temporal-Driven Constrained Clustering," in Proc. International Conference on Tools with Artificial Intelligence (ICTAI '12), 2012, pp. 610–617.
C. Musat, J. Velcin, S. Trausan-Matu, and M.-A. Rizoiu, "Improving topic evaluation using conceptual knowledge," in Proc. International Joint Conference on Artificial Intelligence (IJCAI '11), 2011, pp. 1866–1871.
C. Musat, J. Velcin, M.-A. Rizoiu, and S. Trausan-Matu, "Concept-based Topic Model Improvement," in Proc. International Symposium on Methodologies for Intelligent Systems (ISMIS '11), 2011, pp. 133–142.
M.-A. Rizoiu, J. Velcin, and J.-H. Chauchat, "Regrouper les données textuelles et nommer les groupes à l'aide des classes recouvrantes," in Proc. Extraction et Gestion des Connaissances (EGC '10), 2010, pp. 561–572. preprint:
M.-A. Rizoiu, Y. Lee, S. Mishra, and L. Xie, "A Tutorial on Hawkes Processes for Events in Social Media," in book: Research Frontiers of Multimedia, S.-F. Chang (Ed.), pp. 1–26, 2017. ACM Books. (to appear)
M.-A. Rizoiu and J. Velcin, "Topic Extraction for Ontology Learning," in book: Ontology Learning and Knowledge Discovery Using the Web: Challenges and Recent Advances, pp. 38–61, 2011.
M.-A. Rizoiu, "Semi-Supervised Structuring of Complex Data," PhD Thesis, University Lumière Lyon 2, June, 2013.
M.-A. Rizoiu, "Textual Data Clustering and Cluster Naming," Master's Dissertation, 2009.
My research revolves around artificial intelligence, machine learning and data mining. More specifically, I am interested
in Social Network Analysis, popularity prediction, knowledge injection into non-supervised learning algorithms, data
representation and temporal evolutions. I deal with large datasets of complex data (text, images), often drawn from
online social media, and my main tools are modeling and simulation, clustering and topic modeling.
A few more details
My current research interest is the theoretical modeling of popularity in online media, as well as estimating the influence of media content and network characteristics on online attention.
We established a generative model that predicts online attention, based on an exogenously driven, self-exciting Hawkes process.
We also examine the geographical diffusion of media content, with the goal of generating statistical descriptions of content
diffusion over time and across geographical areas.
We handle very large Twitter datasets (the network), which relate to YouTube videos (the content).
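As a rough illustration of what "exogenously driven, self-exciting" means, the sketch below evaluates the conditional intensity of a generic Hawkes process: an external rate plus a decaying contribution from each past event. The exponential kernel, the parameter names (`alpha`, `beta`) and the helper `constant_rate` are assumptions made for this example only; they are not taken from the model described above.

```python
import math

def hawkes_intensity(t, event_times, exo_rate, alpha, beta):
    """Event rate at time t for a Hawkes process (illustrative sketch).

    lambda(t) = exo_rate(t) + sum over past events t_i < t of
                alpha * exp(-beta * (t - t_i))

    The exponential kernel is an assumed, commonly used choice.
    """
    # Endogenous part: every past event raises the rate, and its
    # contribution decays exponentially as time passes.
    endogenous = sum(
        alpha * math.exp(-beta * (t - t_i))
        for t_i in event_times
        if t_i < t
    )
    # Exogenous part: external stimulus, independent of past events.
    return exo_rate(t) + endogenous

def constant_rate(t):
    """Hypothetical external stimulus: a constant background rate."""
    return 0.5

events = [1.0, 2.0]  # times of past events (e.g. shares of a video)
lam = hawkes_intensity(3.0, events, constant_rate, alpha=0.8, beta=1.0)
print(f"intensity at t=3.0: {lam:.4f}")
```

Before the first event the intensity equals the external rate alone; each later event adds a bump that fades at rate `beta`, which is what makes the process self-exciting.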
My previous work dealt with how partial expert information can be leveraged in a non-supervised learning algorithm that handles complex data.
This complex data is of different natures (text, image); it is temporal and structured, and linked to a knowledge repository (e.g. an ontology) and/or labeled.
Semi-supervised clustering is used to model the additional information (structure, labels, time) and to inject the heterogeneous information into the
A series of applications emerges from the theoretical research: using the temporal dimension to detect temporal patterns and typical evolutions,
using image labels to improve the numerical representation of images, and automatic topic evaluation using concept trees.
A science slam is a challenge to describe a research topic to a non-expert audience, with a twist of humor.
At the ICWSM'17 science slam I talked about how to link exogenous stimuli and endogenous reactions to explain online popularity.
2014 – present : Human Dynamics
The study of Human Dynamics operates at the intersection of the computer and social sciences, with a primary interest in
social and financial networks, crowdsourcing, urban economics, behavioral game theory, and evolutionary dynamics.
2012 – 2014 : Project IMAGIWEB (financed by
Analysis of the image life cycle of politicians and companies through online media and
Web 2.0 microblogging.
2012 – 2014 : Project CRTT – ERIC (financed by University Lyon2)
Study of the evolution of specialized discourse in the domain of nuclear medicine, taking into
account the temporal evolution of topics and the different populations involved.
2010 – 2011 : Project ERIC-ELICO (financed by University Lyon2)
Joint analysis of the information extracted either by data mining techniques or manually by experts in
2009 : Project CONVERSESSION (financed by the Rhône-Alpes region)
Design of a novel platform for organizing and analyzing online debates. Project associated with the
incubation of a start-up enterprise.
02/10/2013 : CommentWatcher: plateforme Web open-source pour analyser les discussions sur des forums en ligne.
Invited speaker at the BLEND 2013 conference.
26/08/2013 : Extracting and evaluating topics. CommentWatcher, an online forum analysis tool.
Invited speaker in the Chronological Text Mining session of the Research reunion of the
59th ISI World Statistical Congress.
12/02/2013 : Advancements in temporal clustering.
Research reunion of the
21/01/2013 : Using a Pareto Front for a Non-Supervised Feature Construction Algorithm.
Thematic day of the
: Using Multi-criteria Optimization.
This course is for third-year undergraduates in the Research School of Computer Science.
It presents relational theory and conceptual modelling; privacy and security; statistical databases; distributed databases; data warehousing; data cleaning and integration;
and data mining concepts and techniques.
I give the lectures on databases and data warehousing, and on data cleaning and integration.
This course is for third-year undergraduates in the Research School of Computer Science,
as well as Honours students. It presents techniques related to processing online documents, such as (A) information retrieval,
(B) natural language processing, (C) machine learning for documents, and (D) relevant tools for the Web. I give the lectures on
the machine learning part and the social media and sentiment analysis part.
This course is for second-year students in the Excellence European Master DMKM. It presents advanced machine
learning techniques. Together with S. Lallich, I present
association rule mining and class rule mining, ensemble methods (bagging, boosting) and statistical testing
procedures (cross-validation, Student's t-test, etc.).
In July 2009, I obtained my MSc (graduating first in my class, with honors) in Data Mining and Knowledge Management from the
Polytechnic School of the University of Nantes, France,
and wrote my Master's Thesis on "Textual Data Clustering and Cluster Naming" after an internship at the ERIC Laboratory.