UMAP '23: Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization


SESSION: Knowledge Graphs, Semantics, Social and Adaptive Web

Combining Graph Neural Networks and Sentence Encoders for Knowledge-aware Recommendations

In this paper, we present a strategy to provide users with knowledge-aware recommendations based on the combination of graph neural networks and sentence encoders. In particular, our approach relies on the intuition that different data sources (i.e., structured data available in a knowledge graph and unstructured data, such as textual content) provide complementary information and can equally contribute to learn an accurate item representation. Accordingly, we first exploited graph neural networks to encode both collaborative features, such as the interactions between users and items, and structured properties of the items. Next, we used a sentence encoder that relies on transformers to learn a representation based on textual content describing the items. Finally, these embeddings are combined by exploiting a deep neural network where both self-attention and cross-attention mechanisms are used to learn the relationships between the initial embeddings and to further refine the representation. Such a neural network provides as output a prediction of users’ interest in the items, which is used to return a top-k recommendation list. In the experimental evaluation, we carried out an experiment against two datasets, and the results showed that our approach overcame several competitive baselines.

SESSION: Intelligent User Interface

Human Expectations and Perceptions of Learning in Machine Teaching

Interactive interfaces in tandem with Machine Learning (ML) models support user understanding of model uncertainty, build confidence, improve predictive accuracy and enable users to teach application-specific concepts that are difficult for the model to learn otherwise. These systems offer empirically proven benefits due to tightly coupled feedback loops and workflow scaffolding. However, deployment with ML non-experts who cannot manage the complex, expertise-heavy process remains challenging. Through deployment with non-expert users in a common classification task, we investigate the impact of human factors of machine teaching interfaces such as user expectations, their perceptions of the learning process and user engagement with respect to teaching process and outcomes. We measure how affective and performance attributes shape the success or failure of the process. Finally, we reflect on how intelligent user interfaces can be designed to accommodate these factors for successful deployment with a broad spectrum of human adjudicators.

Human-centered Information Visualization Adaptation Engine

Data Analytics is the art of turning data into insights for efficient and effective business decisions. Data visualization is among the most powerful tools in the data analyst’s arsenal, enabling the transformation of data into effective visualizations that can be easily comprehended. However, its effectiveness is often affected by the data analysts’ experience and their ability to quickly understand and interpret information. Even though business analytics tools have made significant progress in delivering immersive data visualization environments that improve users’ efficiency and effectiveness, they still do not consider individual differences in the core process that influences the visualization structure, encoding, and readability.

This paper leverages the users’ individual differences to deliver a novel human-centered by-design adaptation engine for business users. The adaptation engine aims to improve the comprehension of data visualizations by delivering personalized content (visualization type and adaptation of visual elements), which in turn leads to improved accuracy and time-to-action efficiency. The proposed adaptation mechanism is evaluated using 45 professional business analysts from multiple industry sectors. The results suggest that individual differences can play an important role in the adaptation process of data visualizations enhancing analysts’ comprehensibility and decision making.

Interactive Personalization of Classifiers for Explainability using Multi-Objective Bayesian Optimization

Explainability is a crucial aspect of models which ensures their reliable use by both engineers and end-users. However, explainability depends on the user and the model’s usage context, making it an important dimension for user personalization. In this article, we explore the personalization of opaque-box image classifiers using an interactive hyperparameter tuning approach, in which the user iteratively rates the quality of explanations for a selected set of query images. Using a multi-objective Bayesian optimization (MOBO) algorithm, we optimize for both the classifier’s accuracy and the perceived explainability ratings. In our user study, we found Pareto-optimal parameters for each participant that could significantly improve explainability ratings of queried images while minimally impacting classifier accuracy. Furthermore, this improved explainability with tuned hyperparameters generalized to held-out validation images, with the extent of generalization being dependent on the variance within the queried images, and the similarity between the query and validation images. This MOBO-based method has the potential to be used in general to jointly optimize any machine learning objective along with any human-centric objective. The Pareto front produced after the interactive hyperparameter tuning can be useful during deployment, allowing for desired trade-offs between the objectives (if any) to be chosen by selecting the appropriate parameters. Additionally, user studies like ours can assess if commonly assumed trade-offs, such as accuracy versus explainability, exist in a given context.
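
As an illustration of how a Pareto front supports choosing a trade-off at deployment time, the following minimal sketch filters candidate hyperparameter settings down to the non-dominated ones. It is not the authors' MOBO implementation; it shows only the dominance check that defines a Pareto front when both objectives are maximized. The candidate labels and the accuracy and rating values are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str              # hypothetical label for one hyperparameter setting
    accuracy: float        # classifier accuracy (higher is better)
    explainability: float  # mean user rating (higher is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = a.accuracy >= b.accuracy and a.explainability >= b.explainability
    better = a.accuracy > b.accuracy or a.explainability > b.explainability
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Hypothetical values, for illustration only.
candidates = [
    Candidate("A", 0.92, 2.1),
    Candidate("B", 0.90, 3.8),
    Candidate("C", 0.85, 3.5),  # dominated by B on both objectives
    Candidate("D", 0.80, 4.6),
]
front = pareto_front(candidates)
front_names = [c.name for c in front]
```

Any setting left in `front` is a defensible deployment choice; which one to pick depends on how much accuracy one is willing to trade for higher explainability ratings.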

Service-based Presentation of Multimodal Information for the Justification of Recommender Systems Results

The current models for the explanation and justification of recommender systems results focus on qualitative and quantitative data about items, overlooking the power of images to describe the different aspects of experience that the consumer should expect from their selection to post-sales. In the present paper, we extend previous justification models by exploiting object recognition on images to support a service-oriented presentation of multimodal (textual, quantitative, and images) information about items. As a testbed for our model, we chose the home-booking domain. In a user study, we found that item comparison can be enhanced by empowering the user to filter multimodal data based on a set of evaluation dimensions describing the experience with items. These results encourage the introduction of service-based filters for multimodal information retrieval in product and service catalogs.

User Needs for Explanations of Recommendations: In-depth Analyses of the Role of Item Domain and Personal Characteristics

Explanations can be provided with different goals, such as clarifying how the system works, how well the recommended item meets the user’s preferences, and how an explanation helps the user select an item faster. Although extensive research has been conducted in this research line, not much attention is paid to investigating user needs for explanations. To the best of our knowledge, no studies provide related insights, especially from the perspectives of item domain and personal characteristics. Up to now, it is not completely clear if user needs for explanations change across different item domains and vary according to user characteristics. To analyze these aspects, we developed three web-based prototype recommender systems for low-, average-, and high-involvement item domains and conducted a user study with 553 participants from different countries. Related results show that, in high-involvement item domains, users tend to have a look at explanations when they are not satisfied with the recommended items. An opposite tendency was found in low- and average-involvement item domains. Statistically, there is insufficient evidence to suggest correlations between users’ needs for explanations and item domains or between users’ needs and personal characteristics. However, the descriptive statistics show that users’ need for explanations varies across different item domains. In this study, we also found the best explanation approaches to be used in a specific recommendation domain.

SESSION: Personalization for Persuasive and Behavior Change Systems

Investigating the effectiveness of persuasive justification messages in fair music recommender systems for users with different personality traits

In recent decades, music recommender systems have become increasingly popular and have attracted a lot of research attention. While there has been significant progress in algorithm design to improve the quality of recommendations for listeners, there are new research challenges arising in large-scale systems which have to consider the interests of both listeners and artists. To ensure a sustainable community of artists and a diversity of genres, artists, and songs, the recommender needs to ensure that new artists have a chance to be heard and rated. So, in addition to the objective of optimizing the recommendation to the preferences and enjoyment of the listener, a large-scale MRS has a “fairness” objective to provide new artists (the protected group) with an opportunity to be heard. Previous research shows that using persuasive explanations can increase user acceptance of the recommended items. We propose to use persuasive justification messages for songs of new artists to influence user acceptance and satisfaction with these recommendations. The messages are designed to implement the six popular Cialdini persuasive strategies. We explore the effects of different persuasive messages on users with different Big-5 (OCEAN) personality types in an online study (n=205). The findings show that users with different personality traits are receptive to different persuasive messages and suggest how to personalize the persuasive justifications to amplify their effect for users with different personalities. These results can guide the development of personalized/adaptive persuasive recommendation justifications for fair music recommender systems, leading to better user satisfaction and mitigating the “rich get richer” effect in large-scale music recommender systems, ensuring diversity of content and sustainability of the community.

Personalizing Time Loss Aversion to Reduce Social Media Use

This study examines the effectiveness of a novel personalization approach for persuasive and behavior change systems: time loss aversion. Focusing on time instead of money, it influences behavior. Two interventions were developed and tested to reduce daily social media use and boost non-digital activities. Participants received information about their average daily social media use in terms of a week, month, and year, with one intervention offering substitute activities. Among 231 participants, both interventions successfully reduced intentions for future social media use, revealing generational effects, with Gen Zs using social media more than Millennials. These results have practical implications for personalizing interventions that reduce excessive digital engagement.
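
The reframing of daily use in terms of a week, month, and year is simple arithmetic, sketched below. The function name, the 30-day month, and the 365-day year are assumptions made for illustration; the abstract does not state how the study computed or worded these figures.

```python
def project_usage(daily_minutes: float) -> dict:
    """Project average daily social media use onto longer time spans, in hours.

    Assumes a 30-day month and a 365-day year (illustrative choices).
    """
    hours_per_day = daily_minutes / 60
    return {
        "week": hours_per_day * 7,
        "month": hours_per_day * 30,
        "year": hours_per_day * 365,
    }


# Hypothetical participant averaging two hours per day.
projection = project_usage(120)
```

For this hypothetical participant the yearly figure is 730 hours, which is the kind of aggregate a time-loss framing would present to the user.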

Toward Changing Users behavior with Emotion-based Adaptive Systems

Interactive computer systems’ designers emphasize the importance of considering humans, their emotions, and behaviors as first-class entities. Emotions are integral parts of human nature, and ignoring them can lead interactive systems to failure, low quality, or discomfort. User interfaces (UIs) are increasingly becoming adaptive to users’ various characteristics, intending to improve users’ satisfaction, performance, and decisions. However, the previous approaches proposed for supervising such adaptations are not effectively adopted in real-life problems. This paper proposes a novel approach to adapting UIs to users’ emotions using Model-Free Reinforcement Learning (MFRL). The approach aims to maximize essential adaptations and minimize unnecessary ones towards users’ task completion and satisfaction. We chose emergency evacuation training as a suitable evaluation domain since people experience intense emotions in potential danger. We performed experiments with a mobile application we developed that acts as a recommender system in emergency training. By taking contextual input of the users’ basic emotions from face recognition, the application intelligently adapts its UI to quickly lead people to safe areas while arousing target emotions. The research includes literature analysis, surveys, and an iterative process of implementation and experimentation. The evaluation process confirms the efficiency and effectiveness of MFRL across iterations, as well as in comparison with other possible UI adaptation techniques, i.e., rule-based and sequential adaptation.

Towards a Personalized Online Fake News Taxonomy

Fake news has become a serious and destabilizing problem in our increasingly polarized society. The core tasks of detecting and characterizing untrue or misleading online content are quite challenging. News consumers’ intervention has been identified as a crucial addition to Fake News Detection Systems (FNDS). However, even though detection explanation is starting to gain research momentum, not as much attention has been given to personalization of explanations. As humans are the obvious targets of fake news, explanations that can evoke emotional responses and/or are aligned with individual personality traits and cognitive styles can be leveraged to nudge the news consumer into a reflective state about the subject, which has been shown to be more effective than the crude presentation of facts in changing pre-conceived beliefs. This paper adds to the main goal of misinformation detection systems, aiming to expand them into personalized FNDS. It proposes a metric to be used in their evaluation. It offers a definition to help research on the implementation of personalized fake news explanations. Finally, it proposes a personalized fake news taxonomy, discussing its components centered around emotion-based and personality-based explanations. This taxonomy highlights several opportunities for those researching in the area of personalized fake news explanation systems.

SESSION: Personalizing Learning Experiences through User Modeling

A Bandit You Can Trust

This work proposes Dynamic Linear Epsilon-Greedy, a novel contextual multi-armed bandit algorithm that can adaptively assign personalized content to users while enabling unbiased statistical analysis. Traditional A/B testing and reinforcement learning approaches have trade-offs between empirical investigation and maximal impact on users. Our algorithm seeks to balance these objectives, allowing platforms to personalize content effectively while still gathering valuable data. Dynamic Linear Epsilon-Greedy was evaluated via simulation and an empirical study in the ASSISTments online learning platform. In simulation, Dynamic Linear Epsilon-Greedy performed comparably to existing algorithms and in ASSISTments, slightly increased students’ learning compared to A/B testing. Data collected from its recommendations allowed for the identification of qualitative interactions, which showed high and low knowledge students benefited from different content. Dynamic Linear Epsilon-Greedy holds promise as a method to balance personalization with unbiased statistical analysis. All the data collected during the simulation and empirical study are publicly available at

Composing Groups in Collaborative Learning by Pair Personality Differences

Previous studies have shown that the personality composition of a group significantly affects learners’ satisfaction during collaborative learning. However, while these studies investigated a group as a whole by focusing on group statistics, such as the mean and standard deviation of the members’ personalities, they paid little attention to the personality differences of individual pairs within the group, albeit the group contains many pairwise interactions. In this paper, we studied whether and how pairwise personality differences between a learner and groupmates affect the learner’s satisfaction. Examining data collected from an employee training program during which learners had reflective group discussions, we confirmed that pairwise personality differences significantly affect a learner’s level of satisfaction in the program. Specifically, satisfaction is affected by (1) the average of the personality differences between the learner and each individual groupmate, which reflects the degree to which the learner is different from the groupmates on average, and (2) the personality difference from the groupmate who has the most different/similar personality from/to the learner.
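
The two pairwise quantities named above, the average difference from groupmates and the difference from the most different or most similar groupmate, can be sketched as follows. Representing a personality as a vector of trait scores and measuring difference by Euclidean distance are assumptions made for illustration; the abstract does not state which difference measure the study used.

```python
import math
from typing import Dict, Sequence


def pairwise_difference_features(
    learner: Sequence[float],
    groupmates: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Summarize how a learner's trait vector differs from each groupmate's.

    Difference is taken as Euclidean distance (an illustrative assumption).
    """
    diffs = [math.dist(learner, mate) for mate in groupmates]
    return {
        "mean_diff": sum(diffs) / len(diffs),  # (1) average pairwise difference
        "max_diff": max(diffs),                # (2) most different groupmate
        "min_diff": min(diffs),                # (2) most similar groupmate
    }


# Hypothetical two-trait scores for one learner and three groupmates.
features = pairwise_difference_features((3, 3), [(3, 3), (0, 7), (3, 7)])
```

Unlike a group-level mean or standard deviation, these features are computed per learner, so two members of the same group can receive different values.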

Curb Your Procrastination: A Study of Academic Procrastination Behaviors vs. A Planning and Time Management App

Procrastination is a major issue faced by students which can lead to negative impacts on their academic performance and mental health. Productivity tools aim to help individuals to alleviate this behavior by providing self-regulatory support. However, the processes of how these applications help students conquer academic procrastination are under-explored. Particularly, it is essential to understand what aspects of these applications help which kinds of students in accomplishing their academic tasks. In this paper, we address this gap by presenting an academic planning and time management app (Proccoli) and a study designed to understand the association between student procrastination modeling, in-app behaviors, and perceived performance with app evaluation. As the core of our study, we analyze student perceptions of Proccoli and its impact on their study tasks and time management skills. Then, we model student procrastination behaviors by Hawkes process mining, assess student in-app behaviors by specifying planning and performance-related measures and evaluate the relationship between student behaviors and the evaluation survey results. Our study shows a need for personalized self-regulation support in Proccoli, as students with different in-app studying behaviors are found to have different perceptions of the app functionalities and the association between the prompts for social accountability students received by using Proccoli and their procrastination behavior is significant.

How Close are Predictive Models to Teachers in Detecting Learners at Risk?

Detecting learners in need of support is a complex process for both teachers and machines. Most prior work has devised visualization tools that allow teachers to do so by analyzing educational indicators. Other recent efforts have been devoted to models that predict whether learners might be at risk. However, the question of how teacher-like the model behaves in this detection task remains unanswered. In this paper, we investigate the (dis)agreement between teachers and model decisions, using a real-world flipped course as a case study. From the model perspective, we considered a well-known neural network, trained on educational indicators extracted from online pre-class logs. To gather teachers’ understanding, we employed a crowdsourcing approach including over 360 human intelligence tasks from 60 university teachers. We asked each recruited teacher to analyze visualizations pertaining to four relevant educational indicators of a given learner, and reason about their probability of failing the course (and so requiring support). Learners presented to teachers were selected to address different aspects of model confidence and (in)accuracy. Our results show that teacher and model predictions diverged for students who passed the course, while predictions were similar for students who failed the course. Moreover, confidence and correctness were more aligned in teachers than in the model, reducing the unknown risks originally present in models. The source code is available at

Improving Proactive Dialog Agents Using Socially-Aware Reinforcement Learning

The next step for intelligent dialog agents is to escape their role as silent bystanders and become proactive. Well-defined proactive behavior may improve human-machine cooperation, as the agent takes a more active role during interaction and takes responsibility off the user. However, proactivity is a double-edged sword because poorly executed pre-emptive actions may have a devastating effect on the task outcome and the relationship with the user. For designing adequate proactive dialog strategies, we propose a novel approach including both social and task-relevant features in the dialog. Here, the primary goal is to optimize proactive behavior so that it is task-oriented (implying high task success and efficiency) while also being socially effective by fostering user trust. Including both aspects in the reward function for training a proactive dialog agent using reinforcement learning showed the benefit of our approach for more successful human-machine cooperation.

Temporal-Weighted Bipartite Graph Model for Sparse Expert Recommendation in Community Question Answering

Community Question Answering (CQA) websites are valuable knowledge repositories where individuals exchange information by asking and answering questions. With an ever-increasing number of questions and high in-flow and out-flow of users in these communities, a key challenge is to design effective strategies for recommending experts for new questions. This requires robust approaches that facilitate modeling users’ expertise given their changing interests and sparse historical data, at the same time being computationally less expensive for periodic updates. In this paper, we propose a simple graph diffusion-based expert recommendation model for CQA, that can outperform state-of-the-art convolutional neural network and transformers-based deep learning representatives and collaborative models. Our proposed method learns users’ expertise in the context of both semantic and temporal information to capture their changing interests and activity levels with time. Experiments on six real-world datasets from the Stack Exchange network demonstrate that our approach outperforms competitive baseline methods. Further, experiments on cold-start users (users with a limited historical record) show our model achieves an average of 50% performance gain compared to the best baseline method.

SESSION: Personalized Recommender Systems

Combining Reinforcement Learning and Spatial Proximity Exploration for New User and New POI Recommendations

Tourism Recommender Systems (TRSs) are unable to properly suggest new points of interest (POIs) to new users, i.e., to solve the combined new user and new item problem. To address this limitation we introduce a Reinforcement Learning TRS, which is called QEXP, and relies on a POI visits behaviour model mined from logs of POI visits data. This data is combined with general knowledge about the spatial range of tourists’ movements in a destination to generate recommendations. QEXP can recommend new POIs possessing the features of POIs experienced by tourists in the past, while favouring the exploration of POIs in the proximity of the target tourist position. We compare QEXP with four state-of-the-art POI RSs and we show that it can successfully tame the new user and new item problems. QEXP can also mitigate the concentration and popularity biases of the compared RSs and can recommend diverse and geographically dispersed POIs.

Evaluating Pre-training Strategies for Collaborative Filtering

Pre-training is essential for effective representation learning models, especially in natural language processing and computer vision-related tasks. The core idea is to learn representations, usually through unsupervised or self-supervised approaches on large and generic source datasets, and use those pre-trained representations (aka embeddings) as initial parameter values during training on the target dataset. Seminal works in this area show that pre-training can act as a regularization mechanism placing the model parameters in regions of the optimization landscape closer to better local minima than random parameter initialization. However, no systematic studies evaluate the effectiveness of pre-training strategies on model-based collaborative filtering. This paper conducts a broad set of experiments to evaluate different pre-training strategies for collaborative filtering using Matrix Factorization (MF) as the base model. We show that such models equipped with pre-training in a transfer learning setting can vastly improve the prediction quality compared to the standard random parameter initialization baseline, reaching state-of-the-art results in standard recommender systems benchmarks. We also present alternatives for the out-of-vocabulary item problem (i.e., items present in target but not in source datasets) and show that pre-training in the context of MF acts as a regularizer, explaining the improvement in model generalization.
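
A minimal sketch of using pre-trained embeddings as initial parameter values, with a fallback for out-of-vocabulary items, is given below. The function name, the dictionary-based embedding table, and the small random Gaussian fallback are assumptions made for illustration; the abstract mentions alternatives for the out-of-vocabulary problem without saying what they are, so this fallback should not be read as one of the authors' proposals.

```python
import random
from typing import Dict, List, Sequence


def init_item_embeddings(
    target_items: Sequence[str],
    pretrained: Dict[str, List[float]],
    dim: int,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Build the initial item-embedding table for training on the target dataset.

    Items seen in the source dataset start from their pre-trained vectors.
    Out-of-vocabulary items fall back to small random values (an assumption).
    """
    rng = random.Random(seed)
    table = {}
    for item in target_items:
        if item in pretrained:
            table[item] = list(pretrained[item])  # copy, so training can update it
        else:
            table[item] = [rng.gauss(0.0, 0.01) for _ in range(dim)]
    return table


# Hypothetical source embeddings and target catalogue.
source_embeddings = {"item_a": [0.1, 0.2], "item_b": [0.3, -0.4]}
init = init_item_embeddings(["item_a", "item_z"], source_embeddings, dim=2)
```

Here `item_a` starts from its pre-trained vector, while `item_z`, which is absent from the source data, starts from random values, as it would under standard random initialization.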

Modelling the Training Practices of Recreational Marathon Runners to Make Personalised Training Recommendations

These days we have all become increasingly aware of the role that exercise plays in a healthy lifestyle. Activities such as cycling, triathlons, and running have become popular ways for people to keep fit and test their abilities. For recreational athletes there is no shortage of training advice or programmes to follow, yet most offer only one-size-fits-all, or minimally tailored guidance, which often leaves novices under-supported on their fitness journeys. In this work, we describe a case-based reasoning system to generate personalised training recommendations for marathon runners, based on their training histories and the training histories of similar runners with comparable race goals. The system harnesses the type of activity data that is routinely collected by smartwatches and apps like Strava. It uses prefactual explanations to suggest to runners how they may wish to adjust their training as their fitness goals evolve. We evaluate the approach using a large-scale dataset of more than 300,000 real-world runners and we show that it is feasible to generate tailored, personalised recommendations for up to 80% of these runners. Additionally, we show that the recommendations produced are realistic and reasonable for a runner to implement, as part of their training programme. These suggestions typically include a small number (3-5) of incremental training adaptations, such as a change in weekly distance, long-run distance, or mean training pace. We argue that by engaging runners in this type of dialog about their training progress and race goals, we can better support novice runners, as their training unfolds, which may help to keep runners motivated on their long journey to race day.

Promoting Tail Item Recommendations in E-Commerce

The research area of recommender systems (RS) in e-commerce has become extremely popular in recent years. However, traditional RSs tend to recommend popular items, while niche (long-tail) items are often neglected, which is known as the long-tail problem. Yet recent studies found that tail items are one of the key success factors in the e-commerce world. The availability of such items encompasses relatively high marginal profits and boosts the sales of popular short-head items. We suggest promoting long-tail items by leveraging the short-head items’ advantages and exposing the user to a tail item that may not have been considered otherwise. We use a classification model and statistical tools to generate personalized recommendations of a long-tail item considering a short-head item that has already been clicked. The uniqueness of our method lies in the combination of tail and head items to uplift the exposure of the tail items and in using an applicable solution to deal with the extreme volume of tail items. We demonstrate the effectiveness of our method on real-world data from eBay and provide an analysis of the long-tail phenomenon and consumption behavior.

Together Yet Apart: Multimodal Representation Learning for Personalised Visual Art Recommendation

With the advent of digital media, the availability of art content has greatly expanded, making it increasingly challenging for individuals to discover and curate works that align with their personal preferences and taste. The task of providing accurate and personalized Visual Art (VA) recommendations is thus a complex one, requiring a deep understanding of the intricate interplay of multiple modalities such as image, textual descriptions, or other metadata. In this paper, we study the nuances of modalities involved in the VA domain (image and text) and how they can be effectively harnessed to provide a truly personalized art experience to users. Particularly, we develop four fusion-based multimodal VA recommendation pipelines and conduct a large-scale user-centric evaluation. Our results indicate that early fusion (i.e., joint multimodal learning of visual and textual features) is preferred over a late fusion of ranked paintings from unimodal models (state-of-the-art baselines) but only if the latent representation space of the multimodal painting embeddings is entangled. Our findings open a new perspective for better representation learning in the VA RecSys domain.

SESSION: Research Methods and Reproducibility

A Missing Piece in the Puzzle: Considering the Role of Task Complexity in Human-AI Decision Making

Recent advances in the performance of machine learning algorithms have led to the adoption of AI models in decision making contexts across various domains such as healthcare, finance, and education. Different research communities have attempted to optimize and evaluate human-AI team performance through empirical studies by increasing transparency of AI systems, or providing explanations to aid human understanding of such systems. However, the variety in decision making tasks considered and their operationalization in prior empirical work has led to an opacity around how findings from one task or domain carry forward to another. The lack of a standardized means of considering task attributes prevents straightforward comparisons across decision tasks, thereby limiting the generalizability of findings. We argue that the lens of ‘task complexity’ can be used to tackle this problem of under-specification and facilitate comparison across empirical research in this area. To retrospectively explore how different HCI communities have considered the influence of task complexity in designing experiments in the realm of human-AI decision making, we survey literature and provide an overview of empirical studies on this topic. We found a serious dearth in the consideration of task complexity across various studies in this realm of research. Inspired by Robert Wood’s seminal work on the construct, we operationalized task complexity with respect to three dimensions (component, coordinative, and dynamic) and quantified the complexity of decision tasks in existing work accordingly. We then summarized current trends and proposed research directions for the future. Our study highlights the need to account for task complexity as an important design choice. This is a first step to help the scientific community in drawing meaningful comparisons across empirical studies in human-AI decision making and to provide opportunities to generalize findings across diverse domains and experimental settings.

Leveraging Causal Inference to Measure the Impact of a Mental Health App on Users’ Well-being

As stated in the United Nations’ Sustainable Development Goals, poor mental well-being is one of the biggest problems we are facing worldwide. One possible way of addressing it is through interventions delivered via digital devices since they are scalable, ubiquitous and inexpensive. This is also confirmed by the ever-growing plethora of e-health mobile apps being developed. Although these apps rely to some extent on scientific bases, there is still much work to do to understand the effect of specific digital interventions on app users. To shed light on these effects, we ask what types of interventions within the app have the most significant impact on well-being, and to what extent longer engagement leads to improved outcomes. These questions could be answered with dedicated Randomized Controlled Trials (RCTs), which are generally expensive, time-consuming, and single-purposed. To overcome these difficulties, we adopt instrumental variables on a combination of data collected in an RCT, behavioural data from the app, and a randomized recommender system, to evaluate intervention and app dose-response effects on users’ self-reported well-being. Thus, we present a general causal inference approach for extending results from collected data in RCTs applied in the context of digital health intervention. Following this approach, we show how to measure the impact of different types of activities on the users’ well-being. This allows us to identify the most impactful activities in the app (namely, sleep and relaxation activities), which have direct implications for the app design. In addition, we show a positive effect of longer app usage.

SESSION: Responsibility, Compliance, and Ethics

Amplifying Artists’ Voices: Item Provider Perspectives on Influence and Fairness of Music Streaming Platforms

The majority of music consumption nowadays takes place on music streaming platforms. Whichever artists, albums, or songs are exposed to consumers on these platforms therefore greatly influences what music is ultimately consumed. As a result, the impact of these platforms on artists, their main item providers, is considerable. The recommender systems at the core of streaming platforms, though, have traditionally been developed focusing on end consumer objectives. Only recently have researchers started to include item provider objectives, though rarely through reaching out to item providers directly. By omitting this important stakeholder’s point of view, we risk not understanding what artists value most, and might miss first-hand ideas on how to improve streaming platforms and recommender systems. Therefore, we conducted semi-structured interviews to capture the artists’ view. Specifically, we explore artists’ considerations regarding fairness, transparency, and diversity in music recommender systems, and the role artists envision for streaming platforms regarding those topics. We identify some topics with a clear consensus among artists, such as desiring more control over which music is recommended to whom, and expecting streaming platforms to actively increase music diversity in recommendations. In contrast, artists’ opinions differ on whether platforms should actively intervene in recommender systems to, e.g., increase localization or gender balance. Further, we observe that artists often take user preferences into account and even suggest new platform functionality to benefit both users and item providers. We encourage utilizing these insights when designing and evaluating music streaming platforms and recommender systems.

SESSION: Virtual Assistants, Conversational Interactions, and Personalized Human-robot Interaction

A Comparative Analysis of Automatic Speech Recognition Errors in Small Group Classroom Discourse

In collaborative learning environments, effective intelligent learning systems need to accurately analyze and understand the collaborative discourse between learners (i.e., group modeling) to provide adaptive support. We investigate how automatic speech recognition (ASR) errors influence discourse models of small group collaboration in noisy real-world classrooms. Our dataset consisted of 30 students recorded by consumer off-the-shelf microphones (Yeti Blue) while engaging in dyadic and triadic collaborative learning in a multi-day STEM curriculum unit. We found that two state-of-the-art ASR systems (Google Speech and OpenAI Whisper) yielded very high word error rates (0.822, 0.847) but very different profiles of error with Google being more conservative, rejecting 38% of utterances instead of 12% for Whisper. Next, we examined how these ASR errors influenced down-stream small group modeling based on pre-trained large language models for three tasks: Abstract Meaning Representation parsing (AMRParsing), on-task/off-task detection (OnTask), and Accountable Productive Talk prediction (TalkMove). As expected, models trained on clean human transcripts yielded degraded performance on all three tasks, measured by the transfer ratio (TR). However, the TR of the specific sentence-level AMRParsing task (.39-.62) was much lower than that of the abstract discourse-level OnTask (.63-.94) and TalkMove tasks (.64-.72). Furthermore, different training strategies that incorporated ASR transcripts alone or as augmentations of human transcripts increased accuracy for the discourse-level tasks (OnTask and TalkMove) but not AMRParsing. Simulation experiments suggested that the models were tolerant of missing utterances in the dialog context, and that jointly improving ASR accuracy on important word classes (e.g., verbs and nouns) can improve performance across all tasks.
Overall, our results provide insights into how different types of NLP-based tasks might be tolerant of ASR errors under extremely noisy conditions and provide suggestions for how to improve accuracy in small group modeling settings for a more equitable, engaging, and adaptive collaborative learning environment.

SESSION: Doctoral Consortium

Adaptive Context-Aware Planning Support for Students with Autism

Managing university life is challenging for any student, including fast-paced self-reliant learning, first experiences with independent daily living, and the social demands of student life. These struggles are even more pronounced in students with Autism Spectrum Disorder (ASD), who might face additional difficulties, including interpersonal deficits, organisational challenges, lacking self-advocacy skills and sensory overload. This thesis uses a participatory design process to research how an adaptive, context-aware system could support students with autism in planning their student life, including managing time, tasks, stress and sensory stimulation. The first interviews resulted in a focus on intelligent planning systems with a low-effort interaction design. The design process will incrementally build towards intelligent support for personalised interactive scheduling as well as in-the-moment next-action recommendations that are context-aware and adaptive to data on the user’s stress level and sensory stimulation exposure.

Combining Heterogeneous Embeddings for Knowledge-Aware Recommendation Models

In the last few years, Knowledge-Aware Recommender Systems (KARSs) have attracted increasing interest in the community thanks to their ability to encode diverse and heterogeneous data sources, both structured (such as knowledge graphs) and unstructured (such as plain text). Indeed, as several pieces of evidence show, the combination of such information allows KARSs to provide competitive performance in several scenarios. In particular, state-of-the-art KARSs leverage the current wave of deep learning and are able to process and exploit large corpora of information that provide complementary and useful characteristics of the items, including knowledge graphs, descriptive properties, reviews, text, and multimedia content. The objective of my Ph.D. is to investigate methods to design and develop knowledge-aware recommendation models based on the merging of heterogeneous embeddings. Based on the combination of diverse information sources, I plan to develop novel models able to provide accurate, fair, and explainable recommendations.

Fairness and Sustainability in Multistakeholder Tourism Recommender Systems

In the travel industry, Tourism Recommender Systems (TRS) are gaining popularity as they simplify trip planning for travelers by offering personalized recommendations for accommodations, activities, destinations, and more. Ensuring fairness in TRS involves considering the needs and viewpoints of different stakeholders, including consumers, item providers, the platform, and society. Although previous research has focused on fairness in TRS from a multistakeholder perspective, little attention has been given to generating sustainable recommendations.

This doctoral thesis introduces the concept of Societal Fairness (S-Fairness) to consider the impact of tourism on non-participating stakeholders (society) such as residents, who may be affected by tourism issues such as increased housing prices, environmental pollution, and traffic congestion. The objective of this research is to contribute to the field of TRS by (1) modeling sustainability for societal fairness, (2) developing a fair multistakeholder TRS that balances sustainability concerns with other stakeholders while minimizing trade-offs, and (3) evaluating the approach through user studies and offline dataset evaluation to ensure user acceptance of recommendations.

How To Model Users Through Their Abilities: A Methodological Perspective On Ability-Based Design

Ability-Based Design offers a design approach to support practitioners in creating accessible and individually optimized systems through a shift in focus to abilities instead of disabilities. As such, it could help to minimize moments of exclusion through technology and offer support for people, especially during times of crisis or social distancing as caused by the COVID-19 pandemic. However, in its current theoretically coined state, the approach includes no instruction on how to achieve this shift in mindset and how to apply a focus on abilities when designing systems. Within my thesis, I address this gap in research by analyzing ways for modeling users based on their abilities. For this, I propose ability models as user representation, identify the challenges as well as potentials of such ability models, and examine how this approach for the personalization of systems can help and support user populations. The overall aim and expected benefit of this research is to provide adequate techniques on how to include abilities into design considerations as well as an understanding of what accommodating for abilities might change, achieve or effect in terms of HCI research and design.

Overcoming Customisation Challenges in Information Dashboards

Scalable and Explainable Linear Shallow Autoencoders for Collaborative Filtering from Industrial Perspective

The popularity of linear shallow autoencoders for collaborative filtering is growing in the research community, and internet industry providers of Recommender Systems are also taking notice. However, despite their simplicity and accuracy, these models often cannot be used in real-world industrial recommender systems due to their inability to scale to very large interaction matrices. Our research aims to address this issue by developing a scalable, explainable, and accurate shallow linear autoencoder method for collaborative filtering that meets the demands of real-world recommenders. In this paper, we present our industrial Ph.D. research project, which includes: (1) the development of a scalable method called ELSA and the adaptation of the method to a large real-world recommender and (2) the creation of a framework to visualize the recommender systems insights based on modeling the distribution of retrieval metrics in latent user space. We discuss the current status of our project, the key steps to finish the project, and the possible future extensions after the dissertation.

Sustainability-oriented Recommender Systems

The Influence of Media Bias on News Recommender Systems

Currently, I am at the beginning of my fourth year of a structured PhD programme with an expectation to graduate in May 2024. The advancement of Internet technology has led to the proliferation of accessible online news media, which has overwhelmed people’s lives. Online news platforms have developed personalised recommendation systems to help readers avoid information overload and enhance their experience. However, the filter bubble, one of the side effects of personalised news recommendations, has received severe criticism for limiting readers’ perspectives. Media bias, which is one of the factors causing the “filter bubble” phenomenon, is widely present in news media. It has been extensively studied in the field of social sciences due to its unconscious distortion of readers’ views. Although many studies have focused on examining the effect of media bias on users and their political choices, there is still a lack of direct research on the impact of media bias on news dissemination platforms, such as personalised news recommender systems. My PhD research project aims to explore the influence of media bias on news recommender systems, and understand the factors that accelerate the recommendation of biased news to readers. The goal is to help algorithm designers gain insight into the sensitivity of proposed recommendation algorithms to media bias, and to design debiasing algorithms that weaken the impact of media bias on news recommender systems.

SESSION: Tutorials

Accountable Knowledge-aware Recommender Systems

Knowledge-aware algorithms represent one of the most innovative research directions in the area of recommender systems. The use of different types of content representation requires new methods to extract descriptive features to adopt in the recommendation process. The literature on knowledge-aware recommender systems is actually rich and constantly evolving in terms of both techniques and software libraries to implement them. This also makes it difficult to define reproducible recommendation pipelines, making the accountability of recommender systems a challenge. This tutorial aims to discuss the most recent trends in the area of knowledge-aware recommender systems, including novel representation methods for textual content, and discuss how to implement reproducible pipelines for knowledge-aware recommender systems. We pursue our goals by using a comprehensive Python framework called ClayRS to deal with knowledge-aware recommender systems. We would like to provide: (i) common ground for researchers and practitioners interested in the latest knowledge-aware techniques for user modeling and recommender systems; (ii) a practical way for implementing the whole recommendation pipeline, ranging from the content processing for text to the generation of recommendations and the evaluation of their performance.

Tutorial on User Profiling with Graph Neural Networks and Related Beyond-Accuracy Perspectives

The proposed tutorial aims to introduce the UMAP community to modern user profiling approaches leveraging graph neural networks (GNNs). We will begin by discussing the conceptual foundations of user profiling and GNNs and providing a literature review of the two topics. We will then present a systematic overview of the state-of-the-art GNN architectures designed for user profiling, including the types of data that are typically used for this purpose. We will also discuss ethical considerations and beyond-accuracy perspectives (i.e., fairness and explainability), which can arise within the potential applications of adopting GNNs for user profiling. In the practical session of the tutorial, attendees will have the opportunity to understand concretely how recent GNN models for user profiling are built and trained with open-source tools and publicly available datasets. The audience will also be engaged in investigating the impact of the presented models on case studies involving bias detection and mitigation, as well as user profile explanations. The tutorial will end with an analysis of existing and emerging open challenges in the field and their future research directions.

User Models as Digital Twins: Using Webassembly Techniques to ensure Privacy, Transparency and Control in Personalization

The half-day tutorial demonstrates how WebAssembly techniques can be used to create sandboxed user models as digital twins within web and mobile applications. Such models can learn from the user’s behaviour and personalise the application to the user, while ensuring transparency of the model and the ability of the user to experiment with and control the personalization. The advantage of using WebAssembly techniques to implement digital twin user models is that the user model is a plugin, insulated from the application, thus protecting the privacy of the user model.