Publications

Towards Semi-Supervised Data Quality Detection In Graphs

13th International Workshop on Quality in Databases (QDB) | VLDB 2024

Rubab Zahra Sarfraz

Graph databases have emerged as a powerful tool for representing and analyzing complex relationships in various domains, including social networks, healthcare, and financial systems. Despite their growing popularity, data quality issues such as node duplication, missing nodes or edges, incorrect formats, stale data, and misconfigured topology remain prevalent. While there are numerous libraries and approaches for addressing data quality in tabular data, graph-structured data pose unique challenges of their own. In this paper, we explore an automated approach for detecting data quality issues in graph-structured data that focuses on both node attributes and relationships. Since data quality is often governed by pre-established rules and is highly context-dependent, our approach seeks to balance rule-based control with the automation potential of machine learning. We investigate the capabilities of graph convolutional networks (GCNs) and large language models (LLMs) in detecting data quality issues using a few-shot learning approach. We evaluate the data quality detection rates of these models on a graph dataset and compare their effectiveness and potential impact on improving data quality. Our results indicate that LLMs exhibit robust generalization capabilities from limited samples while GCNs offer distinct advantages in certain contexts.

PDF Video

Vizard: Improving Visual Data Literacy With Large Language Models

7th International Workshop on Big Data Visual Exploration and Analytics (BigVis) | VLDB 2024

Rubab Zahra Sarfraz, Samar Haider

Data visualizations are commonplace in both our professional and personal lives. From workplace dashboards to health charts to spending trackers, we interact with them almost every day. However, despite their crucial role in communicating information, many people still struggle to use these tools effectively and draw meaningful insights from them. This issue is particularly acute in developing countries, where language barriers and limited technology skills present additional challenges for data visualization literacy. In this paper, we present Vizard: a dashboard companion that uses large language models to analyze data visualizations for users, explain their elements in the user’s language of choice, and provide insights and recommendations based on observed trends, tailored to the user’s industry and job role. We pair this with a novel framework for evaluating visualization literacy, which uses procedurally generated questions tailored to participants’ interests and current visualization literacy level. We make Vizard open source to encourage more research in this direction.

PDF Video

Risk Factors and Outcomes of Delirium in Hospitalized Older Adults with COVID-19: A Systematic Review and Meta-analysis

Aging and Health Research (AHR) | Elsevier

Nida Munawar, Rubab Zahra Sarfraz, Maria Costello, David Robinson, Colm Bergin, Elaine Greene

Older adults with COVID-19 are more likely to present with atypical symptoms, notably delirium. The main objective of this meta-analysis is to identify risk factors for delirium and outcomes of delirium in hospitalized older adults (65 years or above) with COVID-19. Our review identifies key factors associated with increased risk of developing delirium in hospitalized older adults with COVID-19. Identification of patients at risk of delirium and attention to these factors early during admission may improve outcomes for this vulnerable cohort.

PDF