Publications
Towards Semi-Supervised Data Quality Detection In Graphs
13th International Workshop on Quality in Databases (QDB) | VLDB 2024
Rubab Zahra Sarfraz
Graph databases have emerged as a powerful tool for representing and analyzing complex relationships in domains such as social networks, healthcare, and financial systems. Despite their growing popularity, data quality issues such as node duplication, missing nodes or edges, incorrect formats, stale data, and misconfigured topology remain prevalent. While numerous libraries and approaches address data quality in tabular data, graph-structured data pose unique challenges. In this paper, we explore an automated approach for detecting data quality issues in graph-structured data that considers both node attributes and relationships. Since data quality is often governed by pre-established rules and is highly context-dependent, our approach seeks to balance rule-based control with the automation potential of machine learning. We investigate the capabilities of graph convolutional networks (GCNs) and large language models (LLMs) in detecting data quality issues using a few-shot learning approach. We evaluate the data quality detection rates of these models on a graph dataset and compare their effectiveness and potential impact on improving data quality. Our results indicate that LLMs generalize robustly from limited samples, while GCNs offer distinct advantages in certain contexts.
Vizard: Improving Visual Data Literacy With Large Language Models
7th International Workshop on Big Data Visual Exploration and Analytics (BigVis) | VLDB 2024
Rubab Zahra Sarfraz, Samar Haider
Data visualizations are commonplace in both our professional and personal lives. From workplace dashboards to health charts to spending trackers, we interact with them almost every day. However, despite their crucial role in communicating information, many people still struggle to use these tools effectively and draw meaningful insights from them. This issue is particularly acute in developing countries, where language barriers and limited technology skills present additional challenges for data visualization literacy. In this paper, we present Vizard: a dashboard companion that uses large language models to analyze data visualizations for users, explain their elements in the user's language of choice, and provide insights and recommendations based on the observed trends, tailored to the user's industry and job role. We pair this with a novel framework for evaluating visualization literacy that uses procedurally generated questions tailored to participants' interests and current visualization literacy level. We make Vizard open source to encourage more research in this direction.
Risk Factors and Outcomes of Delirium in Hospitalized Older Adults with COVID-19: A Systematic Review and Meta-analysis
Aging and Health Research (AHR) | Elsevier
Nida Munawar, Rubab Zahra Sarfraz, Maria Costello, David Robinson, Colm Bergin, Elaine Greene
Older adults with COVID-19 are more likely to present with atypical symptoms, notably delirium. The main objective of this meta-analysis is to identify risk factors for delirium, and outcomes of delirium, in hospitalized older adults (aged 65 years or older) with COVID-19. Our review identifies key factors associated with an increased risk of developing delirium in this population. Identifying patients at risk of delirium and attending to these factors early during admission may improve outcomes for this vulnerable cohort.