Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Posts

Juggling Hats…At a Startup

less than 1 minute read

Published: April 26, 2021

I started as a business/data analyst, then pivoted to data engineering, focusing on building infrastructure while also acting as an analytics engineer who gave teams visibility into their products…because that couldn’t wait, obviously (it’s a startup, duh). Soon I became the go-to data person for almost everyone in the company. That was actually the good part, because people were craving visibility and I was making dashboards and standardizing the KPIs…

Evaluating Ceph Deployments with Rook At CERN

less than 1 minute read

Published: October 29, 2018

CERN has been using Ceph since 2013. In addition to operating one of the largest Ceph clusters, it is also an active contributor to the Ceph community. CERN benefits from Ceph in several different ways, including:

portfolio

Portfolio item number 1

Short description of portfolio item number 1

Portfolio item number 2

Short description of portfolio item number 2

publications

Risk factors and outcomes of delirium in hospitalized older adults with COVID-19: A systematic review and meta-analysis.

Published in Aging and Health Research, 2023

Older adults with COVID-19 are more likely to present with atypical symptoms, notably delirium. The main objective of this meta-analysis is to identify risk factors for delirium and outcomes of delirium in hospitalized older adults (65 years or above) with COVID-19.

Download Paper

VIZARD: Improving Visual Data Literacy With Large Language Models

Published in VLDB 2024 Workshop: International Workshop on Big Data Visual Exploration and Analytics (BigVis 2024), 2024

Data visualizations are commonplace in both our professional and personal lives. From workplace dashboards to our health charts to spending trackers, we interact with them almost every day. However, despite their crucial role in communicating information to us, many people still struggle to effectively use these tools and draw meaningful insights from them. This issue is particularly acute in developing countries where language barriers and limited technology skills present additional challenges for data visualization literacy. In this paper, we present Vizard: a dashboard companion that uses large language models to analyze data visualizations for users and explain their elements in their language of choice, as well as providing insights and recommendations based on the trends observed according to the user’s industry and job role. We pair this with a novel framework for evaluating visualization literacy which uses procedurally generated questions that are tailored to participants’ interests and current visualization literacy level. We make Vizard open source to encourage more research in this direction.

Download Paper

Towards Semi-Supervised Data Quality Detection In Graphs

Published in VLDB 2024 Workshop: 13th International Workshop on Quality in Databases (QDB’24), 2024

Graph databases have emerged as a powerful tool for representing and analyzing complex relationships in various domains, including social networks, healthcare, and financial systems. Despite their growing popularity, data quality issues such as node duplication, missing nodes or edges, incorrect formats, stale data, and misconfigured topology remain prevalent. While there are numerous libraries and approaches for addressing data quality in tabular data, graph-structured data pose unique challenges of their own. In this paper, we explore an automated approach for detecting data quality issues in graph-structured data which focuses on both node attributes and relationships. Since data quality is often governed by pre-established rules and is highly context-dependent, our approach seeks to balance rule-based control with the automation potential of machine learning. We investigate the capabilities of graph convolutional networks (GCNs) and large language models (LLMs) at detecting data quality issues using a few-shot learning approach. We evaluate the data quality detection rates of these models on a graph dataset and compare their effectiveness and potential impact on improving data quality. Our results indicate that LLMs exhibit robust generalization capabilities from limited samples while GCNs offer distinct advantages in certain contexts.

Rubab Zahra Sarfraz

Pages

Page Not Found

About Me

Archive Layout with Content

Posts by Category

Posts by Collection

CV

Markdown

Page not in menu

Page Archive

Portfolio

Publications

Sitemap

Posts by Tags

Talk map

Talks and presentations

Teaching

Terms and Privacy Policy

Blog posts

Jupyter notebook markdown generator

Posts

Juggling Hats…At a Startup

Evaluating Ceph Deployments with Rook At CERN

portfolio

Portfolio item number 1

Portfolio item number 2

publications

Risk factors and outcomes of delirium in hospitalized older adults with COVID-19: A systematic review and meta-analysis.

VIZARD: Improving Visual Data Literacy With Large Language Models

Towards Semi-Supervised Data Quality Detection In Graphs

talks

Elevating Trust In Your Data With Python: A Practical Guide

teaching

Teaching Assistant: Programming Fundamentals - Python

Teaching Assistant: Advanced Operating Systems

Instructor: Introduction To Data Science