Sitemap
A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.
Pages
Posts
Juggling Hats…At a Startup
Published:
I started as a business/data analyst, then pivoted to being a data engineer focused on building infrastructure while also acting as an analytics engineer, giving teams visibility into their products…because that couldn’t wait, obviously (it’s a startup, duh). Soon I became the go-to data person for almost everyone in the company. That was actually the good part, because people were craving visibility and I was making dashboards and standardizing the KPIs…
Evaluating Ceph Deployments with Rook At CERN
Published:
CERN has been using Ceph since 2013. In addition to operating one of the largest Ceph clusters, it is also an active contributor to the Ceph community. CERN benefits from Ceph in several different ways, including:
portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
publications
Risk factors and outcomes of delirium in hospitalized older adults with COVID-19: A systematic review and meta-analysis.
Published in Aging and Health Research, 2023
Older adults with COVID-19 are more likely to present with atypical symptoms, notably delirium. The main objective of this meta-analysis is to identify risk factors for delirium and outcomes of delirium in hospitalized older adults (65 years or above) with COVID-19.
VIZARD: Improving Visual Data Literacy With Large Language Models
Published in VLDB 2024 Workshop: International Workshop on Big Data Visual Exploration and Analytics (BigVis 2024), 2024
Data visualizations are commonplace in both our professional and personal lives. From workplace dashboards to our health charts to spending trackers, we interact with them almost every day. However, despite their crucial role in communicating information to us, many people still struggle to effectively use these tools and draw meaningful insights from them. This issue is particularly acute in developing countries where language barriers and limited technology skills present additional challenges for data visualization literacy. In this paper, we present Vizard: a dashboard companion that uses large language models to analyze data visualizations for users and explain their elements in their language of choice, as well as providing insights and recommendations based on the trends observed according to the user’s industry and job role. We pair this with a novel framework for evaluating visualization literacy which uses procedurally generated questions that are tailored to participants’ interests and current visualization literacy level. We make Vizard open source to encourage more research in this direction.
Towards Semi-Supervised Data Quality Detection In Graphs
Published in VLDB 2024 Workshop: 13th International Workshop on Quality in Databases (QDB’24), 2024
Graph databases have emerged as a powerful tool for representing and analyzing complex relationships in various domains, including social networks, healthcare, and financial systems. Despite their growing popularity, data quality issues such as node duplication, missing nodes or edges, incorrect formats, stale data, and misconfigured topology remain prevalent. While there are numerous libraries and approaches for addressing data quality in tabular data, graph-structured data pose unique challenges of their own. In this paper, we explore an automated approach for detecting data quality issues in graph-structured data which focuses on both node attributes and relationships. Since data quality is often governed by pre-established rules and is highly context-dependent, our approach seeks to balance rule-based control with the automation potential of machine learning. We investigate the capabilities of graph convolutional networks (GCNs) and large language models (LLMs) at detecting data quality issues using a few-shot learning approach. We evaluate the data quality detection rates of these models on a graph dataset and compare their effectiveness and potential impact on improving data quality. Our results indicate that LLMs exhibit robust generalization capabilities from limited samples while GCNs offer distinct advantages in certain contexts.
talks
teaching
Teaching Assistant: Programming Fundamentals - Python
Workshop, University Of Engineering And Technology, Computer Science & Engineering Department, 2016
Instructor: Dr. Irfan Ullah Chaudhary
Teaching Assistant: Advanced Operating Systems
Graduate course, Lahore University Of Management Sciences, Computer Science Department, 2018
Instructor: Dr. Muhammad Hamad Alizai
Instructor: Introduction To Data Science
Summer course, Lahore School Of Economics, All Departments, 2023