Speaker: Yicong Huang

Talk abstract: In an era where data-driven decision-making shapes industries, governments, and everyday life, the ability to leverage data science has become an essential skill. Modern data science techniques, including artificial intelligence (AI), machine learning (ML), and large language models (LLMs), offer advanced capabilities but often require programming expertise, which limits their accessibility to a broader audience. In this talk, I will discuss my work on Texera, an open-source system designed to make data science, AI, and ML accessible to everyone. I will begin by introducing Texera's no-code workflow interface and cloud-based platform, which let users from varied backgrounds collaborate seamlessly on data science, with an experience similar to Google Docs and Overleaf. Next, I will discuss the design choices behind Texera's actor-based parallel execution engine that allow users to interact with a workflow while it is running. I will then dive deep into my work on improving user interaction with this distributed parallel data engine, focusing on data debugging techniques that improve transparency and usability. Specifically, I will present Udon, a debugger for user-defined functions (UDFs) in data systems, and explain how it lets users interact with an operator with fine-grained control, down to individual lines of code. I will then present IcedTea, a time-travel debugger for data workflows, and demonstrate how it lets users interact with distributed operators while ensuring consistency. To conclude, I will outline future research directions toward an ecosystem that integrates advanced interfaces and intelligent systems to enhance accessibility, efficiency, and user empowerment in data science.

Bio: Yicong Huang is a final-year Ph.D. candidate in the Information Systems Group (ISG) of the Computer Science Department at the University of California, Irvine. Advised by Dr. Chen Li, he conducts research on big data management, data-processing systems, and systems for data science, AI, and ML. Yicong has made significant contributions to the Texera project. He has published in top-tier database venues such as SIGMOD and VLDB, and his interdisciplinary work has appeared in venues such as TOCHI, PNAS Nexus, JAMIA, AMIA, and PLOS ONE. Yicong completed research internships at ByteDance, VISA, and Observe, where he contributed to patents and papers. His research received a Best Demo Runner-Up Award at SIGMOD. His honors also include the 2024 Graduate Dean's Dissertation Fellowship and the 2023 Public Impact Fellowship from UCI. For more information about his work, please visit yicong-huang.github.io.

In-person event.