Friday, November 20, 2009, 4:00 pm
Room 302 HRBB
Sketching asynchronous data streams over sliding windows
Bojian Xu.
Department of Computer Science and Engineering, Texas A&M University
Abstract
Much real-world data naturally arrives as streams. Examples include network traffic at a router and the sequence of accesses to a large database. Such streaming data needs to be monitored online for various reasons, such as anomaly detection, load balancing, and even supporting business decisions. However, because of the large size of these streams, conventional data processing methods, such as storing the data in a database and issuing offline SQL queries, are not feasible.
In this talk, I will introduce data stream processing and then present the new asynchronous data streams model, which is motivated by applications involving network data. I will introduce a sampling technique for sketching the recent data elements over asynchronous data streams. This small-space sketch can return provably error-bounded estimates for two basic aggregates over the relevant stream elements: sum and median. I will conclude the talk with a quick overview of follow-up work on more general time-decayed asynchronous data stream processing, as well as related open problems.
Biography
Bojian Xu received his B.E. in Computer Science and Engineering from Zhejiang University, China, in 2000. He worked for China Mobile Communications Corporation from 2000 to 2004. After spending the Fall 2004 semester as a master's student in the Computer Science Department of the University of Alabama, he joined the Department of Electrical and Computer Engineering at Iowa State University, where he will be graduating with a Ph.D. in Computer Engineering in December 2009. He is currently a Senior Research Associate in the Department of Computer Science and Engineering at Texas A&M University, working with Professor Jeffrey Scott Vitter. His research interests are in developing algorithms and systems for managing large data sets. He has been working on managing distributed massive streaming data in the presence of memory and energy constraints, and is now more focused on compressed data structures for indexing massive data sets.