Big Data Systems: 2014-Present

The need for real-time, large-scale data processing has driven the development of frameworks for distributed stream processing in the cloud. To provide fast, scalable, and fault-tolerant stream processing, recent Distributed Stream Processing Systems (DSPS) treat a streaming workload as a series of small batch jobs rather than a series of individual records. Batch-based stream processing systems can process data at a high rate; however, batching also leads to large end-to-end latency. To minimize the end-to-end latency of a batched stream processing system (Apache Spark Streaming), we develop an online algorithm that dynamically adapts the block and batch intervals based on the workload and operating conditions. I am also interested in online anomaly detection systems, which identify abnormal behavior in multivariate time series.
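As a rough illustration of online interval adaptation, here is a minimal sketch built on a simple threshold rule of our own choosing. It is not the algorithm from this work. It assumes that a micro-batch system is stable only while each batch's processing time stays below the batch interval, and that a shorter interval cuts the time a record waits before its batch is scheduled. The class name, thresholds, step sizes, and the `target_partitions` heuristic for the block interval are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntervalController:
    """Hypothetical online controller for a micro-batch stream processor.

    Grows the batch interval when batches fall behind and shrinks it when
    there is slack, trading throughput headroom for lower latency.
    """
    batch_interval: float        # current batch interval, seconds
    min_interval: float = 0.1    # hypothetical lower bound, seconds
    max_interval: float = 10.0   # hypothetical upper bound, seconds
    target_ratio: float = 0.8    # desired processing_time / batch_interval
    step: float = 0.1            # additive adjustment, seconds
    backoff: float = 1.5         # multiplicative growth when unstable
    target_partitions: int = 8   # hypothetical: blocks (tasks) per batch

    def update(self, processing_time: float) -> float:
        """Feed the processing time of the last batch; return the new interval."""
        ratio = processing_time / self.batch_interval
        if ratio > 1.0:
            # Unstable: batches would queue up, so back off quickly.
            self.batch_interval *= self.backoff
        elif ratio > self.target_ratio:
            # Near the stability limit: grow gently.
            self.batch_interval += self.step
        else:
            # Slack available: shrink the interval to cut end-to-end latency.
            self.batch_interval -= self.step
        # Keep the interval within the configured bounds.
        self.batch_interval = min(max(self.batch_interval, self.min_interval),
                                  self.max_interval)
        return self.batch_interval

    @property
    def block_interval(self) -> float:
        # In Spark Streaming, a receiver creates about
        # batch_interval / block_interval blocks (partitions) per batch,
        # so this keeps the per-batch task count near target_partitions.
        return self.batch_interval / self.target_partitions
```

For example, starting from a 2 s batch interval, two batches that each finish in 1 s shrink the interval to about 1.8 s. A following 2.5 s batch then pushes it back up to about 2.7 s. The asymmetric rule (slow decrease, fast increase) favors stability over minimal latency.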


Energy Efficient System Design: 2012-2014

Many of today's data centers house tens of thousands of servers and consume tens of megawatts of power. Improving energy efficiency within a data center can therefore have a large positive financial impact. To this end, we aim to develop energy-efficient workload and VM placement algorithms that achieve the same performance while significantly reducing power consumption. In the “RESCUE” project, we assign VMs to physical machines based on the application-specified energy efficiency (ASEE), which quantifies the performance per watt of distinct applications in heterogeneous environments. “UPS-aware workload allocation” minimizes the total power consumption, including both the IT equipment and the power losses in rack-level UPSs, by placing new IT workload on racks according to each UPS's efficiency curve and current load. For more details, please refer to the two corresponding papers.
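The UPS-aware idea can be illustrated with a greedy rule. Assuming identical servers, the added IT power is the same whichever rack receives the workload, so minimizing total power reduces to choosing the rack whose UPS loss grows least. In place of a measured efficiency curve, this sketch uses a commonly cited quadratic UPS loss model with no-load, proportional, and square-law terms. The `Rack` class, its coefficients, and the `place` function are hypothetical and are not the published algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rack:
    capacity_w: float  # rated UPS output power, watts
    load_w: float      # current IT load on this rack, watts
    k0: float          # hypothetical no-load loss, fraction of rated power
    k1: float          # hypothetical proportional loss coefficient
    k2: float          # hypothetical square-law loss coefficient

    def ups_loss(self, load_w: float) -> float:
        """Quadratic loss model: P_rated * (k0 + k1*u + k2*u^2), u = load / P_rated."""
        u = load_w / self.capacity_w
        return self.capacity_w * (self.k0 + self.k1 * u + self.k2 * u * u)


def place(racks: List[Rack], new_load_w: float) -> Optional[int]:
    """Greedily assign new_load_w to the rack whose UPS loss grows least.

    Returns the chosen rack index, or None if no rack has enough capacity.
    The no-load term k0 cancels in the difference because every UPS is
    assumed to stay powered on.
    """
    best: Optional[int] = None
    best_delta = float("inf")
    for i, rack in enumerate(racks):
        if rack.load_w + new_load_w > rack.capacity_w:
            continue  # would overload this rack's UPS
        delta = rack.ups_loss(rack.load_w + new_load_w) - rack.ups_loss(rack.load_w)
        if delta < best_delta:
            best, best_delta = i, delta
    if best is not None:
        racks[best].load_w += new_load_w
    return best
```

With identical UPSs, the square-law term steers new load toward lightly loaded racks. With heterogeneous UPSs, the rule can instead prefer a busier rack whose UPS operates more efficiently at that point on its curve.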