
BARAC: An Effective Presentation of Ranked Structured Datasets


What: Visitor Seminars
When: Mar 05, 2010, from 02:00 PM to 03:00 PM
Where: Engr IV Maxwell Room 57-124

Julia Stoyanovich
University of Pennsylvania

Friday, March 5, 2010 at 2:00pm
Engr IV Maxwell Room 57-124

Abstract
In online applications such as Yahoo! Personals and Yahoo! Real Estate, users define structured profiles in order to find potentially interesting matches. Typically, profiles are evaluated against large datasets and produce thousands of matches. In addition to filtering, users also specify ranking in their profile, and matches are returned in the form of a ranked list. Top results in a ranked list are often homogeneous, which hinders data exploration. For example, a user looking for 1- or 2-bedroom apartments sorted by price will see a large number of cheap 1-bedrooms in undesirable neighborhoods before seeing any apartments with different characteristics. An alternative to ranking is to group matches on common attribute values (e.g., cheap 1-bedrooms in good neighborhoods, 2-bedrooms with 2 baths, etc.). However, given the user's ranking criteria, not all groups will be of interest. We argue here that neither single-list ranking nor attribute-based grouping is adequate for effective exploration of ranked datasets. We formalize rank-aware clustering and develop BARAC, a novel clustering algorithm that enables rank-aware data exploration in domains with a large number of heterogeneous attributes. We present results of a large-scale user study that validate the effectiveness of our approach, and we extensively evaluate the performance of our algorithm on large datasets from Yahoo! Personals, a leading online dating site.
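To make the apartment example concrete, here is a minimal sketch of the two baseline presentations the abstract contrasts: a single list ranked by price, and matches grouped on shared attribute values. It does not implement BARAC or rank-aware clustering as formalized in the talk. The `Apartment` record, the listings, the neighborhood names, and the rule "order groups by their best-ranked member" are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Apartment(NamedTuple):
    # Hypothetical listing record mirroring the abstract's apartment example.
    name: str
    bedrooms: int
    baths: int
    neighborhood: str
    price: int


# Hypothetical data: the cheapest listings are all 1-bedrooms in one neighborhood.
LISTINGS = [
    Apartment("A", 1, 1, "Riverside", 900),
    Apartment("B", 1, 1, "Riverside", 950),
    Apartment("C", 1, 1, "Riverside", 1000),
    Apartment("D", 2, 2, "Downtown", 1400),
    Apartment("E", 1, 1, "Hillcrest", 1100),
    Apartment("F", 2, 1, "Riverside", 1200),
]


def rank_by_price(items):
    """Single-list ranking: cheapest first."""
    return sorted(items, key=lambda a: a.price)


def group_by_attributes(ranked, keys):
    """Group matches on shared values of `keys`, keeping rank order inside each group.

    Iterating in rank order means each group's members stay sorted, and because
    dicts preserve insertion order, groups come out ordered by their best-ranked
    member. That ordering rule is an illustrative assumption, not BARAC's criterion.
    """
    groups = defaultdict(list)
    for apt in ranked:
        groups[tuple(getattr(apt, k) for k in keys)].append(apt)
    return dict(groups)


if __name__ == "__main__":
    ranked = rank_by_price(LISTINGS)
    # The top of the single ranked list is homogeneous: A, B, C are all Riverside 1-bedrooms.
    print([a.name for a in ranked[:3]])
    for key, members in group_by_attributes(ranked, ("bedrooms", "neighborhood")).items():
        print(key, [a.name for a in members])
```

Here the grouping attributes are picked by hand. The abstract's argument is that deciding which groups are worth showing, given the user's ranking, is exactly what neither baseline handles well, which is the gap rank-aware clustering is meant to fill.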

Biography
Julia Stoyanovich is a Postdoctoral Researcher and a Computing Innovations Fellow at the University of Pennsylvania. Julia holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics and Statistics from the University of Massachusetts at Amherst. After receiving her B.S., Julia went on to work for two start-ups and one real company in New York City, where she interacted with a variety of massive datasets. Julia's industry experience convinced her that many practical data management challenges remain to be tackled, and that she does not like to wake up early in the morning, prompting her return to academia. Julia's research focuses on improving search, ranking, and data exploration in semantically rich application domains. She is particularly excited about the challenges that arise in life sciences applications and in social information processing.
