SEA Data 2020

The 1st Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores

Co-located with EDBT/ICDT 2020 (Copenhagen, Denmark)

The SEA Data workshop will provide a forum for researchers and practitioners to exchange ideas, results, and visions on challenges in data management, information extraction, exploration, and analysis of heterogeneous data spanning multiple data models at once.

Companies, governments, and organizations now produce and collect data from multiple heterogeneous sources, such as transactional data, internet traffic, logs, IoT applications, knowledge bases, and much more. The unprecedented pace at which data is produced and consumed calls for methods that organize, retrieve, and analyze such data appropriately. While data were traditionally organized into homogeneous datastores and formats, collecting data from many different sources makes such datastores impractical. Even within the same organization, data dwells in independent silos, each with a distinct data model and serving a specific application, which keeps relevant portions of the data separate from each other.

As a consequence, we have witnessed an increasing interest in systems and methods that try to handle and analyze multiple data sources and formats holistically. Data lakes and polystores are the most prominent examples of such heterogeneous datastores. Moreover, graphs and learned databases have recently attracted the attention of the community for their flexibility in modeling, managing, and organizing heterogeneous data. Due to the fast pace of data collection and evolution, consolidating all the sources into a single data format and loading them into a single store is usually impractical.

Hence, the first challenge that these systems face is to provide flexible storage and retrieval methods that can adapt to multiple models and domains. On the other hand, from the user perspective, when such diverse data is collected, the tasks of data discovery, exploration, and analysis become even more challenging. For heterogeneous datastores, these tasks remain largely uncharted because established methods for effective multi-model data retrieval and exploration are still lacking. Data analytics must also cope with the lack of shared dimensions, ambiguous semantics, and the need to ensure the quality and lineage of analytical results.

Workshop Chairs

Topics

SEA Data aims to gather researchers and practitioners from various communities related to databases. We gladly accept submissions that present initial ideas and visions, as well as reports on early results or reflections on completed projects. The workshop will focus on discussion and interaction rather than on static presentations of what is in the paper. A list of relevant topics is presented below:

  • Querying and analyzing data lakes and polystores;
  • Cross-platform query processing and analytics;
  • Theory of heterogeneous data management;
  • Multi-model data exploration and analysis;
  • Novel user interfaces and query paradigms for searching heterogeneous data;
  • Exploration and search for heterogeneous unstructured and semi-structured data (e.g., knowledge graphs, web documents, semantic web);
  • Exploration of large datasets including multiple sources;
  • Data visualization of heterogeneous data;
  • Example-based search and discovery for heterogeneous data;
  • User-driven approaches on complex datasets;
  • Novel analyses involving multiple data sources;
  • Federated search, exploration, and analysis;
  • Approximate, anytime, and fast algorithms for extracting information from heterogeneous datastores;
  • Learnable structures for multi-model datasets;
  • Self-assembling data management systems.

Program Committee


Workshop Program

Program Schedule
Welcome & Intro

Discussion & Questions about the keynote: Knowledge graph analytics with AvantGraph, a new main-memory property graph analytics engine

George Fletcher

Discussion and Questions: Part 1

Active Learning for Spreadsheet Cell Classification

Julius Gonsior, Josephine Rehak, Maik Thiele, Elvis Koci, Michael Günther, and Wolfgang Lehner

FacetX: Dynamic Facet Generation for Advanced Information Filtering of Search Results

Raffael Affolter and Andreas Weiler

Data Virtual Machines: Data-Driven Conceptual Modeling of Big Data Infrastructures

Damianos Chatziantoniou and Verena Kantere

Discussion and Questions: Part 2

Toward Visual Interactive Exploration of Heterogeneous Graphs

Irène Burger, Ioana Manolescu, Emmanuel Pietriga, and Fabian Suchanek

REMA: Graph Embeddings-based Relational Schema Matching

Christos Koutras, Marios Fragkoulis, Asterios Katsifodimos, and Christoph Lofi

Optimizing Federated Queries Based on the Physical Design of a Data Lake

Philipp D. Rohde and Maria-Esther Vidal

General Discussion and Questions

Common challenges, research directions, visions on Search, Exploration, and Analysis in Heterogeneous Datastores

Keynotes:

Knowledge graph analytics with AvantGraph, a new main-memory property graph analytics engine
by George Fletcher

Abstract:

Knowledge graphs derived from heterogeneous data sources are common in a range of contemporary data analytics applications. The property graph data model is typically adopted by practical tools for representing and analyzing knowledge graphs. Indeed, property graphs are increasingly common in domains as diverse as e-commerce, sociology, biology, transportation, and public safety. As property graph databases grow in size and complexity, current graph data analytics solutions struggle with scalability and efficiency.
To address these challenges for next-generation data analytics, at Eindhoven University of Technology we are developing AvantGraph, a new main-memory property graph analytics engine built from the ground up for scalable and efficient complex graph analytics. The team, led by Nikolay Yakovets and George Fletcher, aims for the first open-source release of the full system in the very near future.
In this talk we give an overview of the design principles and architecture of AvantGraph, including insights into advanced features such as support for temporal and recursive graph analytics and worst-case optimal join processing. We also sketch the roadmap for the future of AvantGraph, indicating interesting scientific challenges for the knowledge graph analytics research community.

Speaker Bio:

George Fletcher (PhD, Indiana University Bloomington, 2007) is an associate professor of computer science and chair of the Database Group at Eindhoven University of Technology.
His research interests span query language design and engineering, foundations of databases, and data integration. His current focus is on the management of complex graphs such as social and biological networks and knowledge graphs. He is a co-author of the book "Querying Graphs" (Morgan and Claypool, 2018) on contemporary graph data management and a contributor to the international graph query and schema language standardization efforts of the LDBC.

Venue

SEA Data will be co-located with the EDBT/ICDT 2020 Joint Conference, to be hosted in Copenhagen, Denmark.

Scandic Falkoner Congress Center
Falkoner Alle 9
2000 Frederiksberg
Copenhagen, Denmark

The Workshop will be ONLINE!

Follow us on Twitter @SEADataConf for #SeaData2020 #EDBT2020 #ICDT2020.

Submission Guidelines

All submissions must be made electronically via the EasyChair submission system.

Regular research papers as well as system papers have a page limit of 6 pages (references included).

Vision papers, work-in-progress papers, experiment papers, and experience papers have a page limit of 4 pages (references included).

The SEA Data 2020 workshop is single-blind, so authors must include their names and affiliations in submissions.

All workshop papers will be published online at CEUR.

Formatting

Papers must follow the ACM Proceedings Format.

Please make sure you are using the latest 2020 version and use \documentclass[sigconf]{acmart}.
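
As a rough illustration of the document class line above, a minimal skeleton might look like the following. This is only a sketch, not an official template: the title, author name, affiliation, e-mail address, and body text are placeholders invented for the example, and the actual template files should be taken from the ACM distribution.

```latex
% Minimal acmart skeleton (illustrative only; all names and text are placeholders).
\documentclass[sigconf]{acmart}

\begin{document}

\title{Placeholder Title}

% Single-blind review: author names and affiliations are included.
\author{First Author}
\affiliation{%
  \institution{Example University}
  \city{Example City}
  \country{Example Country}}
\email{first.author@example.org}

% In acmart, the abstract is declared before \maketitle.
\begin{abstract}
Placeholder abstract.
\end{abstract}

\maketitle

\section{Introduction}
Placeholder body text.

\end{document}
```

The skeleton deliberately sets no font size, margin, or spacing options, so the template defaults stay untouched.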

The font size, margins, inter-column spacing, and line spacing in the templates must be kept unchanged.

Any submitted paper violating the length, file type, or formatting requirements will be rejected without review.