SEA Data 2020

The 1st Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores

Co-located with EDBT/ICDT 2020 (30 March 2020, Copenhagen, Denmark)

The SEA Data workshop will provide a forum for researchers and practitioners to exchange ideas, results, and visions on the challenges of data management, information extraction, exploration, and analysis when heterogeneous data and multiple data models must be handled at once.

Companies, governments, and organizations are now producing and collecting data from multiple heterogeneous sources, such as transactional data, internet traffic, logs, IoT applications, knowledge bases, and much more. The unprecedented pace at which data is produced and consumed calls for methods that organize, retrieve, and analyze such data appropriately. While data was traditionally organized into homogeneous datastores and formats, collecting data from many different sources makes such datastores impractical. Even within the same organization, data resides in independent silos, each with a distinct data model and serving a specific application, which keeps relevant portions of the data separate from each other.

As a consequence, we have witnessed an increasing interest in systems and methods that try to handle and analyze multiple data sources and formats holistically. Data lakes and polystores are the most prominent examples of such heterogeneous datastores. Moreover, graphs and learned databases have recently attracted the attention of the community for their flexibility in modeling, managing, and organizing heterogeneous data. Due to the fast pace of data collection and evolution, consolidating all the sources into a single data format and loading them into a single store is usually impractical.

Hence, the first challenge that these systems face is to provide flexible storage and retrieval methods that can adapt to multiple models and domains. From the user perspective, when such diverse data is collected, the tasks of data discovery, exploration, and analysis become even more challenging. For heterogeneous datastores, these tasks remain largely uncharted because established methods for effective multi-model data retrieval and exploration are lacking. Data analytics should also address issues arising from the lack of shared dimensions, ambiguous semantics, and the need to ensure the quality and lineage of the analytical result.

Workshop Chairs

Davide Mottin, Aarhus University
Matteo Lissandrini, Aalborg University
Yannis Velegrakis, University of Trento & Utrecht University

Important Dates

Workshop: 1:00pm-4:00pm — 30.03.2020

Submission types

Regular research and system papers (up to 6 pages)
Vision & work-in-progress papers (up to 4 pages)
Experiments & experiences (up to 4 pages)

Topics

SEA Data aims to gather researchers and practitioners from the various communities related to databases. We gladly accept submissions that present initial ideas and visions, as well as reports on early results and reflections on completed projects. The workshop will focus on discussion and interaction rather than on static presentations of what is in the papers. A list of relevant topics is presented below:

Querying and analyzing data lakes and polystores;
Cross-platform query processing and analytics;
Theory of heterogeneous data management;
Multi-model data exploration and analysis;
Novel user interfaces and query paradigms for searching heterogeneous data;
Exploration and search for heterogeneous unstructured and semi-structured data (e.g., knowledge graphs, web documents, semantic web);
Exploration of large datasets including multiple sources;
Data visualization of heterogeneous data;
Example-based search and discovery for heterogeneous data;
User-driven approaches on complex datasets;
Novel analyses involving multiple data sources;
Federated search, exploration, and analysis;
Approximate, anytime, and fast algorithms for extracting information from heterogeneous datastores;
Learnable structures for multi-model datasets;
Self-assembling data management systems.

Program Committee

Manos Athanassoulis (Boston University)
Nikolaus Augsten (University of Salzburg)
Hamdi Ben Hamadou (Aalborg University)
Sonia Bergamaschi (University of Modena and Reggio Emilia)
Nikos Bikakis (University of Ioannina)
Gautam Das (University of Texas at Arlington)
Anastasia Dimou (Ghent University)
Laura Di Rocco (Northeastern University)
Daniele Foroni (Huawei)
Johann-Christoph Freytag (Humboldt University Berlin)
Paul Groth (University of Amsterdam)
Francesco Guerra (University of Modena and Reggio Emilia)
Christian S. Jensen (Aalborg University)
Panos Karras (Aarhus University)
Arijit Khan (Nanyang Technological University)
Haridimos Kondylakis (Foundation for Research & Technology-Hellas)
Georgia Koutrika (Athena Research Center)
Ioana Manolescu (INRIA)
Renée Miller (Northeastern University)
Felix Naumann (Hasso Plattner Institute)
Themis Palpanas (Paris Descartes University)
Paolo Papotti (EURECOM)
Giulia Preti (University of Trento)
Petra Selmer (Neo4j)
Gianmaria Silvello (University of Padua)
Giovanni Simonini (University of Modena and Reggio Emilia)
Paolo Sottovia (Huawei)
Fabian Suchanek (Télécom Paris University)
Letizia Tanca (Politecnico di Milano)
Daniel Ting (Tableau)
Riccardo Torlone (University Roma Tre)
Aikaterini Tzompanaki (University of Cergy-Pontoise)
Cong Yu (Google)
Kostas Zoumpatianos (Harvard University)

Workshop Program

Program Schedule
Welcome & Intro (13:30)
Discussion & Questions about the keynote: "Knowledge graph analytics with AvantGraph, a new main-memory property graph analytics engine", by George Fletcher

Discussion and Questions: Part 1 (13:50)
"Active Learning for Spreadsheet Cell Classification", by Julius Gonsior, Josephine Rehak, Maik Thiele, Elvis Koci, Michael Günther, and Wolfgang Lehner
"FacetX: Dynamic Facet Generation for Advanced Information Filtering of Search Results", by Raffael Affolter and Andreas Weiler
"Data Virtual Machines: Data-Driven Conceptual Modeling of Big Data Infrastructures", by Damianos Chatziantoniou and Verena Kantere

Discussion and Questions: Part 2 (14:20)
"Toward Visual Interactive Exploration of Heterogeneous Graphs", by Irène Burger, Ioana Manolescu, Emmanuel Pietriga, and Fabian Suchanek
"REMA: Graph Embeddings-based Relational Schema Matching", by Christos Koutras, Marios Fragkoulis, Asterios Katsifodimos, and Christoph Lofi
"Optimizing Federated Queries Based on the Physical Design of a Data Lake", by Philipp D. Rohde and Maria-Esther Vidal

General Discussion and Questions (14:50)
Common challenges, research directions, and visions on Search, Exploration, and Analysis in Heterogeneous Datastores

Keynote:

Knowledge graph analytics with AvantGraph, a new main-memory property graph analytics engine
by George Fletcher

Abstract:

Knowledge graphs derived from heterogeneous data sources are common in a range of contemporary data analytics applications. The property graph data model is typically adopted by practical tools for representing and analyzing knowledge graphs. Indeed, property graphs are increasingly common in domains as diverse as e-commerce, sociology, biology, transportation, and public safety. As property graph databases grow in size and complexity, current graph data analytics solutions struggle with scalability and efficiency.
Toward addressing these challenges for next-generation data analytics, at Eindhoven University of Technology we are developing AvantGraph, a new main-memory property graph analytics engine built from the ground up for scalable and efficient complex graph analytics. The team, led by Nikolay Yakovets and George Fletcher, aims for the first open-source release of the full system in the very near future.
In this talk we give an overview of the design principles and architecture of AvantGraph, including insights into advanced features such as support for temporal and recursive graph analytics and worst-case optimal join processing. We also sketch the roadmap for the future of AvantGraph, indicating interesting scientific challenges for the knowledge graph analytics research community.

Speaker Bio:

George Fletcher (PhD, Indiana University Bloomington, 2007) is an associate professor of computer science and chair of the Database Group at Eindhoven University of Technology.
His research interests span query language design and engineering, foundations of databases, and data integration. His current focus is on the management of complex graphs such as social and biological networks and knowledge graphs. He is co-author of the book "Querying Graphs" (Morgan and Claypool, 2018) on contemporary graph data management and is a contributor to the international graph query and schema language standardization efforts of the LDBC.

Venue

SEA Data will be co-located with the EDBT/ICDT 2020 Joint Conference, to be hosted in Copenhagen, Denmark.

~~Scandic Falkoner Congress Center~~
Falkoner Alle, 9
Frederiksberg 2000
Copenhagen, Denmark.

The Workshop will be ONLINE!

Submission Guidelines

All submissions will be electronic, via the EasyChair submission system.

Regular research papers as well as system papers have a page limit of 6 pages (references included).

Vision papers, work-in-progress papers, experiments papers, and experiences papers have a page limit of 4 pages (references included).

The SEA Data 2020 workshop is single-blind, so authors must include their names and affiliations in their submissions.

All workshop papers will be published online at CEUR.

Formatting

Papers must follow the ACM Proceedings Format (available here).

Please make sure you are using the latest 2020 version and use \documentclass[sigconf]{acmart}.

The font size, margins, inter-column spacing, and line spacing in the templates must be kept unchanged.
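
As a rough illustration of the requirements above, a minimal paper skeleton might look like the following. Only the \documentclass[sigconf]{acmart} line comes from these guidelines; the title, author details, section name, and the references.bib file name are placeholders chosen for illustration, and the official ACM template remains the authoritative starting point.

```latex
% Minimal sketch of a submission skeleton (placeholders throughout).
\documentclass[sigconf]{acmart}   % class and option required by the guidelines

\begin{document}

\title{Placeholder Paper Title}

% Single-blind review: author names and affiliations are included.
\author{Placeholder Author}
\affiliation{%
  \institution{Placeholder University}
  \country{Placeholder Country}}
\email{author@example.org}

% In acmart, the abstract must appear before \maketitle.
\begin{abstract}
Placeholder abstract text.
\end{abstract}

\maketitle

\section{Introduction}
Placeholder body text. Fonts, margins, and spacing are left at the template defaults.

% References count toward the page limit; references.bib is a hypothetical file name.
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

\end{document}
```

The skeleton deliberately avoids any package or command that alters fonts, margins, or spacing, since the template settings must be kept unchanged.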

Any submitted paper violating the length, file type, or formatting requirements will be rejected without review.
