Schema Inference

Schema inference is the process by which Deephaven automatically determines the structure of a dataset, such as its column names and data types, from its source format. This reduces manual effort and helps ensure that tables are created with the correct schema for downstream analysis.

Deephaven provides schema generation tools that infer schemas from a variety of data sources, each with its own process and considerations.
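To make the general idea concrete, here is a minimal Python sketch of type inference over CSV text. It is not Deephaven's implementation or API. The function name `infer_schema`, the candidate type names, and the fallback to `String` are all assumptions chosen for illustration. The sketch starts with every candidate type allowed for each column and discards any type that a non-empty value fails to parse as.

```python
import csv
import io


def _is_long(s):
    try:
        int(s)
        return True
    except ValueError:
        return False


def _is_double(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


def _is_boolean(s):
    return s.lower() in ("true", "false")


# Hypothetical candidate types, ordered from most to least specific.
CANDIDATES = [("long", _is_long), ("double", _is_double), ("boolean", _is_boolean)]


def infer_schema(csv_text):
    """Return {column_name: type_name} for CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    checks = dict(CANDIDATES)
    # Every candidate type is possible until a value rules it out.
    possible = {c: [name for name, _ in CANDIDATES] for c in columns}
    seen_value = {c: False for c in columns}
    for row in reader:
        for c in columns:
            value = row[c]
            if value is None or value == "":
                continue  # empty cells are treated as nulls and rule nothing out
            seen_value[c] = True
            possible[c] = [t for t in possible[c] if checks[t](value)]
    # Use the most specific surviving type. Fall back to String when no
    # candidate survives or the column held no values at all.
    return {
        c: (possible[c][0] if seen_value[c] and possible[c] else "String")
        for c in columns
    }


sample = "id,price,active,name\n1,9.5,true,apple\n2,12,false,pear\n"
print(infer_schema(sample))
# {'id': 'long', 'price': 'double', 'active': 'boolean', 'name': 'String'}
```

Real inference tools generally handle more cases than this, such as date-time formats, sampling limits, and column-name legalization. The format-specific articles describe what Deephaven actually does for each source.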

This page provides an overview of schema inference in Deephaven. For details on inferring schemas from specific data formats, see the following articles:

  • CSV Schema Inference: Learn how Deephaven infers schemas from CSV files, including how column types are determined and best practices for preparing your data.
  • JDBC Schema Inference: Understand how schemas are generated from JDBC data sources and what to consider when importing from relational databases.
  • JSON Schema Inference: See how Deephaven infers schemas from JSON files, including support for nested and semi-structured data.
  • XML Schema Inference: Learn how Deephaven infers schemas from XML files, including options for element types and attribute handling.
  • Avro & Protobuf Schema Inference: See how Deephaven infers schemas from Avro schemas and Protobuf descriptors, especially in streaming and Kafka contexts.

Refer to the linked articles for format-specific guidance and examples.

When to use schema inference

Use schema inference when:

  • You want to avoid manual schema creation for large or complex datasets.
  • You are onboarding new data sources and want to quickly prototype table structures.
  • Your source data structure may change, and you want Deephaven to adapt automatically.