Implementation:Heibaiying BigData Notes Hive Create Table

From Leeroopedia


Knowledge Sources
Domains Data_Warehouse, Big_Data
Last Updated 2026-02-10 10:00 GMT

Overview

A concrete HiveQL operation, provided by Apache Hive, for creating internal (managed) and external tables with specified column schemas, row formats, and storage formats.

Description

The CREATE TABLE statement in HiveQL defines a new table in the Hive metastore along with its column definitions, data types, serialization format, and physical storage format. The EXTERNAL keyword controls whether Hive manages the lifecycle of the underlying HDFS data. The statement supports Hive's full range of primitive and complex data types, multiple SerDe configurations, and various columnar and row-based storage formats.

The CREATE TABLE ... AS SELECT (CTAS) variant creates a table and populates it with the results of a query in a single atomic operation. The CREATE TABLE ... LIKE variant clones the schema of an existing table without copying data.
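The lifecycle difference controlled by the EXTERNAL keyword shows up when a table is dropped; a minimal sketch (table names and the HDFS path are illustrative):

-- Managed (internal) table: DROP removes both the metastore entry and the data
CREATE TABLE managed_demo (id INT);
DROP TABLE managed_demo;        -- files under the warehouse directory are deleted

-- External table: DROP removes only the metastore entry
CREATE EXTERNAL TABLE external_demo (id INT)
LOCATION '/data/demo';          -- hypothetical path to pre-existing data
DROP TABLE external_demo;       -- files under /data/demo remain on HDFS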

Usage

Use CREATE TABLE when defining the schema for new data landing areas, staging tables, fact tables, dimension tables, or any other structured dataset in the Hive warehouse. Choose between internal and external tables based on data ownership: prefer external tables when the underlying HDFS data is shared with, or produced by, systems outside Hive.

Code Reference

Source Location

  • Repository: BigData-Notes
  • File: notes/Hive常用DDL操作.md (Lines 1-453)
  • File: notes/Hive简介及核心概念.md (Lines 105-113, complex types)

Signature

-- Full CREATE TABLE syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  (col_name data_type [COMMENT 'col_comment'], ...)
  [COMMENT 'table_comment']
  [PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
  [CLUSTERED BY (col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION 'hdfs_path']
  [TBLPROPERTIES (key='value', ...)];

-- Row format options
ROW FORMAT DELIMITED
  [FIELDS TERMINATED BY char]
  [COLLECTION ITEMS TERMINATED BY char]
  [MAP KEYS TERMINATED BY char]
  [LINES TERMINATED BY char]
  [NULL DEFINED AS char]

-- Or custom SerDe
ROW FORMAT SERDE 'serde_class_name'
  [WITH SERDEPROPERTIES (key='value', ...)]

-- Create table from query results
CREATE TABLE table_name AS SELECT ...;

-- Clone schema from existing table
CREATE TABLE table_name LIKE existing_table;

Import

N/A (HiveQL statements executed via Hive CLI or Beeline)

I/O Contract

Inputs

Name                 | Type                 | Required | Description
table_name           | STRING               | Yes      | Name of the table to create
col_name / data_type | Column definition    | Yes      | One or more column definitions; supported types include INT, BIGINT, FLOAT, DOUBLE, DECIMAL, STRING, VARCHAR, CHAR, BOOLEAN, TIMESTAMP, DATE, BINARY, ARRAY<type>, MAP<key_type, value_type>, STRUCT<field:type, ...>
EXTERNAL             | Keyword              | No       | When specified, Hive does not manage the data lifecycle; dropping the table leaves the HDFS data intact
ROW FORMAT           | SerDe config         | No       | Specifies how columns are serialized/deserialized; defaults to LazySimpleSerDe with the Ctrl-A (\001) field delimiter
STORED AS            | File format          | No       | Storage format: TEXTFILE (default), ORC, PARQUET, SEQUENCEFILE, RCFILE, AVRO
LOCATION             | STRING (HDFS path)   | No       | Custom HDFS directory for the table data; defaults to a subdirectory of the database directory
TBLPROPERTIES        | MAP<STRING, STRING>  | No       | Key-value metadata for the table (e.g., 'orc.compress'='SNAPPY')

Outputs

Name            | Type           | Description
Table directory | HDFS directory | Created at the specified or default HDFS location to hold data files
Metastore entry | Metadata       | Table schema, SerDe, storage format, and properties registered in the Hive metastore
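Both outputs can be inspected after creation with standard HiveQL introspection statements (using the employees table from the examples below):

-- Show the full schema, SerDe, storage format, location, and table properties
DESCRIBE FORMATTED employees;

-- Reproduce the complete DDL statement registered in the metastore
SHOW CREATE TABLE employees;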

Usage Examples

-- Internal table with delimited text format
CREATE TABLE employees (
    emp_id      INT       COMMENT 'Employee identifier',
    name        STRING    COMMENT 'Full name',
    salary      DECIMAL(10,2),
    hire_date   DATE,
    skills      ARRAY<STRING>,
    address     STRUCT<street:STRING, city:STRING, zip:STRING>
)
COMMENT 'Employee master table'
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;
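The MAP KEYS TERMINATED BY clause from the row-format options applies when a column uses the MAP type; a minimal sketch (table and column names are illustrative):

-- Delimited text table with a MAP column
CREATE TABLE employee_contacts (
    emp_id   INT,
    contacts MAP<STRING, STRING>        -- e.g. email/phone entries per employee
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    COLLECTION ITEMS TERMINATED BY ','  -- separates map entries from each other
    MAP KEYS TERMINATED BY ':'          -- separates key from value within an entry
STORED AS TEXTFILE;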

-- External table over existing Parquet data
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
    request_time TIMESTAMP,
    client_ip    STRING,
    method       STRING,
    url          STRING,
    status_code  INT,
    response_size BIGINT
)
STORED AS PARQUET
LOCATION '/data/raw/web_logs';

-- ORC table with Snappy compression
CREATE TABLE order_facts (
    order_id     BIGINT,
    customer_id  INT,
    product_id   INT,
    quantity     INT,
    unit_price   DECIMAL(8,2),
    order_date   DATE
)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');

-- Create table from query results (CTAS)
CREATE TABLE top_customers AS
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 10000;

-- Clone schema without data
CREATE TABLE employees_backup LIKE employees;
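The ROW FORMAT SERDE variant from the signature can be paired with Hive's built-in OpenCSVSerde to read quoted CSV files; a sketch, with the table name and path illustrative:

-- External table over CSV files using a custom SerDe
CREATE EXTERNAL TABLE csv_imports (
    id    STRING,
    name  STRING,
    value STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar'     = '"',
    'escapeChar'    = '\\'
)
STORED AS TEXTFILE
LOCATION '/data/raw/csv_imports';

Note that OpenCSVSerde exposes every column as STRING regardless of the declared type, so numeric columns typically need to be cast at query time.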

Related Pages

Implements Principle

Uses Heuristic
