Implementation: Heibaiying BigData Notes Hive Create Table
| Knowledge Sources | |
|---|---|
| Domains | Data_Warehouse, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Apache Hive's CREATE TABLE statement, used to create internal (managed) and external tables with specified column schemas, row formats, and storage formats.
Description
The CREATE TABLE statement in HiveQL defines a new table in the Hive metastore along with its column definitions, data types, serialization format, and physical storage format. The EXTERNAL keyword controls whether Hive manages the lifecycle of the underlying HDFS data. The statement supports Hive's full range of primitive and complex data types, multiple SerDe configurations, and various columnar and row-based storage formats.
The CREATE TABLE ... AS SELECT (CTAS) variant creates a table and populates it with the results of a query in a single atomic operation. The CREATE TABLE ... LIKE variant clones the schema of an existing table without copying data.
Usage
Use CREATE TABLE when defining the schema for new data landing areas, staging tables, fact tables, dimension tables, or any other structured dataset in the Hive warehouse. Choose between internal and external based on data ownership requirements.
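The ownership distinction can be verified directly in Hive. A minimal sketch (the table names and the `/data/raw/events` path are illustrative):

```sql
-- Internal (managed) table: Hive owns the data; DROP deletes the HDFS files.
CREATE TABLE tmp_stage (id INT, payload STRING);

-- External table: Hive tracks only metadata; DROP leaves the files in place.
CREATE EXTERNAL TABLE raw_events (id INT, payload STRING)
LOCATION '/data/raw/events';

-- The "Table Type" field reports MANAGED_TABLE or EXTERNAL_TABLE.
DESCRIBE FORMATTED raw_events;

-- Removes only the metastore entry; files under /data/raw/events remain.
DROP TABLE raw_events;
```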
Code Reference
Source Location
- Repository: BigData-Notes
- File: notes/Hive常用DDL操作.md (Lines 1-453)
- File: notes/Hive简介及核心概念.md (Lines 105-113, complex types)
Signature
-- Full CREATE TABLE syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(col_name data_type [COMMENT 'col_comment'], ...)
[COMMENT 'table_comment']
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
[CLUSTERED BY (col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION 'hdfs_path']
[TBLPROPERTIES (key='value', ...)];
-- Row format options
ROW FORMAT DELIMITED
[FIELDS TERMINATED BY char]
[COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char]
[LINES TERMINATED BY char]
[NULL DEFINED AS char]
-- Or custom SerDe
ROW FORMAT SERDE 'serde_class_name'
[WITH SERDEPROPERTIES (key='value', ...)]
-- Create table from query results
CREATE TABLE table_name AS SELECT ...;
-- Clone schema from existing table
CREATE TABLE table_name LIKE existing_table;
Import
N/A (HiveQL statements executed via Hive CLI or Beeline)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| table_name | STRING | Yes | Name of the table to create |
| col_name / data_type | Column definition | Yes | One or more column definitions; supported types include INT, BIGINT, FLOAT, DOUBLE, DECIMAL, STRING, VARCHAR, CHAR, BOOLEAN, TIMESTAMP, DATE, BINARY, ARRAY<type>, MAP<key_type, value_type>, STRUCT<field:type, ...> |
| EXTERNAL | Keyword | No | When specified, Hive does not manage the data lifecycle; dropping the table leaves the HDFS data intact |
| ROW FORMAT | SerDe config | No | Specifies how columns are serialized/deserialized; defaults to LazySimpleSerDe with Ctrl-A field delimiter |
| STORED AS | File format | No | Storage format: TEXTFILE (default), ORC, PARQUET, SEQUENCEFILE, RCFILE, AVRO |
| LOCATION | STRING (HDFS path) | No | Custom HDFS directory for the table data; defaults to the database directory |
| TBLPROPERTIES | MAP&lt;STRING, STRING&gt; | No | Key-value metadata for the table (e.g., 'orc.compress'='SNAPPY') |
Outputs
| Name | Type | Description |
|---|---|---|
| Table directory | HDFS directory | Created at the specified or default HDFS location to hold data files |
| Metastore entry | Metadata | Table schema, SerDe, storage format, and properties registered in the Hive metastore |
Usage Examples
-- Internal table with delimited text format
CREATE TABLE employees (
emp_id INT COMMENT 'Employee identifier',
name STRING COMMENT 'Full name',
salary DECIMAL(10,2),
hire_date DATE,
skills ARRAY<STRING>,
address STRUCT<street:STRING, city:STRING, zip:STRING>
)
COMMENT 'Employee master table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;
-- External table over existing Parquet data
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
request_time TIMESTAMP,
client_ip STRING,
method STRING,
url STRING,
status_code INT,
response_size BIGINT
)
STORED AS PARQUET
LOCATION '/data/raw/web_logs';
-- ORC table with Snappy compression
CREATE TABLE order_facts (
order_id BIGINT,
customer_id INT,
product_id INT,
quantity INT,
unit_price DECIMAL(8,2),
order_date DATE
)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');
-- Create table from query results (CTAS)
CREATE TABLE top_customers AS
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 10000;
-- Clone schema without data
CREATE TABLE employees_backup LIKE employees;
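The PARTITIONED BY, CLUSTERED BY, and ROW FORMAT SERDE clauses from the signature are not exercised above. A sketch of both (table and column names are illustrative; OpenCSVSerde ships with Hive as org.apache.hadoop.hive.serde2.OpenCSVSerde):

```sql
-- Partitioned and bucketed table: partitioning by date prunes scans,
-- bucketing by user_id enables sampling and bucket-map joins.
-- Note the partition column is declared only in PARTITIONED BY.
CREATE TABLE page_views (
  user_id BIGINT,
  url STRING,
  view_time TIMESTAMP
)
PARTITIONED BY (view_date DATE)
CLUSTERED BY (user_id) SORTED BY (view_time ASC) INTO 32 BUCKETS
STORED AS ORC;

-- Custom SerDe: external table over quoted CSV files using the
-- OpenCSVSerde bundled with Hive.
CREATE EXTERNAL TABLE csv_imports (
  id STRING,
  name STRING,
  amount STRING  -- OpenCSVSerde exposes every column as STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
STORED AS TEXTFILE
LOCATION '/data/imports/csv';
```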