Implementation: Heibaiying BigData Notes Hive Create Table
| Knowledge Sources | |
|---|---|
| Domains | Data_Warehouse, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Apache Hive's CREATE TABLE statement, used to create internal (managed) and external tables with specified column schemas, row formats, and storage formats.
Description
The CREATE TABLE statement in HiveQL defines a new table in the Hive metastore along with its column definitions, data types, serialization format, and physical storage format. The EXTERNAL keyword controls whether Hive manages the lifecycle of the underlying HDFS data. The statement supports Hive's full range of primitive and complex data types, multiple SerDe configurations, and various columnar and row-based storage formats.
The CREATE TABLE ... AS SELECT (CTAS) variant creates a table and populates it with the results of a query in a single atomic operation. The CREATE TABLE ... LIKE variant clones the schema of an existing table without copying data.
Usage
Use CREATE TABLE when defining the schema for new data landing areas, staging tables, fact tables, dimension tables, or any other structured dataset in the Hive warehouse. Choose between internal and external based on data ownership requirements.
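The ownership distinction can be verified directly in Hive. A minimal sketch (the table names and the `/data/raw/events` path are illustrative):

```sql
-- Internal (managed) table: Hive owns the data; DROP deletes the HDFS files.
CREATE TABLE tmp_stage (id INT, payload STRING);

-- External table: Hive tracks only metadata; DROP leaves the files in place.
CREATE EXTERNAL TABLE raw_events (id INT, payload STRING)
LOCATION '/data/raw/events';

-- The "Table Type" field reports MANAGED_TABLE or EXTERNAL_TABLE.
DESCRIBE FORMATTED raw_events;

-- Removes only the metastore entry; files under /data/raw/events remain.
DROP TABLE raw_events;
```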
Code Reference
Source Location
- Repository: BigData-Notes
- File: notes/Hive常用DDL操作.md (Lines 1-453)
- File: notes/Hive简介及核心概念.md (Lines 105-113, complex types)
Signature
-- Full CREATE TABLE syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
(col_name data_type [COMMENT 'col_comment'], ...)
[COMMENT 'table_comment']
[PARTITIONED BY (col_name data_type [COMMENT 'col_comment'], ...)]
[CLUSTERED BY (col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION 'hdfs_path']
[TBLPROPERTIES (key='value', ...)];
-- Row format options
ROW FORMAT DELIMITED
[FIELDS TERMINATED BY char]
[COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char]
[LINES TERMINATED BY char]
[NULL DEFINED AS char]
-- Or custom SerDe
ROW FORMAT SERDE 'serde_class_name'
[WITH SERDEPROPERTIES (key='value', ...)]
-- Create table from query results
CREATE TABLE table_name AS SELECT ...;
-- Clone schema from existing table
CREATE TABLE table_name LIKE existing_table;
Import
N/A (HiveQL statements executed via Hive CLI or Beeline)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| table_name | STRING | Yes | Name of the table to create |
| col_name / data_type | Column definition | Yes | One or more column definitions; supported types include INT, BIGINT, FLOAT, DOUBLE, DECIMAL, STRING, VARCHAR, CHAR, BOOLEAN, TIMESTAMP, DATE, BINARY, ARRAY<type>, MAP<key_type, value_type>, STRUCT<field:type, ...> |
| EXTERNAL | Keyword | No | When specified, Hive does not manage the data lifecycle; dropping the table leaves the HDFS data intact |
| ROW FORMAT | SerDe config | No | Specifies how columns are serialized/deserialized; defaults to LazySimpleSerDe with Ctrl-A field delimiter |
| STORED AS | File format | No | Storage format: TEXTFILE (default), ORC, PARQUET, SEQUENCEFILE, RCFILE, AVRO |
| LOCATION | STRING (HDFS path) | No | Custom HDFS directory for the table data; defaults to the database directory |
| TBLPROPERTIES | MAP&lt;STRING, STRING&gt; | No | Key-value metadata for the table (e.g., 'orc.compress'='SNAPPY') |
Outputs
| Name | Type | Description |
|---|---|---|
| Table directory | HDFS directory | Created at the specified or default HDFS location to hold data files |
| Metastore entry | Metadata | Table schema, SerDe, storage format, and properties registered in the Hive metastore |
Usage Examples
-- Internal table with delimited text format
CREATE TABLE employees (
emp_id INT COMMENT 'Employee identifier',
name STRING COMMENT 'Full name',
salary DECIMAL(10,2),
hire_date DATE,
skills ARRAY<STRING>,
address STRUCT<street:STRING, city:STRING, zip:STRING>
)
COMMENT 'Employee master table'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;
-- External table over existing Parquet data
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
request_time TIMESTAMP,
client_ip STRING,
method STRING,
url STRING,
status_code INT,
response_size BIGINT
)
STORED AS PARQUET
LOCATION '/data/raw/web_logs';
-- ORC table with Snappy compression
CREATE TABLE order_facts (
order_id BIGINT,
customer_id INT,
product_id INT,
quantity INT,
unit_price DECIMAL(8,2),
order_date DATE
)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='SNAPPY');
-- Create table from query results (CTAS)
CREATE TABLE top_customers AS
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 10000;
-- Clone schema without data
CREATE TABLE employees_backup LIKE employees;
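The PARTITIONED BY, CLUSTERED BY, and ROW FORMAT SERDE clauses from the signature are not exercised above. A sketch of both (table and column names are illustrative; OpenCSVSerde ships with Hive as org.apache.hadoop.hive.serde2.OpenCSVSerde):

```sql
-- Partitioned and bucketed table: partitioning by date prunes scans,
-- bucketing by user_id enables sampling and bucket-map joins.
-- Note the partition column is declared only in PARTITIONED BY.
CREATE TABLE page_views (
  user_id BIGINT,
  url STRING,
  view_time TIMESTAMP
)
PARTITIONED BY (view_date DATE)
CLUSTERED BY (user_id) SORTED BY (view_time ASC) INTO 32 BUCKETS
STORED AS ORC;

-- Custom SerDe: external table over quoted CSV files using the
-- OpenCSVSerde bundled with Hive.
CREATE EXTERNAL TABLE csv_imports (
  id STRING,
  name STRING,
  amount STRING  -- OpenCSVSerde exposes every column as STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
STORED AS TEXTFILE
LOCATION '/data/imports/csv';
```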