Implementation: Heibaiying BigData-Notes Spark SQL View Registration
| Knowledge Sources | |
|---|---|
| Domains | Data_Analysis, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Concrete Apache Spark facility for registering DataFrames as temporary views and executing SQL queries against them.
Description
The createOrReplaceTempView() method registers a DataFrame as a named view in Spark's in-memory catalog, and spark.sql() executes a SQL string against that catalog, returning the result as a DataFrame. Together, these two methods enable the full power of declarative SQL within a Spark application.
The BigData-Notes repository demonstrates this pattern in the Structured API guide (registering views from programmatically created DataFrames) and in the aggregation functions guide (using SQL for aggregate computations on registered views).
Usage
Use createOrReplaceTempView() to expose a DataFrame as a SQL-queryable table. Use spark.sql() to run SQL statements against registered views. This is particularly valuable for complex analytical queries where SQL syntax (especially window functions, CTEs, and subqueries) is more expressive than the programmatic API.
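To illustrate the expressiveness point above, the following sketch combines a CTE with a filtering outer query against a registered view. The view name `employees` and the sample rows mirror the data used in the Usage Examples section; they are illustrative, not part of the repository's code.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative standalone sketch: a small employees view with (name, deptId, salary).
val spark = SparkSession.builder().appName("CTE-Sketch").master("local[*]").getOrCreate()
import spark.implicits._

Seq(("Alice", 10, 90000), ("Bob", 20, 75000), ("Carol", 10, 95000))
  .toDF("name", "deptId", "salary")
  .createOrReplaceTempView("employees")

// The CTE computes per-department averages; the outer query keeps
// only employees earning strictly above their department's average.
val aboveAvg = spark.sql("""
  WITH dept_avg AS (
    SELECT deptId, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY deptId
  )
  SELECT e.name, e.salary, d.avg_salary
  FROM employees e
  JOIN dept_avg d ON e.deptId = d.deptId
  WHERE e.salary > d.avg_salary
""")
aboveAvg.show()
```

Expressing the same logic with the programmatic API would require a separate aggregated DataFrame and an explicit self-join; the CTE keeps the whole computation in one readable statement.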
Code Reference
Source Location
- Repository files:
  - notes/Spark_Structured_API的基本使用.md (approximately line 35 for view registration)
  - notes/SparkSQL常用聚合函数.md (approximately line 35 for SQL-based aggregations)
- External classes:
  - org.apache.spark.sql.SparkSession
  - org.apache.spark.sql.DataFrame
- External documentation: createOrReplaceTempView Scaladoc
Signature
// Register a DataFrame as a session-scoped temporary view
df.createOrReplaceTempView(viewName: String): Unit
df.createTempView(viewName: String): Unit        // throws AnalysisException if the name is taken
// Register an application-scoped global view (queried via the global_temp database)
df.createOrReplaceGlobalTempView(viewName: String): Unit
df.createGlobalTempView(viewName: String): Unit  // throws AnalysisException if the name is taken
// Execute a SQL query against registered views
spark.sql(sqlText: String): DataFrame
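The signatures above list both session-scoped and global variants. A session-scoped temporary view disappears with its SparkSession, while a global temporary view lives in the reserved `global_temp` database and is visible to every session of the same application. A minimal sketch (the view names `salaries` and `salaries_global` are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("GlobalView-Sketch").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, 90000), (2, 75000)).toDF("empId", "salary")

// Session-scoped: visible only through this SparkSession.
df.createOrReplaceTempView("salaries")
val sessionCount = spark.sql("SELECT COUNT(*) AS c FROM salaries")

// Application-scoped: must be qualified with the global_temp database.
df.createOrReplaceGlobalTempView("salaries_global")
val globalCount = spark.sql("SELECT COUNT(*) AS c FROM global_temp.salaries_global")

// A new session on the same application still sees the global view...
val other = spark.newSession()
val otherCount = other.sql("SELECT COUNT(*) AS c FROM global_temp.salaries_global")
// ...but querying the session-scoped `salaries` view from `other` would fail.
```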
Import
import org.apache.spark.sql.SparkSession
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| viewName | String | Yes | The name under which the DataFrame will be registered in the catalog |
| sqlText | String | Yes (for spark.sql) | A valid Spark SQL query string referencing registered view names |
Outputs
| Name | Type | Description |
|---|---|---|
| Unit | Unit | createOrReplaceTempView returns nothing; it registers the view as a side effect |
| DataFrame | org.apache.spark.sql.DataFrame | spark.sql() returns a DataFrame containing the query results |
Usage Examples
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
.appName("SQL-View-Examples")
.master("local[*]")
.getOrCreate()
import spark.implicits._
// Create sample DataFrames
val empDF = Seq(
(1, "Alice", 10, 90000),
(2, "Bob", 20, 75000),
(3, "Carol", 10, 95000),
(4, "Dave", 20, 70000),
(5, "Eve", 10, 88000)
).toDF("empId", "name", "deptId", "salary")
val deptDF = Seq(
(10, "Engineering"),
(20, "Marketing"),
(30, "Sales")
).toDF("deptId", "deptName")
// Register as temporary views
empDF.createOrReplaceTempView("employees")
deptDF.createOrReplaceTempView("departments")
// --- Simple SQL query ---
spark.sql("SELECT * FROM employees WHERE salary > 80000").show()
// --- Join using SQL ---
val joinResult = spark.sql("""
SELECT e.name, d.deptName, e.salary
FROM employees e
JOIN departments d ON e.deptId = d.deptId
ORDER BY e.salary DESC
""")
joinResult.show()
// --- Aggregation using SQL ---
val aggResult = spark.sql("""
SELECT d.deptName,
COUNT(*) AS emp_count,
AVG(e.salary) AS avg_salary,
MAX(e.salary) AS max_salary
FROM employees e
JOIN departments d ON e.deptId = d.deptId
GROUP BY d.deptName
""")
aggResult.show()
// --- Window function using SQL ---
val windowResult = spark.sql("""
SELECT name, salary, deptId,
RANK() OVER (PARTITION BY deptId ORDER BY salary DESC) AS rank
FROM employees
""")
windowResult.show()
// --- The result of spark.sql() is a DataFrame, so further API calls work ---
joinResult
.filter($"salary" > 80000)
.select("name", "deptName")
.show()
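Registered views can also be inspected and removed through the SparkSession catalog API (`spark.catalog.listTables`, `tableExists`, and `dropTempView` are standard catalog methods; the view name below is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Catalog-Sketch").master("local[*]").getOrCreate()
import spark.implicits._

Seq((1, "Alice")).toDF("empId", "name").createOrReplaceTempView("employees")

// List the tables and views currently registered in the session catalog.
spark.catalog.listTables().show()

// tableExists and dropTempView manage a view's lifecycle explicitly.
val existed = spark.catalog.tableExists("employees")  // true while the view is registered
spark.catalog.dropTempView("employees")               // returns true if a view was dropped
val remains = spark.catalog.tableExists("employees")  // false after the drop
```

Dropping a view only removes the catalog entry; any DataFrame previously returned by `spark.sql()` against it remains valid.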