
Implementation:Heibaiying BigData Notes Spark SQL View Registration

From Leeroopedia


Knowledge Sources
Domains Data_Analysis, Big_Data
Last Updated 2026-02-10 10:00 GMT

Overview

A concrete Apache Spark facility for registering DataFrames as temporary views and executing SQL queries against them.

Description

The createOrReplaceTempView() method registers a DataFrame as a named view in Spark's in-memory catalog, and spark.sql() executes a SQL string against that catalog, returning the result as a DataFrame. Together, these two methods enable the full power of declarative SQL within a Spark application.

The BigData-Notes repository demonstrates this pattern in the Structured API guide (registering views from programmatically created DataFrames) and in the aggregation functions guide (using SQL for aggregate computations on registered views).

Usage

Use createOrReplaceTempView() to expose a DataFrame as a SQL-queryable table. Use spark.sql() to run SQL statements against registered views. This is particularly valuable for complex analytical queries where SQL syntax (especially window functions, CTEs, and subqueries) is more expressive than the programmatic API.
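As a sketch of the CTE case mentioned above, the following self-contained snippet registers a small illustrative DataFrame and filters it against an aggregate computed in a WITH clause. The view name, app name, and data are illustrative, not taken from the repository:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("CTE-Sketch")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Register a small illustrative DataFrame as a view
Seq((1, "Alice", 90000), (2, "Bob", 75000), (3, "Carol", 95000))
  .toDF("empId", "name", "salary")
  .createOrReplaceTempView("employees")

// The CTE computes the average salary once; the outer query filters against it
spark.sql("""
  WITH stats AS (
    SELECT AVG(salary) AS avg_salary FROM employees
  )
  SELECT e.name, e.salary
  FROM employees e CROSS JOIN stats s
  WHERE e.salary > s.avg_salary
""").show()
```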

Code Reference

Source Location

  • Repository files:
    • notes/Spark_Structured_API的基本使用.md (approximately line 35 for view registration)
    • notes/SparkSQL常用聚合函数.md (approximately line 35 for SQL-based aggregations)
  • External classes: org.apache.spark.sql.SparkSession, org.apache.spark.sql.DataFrame
  • External documentation: createOrReplaceTempView Scaladoc

Signature

// Register a DataFrame as a session-scoped temporary view
df.createOrReplaceTempView(viewName: String): Unit
df.createTempView(viewName: String): Unit  // throws AnalysisException if the view name already exists

// Register a DataFrame as a global temporary view, visible across
// sessions of the same application via the global_temp database
df.createOrReplaceGlobalTempView(viewName: String): Unit
df.createGlobalTempView(viewName: String): Unit

// Execute a SQL query against registered views
spark.sql(sqlText: String): DataFrame
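The global variants differ from the session-scoped ones: a global temporary view lives in the reserved global_temp database, is shared across sessions of the same application, and must be qualified with that prefix when queried. A minimal sketch (view and app names here are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("GlobalView-Sketch")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val df = Seq((10, "Engineering"), (20, "Marketing")).toDF("deptId", "deptName")

// Session-scoped: visible only to this SparkSession
df.createOrReplaceTempView("departments")
spark.sql("SELECT * FROM departments").show()

// Global: tied to the application and queried through the
// reserved global_temp database
df.createOrReplaceGlobalTempView("departments_g")
spark.sql("SELECT * FROM global_temp.departments_g").show()

// A new session in the same application still sees the global view
spark.newSession().sql("SELECT * FROM global_temp.departments_g").show()
```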

Import

import org.apache.spark.sql.SparkSession

I/O Contract

Inputs

Name Type Required Description
viewName String Yes The name under which the DataFrame will be registered in the catalog
sqlText String Yes (for spark.sql) A valid Spark SQL query string referencing registered view names

Outputs

Name Type Description
Unit Unit createOrReplaceTempView returns nothing; it registers the view as a side effect
DataFrame org.apache.spark.sql.DataFrame spark.sql() returns a DataFrame containing the query results

Usage Examples

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SQL-View-Examples")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Create sample DataFrames
val empDF = Seq(
  (1, "Alice", 10, 90000),
  (2, "Bob", 20, 75000),
  (3, "Carol", 10, 95000),
  (4, "Dave", 20, 70000),
  (5, "Eve", 10, 88000)
).toDF("empId", "name", "deptId", "salary")

val deptDF = Seq(
  (10, "Engineering"),
  (20, "Marketing"),
  (30, "Sales")
).toDF("deptId", "deptName")

// Register as temporary views
empDF.createOrReplaceTempView("employees")
deptDF.createOrReplaceTempView("departments")

// --- Simple SQL query ---
spark.sql("SELECT * FROM employees WHERE salary > 80000").show()

// --- Join using SQL ---
val joinResult = spark.sql("""
  SELECT e.name, d.deptName, e.salary
  FROM employees e
  JOIN departments d ON e.deptId = d.deptId
  ORDER BY e.salary DESC
""")
joinResult.show()

// --- Aggregation using SQL ---
val aggResult = spark.sql("""
  SELECT d.deptName,
         COUNT(*) AS emp_count,
         AVG(e.salary) AS avg_salary,
         MAX(e.salary) AS max_salary
  FROM employees e
  JOIN departments d ON e.deptId = d.deptId
  GROUP BY d.deptName
""")
aggResult.show()

// --- Window function using SQL ---
val windowResult = spark.sql("""
  SELECT name, salary, deptId,
         RANK() OVER (PARTITION BY deptId ORDER BY salary DESC) AS rank
  FROM employees
""")
windowResult.show()

// --- The result of spark.sql() is a DataFrame, so further API calls work ---
joinResult
  .filter($"salary" > 80000)
  .select("name", "deptName")
  .show()
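Registered views can also be inspected and removed through the catalog API. A hedged sketch, assuming it runs after the examples above so that the employees and departments views are still registered (the global view name below is hypothetical):

```scala
// List the tables and views visible in the current session's catalog
spark.catalog.listTables().show()

// Drop a temporary view; returns true if it existed and was removed
val dropped: Boolean = spark.catalog.dropTempView("employees")
println(s"employees dropped: $dropped")

// Global temporary views are dropped with the dedicated method
spark.catalog.dropGlobalTempView("some_global_view")
```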

Related Pages

Implements Principle

Requires Environment
