SQL

This chapter provides information on how you can run SQL queries on a Hazelcast cluster using the Python client.

Hazelcast API

You can use SQL to query data in maps, Kafka topics, or a variety of file systems. Results can be sent directly to the client or inserted into maps or Kafka topics. For streaming queries, you can submit them to a cluster as jobs to run in the background.

Warning

The SQL feature is stabilized in 5.0 versions of the client and the Hazelcast platform. In order for the client and the server to be fully compatible with each other, their major versions must be the same.

Note

In order to use SQL service from the Python client, Jet engine must be enabled on the members and the hazelcast-sql module must be in the classpath of the members.

If you are using the CLI, Docker image, or distributions to start Hazelcast members, then you don’t need to do anything, as the above preconditions are already satisfied for such members.

However, if you are using Hazelcast members in the embedded mode, or receiving errors saying that The Jet engine is disabled or Cannot execute SQL query because "hazelcast-sql" module is not in the classpath. while executing queries, enable the Jet engine following one of the instructions pointed out in the error message, or add the hazelcast-sql module to your member’s classpath.

Supported Queries

Ad-Hoc Queries

Query large datasets either in one or multiple systems and/or run aggregations on them to get deeper insights.

See the Get Started with SQL Over Maps tutorial for reference.

Streaming Queries

Also known as continuous queries, these keep an open connection to a streaming data source and run a continuous query to get near real-time updates.

See the Get Started with SQL Over Kafka tutorial for reference.

Federated Queries

Query different datasets such as Kafka topics and Hazelcast maps, using a single query. Normally, querying in SQL is database or dataset-specific. However, with Mappings, you can pull information from different sources to present a more complete picture.

See the Get Started with SQL Over Files tutorial for reference.

Mappings

To connect to data sources and query them as if they were tables, the SQL service uses a concept called mappings.

Mappings store essential metadata about the source’s data model, data access patterns, and serialization formats so that the SQL service can connect to the data source and query it.

You can create mappings for the following data sources by using the CREATE MAPPING statement:

Querying Map

With SQL you can query the keys and values of maps in your cluster.

Assume that we have a map called employees that contains values of type Employee:

class Employee(Portable):
    def __init__(self, name=None, age=None):
        self.name = name
        self.age = age

    def write_portable(self, writer):
        writer.write_string("name", self.name)
        writer.write_int("age", self.age)

    def read_portable(self, reader):
        self.name = reader.read_string("name")
        self.age = reader.read_int("age")

    def get_factory_id(self):
        return 1

    def get_class_id(self):
        return 2

employees = client.get_map("employees").blocking()

employees.set(1, Employee("John Doe", 33))
employees.set(2, Employee("Jane Doe", 29))

Before starting to query data, we must create a mapping for the employees map. The details of CREATE MAPPING statement is discussed in the reference manual. For the Employee class above, the mapping statement is shown below. It is enough to create the mapping once per map.

client.sql.execute(
    """
CREATE MAPPING employees (
    __key INT,
    name VARCHAR,
    age INT
)
TYPE IMap
OPTIONS (
  'keyFormat' = 'int',
  'valueFormat' = 'portable',
  'valuePortableFactoryId' = '1',
  'valuePortableClassId' = '2'
)
    """
).result()

The following code prints names of the employees whose age is less than 30:

result = client.sql.execute("SELECT name FROM employees WHERE age < 30").result()

for row in result:
    name = row["name"]
    print(name)

The following subsections describe how you can access Hazelcast maps and perform queries on them in more details.

Case Sensitivity

Mapping names and field names are case-sensitive.

For example, you can access an employees map as employees but not as Employees.

Key and Value Objects

A map entry consists of a key and a value. These are accessible through the __key and this aliases. The following query returns the keys and values of all entries in the map:

SELECT __key, this FROM employees

“SELECT *” Queries

You may use the SELECT * FROM <table> syntax to get all the table fields.

The __key and this fields are returned by the SELECT * queries if they do not have nested fields. For the employees map, the following query does not return the this field, because the value has nested fields name and age:

-- Returns __key, name, age
SELECT * FROM employee

Key and Value Fields

You may also access the nested fields of a key or a value. The list of exposed fields depends on the serialization format, as described Querying Maps with SQL section.

Using Query Parameters

You can use query parameters to build safer and faster SQL queries.

A query parameter is a piece of information that you supply to a query before you run it. Parameters can be used by themselves or as part of a larger expression to form a criterion in the query.

age_to_compare = 30
client.sql.execute("SELECT * FROM employees WHERE age > ?", age_to_compare).result()

Instead of putting data straight into an SQL statement, you use the ? placeholder in your client code to indicate that you will replace that placeholder with a parameter.

Query parameters have the following benefits:

Faster execution of similar queries. If you submit more than one query where only a value changes, the SQL service uses the cached query plan from the first query rather than optimizing each query again.
Protection against SQL injection. If you use query parameters, you don’t need to escape special characters in user-provided strings.

Querying JSON Objects

In Hazelcast, the SQL service supports the following ways of working with JSON data:

json: Maps JSON data to a single column of JSON type where you can use JsonPath syntax to query and filter it, including nested levels.
json-flat: Maps JSON top-level fields to columns with non-JSON types where you can query only top-level keys.

json

To query json objects, you should create an explicit mapping using the CREATE MAPPING statement, similar to the example above.

For example, this code snippet creates a mapping to a new map called json_employees, which stores the JSON values as HazelcastJsonValue objects and queries it using nested fields, which is not possible with the json-flat type:

client.sql.execute(
    """
CREATE OR REPLACE MAPPING json_employees
TYPE IMap
OPTIONS (
    'keyFormat' = 'int',
    'valueFormat' = 'json'
)
    """
).result()

json_employees = client.get_map("json_employees").blocking()

json_employees.set(
    1,
    HazelcastJsonValue(
        {
            "personal": {"name": "John Doe"},
            "job": {"salary": 60000},
        }
    ),
)

json_employees.set(
    2,
    HazelcastJsonValue(
        {
            "personal": {"name": "Jane Doe"},
            "job": {"salary": 80000},
        }
    ),
)

with client.sql.execute(
    """
SELECT JSON_VALUE(this, '$.personal.name') AS name
FROM json_employees
WHERE JSON_VALUE(this, '$.job.salary' RETURNING INT) > ?
    """,
    75000,
).result() as result:
    for row in result:
        print(f"Name: {row['name']}")

The json data type comes with full support for querying JSON in maps and Kafka topics.

JSON Functions

Hazelcast supports the following functions, which can retrieve JSON data.

JSON_QUERY : Extracts a JSON value from a JSON document or a JSON-formatted string that matches a given JsonPath expression.
JSON_VALUE : Extracts a primitive value, such as a string, number, or boolean that matches a given JsonPath expression. This function returns NULL if a non-primitive value is matched, unless the ON ERROR behavior is changed.
JSON_ARRAY : Returns a JSON array from a list of input data.
JSON_OBJECT : Returns a JSON object from the given key/value pairs.

json-flat

To query json-flat objects, you should create an explicit mapping using the CREATE MAPPING statement, similar to the example above.

For example, this code snippet creates a mapping to a new map called json_flat_employees, which stores the JSON values with columns name and salary as HazelcastJsonValue objects and queries it using top-level fields:

client.sql.execute(
    """
CREATE OR REPLACE MAPPING json_flat_employees (
    __key INT,
    name VARCHAR,
    salary INT
)
TYPE IMap
OPTIONS (
    'keyFormat' = 'int',
    'valueFormat' = 'json-flat'
)
    """
).result()

json_flat_employees = client.get_map("json_flat_employees").blocking()

json_flat_employees.set(
    1,
    HazelcastJsonValue(
        {
            "name": "John Doe",
            "salary": 60000,
        }
    ),
)

json_flat_employees.set(
    2,
    HazelcastJsonValue(
        {
            "name": "Jane Doe",
            "salary": 80000,
        }
    ),
)

with client.sql.execute(
        """
SELECT name
FROM json_flat_employees
WHERE salary > ?
        """,
        75000,
).result() as result:
    for row in result:
        print(f"Name: {row['name']}")

Note that, in json-flat type, top-level columns must be explicitly specified while creating the mapping.

The json-flat format comes with partial support for querying JSON in maps, Kafka topics, and files.

For more information about working with JSON using SQL see Working with JSON in Hazelcast reference manual.

SQL Statements

Data Manipulation Language(DML) Statements

SELECT: Read data from a table.
SINK INTO/INSERT INTO: Ingest data into a map and/or forward data to other systems.
UPDATE: Overwrite values in map entries.
DELETE: Delete map entries.

Data Definition Language(DDL) Statements

CREATE MAPPING: Map a local or remote data object to a table that Hazelcast can access.
SHOW MAPPINGS: Get the names of existing mappings.
DROP MAPPING: Remove a mapping.

Job Management Statements

CREATE JOB: Create a job that is not tied to the client session.
ALTER JOB: Restart, suspend, or resume a job.
SHOW JOBS: Get the names of all running jobs.
DROP JOB: Cancel a job.
CREATE OR REPLACE SNAPSHOT (Enterprise only): Create a snapshot of a running job, so you can stop and restart it at a later date.
DROP SNAPSHOT (Enterprise only): Cancel a running job.

Data Types

The SQL service supports a set of SQL data types. Every data type is mapped to a Python type that represents the type’s value.

Type Name	Python Type
BOOLEAN	bool
VARCHAR	str
TINYINT	int
SMALLINT	int
INTEGER	int
BIGINT	int
DECIMAL	decimal.Decimal
REAL	float
DOUBLE	float
DATE	datetime.date
TIME	datetime.time
TIMESTAMP	datetime.datetime
TIMESTAMP_WITH_TIME_ZONE	datetime.datetime (with non-None tzinfo)
OBJECT	Any Python type
JSON	HazelcastJsonValue

Functions and Operators

Hazelcast supports logical and IS predicates, comparison and mathematical operators, and aggregate, mathematical, trigonometric, string, table-valued, and special functions.

See the Reference Manual for details.

Improving the Performance of SQL Queries

You can improve the performance of queries over maps by indexing map entries.

To find out more about indexing map entries, see add_index() method.

If you find that your queries lead to out of memory exceptions (OOME), consider decreasing the value of the Jet engine’s max-processor-accumulated-records option.

Limitations

SQL has the following limitations. We plan to remove these limitations in future releases.

You cannot run SQL queries on lite members.
The only supported Hazelcast data structure is map. You cannot query other data structures such as replicated maps.
Limited support for joins. See Join Tables.

DBAPI-2 Interface

hazelcast.db module supports the Python standard DBAPI-2 Specification.

Connection

The connect() function creates a connection to the cluster and returns a Connection object.

from hazelcast.db import connect
conn = connect()

The connect() function connects to the default cluster by default. There are a few ways to pass the connection parameters.

You can use the following keyword arguments:

host: Host part of the cluster address, by default: localhost.
port: Port part of the cluster address, by default: 5701.
cluster_name: Cluster name, by default: dev.
user: Username for the cluster. Requires Hazelcast EE.
password: Password for the cluster. Requires Hazelcast EE.

from hazelcast.db import connect
conn = connect(user="localhost", port=5701)

You can also provide a DSN (Data Source Name) string to configure the connection. The format of the DSN is hz://[user:password]@address1:port1[,address2:port2, ...][?option1=value1[&option2=value2 ...]] The following options are supported:

cluster.name: Hazelcast cluster name.
cloud.token: Cloud discovery token.
smart: Enables smart routing when true. Defaults to Python client default.
ssl: Enables SSL for client connection.
ssl.ca.path: Path to the CA file.
ssl.cert.path: Path to the certificate file.
ssl.key.path: Path to the private key file.
ssl.key.password: Password to the key file.

from hazelcast.db import connect
conn = connect(dsn="hz://admin:ssap@demo.hazelcast.com?cluster.name=demo1")

In case you have to pass some options which are not supported by the methods above, you can also pass a hazelcast.config.Config object as the first argument to connect.

from hazelcast.db import connect
from hazelcast.config import Config
config = Config()
config.compact_serializers = [AddressSerializer()]
conn = connect(config)

Once the connection is created, you can create a hazelcast.db.Cursor object from it to execute queries. This is explained in the next section. Finally, you can close the Connection object to release its resources if you are done with it.

conn.close()

You can use a with statement to automatically close a Connection.

from hazelcast.db import connect
with connect() as conn:
    # use conn in this block
# conn is automatically closed here

Cursors

The first step of executing a query is, getting a hazelcast.db.Cursor from the connection.

cursor = conn.cursor()

Then, you can execute a SQL query using the hazelcast.db.Cursor.execute() method. You can use this method to run all kinds of queries.

cursor.execute("SELECT * FROM stocks ORDER BY price")

Use the question mark (?) as a placeholder if you are passing arguments in the query. The actual arguments should be passed in a tuple.

cursor.execute("SELECT * FROM stocks WHERE price > ? ORDER BY price", (50,))

hazelcast.db.Cursor.executemany() is also available, which enables running the same query with different kinds of value sets. This method should only be used my mutating queries, such as INSERT.

data = [
    (1, "2006-03-28", "BUY", "IBM", 1000, 45.0),
    (2, "2006-04-05", "BUY", "MSFT", 1000, 72.0),
    (3, "2006-04-06", "SELL", "IBM", 500, 53.0),
]
cursor.executemany("INSERT INTO stocks VALUES(?, CAST(? AS DATE), ?, ?, ?, ?)", data)

Mutating Queries

Mutating queries such as UPDATE, DELETE and INSERT updates, deletes data or adds new rows. You can use execue or executemany for those queries.

cursor.execute("INSERT INTO stocks(__key, price) VALUES(10, 40)")

Row Returning Queries

Queries such as SELECT and SHOW return rows. Once you run execute with the query, call one of hazelcast.db.Cursor.fetchone(), hazelcast.db.Cursor.fetchmany() or hazelcast.db.Cursor.fetchall() to get one, some or all rows in the result. The rows are of the hazelcast.sql.SqlRow type. Note that, fetchall should only be used for small, finite set of rows.

cursor.execute("SELECT * FROM stocks")
one_row = cursor.fetchone()
three_more_rows = cursor.fetchmany(3)
rest_of_rows = cursor.fetchall()

Alternatively, you can iterate on the cursor itself.

cursor.execute("SELECT * FROM stocks")
for row in cursor:
    # handle the row

You can access columns in a hazelcast.sql.SqlRow by using the subscription notation, treating the row as a dictionary.

for row in cursor:
    print(row["__key"], row["symbol"], row["price"])

Alternatively, you can treat the row as an array and use indexes to access column values.

for row in cursor:
    print(row[0], row[1], row[2])

Once you are done with the cursor, you can use its hazelcast.db.Cursor.close() method to release its resources.

cursor.close()

Using the with statement, close is called automatically:

with conn.cursor() as cursor:
    cursor.execute("SELECT * FROM stocks")
    for row in cursor:
        # handle the row
# cursor is automatically closed here.