Skip to content

Latest commit

 

History

History
418 lines (315 loc) · 11.7 KB

File metadata and controls

418 lines (315 loc) · 11.7 KB

IRIS Vector Graph Python SDK (v1.47.0)

This document is outdated (last updated at v1.47.0, current release is v1.84.0). The README.md changelog is the authoritative reference for new API surface from v1.48.0 onward. Key additions since v1.47: bulk_ingest_edges, rebuild_nkg, rebuild_kg, search_nodes_by_vector, status(), bulk_create_edges_temporal, export/import_temporal_edges_ndjson, bm25_*, plaid_*, ivf_* (with build_batch_size), ivf_insert/ivf_delete, KHopCount/KHop2Count fast paths, create_node(graph=), get_labels, get_relationship_types, the full introspection API, and engine.index(name)IndexHandle — unified entry point for all index types (v1.84.0).

This guide documents the Python API surface for iris-vector-graph v1.47.0.


1) Install

pip install iris-vector-graph
pip install iris-vector-graph[full]
pip install iris-vector-graph[plaid]
  • Core (iris-vector-graph): minimal install, includes intersystems-irispython.
  • [full]: adds FastAPI/GraphQL and common data tooling.
  • [plaid]: adds numpy + scikit-learn for plaid_build().

2) Connection

import iris

# External (host process)
conn = iris.connect(
    hostname="localhost",
    port=1972,
    namespace="USER",
    username="_SYSTEM",
    password="SYS",
)

# Embedded (inside IRIS Language=python method)
from iris_vector_graph.embedded import EmbeddedConnection
conn = EmbeddedConnection()

3) IRISGraphEngine

from iris_vector_graph.engine import IRISGraphEngine

engine = IRISGraphEngine(
    conn,
    embedding_dimension=768,   # required before initialize_schema()
)
engine.initialize_schema()     # idempotent — safe to call on every startup

Core graph methods

engine.create_node("node:1", labels=["Entity"], properties={"name": "Example"})
engine.create_edge("node:1", "RELATED_TO", "node:2", qualifiers={"confidence": 0.9})

node  = engine.get_node("node:1")       # dict | None
nodes = engine.get_nodes(["node:1"])    # list[dict]
ok    = engine.delete_node("node:2")    # bool

Bulk ingest

engine.bulk_create_nodes([
    {"id": "n:1", "labels": ["Gene"], "properties": {"name": "BRCA1"}},
    {"id": "n:2", "labels": ["Drug"], "properties": {"name": "Olaparib"}},
])
engine.bulk_create_edges([
    {"source": "n:1", "predicate": "TARGETS", "target": "n:2"},
])

Cypher

result = engine.execute_cypher(
    "MATCH (n:Gene)-[:TARGETS]->(d:Drug) RETURN n.id, d.id LIMIT 10"
)
# result = {"columns": [...], "rows": [...], "sql": "...", "params": [...]}

4) BM25Index API

Pure ObjectScript Okapi BM25 lexical search over ^BM25Idx globals. No SQL tables, no Enterprise license required.

# Build index over all nodes using their 'name' and 'definition' properties
result = engine.bm25_build("ncit", text_props=["name", "definition"], k1=1.5, b=0.75)
# {"indexed": 200000, "avgdl": 8.4, "vocab_size": 45000}

# Search — returns list of (node_id, score) tuples, sorted by score DESC
hits = engine.bm25_search("ncit", "ankylosing spondylitis HLA-B27", k=10)
# [("NCIT:C34718", 8.4), ("NCIT:C2890", 6.1), ...]

# Incremental add / update
engine.bm25_insert("ncit", "NCIT:C99999", "new concept text here")

# Metadata
info = engine.bm25_info("ncit")
# {"N": 200001, "avgdl": 8.4, "vocab_size": 45001}  |  {} if not found

# Remove
engine.bm25_drop("ncit")

kg_TXT automatic upgrade

If a "default" BM25 index exists, IRISGraphOperators.kg_TXT() routes through it automatically instead of using the LIKE fallback:

engine.bm25_build("default", text_props=["name"])   # one-time setup

from iris_vector_graph.operators import IRISGraphOperators
ops = IRISGraphOperators(conn)
ops.graph_engine = engine
hits = ops.kg_TXT("diabetes", k=20)   # uses BM25 automatically

Cypher procedure

engine.execute_cypher(
    "CALL ivg.bm25.search('ncit', $q, 10) YIELD node, score RETURN node, score",
    {"q": "ankylosing spondylitis"}
)

API summary

  • bm25_build(name, text_props, k1=1.5, b=0.75) -> dict{"indexed", "avgdl", "vocab_size"}
  • bm25_search(name, query, k=10) -> list[tuple[str, float]]
  • bm25_insert(name, doc_id, text) -> bool
  • bm25_drop(name) -> None
  • bm25_info(name) -> dict{} if index not found

5) VecIndex API

Lightweight ANN index backed by ObjectScript globals (Graph.KG.VecIndex).

engine.vec_create_index("my_idx", dim=384, metric="cosine", num_trees=4, leaf_size=50)
engine.vec_insert("my_idx", "doc:1", [0.1, 0.2, 0.3])
engine.vec_bulk_insert("my_idx", [
    {"id": "doc:2", "embedding": [0.2, 0.3, 0.4]},
])
engine.vec_build("my_idx")

hits = engine.vec_search("my_idx", [0.1, 0.2, 0.3], k=5, nprobe=8)
multi_hits = engine.vec_search_multi("my_idx", [[0.1, 0.2, 0.3]], k=5)

info = engine.vec_info("my_idx")
engine.vec_expand("my_idx", seed_id="doc:1", k=5)
engine.vec_drop("my_idx")

API summary

  • vec_create_index(name, dim, metric="cosine", num_trees=4, leaf_size=50) -> dict
  • vec_insert(index_name, doc_id, embedding) -> None
  • vec_bulk_insert(index_name, items) -> int
  • vec_build(index_name) -> dict
  • vec_search(index_name, query_embedding, k=10, nprobe=8) -> list
  • vec_search_multi(index_name, query_embeddings, k=10, nprobe=8) -> list
  • vec_info(index_name) -> dict
  • vec_drop(index_name) -> None
  • vec_expand(index_name, seed_id, k=5) -> list

6) Generic Vector Search (v1.45.0+)

Search any IRIS VECTOR column, not just kg_NodeEmbeddings:

# Search a specific table/column
hits = engine.vector_search(
    table="MySchema.DocChunk",
    vector_col="VectorChunk",
    query_embedding=[0.1, 0.2, ...],
    top_k=10,
)

# Fused search across multiple vector tables
hits = engine.multi_vector_search(
    sources=[
        {"table": "Graph_KG.kg_NodeEmbeddings", "vector_col": "emb"},
        {"table": "RAG.SourceDocuments",        "vector_col": "embedding"},
    ],
    query_embedding=[0.1, 0.2, ...],
    top_k=10,
    fusion="rrf",
)

# Node embedding
engine.embed_nodes(
    model="sentence-transformers/all-MiniLM-L6-v2",
    text_fn=lambda node: node.get("name", ""),
    where="label = 'Gene'",
    batch_size=64,
)

7) PLAID API

ColBERT-style multi-vector retrieval (Graph.KG.PLAIDSearch).

docs = [
    {"id": "doc:1", "tokens": [[0.1, 0.2], [0.3, 0.4]]},
    {"id": "doc:2", "tokens": [[0.2, 0.1], [0.4, 0.3]]},
]
engine.plaid_build("plaid_idx", docs, n_clusters=None, dim=2)
results = engine.plaid_search("plaid_idx", query_tokens=[[0.1, 0.2]], k=10, nprobe=4)
engine.plaid_insert("plaid_idx", "doc:3", token_embeddings=[[0.5, 0.6]])
info = engine.plaid_info("plaid_idx")
engine.plaid_drop("plaid_idx")

plaid_build() requires [plaid] install extras (numpy, scikit-learn).


8) Temporal Graph API

import time

# Single edge
engine.create_edge_temporal(
    source="svc:auth", predicate="CALLS_AT", target="svc:payment",
    timestamp=int(time.time()), weight=42.7,
    attrs={"trace_id": "abc123"},
)

# Bulk ingest (~134K edges/sec)
engine.bulk_create_edges_temporal([
    {"s": "svc:auth", "p": "CALLS_AT", "o": "svc:pay", "ts": 1712000000, "w": 38.1},
])

# Window query
edges = engine.get_edges_in_window("svc:auth", "CALLS_AT", ts_start, ts_end)

# Pre-aggregated analytics (O(1))
avg  = engine.get_temporal_aggregate("svc:auth", "CALLS_AT", "avg", ts_start, ts_end)
groups = engine.get_bucket_groups("CALLS_AT", ts_start, ts_end)

9) SQL Table Bridge (v1.44.0+)

Map existing IRIS SQL tables as virtual graph nodes — zero data copy:

engine.map_sql_table("MySchema.Patient", id_column="PatientID", label="Patient")
engine.map_sql_relationship("Patient", "HAS_ENCOUNTER", "Encounter", target_fk="PatientID")

# Now queryable via Cypher
engine.execute_cypher("MATCH (p:Patient)-[:HAS_ENCOUNTER]->(e:Encounter) RETURN p.id LIMIT 5")

10) IRISGraphOperators

from iris_vector_graph.operators import IRISGraphOperators

ops = IRISGraphOperators(conn)
# Text search — uses BM25 if 'default' index exists, LIKE fallback otherwise
hits = ops.kg_TXT("ankylosing spondylitis", k=20)

# Vector + neighborhood
vec_hits  = ops.kg_KNN_VEC('[0.1, 0.2, 0.3]', k=10)
neighbors = ops.kg_NEIGHBORS(["doc:1"], predicate="MENTIONS", direction="out")

# PageRank
global_pr = ops.kg_PAGERANK(damping=0.85, max_iterations=20)
ppr       = ops.kg_PAGERANK(seed_entities=["doc:1"], damping=0.85)

# Graph kernels
wcc     = ops.kg_WCC(max_iterations=100)
cdlp    = ops.kg_CDLP(max_iterations=10)
sub     = ops.kg_SUBGRAPH(seed_ids=["doc:1"], k_hops=2)
guided  = ops.kg_PPR_GUIDED_SUBGRAPH(seed_ids=["doc:1"], top_k=50, max_hops=5)

11) Cypher ivg Procedures

Procedure Signature YIELD
ivg.vector.search (label, property, query_input, limit) node, score
ivg.neighbors (sources, predicate, direction) neighbor
ivg.ppr (seeds, alpha, max_iter) node, score
ivg.bm25.search (name, query, k) node, score
# BM25 lexical search in Cypher
engine.execute_cypher(
    "CALL ivg.bm25.search('ncit', $q, 10) YIELD node, score "
    "RETURN node, score ORDER BY score DESC",
    {"q": "HLA-B27 spondylitis"}
)

# Vector search in Cypher
engine.execute_cypher(
    "CALL ivg.vector.search('Gene', 'embedding', [0.1, 0.2], 5) "
    "YIELD node, score RETURN node, score"
)

# PPR in Cypher
engine.execute_cypher(
    "CALL ivg.ppr(['svc:auth'], 0.85, 20) YIELD node, score RETURN node, score"
)

12) Arno Acceleration

khop_result = engine.khop(seed="node:1", hops=2, max_nodes=500)
ppr_result  = engine.ppr(seed="node:1", alpha=0.85, max_iter=20, top_k=20)
walks       = engine.random_walk(seed="node:1", length=20, num_walks=10)

Falls back transparently if Graph.KG.NKGAccel is unavailable.


13) Bolt Server + Neo4j Browser

Start the combined HTTP + Bolt server:

IRIS_HOST=localhost IRIS_PORT=1972 IRIS_NAMESPACE=USER \
IRIS_USERNAME=_SYSTEM IRIS_PASSWORD=SYS \
python3 -m uvicorn iris_vector_graph.cypher_api:app --port 8000

This starts:

  • HTTP API on port 8000 (/api/cypher, /db/neo4j/tx/commit)
  • Neo4j Browser at http://localhost:8000/browser/
  • Bolt WebSocket on port 8000 (Neo4j Browser connects via bolt://localhost:8000)
  • Bolt TCP on port 7687 (Python driver, LangChain)

Neo4j Python Driver

from neo4j import GraphDatabase
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("", ""))
with driver.session() as s:
    result = s.run("MATCH (n) RETURN count(n) AS c")
    print(result.single()["c"])
driver.close()

LangChain Neo4jGraph

from langchain_community.graphs import Neo4jGraph
graph = Neo4jGraph(url="bolt://localhost:7687", username="", password="")
graph.query("MATCH (n) RETURN count(n) AS c")

System Procedures (via Bolt or HTTP)

with driver.session() as s:
    s.run("CALL db.labels()").data()
    s.run("CALL db.relationshipTypes()").data()
    s.run("CALL db.schema.visualization()").data()
    s.run("CALL dbms.queryJmx('org.neo4j:*')").data()

Environment Variables

Variable Default Description
IRIS_HOST (required) IRIS hostname
IRIS_PORT 1972 IRIS SuperServer port
IRIS_NAMESPACE USER IRIS namespace
IRIS_USERNAME _SYSTEM IRIS username
IRIS_PASSWORD SYS IRIS password
IVG_API_KEY (empty=no auth) API key for /api/* and /db/* HTTP routes
BOLT_TCP_PORT 7687 TCP Bolt listener port

Related docs

  • Architecture: docs/architecture/ARCHITECTURE.md
  • Schema reference: docs/architecture/ACTUAL_SCHEMA.md
  • Performance benchmarks: docs/performance/BENCHMARKS.md
  • Testing policy: docs/TESTING_POLICY.md