Skip to content

Advanced Cypher Query Patterns for Apache AGE

This document contains advanced Cypher query patterns for complex graph exploration beyond basic vector search.

Table of Contents

  1. Fuzzy Label Matching
  2. Path Finding with Scoring
  3. Regex-Based Concept Matching
  4. Weighted Path Analysis

Fuzzy Label Matching

When you need flexible matching instead of exact labels, use WHERE clauses with various string operators.

Basic Exact Match (for reference)

MATCH (c:Concept {label: "agile"})
OPTIONAL MATCH evidence = (c)-[:EVIDENCED_BY]->(i:Instance)-[:FROM_SOURCE]->(s:Source)
OPTIONAL MATCH concepts = (c)-[r]-(c2:Concept)
RETURN c, evidence, concepts
LIMIT 30

1. CONTAINS - Most Common for Fuzzy Matching

Finds concepts where label contains the search term anywhere.

MATCH (c:Concept)
WHERE c.label CONTAINS "agile"
OPTIONAL MATCH evidence = (c)-[:EVIDENCED_BY]->(i:Instance)-[:FROM_SOURCE]->(s:Source)
OPTIONAL MATCH concepts = (c)-[r]-(c2:Concept)
RETURN c, evidence, concepts
LIMIT 30

Matches: "agile", "agile methodology", "being agile", "Agile Manifesto"

Most robust for user queries - handles any case variation.

MATCH (c:Concept)
WHERE toLower(c.label) CONTAINS toLower("agile")
OPTIONAL MATCH evidence = (c)-[:EVIDENCED_BY]->(i:Instance)-[:FROM_SOURCE]->(s:Source)
OPTIONAL MATCH concepts = (c)-[r]-(c2:Concept)
RETURN c, evidence, concepts
LIMIT 30

Matches: "Agile", "AGILE", "agile methodology", "Being Agile"

3. STARTS WITH / ENDS WITH

For prefix or suffix matching.

MATCH (c:Concept)
WHERE c.label STARTS WITH "agile"
-- or WHERE c.label ENDS WITH "agile"
OPTIONAL MATCH evidence = (c)-[:EVIDENCED_BY]->(i:Instance)-[:FROM_SOURCE]->(s:Source)
OPTIONAL MATCH concepts = (c)-[r]-(c2:Concept)
RETURN c, evidence, concepts
LIMIT 30

STARTS WITH matches: "agile", "agile methodology", "agile practices" ENDS WITH matches: "being agile", "truly agile"

4. Regular Expression - Most Flexible

Full pattern matching with regex support.

MATCH (c:Concept)
WHERE c.label =~ "(?i).*agile.*"
-- (?i) makes it case-insensitive
-- .* matches any characters before/after
OPTIONAL MATCH evidence = (c)-[:EVIDENCED_BY]->(i:Instance)-[:FROM_SOURCE]->(s:Source)
OPTIONAL MATCH concepts = (c)-[r]-(c2:Concept)
RETURN c, evidence, concepts
LIMIT 30

Regex Patterns: - (?i) - Case insensitive flag - .* - Match any characters (zero or more) - (agile|scrum) - Match alternatives - ^agile - Must start with "agile" - agile$ - Must end with "agile"

5. Multiple Conditions with OR

Search for multiple related terms at once.

MATCH (c:Concept)
WHERE c.label CONTAINS "agile"
   OR c.label CONTAINS "scrum"
   OR c.label =~ "(?i).*kanban.*"
OPTIONAL MATCH evidence = (c)-[:EVIDENCED_BY]->(i:Instance)-[:FROM_SOURCE]->(s:Source)
OPTIONAL MATCH concepts = (c)-[r]-(c2:Concept)
RETURN c, evidence, concepts
LIMIT 30

Path Finding with Scoring

Find connections between concepts with various scoring strategies.

⚠️ Apache AGE Limitation: AGE does NOT support Neo4j's shortestPath() function. See GitHub Issue #2162 and ADR-076. Use variable-length patterns with caution - they enumerate ALL paths and have exponential complexity.

1. All Paths with Length Limit (Use with Caution)

More comprehensive - finds multiple paths up to specified depth.

MATCH (start:Concept), (end:Concept)
WHERE start.label =~ "(?i).*agile.*"
  AND end.label =~ "(?i).*human.*"
MATCH path = (start)-[*1..5]-(end)
WITH path, length(path) as pathLength,
     [n in nodes(path) | n.label] as nodeLabels
RETURN nodeLabels, pathLength
ORDER BY pathLength ASC
LIMIT 20

Parameters: - [*1..5] - Paths between 1 and 5 hops - Adjust LIMIT to control result count

3. Weighted Path (If Relationships Have Scores)

Accumulate relationship weights for path scoring.

MATCH (start:Concept), (end:Concept)
WHERE start.label =~ "(?i).*agile.*"
  AND end.label =~ "(?i).*human.*"
MATCH path = (start)-[rels*1..5]-(end)
WITH path,
     reduce(score = 0, r in rels | score + coalesce(r.weight, 1)) as totalScore,
     length(path) as pathLength
RETURN [n in nodes(path) | n.label] as concepts,
       [r in relationships(path) | type(r)] as relationships,
       totalScore,
       pathLength
ORDER BY totalScore DESC, pathLength ASC
LIMIT 10

Use when: Your graph has weight or confidence properties on relationships.

4. Multiple Start/End Concepts with Bounded Path

Search across multiple concept variations simultaneously.

Note: This uses variable-length patterns, not shortestPath() (which AGE doesn't support). Keep max_hops ≤ 4 to avoid exponential blowup.

MATCH (start:Concept), (end:Concept)
WHERE start.label =~ "(?i).*(agile|scrum|lean).*"
  AND end.label =~ "(?i).*(human|person|people).*"
MATCH path = (start)-[*1..4]-(end)
WITH start, end, path, length(path) as pathLength
RETURN start.label as fromConcept,
       end.label as toConcept,
       [n in nodes(path) | n.label] as pathConcepts,
       pathLength
ORDER BY pathLength ASC
LIMIT 15

Matches: - Start: "agile", "scrum", "lean", "lean agile", "scrum methodology" - End: "human", "person", "people", "human nature"

5. Detailed Path with Relationship Types

See what kinds of relationships connect concepts.

MATCH (start:Concept), (end:Concept)
WHERE start.label =~ "(?i).*agile.*"
  AND end.label =~ "(?i).*human.*"
MATCH path = (start)-[*1..5]-(end)
RETURN [n in nodes(path) | n.label] as concepts,
       [r in relationships(path) | type(r)] as relationshipTypes,
       length(path) as hops
ORDER BY hops ASC
LIMIT 10

Output includes: Concept labels AND relationship types (SUPPORTS, IMPLIES, etc.)


Regex-Based Concept Matching

Key patterns for flexible concept discovery.

Case-Insensitive Anywhere Match

WHERE c.label =~ "(?i).*pattern.*"

Starts With Pattern

WHERE c.label =~ "(?i)^pattern.*"

Ends With Pattern

WHERE c.label =~ "(?i).*pattern$"

Exact Match (Case Insensitive)

WHERE c.label =~ "(?i)^pattern$"

Multiple Alternatives

WHERE c.label =~ "(?i).*(pattern1|pattern2|pattern3).*"

Complex Patterns

-- Match "agile" followed by optional space and any word
WHERE c.label =~ "(?i)agile\\s*\\w+"

-- Match concepts containing numbers
WHERE c.label =~ ".*\\d+.*"

-- Match concepts that are 2-3 words
WHERE c.label =~ "^\\w+\\s+\\w+(\\s+\\w+)?$"

Weighted Path Analysis

Custom scoring based on domain-specific criteria.

Example: Score Paths by Relevant Intermediate Concepts

MATCH (start:Concept), (end:Concept)
WHERE start.label =~ "(?i).*agile.*"
  AND end.label =~ "(?i).*human.*"
MATCH path = (start)-[*1..5]-(end)
WITH path,
     length(path) as pathLength,
     size([n in nodes(path) WHERE n.label CONTAINS "team" OR n.label CONTAINS "collaboration"]) as relevantNodes
WITH path, pathLength, relevantNodes,
     (relevantNodes * 10 - pathLength) as customScore
RETURN [n in nodes(path) | n.label] as concepts,
       pathLength,
       relevantNodes,
       customScore
ORDER BY customScore DESC
LIMIT 10

Scoring Logic: - +10 points for each "team" or "collaboration" concept in path - -1 point for each hop (shorter paths preferred) - Higher score = more relevant path

Example: Relationship Type Preferences

MATCH (start:Concept), (end:Concept)
WHERE start.label =~ "(?i).*agile.*"
  AND end.label =~ "(?i).*human.*"
MATCH path = (start)-[rels*1..5]-(end)
WITH path,
     length(path) as pathLength,
     size([r in rels WHERE type(r) = 'SUPPORTS']) as supportsCount,
     size([r in rels WHERE type(r) = 'IMPLIES']) as impliesCount
WITH path, pathLength, supportsCount, impliesCount,
     (supportsCount * 5 + impliesCount * 3 - pathLength) as customScore
RETURN [n in nodes(path) | n.label] as concepts,
       [r in relationships(path) | type(r)] as relTypes,
       pathLength,
       supportsCount,
       impliesCount,
       customScore
ORDER BY customScore DESC
LIMIT 10

Scoring Logic: - +5 points for each SUPPORTS relationship - +3 points for each IMPLIES relationship - -1 point for each hop - Prioritizes strong conceptual connections


Usage with AGE

Important Notes for Apache AGE

  1. Wrap all Cypher in SELECT:

    SELECT * FROM cypher('knowledge_graph', $$
        MATCH (c:Concept)
        WHERE c.label =~ "(?i).*pattern.*"
        RETURN c
    $$) as (c agtype);
    

  2. ORDER BY Restrictions:

  3. AGE doesn't support ORDER BY with path variables directly
  4. Use WITH clause to extract sortable values first:

    MATCH path = (start)-[*1..4]-(end)
    WITH path, length(path) as len
    RETURN path, len
    ORDER BY len  -- Now works!
    

  5. No shortestPath() Function:

  6. AGE does NOT support Neo4j's shortestPath() or SHORTEST keyword
  7. See GitHub Issue #2162
  8. Use application-level BFS for optimal pathfinding (see ADR-076)

  9. Variable-Length Path Performance:

  10. Patterns like -[*1..N]- enumerate ALL paths (exponential complexity)
  11. Keep max hops ≤ 4 for interactive queries
  12. See GitHub Issue #195 for benchmarks

  13. Result Type Casting:

  14. AGE returns agtype values
  15. Parse with _parse_agtype() in Python
  16. Extract column names with _extract_column_spec()

Integration Examples

Add Fuzzy Search to API

def fuzzy_search_concepts(
    self,
    search_term: str,
    match_type: str = "contains",  # contains, starts_with, regex
    limit: int = 10
) -> List[Dict[str, Any]]:
    """Search concepts with fuzzy label matching."""

    if match_type == "contains":
        where_clause = f"WHERE toLower(c.label) CONTAINS toLower('{search_term}')"
    elif match_type == "starts_with":
        where_clause = f"WHERE c.label STARTS WITH '{search_term}'"
    elif match_type == "regex":
        where_clause = f"WHERE c.label =~ '(?i){search_term}'"
    else:
        raise ValueError(f"Invalid match_type: {match_type}")

    query = f"""
    MATCH (c:Concept)
    {where_clause}
    RETURN c.concept_id as concept_id, c.label as label
    LIMIT {limit}
    """

    return self._execute_cypher(query)

Add Scored Path Finding

def find_scored_path(
    self,
    from_pattern: str,
    to_pattern: str,
    max_hops: int = 5,
    scoring_keywords: List[str] = []
) -> List[Dict[str, Any]]:
    """Find paths with custom relevance scoring."""

    keyword_conditions = " OR ".join([
        f'n.label CONTAINS "{kw}"' for kw in scoring_keywords
    ])

    query = f"""
    MATCH (start:Concept), (end:Concept)
    WHERE start.label =~ '(?i).*{from_pattern}.*'
      AND end.label =~ '(?i).*{to_pattern}.*'
    MATCH path = (start)-[*1..{max_hops}]-(end)
    WITH path,
         length(path) as pathLength,
         size([n in nodes(path) WHERE {keyword_conditions}]) as relevantNodes
    WITH path, pathLength, relevantNodes,
         (relevantNodes * 10 - pathLength) as score
    RETURN
        [n in nodes(path) | n.label] as concepts,
        pathLength,
        score
    ORDER BY score DESC
    LIMIT 10
    """

    return self._execute_cypher(query)

References


Last Updated: 2025-10-08 Apache AGE Migration: ADR-016