For AI Coding Assistants: This document provides comprehensive guidance for understanding and developing Cartography intel modules. It contains codebase-specific patterns, architectural decisions, and implementation details necessary for effective AI-assisted development within the Cartography project.
This guide teaches you how to write intel modules for Cartography using the modern data model approach. We'll walk through real examples from the codebase to show you the patterns and best practices.
- Procedure Skills - Auto-loaded skills under .agents/skills/
- AI Assistant Quick Reference - Key concepts and imports
- Git and Pull Request Guidelines - Commit signing and PR templates
- Quick Start - Copy an existing module
- Quick Reference Cheat Sheet - Copy-paste templates
Procedures for building and extending Cartography intel modules ship as Claude skills under .agents/skills/. Skill-aware agents auto-load each skill from its YAML frontmatter when a relevant task starts; you do not need to open the files manually. The available skills are:
- create-module
- add-node-type
- add-relationship
- analysis-jobs
- create-rule
- enrich-ontology
- refactor-legacy
- troubleshooting
Key Cartography Concepts:
- Intel Module: Component that fetches data from external APIs and loads into Neo4j
- Sync Pattern: get() -> transform() -> load() -> cleanup() -> analysis (optional)
- Data Model: Declarative schema using CartographyNodeSchema and CartographyRelSchema
- Update Tag: Timestamp used for cleanup jobs to remove stale data
- Analysis Jobs: Post-ingestion queries that enrich the graph (e.g., internet exposure, permission inheritance). When a job manages relationships, put MERGE statements before the stale-edge DELETE; iterative deletion exposes a window where concurrent readers see those edges missing. See the analysis-jobs skill.
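The ordering advice for analysis jobs can be illustrated with a small in-memory model. This is only a sketch under assumptions: `SketchGraph`, `run_merge_then_delete`, and `run_delete_then_merge` are hypothetical names invented here, a plain dict stands in for Neo4j relationships, and "delete first" is one reading of the hazard described above. It is not how Cartography or Neo4j implement these steps.

```python
from dataclasses import dataclass, field

Edge = tuple[str, str]


@dataclass
class SketchGraph:
    """Hypothetical in-memory stand-in for relationships stored in Neo4j."""

    # Maps (source_id, target_id) -> lastupdated tag.
    edges: dict[Edge, int] = field(default_factory=dict)

    def merge_edges(self, pairs: list[Edge], update_tag: int) -> None:
        # Rough analogue of MERGE ... SET r.lastupdated = $UPDATE_TAG:
        # create the edge if missing, otherwise refresh its tag.
        for pair in pairs:
            self.edges[pair] = update_tag

    def delete_stale_edges(self, update_tag: int) -> None:
        # Rough analogue of the stale-edge DELETE: drop edges this run did not touch.
        self.edges = {p: t for p, t in self.edges.items() if t == update_tag}


def run_merge_then_delete(graph: SketchGraph, desired: list[Edge], update_tag: int) -> list[set[Edge]]:
    """Return the edge sets a concurrent reader could observe after each step."""
    snapshots = []
    graph.merge_edges(desired, update_tag)
    snapshots.append(set(graph.edges))
    graph.delete_stale_edges(update_tag)
    snapshots.append(set(graph.edges))
    return snapshots


def run_delete_then_merge(graph: SketchGraph, desired: list[Edge], update_tag: int) -> list[set[Edge]]:
    """Same end state, but deleting first leaves a gap between the two steps."""
    snapshots = []
    graph.edges.clear()
    snapshots.append(set(graph.edges))
    graph.merge_edges(desired, update_tag)
    snapshots.append(set(graph.edges))
    return snapshots


if __name__ == "__main__":
    start = {("a", "b"): 1, ("a", "c"): 1}
    wanted = [("a", "b"), ("a", "d")]
    print(run_merge_then_delete(SketchGraph(dict(start)), wanted, 2))
    print(run_delete_then_merge(SketchGraph(dict(start)), wanted, 2))
```

In this model both orderings converge on the same final edges, but only the merge-first ordering keeps the still-valid edge visible in every intermediate snapshot.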
Critical Files to Know:
- cartography/config.py - Configuration object definitions
- cartography/cli.py - Typer-based CLI with organized help panels
- cartography/client/core/tx.py - Core load() function
- cartography/graph/job.py - Cleanup job utilities
- cartography/models/core/ - Base data model classes
Essential Imports:
import logging
from dataclasses import dataclass
from typing import Any  # used in the type hints of the templates in this guide

import neo4j  # used for neo4j.Session type hints

from cartography.models.core.common import PropertyRef
from cartography.models.core.nodes import CartographyNodeProperties, CartographyNodeSchema, ExtraNodeLabels
from cartography.models.core.relationships import (
    CartographyRelProperties, CartographyRelSchema, LinkDirection,
    make_target_node_matcher, TargetNodeMatcher, OtherRelationships,
    make_source_node_matcher, SourceNodeMatcher,
)
from cartography.client.core.tx import load, load_matchlinks, run_write_query
from cartography.graph.job import GraphJob
from cartography.util import timeit
# For analysis jobs (optional)
from cartography.util import run_analysis_job, run_scoped_analysis_job, run_analysis_and_ensure_deps

logger = logging.getLogger(__name__)

PropertyRef Quick Reference:
PropertyRef("field_name") # Value from data dict
PropertyRef("KWARG_NAME", set_in_kwargs=True) # Value from load() kwargs
PropertyRef("field", extra_index=True) # Create database index
PropertyRef("field_list", one_to_many=True) # One-to-many relationships

Debugging Tips:
- Check existing patterns in cartography/intel/ before creating new ones
- Ensure __init__.py files exist in all module directories
- Look at tests/integration/cartography/intel/ for similar test patterns
- Review cartography/models/ for existing relationship patterns
Manual Write Queries:
- Prefer load() / load_matchlinks() for normal ingestion and GraphJob for cleanup.
- If you must execute a handwritten write query, use run_write_query() instead of neo4j_session.run() so the write runs in a managed transaction with Cartography's retry handling.
- Reserve direct neo4j_session.run() for read queries or intentional low-level paths that cannot use the managed write helpers.
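To give a feel for why a managed write with retry handling is preferable to a bare call, here is a generic retry wrapper. It is a sketch under assumptions, not Cartography's or the Neo4j driver's implementation: `SketchTransientError` and `run_with_retry` are hypothetical names, and the backoff policy is invented for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SketchTransientError(Exception):
    """Hypothetical stand-in for a retryable database error (e.g. a deadlock)."""


def run_with_retry(work: Callable[[], T], max_attempts: int = 3, base_delay: float = 0.0) -> T:
    """Call work() until it succeeds or max_attempts is exhausted.

    Only SketchTransientError is retried; any other exception propagates
    immediately, since retrying a malformed query would never help.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except SketchTransientError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A caller would wrap the whole unit of work (for example, one write transaction) in `work`, so that a retry re-runs the write from the start rather than resuming halfway.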
Deprecation Conventions:
- For temporary compatibility shims, legacy aliases, and migration-only edges, add a code comment in the form # DEPRECATED: ... will be removed in v1.0.0.
- Prefer comment-only deprecation markers for internal compatibility code that should stay quiet during normal runs.
- Use runtime warnings or log warnings only when users are actively invoking a deprecated public module or API surface.
Signing Commits: All commits must be signed off using the -s flag. This adds a Signed-off-by line to your commit message, certifying that you have the right to submit the code under the project's license.
# Sign a commit with a message
git commit -s -m "feat(module): add new feature"

Pull Request Descriptions: All pull requests must follow the template at .github/pull_request_template.md. Update the PR description to match the template sections if they are missing or incomplete.
The fastest way to get started is to copy the structure from an existing module:
- Simple module: cartography/intel/lastpass/ - Basic user sync with API calls
- Complex module: cartography/intel/aws/ec2/instances.py - Multiple relationships and data types
- Reference documentation: docs/root/dev/writing-intel-modules.md
For detailed step-by-step instructions, use the create-module skill.
@timeit
def sync(neo4j_session: neo4j.Session, api_key: str, tenant_id: str,
         update_tag: int, common_job_parameters: dict[str, Any]) -> None:
    """
    Main sync entry point for the module.
    """
    logger.info("Starting MyResource sync")

    # 1. GET - Fetch data from API
    logger.debug("Fetching MyResource data from API")
    raw_data = get(api_key, tenant_id)

    # 2. TRANSFORM - Shape data for ingestion
    logger.debug("Transforming %d MyResource items", len(raw_data))
    transformed = transform(raw_data)

    # 3. LOAD - Ingest to Neo4j
    load_entities(neo4j_session, transformed, tenant_id, update_tag)

    # 4. CLEANUP - Remove stale data
    logger.debug("Running MyResource cleanup job")
    cleanup(neo4j_session, common_job_parameters)

    logger.info("Completed MyResource sync")


def load_entities(neo4j_session: neo4j.Session, data: list[dict],
                  tenant_id: str, update_tag: int) -> None:
    load(neo4j_session, YourSchema(), data,
         lastupdated=update_tag, TENANT_ID=tenant_id)


def cleanup(neo4j_session: neo4j.Session, common_job_parameters: dict[str, Any]) -> None:
    logger.debug("Running cleanup job for MyResource")
    GraphJob.from_node_schema(YourSchema(), common_job_parameters).run(neo4j_session)


def cleanup_custom_relationships(
    neo4j_session: neo4j.Session,
    common_job_parameters: dict[str, Any],
) -> None:
    run_write_query(
        neo4j_session,
        """
        MATCH (n:YourNode)
        WHERE n.lastupdated <> $UPDATE_TAG
        DETACH DELETE n
        """,
        UPDATE_TAG=common_job_parameters["UPDATE_TAG"],
    )


@dataclass(frozen=True)
class YourNodeProperties(CartographyNodeProperties):
    id: PropertyRef = PropertyRef("id")  # REQUIRED
    lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)  # REQUIRED
    # Your business properties here...

# OUTWARD: (:Source)-[:REL]->(:Target)
direction: LinkDirection = LinkDirection.OUTWARD
# INWARD: (:Source)<-[:REL]-(:Target)
direction: LinkDirection = LinkDirection.INWARD

# Transform: Create list field
{"entity_id": "123", "related_ids": ["a", "b", "c"]}
# Schema: Use one_to_many=True
target_node_matcher: TargetNodeMatcher = make_target_node_matcher({
    "id": PropertyRef("related_ids", one_to_many=True),
})


@dataclass(frozen=True)
class YourMatchLinkRelProperties(CartographyRelProperties):
    # Required properties for MatchLinks.
    # Defined before the schema because the schema's default value instantiates this class.
    lastupdated: PropertyRef = PropertyRef("lastupdated", set_in_kwargs=True)
    _sub_resource_label: PropertyRef = PropertyRef("_sub_resource_label", set_in_kwargs=True)
    _sub_resource_id: PropertyRef = PropertyRef("_sub_resource_id", set_in_kwargs=True)


@dataclass(frozen=True)
class YourMatchLinkSchema(CartographyRelSchema):
    target_node_label: str = "TargetNode"
    target_node_matcher: TargetNodeMatcher = make_target_node_matcher({
        "id": PropertyRef("target_id"),
    })
    source_node_label: str = "SourceNode"
    source_node_matcher: SourceNodeMatcher = make_source_node_matcher({
        "id": PropertyRef("source_id"),
    })
    direction: LinkDirection = LinkDirection.OUTWARD
    rel_label: str = "CONNECTS_TO"
    properties: YourMatchLinkRelProperties = YourMatchLinkRelProperties()


# Load and cleanup MatchLinks
load_matchlinks(neo4j_session, YourMatchLinkSchema(), mapping_data,
                lastupdated=update_tag, _sub_resource_label="AWSAccount", _sub_resource_id=account_id)
GraphJob.from_matchlink(YourMatchLinkSchema(), "AWSAccount", account_id, update_tag).run(neo4j_session)

cartography/intel/your_service/
├── __init__.py # Main entry point
└── entities.py # Domain sync modules
cartography/models/your_service/
├── entity.py # Data model definitions
└── tenant.py # Tenant model
tests/data/your_service/
└── entities.py # Mock test data
tests/integration/cartography/intel/your_service/
└── test_entities.py # Integration tests
For test-specific guidance, including integration test boundaries, Cypher usage,
fixtures, and check_nodes() / check_rels() helpers, see tests/AGENTS.md.
Remember: Start simple, iterate, and use existing modules as references. The Cartography community is here to help!