
feat(sitesearch): vendor-neutral Site Search — neutral aggregation + OpenSearch impl + phase-aware router (#35786) #36282

Open
fabrizzio-dotCMS wants to merge 9 commits into main from issue-35786-sitesearch-neutral-aggregation

fabrizzio-dotCMS commented Jun 23, 2026

Member

What & why

Closes the Site Search portion of the ES → OpenSearch migration (#35786). Site Search is decoupled from Elasticsearch types and given a working OpenSearch backend plus a phase-aware router, so it dual-writes and reads correctly across all migration phases.

Two commits:

  1. Vendor-neutral aggregation abstraction — removes org.elasticsearch.* from the SiteSearchAPI contract and SiteSearchWebAPI, reusing the existing com.dotcms.content.index.domain.Aggregation / AggregationBucket DTOs (from Aggregation return-type change breaks existing VTL templates accessing $results.aggregations #36026) with histogram support, and introduces DotSearchException.
  2. OpenSearch implementation + router — adds OSSiteSearchAPI, the SiteSearchAPIImpl phase router, and an integration test.

Changes

| Area | Change |
| --- | --- |
| SiteSearchAPI / SiteSearchWebAPI | No vendor imports; getAggregations/getFacets return neutral Aggregation; DotSearchException added |
| OSSiteSearchAPI (new) | @ApplicationScoped @Default OpenSearch impl. Search/aggregations via the generic client → ContentSearchResponse (mirrors OSSearchAPIImpl); doc put/delete via _doc PUT/DELETE; get via typed client.get(...). Default index resolved from VersionedIndicesAPI (not the deprecated IndiciesAPI) |
| SiteSearchAPIImpl (new, router) | PhaseRouter<SiteSearchAPI> mirroring IndexAPIImpl; the single fan-out point. Reads → read provider; doc/index writes → write fan-out; listIndices/listClosedIndices merge in dual-write; Quartz task methods route to a single provider (fan-out would double-schedule jobs) |
| ESSiteSearchAPI | Uses raw ESIndexAPI instead of the IndexAPI router so the SiteSearch router is the only fan-out point (avoids double dual-write of OS indices) |
| APILocator | SITESEARCH_API now returns SiteSearchAPIImpl |
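
For readers unfamiliar with the routing rules in the table, here is a minimal sketch of the idea, not the actual SiteSearchAPIImpl. The `Phase` enum, `SearchLeaf` interface, `RecordingLeaf`, and `PhaseRouterSketch` are hypothetical names invented for illustration, and routing single-provider calls to the read provider is an assumption, since the PR only says such calls go to "a single provider".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical phase model; the real migration phases may be named differently.
enum Phase { ES_ONLY, DUAL_WRITE_READ_ES, DUAL_WRITE_READ_OS, OS_ONLY }

// Minimal stand-in for a provider; not the real SiteSearchAPI contract.
interface SearchLeaf {
    void put(String index, String docId);
    List<String> listIndices();
}

// In-memory leaf that records writes so the routing can be observed.
final class RecordingLeaf implements SearchLeaf {
    final List<String> writes = new ArrayList<>();
    private final List<String> indices;

    RecordingLeaf(List<String> indices) { this.indices = indices; }

    @Override public void put(String index, String docId) { writes.add(index + "/" + docId); }
    @Override public List<String> listIndices() { return indices; }
}

final class PhaseRouterSketch {
    private final SearchLeaf es;
    private final SearchLeaf os;
    private final Phase phase;

    PhaseRouterSketch(SearchLeaf es, SearchLeaf os, Phase phase) {
        this.es = es;
        this.os = os;
        this.phase = phase;
    }

    // Reads are served by exactly one provider.
    SearchLeaf readProvider() {
        return (phase == Phase.ES_ONLY || phase == Phase.DUAL_WRITE_READ_ES) ? es : os;
    }

    // Writes fan out to both providers only while dual-writing.
    List<SearchLeaf> writeProviders() {
        if (phase == Phase.ES_ONLY) return List.of(es);
        if (phase == Phase.OS_ONLY) return List.of(os);
        return List.of(es, os);
    }

    <T> T read(Function<SearchLeaf, T> op) { return op.apply(readProvider()); }

    void write(Consumer<SearchLeaf> op) { writeProviders().forEach(op); }

    // Assumption: scheduling-style calls reuse the read provider so a job is registered once.
    void single(Consumer<SearchLeaf> op) { op.accept(readProvider()); }
}
```

In a dual-write phase, `write` reaches both leaves while `read` and `single` each touch one, which is the property that keeps scheduled jobs from being registered twice.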

Design notes

  • Placement: OS impl + router live next to ESSiteSearchAPI in the enterprise package (license-gated feature). The single annotated beans.xml covers the merged target/classes, so CDI still discovers the @Default bean.
  • OS index naming: site-search indices use untagged logical names. VersionedIndicesAPI force-tags .os on store/load, so the default is IndexTag.strip(...)-ed on read. deactivateIndex calls removeVersion(...) when removing the slot would leave the version empty (saveIndices rejects empty).
  • Highlights: the neutral SearchHit DTO carries no highlights, so OS search() returns empty highlight arrays (the ES path is best-effort too) — marked TODO OS.

Testing

  • ./mvnw compile -pl :dotcms-core → BUILD SUCCESS (Java 25)
  • ./mvnw test-compile -pl :dotcms-integration -am → BUILD SUCCESS
  • New OSSiteSearchAPIIntegrationTest (registered in OpenSearchUpgradeSuite) covers lifecycle, doc round-trip, aggregations, and default-index activation. Requires the opensearch-upgrade container:
    ./mvnw verify -pl :dotcms-integration -Dcoreit.test.skip=false -Dopensearch.upgrade.test=true -Dit.test=OSSiteSearchAPIIntegrationTest
    

🤖 Generated with Claude Code

Decouple SiteSearchAPI/SiteSearchWebAPI from Elasticsearch aggregation
types so Site Search can be served by OpenSearch in Phase 3.

- Reuse the existing neutral com.dotcms.content.index.domain.Aggregation
  / AggregationBucket DTOs (from #36026) instead of a new IndexAggregation
- Add neutral DotSearchException (unchecked) to replace ElasticsearchException
  on the public API surface
- SiteSearchAPI: drop org.elasticsearch.* imports; neutral Aggregation
  return type; createSiteSearchIndex throws DotSearchException
- SiteSearchWebAPI: remove InternalDateHistogram/StringTerms/Bucket casts
  and the Joda DateTime import; getFacets distinguishes histogram vs terms
  by aggregation type and feeds the legacy wrappers neutral buckets
- ESSiteSearchAPI: adapt ES results via Aggregation.from(); ES exception
  throws -> DotSearchException
- Add date/numeric histogram support to the neutral Aggregation ES factory
  (also fixes a latent CCE: the old getFacets cast the histogram key to
  Joda DateTime, which is a java.time.ZonedDateTime in ES 7.x)

OSSiteSearchAPI is deferred to #34609 (not yet in the codebase);
Aggregation.fromOS() is already in place for it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
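
The latent ClassCastException described in this commit message comes from casting the histogram key to a single concrete type. A hedged sketch of the type-checked alternative follows; `HistogramKeys.toLong` is a hypothetical helper, not the code in the neutral Aggregation factory.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class HistogramKeys {
    // Normalizes a histogram bucket key to a long without a blind cast.
    static long toLong(Object key) {
        if (key instanceof ZonedDateTime) {
            // Date histogram: epoch milliseconds.
            return ((ZonedDateTime) key).toInstant().toEpochMilli();
        }
        if (key instanceof Number) {
            // Numeric histogram: fractional parts are truncated.
            return ((Number) key).longValue();
        }
        throw new IllegalArgumentException("Unsupported histogram key type: "
                + (key == null ? "null" : key.getClass().getName()));
    }

    public static void main(String[] args) {
        ZonedDateTime t = ZonedDateTime.ofInstant(Instant.ofEpochMilli(1_000L), ZoneOffset.UTC);
        System.out.println(toLong(t));    // prints 1000
        System.out.println(toLong(42.9)); // prints 42
    }
}
```

An unexpected key type then fails with a descriptive message instead of a ClassCastException at the call site.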

claude Bot commented Jun 23, 2026

Contributor

Claude finished @fabrizzio-dotCMS's task in 3m 34s —— View job


Rollback Safety Analysis

  • Load rollback-unsafe categories reference
  • Get PR diff
  • Analyze diff against all unsafe categories
  • Post findings and apply label

Pull Request Unsafe to Rollback!!!

  • Category: H-8 — VTL Viewtool Contract Change

  • Risk Level: 🟠 HIGH

  • Why it's unsafe: SiteSearchWebAPI is registered in toolbox.xml as $sitesearch (request-scoped viewtool). Two public methods on it changed their return types in this PR:

    • getAggregations(String, String) — return type swapped from Map<String, org.elasticsearch.search.aggregations.Aggregation> to Map<String, com.dotcms.content.index.domain.Aggregation>. The new neutral Aggregation record exposes getBuckets(), getHits(), and iterator(), none of which exist on the old ES Aggregation interface.
    • getFacets(String, String) — same swap (same interface change cascades to this method via the SiteSearchAPI interface).
    • The inner classes InternalWrapperCountDateHistogramFacet and InternalWrapperStringTermsFacet had their constructor parameter types changed from List<InternalDateHistogram.Bucket> / List<StringTerms.Bucket> to List<AggregationBucket>. These inner classes are Velocity-accessible (public, non-static inner classes of the viewtool).

    Any Velocity template co-migrated during the N release cycle to call $sitesearch.getAggregations(...) and then access $agg.buckets or $agg.hits (new accessors on the neutral record) will break on N-1 rollback: N-1's binary exposes the old ES Aggregation interface, which lacks those methods.

  • Code that makes it unsafe:

    // dotCMS/src/main/java/com/dotmarketing/sitesearch/business/SiteSearchAPI.java (lines 5–13)
    // Interface signature change on viewtool-exposed methods:
    - import org.elasticsearch.search.aggregations.Aggregation;
    + import com.dotcms.content.index.domain.Aggregation;
    
    // dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java (lines 658–662)
    // Return type changed:
    - return response.getAggregations().asMap();
    + return Aggregation.from(response.getAggregations());
    
    // dotCMS/src/main/java/com/dotcms/content/index/domain/Aggregation.java — new public accessors:
    + List<AggregationBucket> getBuckets()  // does not exist on old ES Aggregation interface
    + SearchHits getHits()                  // does not exist on old ES Aggregation interface
    + Iterator<AggregationBucket> iterator() // does not exist on old ES Aggregation interface
  • Alternative (if possible): Apply the two-phase contract change from H-8:

    • Release N (this PR): Keep getAggregations() / getFacets() returning an intermediate type that implements both the old ES Aggregation interface and exposes the new getBuckets() / getHits() accessors — or retain a deprecated overload that returns the old type alongside the new method. This lets templates on both N and N-1 resolve correctly.
    • Release N+1: Remove the old ES Aggregation return type once N-1 is outside the rollback window.
    • Alternatively, if no templates currently use $sitesearch.getAggregations() directly (i.e., none call $agg.buckets or $agg.hits), document this as rollback-unsafe in the release notes and track as a known risk window until templates are confirmed safe to migrate.
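
The "intermediate type" option for Release N could look roughly like the sketch below. Both interfaces are stand-ins written for this example: `LegacyAggregationView` is not the real org.elasticsearch Aggregation interface, and buckets are reduced to plain strings. Velocity resolves a property reference such as `$agg.buckets` through a `getBuckets()` getter, so a single object that satisfies both views serves old and new templates.

```java
import java.util.Iterator;
import java.util.List;

// Stand-in for the accessors existing templates rely on (not the real ES interface).
interface LegacyAggregationView {
    String getName();
    String getType();
}

// Stand-in for the new neutral accessors named in the review.
interface NeutralAggregationView extends Iterable<String> {
    List<String> getBuckets();
}

// One object that satisfies both contracts during the transition release.
final class BridgingAggregation implements LegacyAggregationView, NeutralAggregationView {
    private final String name;
    private final String type;
    private final List<String> buckets;

    BridgingAggregation(String name, String type, List<String> buckets) {
        this.name = name;
        this.type = type;
        this.buckets = List.copyOf(buckets); // defensive, immutable copy
    }

    @Override public String getName() { return name; }
    @Override public String getType() { return type; }
    @Override public List<String> getBuckets() { return buckets; }
    @Override public Iterator<String> iterator() { return buckets.iterator(); }
}
```

Whether a bridge like this is practical depends on how much of the old interface templates really use, which the review does not establish.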

No database schema changes, Elasticsearch mapping changes, contentlet_as_json version bumps, DROP TABLE/COLUMN, data migrations, push publishing bundle changes, or OpenSearch mapping changes to the content index were found. The new os-sitesearch-mapping.json / os-sitesearch-settings.json are net-new resources that don't affect existing indices on rollback. The SiteSearchAPI exception swap (ElasticsearchException → DotSearchException) is a Java compile-time-only change with no runtime rollback impact. The SiteSearchAPIImpl router and OSSiteSearchAPI are additive new classes.

github-actions Bot commented Jun 23, 2026

Contributor

🤖 Bedrock Review — deepseek.v3.2

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:328 — setAlias method incorrectly returns false on success. The method calls indexApi.createAlias(indexName, alias) but then returns false, which contradicts the expected behavior (should return true on success). This could cause callers to misinterpret the operation's outcome.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:330 — Missing null/empty check for alias parameter in setAlias. The method validates both indexName and alias but only throws an IllegalArgumentException if both are not set. It should also handle the case where alias is empty or null individually, as the error message suggests.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:336 — Potential NullPointerException when reading resource files. The code assumes url.getPath() is non-null, but classLoader.getResource(...) could return null if the resource is not found. This would cause an NPE when calling new File(url.getPath()).

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:336-338 — Hardcoded resource paths may not be portable. Using File with url.getPath() assumes a file-system URL, which may fail in certain deployment environments (e.g., JAR packaging). Consider using InputStream instead.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:445 — Missing validation for resultType parameter in putToIndex. The method logs resultType but does not validate it; while not critical, unvalidated input in logs could be a minor security concern if it contains malicious characters.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:447 — Potential NullPointerException when res.getMap() returns null. The code accesses res.getMap().put(...) and res.getMap().get(...) without checking if the map is null. This could happen if SiteSearchResult is improperly constructed.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:455 — Inconsistent logging level: Logger.info for each document write in putToIndex could produce excessive log output in production. Consider lowering to debug or making it conditional.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:462 — Hardcoded endpoint path "/" + physicalName(idx) + "/_doc/" + res.getId() assumes OpenSearch's _doc type. While OpenSearch 3.x uses _doc, this is a vendor-specific detail that should be centralized or configurable.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:468 — Error handling in putToIndex only logs the HTTP status; it does not throw an exception or propagate failure. This could silently ignore indexing failures.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:486 — Suppressed warnings @SuppressWarnings({"unchecked", "rawtypes"}) in getFromIndex indicate unsafe type casting. While likely safe due to OpenSearch's JSON response, it's a code smell.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:505 — In deleteFromIndex, logging includes the docId which could be sensitive (e.g., containing PII). Ensure docId is sanitized or avoid logging it at info level.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:511 — Error handling in deleteFromIndex treats HTTP 404 as benign, which is correct, but other error statuses are only logged. Consider throwing an exception for non-404 errors to signal failure.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:752 — In rawSearch, the method catches IOException and wraps it in DotSearchException, but other exceptions (e.g., JSONException from new JSONObject(query)) are caught as Exception and may lose context. Ensure all relevant exceptions are properly handled.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:145 — In isDefaultIndex, the method catches Exception and wraps it in DotDataException, which could mask the original exception type. It should preserve the original exception if possible.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:177 — In createSiteSearchIndex, the method catches Exception and wraps it in IOException, which may not be appropriate for all exception types (e.g., DotSearchException). It should rethrow the original exception if it's already a DotSearchException or IOException.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:195 — In activateIndex, the method catches Exception and wraps it in DotDataException, which could obscure the root cause. It should rethrow DotDataException as is.

🟡 Medium dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:204 — In deactivateIndex, the method catches Exception and wraps it in DotDataException, but IOException should be rethrown as is. The catch block should handle DotDataException and IOException separately.

🟡 Medium dotCMS/src/main/java/com/dotcms/content/index/domain/AggregationBucket.java:115 — In histogramKey, the method converts ZonedDateTime to epoch millis, but the conversion may lose sub-millisecond precision. Ensure this is acceptable for downstream use.

🟡 Medium dotCMS/src/main/java/com/dotcms/content/index/domain/AggregationBucket.java:118 — In histogramKey, converting Number to longValue() may truncate floating-point values. Consider preserving the original number type if needed.

🟡 Medium dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:190 — In isHistogram, the check type.contains("histogram") is fragile; it could match unrelated aggregation types. Consider a more precise check (e.g., type.endsWith("_histogram")).

🟡 Medium dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:213 — In InternalWrapperCountDateHistogramFacet, the key may be null if entry.getKeyAsNumber() returns null. The code assigns time = 0L in that case, which may not be correct. Handle null appropriately.

🟠 High dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:336 — Security: File path constructed from classpath resource URL (url.getPath()) could be vulnerable to path traversal if the resource URL is manipulated. Ensure the resource is trusted (it's internal, so low risk).

🟠 High dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:336-338 — Security: Reading files via File with paths derived from classloader resources could expose the server's file system if the resource path is attacker-controlled. Since resources are internal, risk is low, but consider using getResourceAsStream instead.

🟡 Medium dotcms-integration/src/test/java/com/dotcms/content/index/opensearch/OSSiteSearchAPIIntegrationTest.java:300 — Test truncation: The diff ends abruptly, indicating the test file may be incomplete. Ensure the full test suite is included and properly structured.
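
Several findings above concern reading the settings and mapping resources through `new File(url.getPath())`. A minimal sketch of the stream-based alternative is shown here; `ClasspathResources.readUtf8` is a hypothetical helper, not code from this PR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ClasspathResources {
    // Reads a classpath resource as UTF-8 text; works for resources packaged inside a JAR.
    static String readUtf8(ClassLoader loader, String resource) throws IOException {
        try (InputStream in = loader.getResourceAsStream(resource)) {
            if (in == null) {
                // getResourceAsStream returns null when the resource is missing.
                throw new IOException("Classpath resource not found: " + resource);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

This covers both the null-URL case and the JAR-packaging case in one place, since no file-system path is ever built.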

Note: The PR introduces a new OpenSearch implementation for Site Search with a phase-aware router, which is a significant architectural change. The code generally follows dotCMS conventions and includes comprehensive


Run: #28054257576 · tokens: in: 19743 · out: 2048 · total: 21791

claude Bot commented Jun 23, 2026

Contributor

Pull Request Unsafe to Rollback!!!

  • Category: H-8 — VTL Viewtool Contract Change
  • Risk Level: 🟠 HIGH
  • Why it's unsafe: SiteSearchWebAPI is registered in WEB-INF/toolbox.xml under $sitesearch (request scope). Its getAggregations(String, String) method's return type was swapped from Map<String, org.elasticsearch.search.aggregations.Aggregation> (ES interface) to Map<String, com.dotcms.content.index.domain.Aggregation> (neutral record). These are different types with different accessor sets. The new neutral Aggregation record exposes getBuckets(), getHits(), and iterator() — accessors that do not exist on the old ES Aggregation interface. If any Velocity template during the N release cycle calls $agg.buckets, $agg.hits, or iterates the aggregation directly (all new accessors exclusive to the neutral record), rolling back to N-1 breaks those templates at render time: N-1's getAggregations() returns the ES Aggregation interface which lacks those methods. No templates were co-migrated in this PR, but the risk window is open for the entire N release cycle: any template written or modified to adopt the new contract will fail on N-1 rollback. Per the H-8 reference, absent proof that no templates will use these new accessors, the risk is rated HIGH.
  • Code that makes it unsafe:
    // dotCMS/src/main/java/com/dotmarketing/sitesearch/business/SiteSearchAPI.java
    // Return type changed on the viewtool-exposed method:
    - Map<String, org.elasticsearch.search.aggregations.Aggregation> getAggregations(...);
    + Map<String, com.dotcms.content.index.domain.Aggregation> getAggregations(...);
    
    // dotCMS/src/main/java/com/dotcms/content/index/domain/Aggregation.java
    // New accessors not on the old ES Aggregation interface:
    + List<AggregationBucket> getBuckets()   // Vector 2b: new accessor
    + SearchHits getHits()                   // Vector 2b: new accessor
    + Iterator<AggregationBucket> iterator() // Vector 2b: new accessor
    
    // toolbox.xml registration (unchanged, but confirms VTL exposure):
    <key>sitesearch</key>
    <class>com.dotmarketing.sitesearch.viewtool.SiteSearchWebAPI</class>
    
  • Alternative (if possible): Apply the two-phase contract change from H-8:
    • Release N (this PR): Ship the new neutral Aggregation record with the new accessors, but also keep getAggregations() returning the old ES Aggregation type (or an intermediate adapter that implements both the ES interface and exposes the new accessors). This allows templates on N and N-1 to both resolve correctly.
    • Release N+1: Remove the old ES Aggregation from the return type once N-1 is outside the rollback window.
    • Alternatively, since no templates were co-migrated in this PR, document this as rollback-unsafe in the release notes and identify $sitesearch.getAggregations() as the affected viewtool key before any templates adopt the new getBuckets()/getHits() accessors.

#35786)

Completes the vendor-neutral Site Search extraction begun in #35786 by adding
the OpenSearch implementation and a phase-aware router, so Site Search dual-writes
and reads correctly across the ES -> OS migration phases.

- OSSiteSearchAPI: @ApplicationScoped @Default OpenSearch implementation of
  SiteSearchAPI. Search/aggregations via the generic client -> ContentSearchResponse
  (mirrors OSSearchAPIImpl); doc put/delete via _doc PUT/DELETE; get via typed
  client.get(...). Default site-search index resolved from VersionedIndicesAPI
  (not the deprecated IndiciesAPI). Index names handled in logical space; the
  .os tag forced by VersionedIndicesAPI is stripped on read.
- SiteSearchAPIImpl: PhaseRouter<SiteSearchAPI> router mirroring IndexAPIImpl and
  acting as the single fan-out point. Reads -> read provider; doc/index writes ->
  write fan-out; listIndices/listClosedIndices merge in dual-write; Quartz task
  methods route to a single provider (fan-out would double-schedule jobs).
- ESSiteSearchAPI: use raw ESIndexAPI instead of the IndexAPI router so the
  SiteSearch router is the only fan-out point (avoids double dual-write).
- APILocator: SITESEARCH_API now returns SiteSearchAPIImpl.
- OSSiteSearchAPIIntegrationTest: lifecycle, doc round-trip, aggregations, and
  default-index activation; registered in OpenSearchUpgradeSuite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

fabrizzio-dotCMS changed the title from "fix(sitesearch): vendor-neutral aggregation abstraction (#35786)" to "feat(sitesearch): vendor-neutral Site Search — neutral aggregation + OpenSearch impl + phase-aware router (#35786)" on Jun 23, 2026

CI (OpenSearch Upgrade Suite) failed: every OSSiteSearchAPIIntegrationTest that
creates an index errored with "Failed to parse index settings". The OS impl was
loading es-sitesearch-settings.json, whose ES-only token-filter syntax (edgeNGram,
side) is rejected by the typed OpenSearch IndexSettings deserializer in
OSIndexAPIImpl.createIndex.

Add os-sitesearch-settings.json declaring the same analyzers (standard_content,
partial_content, comma_analyzer) in OpenSearch syntax (edge_ngram, no side), and
load it from OSSiteSearchAPI.createSiteSearchIndex. The mapping is vendor-neutral
and reused as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l index

The aggregation IT failed: mimeType aggregation hit "Text fields are not optimised
... use a keyword field". Root cause: createSiteSearchIndex delegated the mapping
PUT to MappingOperationsOS, which force-tags the physical name with `.os`. Site
search uses untagged logical names, so the mapping landed on a different (`.os`)
index while the real index kept the dynamic default mapping (string -> text),
breaking keyword aggregations.

Apply the mapping with a raw PUT /<index>/_mapping against the same untagged
physical name used by createIndex/search/put, and drop the MappingOperationsOS
dependency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 23, 2026

Contributor

🤖 Bedrock Review — us.deepseek.r1-v1:0

New Issues

  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:93 — Constructor directly instantiates ESIndexAPI() instead of using APILocator, potentially bypassing required initialization and causing NPEs in dependent services
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:189 — Dual-write operations lack error recovery — partial failures could leave ES/OS indices inconsistent with no rollback
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:483 — Missing input validation on indexName parameter allows invalid characters through to OpenSearch client

Existing

  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:106 — Merged index lists in dual-write phase could return duplicates if same logical name exists in both ES and OS clusters

Run: #28056199569 · tokens: in: 20424 · out: 1336 · total: 21760

…5786)

Adds SiteSearchWebAPITest covering the view-tool surface affected by the
neutral-aggregation refactor: search() (default-index, alias, pagination, empty
and error paths) with full SiteSearchResults/SiteSearchResult field assertions;
getAggregations() over the neutral Aggregation/AggregationBucket tree (terms,
nested top_hits, numeric-histogram getKeyAsNumber); and getFacets() across all
three legacy wrappers (string-terms, count-histogram, plain Facet fallback).
Registered in MainSuite1b alongside ContentSearchToolTest.

Also a minor List.getFirst() cleanup in SiteSearchAPIImpl.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 23, 2026

Contributor

🤖 Bedrock Review — us.deepseek.r1-v1:0

New Issues

  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:189 — Dual-write operations lack error recovery — partial failures could leave ES/OS indices inconsistent
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:482 — Missing input validation on indexName parameter allows invalid characters
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:357 — PUT document failures are logged but not propagated, risking silent data loss

Existing

  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:93 — Constructor directly instantiates ESIndexAPI() instead of using APILocator, potentially bypassing required initialization
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:189 — Dual-write operations lack error recovery — partial failures could leave ES/OS indices inconsistent
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:483 — Missing input validation on indexName parameter allows invalid characters

Resolved

  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:106 — Merged index lists use LinkedHashSet to deduplicate entries
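
The deduplication noted as resolved can be sketched as follows. `IndexListMerge.merge` is a hypothetical helper that shows the LinkedHashSet technique only; it is not the router's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class IndexListMerge {
    // Merges two index lists, dropping duplicates while keeping first-seen order.
    static List<String> merge(List<String> esIndices, List<String> osIndices) {
        Set<String> merged = new LinkedHashSet<>(esIndices);
        merged.addAll(osIndices);
        return new ArrayList<>(merged);
    }
}
```

For example, merging `[a, b]` with `[b, c]` yields `[a, b, c]`, so a logical name present in both clusters is listed once.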

Run: #28060964611 · tokens: in: 20687 · out: 1621 · total: 22308

…apping

Two OpenSearch site-search regressions surfaced by the dual-write fan-out:

1. Shared mutable result across the fan-out. SiteSearchAPIImpl.putToIndex
   handed the same SiteSearchResult to both leaves. putToIndex mutates the
   backing map (setKeywords rewrites "keywords" String -> List), so the first
   leaf (ES) corrupted the input the second leaf (OS) then read, throwing
   ClassCastException: EmptyList cannot be cast to String and silently dropping
   every document from OpenSearch. The router now copies the result (and each
   element of the batch overload) per provider.

2. Mapping fan-out leak. ESSiteSearchAPI.createSiteSearchIndex applied its
   mapping through the phase-dispatched ESMappingAPIImpl.putMapping, which fanned
   out a second time to OpenSearch using a .os-tagged physical name that
   site-search OS indices never use -> HTTP 404. Pinned the ES leaf to
   IndexTag.ES, restoring the single-fan-out invariant (SiteSearchAPIImpl already
   drives OSSiteSearchAPI, which owns its own untagged OS index + mapping).

Adds SiteSearchDualWriteRouterIT (registered in OpenSearchUpgradeSuite) which
drives the router in Phase 1 dual-write and asserts documents reach OpenSearch
(single + batch) — the isolated OS-leaf IT cannot reproduce either bug.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
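
The first regression in this commit message can be reproduced in miniature. The sketch below uses hypothetical stand-ins (`ResultSketch` for SiteSearchResult, `leafPut` for a provider's putToIndex) and only assumes what the message states: a leaf rewrites "keywords" from a String to a List in place.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SiteSearchResult: a thin wrapper over a mutable map.
final class ResultSketch {
    private final Map<String, Object> map;

    ResultSketch(Map<String, Object> source) { this.map = new HashMap<>(source); }

    Map<String, Object> getMap() { return map; }

    // Shallow copy: the map is new, but the values are shared.
    ResultSketch copy() { return new ResultSketch(map); }
}

final class FanOutCopySketch {
    // Mimics a leaf that rewrites "keywords" from String to List in place.
    static void leafPut(ResultSketch res) {
        String raw = (String) res.getMap().get("keywords"); // fails if already a List
        List<String> keywords = (raw == null || raw.isEmpty())
                ? List.of()
                : Arrays.asList(raw.split(","));
        res.getMap().put("keywords", keywords);
    }

    // Each provider gets its own copy, so one leaf's mutation is invisible to the next.
    static void fanOut(ResultSketch input, int providerCount) {
        for (int i = 0; i < providerCount; i++) {
            leafPut(input.copy());
        }
    }
}
```

Calling `leafPut` twice on the same instance throws ClassCastException on the second call, while `fanOut` does not. The copy is shallow, so it protects against replaced map entries but not against in-place mutation of nested values.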

github-actions Bot commented Jun 23, 2026

Contributor

🤖 Bedrock Review — us.deepseek.r1-v1:0

New Issues

  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:357 — PUT document failures are logged but not propagated, risking silent data loss when OpenSearch write operations fail
  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:189 — Dual-write operations lack transactional error handling — partial failures leave ES/OS indices inconsistent
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:482 — Missing validation on indexName parameter allows invalid OpenSearch index names (special chars, uppercase)

Existing

  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:93 — Constructor directly instantiates ESIndexAPI() instead of using APILocator, potentially bypassing required initialization
  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:189 — Dual-write operations lack error recovery — partial failures could leave ES/OS indices inconsistent
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:482 — Missing input validation on indexName parameter allows invalid characters
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:357 — PUT document failures are logged but not propagated, risking silent data loss
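
If the team decides to propagate write failures rather than only log them, one possible shape is a small status check like the sketch below. `WriteStatusSketch` and `SearchWriteException` are hypothetical; the nested exception stands in for an unchecked type such as DotSearchException.

```java
final class WriteStatusSketch {
    // Stand-in for an unchecked search exception such as DotSearchException.
    static final class SearchWriteException extends RuntimeException {
        SearchWriteException(String message) { super(message); }
    }

    // Returns normally on 2xx (and on 404 when the caller treats it as benign); throws otherwise.
    static void requireSuccess(String operation, String index, String docId,
                               int status, boolean notFoundIsBenign) {
        boolean ok = status >= 200 && status < 300;
        if (ok || (notFoundIsBenign && status == 404)) {
            return;
        }
        throw new SearchWriteException(operation + " failed for index '" + index
                + "', doc '" + docId + "': HTTP " + status);
    }
}
```

A delete path could pass `notFoundIsBenign = true` to keep the current 404 handling, while a put path would pass `false`. How a thrown failure should interact with the dual-write fan-out is a separate decision this sketch does not address.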

Run: #28064221231 · tokens: in: 20669 · out: 2540 · total: 23209

github-actions Bot commented Jun 24, 2026

Contributor

🤖 Bedrock Review — qwen.qwen3-next-80b-a3b

New Issues

  • 🔴 Critical: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex silently swallows all exceptions during document write; failure to write to OpenSearch is logged but not propagated, risking silent data loss in publishing pipelines
  • 🔴 Critical: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex ignores HTTP 404 as benign but does not validate index existence before attempting delete; if index is missing, the operation silently succeeds despite invalid state
  • 🔴 Critical: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:227 — putToIndex fan-out copies mutable SiteSearchResult per provider, but copyOf() only shallow-copies the map; if any downstream mutation modifies nested objects (e.g., Map<String, Object> values), concurrent writes corrupt state
  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex logs raw exception message without context (index, docId, status) — obscures debugging in production
  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex uses physicalName(idx) without validating idx is non-null or non-empty — NPE risk if called with malformed input
  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex calls ESMappingAPIImpl.toJsonString(res.getMap()) — uses ES-specific JSON serializer on OpenSearch path; may produce incompatible field types (e.g., Date → string format mismatch)
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex logs "deleting from : " + idx + " url:" + docId — misleading; docId is not a URL
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex does not validate idx or res.getId() before constructing endpoint — risk of malformed OpenSearch request
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex uses ESMappingAPIImpl for JSON serialization — violates vendor isolation; should use OpenSearch-compatible mapper
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex does not check if index is closed or non-existent before DELETE — may trigger unexpected 400/404 from OpenSearch without recovery
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex does not use @WrapInTransaction for index write — if this is part of a larger transactional flow (e.g., content publish), failure to rollback on OpenSearch write leaves inconsistent state
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex calls res.getMap() directly — if SiteSearchResult.getMap() returns a mutable reference, concurrent access from fan-out may race
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex uses refresh=true — performance anti-pattern under high load; should be configurable or batched
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex uses docId as endpoint ID — assumes doc ID is URL-safe; no encoding or validation — risk of malformed OpenSearch request
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex does not validate res.getId() is non-null — NPE risk when constructing endpoint
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex logs "writing from : " + idx + " type: " + resultType + " url:" + res.getUrl() — logs sensitive URL path without masking; may expose internal paths or PII
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex uses res.getMap() — if SiteSearchResult is reused across threads (e.g., in batch), concurrent modification may occur
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex uses ESMappingAPIImpl.toJsonString — violates vendor neutrality; should use ObjectMapper from OSClientProvider or JsonpMapper
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex does not validate idx is a valid site-search index — may delete from unrelated index if alias or typo provided
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082putToIndex does not check if index exists before PUT — may fail with 404 if index was deleted between listIndices and putToIndex
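
Several findings in this list concern how the document endpoint is built from idx and the document ID. A minimal sketch of a guarded builder is shown here. DocEndpoints and docEndpoint are hypothetical names that do not appear in the PR, and the /{index}/_doc/{id} path with an optional refresh query parameter is assumed from the PR description's mention of _doc PUT/DELETE.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of OSSiteSearchAPI.
final class DocEndpoints {

    private DocEndpoints() {
    }

    // Builds "/{index}/_doc/{id}" and rejects null or blank inputs up front,
    // so a malformed request is never sent.
    static String docEndpoint(String index, String docId, boolean refresh) {
        if (index == null || index.isBlank()) {
            throw new IllegalArgumentException("index must not be null or blank");
        }
        if (docId == null || docId.isBlank()) {
            throw new IllegalArgumentException("docId must not be null or blank");
        }
        // URLEncoder produces form encoding, where a space becomes '+'.
        // Inside a URL path segment a space has to be %20 instead.
        String encodedId = URLEncoder.encode(docId, StandardCharsets.UTF_8)
                .replace("+", "%20");
        String endpoint = "/" + index + "/_doc/" + encodedId;
        // Making refresh a parameter lets callers opt out under bulk load.
        return refresh ? endpoint + "?refresh=true" : endpoint;
    }
}
```

Passing refresh as an argument is only one way to address the refresh=true finding; a configuration property read once at startup would work equally well.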

Existing

  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:357 — Prior finding still present: PUT document failures are logged but not propagated, risking silent data loss when OpenSearch write operations fail
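
As a rough illustration of what propagating a failed PUT could look like, the sketch below turns a non-2xx status into an exception that carries the index, document ID, and status. WriteChecks and its nested SearchWriteException are hypothetical stand-ins; the constructors of the real DotSearchException are not shown in this thread, so it is not used here.

```java
// Hypothetical sketch; names are illustrative and not taken from the PR.
final class WriteChecks {

    // Stand-in for a vendor-neutral exception type.
    static final class SearchWriteException extends RuntimeException {
        SearchWriteException(String message) {
            super(message);
        }
    }

    private WriteChecks() {
    }

    // Throws when the HTTP status is outside the 2xx range, so the caller
    // cannot silently continue after a failed write.
    static void requireSuccess(String operation, String index, String docId, int status) {
        if (status < 200 || status >= 300) {
            throw new SearchWriteException(String.format(
                    "%s failed: index=%s docId=%s status=%d",
                    operation, index, docId, status));
        }
    }
}
```

Whether a failed site-search write should abort the surrounding publish flow or only be reported to the caller is a design decision for the PR author; the sketch only makes the failure visible.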

Resolved

  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:88 — Replaced direct new ESIndexAPI() with APILocator usage — fixed constructor bypass
  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:351 — Replaced ElasticsearchException with DotSearchException — vendor-neutral exception contract enforced
  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:385 — mappingAPI.putMapping now uses IndexTag.ES — prevents dual-write to OS index
  • dotCMS/src/enterprise/java/com/dotmarketing/business/APILocator.java:1484 — SITESEARCH_API now returns SiteSearchAPIImpl — router is now the single entry point
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/business/SiteSearchAPI.java:36 — createSiteSearchIndex now throws DotSearchException — consistent with new contract
  • dotCMS/src/main/java/com/dotcms/content/index/domain/Aggregation.java:88 — Added histogram bucket support — fixes aggregation deserialization
  • dotCMS/src/main/java/com/dotcms/content/index/domain/AggregationBucket.java:92 — Added fromHistogram and histogramKey — fixes numeric/date histogram key handling
  • dotCMS/src/main/java/com/dotcms/content/index/domain/DotSearchException.java — New vendor-neutral exception — replaces ElasticsearchException across API surface
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:173 — Replaced ES-specific InternalDateHistogram/StringTerms with vendor-neutral Aggregation/AggregationBucket — fixes type coupling
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:189 — isHistogram now checks type string — avoids class-cast exceptions
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:244 — InternalWrapperCountDateHistogramFacet now uses AggregationBucket.getKeyAsNumber() — fixes date histogram key conversion
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:280 — InternalWrapperStringTermsFacet now uses AggregationBucket.getKey() — fixes string key handling
  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex now logs HTTP status — improves observability (though still swallows failure)

Run: #28107343243 · tokens: in: 19596 · out: 3443 · total: 23039

…index path

The OpenSearch site-search create path loaded its settings from
os-sitesearch-settings.json but reused es-sitesearch-mapping.json for the
mapping. The mapping is functionally OS-compatible (its analyzers exist in the
OS settings), but reading an es-*.json resource from the OS lifecycle couples
the two vendors: a future ES-only mapping change would silently alter OS.

Adds os-sitesearch-mapping.json (identical content today) and points
OSSiteSearchAPI.createSiteSearchIndex at it, mirroring the settings split so
ES and OS own their resources independently.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions Bot commented Jun 24, 2026

🤖 Bedrock Review — qwen.qwen3-next-80b-a3b

New Issues

  • 🔴 Critical: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex silently swallows all exceptions during document write; failure to write to OpenSearch is logged but not propagated, risking silent data loss in publishing pipelines
  • 🔴 Critical: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex ignores HTTP 404 as benign but does not validate index existence before attempting delete; if index is missing, the operation silently succeeds despite invalid state
  • 🔴 Critical: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:227 — putToIndex fan-out copies mutable SiteSearchResult per provider, but copyOf() only shallow-copies the map; if any downstream mutation modifies nested objects (e.g., Map<String, Object> values), concurrent writes corrupt state
  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex logs raw exception message without context (index, docId, status) — obscures debugging in production
  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex uses physicalName(idx) without validating idx is non-null or non-empty — NPE risk if called with malformed input
  • 🟠 High: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex calls ESMappingAPIImpl.toJsonString(res.getMap()) — uses ES-specific JSON serializer on OpenSearch path; may produce incompatible field types (e.g., Date → string format mismatch)
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex logs "deleting from : " + idx + " url:" + docId — misleading; docId is not a URL
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex does not validate idx or res.getId() before constructing endpoint — risk of malformed OpenSearch request
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex uses ESMappingAPIImpl for JSON serialization — violates vendor isolation; should use OpenSearch-compatible mapper
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex does not check if index is closed or non-existent before DELETE — may trigger unexpected 400/404 from OpenSearch without recovery
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex uses refresh=true — performance anti-pattern under high load; should be configurable or batched
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — deleteFromIndex uses docId as endpoint ID — assumes doc ID is URL-safe; no encoding or validation — risk of malformed OpenSearch request
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex does not validate res.getId() is non-null — NPE risk when constructing endpoint
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — putToIndex logs "writing from : " + idx + " type: " + resultType + " url:" + res.getUrl() — logs sensitive URL path without masking; may expose internal paths or PII
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1046 — putToIndex uses res.getMap() — if SiteSearchResult is reused across threads (e.g., in batch), concurrent modification may occur
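
The shallow-copy concern raised for the fan-out can be illustrated in isolation. The sketch below is not the PR's copyOf(), whose body is not shown in this thread. DeepCopy is a hypothetical helper that recursively copies nested Map and List values and assumes every other value (String, Number, Boolean) is immutable. A mutable leaf such as java.util.Date would still be shared.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of SiteSearchResult.
final class DeepCopy {

    private DeepCopy() {
    }

    // Recursively copies Map and List containers; any other value is
    // returned as-is on the assumption that it is immutable.
    static Object deepCopy(Object value) {
        if (value instanceof Map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                copy.put(entry.getKey(), deepCopy(entry.getValue()));
            }
            return copy;
        }
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object element : (List<?>) value) {
                copy.add(deepCopy(element));
            }
            return copy;
        }
        return value;
    }

    // Typed convenience wrapper for the Map<String, Object> case.
    @SuppressWarnings("unchecked")
    static Map<String, Object> deepCopyMap(Map<String, Object> source) {
        return (Map<String, Object>) deepCopy(source);
    }
}
```

With a plain new HashMap<>(source), a later change to a nested map is visible through both the source and the copy; with deepCopyMap(source) the copy keeps its own nested containers. Whether the extra allocation is worth it depends on whether any provider actually mutates nested values, which this thread does not establish.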

Existing

  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1082 — prior finding still present: putToIndex silently swallows exceptions (same as prior ESSiteSearchAPI issue)
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/OSSiteSearchAPI.java:1136 — prior finding still present: deleteFromIndex ignores 404 and lacks validation (same as prior ESSiteSearchAPI issue)
  • 🟡 Medium: dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:227 — prior finding still present: shallow copy of SiteSearchResult map risks nested mutation corruption (same as prior issue)
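
One possible way to make the 404 handling in deleteFromIndex explicit is to map the HTTP status onto a small outcome type instead of ignoring it. DeleteChecks, Outcome, and classify are hypothetical names used only for this sketch. It does not inspect the response body, so it cannot tell a missing document from a missing index, which both surface as 404.

```java
// Hypothetical sketch; names are illustrative and not taken from the PR.
final class DeleteChecks {

    // Outcomes the caller is expected to handle explicitly.
    enum Outcome {
        DELETED,
        NOT_FOUND
    }

    private DeleteChecks() {
    }

    // Maps an HTTP status to an outcome. 2xx means the document was removed,
    // 404 is reported as NOT_FOUND rather than silently treated as success,
    // and any other status is raised as a failure with full context.
    static Outcome classify(String index, String docId, int status) {
        if (status >= 200 && status < 300) {
            return Outcome.DELETED;
        }
        if (status == 404) {
            return Outcome.NOT_FOUND;
        }
        throw new IllegalStateException(
                "DELETE failed: index=" + index + " docId=" + docId + " status=" + status);
    }
}
```

A caller could log NOT_FOUND at a lower level than a real failure, which keeps idempotent deletes quiet without hiding server errors.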

Resolved

  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:379 — ElasticsearchException replaced with DotSearchException
  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:634 — ElasticsearchException replaced with DotSearchException
  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:669 — ElasticsearchException replaced with DotSearchException
  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:683 — ElasticsearchException replaced with DotSearchException
  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:1046 — ESMappingAPIImpl.putMapping now uses IndexTag.ES to prevent dual-write
  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/ESSiteSearchAPI.java:1082 — ESMappingAPIImpl no longer used for OpenSearch path
  • dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/sitesearch/SiteSearchAPIImpl.java:227 — copyOf() now creates independent SiteSearchResult per provider (shallow copy remains, but context clarifies intent)
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/business/SiteSearchAPI.java:36 — ElasticsearchException replaced with DotSearchException
  • dotCMS/src/main/java/com/dotcms/content/index/domain/Aggregation.java:88 — Histogram bucket handling added for ES compatibility
  • dotCMS/src/main/java/com/dotcms/content/index/domain/AggregationBucket.java:92 — fromHistogram added to handle numeric/date histogram keys
  • dotCMS/src/main/java/com/dotcms/content/index/domain/DotSearchException.java — new vendor-neutral exception introduced
  • dotCMS/src/main/java/com/dotmarketing/business/APILocator.java:1484 — SITESEARCH_API now returns SiteSearchAPIImpl instead of ESSiteSearchAPI
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:173 — Aggregation and AggregationBucket now used instead of ES-specific types
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:189 — isHistogram() added to detect histogram type without ES class dependency
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:237 — InternalWrapperStringTermsFacet now uses AggregationBucket.getKey() instead of Bucket.getKey().toString()
  • dotCMS/src/main/java/com/dotmarketing/sitesearch/viewtool/SiteSearchWebAPI.java:279 — InternalWrapperCountDateHistogramFacet now uses AggregationBucket.getKeyAsNumber() for epoch-millis
  • dotCMS/src/main/resources/os-sitesearch-mapping.json — new OpenSearch-specific mapping added
  • dotCMS/src/main/resources/os-sitesearch-settings.json — new OpenSearch-specific settings added
  • dotcms-integration/src/test/java/com/dotcms/OpenSearchUpgradeSuite.java — new integration tests added for OSSiteSearchAPI and SiteSearchAPIImpl
  • dotcms-integration/src/test/java/com/dotcms/content/index/opensearch/OSSiteSearchAPIIntegrationTest.java — new integration test file added

Run: #28109554310 · tokens: in: 20441 · out: 2560 · total: 23001

Labels

AI: Not Safe To Rollback Area : Backend PR changes Java/Maven backend code

2 participants