Skip to content

MDEV-10526: Add binary string support to bitwise operators#5190

Open
kjarir wants to merge 2 commits into
MariaDB:mainfrom
kjarir:feature/MDEV-10526-bitwise-binary
Open

MDEV-10526: Add binary string support to bitwise operators#5190
kjarir wants to merge 2 commits into
MariaDB:mainfrom
kjarir:feature/MDEV-10526-bitwise-binary

Conversation

@kjarir

@kjarir kjarir commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Draft for review. Implements byte-by-byte binary string mode for all scalar bitwise operators. Aggregate function
support (BIT_AND/BIT_OR/BIT_XOR) to follow.

Tested:

  • All 6 operators on VARBINARY columns
  • INET6 subnet masking (primary use case)
  • NULL handling
  • Mismatched length error
  • Integer mode backward compatibility
  • _binary introducer triggers binary mode correctly

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for binary-to-binary bitwise operations (AND, OR, XOR, NOT, and bitwise shifts) on binary strings by implementing new handler classes and adding corresponding error messages for mismatched operand sizes. The review feedback highlights several critical issues: a potential integer overflow/wrap-around in the left-shift index calculation on 32-bit systems, incorrect ASCII string conversion of numeric arguments in mixed-type bitwise operations, and potential undefined behavior when handling empty binary strings due to passing a null pointer to memset.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread sql/item_func.cc Outdated
Comment on lines +2266 to +2268
size_t src_idx= i + byte_shift;
if (src_idx < len)
{

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The calculation size_t src_idx = i + byte_shift is susceptible to integer overflow/wrap-around on 32-bit systems if len is very large (e.g., close to size_t max). If i + byte_shift overflows, src_idx wraps around and can evaluate to a value less than len, leading to incorrect data copying from wrapped-around indices instead of zero-filling. To prevent this, perform the bounds check using subtraction (byte_shift < len - i) before computing src_idx.

        if (byte_shift < len - i)
        {
          size_t src_idx= i + byte_shift;

Comment thread sql/item_func.cc
Comment on lines +7875 to +7876
StringBuffer<128> b_buf;
String *b= item->arguments()[1]->val_str(&b_buf);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In mixed-type bitwise operations (e.g., binary_string & numeric), calling val_str on the numeric argument converts it to its decimal string representation (e.g., 255 becomes the 3-byte ASCII string "255" / 0x323535), rather than its binary representation (e.g., 0xFF). This causes mismatched length errors (e.g., VARBINARY(1) & 255 compares length 1 with length 3) or incorrect bitwise operations if the lengths happen to match. To align with standard bitwise behavior (and MySQL 8.0 compatibility), numeric arguments should be converted to a binary string representation of their integer value, padded or truncated to match the length of the other binary string operand.

Comment thread sql/item_func.cc
Comment on lines +2243 to +2245
size_t len= a->length();

if (to->realloc(len))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When the input binary string a is empty (len == 0), calling to->realloc(0) can return a null pointer or do nothing. Subsequently, passing a null pointer to memset (e.g., memset(out_ptr, 0, len)) is technically undefined behavior in C/C++, even if the length is 0. Adding an early exit for len == 0 avoids this potential undefined behavior and improves efficiency by bypassing unnecessary allocation and loop overhead.

    size_t len= a->length();
    if (len == 0)
    { 
      to->length(0);
      to->set_charset(&my_charset_bin);
      item->null_value= false;
      return to;
    }

    if (to->realloc(len))

@kjarir kjarir force-pushed the feature/MDEV-10526-bitwise-binary branch from fa5cd7c to 862a606 Compare June 6, 2026 19:44
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Jun 8, 2026

@grooverdan grooverdan left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be off the main branch as a new feature.

from @abarkov

"Would be nice to add tests that uuid, inet6, inet4, geometry are not allowed for bit operations. Later we can probably implement bit operations for uuid, inet6, inet4. But to avoid compatibility problems we need to make sure they return error now."

Comment thread sql/item_func.cc Outdated
Comment thread sql/item_func.cc

bool fix_length_and_dec(Item_handled_func *item) const override
{
item->max_length= item->arguments()[0]->max_length;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from @abarkov
"check that max_length is calculated correctly in all functions"

This will show up in mtr --cursor-protocol on test that have truncated test result output on fields.

Comment thread sql/item_func.cc Outdated
Comment thread sql/item_func.cc Outdated
@kjarir kjarir force-pushed the feature/MDEV-10526-bitwise-binary branch from 221664d to 6b9dd9e Compare June 11, 2026 07:24
@CLAassistant

CLAassistant commented Jun 11, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@grooverdan grooverdan changed the base branch from 10.11 to main June 11, 2026 07:28
@kjarir kjarir marked this pull request as ready for review June 25, 2026 07:07
@kjarir kjarir requested a review from grooverdan June 25, 2026 07:07

@gkodinov gkodinov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

Please squash your commits to 1 per feature. Maybe keep the spider part in a separate commit and the rest of it in another? Just a suggestion. But there definitely should not be 13 commits.

# =========================================================================
CREATE TABLE t1 (a VARBINARY(4), b VARBINARY(4));
INSERT INTO t1 VALUES (x'FFFF0000', x'FF00FF00');
# Expected output: FF000000, FFFFFF00, 00FFFF00, 0000FFFF, FFFE0000, 7FFF8000

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and everywhere: you do not need this line. The expected result is already recorded.

SELECT HEX(INET6_ATON('ffff:ffff::') & INET6_ATON('2001:db8::1'));
HEX(INET6_ATON('ffff:ffff::') & INET6_ATON('2001:db8::1'))
20010DB8000000000000000000000000
# Expected output: TBD - verify via --record

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

especially this needs to go away

# =========================================================================
CREATE TABLE t_null (a VARBINARY(4), b VARBINARY(4));
INSERT INTO t_null VALUES (x'FFFF0000', NULL);
# Expected output: NULL, NULL, NULL, NULL, NULL, NULL

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you really need a table for that?

Comment thread storage/spider/mysql-test/spider/t/direct_aggregate_bit.test
@gkodinov gkodinov self-assigned this Jun 25, 2026
kjarir added 2 commits June 26, 2026 00:41
…ates

Extends all six scalar bitwise operators (&, |, ^, ~, <<, >>)
and aggregate functions (BIT_AND, BIT_OR, BIT_XOR) to work
byte-by-byte on BINARY/VARBINARY columns, returning VARBINARY
of the same length.

Previously all operators silently cast binary arguments to
BIGINT (64-bit), truncating values wider than 64 bits. This
broke operations on INET6 addresses, UUIDs, and any binary
column wider than 8 bytes.

Binary mode activates when both operands have a true non-hybrid
binary string type_handler (Type_handler_longstr with my_charset_bin
collation, excluding hex hybrids like 0xFF). Uses the existing
Item_handled_func pluggable Handler pattern, adding new
Handler_str subclasses per operator alongside existing
Handler_ulonglong handlers.

Aggregate functions gain a parallel String accumulator with
correct neutral elements (0xFF for AND, 0x00 for OR/XOR).
Window function support added via a dynamically-sized uint32
bit-counter array (same sliding-window technique as the
existing 64-bit bit_counters[], extended to arbitrary lengths).

Key fixes included:
  - reset_field()/update_field() binary mode (crash fix:
    raw int8store on string-typed temp table field caused SIGSEGV)
  - return_type_handler() varies by max_length following the
    blob_type_handler/type_handler_varchar/type_handler_string
    pattern from Item_char_typecast_func_handler_fbt_to_binary
  - DERIVATION_COERCIBLE (not IMPLICIT) for binary results
  - MY_MAX() overflow pattern replaced with plain increment
    in window function counters
  - Mismatched lengths push warning (not hard error), escalating
    to error under strict SQL mode via THD::raise_condition()
  - Regressions in main.gis (geometry entering binary mode via
    STRING_RESULT+my_charset_bin) and main.func_json (JSON
    over-rejected by stricter check_arguments() override) fixed
  - GEOMETRY and hex hybrids excluded from binary mode detection

New error codes:
  ER_INVALID_BITWISE_OPERANDS_SIZE
  ER_INVALID_BITWISE_AGGREGATE_OPERANDS_SIZE

Closes: MDEV-10526
Implements Spider direct_add() support for BIT_AND/BIT_OR/BIT_XOR,
allowing each remote shard to compute its own partial aggregate
merged locally, mirroring the existing SUM/COUNT/MIN/MAX pattern.

BIT_AND/OR/XOR are associative and commutative so partial results
from shards can be merged correctly via direct_add().

Also fixes a pre-existing Spider bug: row->val_int() uses atoi()
which silently overflows for BIGINT UNSIGNED values above INT_MAX.
Fixed for this code path using my_strtoll10() instead.

Closes: MDEV-10526
@kjarir kjarir force-pushed the feature/MDEV-10526-bitwise-binary branch from 934bb95 to 75cfdf6 Compare June 25, 2026 19:14
@mariadb-YuchenPei mariadb-YuchenPei self-requested a review June 26, 2026 05:44

@gkodinov gkodinov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please stand by for the final review.

--echo # =========================================================================

SELECT HEX(INET6_ATON('ffff:ffff::') & INET6_ATON('2001:db8::1'));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cut the empty lines please

@mariadb-YuchenPei

Copy link
Copy Markdown
Contributor

I'm reviewing the spider bit

@gkodinov gkodinov assigned grooverdan and unassigned gkodinov Jun 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. GSoC

Development

Successfully merging this pull request may close these issues.

5 participants