Commit f03702b

docs: add-warning-about-backslash-corruption-when-building-CQL-with-string-formatting
1 parent 6fbcacb commit f03702b

1 file changed

Lines changed: 50 additions & 0 deletions

docs/getting_started.rst

@@ -215,6 +215,56 @@ Although it is not recommended, you can also pass parameters to non-prepared
statements. The driver supports two forms of parameter place-holders: positional
and named.

.. warning::

   Never use Python string formatting (f-strings, ``str.format()``, or the ``%``
   operator) to interpolate query results directly into CQL strings.
   Collection types such as ``map``, ``list``, and ``set`` are returned by the
   driver as Python objects (e.g. :class:`~cassandra.util.OrderedMapSerializedKey`).
   Their ``__str__`` representation formats nested string values with
   ``repr()``, which escapes backslashes (``\`` -> ``\\``). Embedding this
   representation in a CQL string corrupts data containing backslashes or
   other special characters, because CQL does not treat the backslash as an
   escape character.

   For example, suppose a row contains ``data = 'https:\/\/example.com'``
   (one backslash before each slash) and
   ``map_data = {'url': 'https:\/\/example.com'}``:

   .. code-block:: python

      row = session.execute("SELECT * FROM t WHERE id = 'id1';").one()

      # row.data is a plain Python str – str() prints it as-is:
      print(row.data)
      # -> https:\/\/example.com   (1 backslash per slash – correct)

      # row.map_data is an OrderedMapSerializedKey – str() uses repr()
      # format for nested strings, which escapes every backslash:
      print(row.map_data)
      # -> {'url': 'https:\\/\\/example.com'}   (2 backslashes per slash – wrong!)

      # Embedding str(row.map_data) in a CQL string sends the doubled
      # backslashes to Cassandra, corrupting the stored value:
      session.execute(  # WRONG
          f"UPDATE t SET data='{row.data}', "
          f"map_data={row.map_data} WHERE id='{row.id}'"
      )
      # The CQL Cassandra receives:
      #   UPDATE t SET data='https:\/\/example.com',
      #     map_data={'url': 'https:\\/\\/example.com'} ...
      # Cassandra stores 2 backslashes instead of 1 – data is corrupted.

257+
Use a prepared statement instead, values are passed via the binary
258+
protocol and are never converted to CQL string literals:
259+
260+
.. code-block:: python
261+
262+
stmt = session.prepare(
263+
"UPDATE t SET data=?, map_data=? WHERE id=?"
264+
)
265+
session.execute(stmt, (row.data, row.map_data, row.id))
266+
# Cassandra receives the raw bytes – backslashes are preserved exactly.
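
The doubling described in the warning above can be reproduced without a
cluster. The following is a minimal sketch, assuming a plain ``dict`` is an
acceptable stand-in for the driver's map object (a ``dict`` also formats nested
strings with ``repr()``). The names ``value``, ``map_data``, and
``interpolated`` are illustrative and not part of the driver:

.. code-block:: python

   # Stand-in data: a plain dict replaces the driver's map type (assumption).
   value = r"https:\/\/example.com"  # raw string: one backslash before each slash
   map_data = {"url": value}

   # Building CQL with an f-string calls str(map_data), which formats the
   # nested string with repr() and therefore escapes every backslash.
   interpolated = f"UPDATE t SET map_data={map_data} WHERE id='id1'"

   # Compare the number of backslash characters before and after.
   print(value.count("\\"))
   print(interpolated.count("\\"))

Comparing the two counts shows whether the interpolated statement carries more
backslashes than the original value, which is the corruption that passing the
value as a bound parameter avoids.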

Positional parameters are used with a ``%s`` placeholder. For example,
when you execute:
