Description
Hello,
RFC 3629 says:
The definition of UTF-8 prohibits encoding character numbers between
U+D800 and U+DFFF, which are reserved for use with the UTF-16
encoding form (as surrogate pairs) and do not directly represent
characters.
But neither mb_check_encoding nor preg_match('//u', $input) detects such characters when they occur.
I found this whilst validating my code here:
Seldaek/jsonlint#91
I had to comment out the fast path relying on mb_check_encoding and preg_match.
This may have security implications for code that relies on these functions to reject ill-formed UTF-8.
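A minimal reproduction sketch, assuming the problematic input contains the raw bytes ED A0 80 (what U+D800 would look like if it were encoded as an ordinary three-byte UTF-8 sequence). It prints what mb_check_encoding and preg_match('//u', ...) return so the behaviour can be checked on a given build. The helper contains_surrogate_bytes is hypothetical (it is not part of PHP) and only illustrates a byte-level check that does not depend on either function.

```php
<?php
// Hypothetical helper, not part of PHP: byte-level scan for surrogates
// encoded as UTF-8. U+D800..U+DFFF would appear as ED A0..BF 80..BF.
function contains_surrogate_bytes(string $input): bool
{
    // No /u modifier here, so the pattern is matched against raw bytes.
    return preg_match('/\xED[\xA0-\xBF][\x80-\xBF]/', $input) === 1;
}

// Assumed test input: "a", then U+D800 as the bytes ED A0 80, then "b".
$input = "a\xED\xA0\x80b";

// The two checks discussed in this report; inspect the output on your build.
var_dump(mb_check_encoding($input, 'UTF-8'));
var_dump(preg_match('//u', $input));

// The byte-level scan flags the sequence regardless of the results above.
var_dump(contains_surrogate_bytes($input)); // bool(true)
```

The byte scan is only a workaround sketch for well-formed input plus surrogates; it does not replace full UTF-8 validation.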
Have a nice day, best regards,
Laurent Lyaudet
PHP Version
PHP 8.5.4 (cli) (built: May 25 2026 12:19:37) (NTS)
Copyright (c) The PHP Group
Built by Ubuntu
Zend Engine v4.5.4, Copyright (c) Zend Technologies
with Zend OPcache v8.5.4, Copyright (c), by Zend Technologies
Operating System
Ubuntu 26.04