or: Why “ä” and “ä” aren’t the same.
Don’t these characters look the same to you?
To me, they do. Well, now they do: during a project I noticed that one of the characters didn’t show up on screen, even though it was clearly visible in the developer tools of Chrome and Firefox.
What had happened?
A colleague had copied and pasted text from a PDF file and used parts of it in a description text.
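It seems that some software, instead of using the single precomposed character “ä” (U+00E4), writes a combination of two code points: a plain “a” (U+0061) followed by the combining mark “¨” (U+0308). Both are valid Unicode and should render identically, but they are different byte sequences in UTF-8.
Often the combining “¨” on its own is not contained in publicly available fonts, so the character fails to render. This mark is called a trema or dieresis.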
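To make the difference visible, here is a minimal sketch that uses only core PHP functions (no intl needed). The helper name `describe()` is my own invention for illustration, and the two strings are written as escape sequences (PHP 7+) so that an editor cannot silently convert one form into the other:

```php
<?php
// Both strings render as "ä", but their UTF-8 bytes differ.
$precomposed = "\u{00E4}";  // one code point: a with diaeresis
$decomposed  = "a\u{0308}"; // "a" followed by the combining diaeresis

// Hypothetical helper: byte length plus a hex dump of the string.
function describe(string $s): string
{
    return sprintf('%d bytes: %s', strlen($s), bin2hex($s));
}

echo describe($precomposed), PHP_EOL;   // 2 bytes: c3a4
echo describe($decomposed), PHP_EOL;    // 3 bytes: 61cc88
var_dump($precomposed === $decomposed); // bool(false)
```

A plain `===` comparison therefore fails even though both strings look the same on screen.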
Fortunately, PHP’s intl extension (the php-intl package) already contains a solution for my problem, the Normalizer class: https://www.php.net/manual/en/normalizer.normalize.php
Here is an example for you:
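<?php
// Requires the intl extension (php-intl).
$a = "\u{00E4}";  // precomposed "ä" (U+00E4)
$b = "a\u{0308}"; // "a" followed by the combining diaeresis (U+0308)

// Before normalization the encoded bytes differ.
echo urlencode($a);
echo ' ';
echo urlencode($b) . PHP_EOL . PHP_EOL; // %C3%A4 a%CC%88

// Normalize both strings to the composed form (NFC).
$a = Normalizer::normalize($a, Normalizer::FORM_C);
$b = Normalizer::normalize($b, Normalizer::FORM_C);

// After normalization both strings are identical.
echo urlencode($a);
echo ' ';
echo urlencode($b); // %C3%A4 %C3%A4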
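In a real project you would probably normalize pasted text once, at the point where it enters the system. The following is only a sketch under that assumption: `toNfc()` is a hypothetical helper name, not something from the intl extension, while `Normalizer::isNormalized()` and `Normalizer::normalize()` are the extension's own methods.

```php
<?php
// Hypothetical helper: bring user-supplied text into NFC before storing
// or rendering it. Requires the intl extension (php-intl).
function toNfc(string $text): string
{
    // Skip the work if the text is already in composed form.
    if (Normalizer::isNormalized($text, Normalizer::FORM_C)) {
        return $text;
    }

    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);

    // normalize() returns false on failure (for example invalid UTF-8);
    // in that case this sketch simply keeps the original input.
    return $normalized === false ? $text : $normalized;
}

var_dump(toNfc("a\u{0308}") === "\u{00E4}"); // bool(true)
```

Falling back to the unchanged input on failure is a design choice for this sketch; depending on the application, rejecting the input may be the better option.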