PHP: Displaying characters from Unicode code points

Source code

toCodePoint

Display the code point of a character

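<?php

/**
 * Return the code point(s) of $string as a hex string.
 * UTF-32BE uses 4 bytes per code point, so each character yields 8 hex digits.
 */
function toCodePoint($string, $encoding = 'UTF-8')
{
    return bin2hex(mb_convert_encoding($string, 'UTF-32BE', $encoding));
}

echo toCodePoint('髙'), PHP_EOL; // 00009ad9
/* vim:set fenc=utf-8 ff=unix: */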
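
On PHP 7.2 or later, `mb_ord()` returns the code point as an integer, which avoids the hex round trip through UTF-32BE. The following is a minimal sketch, assuming the mbstring extension is loaded and PHP 7.4+ for `mb_str_split()`. The helper name `toCodePoints` is hypothetical and is not part of the original script.

```php
<?php
// Hypothetical helper: list the code point of every character in $string.
function toCodePoints(string $string, string $encoding = 'UTF-8'): array
{
    $result = [];
    foreach (mb_str_split($string, 1, $encoding) as $char) {
        // mb_ord() returns the code point of the first character as an int.
        $result[] = sprintf('U+%04X', mb_ord($char, $encoding));
    }
    return $result;
}

echo implode(' ', toCodePoints("\u{9AD9}\u{1F37A}")), PHP_EOL; // U+9AD9 U+1F37A
```

Unlike `toCodePoint`, this returns one entry per character, so multi-character strings do not need to be split into 8-digit groups afterwards.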

fromCodePoint

Display a table from code points

<?php
$encodings = [
    'CP932' => 9, //< Shift_JIS like
    //'CP51932' => 9, //< EUC-JP like
    'UTF-8' => 26,
    'UTF-16' => 14
];

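// Build a UTF-32BE byte string from one hex code point or an array of them.
function fromCodePoint($codePoint)
{
    $utf32 = '';
    foreach ((array)$codePoint as $code) {
        // Left-pad to 8 hex digits (4 bytes); the pad argument must be a string.
        $utf32 .= hex2bin(str_pad($code, 8, '0', STR_PAD_LEFT));
    }
    return $utf32;
}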
// Print header
$header = [];
$separator = [];
foreach ($encodings as $enc => $width) {
    $header[] = str_pad($enc, $width);
    $separator[] = str_repeat('-', $width);
}
echo '|UTF-32BE            |', implode('|', $header), '|char', PHP_EOL;
echo '|--------------------|', implode('|', $separator), '|----', PHP_EOL;
foreach([
    '005C', '00A5', 'ffe5', 'ff3c', '8868', '30f3', //< REVERSE SOLIDUS/YEN SIGN //'2f', 'ff0f', //< /
    '007e', '203e', '02dc', 'FF5E', '301C', //< TILDE
    ['0031', '20E3'], ['0065', '0308'], '01D6',['00FC', '0304'], ['0075', '0308', '0304'], ['0915', '094D', '0937'], //< Combining character
    '2460', '32bf', 'ff71', '20bb7', '9ad9', '4f39', '1F37A', //< others
] as $codePoint) {
    $utf32 = fromCodePoint($codePoint);
    $info = array(str_pad('U+'.implode(' U+', (array)$codePoint), 20));
    foreach ($encodings as $enc => $width) {
        $convert = mb_convert_encoding($utf32, $enc, 'UTF-32BE');
        if ($utf32 === mb_convert_encoding($convert, 'UTF-32BE', $enc)) {
            // Round-trips cleanly: show the encoded bytes as hex.
            $info[] = str_pad(wordwrap(bin2hex($convert), ($enc === 'UTF-16') ? 4 : 2, ' ', true), $width);
        } elseif (strpos($convert, '?') !== false) {
            // Not representable in this encoding: leave the cell blank.
            $info[] = str_repeat(' ', $width);
        } else {
            // Converts, but does not round-trip: flag the bytes with **.
            $info[] = str_pad('**'.wordwrap(bin2hex($convert), 2, ' ', true).'**', $width);
        }
    }
    //$info[] = str_pad(mb_convert_encoding($utf32, 'CP932', 'UTF-32BE'), 5);
    $info[] = str_pad(mb_convert_encoding($utf32, 'UTF-8', 'UTF-32BE'), 5);
    echo '|'.implode('|', $info), PHP_EOL;
}
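
For comparison, PHP 7.2+ also provides `mb_chr()`, which builds a character directly from an integer code point. This is a sketch under the assumption that mbstring is available. `charsFromCodePoints` is a hypothetical name, and it accepts the same hex strings that the script above passes to `fromCodePoint`, but it returns UTF-8 (or another target encoding) instead of UTF-32BE.

```php
<?php
// Hypothetical helper: hex code point(s) -> string in $encoding.
function charsFromCodePoints($codePoints, string $encoding = 'UTF-8'): string
{
    $out = '';
    foreach ((array)$codePoints as $hex) {
        // hexdec() parses the hex digits; mb_chr() encodes the code point.
        $out .= mb_chr(hexdec($hex), $encoding);
    }
    return $out;
}

echo bin2hex(charsFromCodePoints('9ad9')), PHP_EOL;                   // e9ab99
echo bin2hex(charsFromCodePoints(['0075', '0308', '0304'])), PHP_EOL; // 75cc88cc84
```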

Tables (a selection of special characters)

IS_SLASH(0x5C)

|UNICODE|CP932|UTF-8   |UTF-16|char|Summary|
|-------|-----|--------|------|----|-------|
|U+005C |5c   |5c      |005c  |\   |REVERSE SOLIDUS|
|U+00A5 |5c   |c2 a5   |00a5  |¥   |YEN SIGN|
|U+ffe5 |81 8f|ef bf a5|ffe5  |￥  |FULLWIDTH YEN SIGN|
|U+ff3c |81 5f|ef bc bc|ff3c  |＼  |FULLWIDTH REVERSE SOLIDUS|
|U+8868 |95 5c|e8 a1 a8|8868  |表  |CP932: 2nd byte (5c) is IS_SLASH, e.g. ソⅨ圭構十申貼能表暴予禄 etc.|
|U+30f3 |83 93|e3 83 b3|30f3  |ン  |CP932: 2nd byte (93) is IsDBCSLeadByte(93) => TRUE, e.g. ☆●メモワヲンΑ決月号書買売覧 etc.|
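
The 0x5C trailing byte matters because byte-oriented functions cannot tell it apart from a real backslash. The following is a minimal sketch of the effect, assuming mbstring is loaded and that its CP932 mapping matches the bytes listed in the table above.

```php
<?php
// U+8868 (表) encoded as CP932; per the table above this is the bytes 95 5c.
$sjis = mb_convert_encoding("\u{8868}", 'CP932', 'UTF-8');
echo bin2hex($sjis), PHP_EOL;

// A byte-wise search sees the trailing 0x5C as a backslash.
var_dump(strpos($sjis, '\\'));

// An encoding-aware search treats 95 5c as one character.
var_dump(mb_strpos($sjis, '\\', 0, 'CP932'));
```

The byte-wise `strpos()` reports a match at offset 1, while `mb_strpos()` with the CP932 encoding should report no match. That difference is why such strings are best handled with the `mb_*` functions or converted to UTF-8 first.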

TILDE(Special map)

|UNICODE|CP932|UTF-8   |UTF-16|char|Summary|
|-------|-----|--------|------|----|-------|
|U+007e |7e   |7e      |007e  |~   |TILDE|
|U+203e |7e   |e2 80 be|203e  |‾   |OVERLINE|
|U+02dc |     |cb 9c   |02dc  |˜   |SMALL TILDE|
|U+FF5E |81 60|ef bd 9e|ff5e  |～  |FULLWIDTH TILDE|
|U+301C |81 60|e3 80 9c|301c  |〜  |WAVE DASH|
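
Several rows in this table share the same CP932 bytes, so converting back does not always return the original code point. One way to detect this is a round-trip check like the one the script above performs. A sketch, assuming mbstring and PHP 7.2+ for `mb_chr()`; `roundTrips` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true if $utf8 survives conversion to $encoding and back.
function roundTrips(string $utf8, string $encoding): bool
{
    $converted = mb_convert_encoding($utf8, $encoding, 'UTF-8');
    return mb_convert_encoding($converted, 'UTF-8', $encoding) === $utf8;
}

foreach (['007E', '203E', '02DC', 'FF5E', '301C'] as $hex) {
    $char = mb_chr(hexdec($hex), 'UTF-8');
    printf("U+%s %s\n", $hex, roundTrips($char, 'CP932') ? 'round-trips' : 'lossy');
}
```

Which rows are reported as lossy depends on the mapping table of the installed mbstring version, so no expected output is given here.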

Combining character

|UNICODE             |UTF-8                     |UTF-16        |char|Summary|
|--------------------|--------------------------|--------------|----|-------|
|U+0031 U+20E3       |31 e2 83 a3               |0031 20e3     |1⃣  | |
|U+0065 U+0308       |65 cc 88                  |0065 0308     |ë  |e + umlaut|
|U+01D6              |c7 96                     |01d6          |ǖ   |u + umlaut + macron|
|U+00FC U+0304       |c3 bc cc 84               |00fc 0304     |ǖ  |u + umlaut + macron|
|U+0075 U+0308 U+0304|75 cc 88 cc 84            |0075 0308 0304|ǖ |u + umlaut + macron|
|U+0915 U+094D U+0937|e0 a4 95 e0 a5 8d e0 a4 b7|0915 094d 0937|क्ष |Devanagari|
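
The three "u + umlaut + macron" rows look identical on screen but are different code point sequences, so they differ in length and do not compare equal. A minimal sketch, assuming mbstring; the `Normalizer` part is optional and only runs when the intl extension is installed.

```php
<?php
$precomposed = "\u{01D6}";          // a single code point
$combined    = "u\u{0308}\u{0304}"; // base letter + two combining marks

var_dump(strlen($precomposed), mb_strlen($precomposed, 'UTF-8')); // 2 bytes, 1 code point
var_dump(strlen($combined), mb_strlen($combined, 'UTF-8'));       // 5 bytes, 3 code points
var_dump($precomposed === $combined);                              // false

// Optional: intl's Normalizer composes the sequence into NFC form.
if (class_exists('Normalizer')) {
    var_dump(Normalizer::normalize($combined, Normalizer::FORM_C) === $precomposed);
}
```

Normalizing both sides to the same form before comparing or storing is one way to make such strings match.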

Others

|UNICODE|CP932|UTF-8      |UTF-16   |char|Summary|
|-------|-----|-----------|---------|----|-------|
|U+2460 |87 40|e2 91 a0   |2460     |①  |CIRCLED DIGIT ONE|
|U+32bf |     |e3 8a bf   |32bf     |㊿  |CIRCLED NUMBER FIFTY|
|U+ff71 |b1   |ef bd b1   |ff71     |ｱ   |HALFWIDTH KATAKANA LETTER A|
|U+20bb7|     |f0 a0 ae b7|d842 dfb7|𠮷  |Surrogate Pair|
|U+9ad9 |fb fc|e9 ab 99   |9ad9     |髙  | |
|U+4f39 |fa 6d|e4 bc b9   |4f39     |伹  |EUC-JP:0x8FB0E3(3byte)|
|U+1F37A|     |f0 9f 8d ba|d83c df7a|🍺  |BEER MUG(emoji)|
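
The UTF-16 values for U+20bb7 and U+1F37A are surrogate pairs, which can be derived from the code point with plain integer arithmetic. A sketch of that calculation; `utf16Units` is a hypothetical helper, and it assumes a valid code point (it does not reject the reserved range U+D800 to U+DFFF or values above U+10FFFF).

```php
<?php
// Hypothetical helper: UTF-16 code units (as hex strings) for one code point.
function utf16Units(int $codePoint): array
{
    if ($codePoint < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit unit.
        return [sprintf('%04x', $codePoint)];
    }
    $offset = $codePoint - 0x10000;     // 20-bit value
    $high = 0xD800 + ($offset >> 10);   // top 10 bits -> high surrogate
    $low  = 0xDC00 + ($offset & 0x3FF); // bottom 10 bits -> low surrogate
    return [sprintf('%04x', $high), sprintf('%04x', $low)];
}

echo implode(' ', utf16Units(0x20BB7)), PHP_EOL; // d842 dfb7
echo implode(' ', utf16Units(0x1F37A)), PHP_EOL; // d83c df7a
```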