
[PHP] Single-Character Unicode Encoding and Decoding Function

April 16, 2019 • Read: 232 • Source Code

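PHP's built-in functions did not seem to include anything that converts a character or string directly to its Unicode code point, so I searched Baidu and found a wrapper function that does the job.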
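For reference, the mbstring extension gained `mb_ord()` and `mb_chr()` in PHP 7.2, which cover the same single-character conversion. The following is a minimal sketch, assuming PHP 7.2 or later with mbstring loaded and a source file saved as UTF-8; it may not match the environment this post was written for.

```php
<?php
// Assumes PHP >= 7.2 with the mbstring extension loaded.
$codePoint = mb_ord('爱', 'UTF-8');       // character -> Unicode code point (int)
$char      = mb_chr($codePoint, 'UTF-8'); // code point -> UTF-8 character

echo $codePoint . PHP_EOL; // 29233 (U+7231)
echo $char . PHP_EOL;      // 爱
```

Where mbstring is unavailable, an iconv-based wrapper like the one in this post remains a workable fallback.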


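<font color=red>Update, April 16, 2019, 23:33</font>
Even after trimming, the function still ran its input through several encoding conversions internally, and I found that those conversions mishandled special characters, so I simplified it further and removed the conversions altogether.

As a result, the function now supports only UTF-8 and only a single character (passing a longer string returns a wrong value).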

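function char_unicode($str, $DECODE = True) {
    $result = '';
    if ($DECODE === False) {
        // Encode: one UTF-8 character -> decimal Unicode code point.
        // 'UCS-4BE' pins the byte order instead of relying on how the platform treats 'UCS-4'.
        $result = hexdec(bin2hex(iconv('utf-8', 'UCS-4BE', $str)));
    } else {
        // Decode: decimal code point -> UTF-8 character (code points up to 0xFFFF only).
        $temp = intval($str);
        // Shift and mask keep the arguments to chr() as integers; $temp / 256 would be a float.
        $result = iconv('UCS-2BE', 'utf-8', chr(($temp >> 8) & 0xFF) . chr($temp & 0xFF));
    }
    return $result;
}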
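To make the encode branch easier to follow, here is a step-by-step trace for a single character. This is only an illustrative sketch: it assumes the iconv extension is available and the file is saved as UTF-8, and it uses `'UCS-4BE'` so the byte order is explicit.

```php
<?php
// Trace the encode branch for one UTF-8 character.
$char  = '爱';
$utf8  = bin2hex($char);                            // "e788b1": three UTF-8 bytes
$ucs4  = bin2hex(iconv('UTF-8', 'UCS-4BE', $char)); // "00007231": one 4-byte code point
$value = hexdec($ucs4);                             // 29233

printf("%s -> %s -> %d\n", $utf8, $ucs4, $value);

// With more than one character the hex string spans several code points,
// so converting it to a single number no longer identifies any character.
$two = bin2hex(iconv('UTF-8', 'UCS-4BE', 'ab'));    // "0000006100000062"
echo $two . PHP_EOL;
```

The second part shows why passing a longer string gives a wrong value: the two code points are concatenated before the hex-to-decimal conversion.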

Original source: https://www.jb51.net/article/112503.htm

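I only needed single-character encoding, so I trimmed and modified the function slightly. The trimmed version works without problems for the default UTF-8. My understanding of character encodings is limited, so I have not tested whether other encodings are fully supported.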

Test result:
[Screenshot: 编码效果.png]

The function:

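/**
 * $str      the character to encode, or the decimal code point to decode
 * $DECODE   whether to decode (True) or encode (False)
 * $encoding encoding of the string, utf-8 by default
 */
function char_unicode($str, $DECODE = True, $encoding = 'utf-8') {
    $result = '';
    if ($DECODE !== True) {
        // Round-trip through GB2312 to isolate the first character.
        // A character that GB2312 cannot represent makes this iconv call fail.
        $str = iconv($encoding, "gb2312", $str);
        if (ord(substr($str, 0, 1)) < 0xA1) { // ASCII: take 1 byte
            $row = iconv("gb2312", $encoding, substr($str, 0, 1));
        } else { // double-byte GB2312 character: take 2 bytes
            $row = iconv("gb2312", $encoding, substr($str, 0, 2));
        }
        // Convert to the Unicode code point (explicit big-endian byte order)
        $unicodestr = base_convert(bin2hex(iconv($encoding, 'UCS-4BE', $row)), 16, 10);
        $result = $unicodestr;
    } else {
        $temp = intval($str);
        // Split into high and low bytes with integer operations (code points up to 0xFFFF only)
        $unistr = chr(($temp >> 8) & 0xFF) . chr($temp & 0xFF);
        $result = iconv('UCS-2BE', $encoding, $unistr);
    }
    return $result;
}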
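The decode branch works by splitting the code point into a high byte and a low byte and handing those two bytes to iconv as big-endian UCS-2. A small trace follows, assuming the iconv extension is available. The final lines go beyond the original function: they sketch how `pack('N', ...)` with `'UCS-4BE'` could handle code points above 0xFFFF, which two bytes cannot hold.

```php
<?php
// Decode branch, traced for code point 29233 (0x7231).
$codePoint = 29233;
$high = ($codePoint >> 8) & 0xFF; // 0x72
$low  = $codePoint & 0xFF;        // 0x31
$char = iconv('UCS-2BE', 'UTF-8', chr($high) . chr($low));

printf("%02x %02x -> %s\n", $high, $low, $char); // 72 31 -> 爱

// Assumption, not part of the original function: a 4-byte big-endian
// representation also covers code points beyond 0xFFFF, such as U+1F600.
$wide = iconv('UCS-4BE', 'UTF-8', pack('N', 0x1F600));
echo bin2hex($wide) . PHP_EOL;    // f09f9880
```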

Test code:

<?php
header('Content-Type: text/plain; charset=UTF-8');

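/**
 * $str      the character to encode, or the decimal code point to decode
 * $DECODE   whether to decode (True) or encode (False)
 * $encoding encoding of the string, utf-8 by default
 */
function char_unicode($str, $DECODE = True, $encoding = 'utf-8') {
    $result = '';
    if ($DECODE !== True) {
        // Round-trip through GB2312 to isolate the first character.
        // A character that GB2312 cannot represent makes this iconv call fail.
        $str = iconv($encoding, "gb2312", $str);
        if (ord(substr($str, 0, 1)) < 0xA1) { // ASCII: take 1 byte
            $row = iconv("gb2312", $encoding, substr($str, 0, 1));
        } else { // double-byte GB2312 character: take 2 bytes
            $row = iconv("gb2312", $encoding, substr($str, 0, 2));
        }
        // Convert to the Unicode code point (explicit big-endian byte order)
        $unicodestr = base_convert(bin2hex(iconv($encoding, 'UCS-4BE', $row)), 16, 10);
        $result = $unicodestr;
    } else {
        $temp = intval($str);
        // Split into high and low bytes with integer operations (code points up to 0xFFFF only)
        $unistr = chr(($temp >> 8) & 0xFF) . chr($temp & 0xFF);
        $result = iconv('UCS-2BE', $encoding, $unistr);
    }
    return $result;
}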

$str = "爱";
$int = char_unicode($str, False); // encode: character -> code point (29233 for 爱)
$unstr = char_unicode($int);      // decode: code point -> character
echo 'Before Unicode encoding: ' . $str . PHP_EOL;
echo 'After Unicode encoding: ' . $int . PHP_EOL;
echo 'After Unicode decoding: ' . $unstr . PHP_EOL;
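Because the function handles only one character at a time, a longer string has to be split into characters first. One way this might be done is sketched below. `str_to_codepoints()` is a hypothetical helper, not part of the original post, and it assumes valid UTF-8 input and the iconv extension.

```php
<?php
// Hypothetical helper: split a UTF-8 string into characters and
// convert each one to its decimal Unicode code point.
function str_to_codepoints(string $str): array {
    // The /u modifier makes preg_split treat the subject as UTF-8,
    // so the empty pattern splits between characters, not bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return []; // the input was not valid UTF-8
    }
    return array_map(function ($ch) {
        return hexdec(bin2hex(iconv('UTF-8', 'UCS-4BE', $ch)));
    }, $chars);
}

print_r(str_to_codepoints('爱a')); // code points 29233 and 97
```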
Last Modified: June 13, 2020