
Storing emoji with PHP + MySQL

2015-12-05 17:25
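Background:

A PHP + MySQL backend stores the user data for an app. When users saved emoji, the stored text came back as "????".

The MySQL version at the time was noted as 5.3 (there was no MySQL 5.3 GA release, so the number is likely misremembered; see the PS).

What I ran into along the way: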

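1. First I tried to understand emoji themselves (any encyclopedia entry covers the basics). For the several emoji encoding schemes, see http://code.iamcal.com/php/emoji/. I also tried the emoji-php library from GitHub, but it did not solve the problem.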

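2. I read that MySQL's utf8 character set (the one behind the utf8_general_ci collation) stores at most 3 bytes per character, while an emoji takes 4 bytes in UTF-8. So I changed the table and its columns from utf8 to utf8mb4. The result was still "????".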

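3. Next I found that the connection character sets matter as well (see http://www.cnblogs.com/discuss/articles/1862248.html). I changed character_set_client (the character set of data sent by the client), character_set_results (the character set of query results), and character_set_connection (the connection-layer character set) from utf8 to utf8mb4. The result was still "????".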

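4. The app talks to MySQL through WordPress's API, and I noticed that the WordPress web front end stored and displayed emoji correctly. So I looked into how WordPress stores emoji: MySQL was still using utf8, and WordPress encoded the emoji before storing them.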
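As a quick check of the byte-length point in step 2, here is a minimal sketch that compares how many bytes a CJK character and an emoji occupy in UTF-8. The sample characters are assumptions chosen for illustration and are not taken from the original app.

```php
<?php
// Sample strings are illustrative assumptions, written as explicit UTF-8 bytes.
$cjk   = "\xE4\xB8\xAD";     // U+4E2D: 3 bytes, fits MySQL's 3-byte utf8
$emoji = "\xF0\x9F\x98\x80"; // U+1F600 grinning face: 4 bytes, needs utf8mb4

// strlen() counts bytes; mb_strlen() counts characters.
$cjkBytes   = strlen($cjk);
$emojiBytes = strlen($emoji);
$emojiChars = mb_strlen($emoji, 'UTF-8');

printf("CJK: %d bytes, emoji: %d bytes (%d character)\n", $cjkBytes, $emojiBytes, $emojiChars);
```

The fourth byte is what a 3-byte utf8 column cannot hold, which is why the column, the table, and the connection all have to agree on utf8mb4.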

5. That led to the emoji encoding scheme itself. The source is below (see https://zh.wpseek.com/function/wp_encode_emoji/):

    function wp_encode_emoji( $content ) {
        if ( function_exists( 'mb_convert_encoding' ) ) {
            // Match emoji by their UTF-8 byte sequences. With the /x flag,
            // whitespace and "# ..." comments inside the pattern are ignored.
            // As quoted, there is no "|" between the first two lines, so
            // together they form a single alternative.
            $regex = '/(
                 \x23\xE2\x83\xA3               # Digits
                 [\x30-\x39]\xE2\x83\xA3
               | \xF0\x9F[\x85-\x88][\xA6-\xBF] # Enclosed characters
               | \xF0\x9F[\x8C-\x97][\x80-\xBF] # Misc
               | \xF0\x9F\x98[\x80-\xBF]        # Smilies
               | \xF0\x9F\x99[\x80-\x8F]
               | \xF0\x9F\x9A[\x80-\xBF]        # Transport and map symbols
            )/x';

            $matches = array();
            if ( preg_match_all( $regex, $content, $matches ) ) {
                if ( ! empty( $matches[1] ) ) {
                    foreach ( $matches[1] as $emoji ) {
                        /*
                         * UTF-32's hex encoding is the same as HTML's hex encoding.
                         * So, by converting the emoji from UTF-8 to UTF-32, we magically
                         * get the correct hex encoding.
                         */
                        $unpacked = unpack( 'H*', mb_convert_encoding( $emoji, 'UTF-32', 'UTF-8' ) );
                        if ( isset( $unpacked[1] ) ) {
                            // Strip leading zeros and wrap the hex code point as an NCR.
                            $entity = '&#x' . ltrim( $unpacked[1], '0' ) . ';';
                            $content = str_replace( $emoji, $entity, $content );
                        }
                    }
                }
            }
        }

        return $content;
    }




WordPress does not decode the emoji when reading them back. The reason involves two encoding schemes.

6. On the two encoding schemes mentioned in step 5, NCR and HTML entities, see http://www.laruence.com/2010/02/25/1324.html

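Character entity references (HTML entities) and numeric character references (NCRs) are both ways to write special characters in a page. How do they differ?

A numeric character reference is an ampersand (&), followed by a hash sign (#), followed by the character's Unicode code point (note: not its UTF-8 byte values), followed by a semicolon. For example:
&#nnnn;
or
&#xhhhh;

Here nnnn is the code point in decimal and hhhh is the code point in hexadecimal.

Note that in XML the x must be lowercase, while the digits hhhh may mix upper and lower case. Both nnnn and hhhh may have leading zeros.

Unlike an NCR, an HTML entity is an ampersand (&), followed by the character's name, followed by a semicolon (;). The name must be one that HTML defines, for example:
&amp; //&
&copy; //©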
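To make the comparison concrete, the following minimal sketch decodes the same character written three ways: as a decimal NCR, as a hexadecimal NCR, and as a named HTML entity. The copyright sign is used only as an illustrative example.

```php
<?php
// The copyright sign (U+00A9) is used purely as an illustrative example.
$flags = ENT_QUOTES | ENT_HTML5;

$fromDecimal = html_entity_decode('&#169;', $flags, 'UTF-8'); // decimal NCR
$fromHex     = html_entity_decode('&#xA9;', $flags, 'UTF-8'); // hexadecimal NCR
$fromNamed   = html_entity_decode('&copy;', $flags, 'UTF-8'); // named HTML entity

// All three forms decode to the same UTF-8 bytes, C2 A9.
var_dump($fromDecimal === $fromHex, $fromHex === $fromNamed);
echo bin2hex($fromNamed), "\n";
```

Emoji have no named entities, so for them the NCR form is the only one of the two that applies.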

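The code in step 5 first matches emoji by the UTF-8 byte ranges that correspond to the emoji Unicode blocks. mb_convert_encoding then converts each match from UTF-8 to UTF-32. UTF-32 encodes every code point as a single 32-bit unsigned integer whose value is the code point itself, so the hex dump of the UTF-32 bytes is the code point in hex. That value is written out as an NCR and stored. An NCR is plain ASCII, so it fits in a utf8 column, and a browser resolves NCRs when it renders HTML, so the value read from MySQL can be output without decoding and the emoji still displays.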
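The conversion described above can be sketched for a single character as follows. The helper name char_to_ncr is hypothetical, and the sketch spells out 'UTF-32BE' explicitly where the quoted WordPress code passes 'UTF-32'; that explicit byte order is an assumption made here for clarity.

```php
<?php
// Hypothetical helper (not part of WordPress): turn one UTF-8 character into a hex NCR.
function char_to_ncr(string $char): string
{
    // UTF-32BE stores the code point as one big-endian 32-bit integer.
    $utf32 = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
    // 'H*' dumps the bytes as one hex string, e.g. "0001f600".
    $unpacked = unpack('H*', $utf32);
    // Drop the leading zeros and wrap the rest as &#x...;
    return '&#x' . ltrim($unpacked[1], '0') . ';';
}

$emoji = "\xF0\x9F\x98\x80"; // U+1F600 as UTF-8 bytes

echo bin2hex($emoji), "\n";                                           // UTF-8 bytes
echo bin2hex(mb_convert_encoding($emoji, 'UTF-32BE', 'UTF-8')), "\n"; // UTF-32 bytes
echo char_to_ncr($emoji), "\n";                                       // the NCR
```

On the command line this prints f09f9880, then 0001f600, then the literal text &#x1f600;. A browser would render that last line as the emoji itself, which is why the web front end needs no decoding step.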

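The Android client, however, needs real emoji characters rather than NCRs, so the NCRs have to be decoded for it. Modeled on WordPress's wp_encode_emoji, I wrote a wp_decode_emoji function: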

    function wp_decode_emoji($content) {
        // Loosely match the Emoji Unicode range. from wp-includes/formatting.php
        // First alternative: U+2000..U+3FFF; second alternative: U+1F100..U+1F6FF.
        $regex = '/(&#x[2-3][0-9a-f]{3};|&#x1f[1-6][0-9a-f]{2};)/';

        $matches = array();
        if (preg_match_all($regex, $content, $matches)) {
            if (!empty($matches[1])) {
                foreach ($matches[1] as $emoji) {
                    //$entity = mb_decode_numericentity($emoji, array(0x0000, 0xFFFF, 0, 0xFFFF), 'UTF-8');
                    // Decode the NCR into its UTF-8 character. Note that the
                    // 'HTML-ENTITIES' pseudo-encoding is deprecated as of PHP 8.2.
                    $entity = mb_convert_encoding($emoji, 'UTF-8', 'HTML-ENTITIES');
                    $content = str_replace($emoji, $entity, $content);
                }
            }
        }

        return $content;
    }
With this encode/decode handling in place, the Android client can display emoji.
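The decoder above relies on mbstring's 'HTML-ENTITIES' pseudo-encoding, which is deprecated as of PHP 8.2. A minimal alternative sketch, assuming the stored text contains only lowercase hex NCRs in the ranges U+2000..U+3FFF and U+1F100..U+1F6FF, uses html_entity_decode inside preg_replace_callback. The function name decode_emoji_ncr and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: decode only emoji-range hex NCRs and leave other entities untouched.
function decode_emoji_ncr(string $content): string
{
    // Loose ranges U+2000..U+3FFF and U+1F100..U+1F6FF; they also cover some non-emoji characters.
    $regex = '/&#x(?:[2-3][0-9a-f]{3}|1f[1-6][0-9a-f]{2});/';

    $result = preg_replace_callback(
        $regex,
        function (array $m): string {
            // html_entity_decode resolves the NCR to its UTF-8 character.
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $content
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $content;
}

$stored  = 'Nice &amp; done &#x1f600; &#x2764;';
$decoded = decode_emoji_ncr($stored);
echo $decoded, "\n";
```

Restricting the pattern to the emoji ranges keeps other entities such as &amp; exactly as stored, so text that was escaped for HTML on purpose is not unescaped by accident.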

7. Today, with MySQL 5.5.3 or later, setting the character set to utf8mb4 is enough to store emoji, with no need to encode them before storing.
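For that route the connection character set has to be utf8mb4 too, as step 3 found. A minimal sketch, assuming PDO with the MySQL driver: the helper names, host, database name, and version strings are all placeholders, and nothing here connects to a server.

```php
<?php
// Hypothetical helpers; host, database name and version strings are placeholders.
function pick_charset(string $serverVersion): string
{
    // utf8mb4 was introduced in MySQL 5.5.3; older servers only have 3-byte utf8.
    return version_compare($serverVersion, '5.5.3', '>=') ? 'utf8mb4' : 'utf8';
}

function build_dsn(string $host, string $dbname, string $charset): string
{
    // The charset DSN option sets the connection character set, much like SET NAMES.
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = build_dsn('localhost', 'app_db', pick_charset('5.6.20'));
echo $dsn, "\n";

// With a real server the DSN would then be passed to PDO, for example:
// $pdo = new PDO($dsn, $user, $password);
```

When pick_charset falls back to utf8, the NCR encoding from step 5 is still needed. The columns and tables must also be converted to utf8mb4, as in step 2, before 4-byte characters can be stored.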

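PS:

I wrote this post a month or two after solving the problem, so my memory of the environment at the time may be imprecise. Please bear with me!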

References:
http://code.iamcal.com/php/emoji/
http://www.cnblogs.com/discuss/articles/1862248.html
https://zh.wpseek.com/function/wp_encode_emoji/
http://www.laruence.com/2010/02/25/1324.html
Tags: mysql emoji ncr