Original source of this article: http://www.itican.net/lmy0083/?cat=3
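I have been working on a WAP project recently and found that some handsets encode Chinese characters oddly when they submit them via POST. My environment is GB18030, which also supports GBK/GB2312. I develop and test with Opera 7.60, and the development language is JSP/Java.
As far as I know, Java (more precisely, the servlet container) handles request data as 8859_1 by default, which is ISO-8859-1, a single-byte character set. Most handsets use UTF-8, and a few use GB2312.
For background on these character sets, see http://www.itican.net/lmy0083/?page_id=88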
I mainly test with the Chinese character "人" (ren, "person"):
Character           人
GBK/GB2312 (Hex)    C8 CB
UTF-8 (Hex)         E4 BA BA
Unicode             U+4EBA
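The byte values in the table can be checked with a few lines of Java. This is only a minimal sketch: the class name `RenBytes` and the helper `hex` are made up for illustration, and it assumes the JDK in use ships the GBK and GB2312 charsets (a full JDK normally does).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RenBytes {
    /** Encodes text in the given charset and returns the bytes as upper-case hex. */
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The test character, written as a Unicode escape so that the
        // encoding of the source file does not matter.
        String ren = "\u4EBA";
        System.out.println("GBK    " + hex(ren, Charset.forName("GBK")));    // C8CB
        System.out.println("GB2312 " + hex(ren, Charset.forName("GB2312"))); // C8CB
        System.out.println("UTF-8  " + hex(ren, StandardCharsets.UTF_8));    // E4BABA
    }
}
```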
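I develop and test with the Opera 7.60 browser because it is convenient, and of course everything is built against what this browser supports. For example, when I set Opera's character set to UTF-8, every Chinese character in a POST arrives as if it had been sent in the default 8859_1 character set. (8859_1 is a single-byte character set and cannot contain Chinese characters; a Chinese character takes two bytes in GBK/GB2312 and three bytes in UTF-8.) In other words, each byte of the UTF-8 sequence shows up as one 8859_1 character. Strangely, the receiving side then has to be written as
String sPost = new String(request.getParameter("input_name").getBytes("8859_1"),"UTF-8");
otherwise it does not work for my application. I cannot simply use
request.getParameter("input_name")
as it is. Nor, of course, can I write
String sPost = new String(request.getParameter("input_name").getBytes("8859_1"),"some other charset");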
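Developing this way works on some handsets, such as the Nokia 6681. In real-world testing, however, I found that on other handsets, such as the N800 and the Nokia 6670, Chinese input received this way comes out garbled.
So I ran tests with several browsers, emulators, and handsets to see which encoding I was actually receiving.
Test program (receiving side only; for the sending side, write any form with an input whose name is "input_name"):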
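The conversion above can be tried without a servlet container. The sketch below only simulates the situation, on the assumption that the container decodes the raw POST bytes as ISO-8859-1; `RedecodeDemo`, `containerDecode`, and `recoverUtf8` are hypothetical names used for illustration.

```java
import java.nio.charset.StandardCharsets;

final class RedecodeDemo {
    /** Assumed container behaviour: decode the raw body bytes as ISO-8859-1. */
    static String containerDecode(byte[] rawBody) {
        return new String(rawBody, StandardCharsets.ISO_8859_1);
    }

    /** The receiving-side fix: get the original bytes back, then decode them as UTF-8. */
    static String recoverUtf8(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sent = "\u4EBA".getBytes(StandardCharsets.UTF_8); // E4 BA BA on the wire
        String param = containerDecode(sent);  // three Latin-1 characters, one per byte
        String fixed = recoverUtf8(param);     // the single character U+4EBA again
        System.out.println(param.length());            // 3
        System.out.println(fixed.equals("\u4EBA"));    // true
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, so `getBytes("8859_1")` returns the bytes that were originally sent.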
- String sPost= request.getParameter("input_name");
- if(sPost== null || sPost.length()==0){sPost= "0";}
-
- String GB2312_TO_UTF_8 = new String(sPost.getBytes("GB2312"),"UTF-8");
- String GB2312_TO_GBK = new String(sPost.getBytes("GB2312"),"GBK");
- String GB2312_TO_8859_1 = new String(sPost.getBytes("GB2312"),"8859_1");
-
- String GBK_TO_UTF_8 = new String(sPost.getBytes("GBK"),"UTF-8");
- String GBK_TO_GB2312 = new String(sPost.getBytes("GBK"),"GB2312");
- String GBK_TO_8859_1 = new String(sPost.getBytes("GBK"),"8859_1");
-
- String ISO_8859_1_TO_UTF_8 = new String(sPost.getBytes("8859_1"),"UTF-8");
- String ISO_8859_1_TO_GB2312 = new String(sPost.getBytes("8859_1"),"GB2312");
- String ISO_8859_1_TO_GBK = new String(sPost.getBytes("8859_1"),"GBK");
-
- String UTF_8_TO_8859_1 = new String(sPost.getBytes("UTF-8"),"8859_1");
- String UTF_8_TO_GB2312 = new String(sPost.getBytes("UTF-8"),"GB2312");
- String UTF_8_TO_GBK = new String(sPost.getBytes("UTF-8"),"GBK");
-
-
- System.out.println("<WAP TEST>*********************************************************");
- System.out.println("<WAP TEST>ren");
- System.out.println("<WAP TEST> UTF_8 |" + "E4BABA|" + java.net.URLEncoder.encode(sPost,"UTF-8"));
- System.out.println("<WAP TEST> GBK |" + "C8CB |" + java.net.URLEncoder.encode(sPost,"GBK"));
- System.out.println("<WAP TEST> GB2312 |" + "C8CB |" + java.net.URLEncoder.encode(sPost,"GB2312"));
- System.out.println("<WAP TEST> 8859_1 |" + " |" + java.net.URLEncoder.encode(sPost,"8859_1"));
-
- System.out.println("<WAP TEST>GB2312 TO UTF_8 |" + "E4BABA|" + java.net.URLEncoder.encode(GB2312_TO_UTF_8,"UTF-8"));
- System.out.println("<WAP TEST>GB2312 TO GBK |" + "C8CB |" + java.net.URLEncoder.encode(GB2312_TO_GBK,"GBK"));
- System.out.println("<WAP TEST>GB2312 TO 8859_1 |" + "C8CB |" + java.net.URLEncoder.encode(GB2312_TO_8859_1,"8859_1"));
-
- System.out.println("<WAP TEST>GBK TO UTF_8 |" + "E4BABA|" + java.net.URLEncoder.encode(GBK_TO_UTF_8,"UTF-8"));
- System.out.println("<WAP TEST>GBK TO GB2312 |" + "C8CB |" + java.net.URLEncoder.encode(GBK_TO_GB2312,"GB2312"));
- System.out.println("<WAP TEST>GBK TO 8859_1 |" + "C8CB |" + java.net.URLEncoder.encode(GBK_TO_8859_1,"8859_1"));
-
- System.out.println("<WAP TEST>8859_1 TO UTF_8 |" + "E4BABA|" + java.net.URLEncoder.encode(ISO_8859_1_TO_UTF_8,"UTF-8"));
- System.out.println("<WAP TEST>8859_1 TO GB2312 |" + "C8CB |" + java.net.URLEncoder.encode(ISO_8859_1_TO_GB2312,"GB2312"));
- System.out.println("<WAP TEST>8859_1 TO GBK |" + "C8CB |" + java.net.URLEncoder.encode(ISO_8859_1_TO_GBK,"GBK"));
-
- System.out.println("<WAP TEST>UTF_8 TO 8859_1 |" + "E4BABA|" + java.net.URLEncoder.encode(UTF_8_TO_8859_1,"8859_1"));
- System.out.println("<WAP TEST>UTF_8 TO GB2312 |" + "C8CB |" + java.net.URLEncoder.encode(UTF_8_TO_GB2312,"GB2312"));
- System.out.println("<WAP TEST>UTF_8 TO GBK |" + "C8CB |" + java.net.URLEncoder.encode(UTF_8_TO_GBK,"GBK"));
The test results are as follows:
N800 Openwave V6.1
- <WAP TEST>*********************************************************
- <WAP TEST>ren
- <WAP TEST> UTF_8 |E4BABA|%E4%BA%BA
- <WAP TEST> GBK |C8CB |%C8%CB
- <WAP TEST> GB2312 |C8CB |%C8%CB
- <WAP TEST> 8859_1 | |%3F
- <WAP TEST>GB2312 TO UTF_8 |E4BABA|%EF%BF%BD%EF%BF%BD
- <WAP TEST>GB2312 TO GBK |C8CB |%C8%CB
- <WAP TEST>GB2312 TO 8859_1 |C8CB |%C8%CB
- <WAP TEST>GBK TO UTF_8 |E4BABA|%EF%BF%BD%EF%BF%BD
- <WAP TEST>GBK TO GB2312 |C8CB |%C8%CB
- <WAP TEST>GBK TO 8859_1 |C8CB |%C8%CB
- <WAP TEST>8859_1 TO UTF_8 |E4BABA|%3F
- <WAP TEST>8859_1 TO GB2312 |C8CB |%3F
- <WAP TEST>8859_1 TO GBK |C8CB |%3F
- <WAP TEST>UTF_8 TO 8859_1 |E4BABA|%E4%BA%BA
- <WAP TEST>UTF_8 TO GB2312 |C8CB |%E4%BA%3F
- <WAP TEST>UTF_8 TO GBK |C8CB |%E4%BA%3F
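One way to read the N800 rows (an interpretation, not something measured separately): `request.getParameter` already returned the correctly decoded character, because encoding `sPost` directly gives E4BABA in UTF-8 and C8CB in GBK. If that is so, applying the 8859_1 to UTF-8 conversion on top destroys the text. The sketch below reproduces two of the rows under that assumption; `N800Rows`, `latin1ToUtf8`, and `transcode` are illustrative names, and the GBK charset is assumed to be available in the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class N800Rows {
    /** The 8859_1 -> UTF-8 conversion used on the receiving side. */
    static String latin1ToUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    /** Encodes with one charset and decodes with another, like the GBK_TO_UTF_8 variable. */
    static String transcode(String s, Charset from, Charset to) {
        return new String(s.getBytes(from), to);
    }

    public static void main(String[] args) {
        String param = "\u4EBA"; // assumption: the parameter arrived already decoded

        // U+4EBA has no ISO-8859-1 byte, so getBytes substitutes '?'.
        // This corresponds to the "8859_1 TO UTF_8 ... %3F" row.
        System.out.println(latin1ToUtf8(param)); // ?

        // C8 CB is not valid UTF-8, so each byte becomes U+FFFD, which is
        // EF BF BD in UTF-8: the "GBK TO UTF_8" row.
        String gbkAsUtf8 = transcode(param, Charset.forName("GBK"), StandardCharsets.UTF_8);
        System.out.println(gbkAsUtf8.equals("\uFFFD\uFFFD")); // true
    }
}
```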
Opera UTF-8
- <WAP TEST>*********************************************************
- <WAP TEST>ren
- <WAP TEST> UTF_8 |E4BABA|%C3%A4%C2%BA%C2%BA
- <WAP TEST> GBK |C8CB |%3F%3F%3F
- <WAP TEST> GB2312 |C8CB |%3F%3F%3F
- <WAP TEST> 8859_1 | |%E4%BA%BA
- <WAP TEST>GB2312 TO UTF_8 |E4BABA|%3F%3F%3F
- <WAP TEST>GB2312 TO GBK |C8CB |%3F%3F%3F
- <WAP TEST>GB2312 TO 8859_1 |C8CB |%3F%3F%3F
- <WAP TEST>GBK TO UTF_8 |E4BABA|%3F%3F%3F
- <WAP TEST>GBK TO GB2312 |C8CB |%3F%3F%3F
- <WAP TEST>GBK TO 8859_1 |C8CB |%3F%3F%3F
- <WAP TEST>8859_1 TO UTF_8 |E4BABA|%E4%BA%BA
- <WAP TEST>8859_1 TO GB2312 |C8CB |%E4%BA%3F
- <WAP TEST>8859_1 TO GBK |C8CB |%E4%BA%3F
- <WAP TEST>UTF_8 TO 8859_1 |E4BABA|%C3%A4%C2%BA%C2%BA
- <WAP TEST>UTF_8 TO GB2312 |C8CB |%C3%A4%C2%BA%C2%BA
- <WAP TEST>UTF_8 TO GBK |C8CB |%C3%A4%C2%BA%C2%BA
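Taken together, the two result sets suggest that a receiver has to tell apart a parameter that is already decoded (the N800 case) from one whose UTF-8 bytes were read as 8859_1 (the Opera case). The following is one possible sketch of such a check, not a tested production solution; `TolerantParam` and `normalize` are hypothetical names. It re-decodes only when every character fits in one byte and the bytes form valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class TolerantParam {
    /** Returns the parameter re-decoded as UTF-8 when that is plausible, else unchanged. */
    static String normalize(String param) {
        // A character above 0xFF cannot come from an ISO-8859-1 decode,
        // so the value is assumed to be decoded correctly already.
        for (int i = 0; i < param.length(); i++) {
            if (param.charAt(i) > 0xFF) {
                return param;
            }
        }
        byte[] raw = param.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // Strict decoding: malformed input raises an exception
            // instead of being replaced with U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return param; // not valid UTF-8 either; keep what was received
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("\u00E4\u00BA\u00BA").equals("\u4EBA")); // Opera case: true
        System.out.println(normalize("\u4EBA").equals("\u4EBA"));             // N800 case: true
    }
}
```

The heuristic has a known weakness: genuine Latin-1 text whose bytes happen to form valid UTF-8 would be converted by mistake, so it fits best where the input is expected to be Chinese or plain ASCII.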