
Parsing a JavaScript snippet from Java

2016-05-19 18:53
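A few days ago I helped someone scrape product attributes from an e-commerce site. Once I had the page, I needed to parse the code inside a <script> tag to get the properties of one JSON object in it. At first I planned to cut it out with string slicing, but that felt fragile, so I switched to evaluating the script from Java. It worked reasonably well, but there were a few pitfalls, which I record here: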

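When Java evaluates a piece of JS, you have to supply the global variables the snippet expects, for example var window={};
Functions that the JS calls must also be defined up front with a matching call signature, even if you don't care what they do, for example console.log(); otherwise the snippet stops at the missing variable and cannot continue.
For the data you want to extract, such as a JSON string, it is best to write a JS function, append it after the snippet, and call it to get the value.
Simple utility functions are best copied over and defined as well.
Every variable and function definition needs to end with a semicolon. The pieces are concatenated onto one line, so there is no line break for automatic semicolon insertion to act on. When strongly typed Java meets weakly typed JS, the error messages are close to unreadable (this cost me an hour): a single missing semicolon at the end of var a=function(){}; produced all sorts of confusing errors.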
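The points above can be sketched as a small helper that assembles the script before it is handed to the engine. This is only an illustration under assumptions: `SnippetWrapper`, `wrap`, `getValue` and `pageId` are made-up names, and the stubs cover only what the sample page is known to call.

```java
import java.util.StringJoiner;

final class SnippetWrapper {
    /** Stubs for the globals and functions the page script expects (hypothetical minimal set). */
    static final String PRELUDE =
            "var pageId = '';\n"
          + "var window = { markPageStylesReady: function (id) { pageId = id; } };\n"
          + "var console = { log: function () {} };";

    /**
     * Joins prelude, page snippet and accessor function. Every piece is closed
     * with ";\n", so a snippet that lacks its final semicolon cannot run into
     * the code appended after it.
     */
    static String wrap(String snippet, String accessorExpr) {
        StringJoiner script = new StringJoiner(";\n", "", ";\n");
        script.add(PRELUDE);
        script.add(snippet);
        script.add("function getValue() { return " + accessorExpr + "; }");
        return script.toString();
    }

    public static void main(String[] args) {
        // The snippet deliberately lacks its trailing semicolon.
        System.out.println(wrap("var a = function () {}", "a"));
    }
}
```

A doubled semicolon (when a piece already ends with one) is just an empty statement in JS, so always appending the separator is the safer choice.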

The JS I needed to parse looks like this:

[code=language-javascript]window.markPageStylesReady("e58dfbdc-9154-4482-aeb3-d1d32aa05195");
if (!window.pageParams) {
    window.pageParams = {};
}
window.pageParams["e58dfbdc-9154-4482-aeb3-d1d32aa05195"] = {
    "browser_type": "unknown",
    "contest": {
        "product_rating": {
            "rating": 5.0,
        },
        "id": "5731b2460efe256073e4631e",
        "remarket_tag": {
            "name": "Hats"
        },
        "commerce_product_info": {
            "fbw_pending": 0,
            "is_fulfill_by_wish": false,
            "variations": [{
                "original_price": 1,
                "shipping_price_country_code": "US",
                "color": "Multicolor",
                "variation_id": "573307b481e30f5ec28e3801",
                "type": ["1"],
                "size": [{
                    "1": "1"
                }],
                "shipping": 1
            }, {
                "original_price": 1,
                "color": "Multicolor",
                "type": ["2"],
                "size": [{
                    "2": "2"
                }],
                "size_ordering": 0,
                "variation_id": "123",
                "shipping": 1
            }],
            "is_active_fbw_in_us": false,
        },
        "contest_selected_picture": "0",
        "num_bought": 0,
        "is_expired": false,
        "num_won": 0,
        "external_url": "http:\/\/www.xxoo.com\/c\/5731b2460efe256073e4631e"
    },
    "force_login_required": false
};
// This parameter will get consumed by the page's script and used
// to initialize the page correctly
window.nextInitializePageId = 'e58dfbdc-9154-4482-aeb3-d1d32aa05195';
[/code]


What I need is
window.pageParams['e58dfbdc-9154-4482-aeb3-d1d32aa05195'].contest.commerce_product_info.variations
as a JSON string.
For that I found a json2str function and wrote a bit of quick-and-dirty code around it.

After that, javax.script.ScriptEngine did the job.

The code is as follows:

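[code=language-java]// Imports this fragment relies on; JSON/JSONObject are assumed to be fastjson
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;

// The rest runs inside a method; `file` is the whole HTML file
Document doc = Jsoup.parse(file, "UTF-8");
Elements scripts = doc.getElementsByTag("script");
String text = "";
for (Element e : scripts) {
    String t = e.html();
    // contains() instead of indexOf(...) > 0, which would miss a match at index 0.
    // "product_id" is not in the excerpt shown earlier, which is presumably abridged.
    if (t.contains("\"5731b2460efe256073e4631e\"") && t.contains("\"product_id\"")) {
        text = t;
    }
}
if ("".equals(text)) return null;
// System.out.println(text);

// Prelude: stub `window`; markPageStylesReady captures the page id into t
String strbef = ";var t=''; var window={ markPageStylesReady: function(a){t=a} }; ";
// Accessor appended after the page script
String straft = ";function getv(){return window.pageParams[t].contest.commerce_product_info.variations ;}";
// json2str walks arrays with for-in, so an array comes out as an object keyed "0", "1", ...
// It also quotes numbers and does not escape string values.
String json2str = "function json2str(o) { "
        + "    var arr = []; "
        + "    var fmt = function(s) { "
        + "         if( typeof s == \"string\") {return \"\\\"\" + s + \"\\\"\" ;}"
        + "         if( typeof s == \"number\") {return \"\\\"\" + s + \"\\\"\";} "
        + "        if (typeof s == \"object\" ) { if( s != null) { return json2str(s);} } "
        + "       return s ;"
        + "    }; "
        + "    for (var i in o) {arr.push(\"\\\"\" + i + \"\\\":\" + fmt(o[i]));} "
        + "    return \"{\" + arr.join(\",\") + \"}\"; "
        + "}; "
        + " var o=getv(); "
        + " function aa(){ return json2str(o);}; ";

/* Every variable and function definition must end with a semicolon. */

ScriptEngineManager sem = new ScriptEngineManager();
// Returns null when no JavaScript engine is installed (Nashorn was removed in JDK 15)
ScriptEngine engine = sem.getEngineByExtension("js");
engine.eval(strbef + text + straft + json2str);
Invocable inv = (Invocable) engine;
Object obj = inv.invokeFunction("aa");
JSONObject jsonobj = JSON.parseObject(obj.toString());
// "0" is the first variation, because the array was serialized with index keys
System.out.println(jsonobj.getJSONObject("0").get("max_shipping_time"));
[/code]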

I got the expected result. Good job!
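As a possible variation, the hand-written json2str helper can be dropped when the engine provides the ES5 JSON object (Nashorn, the engine bundled with JDK 8 to 14, does). The following is a minimal sketch under that assumption, not the code used above: `StringifySketch`, `extractJson` and `extract` are illustrative names, the sample data in `main` is a made-up stand-in for the page script, and the method returns null when the JVM has no JavaScript engine.

```java
import javax.script.Invocable;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class StringifySketch {
    /**
     * Evaluates the page snippet, then returns JSON.stringify(expr) as a String.
     * Returns null when no usable JavaScript engine is installed in this JVM.
     */
    static String extractJson(String snippet, String expr)
            throws ScriptException, NoSuchMethodException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByExtension("js");
        if (!(engine instanceof Invocable)) {
            return null; // also covers engine == null, e.g. JDK 15+ without an added engine
        }
        // One piece per line, each closed with ';', so no statement can run into the next.
        String script = "var window = { markPageStylesReady: function () {} };\n"
                + snippet + ";\n"
                + "function extract() { return JSON.stringify(" + expr + "); };\n";
        engine.eval(script);
        Object json = ((Invocable) engine).invokeFunction("extract");
        return json == null ? null : json.toString();
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the page script (hypothetical data, same shape as the sample).
        String snippet =
                "window.pageParams = { p: { variations: [ { color: 'Multicolor', shipping: 1 } ] } }";
        System.out.println(extractJson(snippet, "window.pageParams.p.variations"));
    }
}
```

JSON.stringify keeps arrays as arrays and escapes string values, so with fastjson the result would be read with JSON.parseArray instead of parseObject plus the "0" key.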