Paper Reading Notes: Reasoning on Knowledge Graphs with Debate Dynamics (AAAI 2020)
This is a very interesting piece of work; reading it feels genuinely refreshing.
Paper title: Reasoning on Knowledge Graphs with Debate Dynamics
Published at AAAI 2020
Motivation
Many knowledge-graph machine learning methods embed entities and relations, compute a confidence score for each triple from those embeddings, and learn the embeddings by maximizing the scores of positive examples. However, it is hard to explain which part of the graph actually contributes to the final score. The model in this paper has three modules: two agents and a judge. For a query triple $q=\left(s_{q}, p_{q}, o_{q}\right)$, the two agents search for evidence chains arguing that the triple is True or False, respectively, and the judge aggregates all the evidence to produce the final verdict. (This may sound a bit like a GAN, but as you read on you will see it is not.)
Agent module
States: let $e_t^{(i)}$ denote the position of agent $i$ at step $t$. The current state is $S_{t}^{(i)}=\left(e_{t}^{(i)}, q\right) \in \mathcal{S} = \mathcal{E}^{2} \times \mathcal{R} \times \mathcal{E}$.
Actions: from state $S_{t}^{(i)}=\left(e_{t}^{(i)}, q\right)$, the set of available actions is the set of nodes reachable in one step (i.e., the neighborhood of $e_t^{(i)}$), denoted $\mathcal{A}_{S_{t}^{(i)}}$:

$$\mathcal{A}_{S_{t}^{(i)}}=\left\{(r, e) \in \mathcal{R} \times \mathcal{E}: S_{t}^{(i)}=\left(e_{t}^{(i)}, q\right) \wedge\left(e_{t}^{(i)}, r, e\right) \in \mathcal{KG}\right\}$$
Transitions: if action $A_{t}^{(i)}=\left(r, e_{t+1}^{(i)}\right)$ is taken in state $S_{t}^{(i)}=\left(e_{t}^{(i)}, q\right)$, the transition is:

$$\delta_{t}^{(i)}\left(S_{t}^{(i)}, A_{t}^{(i)}\right):=\left(e_{t+1}^{(i)}, q\right)$$
Concatenating the actions taken so far gives the history: $H_{t}^{(i)}=\left(H_{t-1}^{(i)}, A_{t-1}^{(i)}\right)$, with $H_{0}^{(i)}=\left(s_{q}, p_{q}, o_{q}\right)$.
An LSTM encodes the information from the previous step: $\boldsymbol{h}_{t}^{(i)}=\mathrm{LSTM}^{(i)}\left(\left[\boldsymbol{a}_{t-1}^{(i)}, \boldsymbol{q}^{(i)}\right]\right)$, where $\boldsymbol{a}_{t-1}^{(i)}=\left[\boldsymbol{r}_{t-1}^{(i)}, \boldsymbol{e}_{t}^{(i)}\right] \in \mathbb{R}^{2 d}$ and $\boldsymbol{q}^{(i)}=\left[\boldsymbol{e}_{s}^{(i)}, \boldsymbol{r}_{p}^{(i)}, \boldsymbol{e}_{o}^{(i)}\right] \in \mathbb{R}^{3 d}$, so the LSTM input is the concatenation of five vectors of length $d$. Notably, the two agents and the judge use separate embedding tables, i.e., each entity and relation has three distinct embedding vectors.
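As a minimal sketch of this encoding step (plain NumPy, with illustrative sizes; the weight shapes and the single-cell formulation are my assumptions, not the paper's exact implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step: x is the input, (h, c) the previous state.
    W maps [x; h] to the four gates (input, forget, output, cell candidate)."""
    H = h.size
    z = W @ np.concatenate([x, h]) + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c_new = f * c + i * g          # update cell state
    h_new = o * np.tanh(c_new)     # new hidden state h_t
    return h_new, c_new

rng = np.random.default_rng(0)
d, H = 8, 16                         # embedding size d, hidden size H (illustrative)
a_prev = rng.normal(size=2 * d)      # [r_{t-1}, e_t]
q_vec = rng.normal(size=3 * d)       # [e_s, r_p, e_o]
x = np.concatenate([a_prev, q_vec])  # the 5d-dimensional LSTM input
W = rng.normal(size=(4 * H, 5 * d + H)) * 0.1
b = np.zeros(4 * H)
h, c = lstm_step(x, np.zeros(H), np.zeros(H), W, b)
```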
The encoded history and the current action space together determine a score for each candidate action, which defines the policy:

$$\boldsymbol{d}_{t}^{(i)}=\operatorname{softmax}\left(\boldsymbol{A}_{t}^{(i)}\left(\boldsymbol{W}_{2}^{(i)} \operatorname{ReLU}\left(\boldsymbol{W}_{1}^{(i)} \boldsymbol{h}_{t}^{(i)}\right)\right)\right)$$

The $k$-th component of $\boldsymbol{d}_{t}^{(i)}$ is the probability of choosing the $k$-th action in the action space, and the next action is sampled from this distribution. This is a Markov decision process, since the computation only involves the encoding from step $t-1$ and the action space at step $t$, independent of earlier information. The next action is then drawn as:

$$A_{t}^{(i)} \sim \operatorname{Categorical}\left(\boldsymbol{d}_{t}^{(i)}\right)$$
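A sketch of this action-scoring and sampling step (NumPy; all dimensions are hypothetical, and each row of the matrix `A_t` stacks one candidate action's embedding $[\boldsymbol{r}, \boldsymbol{e}]$):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
d, H, n_actions = 8, 16, 5           # illustrative sizes
h_t = rng.normal(size=H)             # encoded history from the LSTM
A_t = rng.normal(size=(n_actions, 2 * d))  # one row per candidate action [r, e]
W1 = rng.normal(size=(H, H)) * 0.1
W2 = rng.normal(size=(2 * d, H)) * 0.1

# d_t = softmax(A_t (W2 ReLU(W1 h_t)))
scores = A_t @ (W2 @ np.maximum(W1 @ h_t, 0.0))
d_t = softmax(scores)

# sample the next action index from Categorical(d_t)
action = rng.choice(n_actions, p=d_t)
```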
Each agent samples $N$ evidence chains (rollouts), each limited to length $T$. The $n$-th chain sampled by agent $i$ is:

$$\tau_{n}^{(i)}:=\left(A_{\tilde{n}(i, T)+1}, A_{\tilde{n}(i, T)+2}, \ldots, A_{\tilde{n}(i, T)+T}\right)$$

where the index offset is defined as:

$$\tilde{n}(i, T):=(2(n-1)+i-1)\, T$$

All rollouts are collected as:

$$\tau:=\left(\tau_{1}^{(1)}, \tau_{1}^{(2)}, \tau_{2}^{(1)}, \tau_{2}^{(2)}, \ldots, \tau_{N}^{(1)}, \tau_{N}^{(2)}\right)$$
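The offset $\tilde{n}(i,T)$ simply interleaves the two agents' rollouts in one global action sequence, $T$ steps at a time. A small bookkeeping check (plain Python):

```python
def n_tilde(n, i, T):
    """Start offset of agent i's n-th rollout (agents alternate, T steps each)."""
    return (2 * (n - 1) + i - 1) * T

T, N = 3, 2
order = []
for n in range(1, N + 1):
    for i in (1, 2):
        start = n_tilde(n, i, T)
        order.append((n, i, list(range(start + 1, start + T + 1))))

# rollouts alternate agent 1 / agent 2 and tile the step indices 1..2NT
```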
Judge
The judge is essentially a binary classifier that aggregates the evidence chains from both agents into a final confidence. Each chain is first encoded as:

$$\boldsymbol{y}_{n}^{(i)}=f\left(\left[\boldsymbol{\tau}_{n}^{(i)}, \boldsymbol{q}^{J}\right]\right)$$

where $\boldsymbol{q}^{J}=\left[\boldsymbol{r}_{p}^{J}, \boldsymbol{e}_{o}^{J}\right] \in \mathbb{R}^{2 d}$ denotes the judge's embedding of the query $q$.
The final score is predicted as:

$$t_{\tau}=\sigma\left(\boldsymbol{w}^{\top} \operatorname{ReLU}\left(\boldsymbol{W} \sum_{i=1}^{2} \sum_{n=1}^{N} \boldsymbol{y}_{n}^{(i)}\right)\right)$$
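A sketch of the judge's pooling and scoring (NumPy; the chain encoder $f$ is stubbed with random vectors here, and all dimensions are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
H, N = 16, 2                                  # hidden size, rollouts per agent
# y[i-1, n-1]: encoding f([tau_n^i, q^J]) of each chain (stubbed as random)
y = rng.normal(size=(2, N, H))

W = rng.normal(size=(H, H)) * 0.1
w = rng.normal(size=H)

# t_tau = sigma(w^T ReLU(W * sum_i sum_n y_n^i))
pooled = y.sum(axis=(0, 1))                   # sum over both agents, all rollouts
t_tau = sigmoid(w @ np.maximum(W @ pooled, 0.0))
```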
The judge's objective is the binary cross-entropy:

$$\mathcal{L}_{q}=\phi(q) \log t_{\tau}+(1-\phi(q)) \log \left(1-t_{\tau}\right)$$

where $\phi(q)$ is the truth label of $q$ (1 if the triple holds, 0 otherwise).
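As a quick numeric check of this cross-entropy term (NumPy; the objective is to be maximized, so correct confident predictions score near 0 and wrong confident ones are strongly negative):

```python
import numpy as np

def judge_loss(phi, t_tau):
    """Binary cross-entropy: phi is the truth label (0 or 1), t_tau in (0, 1)."""
    return phi * np.log(t_tau) + (1 - phi) * np.log(1 - t_tau)

good = judge_loss(1, 0.99)   # confident and correct: close to 0
bad = judge_loss(1, 0.01)    # confident and wrong: large negative value
```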
Reward
To reflect the opposing roles of the two agents, each individual evidence chain is scored as:

$$t_{n}^{(i)}=\boldsymbol{w}^{\top} \operatorname{ReLU}\left(\boldsymbol{W} f\left(\left[\boldsymbol{\tau}_{n}^{(i)}, \boldsymbol{q}^{J}\right]\right)\right)$$
The reward function is defined as:

$$R_{n}^{(i)}=\begin{cases} t_{n}^{(i)} & \text{if } i=1 \\ -t_{n}^{(i)} & \text{if } i=2 \end{cases}$$
Each agent's cumulative reward is:

$$G^{(i)}:=\sum_{n=1}^{N} R_{n}^{(i)}$$

The agents are trained with reinforcement learning to maximize the expected cumulative reward:

$$\underset{\theta^{(i)}}{\arg \max }\; \mathbb{E}_{q \sim \mathcal{KG}_{+}}\, \mathbb{E}_{\tau_{1}^{(i)}, \tau_{2}^{(i)}, \ldots, \tau_{N}^{(i)} \sim \pi_{\theta^{(i)}}}\left[G^{(i)} \mid q\right]$$
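Objectives of this form are typically optimized with REINFORCE-style policy gradients. As a toy stand-in for the agents' training (my own illustrative reduction, not the paper's setup): a single-step softmax policy over four "paths" with fixed rewards, updated with the exact expected policy gradient $\nabla_\theta \mathbb{E}[G] = p \odot (r - p^{\top} r)$:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Fixed per-path rewards standing in for R_n^{(i)}; all numbers illustrative.
rewards = np.array([0.1, -0.2, 1.0, 0.0])
theta = np.zeros(4)                  # policy logits
lr = 0.5
for _ in range(200):
    p = softmax(theta)
    # exact expected REINFORCE gradient for a softmax policy
    theta += lr * p * (rewards - p @ rewards)

p_final = softmax(theta)
best = int(np.argmax(p_final))       # probability mass concentrates on the
                                     # highest-reward path
```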
The whole model is trained in an alternating fashion: each phase trains only the agents or only the judge, with all parameters of the other module frozen.
Comments and discussion welcome!