
Estimating t-Distribution Parameters with the EM Algorithm

Notes on estimating the parameters of a t-distribution with the EM algorithm.

There are many quantities to compute, so this post collects them in one place. The mean $\vec{\mu}$ and the covariance $\mathbf{\Sigma}$ can be obtained analytically. The degrees of freedom $\nu$, however, cannot, so a grid search, a random search, or some other iterative method is needed.

  • E-step:

    • For each data point, compute the posterior distribution $q_{i}(h_{i})$, which is a gamma distribution.
  • M-step:

    • Estimate the mean parameter $\vec{\mu}$ and the covariance matrix parameter $\mathbf{\Sigma}$.

      • Use the expectation $E[ h_{i} ]$ under the posterior $q_{i}(h_{i})$.
    • Estimate the degrees of freedom $\nu$.

      • Compute the conditional expectation $\sum_{i=1}^{I} \int \hat{q}_{i}(h_{i}) \log{\left[ Pr(\vec{x}_{i}, h_{i}|\mathbf{\theta}) \right] } d h_{i}$ (vary the degrees of freedom with a grid search or similar and check that this expectation increases).

Mixture Distributions and the t-Distribution

The t-distribution can be derived using latent variables and the idea of a mixture distribution.

In a mixture of Gaussians (MoG), the latent variable follows a categorical distribution (a multinomial distribution with a single trial), and the mixture weights are the categorical probabilities.

Related link:

cartman0.hatenablog.com

Extending this idea, consider a normal distribution whose

  • mean (location) stays the same, and
  • variance is $\sigma^{2}/h$, scaled by a parameter $h$ that follows a gamma distribution.

Weighting this normal distribution by the gamma density and integrating over the whole (infinite) range of $h$ gives a mixture, and that mixture is the t-distribution.

Written as equations:


\begin{eqnarray}
Pr(h) = \text{Gam}_{h}\left[ \frac{\nu}{2}, \frac{\nu}{2} \right] \\\\
Pr(\vec{x} | h) = \text{Norm}_{\vec{x}}\left[ \vec{\mu}, \frac{\Sigma}{h} \right] \\\\
\end{eqnarray}

\begin{eqnarray}
Pr(\vec{x} )
&=& \int_{0}^{\infty} Pr(\vec{x}, h) dh \\\\
&=& \int_{0}^{\infty} Pr(\vec{x} | h)Pr(h) dh \\\\
&=& \int_{0}^{\infty} \text{Norm}_{\vec{x}}\left[ \vec{\mu}, \frac{\mathbf{\Sigma}}{h} \right] \cdot \text{Gam}_{h}\left[ \frac{\nu}{2}, \frac{\nu}{2} \right] dh \\\\
&=& \text{Stud}_{\vec{x}} \left[\vec{\mu}, \mathbf{\Sigma}, \nu \right]
\end{eqnarray}

Computing this integral (marginalization) yields exactly the pdf of the t-distribution.

Here $\nu$ is the degrees of freedom of the t-distribution.
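
As a sanity check of this marginalization, the one-dimensional case can be verified numerically: integrate the normal-times-gamma integrand over $h$ and compare the result with scipy's Student-t pdf. The following is a minimal sketch; the values mu=0, sigma2=1, nu=3 and the helper name mixture_pdf are chosen only for illustration.

import numpy as np
from scipy import stats
from scipy.integrate import quad

mu, sigma2, nu = 0.0, 1.0, 3.0

def mixture_pdf(x):
    # integrand: Norm(x | mu, sigma2/h) * Gam(h | nu/2, nu/2); rate nu/2 -> scale 2/nu
    integrand = lambda h: (stats.norm.pdf(x, loc=mu, scale=np.sqrt(sigma2 / h))
                           * stats.gamma.pdf(h, a=nu / 2, scale=2 / nu))
    val, _ = quad(integrand, 0, np.inf)
    return val

for x in [-2.0, 0.0, 1.5]:
    # both columns should agree up to the integration error
    print(mixture_pdf(x), stats.t.pdf(x, df=nu, loc=mu, scale=np.sqrt(sigma2)))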

Related link:

cartman0.hatenablog.com

The EM Algorithm for the t-Distribution

Overview of the EM Algorithm

In ordinary maximum likelihood (ML) estimation, we find the parameters that maximize the log-likelihood of the data.


\hat{\vec{\theta}} = \mathop{ \text{argmax}}_{\vec{\theta}} \left[ \sum_{i=1}^{I} \log{ \left[ Pr(\vec{x}_{i} | \vec{\theta}) \right] } \right]

When $Pr(\vec{x})$ cannot be computed easily, the EM algorithm introduces latent variables, obtains $Pr(\vec{x})$ by marginalizing the joint distribution, and maximizes the likelihood in that form.


\hat{\vec{\theta}}
= \mathop{\text{argmax}}_{\vec{\theta}} \left[ \sum_{i=1}^{I} \log{ \left[ \int Pr(\vec{x}_{i}, \vec{h}_{i} | \vec{\theta}) d\vec{h}_{i} \right] } \right]

In practice, instead of the expression above, we maximize the following lower bound (the inequality follows from Jensen's inequality).


\begin{eqnarray}
\cal{B}\left[ \left\{ q_{i}(\vec{h}_{i}) \right\}, \vec{\theta} \right]
&=& \sum_{i=1}^{I} \int q_{i}(\vec{h}_{i}) \log{ \left[ \frac{Pr(\vec{x}_{i}, \vec{h}_{i} | \vec{\theta})}{q_{i}(\vec{h}_{i})} \right] }d\vec{h}_{i} \\\\
&\leq& \sum_{i=1}^{I} \log{ \left[ \int q_{i}(\vec{h_{i}}) \frac{Pr(\vec{x}_{i}, \vec{h}_{i} | \vec{\theta})}{ q_{i}(\vec{h_{i}})  }  d\vec{h}_{i} \right] }
= \sum_{i=1}^{I} \log{ \left[ \int Pr(\vec{x}_{i}, \vec{h}_{i} | \vec{\theta}) d\vec{h}_{i} \right] }
\end{eqnarray}
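
The bound can be checked numerically for a single data point: take any valid distribution $q(h)$ and compare $\cal{B}$ with the log marginal likelihood. A minimal 1-D sketch, where the data point, the parameters, and the (deliberately non-optimal) gamma $q(h)$ are all arbitrary choices for illustration:

import numpy as np
from scipy import stats
from scipy.integrate import quad

x, mu, sigma2, nu = 0.7, 0.0, 1.0, 3.0
q = stats.gamma(a=2.0, scale=1.0)  # an arbitrary q(h), not the optimal posterior

def log_joint(h):
    # log[ Norm(x | mu, sigma2/h) * Gam(h | nu/2, nu/2) ]
    return (stats.norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2 / h))
            + stats.gamma.logpdf(h, a=nu / 2, scale=2 / nu))

# lower bound B[q, theta] for this single data point
bound, _ = quad(lambda h: q.pdf(h) * (log_joint(h) - q.logpdf(h)), 0, np.inf)
# log-likelihood log Pr(x | theta) by direct marginalization
loglik = np.log(quad(lambda h: np.exp(log_joint(h)), 0, np.inf)[0])
print(bound, loglik)  # bound should not exceed loglik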

GOAL: estimate the t-distribution parameters $\mathbf{\theta} = \left\{ \vec{\mu}, \mathbf{\Sigma}, \nu \right\}$ from the data $\vec{x}_{1 \cdots I}$.

The lower bound is maximized by iterating an E-step and an M-step.

  • E-step: with the parameters $\mathbf{\theta}$ estimated in the previous M-step held fixed, maximize the lower bound with respect to $q_{i}(h_{i})$ (i.e., compute the posterior distribution $q_{i}(h_{i})$ of the latent variable $h_{i}$).

q_{i}(h_{i}) = Pr(h_{i} | \vec{x}_{i}, \mathbf{\theta}^{[ t ] })
= \frac{Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}^{[ t ] } ) Pr(h_{i})}{ Pr(\vec{x}_{i} | \mathbf{\theta}^{ [ t ] })}
  • M-step: with the posterior $q_{i}(h_{i})$ obtained in the E-step held fixed, find the parameters $\mathbf{\theta}$ that maximize the conditional expectation (the lower bound) of the log joint density of the data and the latent variables.

\hat{\mathbf{\theta}}^{[ t+1 ] }
= \mathop{\text{argmax}}_{\mathbf{\theta}} \sum_{ i=1 }^{ I } \int \hat{ q }_{ i }( h_{ i } ) \log{\left[ Pr(\vec{x}_{i}, h_{i} | \mathbf{\theta}) \right] } d h_{i}

(Figure: schematic of the alternating E-steps and M-steps)

E-step

Let us derive the form of the posterior distribution $q_{i}(h_{i})$ computed in the E-step.

The result is the gamma distribution below. This is because, with a gamma prior and a normal likelihood, the gamma distribution is conjugate for the scale parameter acting on the precision (the inverse of the variance).


\begin{eqnarray}
q_{i}( h_{i} )
= Pr( h_{i} | \vec{x}_{i}, \mathbf{\theta}^{[ t ] })
&=& \frac{Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}^{ [ t ] })Pr( h_{i} )}{ Pr(\vec{x}_{i} | \mathbf{\theta}^{ [ t ] })} \\\\
&=& \frac{ \text{Norm}_{\vec{x}_{i}}
\left[ \vec{\mu}, \frac{\mathbf{\Sigma}}{ h_{i}} \right] \text{Gam}_{h_{i}}\left[ \frac{\nu}{2}, \frac{\nu}{2}\right]
}{Pr(\vec{x}_{i})} \\\\
&=& \text{Gam}_{h_{i}}\left[ \frac{\nu+D}{2},
\frac{(\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})}{2} + \frac{\nu}{2} \right]
\end{eqnarray}

The derivation is given below.


\begin{eqnarray}
q_{i}(h_{i})
= Pr(h_{i} | \vec{x}_{i}, \mathbf{\theta}^{ [ t ] })
&=& \frac{Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}^{ [ t ] })Pr(h_{i})}{Pr(\vec{x}_{i})} \\
&=& \frac{\text{Norm}_{\vec{x}_{i}}
\left[ \vec{\mu}, \frac{\mathbf{\Sigma}}{ h_{i}} \right] \text{Gam}_{h_{i}}\left[ \frac{\nu}{2}, \frac{\nu}{2} \right] }{Pr(\vec{x}_{i})}
\end{eqnarray}

Expanding the numerator:


\begin{eqnarray}
\text{Norm}_{\vec{x}_{i}}
\left[ \vec{\mu}, \frac{\mathbf{\Sigma}}{ h_{i}} \right] \text{Gam}_{h_{i}}\left[ \frac{\nu}{2}, \frac{\nu}{2}\right]
&=& \frac{1}{(2\pi)^{\frac{D}{2}} |\frac{\mathbf{\Sigma}}{h_{i}}|^{\frac{1}{2}} }\exp{\left[ - \frac{1}{2}(\vec{x}_{i}-\vec{\mu})^{T}h_{i}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu}) \right]}
\cdot
\frac{ (\frac{\nu}{2})^{\frac{\nu}{2}}}{\Gamma\left[ \frac{\nu}{2} \right] } \exp{\left[ -\frac{\nu}{2} h_{i} \right]} h_{i}^{\frac{\nu}{2} - 1}
\end{eqnarray}

Expanding the denominator: the marginalization in the denominator is a t-distribution.


\begin{eqnarray}
Pr(\vec{x}_{i})
&=& \int Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}^{ [ t ] }) Pr(h_{i}) dh_{i} \\\\
&=& \text{Stud}_{\vec{x}_{i}}\left[ \vec{\mu}, \mathbf{\Sigma}, \nu \right] \\\\
&=& \frac{\Gamma\left[ \frac{\nu + D}{2} \right] }{(\nu \pi)^{\frac{D}{2}} |\mathbf{\Sigma}|^{\frac{1}{2}} \Gamma\left[ \frac{\nu}{2} \right] } \left( 1 + \frac{(\vec{x}_{i} - \vec{\mu})^{T} \mathbf{\Sigma}^{-1} (\vec{x}_{i} - \vec{\mu})}{\nu}\right)^{-\frac{\nu + D}{2}}
\end{eqnarray}

The determinant of the scaled matrix appearing in the numerator can be expanded with $|a A| = a^{D} |A|$ as follows, so


|\frac{\mathbf{\Sigma}}{h_{i}}|^{\frac{1}{2}}
= (\frac{1}{h_{i}^{D}} |\mathbf{\Sigma}|)^{\frac{1}{2}}
= \frac{1}{h_{i}^{\frac{D}{2}}} |\mathbf{\Sigma}|^{\frac{1}{2}}

\begin{eqnarray}
\frac{Pr(\vec{x}_{i}|h_{i})Pr(h_{i})}{Pr(\vec{x}_{i})}
&=& \frac{ \frac{1}{(2\pi)^{\frac{D}{2}} \frac{1}{h_{i}^{\frac{D}{2}}}|\mathbf{\Sigma}|^{\frac{1}{2}} }\exp{\left[ - \frac{1}{2}(\vec{x}_{i}-\vec{\mu})^{T}h_{i}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu}) \right] }
\cdot
\frac{ (\frac{\nu}{2})^{\frac{\nu}{2}}}{\Gamma\left[ \frac{\nu}{2} \right] } \exp\left[ -\frac{\nu}{2} h_{i} \right] h_{i}^{\frac{\nu}{2} - 1}  }{ \frac{\Gamma\left[ \frac{\nu + D}{2} \right] }{(\nu \pi)^{\frac{D}{2}} |\Sigma|^{\frac{1}{2}} \Gamma\left[ \frac{\nu}{2} \right] } \left( 1 + \frac{(\vec{x}_{i} - \vec{\mu})^{T} \mathbf{\Sigma}^{-1} (\vec{x}_{i} - \vec{\mu})}{\nu}\right)^{-\frac{\nu + D}{2}}  } \\\\
&=& \frac{ \frac{1}{2^{\frac{D}{2}}  }\exp{\left[ - \frac{1}{2}(\vec{x}_{i}-\vec{\mu})^{T}h_{i}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu}) \right]}
\cdot
\frac{ (\frac{\nu}{2})^{\frac{\nu}{2}}}{1} \exp\left[ -\frac{\nu}{2} h_{i} \right] h_{i}^{\frac{\nu+D}{2} - 1}  }
{ \frac{\Gamma\left[ \frac{\nu + D}{2} \right] }{\nu^{\frac{D}{2}}  } \left( 1 + \frac{(\vec{x}_{i} - \vec{\mu})^{T} \mathbf{\Sigma}^{-1} (\vec{x}_{i} - \vec{\mu})}{\nu}\right)^{-\frac{\nu + D}{2}}  } \\\\
&=& \frac{(\frac{\nu}{2})^{\frac{\nu+D}{2}} \left( 1 + \frac{(\vec{x}_{i} - \vec{\mu})^{T} \mathbf{\Sigma}^{-1} (\vec{x}_{i} - \vec{\mu})}{\nu}\right)^{\frac{\nu + D}{2}}
\exp{
\left[ - \frac{1}{2}
\left\{
\nu + (\vec{x}_{i}-\vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})
\right\} h_{i}
\right]
}
h_{i}^{\frac{\nu+D}{2} - 1}  }
{ \Gamma\left[ \frac{\nu + D}{2} \right]   } \\\\
&=& \frac{ \left\{ \frac{1}{2} \left( \nu + (\vec{x}_{i} - \vec{\mu})^{T} \mathbf{\Sigma}^{-1} (\vec{x}_{i} - \vec{\mu})\right)\right\}^{\frac{\nu + D}{2}}
\exp{
\left[ - \frac{1}{2}
\left\{
\nu + (\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})
\right\} h_{i}
\right]
}
h_{i}^{\frac{\nu+D}{2} - 1}  }
{ \Gamma\left[ \frac{\nu + D}{2} \right]  } \\\\
&=& \text{Gamma}_{h_{i}}\left[ \frac{\nu + D}{2}, \frac{1}{2}
\left\{
\nu + (\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1}( \vec{x}_{i} - \vec{\mu} )
\right\}\right]
\end{eqnarray}
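
This result can also be confirmed numerically in the one-dimensional case: the Bayes-rule ratio Norm × Gam / Stud should coincide with the closed-form gamma density above. A small sketch, where mu, sigma2, nu, and the data point x are arbitrary values chosen for illustration (D = 1):

import numpy as np
from scipy import stats

mu, sigma2, nu, x = 0.0, 1.0, 4.0, 1.2
D = 1

h = np.linspace(0.1, 5.0, 5)
numer = (stats.norm.pdf(x, loc=mu, scale=np.sqrt(sigma2 / h))
         * stats.gamma.pdf(h, a=nu / 2, scale=2 / nu))
denom = stats.t.pdf(x, df=nu, loc=mu, scale=np.sqrt(sigma2))
posterior_bayes = numer / denom

beta = (nu + (x - mu)**2 / sigma2) / 2  # rate parameter of the posterior Gamma
posterior_closed = stats.gamma.pdf(h, a=(nu + D) / 2, scale=1 / beta)

print(np.allclose(posterior_bayes, posterior_closed))  # expect True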

M-step

With the posterior from the E-step held fixed, find the parameters $\mathbf{\theta}$ that maximize the conditional expectation of the log joint density (the lower bound).

Expanding the expression gives the following expectations.


\begin{eqnarray}
\hat{\mathbf{\theta}}^{[ t+1 ] }
&=& \mathop{\text{argmax}}_{\mathbf{\theta}} \left[ \sum_{i=1}^{I} \int \hat{q}_{i}(h_{i}) \log{\left[ Pr(\vec{x}_{i}, h_{i}|\mathbf{\theta}) \right] } d h_{i} \right] \\\\
&=& \mathop{\text{argmax}}_{\mathbf{\theta}} \left[ \sum_{i=1}^{I} \int \hat{q}_{i}(h_{i}) \left( \log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] } + \log{\left[ Pr(h_{i}) \right] }\right) dh_{i} \right] \\\\
&=& \mathop{\text{argmax}}_{\mathbf{\theta}}
\left[ \sum_{i=1}^{I} \int Pr(h_{i} | \vec{x}_{i}, \mathbf{\theta}^{[ t ] }) \left( \log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] } + \log{\left[ Pr(h_{i}) \right] }\right) dh_{i} \right] \\\\
&=& \mathop{\text{argmax}}_{\mathbf{\theta}}
\left[
\sum_{i=1}^{I}\left\{ E\left[ \log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] } \right] + E\left[ \log{\left[ Pr( h_{ i } ) \right] } \right] \right\} \right] \\
\end{eqnarray}

So we only need to work out the two expectations that appear here.

Expectation of the log normal density under the gamma posterior, $E[\log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] }]$


\begin{eqnarray}
\log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] }
&=& \log{\left[ \text{Norm}_{\vec{x}_{i}}(\vec{\mu}, \mathbf{\Sigma}/h_{i}) \right] }\\
&=& - \frac{D}{2} \log{2\pi} - \frac{1}{2}\log{|\mathbf{\Sigma}/h_{i}|} - \frac{1}{2} (\vec{x}- \vec{\mu})^{T}h_{i}\mathbf{\Sigma}^{-1}(\vec{x}- \vec{\mu}) \\\\
&=& - \frac{D}{2} \log{2\pi} - \frac{1}{2}\log{(|\mathbf{\Sigma}|\frac{1}{h_{i}^{D}})} - \frac{1}{2} (\vec{x}- \vec{\mu})^{T}h_{i}\mathbf{\Sigma}^{-1}(\vec{x}- \vec{\mu}) \\\\
&=& -\frac{D}{2}\log{2\pi} - \frac{1}{2}\log{|\mathbf{\Sigma}|} + \frac{D}{2}\log{h_{i}} - \frac{1}{2} (\vec{x}- \vec{\mu})^{T}h_{i}\mathbf{\Sigma}^{-1}(\vec{x}- \vec{\mu})
\end{eqnarray}

Therefore, taking the expectation

($\vec{x}$ is the observed data and is treated as a constant here),


\begin{eqnarray}
E[ \log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] } ]
&=& - \frac{D}{2}\log{2\pi} - \frac{1}{2} \log{|\mathbf{\Sigma}|} + \frac{D}{2} E[ \log{h_{i}} ] - \frac{1}{2} (\vec{x}- \vec{\mu})^{T} \mathbf{\Sigma}^{-1}(\vec{x} - \vec{\mu})E[ h_{i} ] \\\\
&=& \frac{D \cdot E[ \log{h_{i}} ] - D \log{2\pi} - \log{|\mathbf{\Sigma}|} - ( \vec{x} - \vec{\mu} )^{T} \mathbf{\Sigma}^{-1}( \vec{x}- \vec{\mu} ) E[ h_{i} ] }{2}
\end{eqnarray}

Expectation of the log of the gamma prior density under the gamma posterior, $E[\log{Pr(h_{i})} ]$


\begin{eqnarray}
\log{Pr(h_{i})}
&=& \log{ \text{Gam}_{h_{i}}\left[ \frac{\nu}{2}, \frac{\nu}{2} \right]} \\\\
&=& \log{ \left[ \frac{(\frac{\nu}{2})^{\frac{\nu}{2}}}{\Gamma\left[ \frac{\nu}{2} \right] }
\exp{ \left[ -\frac{\nu}{2} h_{i} \right] } h_{i}^{\frac{\nu}{2} - 1} \right] } \\\\
&=& \frac{\nu}{2}\log{[ \frac{\nu}{2} ] }
- \log{\Gamma\left[ \frac{\nu}{2} \right]}
- \frac{\nu}{2} h_{i}
+ (\frac{\nu}{2} - 1)\log{h_{i}}
\end{eqnarray}

Therefore, the expectation is


E[ \log{Pr(h_{i})} ]
=
\frac{\nu}{2}\log{[ \frac{\nu}{2} ] }
- \log{\Gamma\left[ \frac{\nu}{2} \right] }
- \frac{\nu}{2} E[ h_{ i } ]
+ (\frac{\nu}{2} - 1) E[ \log{h_{i}} ]

To evaluate this conditional expectation, we need $E[ h_{i} ]$ and $E[ \log{h_{i}} ]$.

Deriving the gamma-distribution expectation $E[ h_{i} ]$

First, derive the mean of a general gamma distribution $\text{Gamma}_{x}(\alpha, \beta)$.

Reference: https://to-kei.net/distribution/gamma-distribution/g-parameter-derivation/


\begin{eqnarray}
E[ x ]
&=& \int_{0}^{\infty} x \text{Gam}_{x}\left[ \alpha, \beta \right] dx \\\\
&=& \int_{0}^{\infty} x \frac{\beta^{\alpha}}{\Gamma\left[ \alpha \right] } \exp\left[ -\beta x \right] x^{\alpha - 1} dx \\\\
&=& \frac{1}{\Gamma\left[ \alpha \right] } \int_{0}^{\infty} \beta^{\alpha}x^{\alpha} \exp\left[ -\beta x \right] dx \\\\
&=& \frac{1}{\Gamma\left[ \alpha \right] } \int_{0}^{\infty} (\beta x)^{\alpha} \exp\left[ -\beta x \right] dx \\\\
&=& \frac{\beta^{-1}}{\Gamma\left[ \alpha \right] } \int_{0}^{\infty} (\beta x)^{\alpha} \exp\left[ -\beta x \right] \beta dx
\end{eqnarray}

Change variables with $\beta x = t$, so $dt = \beta dx$:


E[ x ] = \frac{\beta^{-1}}{\Gamma\left[ \alpha \right] } \int_{0}^{\infty} t^{\alpha} \exp\left[ -t \right] dt

This integral is a gamma integral (it equals $\Gamma[\alpha + 1]$).


\begin{eqnarray}
E[ x ]
&=& \frac{\beta^{-1}}{\Gamma\left[ \alpha \right] } \Gamma[ \alpha + 1 ] \\
&=& \frac{\beta^{-1}}{\Gamma\left[ \alpha \right] } \alpha \Gamma[ \alpha ] \\
&=& \frac{\alpha}{\beta}
\end{eqnarray}
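
For reference, this matches scipy's gamma mean once the rate $\beta$ is converted to scipy's scale parameter (the values alpha=3.5 and beta=2.0 are arbitrary examples):

from scipy import stats

alpha, beta = 3.5, 2.0
# scipy parameterizes the gamma by shape and scale, so scale = 1/beta
print(stats.gamma(a=alpha, scale=1 / beta).mean(), alpha / beta)  # both 1.75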

Returning to the t-distribution parameter estimation, what is $E[ h_{i} ]$?

The posterior obtained in the E-step is


\text{Gamma}_{ h_{ i } }\left[ \frac{\nu + D}{2}, \frac{1}{2}
\left\{ \nu + (\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})
\right\}\right]

so the expectation under this posterior is


E[ h_{i} ] =
\frac{\nu + D}{\nu + (\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})}

Deriving the expectation of the log of a gamma random variable, $E[ \log{ h_{i} } ]$

Reference: https://math.stackexchange.com/questions/138252/expected-value-of-ln-x-if-x-is-gammaa-b-distributed

First, compute the expectation of the log-transformed variable for a general gamma distribution $\text{Gamma}_{x}(\alpha, \beta)$.


\begin{eqnarray}
E[ \log{x} ]
&=& \int_{0}^{\infty} \log{x} \text{Gam}_{x}\left[ \alpha, \beta \right] dx \\\\
&=& \int_{0}^{\infty} \log{x} \frac{\beta^{\alpha}}{\Gamma\left[ \alpha \right]}
\exp{\left[ -\beta x \right]} x^{\alpha - 1} dx
\end{eqnarray}

Substituting $\beta x = y$,


\begin{eqnarray}
E[ \log{x} ]
&=& \frac{1}{\Gamma\left[ \alpha \right]} \int_{0}^{\infty} \log{x} {(\beta x)}^{\alpha - 1} \exp\left[ -\beta x \right] \beta dx \\\\
&=& \frac{1}{\Gamma\left[ \alpha \right]} \int_{0}^{\infty} \log{(\frac{y}{\beta})} y^{\alpha - 1} \exp{ \left[ -y \right]} dy \\\\
&=& \frac{1}{\Gamma\left[ \alpha \right]} \int_{0}^{\infty} (\log{y} - \log{(\beta)}) y^{\alpha - 1} \exp{\left[ -y \right]} dy \\\\
&=& -\log{\beta} + \frac{1}{\Gamma\left[ \alpha \right] } \int_{0}^{\infty} \log{(y)} y^{\alpha - 1} \exp\left[ -y \right] dy
\end{eqnarray}

Now consider the derivative of the gamma function.

$$ \Gamma\left[ \alpha \right] = \int_{0}^{\infty} x^{\alpha-1} e^{-x} dx $$


\begin{eqnarray}
\Gamma[ \alpha ]'
= \frac{d \Gamma\left[ \alpha \right] }{d \alpha}
&=& \int_{0}^{\infty} \frac{d}{d \alpha} \exp{(\log{x^{\alpha-1}} )} e^{-x} dx \\\\
&=& \int_{0}^{\infty} \frac{d}{d \alpha} \exp{\{ (\alpha-1)(\log{x} ) \}} e^{-x } dx \\\\
&=& \int_{0}^{\infty} \exp{\{ (\alpha-1)(\log{x} ) \}} \log{(x)}  e^{-x } dx \\\\
&=& \int_{0}^{\infty} \log{(x)} x^{\alpha-1} e^{-x } dx
\end{eqnarray}

\begin{eqnarray}
E[ \log{x} ]
&=& \frac{\Gamma\left[ \alpha \right]' }{\Gamma\left[ \alpha \right] } - \log{\beta}
\end{eqnarray}

Since $\frac{d \log{ f( x ) } }{ d x } = \frac{ f'(x) }{ f(x) }$,


\begin{eqnarray}
E[ \log{x} ]
&=& \frac{d \log{ \Gamma[ \alpha ] }}{d \alpha} - \log{\beta}
\end{eqnarray}

Writing $\frac{d \log{ \Gamma[ \alpha ] }}{d \alpha}$ as the digamma function $\psi$, the expectation of the log becomes


\begin{eqnarray}
E[ \log{x} ]
= \frac{d \log{ \Gamma[\alpha ] }}{d \alpha} - \log{\beta}
= \psi(\alpha) - \log{\beta}
\end{eqnarray}
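
This identity can be confirmed numerically by integrating $\log{x}$ against the gamma density (alpha=3.0 and beta=0.5 are arbitrary example values):

import numpy as np
from scipy import stats, special
from scipy.integrate import quad

alpha, beta = 3.0, 0.5
# numerical E[log x] under Gamma(alpha, rate beta); scipy's scale = 1/beta
numeric, _ = quad(lambda x: np.log(x) * stats.gamma.pdf(x, a=alpha, scale=1 / beta),
                  0, np.inf)
closed = special.digamma(alpha) - np.log(beta)
print(numeric, closed)  # should agree to several decimal places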

Returning to the t-distribution parameter estimation: since the posterior gamma distribution is


\text{Gamma}_{h_{i}}\left[ \frac{\nu + D}{2}, \frac{1}{2}
\left\{ \nu + (\vec{x}_{i}-\vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})
  \right\}\right]

\begin{eqnarray}
E[ \log{h_{i}} ]
&=& \psi \left[ \frac{\nu +D}{2} \right] - \log{\left[ \frac{\nu + (\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})}{2} \right] }
\end{eqnarray}

Estimating each parameter $\mathbf{\theta}$

Each parameter is estimated by differentiating the M-step conditional expectation with respect to that t-distribution parameter and setting the derivative to zero. The degrees of freedom $\nu$, however, cannot be obtained analytically.

The resulting update equations are:


\begin{eqnarray}
\vec{\mu}^{[ t+1 ] }
&=& \frac{\sum_{i=1}^{ I } E[ h_{i} ] \vec{x}_{i}}{\sum_{i=1}^{I} E[ h_{i} ]} \\\\
\mathbf{\Sigma}^{[ t+1 ] }
&=& \frac{\sum_{i=1}^{ I } E[ h_{i} ] (\vec{x}_{i} - \vec{\mu}^{[ t+1 ] }) (\vec{x}_{i} - \vec{\mu}^{ [ t+1 ] } )^{T}}{\sum_{i=1}^{ I } E[ h_{i} ] }
\end{eqnarray}

Update equation for the mean parameter $\vec{\mu}$

Differentiating the M-step conditional expectation with respect to the parameter $\vec{\mu}$,


\frac{d E}{d \vec{\mu}}
= \frac{d}{d \vec{\mu} } \sum_{i=1}^{I} \frac{D \cdot E[ \log{h_{i}} ] - D\log{2 \pi} - \log{|\mathbf{\Sigma}|} - (\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})E[ h_{i} ] }{2}
= 0 \\\\
\sum_{i=1}^{I} - \frac{1}{2} (- 2 \mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})E[ h_{i} ] ) = 0\\\\
\sum_{i=1}^{I} \mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})E[ h_{i} ] = 0 \\\\
\sum_{i=1}^{I} (\vec{x}_{i} - \vec{\mu})E[ h_{i} ] = 0 \\\\
\sum_{i=1}^{I} \vec{x}_{i} E[ h_{i} ] - \vec{\mu}\sum_{i=1}^{I} E[ h_{i} ] = 0\\
\vec{\mu}\sum_{i=1}^{I} E[ h_{i} ] = \sum_{i=1}^{I} \vec{x}_{i} E[ h_{i} ] \\\\
\vec{\mu} = \frac{\sum_{i=1}^{I} \vec{x}_{i} E[ h_{i} ] }{\sum_{i=1}^{I} E[ h_{i} ] }

Update equation for the covariance matrix parameter $\mathbf{\Sigma}$

Differentiating the M-step conditional expectation with respect to the parameter $\mathbf{\Sigma}$: from the stationarity condition (only the normal term depends on $\mathbf{\Sigma}$) and the matrix-derivative identities

 \frac{d \sum_{i=1}^{I} E\left[ \log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] } \right] }{d \mathbf{\Sigma}} = 0,
\frac{d \log{|\mathbf{A}|} }{d \mathbf{A}} = (\mathbf{A}^{-1})^{T},
\frac{d \vec{x}^{T} \mathbf{A}^{-1} \vec{x}}{d \mathbf{A}} = - (\mathbf{A}^{-1})^{T}\vec{x}\vec{x}^{T}(\mathbf{A}^{-1})^{T}

we obtain


\begin{eqnarray}
\frac{d \sum_{i=1}^{I} E\left[ \log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] } \right] }{d \mathbf{\Sigma}}
=  \frac{d }{d \mathbf{\Sigma}} \sum_{i=1}^{I} \left(-\frac{1}{2}\log{|\mathbf{\Sigma}|} -\frac{1}{2}(\vec{x}_{i} - \vec{\mu})^{T} \mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})E[ h_{i} ] \right)
&=& 0 \\\\
\sum_{i=1}^{I} \left\{ -\frac{1}{2}(\mathbf{\Sigma}^{-1})^{T} - \frac{1}{2}\left(- \mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})(\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1} E[ h_{i} ] \right) \right\}
&=& 0
\end{eqnarray}

Moving the term that contains only the inverse matrix to the left-hand side:


\begin{eqnarray}
\sum_{i=1}^{I} \mathbf{\Sigma}^{-1}
&=& \sum_{i=1}^{I} \mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})(\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1} E[ h_{i} ] \\\\
I \mathbf{\Sigma}^{-1}
&=& \sum_{i=1}^{I} \mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})(\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1} E[ h_{i} ] \\\\
I \mathbf{\Sigma}
&=& \sum_{i=1}^{I} (\vec{x}_{i} - \vec{\mu})(\vec{x}_{i} - \vec{\mu})^{T} E[ h_{i} ] \\\\
\mathbf{\Sigma}
&=& \frac{\sum_{i=1}^{I} (\vec{x}_{i} - \vec{\mu})(\vec{x}_{i} - \vec{\mu})^{T} E[ h_{i} ] }{I} \\\\
\end{eqnarray}
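
The two closed-form updates can be written directly with numpy for the multivariate case. The sketch below is illustrative and not the code used later in this post; it assumes data_x has shape (I, D) and E_h holds the $E[ h_{i} ]$ values from the E-step.

import numpy as np

def m_step_mu_sigma(data_x, E_h):
    # weighted mean: mu = sum_i E[h_i] x_i / sum_i E[h_i]
    mu = (E_h[:, None] * data_x).sum(axis=0) / E_h.sum()
    # weighted scatter divided by the number of data points I (not by the weight sum)
    diff = data_x - mu
    Sigma = (E_h[:, None, None] * np.einsum('ij,ik->ijk', diff, diff)).sum(axis=0) / len(data_x)
    return mu, Sigma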

Code for estimating t-distribution parameters from 1-D normally distributed data

Summarizing the equations for each step:

E-Step:


\begin{eqnarray}
q_{i}(h_{i})
= Pr(h_{i} | \vec{x}_{i}, \mathbf{\theta}^{[ t ]})
&=& \frac{Pr( \vec{x}_{i} | h_{i}, \mathbf{\theta}^{[ t ] })Pr(h_{i})}{Pr(\vec{x}_{i} | \mathbf{\theta}^{[ t ]})} \\\\
&=& \frac{ \text{Norm}_{x_{i}}
\left[ \mu, \frac{\sigma^{2}}{ h_{i}} \right] \text{Gam}_{h_{i}}\left[ \frac{\nu}{2}, \frac{\nu}{2} \right]
}{Pr(\vec{x}_{i})} \\\\
&=& \text{Gam}_{h_{i}}\left[ \frac{\nu+1}{2},
\frac{(x_{i} - \mu)^{2}(\sigma^{2})^{-1}}{2} + \frac{\nu}{2} \right]  \\
\end{eqnarray}

M-step:


\mathop{\text{argmax}}_{\mathbf{\theta}}
\left[
\sum_{i=1}^{I}\left\{ E\left[ \log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] } \right] + E\left[ \log{\left[ Pr(h_{i}) \right] } \right] \right\} \right]

\mu = \frac{\sum_{i=1}^{I} x_{i} E[ h_{i} ] }{\sum_{i=1}^{I} E[ h_{i} ] } \\\\
\sigma^{2}
= \frac{\sum_{i=1}^{I} (x_{i} - \mu)^{2} E[ h_{i} ] }{I}

\begin{eqnarray}
E\left[ \log{\left[ Pr(\vec{x}_{i} | h_{i}, \mathbf{\theta}) \right] } \right]
&=&
\frac{D \cdot E[ \log{h_{i}} ] - D\log{2 \pi} - \log{|\mathbf{\Sigma}|} - (\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})E[ h_{i} ] }{2} \\\\
&=&
\frac{1 \cdot E[ \log{h_{i}} ] - 1 \cdot \log{2 \pi} - \log{\sigma^{2}} - (x_{i} - \mu)^{2}(\sigma^{2})^{-1}E[ h_{i} ] }{2}
\end{eqnarray}

\begin{eqnarray}
E\left[ \log{\left[ Pr(h_{i}) \right] } \right]
&=&
\frac{\nu}{2}\log{\left[ \frac{\nu}{2} \right] } - \log{\Gamma\left[ \frac{\nu}{2}\right] } + \left( \frac{\nu}{2} - 1\right) E[ \log{h_{i}} ] - \frac{\nu}{2}E[ h_{i} ]
\end{eqnarray}

\begin{eqnarray}
E[ \log{h_{i}} ]
&=& \psi \left[ \frac{\nu +D}{2} \right] - \log{\left[ \frac{\nu + (\vec{x}_{i} - \vec{\mu})^{T}\mathbf{\Sigma}^{-1}(\vec{x}_{i} - \vec{\mu})}{2} \right] } \\\\
&=& \psi \left[ \frac{\nu + 1}{2} \right] - \log{\left[ \frac{\nu + (x_{i} - \mu)^{2}(\sigma^{2})^{-1}}{2} \right] }
\end{eqnarray}
  • Take the true distribution to be the standard normal $N(0, 1)$.
  • Sample 100 observations from it.
  • Starting from the initial distribution $t(-2, 2, \nu = 1)$, estimate the parameters with the EM algorithm.
  • For the degrees of freedom $\nu$, vary $\nu$ over a grid and pick the value that maximizes the conditional expectation.
import numpy as np
from scipy import stats
from scipy import special
import matplotlib.pyplot as plt

%matplotlib inline

def EStep(data_x, mu, sigma2, nu):
    '''
    Return the posterior Gamma distribution q_i(h_i) for each data point.
    '''
    q_h = []
    alpha = (nu + 1) / 2
    for i in range(len(data_x)):
        beta = ((data_x[i] - mu)**2 / sigma2 + nu) * .5
        q_h.append(stats.gamma(a=alpha, scale=1 / beta))  # scipy's scale = 1/rate
    return q_h


def calc_cond_expectation(data_x, expectations_posterior, mu, sigma2, nu):
    '''
    Conditional expectation (lower bound) for the current parameters, using
    E[h_i] = expectations_posterior
    E[log h_i] = digamma[(nu + 1)/2] - log[(nu + (x_i - mu)^2 / sigma2) / 2]
    '''
    E_hi = expectations_posterior
    E_log_hi = special.digamma((nu + 1) / 2) - \
        np.log((nu + ((data_x - mu)**2) / sigma2) * .5)

    # E[ log Norm(x_i | mu, sigma2/h_i) ] under the posterior
    E_posterior_logNormpdf = (E_log_hi - np.log(2 * np.pi) -
                              np.log(sigma2) - (data_x - mu)**2 / sigma2 * E_hi) / 2
    # E[ log Gam(h_i | nu/2, nu/2) ] under the posterior
    E_posterior_priorGamma = nu * .5 * np.log(nu * .5) - np.log(
        special.gamma(nu * .5)) + (.5 * nu - 1) * E_log_hi - nu * .5 * E_hi

    return np.sum(E_posterior_logNormpdf + E_posterior_priorGamma)


def MStep(data_x, q_h, nu):
    '''
    Return the conditional expectation and the updated mu and sigma2 (nu stays fixed).
    '''
    I = len(data_x)
    # E[h_i] under each posterior Gamma
    expectations_posterior = np.array([dist.mean() for dist in q_h])

    mu = np.sum(data_x * expectations_posterior) / np.sum(expectations_posterior)
    sigma2 = np.sum((data_x - mu)**2 * expectations_posterior) / I

    cond_expectation = calc_cond_expectation(
        data_x, expectations_posterior, mu, sigma2, nu)
    return cond_expectation, mu, sigma2, nu


n = 100
EM_iterations = 15
nus = np.arange(1, n + 5, 5)  # grid for the degrees of freedom

truth_dist = stats.norm(loc=0, scale=np.sqrt(1))
data_x = truth_dist.rvs(size=n)
# initial parameters
mu, sigma2 = -2., 2.

# EM
max_para = {"cond_expectation": -1e20, "mu": mu, "sigma2": sigma2}

for i in range(EM_iterations):
    mu, sigma2 = max_para["mu"], max_para["sigma2"]

    for nu in nus:
        q_h = EStep(data_x, mu, sigma2, nu)
        c_e, mu, sigma2, nu = MStep(data_x, q_h, nu)

        if max_para["cond_expectation"] < c_e:
            max_para = {
                "cond_expectation": c_e,
                "mu": mu,
                "sigma2": sigma2,
                "nu": nu
            }
            print("i:", i, end=" ")
            print("max:", max_para)


plt.hist(data_x, bins=20, density=True)

x = np.linspace(-7, 7, 100)
t_dist = stats.t(loc=max_para["mu"], scale=np.sqrt(max_para["sigma2"]), df=max_para["nu"])
plt.plot(x, t_dist.pdf(x))
plt.tight_layout()
plt.show()
i: 0 max: {'cond_expectation': -262.2832259124987, 'mu': -0.534967953023884, 'sigma2': 0.821448924989282, 'nu': 1}
i: 0 max: {'cond_expectation': -201.08196951409755, 'mu': -0.02662036249833737, 'sigma2': 0.8928237170918831, 'nu': 6}
i: 0 max: {'cond_expectation': -189.87233960822854, 'mu': 0.11100223434262527, 'sigma2': 1.0040720996170245, 'nu': 11}
i: 0 max: {'cond_expectation': -178.58292159707602, 'mu': 0.1302403218320351, 'sigma2': 1.0637091848820932, 'nu': 16}
i: 0 max: {'cond_expectation': -168.18094371837404, 'mu': 0.13009946345391527, 'sigma2': 1.0943402299966092, 'nu': 21}
i: 0 max: {'cond_expectation': -159.1255565412171, 'mu': 0.1286775435488971, 'sigma2': 1.1123900510434954, 'nu': 26}
i: 0 max: {'cond_expectation': -151.29765752987493, 'mu': 0.12763401158513304, 'sigma2': 1.1244249220900722, 'nu': 31}
i: 0 max: {'cond_expectation': -144.46941473104542, 'mu': 0.12689199040076732, 'sigma2': 1.1331130219771408, 'nu': 36}
i: 0 max: {'cond_expectation': -138.435638740986, 'mu': 0.12633866659507098, 'sigma2': 1.1397130086509437, 'nu': 41}
i: 0 max: {'cond_expectation': -133.03832544436656, 'mu': 0.12590917369284998, 'sigma2': 1.1449086549309175, 'nu': 46}
i: 0 max: {'cond_expectation': -128.15924392689732, 'mu': 0.12556560605155995, 'sigma2': 1.1491098777985396, 'nu': 51}
i: 0 max: {'cond_expectation': -123.70919358531602, 'mu': 0.12528427748478907, 'sigma2': 1.152579494972972, 'nu': 56}
i: 0 max: {'cond_expectation': -119.61970292853879, 'mu': 0.12504955453819885, 'sigma2': 1.1554944861073908, 'nu': 61}
i: 0 max: {'cond_expectation': -115.83724776098929, 'mu': 0.1248506723402878, 'sigma2': 1.15797866234179, 'nu': 66}
i: 0 max: {'cond_expectation': -112.31927348944568, 'mu': 0.12467996330386541, 'sigma2': 1.1601213558893626, 'nu': 71}
i: 0 max: {'cond_expectation': -109.03142710599255, 'mu': 0.12453181180519048, 'sigma2': 1.1619886979279859, 'nu': 76}
i: 0 max: {'cond_expectation': -105.94559681225746, 'mu': 0.12440200768289753, 'sigma2': 1.16363072102759, 'nu': 81}
i: 0 max: {'cond_expectation': -103.03849772306813, 'mu': 0.12428733074330173, 'sigma2': 1.1650859930446327, 'nu': 86}
i: 0 max: {'cond_expectation': -100.2906332539778, 'mu': 0.12418527498002875, 'sigma2': 1.166384732376521, 'nu': 91}
i: 0 max: {'cond_expectation': -97.68551969630089, 'mu': 0.12409386041163888, 'sigma2': 1.1675509569732803, 'nu': 96}
i: 0 max: {'cond_expectation': -95.20909843640226, 'mu': 0.12401150159472255, 'sigma2': 1.1686040005059892, 'nu': 101}

The histogram is the observed data, and the solid line is the t-distribution with the estimated parameters.

(Figure: the t-distribution estimated with the EM algorithm)