Epstein Files Full PDF

CLICK HERE
Technopedia Center
PMB University Brochure
Faculty of Engineering and Computer Science
S1 Informatics S1 Information Systems S1 Information Technology S1 Computer Engineering S1 Electrical Engineering S1 Civil Engineering

faculty of Economics and Business
S1 Management S1 Accountancy

Faculty of Letters and Educational Sciences
S1 English literature S1 English language education S1 Mathematics education S1 Sports Education
teknopedia

  • Registerasi
  • Brosur UTI
  • Kip Scholarship Information
  • Performance
Flag Counter
  1. World Encyclopedia
  2. Variation of information - Wikipedia
Variation of information - Wikipedia
From Wikipedia, the free encyclopedia

Measure of distance between two clusterings related to mutual information

In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements). It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information. Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.[1][2][3]

Information diagram illustrating the relation between information entropies, mutual information and variation of information.

Definition

[edit]

Suppose we have two partitions X {\displaystyle X} {\displaystyle X} and Y {\displaystyle Y} {\displaystyle Y} of a set A {\displaystyle A} {\displaystyle A}, namely X = { X 1 , X 2 , … , X k } {\displaystyle X=\{X_{1},X_{2},\ldots ,X_{k}\}} {\displaystyle X=\{X_{1},X_{2},\ldots ,X_{k}\}} and Y = { Y 1 , Y 2 , … , Y l } {\displaystyle Y=\{Y_{1},Y_{2},\ldots ,Y_{l}\}} {\displaystyle Y=\{Y_{1},Y_{2},\ldots ,Y_{l}\}}.

Let:

n = ∑ i | X i | = ∑ j | Y j | = | A | {\displaystyle n=\sum _{i}|X_{i}|=\sum _{j}|Y_{j}|=|A|} {\displaystyle n=\sum _{i}|X_{i}|=\sum _{j}|Y_{j}|=|A|}
p i = | X i | / n {\displaystyle p_{i}=|X_{i}|/n} {\displaystyle p_{i}=|X_{i}|/n} and q j = | Y j | / n {\displaystyle q_{j}=|Y_{j}|/n} {\displaystyle q_{j}=|Y_{j}|/n}
r i j = | X i ∩ Y j | / n {\displaystyle r_{ij}=|X_{i}\cap Y_{j}|/n} {\displaystyle r_{ij}=|X_{i}\cap Y_{j}|/n}

Then the variation of information between the two partitions is:

V I ( X ; Y ) = − ∑ i , j r i j [ log ⁡ ( r i j / p i ) + log ⁡ ( r i j / q j ) ] {\displaystyle \mathrm {VI} (X;Y)=-\sum _{i,j}r_{ij}\left[\log(r_{ij}/p_{i})+\log(r_{ij}/q_{j})\right]} {\displaystyle \mathrm {VI} (X;Y)=-\sum _{i,j}r_{ij}\left[\log(r_{ij}/p_{i})+\log(r_{ij}/q_{j})\right]}.

This is equivalent to the shared information distance between the random variables i and j with respect to the uniform probability measure on A {\displaystyle A} {\displaystyle A} defined by μ ( B ) := | B | / n {\displaystyle \mu (B):=|B|/n} {\displaystyle \mu (B):=|B|/n} for B ⊆ A {\displaystyle B\subseteq A} {\displaystyle B\subseteq A}.

Explicit information content

[edit]

We can rewrite this definition in terms that explicitly highlight the information content of this metric.

The set of all partitions of a set form a compact lattice where the partial order induces two operations, the meet ∧ {\displaystyle \wedge } {\displaystyle \wedge } and the join ∨ {\displaystyle \vee } {\displaystyle \vee }, where the maximum 1 ¯ {\displaystyle {\overline {\mathrm {1} }}} {\displaystyle {\overline {\mathrm {1} }}} is the partition with only one block, i.e., all elements grouped together, and the minimum is 0 ¯ {\displaystyle {\overline {\mathrm {0} }}} {\displaystyle {\overline {\mathrm {0} }}}, the partition consisting of all elements as singletons. The meet of two partitions X {\displaystyle X} {\displaystyle X} and Y {\displaystyle Y} {\displaystyle Y} is easy to understand as that partition formed by all pair intersections of one block of, X i {\displaystyle X_{i}} {\displaystyle X_{i}}, of X {\displaystyle X} {\displaystyle X} and one, Y i {\displaystyle Y_{i}} {\displaystyle Y_{i}}, of Y {\displaystyle Y} {\displaystyle Y}. It then follows that X ∧ Y ⊆ X {\displaystyle X\wedge Y\subseteq X} {\displaystyle X\wedge Y\subseteq X} and X ∧ Y ⊆ Y {\displaystyle X\wedge Y\subseteq Y} {\displaystyle X\wedge Y\subseteq Y}.

Let's define the entropy of a partition X {\displaystyle X} {\displaystyle X} as

H ( X ) = − ∑ i p i log ⁡ p i {\displaystyle H\left(X\right)\,=\,-\sum _{i}\,p_{i}\log p_{i}} {\displaystyle H\left(X\right)\,=\,-\sum _{i}\,p_{i}\log p_{i}},

where p i = | X i | / n {\displaystyle p_{i}=|X_{i}|/n} {\displaystyle p_{i}=|X_{i}|/n}. Clearly, H ( 1 ¯ ) = 0 {\displaystyle H({\overline {\mathrm {1} }})=0} {\displaystyle H({\overline {\mathrm {1} }})=0} and H ( 0 ¯ ) = log n {\displaystyle H({\overline {\mathrm {0} }})=\log \,n} {\displaystyle H({\overline {\mathrm {0} }})=\log \,n}. The entropy of a partition is a monotonous function on the lattice of partitions in the sense that X ⊆ Y ⇒ H ( X ) ≥ H ( Y ) {\displaystyle X\subseteq Y\Rightarrow H(X)\geq H(Y)} {\displaystyle X\subseteq Y\Rightarrow H(X)\geq H(Y)}.

Then the VI distance between X {\displaystyle X} {\displaystyle X} and Y {\displaystyle Y} {\displaystyle Y} is given by

V I ( X , Y ) = 2 H ( X ∧ Y ) − H ( X ) − H ( Y ) {\displaystyle \mathrm {VI} (X,Y)\,=\,2H(X\wedge Y)\,-\,H(X)\,-\,H(Y)} {\displaystyle \mathrm {VI} (X,Y)\,=\,2H(X\wedge Y)\,-\,H(X)\,-\,H(Y)}.

The difference d ( X , Y ) ≡ | H ( X ) − H ( Y ) | {\displaystyle d(X,Y)\equiv |H\left(X\right)-H\left(Y\right)|} {\displaystyle d(X,Y)\equiv |H\left(X\right)-H\left(Y\right)|} is a pseudo-metric as d ( X , Y ) = 0 {\displaystyle d(X,Y)=0} {\displaystyle d(X,Y)=0} doesn't necessarily imply that X = Y {\displaystyle X=Y} {\displaystyle X=Y}. From the definition of 1 ¯ {\displaystyle {\overline {\mathrm {1} }}} {\displaystyle {\overline {\mathrm {1} }}}, it is V I ( X , 1 ) = H ( X ) {\displaystyle \mathrm {VI} (X,\mathrm {1} )\,=\,H\left(X\right)} {\displaystyle \mathrm {VI} (X,\mathrm {1} )\,=\,H\left(X\right)}.

If in the Hasse diagram we draw an edge from every partition to the maximum 1 ¯ {\displaystyle {\overline {\mathrm {1} }}} {\displaystyle {\overline {\mathrm {1} }}} and assign it a weight equal to the VI distance between the given partition and 1 ¯ {\displaystyle {\overline {\mathrm {1} }}} {\displaystyle {\overline {\mathrm {1} }}}, we can interpret the VI distance as basically an average of differences of edge weights to the maximum

V I ( X , Y ) = | V I ( X , 1 ¯ ) − V I ( X ∧ Y , 1 ¯ ) | + | V I ( Y , 1 ¯ ) − V I ( X ∧ Y , 1 ¯ ) | = d ( X , X ∧ Y ) + d ( Y , X ∧ Y ) {\displaystyle \mathrm {VI} (X,Y)\,=\,|\mathrm {VI} (X,{\overline {\mathrm {1} }})\,-\,\mathrm {VI} (X\wedge Y,{\overline {\mathrm {1} }})|\,+\,|\mathrm {VI} (Y,{\overline {\mathrm {1} }})\,-\,\mathrm {VI} (X\wedge Y,{\overline {\mathrm {1} }})|\,=\,d(X,X\wedge Y)\,+\,d(Y,X\wedge Y)} {\displaystyle \mathrm {VI} (X,Y)\,=\,|\mathrm {VI} (X,{\overline {\mathrm {1} }})\,-\,\mathrm {VI} (X\wedge Y,{\overline {\mathrm {1} }})|\,+\,|\mathrm {VI} (Y,{\overline {\mathrm {1} }})\,-\,\mathrm {VI} (X\wedge Y,{\overline {\mathrm {1} }})|\,=\,d(X,X\wedge Y)\,+\,d(Y,X\wedge Y)}.

For H ( X ) {\displaystyle H(X)} {\displaystyle H(X)} as defined above, it holds that the joint information of two partitions coincides with the entropy of the meet

H ( X , Y ) = H ( X ∧ Y ) {\displaystyle H(X,Y)\,=\,H(X\wedge Y)} {\displaystyle H(X,Y)\,=\,H(X\wedge Y)}

and we also have that d ( X , X ∧ Y ) = H ( X ∧ Y | X ) {\displaystyle d(X,X\wedge Y)\,=\,H(X\wedge Y|X)} {\displaystyle d(X,X\wedge Y)\,=\,H(X\wedge Y|X)} coincides with the conditional entropy of the meet (intersection) X ∧ Y {\displaystyle X\wedge Y} {\displaystyle X\wedge Y} relative to X {\displaystyle X} {\displaystyle X}.

Identities

[edit]

The variation of information satisfies

V I ( X ; Y ) = H ( X ) + H ( Y ) − 2 I ( X , Y ) {\displaystyle \mathrm {VI} (X;Y)=H(X)+H(Y)-2I(X,Y)} {\displaystyle \mathrm {VI} (X;Y)=H(X)+H(Y)-2I(X,Y)},

where H ( X ) {\displaystyle H(X)} {\displaystyle H(X)} is the entropy of X {\displaystyle X} {\displaystyle X}, and I ( X , Y ) {\displaystyle I(X,Y)} {\displaystyle I(X,Y)} is mutual information between X {\displaystyle X} {\displaystyle X} and Y {\displaystyle Y} {\displaystyle Y} with respect to the uniform probability measure on A {\displaystyle A} {\displaystyle A}. This can be rewritten as

V I ( X ; Y ) = H ( X , Y ) − I ( X , Y ) {\displaystyle \mathrm {VI} (X;Y)=H(X,Y)-I(X,Y)} {\displaystyle \mathrm {VI} (X;Y)=H(X,Y)-I(X,Y)},

where H ( X , Y ) {\displaystyle H(X,Y)} {\displaystyle H(X,Y)} is the joint entropy of X {\displaystyle X} {\displaystyle X} and Y {\displaystyle Y} {\displaystyle Y}, or

V I ( X ; Y ) = H ( X | Y ) + H ( Y | X ) {\displaystyle \mathrm {VI} (X;Y)=H(X|Y)+H(Y|X)} {\displaystyle \mathrm {VI} (X;Y)=H(X|Y)+H(Y|X)},

where H ( X | Y ) {\displaystyle H(X|Y)} {\displaystyle H(X|Y)} and H ( Y | X ) {\displaystyle H(Y|X)} {\displaystyle H(Y|X)} are the respective conditional entropies.

The variation of information can also be bounded, either in terms of the number of elements:

V I ( X ; Y ) ≤ log ⁡ ( n ) {\displaystyle \mathrm {VI} (X;Y)\leq \log(n)} {\displaystyle \mathrm {VI} (X;Y)\leq \log(n)},

Or with respect to a maximum number of clusters, K ∗ {\displaystyle K^{*}} {\displaystyle K^{*}}:

V I ( X ; Y ) ≤ 2 log ⁡ ( K ∗ ) {\displaystyle \mathrm {VI} (X;Y)\leq 2\log(K^{*})} {\displaystyle \mathrm {VI} (X;Y)\leq 2\log(K^{*})}

Triangle inequality

[edit]

To verify the triangle inequality V I ( X ; Z ) ≤ V I ( X ; Y ) + V I ( Y ; Z ) {\displaystyle \mathrm {VI} (X;Z)\leq \mathrm {VI} (X;Y)+\mathrm {VI} (Y;Z)} {\displaystyle \mathrm {VI} (X;Z)\leq \mathrm {VI} (X;Y)+\mathrm {VI} (Y;Z)}, expand using the identity V I ( X ; Y ) = H ( X | Y ) + H ( Y | X ) {\displaystyle \mathrm {VI} (X;Y)=H(X|Y)+H(Y|X)} {\displaystyle \mathrm {VI} (X;Y)=H(X|Y)+H(Y|X)}. It suffices to prove H ( X | Z ) ≤ H ( X | Y ) + H ( Y | Z ) {\displaystyle H(X|Z)\leq H(X|Y)+H(Y|Z)} {\displaystyle H(X|Z)\leq H(X|Y)+H(Y|Z)}The right side has a lower bound H ( X | Y ) + H ( Y | Z ) ≥ H ( X | Y , Z ) + H ( Y | Z ) = H ( X , Y | Z ) {\displaystyle H(X|Y)+H(Y|Z)\geq H(X|Y,Z)+H(Y|Z)=H(X,Y|Z)} {\displaystyle H(X|Y)+H(Y|Z)\geq H(X|Y,Z)+H(Y|Z)=H(X,Y|Z)} which is no less than the left side.

References

[edit]
  1. ^ Arabie, P.; Boorman, S. A. (1973). "Multidimensional scaling of measures of distance between partitions". Journal of Mathematical Psychology. 10 (2): 148–203. doi:10.1016/0022-2496(73)90012-6.
  2. ^ W.H. Zurek, Nature, vol 341, p119 (1989); W.H. Zurek, Physics Review A, vol 40, p. 4731 (1989)
  3. ^ Meilă, Marina (2003). "Comparing Clusterings by the Variation of Information". In Bernhard Schölkopf; Manfred K. Warmuth (eds.). Learning Theory and Kernel Machines. 16th Annual Conference on Computational Learning Theory and 7th Kernel Workshop. Lecture Notes in Computer Science, vol. 2777. Springer. pp. 173–187. doi:10.1007/978-3-540-45167-9_14. ISBN 978-3-540-40720-1. S2CID 4341039.

Further reading

[edit]
  • Meila, M. (2007). "Comparing clusterings—an information based distance". Journal of Multivariate Analysis. 98 (5): 873–895. doi:10.1016/j.jmva.2006.11.013.
  • Kingsford, Carl (2009). "Information Theory Notes" (PDF). Retrieved 22 September 2009.
  • Kraskov, Alexander; Harald Stögbauer; Ralph G. Andrzejak; Peter Grassberger (2003). "Hierarchical Clustering Based on Mutual Information". arXiv:q-bio/0311039.

External links

[edit]
  • Partanalyzer includes a C++ implementation of VI and other metrics and indices for analyzing partitions and clusterings
  • C++ implementation with MATLAB mex files
Retrieved from "https://teknopedia.ac.id/w/index.php?title=Variation_of_information&oldid=1279194816"
Categories:
  • Entropy and information
  • Summary statistics for contingency tables
  • Clustering criteria
Hidden categories:
  • Use dmy dates from November 2019
  • Articles with short description
  • Short description matches Wikidata

  • indonesia
  • Polski
  • العربية
  • Deutsch
  • English
  • Español
  • Français
  • Italiano
  • مصرى
  • Nederlands
  • 日本語
  • Português
  • Sinugboanong Binisaya
  • Svenska
  • Українська
  • Tiếng Việt
  • Winaray
  • 中文
  • Русский
Sunting pranala
url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url url
Pusat Layanan

UNIVERSITAS TEKNOKRAT INDONESIA | ASEAN's Best Private University
Jl. ZA. Pagar Alam No.9 -11, Labuhan Ratu, Kec. Kedaton, Kota Bandar Lampung, Lampung 35132
Phone: (0721) 702022
Email: pmb@teknokrat.ac.id