Information Entropy is a way of measuring data in terms of the amount of uncertainty it resolves. We'll use this perspective to explore techniques for structuring and analysing your data. You will learn practical ideas for how to extract more value from your data and leave with a framework for understanding the value proposition of data-driven products.
Lessons learned from analysis
explained with information theory
Location: Berlin
Date: 2019-09-24
Temperature: 15°C
Yes, you'll need a coat
Understand what information is contained in your data
Discover what's really valuable about your data
Learn how to structure and organise it effectively
Source → Encoder → Symbol → Decoder → Destination
Freedom of expression
Measurable quantity
Thirst for knowledge
Source → Encoder → Symbol → Decoder → Destination
Unobserved choices
Measure of information entropy
Resolved uncertainty
So how do we measure choice or uncertainty?
Correctly guessing the result of a biased coin toss:
Outcome Probabilities:$$\begin{align} P(H) &= 3/4 \\ P(T) &= 1/4\\ \end{align}$$
Simple binary coding:$$\begin{align} H &= 0 \\ T &= 1\\ \end{align}$$
$$\begin{align} \text{Average bits per toss} = & P(H) \times 1 \text{ bit} + \\ & P(T) \times 1 \text{ bit} \\ = & 1 \text{ bit} \end{align}$$
Encoding two tosses at a time. Outcome probabilities:$$\begin{align} P(HH) &= 9/16 \\ P(HT) &= 3/16 \\ P(TH) &= 3/16 \\ P(TT) &= 1/16 \\ \end{align}$$
Huffman coding:$$\begin{align} HH &= 0 \\ HT &= 10 \\ TH &= 110 \\ TT &= 111 \\ \end{align}$$
$$\begin{align} \text{Average bits per sequence} = & P(HH) \times 1 \text{ bit } + P(HT) \times 2 \text{ bits } + \\ & P(TH) \times 3 \text{ bits } + P(TT) \times 3 \text{ bits} \\ = & 1.6875 \text{ bits} \\ \text{Average bits per toss} = & 1.6875/2 \\ = & 0.84375 \text{ bits} \\ \end{align}$$
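A quick way to check that arithmetic is to compute the expected codeword length directly; a minimal Python sketch using the probabilities and code lengths from the example above:

```python
# Expected length of the two-toss Huffman code above.
probs = {"HH": 9/16, "HT": 3/16, "TH": 3/16, "TT": 1/16}
code_bits = {"HH": 1, "HT": 2, "TH": 3, "TT": 3}  # bits per codeword

bits_per_sequence = sum(p * code_bits[s] for s, p in probs.items())
print(bits_per_sequence)      # 1.6875
print(bits_per_sequence / 2)  # 0.84375 bits per toss
```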
Tells us the theoretical limit for compression
The irreducible information content of a source/variable
A way of thinking about how valuable data is
Bits of information required to communicate an outcome$$\begin{aligned} I(\text{outcome}) &= \log_2 \frac{1}{P(\text{outcome})} \\ &= - \log_2 P(\text{outcome}) \end{aligned}$$
Average bits of information per outcome$$H(\text{source}) = \sum_{\text{outcome} \in \text{source}} {-P(\text{outcome}) \log_2 {P(\text{outcome})}}$$
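Both formulas translate directly into code; a minimal Python sketch, applied to the biased coin above. The entropy comes out at roughly 0.811 bits per toss, the compression limit mentioned earlier, which the two-toss Huffman code (0.84375 bits per toss) approaches but cannot beat.

```python
import math

def self_information(p):
    """Bits required to communicate an outcome with probability p."""
    return -math.log2(p)

def entropy(probabilities):
    """Average bits of information per outcome."""
    return sum(p * self_information(p) for p in probabilities if p > 0)

print(self_information(3/4))  # ~0.415 bits for heads
print(self_information(1/4))  # 2 bits for tails
print(entropy([3/4, 1/4]))    # ~0.811 bits per toss
```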
More information requires more data
Keep more decimal places than you need to show
Don't convert continuous variables (numbers) to discrete ones (categories)
You can't unsimplify
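A minimal pandas sketch of the same advice, assuming a hypothetical table of raw measurements: derive the rounded and categorical views from the raw column instead of replacing it.

```python
import pandas as pd

# Hypothetical raw measurements: keep these at full precision.
df = pd.DataFrame({"temperature": [15.237, 18.914, 21.502]})

# Derive simplified views for presentation only.
df["temperature_display"] = df["temperature"].round(1)
df["temperature_band"] = pd.cut(
    df["temperature"],
    bins=[-float("inf"), 18, float("inf")],
    labels=["cool", "warm"],
)

# The raw column can always be re-simplified later; the reverse is impossible.
```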
More data doesn't mean more information
Normalisation: a way to structure your data to reduce redundancy
| Show | Actor |
|---|---|
| Brooklyn 99 | Stephanie Beatriz, Terry Crews |
| Show | Actor |
|---|---|
| Brooklyn 99 | Stephanie Beatriz |
| Brooklyn 99 | Terry Crews |
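In pandas, this first step (one row per show and actor pair) is a split followed by an explode; a minimal sketch assuming the toy table above and a recent pandas version:

```python
import pandas as pd

shows = pd.DataFrame({
    "Show": ["Brooklyn 99"],
    "Actor": ["Stephanie Beatriz, Terry Crews"],
})

# Split the multi-valued cell, then give each actor its own row.
shows["Actor"] = shows["Actor"].str.split(", ")
shows = shows.explode("Actor")  # one row per (Show, Actor) pair
```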
| Show | Rating | Language | Subtitles |
|---|---|---|---|
| Rick and Morty | 4 | English | Available |
| Rick and Morty | 4 | German | Unavailable |
| Show | Rating |
|---|---|
| Rick and Morty | 4 |
| Show | Language | Subtitles |
|---|---|---|
| Rick and Morty | English | Available |
| Rick and Morty | German | Unavailable |
| Show | Genre | Sub-Genre |
|---|---|---|
| Doctor Who | Fiction | Sci-Fi |
| Show | Sub-Genre |
|---|---|
| Doctor Who | Sci-Fi |
| Sub-Genre | Genre |
|---|---|
| Sci-Fi | Fiction |
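The same split can be expressed in code; a sketch assuming the toy catalogue above, where Genre depends on Sub-Genre rather than on the show itself:

```python
import pandas as pd

catalogue = pd.DataFrame({
    "Show": ["Doctor Who"],
    "Genre": ["Fiction"],
    "Sub-Genre": ["Sci-Fi"],
})

# Genre is determined by Sub-Genre, so keep it in its own lookup table.
show_subgenre = catalogue[["Show", "Sub-Genre"]]
subgenre_genre = catalogue[["Sub-Genre", "Genre"]].drop_duplicates()
```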
What data scientists want...
| Example ID | Feature A | Feature B | Feature C | Classification |
|---|---|---|---|---|
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... |
Informative variables correlate with the objective, but not with each other
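One way to see this is a plain correlation matrix; a sketch with synthetic data, where feature B is nearly a copy of feature A and so adds almost no information about the objective:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
b = a + rng.normal(scale=0.1, size=n)  # redundant: mirrors A
c = rng.normal(size=n)                 # independent of A and B
objective = a + c + rng.normal(scale=0.5, size=n)

df = pd.DataFrame({"A": a, "B": b, "C": c, "objective": objective})
corr = df.corr()

print(corr["objective"])   # how strongly each feature tracks the objective
print(corr.loc["A", "B"])  # close to 1: A and B carry the same information
```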
Defining the Question...
If in doubt, follow the money!
e.g. Unitless measures
Describe and explain your data
Aid discovery and interpretation
Track provenance
Handles to manipulate your data
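A minimal sketch of what such metadata might look like when kept alongside a dataset; all of the field names here are illustrative, not a standard:

```python
# Illustrative metadata record kept next to a dataset (field names are examples).
metadata = {
    "description": "Yearly pageviews for the site",   # describe and explain
    "units": "pageviews per year",                     # aid interpretation
    "source": "web server logs, aggregated nightly",   # track provenance
    "extracted_at": "2019-09-24",
    "columns": {"date": "ISO 8601 year", "pageviews": "non-negative integer"},
}
```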
```json
{
  "2017": 8000,
  "2018": 10000,
  "2019": 15000
}
```
```json
[
  { "date": "2017", "pageviews": 8000 },
  { "date": "2018", "pageviews": 10000 },
  { "date": "2019", "pageviews": 15000 }
]
```
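The difference shows up as soon as you work with the data; a sketch assuming pandas, where the record layout loads directly while the keyed layout first has to be told what its values mean:

```python
import pandas as pd

# Keyed layout: the field names have to be supplied at load time.
keyed = {"2017": 8000, "2018": 10000, "2019": 15000}
df_keyed = pd.DataFrame(list(keyed.items()), columns=["date", "pageviews"])

# Record layout: self-describing, loads directly, and new fields
# simply become new columns.
records = [
    {"date": "2017", "pageviews": 8000},
    {"date": "2018", "pageviews": 10000},
    {"date": "2019", "pageviews": 15000},
]
df_records = pd.DataFrame(records)
```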