Improve your data

Lessons learned from analysis, explained with the science of information entropy

The talk was originally presented at Open Data Manchester on 27/05/2014. A more in-depth discussion of these ideas can be found on the Infonomics blog.

Robin Gower / @robsteranium

What makes data good?

Tim Berners-Lee's 5-Star Open Data Scheme

Increase Information Entropy

Uncertainty - range of possible states
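In Shannon's terms, this uncertainty can be measured as entropy. For a variable X with n possible states occurring with probabilities p_1, ..., p_n, the standard formulation is:

```latex
H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i \ \text{bits},
\qquad
H(X) \le \log_2 n \ \text{, with equality when every } p_i = \tfrac{1}{n}
```

So the more states a value could plausibly take, the more uncertainty an observation of it can resolve.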

Data resolves uncertainty


Quantity


Clarity


Novelty
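A minimal Python sketch of that measure, estimating the probabilities from observed frequencies. The `entropy` helper is our own illustration, not part of any library:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Four equally likely states: each observation resolves 2 bits of uncertainty.
print(entropy(["a", "b", "c", "d"]))  # 2.0

# A value that is almost always the same resolves much less (about 0.81 bits).
print(entropy(["a", "a", "a", "b"]))
```

A column that never varies scores zero: reading it tells you nothing you didn't already know.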

More is better than less

Link with codes (and code lists)

BBC - MusicBrainz

http://www.bbc.co.uk/music/artists/5441c29d-3602-4898-b1a1-b77fa23b8e50

Positive network effects

Reference data
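A sketch of why shared codes help, in the same spirit as the BBC and MusicBrainz example. It assumes two hypothetical datasets keyed by ISO 3166-1 alpha-2 country codes; the `link` helper and all the figures are made up for illustration:

```python
# Excerpt of a shared code list: ISO 3166-1 alpha-2 country codes.
COUNTRY_CODES = {"GB": "United Kingdom", "FR": "France", "DE": "Germany"}

# Two hypothetical datasets from different publishers, both using the codes.
stores = {"GB": 12, "FR": 7}       # hypothetical: store count per country
sales = {"GB": 340.0, "DE": 95.5}  # hypothetical: sales per country

def link(left, right, code_list):
    """Join two code-keyed datasets, labelling each row from the code list."""
    return [
        (code, code_list[code], left[code], right[code])
        for code in sorted(left.keys() & right.keys())
    ]

print(link(stores, sales, COUNTRY_CODES))
# [('GB', 'United Kingdom', 12, 340.0)]

# With free-text names, the same country fails to match across datasets.
free_text_a = {"UK": 12}
free_text_b = {"United Kingdom": 340.0}
print(free_text_a.keys() & free_text_b.keys())  # set()

# Codes missing from the code list show up as errors, not silent mismatches.
print((stores.keys() | sales.keys()) - COUNTRY_CODES.keys())  # set()
```

Every extra dataset that adopts the same codes becomes joinable with all the others, which is where the network effect comes from.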

MECE principle

Mutually exclusive - no overlaps

Collectively exhaustive - no gaps

Mutually Exclusive, Collectively Exhaustive
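MECE can be checked mechanically for numeric bands. This sketch assumes half-open bands `[low, high)` and an age domain of 0 to 120 inclusive, written `(0, 121)`; the `mece_problems` function and the bands themselves are hypothetical:

```python
def mece_problems(bands, domain):
    """Report overlaps and gaps among half-open bands [low, high).

    `bands` maps a label to (low, high); `domain` is the (low, high) range
    the bands should cover collectively. Assumes bands lie inside the domain.
    """
    problems = []
    cursor, reach = domain[0], None  # furthest point covered, and which band covers it
    for label, (low, high) in sorted(bands.items(), key=lambda item: item[1]):
        if reach is not None and low < cursor:
            problems.append(f"overlap: {reach} and {label}")  # not mutually exclusive
        elif low > cursor:
            problems.append(f"gap: {cursor} to {low}")  # not collectively exhaustive
        if high > cursor:
            cursor, reach = high, label
    if cursor < domain[1]:
        problems.append(f"gap: {cursor} to {domain[1]}")
    return problems

good = {"0-17": (0, 18), "18-64": (18, 65), "65+": (65, 121)}
bad = {"0-17": (0, 18), "16-64": (16, 65), "70+": (70, 121)}

print(mece_problems(good, (0, 121)))  # []
print(mece_problems(bad, (0, 121)))   # ['overlap: 0-17 and 16-64', 'gap: 65 to 70']
```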

Normalise


Efficiency


Reliability


Integrity
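A minimal sketch of normalising, using hypothetical store and region fields. Each region name ends up stored once in a lookup table, and a conflicting duplicate is surfaced rather than silently kept:

```python
# Hypothetical denormalised rows: the region name is repeated on every row.
flat = [
    {"store": "S1", "region_code": "R1", "region_name": "North"},
    {"store": "S2", "region_code": "R1", "region_name": "North"},
    {"store": "S3", "region_code": "R2", "region_name": "South"},
]

def normalise(rows):
    """Split rows into a stores table and a region lookup table.

    Raises ValueError if one region code carries conflicting names, which is
    the uncertainty that duplication creates.
    """
    regions = {}
    stores = []
    for row in rows:
        code, name = row["region_code"], row["region_name"]
        if regions.setdefault(code, name) != name:
            raise ValueError(f"conflicting names for {code}: {regions[code]!r} vs {name!r}")
        stores.append({"store": row["store"], "region_code": code})
    return stores, regions

stores, regions = normalise(flat)
print(regions)  # {'R1': 'North', 'R2': 'South'}

# A duplicated fact that disagrees with itself: which name is right?
try:
    normalise(flat + [{"store": "S4", "region_code": "R2", "region_name": "Southern"}])
except ValueError as err:
    print(err)  # conflicting names for R2: 'South' vs 'Southern'
```

The normalised form is smaller (efficiency), has one place to correct a name (reliability) and cannot hold two answers to one question (integrity).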

Be Precise, Allow Users to Simplify

There's no such thing as unsimplification

Don't categorise continuous variables unless you can't help it

If you have to categorise, do it after data has been collected

Once you've lost precision, you can't get it back
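A sketch of what banding costs, measured in bits. The ages and the `age_band` cut-offs are hypothetical, and `entropy` is a small Shannon entropy helper written for this illustration:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def age_band(age):
    """Hypothetical banding, applied only after the precise ages were collected."""
    if age < 18:
        return "0-17"
    if age < 65:
        return "18-64"
    return "65+"

ages = [4, 17, 23, 23, 41, 58, 66, 80]  # hypothetical precise observations
bands = [age_band(a) for a in ages]

print(bands)
# ['0-17', '0-17', '18-64', '18-64', '18-64', '18-64', '65+', '65+']
print(entropy(ages), entropy(bands))  # 2.75 1.5
```

Going from ages to bands is a one-line function; going back is impossible, because "18-64" could stand for any age in that range.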

Represent Nothingness Accurately


Not available


Null


Zero
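One way to keep the three kinds of nothing apart, assuming "not available" means a value exists but was not collected or released, and "null" means no value applies. The `Missing` enum and the rent survey are hypothetical:

```python
from enum import Enum

class Missing(Enum):
    """Hypothetical markers for the two kinds of 'nothing' that are not zero."""
    NOT_AVAILABLE = "not available"  # a value exists but was not collected or released
    NULL = "null"                    # no value applies to this respondent

# Hypothetical survey: monthly rent paid by each respondent.
rent = {
    "a": 0,                      # zero: lives rent-free, a real measured value
    "b": 450,
    "c": Missing.NULL,           # owns their home, so rent does not apply
    "d": Missing.NOT_AVAILABLE,  # declined to answer
    "e": 650,
}

# Kept distinct, the average is taken over real answers only.
known = [v for v in rent.values() if not isinstance(v, Missing)]
print(sum(known) / len(known))  # mean of 0, 450 and 650

# Collapsed into zeros, the "nothing" values masquerade as measurements.
collapsed = [0 if isinstance(v, Missing) else v for v in rent.values()]
print(sum(collapsed) / len(collapsed))  # 220.0
```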

Provide Metadata

Increase Information Entropy

Open more data to resolve more uncertainty

Duplication leads to uncertainty

Normalised data - same variety, but smaller and clearer

Precise data - more possible states

Accurate nothingness - don't leave questions

Metadata makes your data more certain

Don't interpret or summarise
