Wednesday, December 1, 2010

Google Refine lets you fix and handle huge, messy sets of data

Google has just introduced a new product, and this time it's a PC application (with a browser-based UI). It's called Google Refine, and it solves a problem that is enormous for some people: it lets you take massive sets of "messy data" and massage them into shape so that they're uniform, make sense, and can be statistically analyzed.

The video after the jump shows a very good example, based on a CSV file exported from a publicly available data source (a government contract system, in this case). The data is very realistic: descriptions are inconsistent ("Firm Fixed Price" on some rows and "FFP" on others), and even the number formats are inconsistent (you get 0.78 on one row and a number in the millions on another).

Google Refine lets you very easily home in on those inconsistencies and fix them in a myriad of ways. This is an important data tool because those heaps of messy data are often public records, which are available but not transparent. Being able to analyze them quickly could expose some very interesting patterns and anomalies in the way that public institutions and governments behave.

[Thanks, Yanksy, for the tip!]

Filed under: Utilities, Google

Google Refine lets you fix and handle huge, messy sets of data originally appeared on Download Squad on Wed, 17 Nov 2010 10:30:00 EST.
