Harvesting the Web: the art of catching data on the fly

Date/Time
Date(s) - 18/05/2017
6:00 pm

Location
London

Harvesting the Web: the art of catching data on the fly and how it’s much more than that

Seraphina Anderson

What is data harvesting? In a few words, data harvesting is the process of extracting data from the internet. Integral to this process are programmatic automation techniques, data wrangling and a certain amount of research. The data is commonly output in structured form, but this is not always the case.

Seraphina will be talking about the art of data harvesting: a rich landscape of pattern recognition, informal natural language processing, hacking, scraping, crawling and more. She will discuss the processes involved in building crawlers, and how these weave together to create a ‘bespoke sieve’ or automation script for catching absolutely any information one might require: explicit, implicit, structured, unstructured or something in-between. It really isn’t as simple as breadth-first/depth-first search: the process is more organic than this. There are no limits to the creativity one can apply. Examples will be presented to demonstrate the variety of structures in which data can be found, and the engineering required to extract it. No talk on data harvesting is complete without covering the fields in which data harvesting is useful, “housekeeping”, and the challenges of “being polite on the web”, not to speak of what it is like to reside in such a ‘grey’ zone.
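As a point of reference for the breadth-first search and "being polite on the web" ideas mentioned above, here is a minimal Python sketch. It is not taken from the talk. The `SITE` dictionary, the `ROBOTS_TXT` rules, the `example.com` URLs and the `demo-harvester` agent name are all hypothetical stand-ins, and the link lookup is an in-memory dictionary rather than real HTTP requests.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "site": each URL maps to the links found on that page.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/private/x"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/"],
    "https://example.com/b": [],
    "https://example.com/private/x": ["https://example.com/secret"],
}

# Hypothetical robots.txt rules for that site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_robots(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def polite_bfs(start, fetch_links, robots, agent="demo-harvester"):
    """Breadth-first crawl that skips any URL disallowed by robots.txt."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # politeness: do not fetch disallowed pages
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


robots = build_robots(ROBOTS_TXT)
order = polite_bfs("https://example.com/", lambda u: SITE.get(u, []), robots)
print(order)
```

A real crawler would presumably also pause between requests and handle network errors. This sketch shows only the traversal order and the robots.txt check, which is the simple baseline the abstract describes the real process as going beyond.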

Seraphina Anderson is a freelance Python Programmer, Data Harvester and automation specialist, with a background in the Visual Arts and an interest in Pure Mathematics and Computer Science. She holds a BA (Hons) in Fine Art Film and Video (Central St. Martins) and an MA in Animation (RCA), and is an affiliate member of the IMA.

Agenda:
6.00pm: Arrival, Networking
6.30pm: Welcome and Opening remarks, Christophe Le Lannou & Tom Khabaza
6.35pm: Harvesting the Web by Seraphina Anderson
7.30pm: Refreshments and More chatting

An event of The Society of Data Miners in association with The Royal Statistical Society

Register via Meetup