Big Data but NOT in Hadoop: Industrial Strength Web Data Collection at Wolters Kluwer
    Guy Hanan
Data Conversion Manager
Wolters Kluwer


Thursday, May 2, 2013
08:30 AM - 09:20 AM

Level:  Introductory

Wolters Kluwer provides information services to many verticals in 40 countries. Unprecedented growth in regulatory and social data has fueled demand for its products, but also poses significant scalability and complexity challenges in automating data collection. Mediregs, the healthcare unit, has a content database of over 10 million documents that must be constantly refreshed and expanded. Its web extraction process, first manual and then automated through Perl scripts, was unable to scale and was being thwarted by dynamic web technologies. This hurt customer experience and revenue.

Mediregs discovered that advanced web automation, integrated with a data virtualization platform, could access websites the way a human being would (filling in forms, retrieving search results, interacting with dynamic web elements, etc.), retrieve and present the information in a structured format, and do so with industrial reliability and performance. This has empowered the company to explore new market opportunities.

Attendees will learn the following from this session:

  1. A broader view of Big Data and the many ways to leverage Web data
  2. Challenges and complexities of dynamic web data extraction process
  3. How advanced web automation works and differs from scripts and desktop scraping tools
  4. Real-world web automation uses, challenges and lessons from Mediregs
  5. Future projects that couple web automation with data integration

Guy Hanan has a BA from Stanford University and an MBA from Harvard Business School. He held senior IT positions at three public companies in the 1980s and worked as a (mostly independent) consultant on electronic publishing in the 1990s and 2000s. Since 2010 he has been employed as a data conversion manager at Mediregs, a unit of Wolters Kluwer Law and Business.
