An Overview of Web Scraping: Technical Aspects and Exercises

Date

Publisher

Polytechnic University of Puerto Rico

Item Type

Article

Abstract

Researchers and organizations can further their research goals by studying web scraping and applying it correctly. This study reviews several web scraping techniques together with the legal and ethical implications of web scraping. Technical, legal, and ethical aspects are discussed to better understand the benefits and risks of the web scraping process. Three exercises involving web scraping techniques are presented: the first uses the BeautifulSoup library in Python, the second uses the web scraping tool Octoparse, and the third uses ParseHub. The three experiences are then discussed to provide insight into how the different techniques and programs compare.

Key Terms: BeautifulSoup, Octoparse, ParseHub, Web scraping.

Description

Design Project Article for the Graduate Programs at Polytechnic University of Puerto Rico

Keywords

Citation

Pérez Molano, G. (2023). An Overview of Web Scraping: Technical Aspects and Exercises [Unpublished manuscript]. Graduate School, Polytechnic University of Puerto Rico.

Collections