An Overview of Web Scraping: Technical Aspects and Exercises
| dc.contributor.advisor | Duffany, Jeffrey | |
| dc.contributor.author | Pérez Molano, Gustavo | |
| dc.date.accessioned | 2024-01-11T13:19:49Z | |
| dc.date.available | 2024-01-11T13:19:49Z | |
| dc.date.issued | 2023 | |
| dc.description | Design Project Article for the Graduate Programs at Polytechnic University of Puerto Rico | en_US |
| dc.description.abstract | Researchers and organizations conducting different types of research can benefit from studying web scraping and using it correctly to further their research goals. This study reviews some web scraping techniques and the legal and ethical implications of web scraping. Technical, legal, and ethical aspects are discussed to better understand the benefits and risks of the web scraping process. Three exercises involving web scraping techniques are presented. The first uses the BeautifulSoup library in Python. The second uses the web scraping tool Octoparse. The third uses ParseHub. The three experiences are discussed to provide insight into how the different techniques and programs compare. Key Terms: BeautifulSoup, Octoparse, ParseHub, Web scraping. | en_US |
| dc.identifier.citation | Pérez Molano, G. (2023). An Overview of Web Scraping: Technical Aspects and Exercises [Unpublished manuscript]. Graduate School, Polytechnic University of Puerto Rico. | en_US |
| dc.identifier.uri | http://hdl.handle.net/20.500.12475/1995 | |
| dc.language.iso | en | en_US |
| dc.publisher | Polytechnic University of Puerto Rico | en_US |
| dc.relation.haspart | San Juan | en_US |
| dc.relation.ispartof | Computer Science; | |
| dc.relation.ispartofseries | Fall-2023; | |
| dc.rights.holder | Polytechnic University of Puerto Rico, Graduate School | en_US |
| dc.rights.license | All rights reserved | en_US |
| dc.subject.lcsh | Polytechnic University of Puerto Rico--Graduate students--Research | en_US |
| dc.subject.lcsh | Data mining | en_US |
| dc.subject.lcsh | Python (Computer program language) | |
| dc.subject.other | Web scraping | |
| dc.title | An Overview of Web Scraping: Technical Aspects and Exercises | en_US |
| dc.type | Article | en_US |
Files
Original bundle
- Name:
- PUPR_CEAH_SJU_FA23_MCS_Gustavo Perez_Article.pdf
- Size:
- 215.94 KB
- Format:
- Adobe Portable Document Format
- Description:
- PUPR_CEAH_SJU_FA23_MCS_Gustavo Perez_Article