Web Scraping Specialist
About the Role
You will design, build, and maintain robust web scraping pipelines that reliably extract data from complex websites. You will write, test, and refine scraping code, handle dynamic content and pagination, clean and format extracted data, and store it in appropriate databases. You will monitor scraping runs, identify and fix failures, and scale jobs using cloud infrastructure and distributed techniques.
Requirements
- Demonstrated ability to extract data from complex websites
- Proficiency in Python or JavaScript
- Experience with BeautifulSoup, Scrapy, or Selenium
- Knowledge of asynchronous programming, multithreading, and distributed scraping
- In-depth knowledge of HTML, CSS, JavaScript, and the DOM
- Experience with NoSQL databases such as MongoDB or Cassandra
- Experience with cloud services (AWS, Google Cloud, Azure)
- Ability to apply machine learning for data cleaning or categorization
- Participation in open-source projects related to web scraping or data processing
Responsibilities
- Lead data gathering and analysis from online sources
- Write, test, and refine code to extract data from websites
- Handle pagination and dynamic content loaded via AJAX
- Clean and format extracted data to meet quality standards
- Store and manage scraped data in appropriate databases
- Monitor scraping processes and resolve operational issues
- Optimize scraping processes for reliability and scale
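For a sense of the day-to-day work, the pagination, cleaning, and de-duplication duties above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not production code: the PAGES dictionary, the fetch stand-in, the item markup, and the rel="next" link convention are all hypothetical, and the sketch uses only the Python standard library rather than the tools named in the requirements.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site" mapping a URL path to its HTML (assumption for the sketch).
PAGES = {
    "/items?page=1": (
        '<ul><li class="item">  Widget A </li><li class="item">Widget B</li></ul>'
        '<a rel="next" href="/items?page=2">Next</a>'
    ),
    "/items?page=2": (
        '<ul><li class="item">widget b</li><li class="item"> Widget C</li></ul>'
    ),
}


class ItemParser(HTMLParser):
    """Collects the text of <li class="item"> elements and the rel="next" link."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.next_url = None
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "item":
            self._in_item = True
            self.items.append("")
        elif tag == "a" and attrs.get("rel") == "next":
            self.next_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items[-1] += data


def fetch(url):
    """Stand-in for an HTTP request; a real pipeline would call the network here."""
    return PAGES[url]


def scrape_all(start_url, max_pages=50):
    """Follow rel="next" links and return cleaned, de-duplicated item names."""
    seen, visited, results = set(), set(), []
    url = start_url
    # Stop on a missing next link, a repeated URL, or the page cap.
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        parser = ItemParser()
        parser.feed(fetch(url))
        parser.close()
        for raw in parser.items:
            name = " ".join(raw.split())  # trim and collapse whitespace
            key = name.casefold()  # case-insensitive duplicate check
            if name and key not in seen:
                seen.add(key)
                results.append(name)
        url = parser.next_url
    return results


if __name__ == "__main__":
    print(scrape_all("/items?page=1"))
```

The visited set and the max_pages cap guard against pagination loops, a common cause of runaway scraping jobs.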
Benefits
- Remote work
- Equity package
