Urgently Hiring
Web Scraping Specialist
About the Role
You will build, test, and refine code to extract data from diverse online sources, handling complexities such as pagination and dynamically loaded AJAX content. You will clean and format extracted data to meet quality standards and design efficient NoSQL storage solutions. You will deploy and manage scraping jobs on cloud platforms, monitor processes, and resolve issues to maintain continuous data flow. Where useful, you will apply machine learning methods for data cleaning or categorization, and you will contribute to open-source projects related to scraping and data processing.
Requirements
- Portfolio or examples of past web scraping projects demonstrating the ability to extract data from complex sites
- Proficiency in Python or JavaScript
- Experience with BeautifulSoup, Scrapy, or Selenium
- Knowledge of asynchronous programming, multithreading, and distributed scraping
- In-depth knowledge of HTML, CSS, JavaScript, and the DOM
- Experience with NoSQL databases such as MongoDB or Cassandra
- Experience with cloud services (AWS, Google Cloud, or Azure) for deploying scraping jobs
- Experience applying machine learning algorithms for data cleaning, categorization, or predictive analysis is a plus
- Active participation in relevant open-source projects
Responsibilities
- Write, test, and refine code that extracts data from various online sources, ensuring reliability and efficiency
- Perform data retrieval, handling pagination and dynamic AJAX-loaded content
- Clean and format extracted data to meet quality standards
- Design and manage databases for scraped data, optimizing access speed and data integrity
- Monitor scraping processes, identifying and resolving issues to maintain continuous data flow
- Optimize scraping processes for distributed and large-scale crawling
Benefits
- Benefits package
- Equity package
