Urgently Hiring
Research Crawling Engineer
About the Role
You will design, build, and operate large-scale web data acquisition systems used for research and model development. You will build and maintain distributed crawlers, handle anti-bot systems and dynamic JavaScript-heavy sites, and develop pipelines for cleaning, deduplication, filtering, and normalization. You will construct and maintain research datasets, monitor crawl performance and data quality, iterate quickly, and optimize infrastructure for cost, latency, and reliability. You will collaborate with research teams to align data collection with modeling needs and own end-to-end data acquisition pipelines.
Requirements
- Strong programming experience in Go, Rust, Python, Java, or C++
- Experience building web crawlers or large-scale data pipelines
- Solid understanding of HTTP, networking, and browser behavior
- Familiarity with distributed systems and parallel processing
- Experience working with large datasets (TB–PB scale preferred)
- Ability to debug unstable or adversarial environments
- Experience with headless browsers (e.g., Chrome DevTools Protocol, Playwright, Puppeteer)
- Familiarity with proxy systems, IP rotation, and large-scale request orchestration
- Experience running workloads on cloud or bare-metal infrastructure
- Experience with NLP pipelines, dataset curation, or LLM pretraining data (preferred)
Responsibilities
- Build and maintain large-scale web crawlers across diverse domains
- Design high-throughput, fault-tolerant systems for data collection
- Handle anti-bot systems, rate limits, and dynamic JS-heavy sites
- Develop pipelines for cleaning, deduplication, filtering, and normalization
- Construct and maintain datasets for research and model training
- Monitor crawl performance, coverage, and data quality, and iterate quickly
- Collaborate with research teams to align data collection with modeling needs
- Optimize infrastructure for cost, latency, and reliability
- Own end-to-end data acquisition pipelines
Benefits
- Equity package
- Fully remote work
- Benefits package
