San Francisco, CA, US

At AirPR, we are passionate about building software that solves important problems in marketing. We partner with the most valuable companies in the world to transform how they use data and technology to drive marketing and brand decisions. Our software has been used to strategize responses to a brand crisis, discover new content and influencers, and gain an edge in the global online business world.

AirPR Inc. seeks a Software Engineer for our San Francisco, California office:


  • Design, develop, deploy, and manage distributed web crawling software using Python, Java, or Scala to find and process news articles, blog posts, and other content from across the internet.
  • Deploy, scale, and maintain the crawler on cloud platforms, including Amazon Web Services and Google Cloud.
  • Develop and maintain software systems to extract article text and structured metadata from webpages and to convert and annotate unstructured text into machine-queryable objects; monitor the effectiveness and error rates of extraction processes; and ensure crawler reliability and resilience to error conditions, including but not limited to improperly formatted inputs, network problems, and blockages.
  • Develop software to determine and score the relevance of article data to customer queries automatically, across different languages and in real time.
  • Design and develop interfaces to store, search, and transfer article text to other systems based on programmatically generated full-text search queries and user input.
  • Develop systems to reliably perform large-scale reprocessing of data.
  • Scale and debug large distributed systems, and build monitoring to provide visibility into system performance and other metrics.
  • Perform code profiling, troubleshooting, analysis, and optimization on software systems to determine and eliminate bottlenecks.
  • Perform software operations tasks to manage databases and servers on the Amazon Web Services cloud, or similar cloud providers, in order to optimize performance and decrease latency.

Minimum Requirements:  

Bachelor’s degree in Computer Science, Information Systems Engineering, or a related field, followed by 5 years of progressive, post-baccalaureate experience in the job offered or in a software development-related occupation.

Special Requirements:

Position requires at least 2 years of experience in each of the following skills:

  1. Building, deploying, and monitoring distributed data processing applications utilizing cloud platforms such as Amazon Web Services, Azure, or Rackspace Cloud.
  2. Designing and developing large-scale web crawler systems to find and process news articles, blog posts, and similar unstructured content, using open-source frameworks such as Apache Nutch or Scrapy.
  3. Developing systems to extract structured data from web pages, such as title, content, publication date, and author, using the Python programming language to transform raw crawled data into machine-processable records.
  4. Maintaining and scaling production web crawling platforms and handling common issues such as IP rate limiting, JavaScript execution, and malformed sites to ensure the health and performance of the system as well as the quality and volume of content crawled.
  5. Designing and developing systems to perform large-scale data processing (log processing, data aggregation, reporting) utilizing the Python programming language.
  6. Deploying, maintaining, and monitoring distributed systems on Unix/Linux-based platforms in an operations/DevOps capacity, using tools such as Fabric, Chef, Ansible, Puppet, and Bash.
  7. Identifying, debugging, and resolving performance issues and bottlenecks in data-intensive software applications and components using profilers, debuggers, and similar tools.
  8. Designing data schemas, writing client applications, and troubleshooting query performance for large-scale, data-intensive software applications using open-source relational databases such as MySQL and PostgreSQL.
  9. Designing and using full-text search systems such as Lucene, Apache Solr, and Elasticsearch to store and search article data.

Proof of authorization to work in the U.S. is required if hired. The company is an Equal Opportunity Employer and fully supports affirmative action practices.