The Product Team is at the core of designing and building the future of data collection, management and analysis. We use and develop state-of-the-art open source and proprietary technologies in machine learning, distributed computing and robotic process automation.
Our team is building a suite of machine learning tools to help solve problems in the scientific domain. These include linking researchers to their publications, sentiment analysis of citations, record linkage, predicting the ROI of a grant, mapping the landscape of academic research, and much more. Powering these tools requires data that is highly available and reliable, and this role will build and maintain the infrastructure that delivers it.
- Experience with large datasets
- Broad, fundamental understanding of distributed computing
- 2+ years of Data Engineering experience
- Experience building scalable ETL pipelines
- Test-driven development
- Excellent Software Engineering skills
- Python, Spark, Hadoop, SQL, Git, Functional Programming
- Strong verbal and written communication skills (a must)
- Familiarity with Agile methodologies and tools (e.g., Jira)
- Degree in Computer Science or Engineering