Data Scientist

Athari

Lexington, MA, USA

Published: 6/14/2022

Technology

Full Time

Job Description

• Design, develop, and implement methods, processes, and systems to consolidate and analyze diverse data sets, both structured and unstructured.

• Develop software programs, algorithms, dashboards, information tools, and queries to clean, model, integrate, and evaluate data sets. Keep abreast of new analytic methodologies and technologies.

• Collaborate with functional business units to drive business solutions and direction.

Key Responsibilities include, but are not limited to:

• Design, implement, and maintain enterprise-scale search solutions using Apache Solr

• Develop and optimize semantic search capabilities using vector embeddings and neural search models

• Build custom indexers and indexing pipelines that support vector embeddings alongside traditional text fields

• Implement and tune Approximate Nearest Neighbor (ANN) algorithms for efficient similarity search at scale

• Design and optimize similarity functions (cosine, dot product, Euclidean) for various search use cases

• Build hybrid search systems that combine traditional keyword-based search with vector-based semantic search

• Perform traditional relevancy engineering including query analysis, field weighting, boosting strategies, and result tuning

• Conduct relevancy analysis using quantitative metrics and qualitative evaluation methods

• Monitor search performance metrics and implement continuous improvements

• Work cross-functionally with product, engineering, and data teams to define search requirements

Required Qualifications:

• 5+ years of hands-on experience with Apache Solr or Lucene in production environments

• Strong expertise in traditional relevancy engineering, including query parsing, field boosting, function queries, and relevance tuning

• Proven experience conducting relevancy analysis using both automated metrics and manual evaluation techniques

• Strong expertise in vector embeddings and their application to semantic search

• Proven experience building hybrid search systems that combine keyword and vector-based approaches

• Knowledge of search relevance metrics (NDCG, MRR, precision/recall)

• Excellent problem-solving and analytical skills

• Strong communication skills and the ability to work in collaborative environments

Nice to Have:

• Databases and Data Engineering for Big Data

• Elasticsearch

• Statistical Methods

Clearance:

• Candidates must hold an active clearance (Secret, Top Secret, etc.) to be considered for this position due to the nature of the work. Do not submit candidates who do not meet this requirement.

Work Location:

• This position can be worked hybrid, remote, or onsite. Please state the candidate's preference in the write-up.

Interview Process:

• The first-round interview will be a Zoom call with the hiring manager. The second-round interview will be a Zoom call with additional team members as needed.

Must Have

Data/Reporting

Data Analysis: 5 years

R, Python, SQL, Machine Learning Algorithms, and Data Analysis: 5 years

Degree Level

Bachelor's Degree: Yes

Experience

Currently holds a Secret clearance (or higher): Yes

Quantitative relevancy analysis and tuning: 5 years

Vector embeddings for semantic search: 5 years

Programming

C/C++, Java, Python, Bash, SQL, JavaScript/HTML/CSS, MATLAB: 5 years

Software Tools

Apache Solr and creating data pipelines for search products: 5 years

Nice to Have

Data/Reporting

Databases and Data Engineering for Big Data: 0 years

Elasticsearch: 0 years

Statistical Methods: 0 years

Duration: 6 Months

Security Clearance Requirement: Yes

Security Clearance Level: Active Secret

Location: Lexington, MA 02421

Pay Range: $75-$100 an hour