Talent Network: Lead Data Scientist
<h3 id="about-toptal">About Toptal</h3> <p>Toptal is a global network of top talent in business, design, and technology that enables companies to scale their teams on demand. With $200+ million in annual revenue <strong>and team members based around the globe</strong>, Toptal is the <a data-faitracker-click-bind="true" href="https://www.toptal.com/remote-work-playbook">world’s largest fully remote workforce</a>.</p> <p>We take the best elements of virtual teams and combine them with a support structure that encourages innovation, social interaction, and fun. We see no borders, move at a fast pace, and are never afraid to break the mold.</p> <h3 id="job-summary">Job Summary</h3> <p>We are looking for a Senior Data Scientist to join us as the first Data Scientist on a new product we are building. This is a founding role: you will shape the data science function from the ground up, set technical direction, and own the end-to-end delivery of intelligent systems that define how our product creates value. You will tackle open-ended problems involving Task Mining, Process Mining, behavioral workflow analysis, pattern discovery, predictive modeling, and applied GenAI/ML systems. The goal is not just to build models, but to turn raw interaction data into measurable product and business impact: discovered workflows, bottlenecks, optimization opportunities, and scalable foundations for future DS/ML work.</p> <p>This is a remote position. We do not offer visa sponsorship or assistance.
Resumes and communication must be submitted in English.</p> <h3 id="responsibilities">Responsibilities</h3> <ul style="list-style-type: disc;"> <li>Act as the founding Data Scientist on the product: define the DS strategy, choose the right tools and frameworks, and establish best practices.</li> <li>Design and build Task Mining and Process Mining solutions that transform raw interaction data into discovered workflows, patterns, bottlenecks, and optimization opportunities.</li> <li>Design, develop, and deploy ML systems and data pipelines for large-scale structured, unstructured, and event/interaction data.</li> <li>Build predictive and pattern-discovery solutions using supervised and unsupervised learning, representation learning, sequence modeling, and LLM/GenAI approaches where appropriate.</li> <li>Establish practical foundations for dataset construction, labeling strategy, offline/online evaluation, monitoring, feedback loops, and human-in-the-loop review where needed.</li> <li>Own projects end-to-end, from problem framing and experimentation through production deployment and iteration. 
Collaborate closely with engineering on data instrumentation, pipeline design, deployment, and integration of production-ready services.</li> <li>Communicate findings, tradeoffs, and technical concepts effectively to both technical and business stakeholders.</li> </ul> <h3 id="qualifications-and-requirements">Qualifications and Requirements</h3> <ul style="list-style-type: disc;"> <li>5+ years of professional experience in Data Science, Machine Learning, or Applied ML roles.</li> <li>Demonstrated experience operating as the sole or lead Data Scientist on a product or team — owning problems end-to-end without senior DS supervision.</li> <li>Strong experience with supervised and unsupervised ML, modern ML/data tooling, and the judgment to select the right approach for the problem.</li> <li>Practical familiarity with representation learning, sequence modeling, Transformers, LLMs, or GenAI systems where relevant to product use cases.</li> <li>Experience handling large-scale structured, unstructured, event, or interaction datasets.</li> <li>Advanced proficiency in Python and SQL, with hands-on experience using tools such as PyTorch, scikit-learn, pandas/Polars, experiment tracking, and production ML workflows.</li> <li>Experience deploying ML models, data pipelines, or intelligent systems into production.</li> <li>Familiarity with Task Mining, Process Mining, event-log analysis, behavioral analytics, workflow automation, or adjacent domains.</li> <li>Advanced degree in Computer Science, Data Science, AI, Statistics, Mathematics, or a related field is a plus; equivalent practical experience is strongly valued.</li> </ul> <h3 id="what-we-are-looking-for">What We Are Looking For</h3> <ul style="list-style-type: disc;"> <li>A founder’s mindset: full responsibility for outcomes, not just deliverables.</li> <li>Comfort operating in high ambiguity: able to turn unclear product goals, noisy data, and incomplete requirements into an executable roadmap.</li> <li>Strong business 
sense: connects technical work to commercial impact and measurable product value.</li> <li>Pragmatic technical judgment: knows when to use advanced ML, when to simplify, and when better data, labeling, or evaluation is the real bottleneck.</li> <li>Ability to build foundations for rapid scaling: reusable datasets, pipelines, metrics, evaluation frameworks, and modeling patterns future DS/ML hires can build on.</li> <li>Highly proactive problem solver who acts without waiting for detailed instructions.</li> <li>Excellent communication skills, with the confidence to push back constructively and propose direction.</li> </ul> <h3 id="nice-to-have">Nice to Have</h3> <ul style="list-style-type: disc;"> <li>Previous experience as a first or early Data Scientist at a startup or new product line.</li> <li>Direct experience with Task Mining, Process Mining, workflow intelligence, RPA, or productivity analytics.</li> <li>Experience with LLMs and Generative AI applications, especially evaluation, structured outputs, semantic labeling, summarization, or human-in-the-loop workflows.</li> <li>Experience working with privacy-sensitive behavioral, productivity, or user-interaction data.</li> <li>Experience with product experimentation, causal inference, or measuring the impact of workflow/process interventions.</li> <li>Knowledge of MLOps and distributed processing frameworks, such as Spark.</li> <li>Experience with cloud environments, especially GCP.</li> </ul>