We build high-quality training datasets, backed by the experience of evaluating over 26 million developers.
Curate a custom dataset to train your model on specific software development skills.
Access a workforce of software development experts to label and annotate your dataset.
Test your model's performance with a custom evaluation dataset.
Each engagement kicks off with a consultation. Whether you know exactly what data you need or only have a rough idea of your goal, we’ll leave the consultation with a clear understanding of how your model will improve and what data is needed to get there.
Your dataset is prepared by our SME network. Every expert has passed a hands-on technical assessment. You can trust that the same expert network we’ve built over the last decade to create the content used to assess human developers will curate a high-quality dataset for your project.
You have the option to have us evaluate your model using both an out-of-sample subset of the dataset and our own evaluation methodology. We can work with you to create custom metrics, or ensure that your model meets the metrics we've developed through our ASTRA evaluation research.
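For teams defining custom metrics, a common starting point in code-generation evaluation is the unbiased pass@k estimator. The sketch below is purely illustrative (it is not necessarily how ASTRA computes its metrics), and the per-task results are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k samples
    (drawn from n generations, c of which are correct) solves the task."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-task results: (total generations, correct generations).
results = [(10, 3), (10, 0), (10, 7), (10, 1)]

for k in (1, 5):
    score = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
    print(f"pass@{k}: {score:.3f}")
```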
Your dataset will then go through quality review. We apply both automated checks and human review. Our tooling automatically checks quality dimensions such as dataset completeness and redundancy. We’ll remove any rows that don't meet quality criteria, for example rows that lack a minimum number of test cases for a given challenge or contain poor English grammar in written responses.
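As a rough illustration of this kind of row-level filtering (not our production tooling), the sketch below assumes a hypothetical record schema with `test_cases` and `written_response` fields and applies two simple checks.

```python
from typing import Iterable

MIN_TEST_CASES = 5          # assumed threshold; varies per engagement
MIN_RESPONSE_WORDS = 20     # crude proxy for a substantive written response

def passes_quality_checks(row: dict) -> bool:
    """Return True if a dataset row meets the minimum quality criteria."""
    # Challenge rows must ship with enough test cases to be useful.
    if len(row.get("test_cases", [])) < MIN_TEST_CASES:
        return False
    # Written responses must be non-trivial; real tooling would also run
    # grammar and coherence checks at this step.
    response = row.get("written_response", "")
    if len(response.split()) < MIN_RESPONSE_WORDS:
        return False
    return True

def filter_rows(rows: Iterable[dict]) -> list[dict]:
    """Keep only rows that satisfy every quality check."""
    return [row for row in rows if passes_quality_checks(row)]
```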
Once the finalized dataset is ready, we'll deliver it through a secure transfer method, typically SFTP or a password-protected S3 bucket. How you receive your data is entirely up to you.
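As an example of the S3 option, a delivery could be retrieved with a few lines of boto3, assuming we've shared a bucket name, object key, and scoped credentials with you (all of the values below are placeholders).

```python
import boto3

# Placeholder credentials and object location, supplied at delivery time.
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)

# Download the finalized dataset to a local file.
s3.download_file(
    "example-delivery-bucket",
    "datasets/final_dataset.jsonl",
    "final_dataset.jsonl",
)
```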
Find answers to common questions about our dataset creation and model evaluation services across the SDLC.
We focus on the creation and curation of software development datasets for training and evaluating Large Language Models (LLMs). We work across languages and technology stacks, enabling us to mobilize expert software developers to produce rich, complex datasets in a variety of formats, including code completion examples, code annotations, labeling tasks, and problem-submission pairs.
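To make these formats concrete, the snippet below writes one hypothetical problem-submission pair as a JSON Lines record; the field names are illustrative only, and the exact schema is defined per engagement.

```python
import json

# A hypothetical problem-submission pair; field names vary per engagement.
record = {
    "task_type": "problem_submission_pair",
    "language": "python",
    "problem_statement": "Return the sum of all even numbers in a list.",
    "submission": "def sum_evens(nums):\n    return sum(n for n in nums if n % 2 == 0)",
    "test_cases": [
        {"input": "[1, 2, 3, 4]", "expected_output": "6"},
        {"input": "[]", "expected_output": "0"},
    ],
    "annotations": {"difficulty": "easy", "topics": ["lists", "iteration"]},
}

with open("sample_dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```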
Our dataset preparation process involves strict checks for common quality issues such as sparsity, corruption, and redundancy, including de-duplication of records. We combine human review with an automated quality review process that leverages machine learning.
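For instance, exact-duplicate removal can be as simple as hashing a canonical form of each record, as in the sketch below; near-duplicate detection in practice relies on fuzzier techniques such as MinHash. This is a minimal illustration, not a description of our internal pipeline.

```python
import hashlib
import json

def record_fingerprint(record: dict) -> str:
    """Hash a canonical JSON serialization of the record."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def deduplicate(records: list[dict]) -> list[dict]:
    """Drop exact duplicates while preserving the original order."""
    seen: set[str] = set()
    unique = []
    for record in records:
        fp = record_fingerprint(record)
        if fp not in seen:
            seen.add(fp)
            unique.append(record)
    return unique
```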
With the release of HackerRank-ASTRA, we're focused on designing and developing empirically validated metrics for evaluating models on real-world software development tasks. You can learn more about our evolving evaluation harness and metrics by visiting our HackerRank ASTRA page.
Through comprehensive evaluation and benchmarking of state-of-the-art models across languages, technology stacks, and stages of the software development lifecycle, we can rapidly pinpoint limitations in leading models' capabilities. This then guides our dataset creation efforts. You can learn more about our dataset creation methodology here.