Machine scoring meets recruiter scoring

Identifying valuable language in an interview transcript using Natural Language Understanding

Natural language understanding (NLU) is a machine learning technique that is trained to identify phrase-based language in unstructured, conversational text. This means taking raw text (for example, an interview transcript) and analysing it so that we can extract information in a format a computer can understand and use. This is a challenging problem because human language is complex, with many variations in the spoken word, and the analysis also needs to take into account the context of the domain and subject matter. It is worth noting that natural language understanding goes beyond basic text processing: it looks for phrases in context rather than matching keywords. Natural language machine learning models use semantic knowledge to recognize subject, syntax, context, language patterns, unique definitions, and intent. Using this information identification technique, the model identifies valuable phrases in an interview transcript and maps them to a predefined competency language structure that the algorithm is trained to analyze.

Competency Recognition consists of two substeps: Phrase Identification and Phrase Classification. The algorithm first identifies the valuable phrases mentioned in a transcript, and only then assigns each one to a particular class in our list of predefined competencies.

Testing the performance and accuracy of a machine algorithm with real-world data

During the interview, a recruiter listens to and observes the candidate's language and speech to analyze the candidate's fit with the job role and desired competencies. Typically, a recruiter will use a scorecard format to score a candidate across these specific measures. So how do we know that the algorithm is performing the same process as effectively and fairly as a recruiter?

Before an algorithm can be evaluated for its precision and accuracy, it needs a large amount of data: hundreds of thousands of specific, trainable data points to teach the machine the language and subject matter. This training dataset needs to consist of real-world samples of the type of data the algorithm will work with once deployed and launched. In this case, that means interview text transcripts taken from audio or video interviews.

After the training phase, the algorithm is tested to confirm that it can screen candidates at least as accurately as a recruiter, and more consistently, reducing unconscious bias and subjectivity by removing variation from recruiter to recruiter.

Validating the machine training against Recruiter scoring

So how is this validated? As a final step, machine performance is compared with actual recruiter scoring on the same interview transcripts using a construct validity process. This process demonstrates whether the scores fall within an acceptable, or better than acceptable, range of accuracy.
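As one illustration of this comparison step, the sketch below computes the Pearson correlation between machine scores and recruiter scores on the same set of transcripts. The helper name and the idea of using correlation as the agreement measure are assumptions for the example; a real construct validity study would define its own statistics and acceptance criteria.

```python
import statistics


def pearson_r(machine: list[float], recruiter: list[float]) -> float:
    # Pearson correlation: +1.0 means machine and recruiter scores move
    # together perfectly; values near 0 mean little agreement.
    mx = statistics.mean(machine)
    my = statistics.mean(recruiter)
    cov = sum((a - mx) * (b - my) for a, b in zip(machine, recruiter))
    var_x = sum((a - mx) ** 2 for a in machine)
    var_y = sum((b - my) ** 2 for b in recruiter)
    return cov / (var_x * var_y) ** 0.5
```

The same-transcript pairing matters: each machine score must be compared against the recruiter score for the identical interview, not against an aggregate.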

When working with an algorithm for automated scoring, we recommend that the recruiting team review the AI-generated scores and run this process alongside recruiter scoring for a period of time to test the accuracy of the scores and identify any outliers or differences.

Our system also supports hidden AI for hiring teams that want AI running without being influenced by its score. This data-assisted process flags any result that deviates by 20% or more from the recruiter score, prompting the recruiter to review the AI and recruiter scores together. Most teams use data-assisted scoring in the first phase of a rollout before moving to automated scoring. Hiring teams can also request a custom-trained dataset, based on their specific target competencies and their own candidates' interview transcripts, to increase the precision and relevance of the predictions.
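The 20% deviation rule can be sketched as below. The 0-100 score scale, the helper name `needs_review`, and measuring deviation relative to the recruiter's score are all assumptions made for illustration.

```python
def needs_review(ai_score: float, recruiter_score: float,
                 threshold: float = 0.20) -> bool:
    # Flag the result when the AI score deviates from the recruiter
    # score by the threshold (20%) or more, relative to the recruiter
    # score. Scores are assumed to sit on a 0-100 scale.
    if recruiter_score == 0:
        return ai_score != 0  # any nonzero AI score deviates from zero
    deviation = abs(ai_score - recruiter_score) / recruiter_score
    return deviation >= threshold


# Example: only the pairs that deviate by 20%+ are surfaced for review.
flagged = [
    (ai, rec)
    for ai, rec in [(80, 78), (55, 75), (90, 70)]
    if needs_review(ai, rec)
]
```

In a data-assisted rollout, only the flagged pairs reach the recruiter's review queue; results within the threshold pass through silently.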

This construct validity comparison is a standard validation method and should be conducted at regular three-month intervals throughout the year to ensure the algorithm continues to perform above the required precision threshold.

Finally, there are two ways for a machine model to be trained. The first is a fixed model: it is trained on a dataset and then deployed as a final, rules-based model that does not continue to learn, making predictions only from the controlled training it has received. The second is a continuously learning model, where the algorithm dynamically learns from new data as it is collected. We recommend the fixed, rules-based model to ensure a controlled dataset and to avoid introducing bias into the dataset through uncontrolled, ongoing learning.