Constructing a Highlight Classifier with an Attention-based LSTM Neural Network

Radu & Kuehne (2020). FocusVision Worldwide Inc.

Why This Source Matters

This paper does something unusual: it frames a supervised learning problem as an exercise in crystallizing tacit knowledge. The authors argue that when you train a classifier on human-curated highlights from market research video transcripts, you're not just building a text classifier; you're aggregating and operationalizing the collective expertise of domain experts.

The technical results are solid (ROC AUC 0.93-0.94), but the conceptual framing is what makes this worth returning to.

Core Hypothesis

“There is a semantic difference between a highlight and a non-highlight.”

The authors demonstrate that this difference exists and can be modeled with high accuracy.
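To make the framing concrete, here is a toy sketch of the supervised setup: transcript snippets labeled by whether a researcher clipped them, a count-vectorized baseline, and ROC AUC as the metric the paper reports. The data, pipeline, and model below are illustrative placeholders, not the paper's system.

```python
# Toy framing of the task: binary classification of transcript snippets into
# highlight (1) vs. non-highlight (0). Illustrative only; the snippets and the
# pipeline are placeholders. Note the paper found plain count vectorization
# matched TF-IDF weighting on this task.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

texts = [
    "I would absolutely pay more for this feature.",  # the kind of quote
    "That moment completely changed my mind.",        # a researcher clips
    "Um, so, where were we?",                         # filler, not clipped
    "Can you repeat the question?",
    "This solves a real problem for my team.",
    "Hold on, my connection dropped.",
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = clipped as a highlight, 0 = not

clf = make_pipeline(CountVectorizer(), LogisticRegression())
clf.fit(texts, labels)

# ROC AUC on the training data, only to show the metric the paper reports.
scores = clf.predict_proba(texts)[:, 1]
print(roc_auc_score(labels, scores))
```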

Key Findings

Technical:

  • Attention-based LSTM with custom (BlazingText) embeddings and punctuation retained achieves the best performance (a minimal sketch of this architecture follows the list)
  • Punctuation inclusion was the biggest performance driver for RNNs
  • TF-IDF weighting didn't outperform simple count vectorization; common words matter
  • Attention reduced training time by ~10x without accuracy loss
  • Standalone classifier performance (0.93-0.94 ROC AUC) degrades significantly when applied to large documents via sampling
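For orientation, here is a minimal PyTorch sketch of one common attention-based LSTM formulation. The paper pairs an LSTM with attention and BlazingText embeddings; the dimensions, the additive attention, and the randomly initialized embedding layer below are placeholder assumptions, not the authors' exact architecture.

```python
# Hedged sketch of an attention-based LSTM classifier. In practice the
# embedding layer would be initialized from pretrained (e.g. BlazingText)
# vectors; here it is random and the sizes are arbitrary placeholders.
import torch
import torch.nn as nn

class AttentionLSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)  # scores each timestep
        self.out = nn.Linear(2 * hidden_dim, 1)   # highlight logit

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))       # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time
        context = (weights * h).sum(dim=1)            # weighted sum, (B, 2H)
        return self.out(context).squeeze(-1)          # (B,) logits

# Toy usage: a batch of 4 sequences of 20 token ids.
model = AttentionLSTMClassifier()
logits = model(torch.randint(0, 10000, (4, 20)))
probs = torch.sigmoid(logits)  # probability each snippet is a highlight
```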

Methodological:

  • Removed covariate shift between clip and non-clip length distributions using a redistribution algorithm
  • Tested four sampling methods for long documents: sequential, h-score non-overlap, weighted h-score, positive summation (one possible windowing scheme is sketched below)
  • The positive-summation ("possum") method was best for recall; the h-score method was best for precision
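This note records the sampling methods' names but not their algorithms, so the following is a hedged guess at the general shape of window-based sampling: score fixed-size windows of a long transcript with a trained snippet classifier, then keep the top non-overlapping windows. `score_fn`, the window sizes, and the greedy selection are illustrative assumptions, not the paper's procedure.

```python
# Guessed illustration of applying a snippet classifier to a long document:
# slide a window over the tokens, score each window, then greedily pick the
# best non-overlapping spans (roughly in the spirit of "h-score non-overlap").
# Not the paper's algorithm; score_fn stands in for any trained classifier.
from typing import Callable, List, Tuple

def select_highlights(tokens: List[str],
                      score_fn: Callable[[List[str]], float],
                      window: int = 50,
                      stride: int = 25,
                      k: int = 3) -> List[Tuple[int, int, float]]:
    # Score every window position.
    spans = []
    for start in range(0, max(1, len(tokens) - window + 1), stride):
        chunk = tokens[start:start + window]
        spans.append((start, start + len(chunk), score_fn(chunk)))
    # Greedy non-overlapping selection of the k best-scoring spans.
    picked = []
    for span in sorted(spans, key=lambda s: s[2], reverse=True):
        if all(span[1] <= p[0] or span[0] >= p[1] for p in picked):
            picked.append(span)
        if len(picked) == k:
            break
    return sorted(picked)

# Toy usage with a dummy scorer that rewards the word "love".
doc = ("i love this product " * 30 + "um okay so " * 100).split()
print(select_highlights(doc, lambda c: c.count("love") / len(c)))
```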

Business context:

  • Industry turnaround: 2.2 hours of human labor per 1 hour of video, at $800/hour
  • Dataset: 130,000+ videos with 586,000 user-generated clips

The Tacit Knowledge Frame

The authors cite Nonaka’s knowledge creation theory:

“Tacit knowledge is highly personal and hard to formalize. Subjective insights, intuitions and hunches fall into this category of knowledge. Tacit knowledge is deeply rooted in action, procedures, routines, commitment, ideals, values and emotions.”

They argue that market researchers’ clip-selection process is driven by tacit knowledge models. Training a classifier to mimic this process “can be seen as the aggregation and implementation of all the tacit models of the market researchers.”

This reframes what ML models are: not just pattern matchers, but potentially crystallized expertise.

Extracted Content

Atoms:

Molecules:

Organisms:

Limitations Noted by Authors

  • Sampling algorithms introduce significant performance degradation
  • Time-dependent covariate shift: model performance varies by year of data
  • Only a single round of hyperparameter tuning was performed, so there is room for improvement
  • BERT/Transformer architectures not tested

Related: 06-atom—tacit-knowledge, 05-molecule—attention-mechanism-concept