Three-Subgraph Knowledge Graph for Qualitative Research

Overview

This knowledge graph model represents qualitative research data (interviews, focus groups, transcripts) so that it can be analyzed at multiple granularities.

The model consists of three loosely coupled subgraphs that can be developed and maintained semi-independently.

The Three Subgraphs

1. Business Context

Grounds each research study in its business environment.

Industry → Company → Master Project → Project → Transcript
                                         ↓
                                     Facility (geography, demographics)

This subgraph answers: Who commissioned this research? What business questions does it address? Where was it conducted?
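
As a rough illustration, the hierarchy above can be stored as child-to-parent edges and walked upward from a transcript to answer the "who" and "where" questions. Every identifier below (`t_001`, `acme_health`, `facility_1`, and so on) is invented for the example, and plain dicts stand in for whatever graph store is actually used.

```python
# Hypothetical business-context subgraph stored as child -> parent edges.
# All node names are made up for illustration.
PARENT = {
    "t_001": "project_a",                # Transcript -> Project
    "project_a": "master_project_x",     # Project -> Master Project
    "master_project_x": "acme_health",   # Master Project -> Company
    "acme_health": "healthcare",         # Company -> Industry
}
NODE_TYPE = {
    "t_001": "Transcript",
    "project_a": "Project",
    "master_project_x": "Master Project",
    "acme_health": "Company",
    "healthcare": "Industry",
}
# Facility hangs off Project, as in the diagram.
FACILITY = {"project_a": {"name": "facility_1", "geography": "Midwest US"}}


def business_context(transcript_id):
    """Walk from a transcript up to its industry, collecting each level."""
    context = {}
    node = transcript_id
    while node is not None:
        context[NODE_TYPE[node]] = node
        if node in FACILITY:
            context["Facility"] = FACILITY[node]["name"]
        node = PARENT.get(node)
    return context


ctx = business_context("t_001")
print(ctx["Company"], ctx["Facility"])
```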

2. Project Artifacts

Defines the textual entities extracted from transcripts.

Transcript → Utterance → Code
                          ↓
               ┌─────────┴─────────┐
           Noun Phrase      Named Entity

Utterances are dialog turns: sequences of words from one speaker, bounded by turns from other speakers. They’re the atomic unit for analysis.
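
A minimal sketch of that definition, assuming the transcript arrives as diarized `(speaker, text)` segments in spoken order. The input format and the `to_utterances` helper are assumptions for illustration, not part of the model itself.

```python
from itertools import groupby

# Hypothetical diarized input: (speaker, text) pairs in spoken order.
segments = [
    ("Moderator", "How do you manage your diet?"),
    ("P1", "I count carbs."),
    ("P1", "Mostly at dinner."),
    ("Moderator", "And exercise?"),
    ("P2", "I walk daily."),
]


def to_utterances(segments):
    """Merge consecutive segments from the same speaker into one dialog turn."""
    utterances = []
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        text = " ".join(seg_text for _, seg_text in group)
        utterances.append(
            {"index": len(utterances), "speaker": speaker, "text": text}
        )
    return utterances


utterances = to_utterances(segments)
for u in utterances:
    print(u["index"], u["speaker"], u["text"])
```

The two consecutive `P1` segments collapse into a single utterance because no other speaker intervenes.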

Codes are terms and phrases significant to the study’s “aboutness.” They provide the pool for tag recommendations.

3. External Knowledge

Enriches internal data with external knowledge.

Code → Resource (DBpedia URI) → Type → Hypernym

Resources link codes to external descriptions, related terms, and images. Types connect resources to standardized concept classes. Hypernyms provide parent-child relationships between concepts.
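
The chain above can be sketched as a sequence of lookups that stops at the first missing link. The tables below are hypothetical stand-ins for the output of an entity linker and a knowledge base; the URI, type, and hypernym values are illustrative and were not retrieved from DBpedia.

```python
# Hypothetical lookup tables. In practice these would be populated by an
# entity linker and a knowledge base; the values here are illustrative only.
CODE_TO_RESOURCE = {"diabetes": "http://dbpedia.org/resource/Diabetes"}
RESOURCE_TO_TYPE = {"http://dbpedia.org/resource/Diabetes": "Disease"}
TYPE_TO_HYPERNYM = {"Disease": "MedicalCondition"}  # invented parent class


def enrich(code):
    """Follow Code -> Resource -> Type -> Hypernym, stopping at the first gap."""
    chain = {"code": code}
    resource = CODE_TO_RESOURCE.get(code)
    if resource is None:
        return chain  # the code did not link to any external resource
    chain["resource"] = resource
    type_ = RESOURCE_TO_TYPE.get(resource)
    if type_ is None:
        return chain
    chain["type"] = type_
    hypernym = TYPE_TO_HYPERNYM.get(type_)
    if hypernym is not None:
        chain["hypernym"] = hypernym
    return chain


linked = enrich("diabetes")
unlinked = enrich("carb counting")
print(linked)
print(unlinked)
```

An unlinked code still comes back as a partial chain, so it stays available for analysis even without external enrichment.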

When to Use This Structure

This framework fits when:

  • You have transcribed interview or focus group data
  • Business context matters (who, why, where)
  • You want to link internal vocabulary to external knowledge bases
  • Analysis requires traversing relationships, not just searching text

Analytical Capabilities

Pattern-based querying: “What moderator utterances mention ‘diabetes’ and what were the responses?”
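
A simplified version of this query over an in-memory list of utterances is sketched below. It assumes "mention" means a case-insensitive substring match and "response" means the very next turn by a non-moderator. A graph implementation would more likely follow Utterance-to-Code edges, and the speaker labels and data are invented.

```python
# Hypothetical utterances in transcript order.
utterances = [
    {"speaker": "Moderator", "text": "How has diabetes changed your routine?"},
    {"speaker": "P1", "text": "I check my sugar every morning."},
    {"speaker": "Moderator", "text": "What about exercise?"},
    {"speaker": "P2", "text": "I walk daily."},
]


def moderator_mentions_with_responses(utterances, term, moderator="Moderator"):
    """Pair each moderator utterance containing `term` with the next turn."""
    pairs = []
    for current, following in zip(utterances, utterances[1:]):
        is_match = (
            current["speaker"] == moderator
            and term.lower() in current["text"].lower()
        )
        # Only count the next turn as a response if someone else is speaking.
        if is_match and following["speaker"] != moderator:
            pairs.append((current["text"], following["text"]))
    return pairs


pairs = moderator_mentions_with_responses(utterances, "diabetes")
for question, response in pairs:
    print(question, "->", response)
```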

Cross-document analysis: “Top 5 named entities across all transcripts in this study?”
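
Assuming named-entity mentions have already been extracted per transcript, the cross-document count reduces to a frequency tally. The transcript IDs and entity mentions below are made up for the example.

```python
from collections import Counter

# Hypothetical named-entity mentions, keyed by transcript ID.
entities_by_transcript = {
    "t_001": ["Metformin", "Mayo Clinic", "Metformin"],
    "t_002": ["Metformin", "Medicare", "Mayo Clinic"],
    "t_003": ["Medicare", "Fitbit", "Metformin", "Ozempic", "Walgreens"],
}


def top_entities(entities_by_transcript, n=5):
    """Count entity mentions across all transcripts and return the top n."""
    counts = Counter()
    for mentions in entities_by_transcript.values():
        counts.update(mentions)
    return counts.most_common(n)


top = top_entities(entities_by_transcript)
for entity, count in top:
    print(entity, count)
```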

Code recommendation: Filter codes by entity link confidence, score by graph centrality (Degree, Betweenness, PageRank), recommend top N.
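
The filter-score-rank pipeline can be sketched with degree centrality alone, which needs no graph library. The confidence scores, the co-occurrence edges, the 0.5 threshold, and the choice to compute centrality over the full graph before filtering are all assumptions made for illustration. Betweenness or PageRank could be swapped in as the scoring function.

```python
from collections import defaultdict

# Hypothetical entity-link confidence per code, in [0, 1].
link_confidence = {"diabetes": 0.95, "insulin": 0.90, "diet": 0.80, "stuff": 0.20}

# Hypothetical undirected edges between codes (e.g. co-occurrence).
edges = [
    ("diabetes", "insulin"),
    ("diabetes", "diet"),
    ("diabetes", "stuff"),
    ("insulin", "stuff"),
]


def degree_centrality(edges):
    """Degree of each node divided by (number of nodes - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def recommend_codes(link_confidence, edges, min_confidence=0.5, top_n=2):
    """Keep confidently linked codes, rank them by centrality, return top N."""
    centrality = degree_centrality(edges)
    candidates = [c for c, conf in link_confidence.items() if conf >= min_confidence]
    ranked = sorted(candidates, key=lambda c: centrality.get(c, 0.0), reverse=True)
    return ranked[:top_n]


recommended = recommend_codes(link_confidence, edges)
print(recommended)
```

Here `stuff` is well connected but is dropped by the confidence filter, which is the point of filtering before ranking.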

Limitations

  • Requires entity linking infrastructure (e.g., DBpedia Spotlight)
  • Quality depends on transcription and diarization upstream
  • Graph queries require technical skill (though UIs can abstract this)
  • 61% of codes may not link successfully; an estimated ~13% are important misses

Related: 06-molecule—knowledge-graph-construction, 06-atom—entity-linking