Large policy and research organizations often face the same problem: they have extensive document collections, but the relationships inside those documents are difficult to see as a system. Reading isolated reports may explain individual topics well while still hiding the structure of how those topics influence one another.
That is where a combination of NLP and data visualization becomes useful.
The challenge
In this project, the goal was to identify relationships between urban development and the broader set of UN Sustainable Development Goals (SDGs) across a large corpus of documents.
The practical difficulty was not just finding mentions of SDGs, but recognizing:
- which goal areas were connected
- what type of relationship was described
- how often those links appeared
- which documents contributed the most signal
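As a minimal illustration of that discovery problem, a seed-term co-occurrence scan can already answer two of these questions: which goal areas appear together, and which documents contribute the most signal. The goal names, seed terms, and passages below are hypothetical, not the project's actual vocabulary:

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical seed terms for a few goal areas (illustrative only)
SEED_TERMS = {
    "clean_water": ["water", "sanitation"],
    "sustainable_cities": ["urban", "housing", "transport"],
    "climate_action": ["climate", "emissions"],
}

def goals_in(passage):
    """Return the set of goal areas whose seed terms appear in a passage."""
    text = passage.lower()
    return {
        goal for goal, terms in SEED_TERMS.items()
        if any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)
    }

def cooccurrence(docs):
    """Count goal-pair co-occurrences and per-document signal strength."""
    pair_counts, doc_signal = Counter(), Counter()
    for doc_id, passages in docs.items():
        for passage in passages:
            found = goals_in(passage)
            for pair in combinations(sorted(found), 2):
                pair_counts[pair] += 1
                doc_signal[doc_id] += 1
    return pair_counts, doc_signal
```

Here `pair_counts` answers "which goal areas were connected" and "how often", while `doc_signal` ranks documents by how many linked passages they contribute. What this sketch cannot do is the harder part the list above names: typing the relationship, which needs actual classification logic rather than co-occurrence alone.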
This is a good example of NLP being used as a discovery tool rather than as a generic classification exercise.
The pipeline
The underlying workflow can be thought of as five stages:
- define domain concepts and seed terms
- identify relevant passages in the document corpus
- classify relationships between concepts
- aggregate those relationships across reports
- visualize the resulting graph or network
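The five stages can be sketched end to end. Everything specific here, the concept vocabulary, the cue words, the relationship labels, is an illustrative assumption rather than the original implementation, which the article deliberately leaves tool-agnostic:

```python
import re
from collections import Counter

# Stage 1: domain concepts and seed terms (hypothetical)
CONCEPTS = {
    "urban_development": ["urban", "city", "housing"],
    "health": ["health", "disease"],
    "education": ["education", "school"],
}

# Stage 3: crude relationship typing from cue words (illustrative)
CUES = {"supports": "positive", "improves": "positive",
        "undermines": "negative", "threatens": "negative"}

def find_concepts(sentence):
    """Stage 2 helper: which concepts does this sentence touch?"""
    text = sentence.lower()
    return [c for c, terms in CONCEPTS.items()
            if any(t in text for t in terms)]

def classify(sentence):
    """Stage 3: label the relationship expressed in a sentence."""
    text = sentence.lower()
    for cue, label in CUES.items():
        if cue in text:
            return label
    return "mentions"

def extract_edges(corpus):
    """Stages 2-4: find relevant sentences, classify, aggregate."""
    edges = Counter()
    for doc in corpus:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            concepts = find_concepts(sentence)
            if len(concepts) >= 2:
                rel = classify(sentence)
                for i in range(len(concepts) - 1):
                    edges[(concepts[i], concepts[i + 1], rel)] += 1
    return edges  # Stage 5: feed these weighted edges to a graph layout
```

A production version would swap the keyword lookups for embeddings or a trained classifier, but the stage boundaries, and therefore the architecture, stay the same.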
That kind of pipeline is still relevant today for policy, research, compliance, and knowledge-management use cases.
Why the combination matters
NLP without visualization can produce a large volume of extracted relationships that are difficult to interpret. Visualization without strong extraction logic often produces attractive but weak diagrams.
Used together, they help organizations:
- move from document reading to system-level pattern discovery
- surface clusters and gaps that are hard to see manually
- compare how different documents or sources contribute to the picture
- communicate complex findings to non-technical stakeholders
This is especially valuable in policy environments where the system is inherently interconnected.
The main technical idea
The original solution combined:
- keyword and concept expansion
- text preprocessing and normalization
- relationship extraction logic
- aggregation across documents
- network-style visualization
The specific tooling can change over time, but the architecture remains useful: extract signal from language, structure it, then make it explorable.
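For the last step, making the structure explorable, one common option is to emit a node-link JSON file that network-visualization tools can load. The exact shape below is an assumption, patterned on D3-style node-link data, not the format the original project used:

```python
import json

def to_node_link(edge_weights):
    """Convert {(source, target): weight} into a D3-style node-link dict."""
    nodes = sorted({n for pair in edge_weights for n in pair})
    index = {n: i for i, n in enumerate(nodes)}
    return {
        "nodes": [{"id": n} for n in nodes],
        "links": [
            {"source": index[s], "target": index[t], "weight": w}
            for (s, t), w in edge_weights.items()
        ],
    }

# Hypothetical aggregated edge weights from the extraction stage
graph = to_node_link({("urban_development", "health"): 3,
                      ("urban_development", "education"): 1})
print(json.dumps(graph, indent=2))
```

Keeping the exchange format this plain is what lets the tooling change over time: any extractor that can produce weighted edges, and any viewer that can read node-link JSON, slots into the same architecture.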
What this kind of system is good for
A relationship-mapping workflow like this is useful in many settings beyond SDGs:
- policy analysis
- compliance and regulation mapping
- scientific literature review
- enterprise knowledge extraction
- strategy and market landscape analysis
The common pattern is a large text corpus where the valuable output is the relationship map rather than a single label.
Conclusion
This project remains a useful example of how NLP and data visualization can work together to make complex document ecosystems more legible. The real value is not in any one extraction method. It is in turning scattered textual evidence into a structure people can actually reason about.
That is still one of the strongest reasons to combine NLP with visual analysis today.
Need Help Turning Machine Learning Ideas Into Production Systems?
ActiveWizards helps teams design practical machine learning, NLP, and computer vision systems that can move from prototype to production.