Projects
Skrutable
My first and most popular project. I wanted a minimalistic but effective workspace where I could assemble in one place all the Sanskrit text-processing functions that mattered to me, which turned out to be transliteration, meter-related calculations, and word splitting. I also added features that have proved useful in my own academic work, like meter-agnostic scansion information. It’s powerful enough to process large amounts of text, yet flexible enough for one-off, day-to-day tasks.
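To give a flavor of the simplest of these functions, here is a minimal transliteration sketch. It is not Skrutable’s own code or API: the function name `iast_to_hk` and the mapping table are illustrative assumptions, and it only handles lowercase IAST going to Harvard-Kyoto.

```python
import unicodedata

# Hypothetical helper, not part of Skrutable: lowercase IAST -> Harvard-Kyoto.
# Keys are NFC-normalized so each one is a single code point.
_IAST_TO_HK_PAIRS = {
    "ā": "A", "ī": "I", "ū": "U",
    "ṛ": "R", "ṝ": "RR", "ḷ": "lR",
    "ṅ": "G", "ñ": "J",
    "ṭ": "T", "ḍ": "D", "ṇ": "N",
    "ś": "z", "ṣ": "S",
    "ṃ": "M", "ḥ": "H",
}
IAST_TO_HK = {
    ord(unicodedata.normalize("NFC", iast)): hk
    for iast, hk in _IAST_TO_HK_PAIRS.items()
}


def iast_to_hk(text):
    """Transliterate lowercase IAST text to Harvard-Kyoto.

    The input is NFC-normalized first so that letters typed with
    combining diacritics match the precomposed keys in the table.
    Characters without an entry (plain ASCII letters, spaces,
    punctuation) pass through unchanged.
    """
    return unicodedata.normalize("NFC", text).translate(IAST_TO_HK)


print(iast_to_hk("dharmakṣetre kurukṣetre"))
```

Because Harvard-Kyoto gives capital letters their own meaning, uppercase IAST input would need separate handling, which this sketch leaves out.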
Mentioned in blog posts: Skrutable
Pramāṇa NLP
This curated corpus of Sanskrit philosophy texts is my model of simple but informative text digitization for facilitating NLP work. Like my other work, it’s open source and licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, so go ahead, take a look, and download it for your own use if you want.
Mentioned in blog posts: Pramāṇa NLP
Vātāyana
The digital centerpiece of my dissertation work, this text-mining system builds on the Pramāṇa NLP corpus and presents text and insights in an interactive front-end. It uses LDA topic modeling, TF-IDF, and local alignment to automatically find parallel passages, and a database of pre-calculations makes it possible to get an intertextuality summary for even an entire work in seconds.
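As a rough illustration of how two of these techniques can work together, the sketch below scores passages with TF-IDF cosine similarity and then runs a Smith-Waterman local alignment to pull out the shared span. It is not Vātāyana’s actual code: the function names, the whitespace tokenization, the scoring parameters, and the sample strings are all assumptions, and the LDA step is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per whitespace-tokenized document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # term frequency times a smoothed inverse document frequency
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman over two token lists.

    Returns the best local score and the aligned slice of each input.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + step,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]


# Illustrative passages only, not drawn from the corpus.
passages = [
    "atha yat sat tat kṣaṇikam iti",
    "tasmāt yat sat tat kṣaṇikam eva",
]
vectors = tfidf_vectors(passages)
print(cosine(vectors[0], vectors[1]))
print(local_alignment(passages[0].split(), passages[1].split()))
```

The cheap vector comparison narrows the field to likely candidates, so the quadratic-time alignment only has to run on a small number of passage pairs.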
Mentioned in blog posts: Vātāyana
Pāṇḍitya
I’m a great admirer of the Pandit Prosopographical Database of Indic Texts, which consolidates extensive information on Sanskrit authors, their works, and their interconnections. Pāṇḍitya builds on Pandit by leveraging its data to generate interactive network visualizations of these authors and works. Users can explore entities of interest via autocomplete dropdown menus, customize and resize graphs as needed, and interact with nodes to reposition them, recenter the graph, or navigate directly to entries in Pandit.