Projects

Skrutable

My first and most popular project. I wanted a minimalistic but effective workspace where I could assemble in one place all the Sanskrit text-processing functions that mattered to me, which turned out to be transliteration, meter-related calculations, and word splitting. I also added features that have proved useful in my own academic work, like meter-agnostic scansion information. It’s powerful enough to process large amounts of text, but also flexible enough for one-off, day-to-day use.

Mentioned in blog posts: Skrutable

Pramāṇa NLP

This curated corpus of Sanskrit philosophy texts is my model of simple but informative text digitization for facilitating NLP work. Like my other work, it’s open source and licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, so go ahead, take a look, and download it for your own use if you want.

Mentioned in blog posts: Pramāṇa NLP

Vātāyana

The digital centerpiece of my dissertation work, this text-mining system builds on the Pramāṇa NLP corpus and presents text and insights in an interactive front-end. It uses LDA topic modeling, TF-IDF, and local alignment to automatically find parallel passages, and a database of pre-calculations makes it possible to get an intertextuality summary for even an entire work in seconds.
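
To give a feel for the local-alignment step, here is a minimal sketch of Smith-Waterman alignment over word tokens. It is not Vātāyana's actual code: the function name `local_align`, the scoring values, and the toy token lists are all assumptions made for illustration, and the real system may tokenize, score, and pre-filter candidates (e.g., via topic modeling and TF-IDF) quite differently.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two token lists.

    Returns (best_score, segment_of_a, segment_of_b).
    The scoring values are illustrative assumptions, not tuned parameters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/column stay 0
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score reaches 0.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end_i], b[j:end_j]


# Toy passages (invented): three shared tokens surrounded by unrelated ones.
passage_a = "x y dharma artha moksa z".split()
passage_b = "p dharma artha moksa q r".split()

score, seg_a, seg_b = local_align(passage_a, passage_b)
print(score, seg_a, seg_b)
```

Because the dynamic-programming table grows with the product of the two passage lengths, a cheaper filter to shortlist candidate pairs before aligning them is a natural design choice.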

Mentioned in blog posts: Vātāyana

Pandit Grapher

A curious project that I think has real potential, despite its current lack of a web interface. The PANDiT project brought together a great deal of “prosopographical” information (e.g., names of authors, names of works, and connections between them) in a machine-actionable form, but it had no graphical component. So here I 1) do some simple preprocessing of the PANDiT database, 2) use networkx to create a graph structure for either the whole network or a subset of interest, and then 3) output this for use with better visualization software like Gephi, as pictured here. It only takes a few seconds, and then you have a nice visual aid for understanding the network around author X or work Y, radiating outward for N steps.
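
The three steps can be sketched roughly as follows. This is only an illustration, not the project's actual code: the records, the `relation` attribute, and the helper names `build_graph` and `neighborhood` are invented here, and the real PANDiT data has its own structure. The networkx calls `ego_graph` and `write_gexf` are real; GEXF is a format Gephi can open.

```python
import tempfile
from pathlib import Path

import networkx as nx

# Hypothetical, already-preprocessed records: (entity, entity, relation).
# Real PANDiT data would need its own preprocessing to reach this shape.
records = [
    ("Author A", "Work 1", "wrote"),
    ("Author A", "Work 2", "wrote"),
    ("Work 3", "Work 2", "comments on"),
    ("Author B", "Work 3", "wrote"),
    ("Work 4", "Work 3", "comments on"),
    ("Author C", "Work 4", "wrote"),
]


def build_graph(rows):
    """Build an undirected graph with the relation stored on each edge."""
    graph = nx.Graph()
    for source, target, relation in rows:
        graph.add_edge(source, target, relation=relation)
    return graph


def neighborhood(graph, center, steps):
    """Return the subgraph within `steps` hops of `center`."""
    return nx.ego_graph(graph, center, radius=steps)


graph = build_graph(records)
subgraph = neighborhood(graph, "Author A", steps=2)

# Write the subset to a GEXF file that can then be opened in Gephi.
out_path = Path(tempfile.gettempdir()) / "author_a_neighborhood.gexf"
nx.write_gexf(subgraph, str(out_path))

print(sorted(subgraph.nodes))
```

With `steps=2`, Author B (three hops from Author A) and everything beyond are left out, which is how the view stays focused on the neighborhood of interest.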