Classify is a tool that lets you easily run Computer Vision or Natural Language services over your own data. All you need is an account with either Microsoft or Google with the relevant services enabled. See the account setup instructions for Google and Microsoft.
The tool comes with two commands, “classify images” and “classify text”. The first labels and describes a folder full of images and saves the results to a CSV file. Image descriptions are currently only available through Microsoft’s services, but both vendors return labels relevant to the content of your images.
./classify images --path=/path/to/images \
--vendor=microsoft \
--microsoft-key="INSERT KEY" \
--microsoft-endpoint="INSERT ENDPOINT" \
--output-file="my_labels.csv"
The results are not bad but would require some checking. They would be very useful as a first pass over new, unlabeled content.
The second command is designed to be used on database exports as part of a processing pipeline to clean and enrich the data.
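Since the labels need a human check, one way to make that manageable is to review a regular sample of the output rather than every row. The helper below is a minimal sketch, not part of classify: the name spot_check is hypothetical, and it assumes the CSV has a header row and one record per line (no line breaks inside quoted fields).

```shell
# Hypothetical helper (not part of classify): print the header row plus
# every k-th data row of a CSV so a person can spot-check the labels.
# Assumes a header row and one record per line.
spot_check() {
  file="$1"
  k="${2:-10}"   # sampling step; defaults to every 10th data row
  # NR == 1 keeps the header; data rows start at NR == 2.
  awk -v k="$k" 'NR == 1 || (NR - 2) % k == 0' "$file"
}
```

For example, spot_check my_labels.csv 25 would print the header and then the 1st, 26th, 51st (and so on) labelled rows for review.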
./classify text --csv="./samples/samples.csv" \
--id-column="Accession Number" \
--classify-column="Description"
This adds the fields returned by Google’s Natural Language API, including identified persons, organisations, locations, events, etc.
This is a quick proof of concept, but it could easily be adapted to work with any data source. The tool is available on GitHub: https://github.com/rowead/classifier
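As a pipeline step, the same command could be run over a whole folder of exports. The loop below is only a sketch under stated assumptions: the function name classify_exports and the CLASSIFY variable are hypothetical, every export is assumed to share the "Accession Number" and "Description" columns, and only the flags shown above are used. Where the enriched output is written, and how Google credentials are supplied, are left to the tool.

```shell
# Path to the classify binary; override CLASSIFY to point elsewhere.
CLASSIFY="${CLASSIFY:-./classify}"

# Hypothetical wrapper: run "classify text" on every .csv file in a folder.
# Assumes all exports use the same id and description column names.
classify_exports() {
  export_dir="$1"
  for csv in "$export_dir"/*.csv; do
    [ -f "$csv" ] || continue          # skip if the glob matched nothing
    echo "Classifying $csv" >&2
    "$CLASSIFY" text --csv="$csv" \
      --id-column="Accession Number" \
      --classify-column="Description" || return 1   # stop on first failure
  done
}
```

Calling classify_exports ./samples would process each export in turn and stop at the first one that fails, which keeps a bad file from going unnoticed in a longer run.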