Image-Match: Open-source scalable reverse image search

danso · on March 9, 2016

This is sweet... Image matching is one of those functions that external services do extremely well for consumers (e.g. Google Image Search and TinEye), but not for those who need such a service in a private domain, such as an in-house photo library. I've used pHash for comparisons and have a decent idea of how to build my own classifiers, but pretty much no idea how to do reverse image matching efficiently and in a structured way.
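The brute-force version of pHash-style matching is easy to write, and it shows where the efficiency problem comes from: every query gets compared against every stored hash. Here is a minimal sketch. It assumes 64-bit perceptual hashes have already been computed as Python integers. The file names and hash values are made up, and the 10-bit threshold is only illustrative.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def linear_scan(query: int, library: dict, max_bits: int = 10) -> list:
    """Brute force: compare the query against every stored hash, keep close ones."""
    hits = [(name, hamming_distance(query, h)) for name, h in library.items()]
    return sorted((hit for hit in hits if hit[1] <= max_bits), key=lambda hit: hit[1])


# Hypothetical hashes for three images in a private photo library.
library = {
    "a.jpg": 0xF0F0F0F0F0F0F0F0,
    "b.jpg": 0xF0F0F0F0F0F0F0F1,  # differs from a.jpg in one bit
    "c.jpg": 0x0F0F0F0F0F0F0F0F,  # differs from a.jpg in all 64 bits
}
print(linear_scan(0xF0F0F0F0F0F0F0F0, library))  # [('a.jpg', 0), ('b.jpg', 1)]
```

This costs O(N) per query. Indexing schemes such as LSH, or term-based search in a search engine, try to avoid exactly that cost.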

FWIW, John Resig uses pastec for his work:

http://ejohn.org/blog/image-similarity-search-wanted/

http://ryanfb.github.io/etc/2015/11/03/finding_near-matches_...

gobengo · on March 9, 2016

The Wikipedia page on LSH (locality-sensitive hashing) is a fun and relevant read: https://en.wikipedia.org/wiki/Locality-sensitive_hashing
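Here is how LSH looks for binary hashes like pHash output. Bit-sampling LSH picks random bit positions, and hashes that agree on all sampled bits land in the same bucket. Only bucket-mates are then compared in full. This is a minimal sketch that assumes 64-bit integer hashes. The class name and the table and bit counts are hypothetical and untuned. It is not how image-match itself indexes signatures.

```python
import random
from collections import defaultdict


class BitSamplingLSH:
    """Bit-sampling LSH for Hamming distance over fixed-width integer hashes."""

    def __init__(self, n_bits=64, n_tables=8, bits_per_table=16, seed=0):
        rng = random.Random(seed)
        # Each table samples its own random subset of bit positions.
        self.positions = [rng.sample(range(n_bits), bits_per_table) for _ in range(n_tables)]
        self.tables = [defaultdict(set) for _ in range(n_tables)]

    @staticmethod
    def _key(h, positions):
        # The bucket key is the tuple of the sampled bit values.
        return tuple((h >> p) & 1 for p in positions)

    def add(self, name, h):
        for positions, table in zip(self.positions, self.tables):
            table[self._key(h, positions)].add(name)

    def candidates(self, h):
        # Union of matching buckets across all tables; a near-duplicate only
        # needs to agree on the sampled bits of one table to be returned.
        found = set()
        for positions, table in zip(self.positions, self.tables):
            found |= table.get(self._key(h, positions), set())
        return found


x = 0xF0F0F0F0F0F0F0F0
index = BitSamplingLSH()
index.add("a.jpg", x)
index.add("b.jpg", x ^ 1)                   # one bit flipped
index.add("c.jpg", x ^ 0xFFFFFFFFFFFFFFFF)  # every bit flipped
print(index.candidates(x))
```

An exact match ("a.jpg") is always returned, and the full complement ("c.jpg") never is. "b.jpg" is missed only if every table happens to sample bit 0, which is unlikely with these settings. More tables raise recall, and more bits per table make each bucket more selective.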

Here's a real-world use case http://blog.livefyre.com/architecting-sidenotes/

richmarr · on March 10, 2016

You might want to look at MoreLikeThis queries to boost performance. I worked on a proprietary version of this, and at the time Lucene's performance dropped off nearly linearly with the number of query terms.

We used MoreLikeThis to reduce our queries to the 30-40 most statistically interesting terms. The one hiccup was an issue in Lucene [1] where the term cache wasn't operating properly. We added our own image query term cache and a custom MLT query to leverage it, which gave us a 10x speedup over any other method we tried.

The interestingness of the terms is assessed on a per-term basis, though, so you might see a relevance drop for some types of image if you set MoreLikeThis to use too few terms.
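The term-pruning idea can be shown outside Lucene. Score each query term by how rare it is in the index, using an IDF-style weight roughly in the spirit of how MoreLikeThis ranks "interesting" terms, and keep only the top k. This is a minimal sketch. It assumes each image signature has already been turned into a list of string terms and that document frequencies are available. The function name and the smoothed-idf formula are illustrative, not Lucene's exact scoring.

```python
import math
from collections import Counter


def select_terms(query_terms, doc_freq, n_docs, k=30):
    """Keep the k query terms with the highest tf * idf score (illustrative scoring)."""
    tf = Counter(query_terms)

    def score(term):
        # Smoothed idf: terms that appear in fewer indexed images weigh more.
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1
        return tf[term] * idf

    return sorted(tf, key=score, reverse=True)[:k]


# Hypothetical document frequencies for three signature terms in a 1000-image index.
doc_freq = {"w0_a": 900, "w1_b": 5, "w2_c": 50}
print(select_terms(["w0_a", "w1_b", "w2_c"], doc_freq, n_docs=1000, k=2))  # ['w1_b', 'w2_c']
```

The caveat above maps directly onto `k`. Each term is scored in isolation, so a very small `k` can drop terms that matter for some kinds of image even though they look individually common.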

[1] https://issues.apache.org/jira/browse/LUCENE-1690

rhsimplex · on March 10, 2016

Thank you for the suggestion. I actually did try restricting the terms by measuring correlation between columns -- the idea being that more discriminating terms should be searched first. This did result in modest speedups.
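One possible reading of "measuring correlation between columns": treat each signature word position as a column and compute pairwise correlation across a sample of indexed images. Then search first the columns least correlated with the others, on the theory that they carry the most independent information. This is a minimal pure-Python sketch of that reading. The ranking rule (lowest mean absolute Pearson correlation) and the function names are assumptions, not necessarily what image-match tried.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def rank_columns(rows):
    """Order column indices from least to most correlated with the other columns."""
    cols = list(zip(*rows))
    n = len(cols)

    def mean_abs_corr(i):
        return sum(abs(pearson(cols[i], cols[j])) for j in range(n) if j != i) / (n - 1)

    return sorted(range(n), key=mean_abs_corr)


# Hypothetical word values: columns 0 and 1 are redundant, column 2 is not.
rows = [(1, 1, 5), (2, 2, 1), (3, 3, 4), (4, 4, 2)]
print(rank_columns(rows))  # column 2 (the least redundant one) is ranked first
```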

Fortunately or unfortunately, we were already achieving pretty good speed with Elasticsearch, so we didn't implement it. However, it didn't occur to me to try a MoreLikeThis query, which should be even simpler. I will look into it!
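For reference, Elasticsearch exposes MoreLikeThis as a `more_like_this` query. Its `max_query_terms` parameter caps how many selected terms end up in the query. This minimal sketch builds the request body as a Python dict. It hypothetically assumes each image's signature words are indexed as whitespace-separated tokens in a single text field named `signature_words`. image-match's actual mapping may differ, and the helper name is made up.

```python
import json


def mlt_query(words, field="signature_words", max_terms=40):
    """Build an Elasticsearch more_like_this query body from signature words."""
    return {
        "query": {
            "more_like_this": {
                "fields": [field],  # hypothetical field holding the words as tokens
                "like": " ".join(words),
                # Keep only the most "interesting" terms, e.g. 30-40 as suggested above.
                "max_query_terms": max_terms,
                # Each signature word occurs once per image, so relax the defaults
                # that would otherwise filter out rare or single-occurrence terms.
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }


print(json.dumps(mlt_query(["w0_12ab", "w1_99cd"]), indent=2))
```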

richmarr · on March 10, 2016

Cool. Impressive project by the way; I forgot to say that before.

I tried something similar, but with a different approach: creating compound words, a bit like n-grams. I didn't get it working, as it was a side project and I couldn't commit enough time.
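The compound-word idea can be sketched as shingling. Each run of n adjacent signature words is joined into one term, so matching a compound term means several positions agree at once. This minimal sketch assumes the signature words are strings in a fixed order. The position-prefixed naming scheme and the function name are hypothetical.

```python
def compound_terms(words, n=2):
    """Join each run of n adjacent words into one position-tagged compound term."""
    return [f"{i}:" + "_".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(compound_terms(["ab", "cd", "ef"]))  # ['0:ab_cd', '1:cd_ef']
```

The trade-off is selectivity against robustness. Compound terms are rarer, so each match says more and posting lists are shorter. But a single changed word breaks every compound that contains it, so near-duplicates can lose many matching terms at once.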

mkoryak · on March 9, 2016

Last time I looked into doing content-based image matching in Node.js, the best I could find was a node-phash fork that was difficult to get working on OS X.

Does anyone know if this has changed since?

pilooch · on March 9, 2016

An implementation of image similarity search based on deep convnets can be found at https://github.com/beniz/deepdetect/tree/master/demo/imgsear...

th0br0 · on March 9, 2016

Uhh... that's a fork. This is the original repo: https://github.com/rhsimplex/image-match

michaelbuckbee · on March 9, 2016

It appears that, since your comment was posted, the original repo's README has been updated with a notice stating that it is no longer maintained and that OP's linked repo is the correct one.

rhsimplex · on March 10, 2016

Hey, author here. We were using the code internally and it was kind of a mess, so I fixed it up under my own repository before forking it into ascribe's repository.

Rest assured, this is the correct one!