Towards a Visipedia: Combining Computer Vision and Communities of Experts (Thesis)

, 2019

Category: Computer Vision

Overall Rating

2.6/5 (18/35 pts)

The thesis correctly identifies persistent, real-world challenges in scaling computer vision (long-tail data, efficient annotation, model deployment).
It offers valuable case studies with domain communities (birders, naturalists) and proposes concepts such as explicit worker modeling, online data collection, and leveraging taxonomic structure for model efficiency.
However, the specific technical methods and empirical analyses presented are largely reflective of the computer vision and crowdsourcing paradigms of the mid-2010s.

The thesis proposes an online, sequential framework that combines human and machine inputs with models of worker skill and image difficulty, targeting complex annotations such as bounding boxes and keypoints.
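The core loop of such an online framework (start from a machine prediction, request human labels one at a time, and stop once the combined estimate is confident) can be illustrated with a deliberately simplified sketch. It assumes binary labels and a single known accuracy per worker, and it omits image difficulty and structured annotations like boxes or keypoints. The names `posterior_positive` and `annotate_online`, the 0.95 threshold, and the label budget are all hypothetical, not the thesis's own method.

```python
import math

# Hypothetical helpers: binary labels only, for illustration.


def posterior_positive(prior, labels):
    """Posterior P(y = 1) from a machine prior and (label, skill) pairs.

    Assumes each worker reports the true binary label with probability `skill`
    (0 < skill < 1), independently of the other workers.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for label, skill in labels:
        # A vote moves the log-odds by log(skill / (1 - skill)) toward its label.
        step = math.log(skill / (1.0 - skill))
        log_odds += step if label == 1 else -step
    return 1.0 / (1.0 + math.exp(-log_odds))


def annotate_online(prior, worker_stream, threshold=0.95, max_labels=5):
    """Request worker labels one at a time; stop once the posterior is confident.

    Returns the final posterior P(y = 1) and the number of labels used.
    """
    collected = []
    p = prior
    for label, skill in worker_stream:
        if max(p, 1.0 - p) >= threshold or len(collected) >= max_labels:
            break  # confident enough (or out of budget): skip remaining workers
        collected.append((label, skill))
        p = posterior_positive(prior, collected)
    return p, len(collected)


# A confident machine prediction needs no human labels; an uncertain one does.
print(annotate_online(0.97, [(1, 0.9), (1, 0.9)]))  # (0.97, 0)
print(annotate_online(0.5, [(1, 0.9), (1, 0.9), (0, 0.9)]))
```

In the second call, two agreeing workers with assumed accuracy 0.9 push the posterior above the threshold, so the third worker is never queried; this is the sense in which machine and human inputs trade off to save annotation effort.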
The thesis's main strength lies in explicitly modeling taxonomic relationships and the dependencies between worker labels in a large-scale multiclass setting.
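One way taxonomy can enter a multiclass worker model is through the worker's confusion matrix: a worker who misidentifies a species is more likely to name a sibling in the same genus than an unrelated species. The sketch below assumes exactly that structure, with fixed, hypothetical probabilities (`p_correct`, `p_sibling`) and a made-up four-class taxonomy. It also assumes conditionally independent workers, so it does not capture the dependency between worker labels that the thesis models, and it does not show how worker parameters are learned.

```python
import numpy as np

# Hypothetical toy taxonomy: parent (genus) of each class index.
# 0 mallard, 1 northern pintail -> Anas; 2 wood duck -> Aix; 3 redhead -> Aythya
PARENT = ["Anas", "Anas", "Aix", "Aythya"]


def worker_confusion(parent, p_correct=0.8, p_sibling=0.15):
    """M[y, l] = P(worker answers l | true class y) under an assumed model.

    The worker is correct with probability p_correct, picks a taxonomic sibling
    (same parent) with total probability p_sibling, and spreads the remaining
    mass uniformly over classes with a different parent.
    """
    k = len(parent)
    M = np.zeros((k, k))
    for y in range(k):
        siblings = [l for l in range(k) if l != y and parent[l] == parent[y]]
        others = [l for l in range(k) if parent[l] != parent[y]]
        M[y, y] = p_correct
        rest = 1.0 - p_correct
        if siblings:
            M[y, siblings] = p_sibling / len(siblings)
            rest -= p_sibling
        if others:
            M[y, others] = rest / len(others)
    # Renormalize rows in case a group was empty for some class.
    return M / M.sum(axis=1, keepdims=True)


def class_posterior(prior, labels, confusions):
    """P(y | labels) proportional to prior[y] * prod_j M_j[y, l_j]."""
    post = np.asarray(prior, dtype=float).copy()
    for label, M in zip(labels, confusions):
        post *= M[:, label]  # likelihood of this answer under each true class
    return post / post.sum()


M = worker_confusion(PARENT)
post = class_posterior(np.full(4, 0.25), labels=[0, 0], confusions=[M, M])
print(post.round(3))
```

With two workers both answering "mallard", the pintail (a sibling) keeps noticeably more posterior mass than the wood duck or redhead, which is the kind of taxonomy-aware uncertainty a flat confusion model would miss.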
The Taxonomic Parameter Sharing (TPS) concept, which uses the inherent taxonomic structure of the output space to inform parameter sharing in the final layer, is a prime example of leveraging domain knowledge for model efficiency.
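To make the idea of taxonomy-informed parameter sharing concrete, here is one simple parameterization, offered only as an assumption: each leaf class's final-layer weight vector is the sum of vectors attached to the nodes on its root-to-leaf path, so species in the same genus share most of their parameters. The thesis's actual TPS formulation, and the mechanism by which it improves efficiency, may differ; in this toy version the parameter count is not reduced (six node vectors serve three classes). The taxonomy and all function names are hypothetical.

```python
import numpy as np

# Hypothetical toy taxonomy: each leaf class lists the nodes on its root-to-leaf path.
TAXONOMY = {
    "mallard": ["Anatidae", "Anas", "mallard"],
    "northern_pintail": ["Anatidae", "Anas", "northern_pintail"],
    "wood_duck": ["Anatidae", "Aix", "wood_duck"],
}


def build_node_index(taxonomy):
    """Assign a row index to every taxonomy node (internal and leaf)."""
    nodes = sorted({node for path in taxonomy.values() for node in path})
    return {node: i for i, node in enumerate(nodes)}


def path_matrix(taxonomy, node_index):
    """A[c, j] = 1 if node j lies on the path of leaf class c, else 0."""
    classes = list(taxonomy)
    A = np.zeros((len(classes), len(node_index)))
    for c, name in enumerate(classes):
        for node in taxonomy[name]:
            A[c, node_index[node]] = 1.0
    return classes, A


def class_logits(features, node_weights, A):
    """Final layer whose per-class weights are sums of shared node weights."""
    W = A @ node_weights   # (num_classes, dim): ancestors contribute to every descendant
    return features @ W.T  # (batch, num_classes)


rng = np.random.default_rng(0)
node_index = build_node_index(TAXONOMY)
classes, A = path_matrix(TAXONOMY, node_index)
node_weights = rng.normal(size=(len(node_index), 8))  # the trainable parameters
logits = class_logits(rng.normal(size=(2, 8)), node_weights, A)
print(classes, logits.shape)  # ['mallard', 'northern_pintail', 'wood_duck'] (2, 3)
```

Because the "Anatidae" and "Anas" vectors appear in both the mallard and pintail weights, a gradient step on either species also updates what the two have in common, which is how shared structure can help rare classes borrow strength from their relatives.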

The core methods, particularly in Chapters II and VII, are tied to the state of the art of the mid-to-late 2010s.
The specific techniques developed... may have lacked sufficient distinctiveness, generality, or robustness compared to concurrent or immediately subsequent work.
The computer vision component integrated into the crowdsourcing framework (Chapters III and IV) relies on older techniques (a linear SVM on fixed VGG features), which are fundamentally less powerful than modern end-to-end deep learning models.
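To make the contrast concrete, here is a minimal sketch of the "fixed features plus linear SVM" pattern. Synthetic Gaussian vectors stand in for the VGG features (loading a pretrained network is outside the sketch), and the SVM is fit with plain full-batch subgradient descent on the primal hinge loss. The solver, feature layer, and hyperparameters the thesis actually used are not stated here, so `train_linear_svm`, `predict`, and every setting below are assumptions.

```python
import numpy as np


def train_linear_svm(X, y, lam=1e-3, epochs=200, lr=0.1):
    """Linear SVM trained in the primal by full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))), y in {-1, +1}.
    Only w and b are learned; the features X stay fixed.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # examples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    return np.where(X @ w + b >= 0.0, 1, -1)


# Stand-in for frozen CNN features: two synthetic Gaussian clusters, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, size=(50, 16)), rng.normal(-2.0, 1.0, size=(50, 16))])
y = np.concatenate([np.ones(50), -np.ones(50)])
w, b = train_linear_svm(X, y)
print("train accuracy:", (predict(X, w, b) == y).mean())
```

Because the feature extractor is frozen, only `w` and `b` adapt to the task; an end-to-end model would also update the layers that produce `X`, which is the extra capacity the comparison with modern deep learning refers to.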
