Towards a Visipedia: Combining Computer Vision and Communities of Experts (Thesis)
2019
Category: Computer Vision
Overall Rating
Score Breakdown
- Latent Novelty Potential: 5/10
- Cross Disciplinary Applicability: 6/10
- Technical Timeliness: 5/10
- Obscurity Advantage: 2/5
Synthesized Summary
- The thesis correctly identifies persistent, real-world challenges in scaling computer vision (long-tail data, efficient annotation, model deployment).
- It offers valuable case studies with domain communities (birders, naturalists) and proposes concepts such as explicit worker modeling, online data collection, and the use of taxonomic structure for model efficiency.
- However, the specific technical methods and empirical analyses it presents largely reflect the computer vision and crowdsourcing paradigms of the mid-2010s.
Optimist's View
- The thesis proposes an online, sequential framework that combines human and machine inputs, together with models of worker skill and image difficulty, for complex annotations such as bounding boxes and keypoints.
- A further strength is its explicit modeling of taxonomic relationships and of dependencies between worker labels in a large-scale multiclass setting.
- The Taxonomic Parameter Sharing (TPS) concept, which uses the inherent taxonomic structure of the output space to inform parameter sharing in the final layer, is a prime example of leveraging domain knowledge for model efficiency.
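The online combination of human and machine inputs with worker-skill models can be illustrated with a minimal sketch. It is not the thesis's actual method. The sketch makes the following assumptions:

- There is a single binary label.
- A computer vision classifier's output probability serves as the prior.
- Each worker's accuracy is known in advance.
- Votes are independent given the true label. This is a simplification in the spirit of Dawid-Skene models, and image difficulty is omitted.

The function names, the `threshold` parameter, and the stopping rule are hypothetical.

```python
import math
from typing import Iterable, Tuple


def log_odds(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def sequential_binary_label(
    cv_prob: float,
    worker_votes: Iterable[Tuple[bool, float]],
    threshold: float = 0.95,
) -> Tuple[bool, float, int]:
    """Fuse a classifier prior with worker votes, one vote at a time (hypothetical sketch).

    cv_prob: classifier probability that the true label is True (the machine input).
    worker_votes: (vote, accuracy) pairs; accuracy in (0.5, 1) is an assumed,
        known per-worker skill.
    threshold: stop requesting votes once the posterior reaches this confidence.
    Returns (predicted label, posterior P(True), number of votes consumed).
    """
    score = log_odds(cv_prob)  # start from the machine prior
    used = 0
    votes = iter(worker_votes)
    while True:
        posterior = sigmoid(score)
        if max(posterior, 1.0 - posterior) >= threshold:
            break  # confident enough: do not ask another worker
        try:
            vote, accuracy = next(votes)
        except StopIteration:
            break  # no more workers available
        # An independent vote of accuracy a shifts the log-odds by +/- log(a / (1 - a)).
        score += log_odds(accuracy) if vote else -log_odds(accuracy)
        used += 1
    posterior = sigmoid(score)
    return posterior >= 0.5, posterior, used


label, p, n = sequential_binary_label(0.7, [(True, 0.9), (True, 0.8), (False, 0.6)])
print(label, round(p, 3), n)  # True 0.955 1
```

Each vote adds its worker's log-likelihood ratio to the running log-odds. A confident classifier prior or a highly skilled worker therefore lets the loop stop early, which is the annotation saving an online framework aims for.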
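The Taxonomic Parameter Sharing idea can be sketched under one assumption, which is ours and not necessarily the thesis's formulation. Every taxonomy node owns a parameter vector, and a leaf class's final-layer weight is the sum of the vectors on its path to the root. The toy taxonomy, the 2-dimensional parameters, and all function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy, child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "birds": None,
    "passeriformes": "birds",
    "anseriformes": "birds",
    "northern_cardinal": "passeriformes",
    "blue_jay": "passeriformes",
    "mallard": "anseriformes",
}
LEAVES = ["northern_cardinal", "blue_jay", "mallard"]

# One hypothetical 2-d parameter vector per taxonomy node.
NODE_PARAMS: Dict[str, List[float]] = {
    "birds": [0.1, 0.1],
    "passeriformes": [1.0, 0.0],
    "anseriformes": [0.0, 1.0],
    "northern_cardinal": [0.5, 0.0],
    "blue_jay": [0.0, 0.5],
    "mallard": [0.2, 0.2],
}


def path_to_root(node: Optional[str], parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the node followed by all of its ancestors up to the root."""
    path: List[str] = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def leaf_weights(
    leaves: List[str],
    parent: Dict[str, Optional[str]],
    node_params: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Final-layer weight of each leaf = sum of the node vectors on its root path."""
    dims = len(next(iter(node_params.values())))
    weights: Dict[str, List[float]] = {}
    for leaf in leaves:
        w = [0.0] * dims
        for node in path_to_root(leaf, parent):
            w = [a + b for a, b in zip(w, node_params[node])]
        weights[leaf] = w
    return weights


def logits(features: List[float], weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Linear final layer: one dot product per leaf class."""
    return {leaf: sum(f * x for f, x in zip(features, w)) for leaf, w in weights.items()}


W = leaf_weights(LEAVES, PARENT, NODE_PARAMS)
scores = logits([1.0, 0.0], W)
print(max(scores, key=scores.get))  # northern_cardinal
```

Sibling species share their order's vector, so what is learned for one can transfer to a rare relative. In this additive form the parameter count does not shrink by itself. How the thesis turns sharing into efficiency gains is not reproduced here.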
Skeptic's View
- The core methods, particularly in Chapters II and VII, are tied to the state of the art of the mid-to-late 2010s.
- The specific techniques developed... may have lacked sufficient distinctiveness, generality, or robustness compared to concurrent or immediately subsequent work.
- The computer vision component integrated into the crowdsourcing methods (Chapters III and IV) relies on older techniques (a linear SVM on fixed VGG features), which are fundamentally less powerful than modern end-to-end deep learning models.
Final Takeaway / Relevance
Watch
