IDS: Image Analytics

Gather a collection of at least 50 and up to 100 images (within this range, the more the better), preferably related to biology, medicine, or the natural sciences, or otherwise to your hobby or another interest. If you cannot find such images, use any image set you like from the web, or even from your own photo album.

Make sure the images are in jpg or png format and not too large, to avoid overburdening Orange's embedding server. The image embedders available in Orange reduce images to about 200 x 200 pixels, so there is no need to use high-resolution images. Place the image files in a folder and use sub-folders to indicate classes. Load the images with the Import Images widget, and inspect them in the Image Viewer widget to make sure they have loaded correctly.

Now apply the skills you have learned in this class to gain insight into your image set.

1. Cluster the images using either hierarchical clustering or k-means, and comment on the quality and meaningfulness of the clusters. Analyze the content of the clusters: combine cluster selection (e.g., select a cluster in the dendrogram, or select images with a specific cluster ID from the output of the k-Means widget) with the Image Viewer to check whether the clustering makes sense. Comment on the results. Hint: when computing distances for hierarchical clustering, use cosine distance in the Distance widget.
2. Arrange your images into groups (classes) by placing them in appropriate sub-folders. Can the classes be predicted from the image embeddings, that is, from the vector representations produced by the Image Embedding widget? Report the cross-validated accuracy; if your image set is small, use 5-fold cross-validation or leave-one-out instead of the default 10-fold cross-validation. You can also comment on the types of mistakes your learner makes using the Confusion Matrix widget, and analyze them by, for instance, showing a set of misclassified images. Comment on the results. Hint: use logistic regression as the classifier; there is no need to compare it with other classifiers (why?).
3. Project the images into two-dimensional space using either MDS or t-SNE. If you use MDS, pass your data through the Distance widget and select cosine distance there. Analyze the projection and report whether it makes sense.
4. For comparison with the results obtained in (3) and as additional illustration, you can include the visualization produced by the Image Grid widget. Briefly explain how this widget places the images on the grid (e.g., consult the widget's documentation), and comment on the groups you can spot in this visualization.
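The distance and evaluation steps above run inside Orange's widgets, but their core ideas fit in a few lines. The following minimal sketch, in plain Python with hypothetical function names, is not Orange's implementation. `cosine_distance` shows that cosine distance depends only on the angle between two embedding vectors, not on their length. `k_fold_indices` shows how k-fold cross-validation tests every image exactly once; setting k to the number of images gives leave-one-out. `confusion_counts` and `misclassified` tally the kinds of mistakes a confusion matrix displays. Unlike typical cross-validation, the split here is neither shuffled nor stratified.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n items.

    Fold sizes differ by at most one; each index is in exactly one test fold.
    With k == n this is leave-one-out. No shuffling or stratification.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs; pairs with different labels are mistakes."""
    return Counter(zip(actual, predicted))


def misclassified(actual, predicted):
    """Indices of items whose predicted label differs from the true label."""
    return [i for i, (a, p) in enumerate(zip(actual, predicted)) if a != p]
```

For example, `k_fold_indices(7, 3)` produces the test folds `[0, 1, 2]`, `[3, 4]`, and `[5, 6]`, while `cosine_distance([1.0, 2.0], [2.0, 4.0])` is (up to rounding) 0 because the second vector is just a scaled copy of the first.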

You may also include any other analysis that you think makes sense and sheds light on your image collection.

Submit the homework as a short report in PDF. The report should include the title of the homework, your name, and your email. Start with a paragraph that introduces and describes your data set, ideally including a figure with a sample of class-labeled images (use the Image Viewer). The report should be at most three pages long (this limit is strict!). Use an 11 pt Calibri, Arial, or similar sans-serif font, 1.2 line spacing, and 6 pt spacing between paragraphs.