Computer vision acronyms, as silly as any other computing acronyms and project names.

ossington@mastodon.xyz

Computer vision acronyms, as silly as any other computing acronyms and project names. Genuinely working on a project in which we use both YOLO and COCO, because computers are very serious.

ossington@mastodon.xyz

Anyway, I'm just starting to read the original paper on COCO (Common Objects in Context) and am already amused by "Our dataset contains photos of 91 objects types that would be easily recognizable by a 4 year old." Does this explain why "teddy bear" is an object category all of its own?

(Why yes, I seem to be lightly live-tooting reading computer science papers, why do you ask?)

ossington@mastodon.xyz

Also, this paper from 2015 is a bit of a callback to the "more innocent" days of computer vision when labelling was done with people earning piecework on Mechanical Turk, instead of in more formalized labelling outsourcing shops... Not actually more innocent, because Mechanical Turk has also always been a creepy and exploitative method of getting microtasks done. Just, the Overton window of creepy has really moved in the last couple of years...

ossington@mastodon.xyz

Okay, "To further augment our set of candidate categories, several children ranging in ages from 4 to 8 were asked to name every object they see in indoor and outdoor environments." So not just that objects should be recognizable to a four-year-old, but they actually used real children to come up with categories of objects.

ossington@mastodon.xyz

And one of the joys of using pay-per-microtask platforms to label images: “Since we have 91 categories and a large number of images, asking workers to answer 91 binary classification questions per image would be prohibitively expensive.”

ossington@mastodon.xyz

But also, the nitty-gritty you get in papers like this is very refreshing compared to the popular discourse on "AI." Like, they define how many worker-hours it took to do specific stages of their labelling process. Because it's a scientific paper, this stuff is actually written down with a degree of honesty, rather than being some trade secret. The honesty, at least, is a nice case of "simpler times" ten years ago.

ossington@mastodon.xyz

Now I just want to write about ontology in computer vision datasets

ossington@mastodon.xyz

Or, more to the point, the construction of reality in computer vision

ossington@mastodon.xyz

And though "goat" was included in the list of candidate categories, it didn't make it to the final 91 categories that were selected. This concludes my semi-live reading of the 2015 COCO paper: https://arxiv.org/pdf/1405.0312

ossington@mastodon.xyz

Spoke too soon. "Aardvark" also didn't make it into the final category list.

charette@mstdn.ca

@ossington Please don't use MSCOCO! And I'm speaking as the software developer that maintains YOLO!

See my description of MSCOCO here: https://codeberg.org/CCodeRun/darknet#mscoco-pre-trained-weights

Quote: "The MSCOCO pre-trained weights are provided for demo-purpose only."

People are expected to train their own network. MSCOCO is a horrible network to use. Join us on the YOLO discord if you want to read the past discussions on why MSCOCO is a horrible network to use.
