This article presents a model for recognizing a clothing brand from an image. The model predicts both the type and the brand of a garment and can also estimate how similar brands are to one another. As a first step, a dataset of 9,000 clothing images from various brands was collected. The work uses the Vision Transformer (ViT), a neural network architecture for image processing introduced by researchers at Google Brain; the vit-base-patch16-224 model was chosen as the representative of this architecture. Before training, all images were converted to black and white, and data augmentation was applied: rotation by a random angle and mirror flipping. All images were normalized, with pixel values scaled to the interval [0, 1].
Keywords: neural network, model, machine learning, Vision Transformer, fashion industry, clothing brand prediction, clothing type prediction, brand similarity determination
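
The preprocessing steps named in the abstract (black-and-white conversion, rotation by a random angle, mirror transformation, and scaling to [0, 1]) can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the function names, the BT.601 luminance weights, the nearest-neighbour rotation, the 30-degree angle limit, and the 0.5 flip probability are all assumptions made here for illustration, since the abstract does not specify them.

```python
import math
import random


def to_grayscale(rgb):
    """Convert an H x W grid of (R, G, B) tuples (0-255) to luminance values."""
    # Assumption: ITU-R BT.601 weights; the source does not name a conversion.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def mirror(img):
    """Mirror transformation: reverse every row (horizontal flip)."""
    return [row[::-1] for row in img]


def rotate(img, angle_deg, fill=0.0):
    """Rotate about the image centre with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for output pixel (x, y).
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def normalize(img):
    """Scale 0-255 intensities to [0, 1], clamping floating-point overshoot."""
    return [[min(max(v / 255.0, 0.0), 1.0) for v in row] for row in img]


def preprocess(rgb, rng, max_angle=30.0, flip_prob=0.5):
    """Grayscale, random rotation, random mirror, then normalization.

    max_angle and flip_prob are hypothetical defaults, not values from the source.
    """
    gray = to_grayscale(rgb)
    gray = rotate(gray, rng.uniform(-max_angle, max_angle))
    if rng.random() < flip_prob:
        gray = mirror(gray)
    return normalize(gray)


# Usage: a tiny 4 x 4 synthetic image with a fixed seed for repeatability.
image = [[(x * 60, y * 60, 128) for x in range(4)] for y in range(4)]
result = preprocess(image, random.Random(0))
```

In practice an image library would perform these operations on tensors, and a pretrained checkpoint such as vit-base-patch16-224 expects three-channel 224 x 224 input, so a grayscale image would typically be replicated across channels and resized before being passed to the model.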