After more than 20 years in retail, Swiss-based online marketplace Ricardo wanted to solve some of the problems its users were having when buying and selling items. Ultimately, Ricardo settled on a system based on the Ray open-source Python framework developed by AI vendor Anyscale.
Like most marketplaces of its kind, Ricardo previously used a category-based model, sorting the items that sellers listed for sale into different categories. However, end users sometimes clicked on the wrong categories and couldn’t find the items they wanted.
And because Switzerland has four official languages, customers sometimes described the product they wanted in a different language than the one the website was using.
To solve these problems, Ricardo wanted to switch from grouping items into categories to grouping items into product types. The retailer wanted to use machine learning to recognize the product type based on the item’s image and title.
Create recommendation models
First, the retailer used the data it had to identify more than 300 product types. Next, it had to create recommendation models that assign attributes to each product type. Those attributes are displayed when customers type in a specific item they are looking for.
“It’s a lot easier to build recommendation models when you have this information about product types and attributes instead of categories because they enable so much later,” said Tobias Kaymak, senior data engineer at Ricardo.
To do this, Ricardo’s team wanted to use natural language processing to pull information from the products’ titles and use Google Vision AI to recognize images.
The first machine learning model the company created was simple, Kaymak said. The real challenge came when the retailer needed to create about 299 more machine learning models for the other product types.
“Doing this 300 times — or deploying 300 microservices and then thinking about the future and having more product types — doesn’t really scale,” Kaymak said.
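One common way to avoid a separate service per product type is to keep every model behind a single entry point and dispatch by product type. The sketch below illustrates that idea only; it is not Ricardo's implementation. The names `MODEL_REGISTRY`, `register` and `detect_attributes` are hypothetical, and the toy keyword rules stand in for trained models.

```python
from typing import Callable, Dict

# Hypothetical type: a "model" is a function from an item title
# to a dictionary of detected attributes.
AttributeModel = Callable[[str], Dict[str, str]]

# One registry holds every product-type model, so adding a product type
# means adding an entry rather than deploying another service.
MODEL_REGISTRY: Dict[str, AttributeModel] = {}


def register(product_type: str) -> Callable[[AttributeModel], AttributeModel]:
    """Decorator that files a model under its product type."""
    def wrap(model: AttributeModel) -> AttributeModel:
        MODEL_REGISTRY[product_type] = model
        return model
    return wrap


@register("screwdriver")
def screwdriver_attributes(title: str) -> Dict[str, str]:
    # Toy keyword rule standing in for a trained model (assumption).
    head = "phillips" if "phillips" in title.lower() else "unknown"
    return {"head_type": head}


@register("drill")
def drill_attributes(title: str) -> Dict[str, str]:
    # Toy keyword rule standing in for a trained model (assumption).
    power = "cordless" if "cordless" in title.lower() else "unknown"
    return {"power_source": power}


def detect_attributes(product_type: str, title: str) -> Dict[str, str]:
    """Single entry point that dispatches to the matching model."""
    model = MODEL_REGISTRY.get(product_type)
    if model is None:
        raise KeyError(f"no model registered for product type {product_type!r}")
    return model(title)


if __name__ == "__main__":
    print(detect_attributes("drill", "Bosch cordless drill 18V"))
```

With this shape, going from 300 product types to 700 changes the size of the registry, not the number of deployed services.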
To address this issue, the team began looking for vendors who could help them complete this scope of work.
Fixed the scaling issue with Anyscale and Ray
Through his research, Kaymak came across an online video about Anyscale, a vendor that uses Ray to enable companies to run distributed computing projects.
Distributed computing lets companies spread AI models and workloads across multiple computers in the cloud.
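The underlying pattern is to fan work out to several workers and gather the results. The sketch below shows that pattern on a single machine with Python's standard-library thread pool, used here purely as a stand-in; it is not Ray code. Ray expresses the same idea across machines with its `@ray.remote` decorator and `ray.get`. The function names and sample titles are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def normalize_title(title: str) -> str:
    """Unit of work: lowercase a title and collapse repeated whitespace."""
    return " ".join(title.lower().split())


def normalize_all(titles: List[str], workers: int = 4) -> List[str]:
    """Fan the titles out to a pool of workers and gather results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(normalize_title, titles))


if __name__ == "__main__":
    print(normalize_all(["  Cordless  DRILL ", "Screwdriver Set"]))
```

The calling code does not need to know how many workers exist or where they run, which is the property that makes this style of program straightforward to move from one machine to many.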
Anyscale was founded in 2019 and is based in San Francisco. It helps companies accelerate the production of their AI applications on any cloud and at any scale. Ray enables developers to scale applications from a laptop to the cloud without requiring complex infrastructure, the vendor says.
“Ultimately, it’s about enabling companies to be successful with AI by scaling their AI applications and producing their AI applications” without having to become experts in building or maintaining an AI infrastructure, said Robert Nishihara, CEO of Anyscale.
Fascinated by Ray and Anyscale, Kaymak contacted the vendor, and within three months Ricardo was using the open-source product in its Kubernetes cluster.
One aspect of Ray that appealed to Ricardo was that it’s open source.
“We also favor open source because if something happens, open source is usually supported by other people as well,” Kaymak said.
Other vendors also offer open-source products that companies may not need to operate themselves, such as Apache Kafka, an open-source framework for building real-time streaming data pipelines. But Anyscale's Ray is a relatively new and rapidly evolving tool, which is pushing forward the open-source component, Kaymak added.
Also, Ricardo typically uses the FastAPI framework for its services. When Kaymak realized that Ray integrates with FastAPI, he recognized it as something he already knew worked.
“I haven’t seen any other framework on the market that offers these components,” Kaymak said.
Ricardo was also one of Anyscale’s first customers on Google Cloud Platform (GCP), although the vendor built its product on AWS, Kaymak said.
“Since we run everything in GCP, we wanted to stay there. They supported us in that and it was quite a nice experience,” he added.
Challenges and advances
Using Ray hasn’t been without its challenges, especially since the technology is new, Kaymak said.
One challenge was that Ricardo was one of the few customers using the tool on GCP and wanted to take advantage of GPU acceleration. Working that out with Anyscale, which is headquartered in a different time zone, has been difficult, Kaymak said.
However, Ricardo is happy with Ray, and the vendor has since offered to manage its cluster through a Ray software-as-a-service component, Kaymak said.
After using Anyscale for about six months, Ricardo is now shifting to developing more models with the vendor. Ricardo started with an attribute detection model using Ray, but now it’s building a model with everything encapsulated in Anyscale.
Ricardo also plans to use some of Ray’s other features, such as model training and hyperparameter tuning, now that the retailer offers more than 700 product types.
“We started with the screwdriver; we found the screwdriver very powerful,” Kaymak said. “Now we’ve also found the drill, but there are loads of other things in the toolbox we could use that we haven’t touched yet.”
According to Nishihara, Anyscale charges customers based on usage.