Build a high-throughput B2B aggregation pipeline for fast import of fashion merchandise SKUs from multiple boutiques and wholesalers, for sale through the online channel in near real time.
A high-end e-commerce fashion retailer sought to replace its existing data acquisition process, which sourced large volumes of merchandise data from multiple sellers and boutiques. The process was slow, rigid and inefficient, suffered from data quality issues, and impeded the reseller’s ability to make merchandise available for sale through its channels quickly.
The proposed solution used MapReduce to speed up data acquisition while offering flexibility in handling a variety of data formats. The big data architecture included MongoDB for staging the data with ElasticSearch on three AWS EMR clusters. ElasticSearch supported real-time search via REST calls, with synonym and phonetic filters for speedy retrieval of relevant records. The solution ran daily, processing over 3 million records and indexing half a million records on average.
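To illustrate how synonym and phonetic filters can be combined for this kind of search, here is a minimal sketch that builds the JSON bodies for creating an index and querying it. It is not the project's actual configuration: the analyzer, filter, and field names (`sku_text`, `sku_synonyms`, `sku_phonetic`, `title`, `sku`) and the sample synonyms are hypothetical, the typeless `mappings` layout assumes a recent ElasticSearch version, and the `phonetic` filter assumes the analysis-phonetic plugin is installed on the cluster.

```python
import json


def build_index_settings(synonyms):
    """Return an index-creation body with a synonym + phonetic analyzer.

    All names here are illustrative assumptions, not taken from the project.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Maps equivalent merchandise terms onto each other.
                    "sku_synonyms": {
                        "type": "synonym",
                        "synonyms": list(synonyms),
                    },
                    # Requires the analysis-phonetic plugin (assumed installed).
                    # replace=False keeps the original token next to its code.
                    "sku_phonetic": {
                        "type": "phonetic",
                        "encoder": "double_metaphone",
                        "replace": False,
                    },
                },
                "analyzer": {
                    "sku_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "sku_synonyms", "sku_phonetic"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": "sku_text"},
                "sku": {"type": "keyword"},
            }
        },
    }


def build_search_body(text):
    """Return a match query against the analyzed title field."""
    return {"query": {"match": {"title": text}}}


if __name__ == "__main__":
    index_body = build_index_settings(["sneaker, trainer", "purse, handbag"])
    print(json.dumps(index_body, indent=2))
    print(json.dumps(build_search_body("trainers")))
```

Under these assumptions, the first body would be sent with a PUT to the index URL and the second with a POST to that index's `_search` endpoint. Applying the same analyzer at index and query time is what lets a misspelled or alternate term still reach the relevant records.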
Technology stack: Hadoop, MAPR, MongoDB, AWS S3, ElasticSearch