The proposed solution used MapReduce to speed up data acquisition while offering flexibility in handling a variety of data formats. The big data architecture included MongoDB for staging the data, with ElasticSearch on three AWS EMR clusters. ElasticSearch supported real-time search via REST calls, with synonym and phonetic filters for fast retrieval of relevant records. The solution ran daily, processing over 3 million records and indexing half a million records on average.
Technology stack: Hadoop, MAPR, MongoDB, AWS S3, ElasticSearch
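
As an illustration of the synonym and phonetic filtering described above, the following minimal sketch builds the JSON bodies for two ElasticSearch REST calls: one that creates an index with a custom analyzer, and one that searches it. The index name `records`, the field `name`, the filter and analyzer names, the `double_metaphone` encoder, and the sample synonyms are all assumptions made for this example; the original configuration is not described here. The `phonetic` filter type requires the analysis-phonetic plugin, and the typeless `mappings` layout assumes ElasticSearch 7 or later. The code only constructs the requests and does not contact a cluster.

```python
import json

# Hypothetical index name; the real one is not given in the text.
INDEX_NAME = "records"


def build_index_body(synonyms):
    """Return index settings and mappings that analyze the (assumed) `name`
    field with lowercase, synonym, and phonetic token filters."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Expands equivalent terms, e.g. "bob, robert".
                    "record_synonyms": {"type": "synonym", "synonyms": synonyms},
                    # Needs the analysis-phonetic plugin; encoder is an assumption.
                    "record_phonetic": {
                        "type": "phonetic",
                        "encoder": "double_metaphone",
                        "replace": False,  # keep the original token as well
                    },
                },
                "analyzer": {
                    "record_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "record_synonyms", "record_phonetic"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "name": {"type": "text", "analyzer": "record_analyzer"}
            }
        },
    }


def build_search_body(text, size=10):
    """Return a match query; the field's analyzer is applied to `text` too."""
    return {"query": {"match": {"name": text}}, "size": size}


def rest_calls(synonyms, text):
    """Return (method, path, json_body) tuples for index creation and search."""
    return [
        ("PUT", f"/{INDEX_NAME}", json.dumps(build_index_body(synonyms))),
        ("POST", f"/{INDEX_NAME}/_search", json.dumps(build_search_body(text))),
    ]


if __name__ == "__main__":
    for method, path, body in rest_calls(["bob, robert"], "robert"):
        print(method, path)
        print(body)
```

Because the same analyzer runs at index time and at query time, a search for a misspelled or alternate form of a name can still match records whose tokens share a phonetic code or a synonym entry.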