Sunday, January 21, 2018

Big Data Analytics

Big Data is a broad umbrella of technologies grouped together to solve problems that cannot be easily solved with traditional technologies and systems. These problems can be classified into:

  • High-volume data reads and writes
  • Analytics over the entire dataset
  • Dealing with unstructured, multi-format data

Big Data Services

  • Evaluate current system requirements and recommend a Big Data solution
  • Put together a detailed adoption road map and strategy
  • Design and architect a framework to fit Big Data into your environment
  • Develop, refactor, migrate and integrate to speed ROI and minimize risk
  • Help you choose the right mix of Big Data technologies
  • Configure and optimize your cluster
  • Train your staff and bring them up to speed

Capacity grows in direct proportion to hardware: nodes can be added without reconfiguring the entire cluster. With Big Data technologies, we can handle large volumes of multi-dimensional data and scale the infrastructure for growth in a predictable, linear fashion.

Jean Martin has built low-latency, real-time solutions with IBM Netezza, Oracle Exadata, Teradata and Informatica. We can help you quickly configure and customize your Big Data appliance to enable analytics and reporting. We have built direct and indirect analytics on Big Data appliances, as well as data visualization and forecasting solutions.

We have expertise with Amazon’s Elastic Compute Cloud, Elastic MapReduce, Simple Storage Service and CloudSearch.

Elastic Compute Cloud is a very cost-effective option for adopting Big Data: we pay only for what we use, and we can increase or decrease capacity as needed. Our customers use Amazon’s cloud services for ad-hoc, high-volume processing, as well as in hybrid cloud solutions combined with their own hosted clusters.
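The pay-for-what-you-use economics can be sketched with a toy cost model. The hourly rate below is purely illustrative, not actual AWS pricing:

```python
HOURLY_RATE = 0.25  # assumed cost per node-hour (illustrative, not real AWS pricing)

def cluster_cost(nodes, hours):
    """On-demand cluster cost: you pay only for node-hours actually used."""
    return nodes * hours * HOURLY_RATE

# A 10-node cluster spun up for a 6-hour ad-hoc job:
burst = cluster_cost(nodes=10, hours=6)        # 15.0
# Versus keeping the same 10 nodes running around the clock for a month:
always_on = cluster_cost(nodes=10, hours=720)  # 1800.0
```

The gap between the two figures is why ad-hoc and bursty workloads favor elastic capacity over a permanently provisioned cluster.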

We have customers using our Cloudera-based Hadoop analytics cluster for Complex Event Processing (CEP) with Esper, graph analysis, and real-time analytics with HBase. Our analytics cluster feeds a cloud-based cluster for view-only systems.
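Complex Event Processing boils down to evaluating rules over windows of a live event stream. Esper expresses such rules in its own query language on the JVM; the following is only a minimal Python illustration of the same idea, a sliding-window rule with hypothetical event data:

```python
from collections import deque

def sliding_window_alerts(events, window=5, threshold=3):
    """CEP-style rule sketch: fire an alert when more than `threshold`
    'error' events fall within the last `window` time units."""
    recent = deque()   # timestamps of recent error events
    alerts = []
    for ts, kind in events:
        if kind == "error":
            recent.append(ts)
        # evict events that have slid out of the window
        while recent and ts - recent[0] > window:
            recent.popleft()
        if len(recent) > threshold:
            alerts.append(ts)
    return alerts

# Illustrative event stream: (timestamp, event_type)
stream = [(1, "error"), (2, "error"), (3, "ok"),
          (4, "error"), (5, "error"), (9, "error")]
```

A production CEP engine adds declarative rule definitions, pattern matching across event types, and distributed state, but the window-plus-predicate core is the same.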


Success Stories



Our Big Data Solutions Architecture


Elastic storage HDFS

Many of our customers are adopting Hadoop in a very basic form to offset their storage needs.


The most basic use of a Hadoop cluster is to store large amounts of data. Hadoop, with its underlying Hadoop Distributed File System (HDFS), offers robust, expandable storage at a very low cost per gigabyte. This storage is built from a cluster of inexpensive commodity hardware and is designed to withstand hardware failures: your data remains safe even after node failures, since each block of data is by default replicated to three different nodes. As more space becomes available, HDFS expands to make use of it, and you can scale up as the need grows by adding hardware, with no reconfiguration required for the entire cluster.
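The capacity arithmetic behind the default three-way replication is straightforward. The node counts and disk sizes below are made-up examples:

```python
def usable_capacity_tb(nodes, disk_per_node_tb, replication=3):
    """Effective HDFS capacity: raw disk divided by the replication
    factor (HDFS stores each block `replication` times, 3 by default)."""
    return nodes * disk_per_node_tb / replication

# 12 commodity nodes with 8 TB each -> 96 TB raw, 32 TB usable at 3x replication
twelve_nodes = usable_capacity_tb(12, 8)   # 32.0
# Adding 3 more nodes grows capacity linearly, no cluster-wide reconfiguration:
fifteen_nodes = usable_capacity_tb(15, 8)  # 40.0
```

In practice some headroom is also reserved for intermediate job output, so planning with the replicated figure rather than raw disk avoids surprises.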


Since the storage cluster is built from servers, we get processing power along with it for free. We can use tools from the Hadoop ecosystem to query the stored data: Apache Hive provides an intuitive SQL-like interface for mining the data, and Apache Pig offers an alternative that appeals to programmers who prefer scripting.
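Hive and Pig both compile down to MapReduce over the stored data. As a language-neutral sketch of what such a job does, here is a Hadoop Streaming-style mapper/reducer pair in Python, run locally on two sample lines (in a real job, Hadoop would feed the mapper from HDFS and sort between the phases):

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit (word, 1) for every word, as a Streaming mapper
    would over records read from HDFS."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum counts per key. Hadoop sorts by key between
    map and reduce; sorted() stands in for that shuffle here."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

counts = dict(reducer(mapper(["big data", "Big deal"])))
# {'big': 2, 'data': 1, 'deal': 1}
```

The Hive equivalent would be a one-line `SELECT word, COUNT(*) ... GROUP BY word`; the point is that the cluster's CPUs do this work in parallel right where the data lives.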

Once we store the raw data in HDFS, it can be mined in parallel and the refined data moved to the information warehouse. If we later identify a new dimension in our data, we can mine the old raw data as well and re-extract it into the warehouse in acceptable time.
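This "keep the raw data, re-extract later" pattern is easiest to see with a small example. The log format, field names, and device rule below are all hypothetical:

```python
import json

# Raw logs kept in HDFS (hypothetical format); the first ETL pass only
# extracted date and user, but the full records were never thrown away.
raw_logs = [
    '{"ts": "2018-01-01", "user": "a", "ua": "Mozilla/5.0 (iPhone)"}',
    '{"ts": "2018-01-02", "user": "b", "ua": "Mozilla/5.0 (Windows)"}',
]

def extract(record, with_device=False):
    """Extraction routine; `with_device` is the new dimension added later
    by re-mining the same raw records."""
    row = {"ts": record["ts"], "user": record["user"]}
    if with_device:
        row["device"] = "mobile" if "iPhone" in record["ua"] else "desktop"
    return row

# Re-run extraction over the historical raw data with the new dimension:
refined = [extract(json.loads(line), with_device=True) for line in raw_logs]
```

Because the raw records survive in cheap HDFS storage, the new `device` dimension can be backfilled for all history, not just for data collected after the schema change.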


Data Warehouse and Hadoop

Hadoop and the data warehouse go hand in hand: Hadoop stores all the data, including the noise, and this information is pushed to the warehouse in incremental iterations over the learning period.


The Hadoop cluster stores raw data streams, such as logs, that may hold value beyond the system's authoritative data. Data available from other authoritative sources should be fetched directly from those sources, unless we want to take advantage of cached copies on the Hadoop cluster for faster retrieval; caching, however, carries the risk of reading stale data, since the authoritative source may hold more recent records.


The ETL routines fetch data from Hadoop as well as from other sources and send it to the data warehouse, where it is accessed by various systems and by the BI tools. The power of Hadoop lies in solving two big problems that current data warehouse systems usually do not handle well: the capacity to handle very large data volumes, and the ability to run complex analytics not just on a subset but on the entire dataset. Hadoop complements your data warehouse in both these respects.
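The ETL step described above can be sketched as a join of two feeds: aggregates mined from Hadoop and records from an authoritative source. All names, shapes, and values here are illustrative:

```python
# Aggregates produced by a MapReduce job over raw logs in HDFS (hypothetical):
hadoop_aggregates = {"p1": 120, "p2": 45}           # page_id -> hit count
# Authoritative reference data fetched directly from its source system:
reference_source = {"p1": "Landing", "p2": "Checkout"}  # page_id -> page name

def etl_rows(aggregates, reference):
    """Join the Hadoop-derived metrics with reference data into
    warehouse-ready rows."""
    return [
        {"page_id": pid, "page_name": reference.get(pid, "unknown"), "hits": hits}
        for pid, hits in sorted(aggregates.items())
    ]

rows = etl_rows(hadoop_aggregates, reference_source)
```

A real pipeline adds incremental loads, surrogate keys, and error handling, but the shape, heavy aggregation in Hadoop followed by a conformed load into the warehouse, is the same.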



Social Media and Hadoop

Low-latency, high-volume data-handling problems are typically well suited to Hadoop and HBase. With hybrid RDBMS solutions, we can address problems ranging from predictive analysis to near-real-time aggregation.



Incoming web traffic is captured by Data Aggregator instances, each of which handles requests and writes to an intermediate JMS queue. The messages are persisted for reliability and consumed by multiple sink instances, which write the streaming data to Hadoop/HBase, where the raw data is persisted. Analytics MapReduce jobs run on this data, transform it within Hadoop, and write the results to a MySQL cluster, which the portal then serves from. This is a highly scalable architecture that can handle millions of impressions per day.
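The data flow above can be simulated in a single process. In this sketch a standard queue stands in for JMS, a list for the raw HBase store, and a dict for the MySQL serving layer; the impression events are made up:

```python
import queue

# Stand-in for the intermediate JMS queue between aggregators and sinks:
jms = queue.Queue()

# Web-tier aggregators enqueue impression events (ad_id, count):
for event in [("ad1", 1), ("ad2", 1), ("ad1", 1)]:
    jms.put(event)

raw_store = []       # stand-in for raw persistence in Hadoop/HBase
serving_store = {}   # stand-in for the MySQL cluster the portal reads

# Sink consumers drain the queue, persisting raw events and aggregating:
while not jms.empty():
    ad, count = jms.get()
    raw_store.append((ad, count))                         # keep the raw stream
    serving_store[ad] = serving_store.get(ad, 0) + count  # per-ad totals
```

The decoupling matters: because the queue persists messages, the web tier keeps accepting traffic even if the sinks or the analytics tier fall behind, which is what lets the design absorb traffic spikes.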


Service Request

Jean Martin Inc. 551 5th Avenue, Suite 1425 New York, NY 10176.
Any and all information submitted will be reviewed only by authorized representatives of Jean Martin Inc. and kept in the strictest confidentiality.