Our Architecture
Powering Interactive Analytics At Scale
Our interactive analytics platform, SigView, is powered by a homegrown, Apache Spark-based analytical engine built to query and analyze hundreds of terabytes of data in under 5 seconds with extremely low hardware requirements.
Intelligent Data Storage
By storing data intelligently, we minimize the amount of data each query has to access and, in turn, the disk I/O it incurs. To that end, we store data in HDFS in Parquet, a columnar format, so a query never reads the columns it does not need. Parquet also provides a number of compression and encoding techniques.
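As a rough illustration of what the columnar layout buys you (the dataset paths and column names below are hypothetical, not SigView's actual schema), a Spark query that selects only two columns from a Parquet table reads just those column chunks from disk:

```scala
import org.apache.spark.sql.SparkSession

object ParquetStorageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-storage-sketch")
      .getOrCreate()

    // Hypothetical raw event data with many columns.
    val rawEvents = spark.read.json("hdfs:///data/raw/events")

    // Write as columnar Parquet; compression and dictionary encoding
    // are applied per column by the Parquet writer.
    rawEvents.write
      .option("compression", "snappy")
      .parquet("hdfs:///data/parquet/events")

    // A query that needs only two columns touches only those column
    // chunks on disk thanks to the columnar layout.
    spark.read.parquet("hdfs:///data/parquet/events")
      .select("advertiser_id", "impressions")
      .groupBy("advertiser_id")
      .count()
      .show()

    spark.stop()
  }
}
```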
Intelligent Data Processing
To accelerate query performance, the system generates an intelligent mix of materialized views (for frequently accessed data) and primary indexes (reverse indexes, Bloom filters, range partitioning) on the ingested raw data, and stores them as segments distributed across the cluster.
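SigView's segment and index formats are internal to the engine; purely as a sketch of the idea, a materialized view can be approximated in Spark as a pre-aggregated rollup, partitioned by date so that date-filtered queries touch only a few segments. The table and column names here are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MaterializedViewSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("materialized-view-sketch")
      .getOrCreate()

    val events = spark.read.parquet("hdfs:///data/parquet/events")

    // "Materialized view": a pre-aggregated rollup for a frequently
    // asked question (daily impressions per advertiser).
    val dailyRollup = events
      .groupBy(col("event_date"), col("advertiser_id"))
      .agg(sum("impressions").as("impressions"))

    // Range-style partitioning by date keeps each day's data in its own
    // directory, so date-filtered queries read only a few segments.
    dailyRollup.write
      .partitionBy("event_date")
      .parquet("hdfs:///data/views/daily_impressions")

    spark.stop()
  }
}
```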
What Sets Us Apart?
  • Sub-5-second response times at 250+ TB
  • No specialized hardware required
  • Hadoop native – no data movement
  • Real-time ingestion
  • High-availability setup
  • Low time to deployment
  • No pre-aggregation or cubing required

Schedule A Call Today
Local Caching
The downside of keeping data on disk is the time spent in I/O. Moreover, a few columns, indexes, and metadata files tend to be far more popular than others. To avoid wasting time on repeated I/O requests, frequently accessed column and index blocks, as well as small metadata files, are automatically cached, resulting in superior performance.
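The cache itself is internal to the engine; the following is only a minimal sketch of the idea: an LRU cache that keeps hot column/index blocks and small metadata files in memory and falls back to a disk read on a miss. The key format and sizes are illustrative assumptions.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import java.util.Map.Entry

// Minimal LRU cache for hot blocks; evicts the least recently used entry
// once the configured capacity is exceeded.
class BlockCache[K, V](maxEntries: Int) {
  private val cache = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override def removeEldestEntry(eldest: Entry[K, V]): Boolean =
      size() > maxEntries
  }

  def getOrLoad(key: K)(load: => V): V = synchronized {
    Option(cache.get(key)) match {
      case Some(hit) => hit            // served from memory, no disk I/O
      case None =>
        val value = load               // fall back to the slower disk read
        cache.put(key, value)
        value
    }
  }
}

object BlockCacheExample {
  def main(args: Array[String]): Unit = {
    val cache = new BlockCache[String, Array[Byte]](maxEntries = 1024)
    // Stand-in for a block read from disk; a real miss would read HDFS.
    val block = cache.getOrLoad("events/advertiser_id/block-0") {
      Array.fill[Byte](4096)(0)
    }
    println(s"cached block of ${block.length} bytes")
  }
}
```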
Scale-Out Architecture
The back-end architecture has been designed to be completely scale-out, which means the number of nodes can easily be increased as the number of concurrent users or the volume of data grows. Correspondingly, at low load the number of nodes can be decreased to save on infrastructure cost.
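How SigView manages its nodes is not covered here; as an illustrative sketch under that caveat, a Spark-based engine can grow and shrink its executor footprint using Spark's dynamic allocation settings. The values below are placeholders, not our production configuration.

```scala
import org.apache.spark.sql.SparkSession

object ScaleOutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scale-out-sketch")
      .config("spark.dynamicAllocation.enabled", "true")
      .config("spark.dynamicAllocation.minExecutors", "2")    // floor at low load
      .config("spark.dynamicAllocation.maxExecutors", "200")  // ceiling under heavy concurrency
      .config("spark.dynamicAllocation.executorIdleTimeout", "120s")
      .config("spark.shuffle.service.enabled", "true")        // external shuffle service so executors can be removed safely
      .getOrCreate()

    // Work submitted here is spread over however many executors are
    // currently allocated; idle executors are released after the timeout.
    spark.read.parquet("hdfs:///data/parquet/events").count()

    spark.stop()
  }
}
```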
Query & Index Cache
Analysts often run repeating (or similar) queries, or start with the same basic questions and then interactively dive deeper into unique data cuts. Our engine recognizes such behavior and automatically starts caching result sets – both intermediate and near-final ones. The adaptive cache is also incremental: if part of the query is answerable from the cache and the rest from disk, only the delta is computed and merged with the cached results to produce the final output. The old cached results are then replaced with the new ones.
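As a simplified sketch of that incremental behavior (the query signature, date-keyed delta logic, and table layout below are assumptions, not the engine's actual cache implementation): when a cached rollup covers part of the requested range, only the missing dates are aggregated and merged with the cached result, which then replaces the old entry.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import scala.collection.mutable

object IncrementalResultCache {
  // query signature -> (last date covered, cached aggregate);
  // dates are ISO strings, so lexicographic order matches chronological order.
  private val cache = mutable.Map.empty[String, (String, DataFrame)]

  def dailyImpressions(spark: SparkSession, upToDate: String): DataFrame = {
    val signature = "daily_impressions_per_advertiser"

    def aggregate(from: Option[String]): DataFrame = {
      val events = spark.read.parquet("hdfs:///data/parquet/events")
      val slice = from match {
        case Some(d) => events.filter(col("event_date") > d && col("event_date") <= upToDate)
        case None    => events.filter(col("event_date") <= upToDate)
      }
      slice.groupBy("event_date", "advertiser_id")
        .agg(sum("impressions").as("impressions"))
    }

    val result = cache.get(signature) match {
      case Some((coveredUpTo, cached)) if coveredUpTo < upToDate =>
        // Only the delta (new dates) is computed and merged with the cache.
        cached.union(aggregate(Some(coveredUpTo)))
      case Some((_, cached)) =>
        cached                       // fully answerable from the cache
      case None =>
        aggregate(None)              // cold start: compute everything
    }

    cache.update(signature, (upToDate, result.cache()))  // replace the old cached results
    result
  }
}
```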
Hadoop Native
The architecture is completely Hadoop native, so you do not have to copy or move your data between clusters. Moreover, we can leverage the same Hadoop cluster for our workload, so you do not have to maintain a parallel infrastructure for your analytics requirements. We work natively on top of any of the standard Hadoop distributions such as CDH, MapR, and HDP, as well as vanilla Apache Hadoop.
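As a minimal sketch of what "Hadoop native" means in practice (the paths and settings are illustrative assumptions): the engine runs on the existing YARN cluster and reads data in place from HDFS, with no copies or exports.

```scala
import org.apache.spark.sql.SparkSession

object HadoopNativeSketch {
  def main(args: Array[String]): Unit = {
    // Runs on the cluster's own resource manager (YARN) and reads the
    // same HDFS files used by the rest of the cluster.
    val spark = SparkSession.builder()
      .appName("hadoop-native-sketch")
      .master("yarn")
      .getOrCreate()

    spark.read
      .parquet("hdfs:///data/parquet/events")  // data stays where it already lives
      .groupBy("advertiser_id")
      .count()
      .show()

    spark.stop()
  }
}
```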