In the eternal race between function and data, for a while it appeared that function was supreme. After all, it was not so long ago that IT teams the world over were focused on SOA and BPM. Data was relegated to a third or fourth tier in an n-tier architecture for IT solutions. Stateless servers were the order of the day.
Even distributed objects, which gave data some semblance of respect by encapsulating it with function in a two-tier client-server architecture, appeared to belong to a distant past. And then Big Data happened. Suddenly it was the handling of large amounts of data that mattered, not the tiny amount of computation performed on each data object, or so it seemed.
In the race to offer Big Data solutions, many companies, large and small, now offer compute farms integrated with terabytes of storage, with function, or shall we say intelligence, slowly relegated to "tier-3" or "tier-4" in large "map-reduce" data frameworks.
From a business perspective, where the single most important goal of every investment is to contribute to business objectives, nothing has really changed; ROI reigns supreme.
Now we present a small research challenge: can an IT organization build a 1 Teraflop Big Data platform that delivers the benefit of a 10 Teraflop map-reduce system? Can this be done by making the right tradeoffs between data-centricity and function-centricity? Is it possible to make these tradeoffs by leveraging the advanced support for distributed computing around fairly complex objects offered by mature distributed computing platforms built on open standards such as OMG's CORBA?
DSP and stream programmers know exactly what this means. In an FIR filter with n coefficients, each output computation reuses n-1 data items from the previous computation. Efficient FIR implementations on many-core stream processors leverage this basic principle, computing and re-computing a linear function over an incremental data stream.
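To make the reuse concrete, here is a minimal sketch in C of a streaming FIR step. The coefficient values, buffer layout, and function names are illustrative assumptions, not taken from any particular platform or product.

    #include <stdio.h>

    #define N 4  /* number of filter taps (coefficients) */

    /* Illustrative coefficients: a simple moving average. */
    static const double coeff[N] = {0.25, 0.25, 0.25, 0.25};
    static double window[N];  /* circular buffer of the last N samples */
    static int head = 0;      /* index of the most recent sample */

    /* Compute one output per incoming sample. Only the new sample x
       enters the window; the other N-1 samples are reused from the
       previous step -- the incremental reuse described above. */
    double fir_step(double x)
    {
        head = (head + 1) % N;
        window[head] = x;
        double y = 0.0;
        for (int k = 0; k < N; k++)
            y += coeff[k] * window[(head - k + N) % N];
        return y;
    }

    int main(void)
    {
        const double stream[] = {1, 2, 3, 4, 5, 6, 7, 8};
        for (int i = 0; i < 8; i++)
            printf("y[%d] = %f\n", i, fir_step(stream[i]));
        return 0;
    }

Each call ingests one new sample and reuses the previous n-1, so per-sample work stays proportional to n with no re-reading of the stream; the same pattern generalizes to any linear operator over a sliding window.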
This very same principle can be leveraged by a distributed streaming object grid. An efficient, high-performance distributed computing platform such as SANKHYA Varadhi may offer just the right base for creating an Incremental Big Data platform.
This is the challenge that we are taking up with the Teraflop Streaming Object Grid project. It is primarily an effort to create a research platform for evaluating different architectural strategies for solving computation- and I/O-intensive problems, and for devising algorithms that can deliver higher "incremental" computing throughput. SANKHYA is looking for co-creation and collaboration opportunities. Here is a call for collaboration:
http://www.sccpcc.org/cpp2013-01.html
If you are interested in participating, please contact [email protected] or [email protected]. Stay tuned for updates from this program.
Follow Mr. Bulusu on Twitter at @GopiBulusu.