Mining the Contact Center for BI Gems

by Gaurav Patil on Friday 23 July 2010


The use of analytics in contact centers enables management to extract critical business intelligence that would otherwise be lost. By analyzing and categorizing recorded conversations between companies and their customers, useful information can be discovered relating to strategy, product, process, customer satisfaction/retention, and operational issues. This information gives decision-makers insight and the ability to react quickly to customer sentiment and behavior, as well as customer response to both outbound and nascent market and brand positioning.

Syndicated via RSS From: http://www.crmbuyer.com

Get free white papers delivered direct to your inbox from IT Knowledge Hub! Register now for cutting edge webcasts, reports, and white papers in your area of expertise.


Google’s Dremel – Or, Can MapReduce Itself Handle Fast, Interactive Querying?

by Tasso Argyros on Monday 19 July 2010

Every year or so Google comes out with an interesting piece of infrastructure, always backed by claims that it’s being used by thousands of people on thousands of servers and processes petabytes or exabytes of web data. That alone makes Google papers interesting reading. :)

This latest piece of research just came out on Google’s Research Buzz page. It’s about a system called Dremel (note: Dremel is also the name of a company that makes hardware tools, which I happened to use a lot when I was building model R/C airplanes as a kid). Dremel is an interesting move by Google: it provides a system for interactive analysis of data. It was created because native MapReduce was thought to have too much latency for fast interactive querying and analysis. It works on data that sits in different storage systems such as GFS or BigTable. Data is modeled in a columnar, semi-structured format, and the query language is SQL-like, with extensions to handle the non-relational data model. I find this interesting; below is my analysis of what Dremel is, and the big conclusion.

Main characteristics of the system:

Data & Storage Model
• Data is stored in a semi-structured format. This is not XML; rather, it uses Google’s Protocol Buffers. Protocol Buffers (PB) allow developers to define nested schemas.
• Every field is stored in its own file, i.e. every element of the Protocol Buffers schema is columnar-ized. Columnar modeling is especially important for Dremel for two specific reasons:
- Protocol Buffer data structures can be huge (> 1000 fields).
- Dremel does not offer any data modeling tools to help break these data structures down. E.g. there’s nothing in the paper that explains how you can take a Protocol Buffers data structure and break it down into 5 different tables.
• Data is stored in a way that makes it possible to recreate the original “flat” schema from the columnar representation. This, however, requires a full pass over the data; the paper doesn’t explain how point or indexed queries would be executed.
• There’s almost no information about how data gets into the right format, or how it is stored, deleted, replicated, etc. My best guess is that when someone defines a Dremel table, data is copied from the underlying storage to the local storage of Dremel nodes (“leaf nodes”) and at the same time replicated across the leaf nodes. Since data in Dremel cannot be updated, once a table is replicated it doesn’t need to be deleted, which probably simplifies the design & implementation.
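To make the columnar idea concrete, here is a minimal Python sketch of splitting nested records into one column per field path and then reassembling them. It is a deliberate simplification, not Dremel’s encoding: it assumes there are no repeated fields, and the helper names (`flatten`, `shred`, `assemble`) are hypothetical. The paper’s format also stores repetition and definition levels so that repeated and optional fields can be rebuilt exactly; this sketch cannot tell a missing field from an explicit `None`. Note that `assemble` has to visit every column, which mirrors the full pass over the data mentioned above.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def flatten(record: Record, prefix: str = "") -> Dict[str, Any]:
    """Map each leaf value of a nested dict to its dotted field path."""
    out: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, path + "."))  # descend into nested fields
        else:
            out[path] = value
    return out


def shred(records: List[Record]) -> Dict[str, List[Any]]:
    """Store every field path as its own column, aligned by record index.

    Absent fields are stored as None so all columns have the same length.
    (Simplification: empty nested dicts and explicit None values are lost.)
    """
    flat = [flatten(r) for r in records]
    paths = sorted({p for f in flat for p in f})
    return {p: [f.get(p) for f in flat] for p in paths}


def assemble(columns: Dict[str, List[Any]]) -> List[Record]:
    """Rebuild nested records; this touches every column (a full pass)."""
    n = len(next(iter(columns.values()), []))
    records: List[Record] = [{} for _ in range(n)]
    for path, values in columns.items():
        *parents, leaf = path.split(".")
        for rec, value in zip(records, values):
            if value is None:
                continue  # field was absent in this record
            node = rec
            for part in parents:
                node = node.setdefault(part, {})
            node[leaf] = value
    return records


if __name__ == "__main__":
    docs = [
        {"id": 1, "name": {"first": "Ada", "last": "Lovelace"}},
        {"id": 2, "name": {"first": "Alan"}},
    ]
    cols = shred(docs)
    print(cols["name.last"])  # ['Lovelace', None]
    print(assemble(cols) == docs)  # True
```

A query that only needs `name.first` would read just that one list, which is the point of columnar storage for very wide Protocol Buffers records.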

Interface
• The query interface is SQL-like, but with extensions to handle the semi-structured, nested nature of the data. Query input is semi-structured, and so is query output. One needs to get used to this, since it’s significantly different from the relational model.
• Tables can be defined from files (e.g., files stored in GFS) by means of a “DEFINE TABLE” command.
• The data model and query language make Dremel appropriate for developers; for Dremel to be used by analysts or database folks, a different/simpler data model and a good number of tools (for loading data, changing the data model, etc.) would be needed.
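The semi-structured-in, semi-structured-out behaviour can be pictured with a toy projection over nested records. This is a hypothetical Python sketch (`project` and `select_where` are made-up names), not Dremel’s query language or API: it keeps only the requested dotted field paths and returns them with their nesting intact, instead of flattening them into relational rows.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def project(record: Record, paths: List[str]) -> Record:
    """Keep only the requested dotted paths, preserving their nesting.

    Paths missing from the record are simply omitted from the output.
    """
    out: Record = {}
    for path in paths:
        *parents, leaf = path.split(".")
        src: Any = record
        for part in parents:
            src = src.get(part) if isinstance(src, dict) else None
        if not isinstance(src, dict) or leaf not in src:
            continue  # path not present in this record
        dst = out
        for part in parents:
            dst = dst.setdefault(part, {})
        dst[leaf] = src[leaf]
    return out


def select_where(
    records: List[Record], paths: List[str], predicate: Callable[[Record], bool]
) -> List[Record]:
    """A toy 'SELECT paths FROM records WHERE predicate' over nested data."""
    return [project(r, paths) for r in records if predicate(r)]


if __name__ == "__main__":
    rows = [
        {"id": 1, "name": {"first": "Ada", "last": "Lovelace"}, "country": "UK"},
        {"id": 2, "name": {"first": "Grace"}, "country": "US"},
    ]
    # Output is still nested: [{'id': 2, 'name': {'first': 'Grace'}}]
    print(select_where(rows, ["id", "name.first"], lambda r: r.get("country") == "US"))
```

The nested output is what makes the model comfortable for developers but unfamiliar to people used to flat relational result sets.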

Query Execution
• Queries do NOT use MapReduce, unlike Hadoop query tools like Pig & Hive.
• Dremel provides optimizations for sequential data access, such as async I/O & prefetching.
• Dremel supports approximate results (e.g. return partial results after reading X% of data – this speeds up processing in systems with 100s of servers or more since you don’t have to wait for laggards).
• Dremel can use replicas to speed up execution if a server becomes too slow. This is similar to the “backup copies” idea from the original Google MapReduce paper.
• There seems to be a tree-like model of executing queries, meaning that there are intermediate layers of servers between the leaf nodes and the top node (which receives the user query). This is useful for very large deployments (e.g. thousands of servers) since it provides some intermediate aggregation points that reduce the amount of data that needs to flow to any single node.
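The tree-shaped execution and the partial-results option can be sketched together. The following Python model is a simulation under stated assumptions, not Dremel’s serving tree: each hypothetical leaf computes a (sum, count) partial over its shard, intermediate levels merge groups of `fan_out` partials, and `min_fraction` lets the root stop waiting once that share of leaves (ordered by a simulated latency) has answered. It does not model re-running slow work on replicas.

```python
import math
from typing import List, Sequence, Tuple

# Partial aggregate for AVG: (sum of values, number of values).
Partial = Tuple[float, int]


def leaf_scan(shard: Sequence[float]) -> Partial:
    """A leaf server aggregates its local shard into a partial result."""
    return (sum(shard), len(shard))


def merge(partials: Sequence[Partial]) -> Partial:
    """An intermediate (or root) server combines its children's partials."""
    return (sum(p[0] for p in partials), sum(p[1] for p in partials))


def tree_average(
    shards: Sequence[Sequence[float]],
    latencies: Sequence[float],
    fan_out: int = 2,
    min_fraction: float = 1.0,
) -> Tuple[float, int]:
    """Average over the shards via a multi-level aggregation tree.

    Leaves are assumed to answer in order of their simulated latency. Once
    min_fraction of them have answered, stragglers are ignored, so the result
    may be approximate. Returns (average, number of values actually used).
    """
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    order = sorted(range(len(shards)), key=lambda i: latencies[i])
    needed = max(1, math.ceil(min_fraction * len(shards)))
    level: List[Partial] = [leaf_scan(shards[i]) for i in order[:needed]]
    # Each pass groups fan_out partials under one server on the next level up.
    while len(level) > 1:
        level = [merge(level[i:i + fan_out]) for i in range(0, len(level), fan_out)]
    total, count = level[0]
    return total / count, count


if __name__ == "__main__":
    shards = [[1, 2], [3, 4], [5, 6], [7, 8]]
    latencies = [0.1, 0.2, 0.3, 9.0]  # the last leaf is a laggard
    print(tree_average(shards, latencies))                     # (4.5, 8)
    print(tree_average(shards, latencies, min_fraction=0.75))  # (3.5, 6)
```

Because (sum, count) pairs merge associatively, any tree shape gives the same exact answer; only the amount of data arriving at each node changes.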

Performance & Scale
• Compared to Google’s native MapReduce implementation, Dremel is two orders of magnitude faster in terms of query latency. As mentioned above, part of the reason is that Protocol Buffers are usually very large and Dremel doesn’t have a way to break them down except for its columnar modeling. Another reason is the high latency of Google’s MapReduce implementation.
• Following Google’s tradition, Dremel was shown to scale reasonably well to thousands of servers, although this was demonstrated only on a single query that parallelizes nicely and, from what I understand, doesn’t reshuffle much data. To really understand scalability, it’d be interesting to see benchmarks with a more complex collection of workloads.
• The paper mentions little to nothing about how data is partitioned across the cluster. Scalability of the system will probably be sensitive to partitioning strategies, so that seems like a significant omission IMO.
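A back-of-the-envelope calculation shows why columnar storage matters so much for very wide records. The sketch below assumes fixed-size values, no compression and a full scan; it counts values read, not actual I/O time, and it is not derived from the paper’s measurements. `values_scanned` is a hypothetical helper name.

```python
from typing import Tuple


def values_scanned(num_records: int, num_fields: int, fields_used: int) -> Tuple[int, int, float]:
    """Count the values a full scan reads in a row layout vs. a column layout.

    Row layout: every field of every record is read, needed or not.
    Column layout: only the columns of the fields the query references are read.
    Assumes fixed-size values and no compression (a deliberate simplification).
    """
    if not 0 < fields_used <= num_fields:
        raise ValueError("fields_used must be between 1 and num_fields")
    row_layout = num_records * num_fields
    column_layout = num_records * fields_used
    return row_layout, column_layout, row_layout / column_layout


if __name__ == "__main__":
    # Hypothetical workload: 1M records of 1,000 fields; the query touches 2 fields.
    print(values_scanned(1_000_000, 1000, 2))  # (1000000000, 2000000, 500.0)
```

With records of over 1,000 fields and a query that touches only a couple of them, the column layout reads roughly 500x fewer values in this model; real savings depend on value sizes, compression and the other costs of query execution.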

So the big question – can MapReduce itself handle fast, interactive querying?
• There’s a difference between the MapReduce paradigm, as an interface for writing parallel applications, and a MapReduce implementation (two examples are Google’s own MapReduce implementation, which is mentioned in the Dremel paper, and open-source Hadoop). Each MapReduce implementation has its own performance characteristics.
• It is well known that Google’s MapReduce implementation & Hadoop’s MapReduce implementation are optimized for batch processing and not fast, interactive analysis. Besides the Dremel paper, look at this Berkeley paper for some Hadoop numbers and an effort to improve the situation.
• Native MapReduce execution is not fundamentally slow; however, Google’s MapReduce and Hadoop happen to be oriented more towards batch processing. Dremel tries to overcome that by building a completely different system that speeds up interactive querying. Interestingly, Aster Data’s SQL-MapReduce came about to address this in the first place and offers very fast interactive queries even though it uses MapReduce. So I disagree with the idea that one needs to get rid of MapReduce to achieve fast interactivity; we’ve shown with SQL-MapReduce that this is not the case.
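To illustrate the distinction between the paradigm and an implementation, here is the MapReduce programming model run entirely inside one Python process. It is a sketch, not Hadoop, Google’s implementation or Aster Data’s SQL-MapReduce, and `map_reduce` and `word_count` are hypothetical names. It says nothing about distributed performance; it only shows that the map, group-by-key and reduce steps themselves carry no inherent batch overhead such as job scheduling or writing intermediate data to disk.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple


def map_reduce(
    inputs: Iterable[Any],
    mapper: Callable[[Any], Iterable[Tuple[Hashable, Any]]],
    reducer: Callable[[Hashable, List[Any]], Any],
) -> Dict[Hashable, Any]:
    """Run the MapReduce programming model entirely in memory.

    No job scheduling, no intermediate files, no task startup cost: the map,
    shuffle (grouping by key) and reduce phases are plain function calls.
    """
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for item in inputs:  # map phase
        for key, value in mapper(item):
            groups[key].append(value)  # shuffle: group values by key
    # reduce phase: one reducer call per distinct key
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[Hashable, Any]:
    """The classic MapReduce example, expressed through map_reduce above."""
    return map_reduce(
        lines,
        mapper=lambda line: ((word, 1) for word in line.split()),
        reducer=lambda _word, counts: sum(counts),
    )


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

Whether a real system answers such a query in milliseconds or minutes is decided by how it schedules, moves and stores data, which is the implementation side of the argument above.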

Syndicated via RSS From: http://www.asterdata.com/blog



Your Business is Going Social

by Blog: Krish Krishnan on Monday 12 July 2010

Believe it or not, your business has gone social. If you have any presence on the internet, you have probably already added Facebook, Digg, Delicious, Twitter and more to your website. Whether you are Walmart or Exxon or just The Store Next Door, social media has changed the way your customers and suppliers work with you. The “long tail” is slowly turning into a “fat tail”. Collaborative computing and crowdsourcing are becoming common terms in business today. They are not buzzwords anymore; they are the future.

Today there is research and development in major software organizations, from IBM to Google to Apple, on the crowdsourcing model of innovation. In fact, Apple has adopted this model, and the success of the iPhone is a shining example. So has Google with the Android platform: today there are scores of applications for this platform all over the world.

Software as we know it and business as we knew it will change over the next five years into a model of crowd-driven innovation and growth. Your business is going social; adapt to the new ways of the new world.

Syndicated via RSS From: http://www.b-eye-network.in/blogs/krishnan/



EMC to Acquire Greenplum

by Blog: Krish Krishnan on Wednesday 7 July 2010

An interesting move from EMC. First they signed a deal with ParAccel to provide storage technology, and now they are acquiring Greenplum. This deal signals a new war of the worlds: HP, IBM, Cisco, EMC and Oracle will now fight a battle royale over cloud computing, and in particular over DWBI and analytics in the cloud.

There are several things to understand in this move:

1. Major mover advantage for commercial cloud provisioning, apart from Amazon
2. EMC establishing its presence as a player in the DWBI and Analytics space
3. The fact that BI in the cloud is a reality
4. The new platform for DWBI and Analytics may move away from SQL
5. DWBI as we know it will change

As we start moving from SQL-intensive DWBI to non-SQL-intensive BI, we will start looking at these kinds of platforms as a new alternative. Good news for all technology enthusiasts and analytical BI users.

Another fact is that the DW appliance space now has very few pure-play vendors; Netezza vs. Aster Data is certainly the next battle.

 

Syndicated via RSS From: http://www.b-eye-network.in/blogs/krishnan/



Boosting the Bottom Line in Real-Time

by Brian Goffman on Monday 21 June 2010


As a marketer, you need a control panel to make decisions. The world is moving faster and faster every day, enabling literally billions of interactions about your company and products around the globe. What you need is real-time intelligence. The term “real-time” used to apply only to Wall Street and market quotes. Now it applies to marketing, and to how its channels can help drive more traffic and improve overall campaign efforts. In this context, real-time intelligence is an extension of online marketing as a means to provide measurable impact on a company’s bottom line.

Syndicated via RSS From: http://www.crmbuyer.com


Copyright © 2010 IT Knowledge Hub LLC