Couchbase and the future of NoSQL databases
Couchbase is a NoSQL, document-oriented database for building interactive applications. Trends in the open source database industry show positive growth as NoSQL is used for web, mobile, and the Internet of Things (IoT).
In this interview, Arun Gupta, VP of Developer Advocacy at Couchbase, shares his views on how open source has made an impact on the database industry, and the challenges that lie ahead for the NoSQL industry. Also, find out which open source tools and methodologies Couchbase has adopted.
Arun has authored more than 2,000 blog posts on technology as well as several books. He's been named a JavaOne Rock Star for three years in a row, and he founded the Devoxx4Kids chapter in the U.S., where he continues to promote technology education among children. His recognized titles include Java Champion, JUG Leader, NetBeans Dream Team member, and Docker Captain.
What's it like to work for Couchbase?
Well, I've built and led developer communities for 10+ years at Sun, Oracle, and Red Hat, so I have experience in leading cross-functional teams to develop and execute strategy, planning, and execution of content, and marketing campaigns and programs. I've also led engineering teams at Sun, and I’m a founding member of the Java EE team.
At Couchbase, a developer advocate helps developers become effective users of a technology, product, API, or platform. This can be done by sharing knowledge about the product using the medium where developers typically hang out. Some of the more common channels include blogs, articles, webinars, and presentations at conferences and meetups. Answering questions on forums and Stack Overflow, conversations on social media, and seeking contributors for open source projects are some other typical activities that a developer advocate performs on a regular basis.
How has open source impacted the database industry, and NoSQL databases in particular?
Simply put, open source has revolutionized the database industry.
For several decades databases were closed source and under proprietary licenses. Users were at the mercy of a handful of big vendors like IBM, Oracle, Microsoft, Teradata, and SAP. Enterprises were held captive to release schedules and licensing schemes that often had little to do with providing technical value, and were more about maximizing the vendor's profit margin. There were a few open source databases like MySQL, PostgreSQL, and SQLite that offered an alternative to the big vendors, but for the most part these products lacked the maturity, feature depth, and stability that most enterprises were looking for. Until recently, it appeared that the established vendors had a lock on the database industry and that the barrier to entry was so high that other database providers would be forever relegated to being niche players.
But two things occurred: open source database products matured, and distributed computing happened.
Although enterprises continue to run a lot of "the big five" RDBMS products, industry analysts like Gartner, Forrester, and 451 Research, as well as software ranking websites like db-engines.com, have clearly stated that mature open source databases are no longer niche products. The new innovations in distributed computing leverage modern, high-performance, commodity hardware to provide distributed, scalable data processing (for example, Hadoop and Spark) and distributed, scalable data storage (for example, HDFS and NoSQL).
The vast majority of the software products developed over the last 10 years that provide distributed computing are open source. These new products take advantage of advanced technical features, agile development methodologies, and an open source collaborative licensing model to deliver rapid technological innovation as well as lower cost of operations to their users.
With regard to NoSQL databases, virtually all of these products are available under an open source license (with a few notable exceptions like Amazon DynamoDB, MarkLogic, and a few others). And the number of open source NoSQL databases continues to grow. What's different about the NoSQL space, however, is not just the number of products that are available, but how vibrant and active the open source community is. What's amazing about this is that the adoption rate among Fortune 1000 companies continues to grow exponentially and these companies are contributing back to the community. Also amazing is the mission-critical, non-niche nature of the NoSQL use cases, as well as the resulting quality and maturity of these products.
As we continue to push our information society forward and expand technology in the form of things like machine learning, what challenges do you see in the next few years for the NoSQL database industry?
Probably the two biggest challenges are around technology integration and maintaining product focus, quality, and scalability.
From an integration perspective, you have multiple technologies (for example, the Hadoop stack, Spark, IoT, and NoSQL) that are rapidly evolving, in a space where there are very few actual standards. On the one hand, you need to choose where to focus your resources and try to identify which trends, products, and APIs are going to become widely adopted, while at the same time adjusting to the rapid changes of those same trends, products, and APIs. Also, we hear from enterprise users that inefficient or poorly executed integrations result in wasted time and effort due to performance and scalability problems. You could argue that open source NoSQL databases should provide every single integration possible, but customers aren’t really looking for every kind of integration; they are looking for integrations that work.
From a product quality and scalability perspective, NoSQL databases provide a high-performance, high-throughput alternative to the RDBMS one-size-fits-all database approach by delivering focused functionality that is specifically designed to provide fast access to operational data. Integrating NoSQL with new technologies like machine learning, for example, requires a more interactive, real-time, operational approach, rather than the bulk, batch-processing approach that is more common with RDBMS and file system integrations.
Additionally, as NoSQL databases become more feature and integration rich, there is a real risk of losing some of the performance and scalability advantages that NoSQL brings to the table. Avoiding that risk doesn't occur by chance or by luck; it has to be planned for and architected into the product.
For example, Couchbase's core database architecture is based on our Multi-Dimensional Scaling (MDS) and Database Change Protocol (DCP). MDS and DCP enable Couchbase to provide a service-oriented architecture (SOA) for data management. With MDS, Couchbase can continuously add data management services, which can be individually configured and resourced, without affecting the throughput or behavior of other services. This allows customers to scale each service independently (scaling up, scaling out, or both) while also providing workload isolation. This SOA-based approach to data management has allowed Couchbase to introduce full SQL for JSON (N1QL), Global Secondary Indexing, Cross Datacenter Replication (XDCR), built-in Full Text Search, and high-performance streaming integration with Spark, Kafka, and Hadoop.
Which open source tools and methodologies has Couchbase adopted?
One of the core values at Couchbase is that we are an open source software company. We believe that open source is about much more than simply making the source code available; it's about how we develop our software, and how we engage with the greater Couchbase community. This affects