Learn how to secure your Solr data in a policy-based, fine-grained way.

Data security is more important than ever before. At the same time, risk is increasing due to the relentlessly growing number of device endpoints, the continual emergence of new types of threats, and the commercialization of cybercrime. And with Apache Hadoop already instrumental for supporting the growth of data volumes that fuel mission-critical enterprise workloads, the necessity to master available security mechanisms is of vital importance to organizations participating in that paradigm shift.

Fortunately, the Hadoop ecosystem has responded to this need in the past couple of years by spawning new functionality for end-to-end encryption, strong authentication, and other aspects of platform security. For example, Apache Sentry provides fine-grained, role-based authorization capabilities used in a number of Hadoop components, including Apache Hive, Apache Impala (incubating), and Cloudera Search (an integration of Apache Solr with the Hadoop ecosystem). Sentry is also able to dynamically synchronize the HDFS permissions of data stored within Hive and Impala by using ACLs that derive from Hive GRANTs.

In this post, you’ll learn how to secure Solr data by controlling read/write access via Sentry (backed up by the strong authentication capabilities of Kerberos) and access it programmatically from Java applications and Apache Flume. This operation applies to many industry use cases where Solr is the backing data layer in multi-tenant, Java-based web applications associated with frequent updates that happen in the background.


Our example assumes that:

Solr is running in a Cloudera-powered enterprise data hub, with Kerberos and Sentry also deployed.
A web app needs to access a Solr collection programmatically using Java.
The Solr collection is updated in real time via Flume and a MorphlineSolrSink.

Sentry authorizations for Hive and Impala can be stored in either a dedicated database or a file in HDFS (the policy provider is pluggable). In the example below, we’ll configure role-based access control via the file-based policy provider.

Create the Solr Collection

First, we’ll generate a collection configuration set called poems:

solrctl instancedir --generate poems

We are assuming that your Solr client configuration already includes the settings solrctl needs to locate Apache ZooKeeper and the Solr nodes. If that is not the case, you might have to pass their locations to the solrctl command explicitly, for example:

solrctl --zk zookeeper-host1:2181,zookeeper-host2:2181,zookeeper-host3:2181/solr --solr http://your.datanode.net:8983/solr
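The value passed to --zk follows the usual ZooKeeper connect-string layout: a comma-separated list of host:port pairs, optionally followed by a chroot path such as /solr. The following is a minimal sketch of how such a string breaks down; the class and method names (ZkConnectStringSketch, hosts, chroot) are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper, not part of solrctl or ZooKeeper.
class ZkConnectStringSketch {

    // Returns the host:port entries, ignoring any chroot suffix.
    static List<String> hosts(String connectString) {
        int slash = connectString.indexOf('/');
        String ensemble = slash < 0 ? connectString : connectString.substring(0, slash);
        return Arrays.asList(ensemble.split(","));
    }

    // Returns the chroot path (for example "/solr"), or "" if none is given.
    static String chroot(String connectString) {
        int slash = connectString.indexOf('/');
        return slash < 0 ? "" : connectString.substring(slash);
    }

    public static void main(String[] args) {
        String zk = "zookeeper-host1:2181,zookeeper-host2:2181,zookeeper-host3:2181/solr";
        System.out.println(hosts(zk));
        System.out.println(chroot(zk));
    }
}
```

If the chroot is missing or misspelled, solrctl would look for Solr's state in the wrong ZooKeeper subtree, so it is worth checking that part of the string first when the command cannot find the cluster.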

Edit poems/conf/schema.xml to reflect a smaller number of fields per document. (A simple id and text field will suffice.) Also, confirm that copy-fields are removed from the sample schema:

Be sure to use the secured solrconfig.xml:

cp poems/conf/solrconfig.xml poems/conf/solrconfig.xml.original
cp poems/conf/solrconfig.xml.secure poems/conf/solrconfig.xml

Push the configuration data into Apache ZooKeeper:

solrctl instancedir --create poems poems

Create the collection:

solrctl collection --create poems

Secure the poems Collection Using Sentry

The policy shown below establishes four Sentry roles based on the admin, operators, users, and techusers groups.

Administrators are entitled to all actions.
Operators are granted update and query privileges.
Users are granted query privileges.
Tech users are granted update privileges.

[groups]
cloudera_hadoop_admin = admin_role
cloudera_hadoop_operators = both_role
cloudera_hadoop_users = query_role
cloudera_hadoop_techusers = update_role

[roles]
admin_role = collection = *->action=*
both_role = collection = poems->action=Update, collection = poems->action=Query
query_role = collection = poems->action=Query
update_role = collection = poems->action=Update

Add the content of the listing to a file called sentry-provider.ini. Rename the groups according to the corresponding groups in your cluster.
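To make the group-to-role-to-privilege chain in the policy concrete, here is a small, self-contained model of it. It is only a sketch: the class and method names (PolicySketch, isAllowed) are hypothetical, the wildcard and case-insensitive action matching are assumptions made for illustration, and none of this is Sentry's actual implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of the policy file; this is not Sentry's implementation.
class PolicySketch {
    // [groups] section: group name -> role names
    static final Map<String, List<String>> GROUPS = new HashMap<>();
    // [roles] section: role name -> privileges
    static final Map<String, List<String>> ROLES = new HashMap<>();

    static {
        GROUPS.put("cloudera_hadoop_admin", Arrays.asList("admin_role"));
        GROUPS.put("cloudera_hadoop_operators", Arrays.asList("both_role"));
        GROUPS.put("cloudera_hadoop_users", Arrays.asList("query_role"));
        GROUPS.put("cloudera_hadoop_techusers", Arrays.asList("update_role"));

        ROLES.put("admin_role", Arrays.asList("collection=*->action=*"));
        ROLES.put("both_role", Arrays.asList(
                "collection=poems->action=Update", "collection=poems->action=Query"));
        ROLES.put("query_role", Arrays.asList("collection=poems->action=Query"));
        ROLES.put("update_role", Arrays.asList("collection=poems->action=Update"));
    }

    // True if any role of the group carries a privilege matching the request.
    static boolean isAllowed(String group, String collection, String action) {
        for (String role : GROUPS.getOrDefault(group, Collections.<String>emptyList())) {
            for (String privilege : ROLES.getOrDefault(role, Collections.<String>emptyList())) {
                String[] parts = privilege.split("->");
                String grantedCollection = parts[0].substring("collection=".length());
                String grantedAction = parts[1].substring("action=".length());
                boolean collectionOk = grantedCollection.equals("*")
                        || grantedCollection.equals(collection);
                // Assumption: "*" is a wildcard and action names compare case-insensitively.
                boolean actionOk = grantedAction.equals("*")
                        || grantedAction.equalsIgnoreCase(action);
                if (collectionOk && actionOk) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("cloudera_hadoop_users", "poems", "Query"));
        System.out.println(isAllowed("cloudera_hadoop_users", "poems", "Update"));
    }
}
```

Under this model, a member of cloudera_hadoop_users can query poems but not update it, while cloudera_hadoop_techusers can update but not query, which matches the intent described for the four roles.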

Put sentry-provider.ini into HDFS:

hdfs dfs -mkdir -p /user/solr/sentry
hdfs dfs -put sentry-provider.ini /user/solr/sentry
hdfs dfs -chown -R solr /user/solr

Enable Sentry policy-file usage in the Solr service in Cloudera Manager:

Solr → Configuration → Service Wide → Policy File Based Sentry → Enable Sentry Authorization = True

Restart Solr (only needed once for enabling Sentry integration):

Solr → Actions → Restart

Add Data to the Collection via curl

Use curl to add content:

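kinit
curl --negotiate -u : -s \
  "http://your.datanode.net:8983/solr/poems/update?commit=true" \
  -H "Content-Type: text/xml" --data-binary \
  '<add><doc><field name="id">1</field><field name="text">Mary had a little lamb, the fleece was white as snow.</field></doc><doc><field name="id">2</field><field name="text">The quick brown fox jumps over the lazy dog.</field></doc></add>'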
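The body posted to the update handler is a Solr XML update message: an add element containing one doc per document, each with its fields. As a rough sketch of how an application could assemble such a body for the id and text fields used here, consider the following; the class and method names (UpdateXmlSketch, addMessage, escape) are hypothetical, and the code only builds the string without sending it.

```java
// Illustrative builder for a Solr XML <add> message with id and text fields.
class UpdateXmlSketch {

    // Escapes the characters that are not allowed verbatim in XML text.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Each row of docs is {id, text}.
    static String addMessage(String[][] docs) {
        StringBuilder xml = new StringBuilder("<add>");
        for (String[] doc : docs) {
            xml.append("<doc>")
               .append("<field name=\"id\">").append(escape(doc[0])).append("</field>")
               .append("<field name=\"text\">").append(escape(doc[1])).append("</field>")
               .append("</doc>");
        }
        return xml.append("</add>").toString();
    }

    public static void main(String[] args) {
        System.out.println(addMessage(new String[][] {
            {"1", "Mary had a little lamb, the fleece was white as snow."},
            {"2", "The quick brown fox jumps over the lazy dog."}
        }));
    }
}
```

Escaping matters as soon as document text can contain characters such as & or <, which would otherwise make the update message malformed.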

Use curl to perform an initial query and verify Solr’s function:

curl --negotiate -u : -s \
  "http://your.datanode.net:8983/solr/poems/get?id=1"

Accessing the Collection via Java

Next, we’ll make sure that the web app can access the collection whenever needed.

Add the following code to a Java file called SecureSolrJQuery.java:

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import java.net.MalformedURLException;

class SecureSolrJQuery {
  public static void main(String[] args) throws MalformedURLException, SolrServerException {
    // Use the first argument as the search term; match everything by default.
    String queryParameter = args.length == 1 ? args[0] : "*";
    String urlString = "http://your.datanode.net:8983/solr/poems";
    SolrServer solr = new HttpSolrServer(urlString);

    // Search the text field of the poems collection.
    SolrQuery query = new SolrQuery();
    query.set("q", "text:" + queryParameter);

    QueryResponse response = solr.query(query);
    SolrDocumentList results = response.getResults();

    // Print every matching document.
    for (int i = 0; i < results.size(); ++i) {
      System.out.println(results.get(i));
    }
  }
}

Create a JAAS config (jaas-cache.conf) to use the Kerberos ticket cache (that is, your existing ticket from kinit):

Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  debug=false;
};

Later, you’ll see how to achieve the same goal with a keytab to make authentication happen non-interactively.
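To show what the JAAS entry expresses, the sketch below builds the same "Client" entry with the standard javax.security.auth.login API instead of a file. The class name TicketCacheJaasSketch is hypothetical. Whether a given SolrJ build honors a configuration installed this way, rather than one supplied through the java.security.auth.login.config system property, is not verified here, so treat the file-based setup in this post as the path to follow.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

// Programmatic equivalent of the "Client" entry in jaas-cache.conf (illustration only).
class TicketCacheJaasSketch extends Configuration {

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        // Only the "Client" application name is configured.
        if (!"Client".equals(name)) {
            return null;
        }
        Map<String, String> options = new HashMap<>();
        options.put("useTicketCache", "true"); // reuse the ticket obtained via kinit
        options.put("debug", "false");
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }

    public static void main(String[] args) {
        AppConfigurationEntry entry =
            new TicketCacheJaasSketch().getAppConfigurationEntry("Client")[0];
        System.out.println(entry.getLoginModuleName() + " " + entry.getOptions());
    }
}
```

An application could install such an object with Configuration.setConfiguration(...) before any login takes place; the "required" keyword in the file corresponds to LoginModuleControlFlag.REQUIRED.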

Using the Code

Compile the Java class:

CP=`find /opt/cloudera/parcels/CDH/lib/solr/ | grep "\.jar" | tr '\n' ':'`
CP=$CP:`hadoop classpath`
javac -cp $CP SecureSolrJQuery.java

Create a shell script called query-solrj-jaas.sh to run the query code:

CP=`find /opt/cloudera/parcels/CDH/lib/solr/ | grep "\.jar" | tr '\n' ':'`
CP=$CP:`hadoop classpath`
java -Djava.security.auth.login.config=`pwd`/jaas-cache.conf -


Article title: How-to: Secure Apache Solr Collections and Access Them Programmatically
