Tuesday, October 30, 2012

Considerations for a Manual Partitioning Strategy in Accumulo

On the Accumulo Users mailing list, Adam F. suggested:
1. Parallelism and balance at ingest time. You need to find a happy medium between too few partitions (not enough parallelism) and too many partitions (tablet server resource contention and inefficient indexes). Probably at least one partition per tablet server being actively written to is good, and you'll want to pre-split so they can be distributed evenly. Ten partitions per tablet server is probably not too many -- I wouldn't expect to see contention at that point.

2. Parallelism and balance at query time. At query time, you'll be selecting a subset of all of the partitions that you've ever ingested into. This subset should be bounded similarly to the concern addressed in #1, but the bounds could be looser depending on the types of queries you want to run. Lower latency queries would tend to favor only a few partitions per node.

3. Growth over time in partition size. Over time, you want partitions to be bounded to less than about 10-100GB. This has to do with limiting the maximum amount of time that a major compaction will take, and impacts availability and performance in the extreme cases. At the same time, you want partitions to be as large as possible so that their indexes are more efficient.

One strategy to optimize partition size would be to keep using each partition until it is "full", then make another partition. Another would be to allocate a certain number of partitions per day, and then only put data in those partitions during that day. These strategies are also elastic, and can be tweaked as the cluster grows.

In all of these cases, you will need a good load balancing strategy. The default strategy of evening up the number of partitions per tablet server is probably not sufficient, so you may need to write your own tablet load balancer that is aware of your partitioning strategy.

Tuesday, October 16, 2012

How Many Accumuo Loggers Are Needed Related To Slaves (tserver) Nodes?

From the Accumulo User mailing list on Oct 10, 2012, John V says:
The scripts start loggers on all of the slave nodes. You can arbitrarily start/stop loggers, but we highly recommend doing 1:1 between them and tservers because the method for assigning tserver->logger does make assumptions depending on that 1:1 correlation.

How Does the Accumulo Client Find the Correct Tablet Server?

From the Accumulo User mailing list on Oct 10, 2012, John V says:
Depending on the Instance your using, the client may hit HDFS first to get some instance information. Then either instance will hit Zookeeper to get the root tablet location. It will then scan that to find the appropriate !METADATA tablet, but will aggressively cache everything. The client will then scan (with heavy caching again) the !METADATA tablets necessary to find the tablet(s) for the range of writes. And then the writes will occur to appropriate tablets discovered in the !METADATA scan.

Tuesday, October 02, 2012

How To Run Multiple Instances of Accumulo on One Hadoop Cluster

On the Accumulo User mailing list, Kristopher K. asked:
I built 1.5 from source last night and wanted to try it out on my existing Hadoop cluster without overwriting my current 1.4 set. Is there a way to specify the /accumulo directory in HDFS such that you can run multiple instances?

Eric N. replied:
From the monitoring user interface, see the Documentation link, then Configuration, see the first property:


You'll also change all the port numbers from the defaults. And there's a port number in conf/generic_logger.xml that points to the logging port on the monitor.

For example, here are some entries from my conf/accumulo-site.xml file:







And conf generic_logger.xml:

<!-- Send all logging data to a centralized logger -->
<appender name="N1" class="org.apache.log4j.net.SocketAppender">
<param name="remoteHost" value="${org.apache.accumulo.core.host.log}"/>
<param name="port" value="1560"/>
<param name="application" value="${org.apache.accumulo.core.application}:${org.apache.accumulo.core.ip.localhost.hostname}"/>
<param name="Threshold" value="WARN"/>

How Accumulo Compresses Keys and Values

From the Acccumulo User mailing list, Keith T said:
There are two levels of compression in Accumulo. First redundant
parts of the key are not stored. If the row in a key is the same as
the previous row, then its not stored again. The same is done for
columns and time stamps. After the relative encoding is done a block
of key values is then compressed with gzip.

As data is read from an RFile, when the row of a key is the same as
the previous key it will just point to the previous keys row. This is
carried forward over the wire. As keys are transferred, duplicate
fields in the key are not transferred.

General consensus seemed to favor double compression - compression both at the application level (i.e., compress the values) and let Accumulo compress as well (i.e., the relative encoding).

In support of double compression, Ameet K. said:
I've switched to double compression as per previous posts and
its working nicely. I see about 10-15% more compression over just
application level Value compression.