carryel's Blog
Introduction to GrizzlyMemcached
Caches are often used to speed up applications by reducing database load.
Memcached is the best-known in-memory key-value store (cache). To use Memcached you need a client, and many clients already exist, including several written in Java.
Although there are already good Memcached clients whose operations have been optimized over a long time, I would like to introduce GrizzlyMemcached, which is based on the Grizzly framework and is designed for scalability and high performance.
Main features
Improving and supporting bulk operations such as getMulti and setMulti as well as basic operations of Memcached
- Using high performance connection pool
- Using Grizzly Framework for I/O
- Using only Memcached binary protocol
- Supporting setMulti, deleteMulti, getsMulti and casMulti as well as getMulti
Supporting failover/failback of Memcached
- Using consistent hashing
- Allowing the Memcached server list to change dynamically
- Providing an option for enabling/disabling failover/failback
Synchronizing many clients automatically to prevent stale cache data when Memcached servers fail or are removed or added dynamically
- Using ZooKeeper
- Using a barrier to synchronize the Memcached server list
Considerations
I/O Model
It is very important that clients, as well as servers, have a stable and robust I/O base.
The Grizzly NIO framework offers high performance, scalability and stability, and it can be integrated into various modules easily.
So GrizzlyMemcached uses the Grizzly NIO framework for sending, receiving and parsing packets of Memcached's binary protocol.
The Grizzly NIO framework also provides several I/O strategies.
I chose the same-thread IOStrategy as the default, and it showed good results in my benchmark because GrizzlyMemcached is a client, not a server (you can change it in the configuration to suit your needs).
Connection Model
Some Memcached clients, such as SpyMemcached and XMemcached, use only one connection for the requests of multiple threads.
If multiple threads share one connection, the client can use the request queue to merge a run of consecutive single get/set operations into a bulk operation like getMulti, because a bulk operation is much faster and more efficient than many single operations.
However, a single connection can also limit scalability if many or large requests from many threads are queued concurrently.
So other Memcached clients, such as JavaMemcached, use many connections (one connection per thread) and a connection pool.
This is a trade-off: the pooled model is more scalable but less efficient than the single-connection model.
In the end, I chose the "one connection per thread" model because our company (Kakao) has already experienced overload on a single connection. In most of those cases, hundreds of threads were requesting many different kinds of keys simultaneously.
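The pooled model described above can be sketched with nothing but the JDK. This is only an illustration of the idea, not GrizzlyMemcached's actual pool: `SimpleConnectionPool`, its method names and the generic connection type `C` are all hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: a bounded pool from which each thread borrows its own connection.
final class SimpleConnectionPool<C> {
    private final BlockingQueue<C> idle;

    SimpleConnectionPool(int size, Supplier<C> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows a connection, waiting up to timeoutMillis; returns null on timeout or interrupt.
    C borrow(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Hands a borrowed connection back so that another thread can use it.
    void giveBack(C connection) {
        idle.offer(connection);
    }

    // Number of connections currently not borrowed.
    int idleCount() {
        return idle.size();
    }
}
```

Each worker thread would call `borrow`, send its request on that connection, and call `giveBack` in a `finally` block. No thread waits behind another thread's queued requests, at the cost of losing the automatic batching that a single shared connection allows.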
Stale cache data
Sometimes Memcached servers fail or are added or removed, and some Memcached clients can experience temporary network failures.
Of course, clients use a consistent hashing algorithm to choose a Memcached server, so they minimize the side effects of such changes: if a specific server fails, only the keys of the failed server are redistributed to the surviving servers.
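As a rough illustration of that property, here is a minimal consistent hash ring built on a `TreeMap`. It is a sketch under my own assumptions (MD5-based hashing, 100 virtual nodes per server, the class name `ConsistentHashRing`) and is not taken from GrizzlyMemcached's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent hash ring with virtual nodes.
final class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100; // assumed value, chosen for illustration
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(server + "#" + i), server);
        }
    }

    void removeServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(server + "#" + i));
        }
    }

    // Returns the server owning the first ring position at or after the key's hash.
    String serverFor(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Uses the first 8 bytes of the MD5 digest as a 64-bit ring position.
    static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this ring, adding a server C to {A, B} moves some keys to C and leaves every other key where it was. Removing C sends those keys back to their previous owners.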
Then, is the consistent hashing algorithm enough?
If you are using many clients with several Memcached servers, you can't avoid the stale cache data issue. If you need to add Memcached servers in a production environment, all Memcached clients should switch to the same server list at the same time in order to minimize stale data.
Assume that A and B are Memcached servers and that there are hundreds of Memcached clients which know only A and B.
If a new Memcached server C joins the existing set, then while the new configuration is being rolled out some clients know A, B and C but others know only A and B. During that window, two clients can map the same key to different servers, so one of them may read a value that the other has already updated elsewhere.
To prevent this issue, I chose central configuration with ZooKeeper.
When the central configuration changes, all GrizzlyMemcached clients detect and receive it (phase 1, the prepare stage). If all of them receive it successfully, it is applied simultaneously at a specific system time (phase 2, the commit stage).
(I assume that the system clocks of all clients are synchronized.)
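The "prepare now, commit at an agreed system time" idea can be sketched as follows. This is a simplified, single-process illustration under my own assumptions. It leaves out ZooKeeper and the barrier entirely, and `ScheduledServerList` and its methods are hypothetical names rather than GrizzlyMemcached classes.

```java
import java.util.List;

// Hypothetical sketch: a server list that switches at an agreed wall-clock time.
final class ScheduledServerList {
    private List<String> current;
    private List<String> pending;   // list received in the prepare stage, if any
    private long commitAtMillis;    // agreed system time at which to switch

    ScheduledServerList(List<String> initial) {
        this.current = initial;
    }

    // Phase 1 (prepare): remember the new list and the time at which it takes effect.
    synchronized void prepare(List<String> newServers, long commitAtMillis) {
        this.pending = newServers;
        this.commitAtMillis = commitAtMillis;
    }

    // Phase 2 (commit): the pending list becomes current once the agreed time is reached.
    synchronized List<String> serversAt(long nowMillis) {
        if (pending != null && nowMillis >= commitAtMillis) {
            current = pending;
            pending = null;
        }
        return current;
    }
}
```

A client would call something like `prepare(newList, System.currentTimeMillis() + 60_000)` when it is notified of a change and `serversAt(System.currentTimeMillis())` on each request. If every client prepares with the same commit time and the clocks agree, they all switch lists at the same moment.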
Benchmark
Test Information
- Memcached and client machines
- CPU: Intel Xeon 3.3G, 8 Processors
- Memory: 16G
- OS: Linux CentOS
- JDK: 1.6
- Network: 1Gbit
- Server/Clients versions
- Memcached(v1.4.13)
- GrizzlyMemcached, SpyMemcached(v2.7.3), JavaMemcached(v2.6.0) and XMemcached(v1.3.5)
Scenario
- packets
- 32, 64, 128, 256 and 512 bytes
- operations
- get, set, getMulti and setMulti (setMulti is supported only by GrizzlyMemcached)
- threads
- 1, 50, 100, 200 and 400
- Etc
- 200 keys per multi operation, 200 loops per thread
Result










You can see the benchmark code and results here
Examples of Use
Simple use case
// creates a singleton CacheManager
final GrizzlyMemcachedCacheManager manager = new GrizzlyMemcachedCacheManager.Builder().build();
// gets the cache builder
final GrizzlyMemcachedCache.Builder<String, String> builder = manager.createCacheBuilder("user");
// initializes Memcached's list
builder.servers(initialServerSet);
// creates the cache
final MemcachedCache<String, String> userCache = builder.build();
// if you need to add more Memcached
//userCache.addServer(ADDITIONAL_MEMCACHED_ADDRESS);
// cache operation
final boolean result = userCache.set("name", "foo", expirationTimeoutInSec, false);
final String value = userCache.get("name", false);
//...
// clean
manager.removeCache("user");
manager.shutdown();
ZooKeeper use case
// gets the cache manager builder
final GrizzlyMemcachedCacheManager.Builder managerBuilder = new GrizzlyMemcachedCacheManager.Builder();
// setup zookeeper server
final ZooKeeperConfig zkConfig = ZooKeeperConfig.create("cache-manager", DEFAULT_ZOOKEEPER_ADDRESS);
zkConfig.setRootPath(ROOT);
zkConfig.setConnectTimeoutInMillis(3000);
zkConfig.setSessionTimeoutInMillis(30000);
zkConfig.setCommitDelayTimeInSecs(60);
managerBuilder.zooKeeperConfig(zkConfig);
// create a cache manager
final GrizzlyMemcachedCacheManager manager = managerBuilder.build();
final GrizzlyMemcachedCache.Builder<String, String> cacheBuilder = manager.createCacheBuilder("user");
// setup memcached servers
final Set<SocketAddress> memcachedServers = new HashSet<SocketAddress>();
memcachedServers.add(MEMCACHED_ADDRESS1);
memcachedServers.add(MEMCACHED_ADDRESS2);
cacheBuilder.servers(memcachedServers);
// create a user cache
final GrizzlyMemcachedCache<String, String> cache = cacheBuilder.build();
// ZooKeeperSupportCache's basic operations
if (cache.isZooKeeperSupported()) {
    final String serverListPath = cache.getZooKeeperServerListPath();
    final String serverList = cache.getCurrentServerListFromZooKeeper();
    cache.setCurrentServerListOfZooKeeper("localhost:11211,localhost:11212");
}
// ...
// clean
manager.removeCache("user");
manager.shutdown();
You can also see the unit tests for more GrizzlyMemcached examples here
Pom.xml
<dependency>
<groupId>org.glassfish.grizzly</groupId>
<artifactId>grizzly-memcached</artifactId>
<version>1.0</version>
</dependency>
GrizzlyMemcached v1.0 was released on 2012/03/21. It lives in a separate repository from the main Grizzly project.
Here are the sources and git information.
<a href="http://java.net/projects/grizzly/sources/memcached/show">http://java.net/projects/grizzly/sources/memcached/show</a>
git://java.net/grizzly~memcached (read-only)
Just check out the sources and try it.
Any feedback, questions, thoughts and opinions are welcome!
Grizzly mailing lists: users@grizzly.java.net or dev@grizzly.java.net
Introduction to Grizzly-Thrift and sharing the benchmarking results
This page introduces the Grizzly-Thrift server/client modules and shares various benchmarking results.
Java's built-in object serialization/deserialization is expensive. To work around this, we sometimes use other frameworks for RPC such as Protobuf and Thrift, which support various programming languages and define their own data structures.
In particular, Thrift already provides various server and transport types. Basically, there are TSimpleServer, TThreadPoolServer, TNonblockingServer and THsHaServer for the server and TSocket for the client.
But Thrift's transport layer can be replaced by other NIO frameworks to improve performance. So I tried building another transport based on Grizzly and benchmarked it experimentally. The result is the Grizzly-Thrift server/client module.
The Grizzly framework is for building scalable and robust servers using NIO, and it also offers extended framework components: Web Framework (HTTP/S), Bayeux Protocol, Servlet, HttpService OSGi and Comet.
Therefore it was not difficult for me to support a Thrift server/client using Grizzly.
- Grizzly-Thrift Server/Client Modules -
Grizzly-Thrift was included in Grizzly version 2.2, which was released on 2011/12/20, but it was moved into a separate repository after Grizzly v2.2.2.
You can review and download the sources from the following locations.
<a href="http://java.net/projects/grizzly/sources/thrift/show">http://java.net/projects/grizzly/sources/thrift/show</a>
git://java.net/grizzly~thrift (read-only)
To use the Grizzly-Thrift server, add ThriftFrameFilter and ThriftServerFilter to the Grizzly transport. To use the Grizzly-Thrift client, add ThriftFrameFilter and ThriftClientFilter to the Grizzly transport.
ThriftFrameFilter encodes/decodes Thrift's TFramedTransport format, which is composed of a frame-length header (4 bytes) and a body. ThriftServerFilter/ThriftClientFilter connects the user's processor/handler with TGrizzlyServerTransport/TGrizzlyClientTransport, which extend Thrift's TTransport.
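To make the framing concrete, here is a small JDK-only sketch of a 4-byte length header followed by a body. It is not the source of ThriftFrameFilter. The class name `ThriftFrames` is hypothetical, and the big-endian header follows Thrift's framed transport convention.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of length-prefixed framing: [4-byte length][body].
final class ThriftFrames {

    // Prepends a 4-byte big-endian length header to the body.
    static byte[] encode(byte[] body) {
        return ByteBuffer.allocate(4 + body.length)
                .putInt(body.length)
                .put(body)
                .array();
    }

    // Returns the body of the first complete frame, or null if more bytes are needed.
    // When it returns null, the buffer position is left unchanged.
    static byte[] decode(ByteBuffer in) {
        if (in.remaining() < 4) {
            return null;
        }
        in.mark();
        int length = in.getInt();
        if (length < 0) {
            throw new IllegalArgumentException("negative frame length: " + length);
        }
        if (in.remaining() < length) {
            in.reset(); // wait until the rest of the frame arrives
            return null;
        }
        byte[] body = new byte[length];
        in.get(body);
        return body;
    }
}
```

A frame filter on a non-blocking transport has to handle the "not enough bytes yet" case, because a single read may deliver only part of a frame. That is why `decode` rewinds and returns null rather than blocking.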
Here are examples for Thrift server/client based on Grizzly.
--- Grizzly-Thrift Server ---
// "user-generated.thrift" is a placeholder for your own Thrift-generated service class
final FilterChainBuilder serverFilterChainBuilder = FilterChainBuilder.stateless();
final user-generated.thrift.Processor tprocessor = new user-generated.thrift.Processor(new user-generated.thrift.Handler());
serverFilterChainBuilder.add(new TransportFilter()).add(new ThriftFrameFilter()).add(new ThriftServerFilter(tprocessor));
final TCPNIOTransport grizzlyTransport = TCPNIOTransportBuilder.newInstance().build();
grizzlyTransport.setProcessor(serverFilterChainBuilder.build());
grizzlyTransport.bind(port);
grizzlyTransport.start();
--- Grizzly-Thrift Client ---
// "user-generated.thrift" is a placeholder for your own Thrift-generated service class
final FilterChainBuilder clientFilterChainBuilder = FilterChainBuilder.stateless();
clientFilterChainBuilder.add(new TransportFilter()).add(new ThriftFrameFilter()).add(new ThriftClientFilter());
final TCPNIOTransport grizzlyTransport = TCPNIOTransportBuilder.newInstance().build();
grizzlyTransport.setProcessor(clientFilterChainBuilder.build());
grizzlyTransport.start();
// connect() returns a Future; wait up to 10 seconds for the connection
final Future<Connection> future = grizzlyTransport.connect(ip, port);
final Connection connection = future.get(10, TimeUnit.SECONDS);
final TTransport tGrizzlyTransport = TGrizzlyClientTransport.create(connection);
final TProtocol tprotocol = new TBinaryProtocol(tGrizzlyTransport); // or TCompactProtocol
final user-generated.thrift.Client client = new user-generated.thrift.Client(tprotocol);
// ... user specific client call
If you are already familiar with Thrift, this is easy. If you aren't, I recommend that you review Thrift's tutorial examples first. (See JavaServer.java and JavaClient.java in Thrift's tutorial.)
The Grizzly-Thrift modules already include basic unit tests based on Thrift's tutorial. (See ThriftTutorialTest.java in Grizzly-Thrift.)
If you are using Maven in your project, here are the pom.xml dependencies.
--- pom.xml ---
...
<dependency>
<groupId>org.glassfish.grizzly</groupId>
<artifactId>grizzly-framework</artifactId>
<version>2.2.3</version>
</dependency>
<dependency>
<groupId>org.glassfish.grizzly</groupId>
<artifactId>grizzly-thrift</artifactId>
<version>1.0</version>
</dependency>
...
In addition, Grizzly provides various IO strategies such as worker-thread, same-thread and leader-follower, so I also tested the Grizzly-Thrift modules with each IO strategy. If you would like to know more about Grizzly's IO strategies, please see the Grizzly documentation.
- Benchmarking -
I also benchmarked various Thrift server/client modules: TSocketServer/Client, TThreadPoolServer, TNonblockingServer, Netty Server/Client and Grizzly Server/Client. For the test I used the business operations from Thrift's tutorial, with the logic modified slightly to control packet size.
Test Information
- Server Type/Client Type: TServer-TSocketClient vs TServer-NettyClient vs TServer-GrizzlyClient vs GrizzlyServer-TSocketClient vs GrizzlyServer-GrizzlyClient vs etc...
- Message Size: About 3M Bytes, 3K Bytes, 300 Bytes
- Thrift Protocol: Binary, Compact
- Client Connections: 20~1000
- Server and Client Test Machine Information
- CPU: Intel Xeon 3.3G, 8 Processors, 8 * 4 Cores
- Memory: 16G
- OS: Linux CentOS
- JDK: 1.6.0_29
- Network: 1G
- Versions: Thrift v0.7.0, Grizzly v2.2, Netty v4.0.0, Netty Tools v1.2.8. Most of them were the latest versions as of 2011/12.
- Scenario
- After a 1-minute warm-up, test for 5 minutes and collect the total results
Benchmarking Results
- 3M + Compact + 40 Connections
| Server Types | TSocket Client | Netty Client | Grizzly Client |
| --- | --- | --- | --- |
| TServer | 8,637 | 478 | 8,510 |
| TThreadPoolServer | 11,221 | 2,273 | 11,220 |
| TNonblockingServer | 11,223 | 1,832 | 11,221 |
| Netty | 11,220 | 2,311 | 11,220 |
| Grizzly | 11,221 | 1,765 | 11,225 |

- Unfortunately, the Netty client had a performance issue, so I excluded it from the following benchmarks.
- 3M + Binary + 40 Connections
| Server Types | TSocket Client | Grizzly Client |
| --- | --- | --- |
| TThreadPoolServer | 11,219 | 11,215 |
| TNonblockingServer | 11,221 | 11,221 |
| Netty | 11,213 | 11,221 |
| Grizzly | 11,220 | 11,222 |
In the 3M test, the choice of Compact or Binary protocol and of server/client module made no meaningful difference to performance.
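One plausible reason the 3M numbers cluster around 11,2xx is that the 1G network link, rather than the software, was the limit. The back-of-the-envelope check below rests on two assumptions of mine that the post does not state: each number is the count of calls completed in the 5-minute window, and each call moves about 3,000,000 bytes in one direction. `BandwidthCheck` is a hypothetical helper.

```java
// Hypothetical helper: estimates the average payload rate of a benchmark run.
final class BandwidthCheck {

    // Average payload rate in megabits per second (1 Mbit = 1,000,000 bits).
    static double megabitsPerSecond(long messages, long bytesPerMessage, long seconds) {
        return messages * bytesPerMessage * 8.0 / seconds / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Assumed inputs: 11,221 calls, ~3,000,000 bytes each, 300-second test window.
        double rate = megabitsPerSecond(11_221, 3_000_000, 300);
        System.out.printf("%.0f Mbit/s%n", rate);
    }
}
```

Under those assumptions the rate comes out at roughly 898 Mbit/s, which is close to what a 1 Gbit link can carry. That would explain why the faster combinations all level off at almost the same value.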
- 3K + Compact + 40 Connections
| Server Types | Grizzly Client |
| --- | --- |
| TThreadPoolServer | 8,283,705 |
| TNonblockingServer | 5,801,319 |
| Netty | 9,058,550 |
| Grizzly | 8,964,358 |
| Grizzly(SameIO) | 9,098,590 |

- TNonblockingServer had a performance issue. The Netty and Grizzly results were better than those of the Thrift server modules.
- 3K + Binary + 40 Connections
| Server Types | TSocket Client | Grizzly Client |
| --- | --- | --- |
| TThreadPoolServer | 7,619,693 | 8,163,692 |
| TNonblockingServer | 5,444,630 | 6,032,290 |
| Netty | 8,254,168 | 8,930,896 |
| Grizzly | 8,204,097 | 8,833,978 |
| Grizzly(SameIO) | 8,257,918 | 8,960,497 |

- The Grizzly client module performed better than the TSocket client, so I used only the Grizzly client for the following benchmarks.

In the 3K test, the Compact protocol was better than the Binary protocol. The Netty and Grizzly results were better than those of the Thrift server modules, so I used only the Netty and Grizzly servers for the following benchmarks.
- 300Bytes + Compact + 20 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 10,269,876 |
| Grizzly(SameIO) | 10,349,440 |
| Grizzly(LeaderF) | 9,654,216 |

- 300Bytes + Compact + 40 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 14,569,820 |
| Grizzly(SameIO) | 14,770,452 |
| Grizzly(LeaderF) | 13,674,641 |

- 300Bytes + Compact + 60 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 15,783,774 |
| Grizzly(SameIO) | 15,962,425 |
| Grizzly(LeaderF) | 15,227,426 |

- 300Bytes + Compact + 80 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 16,964,578 |
| Grizzly(SameIO) | 16,712,315 |
| Grizzly(Worker) | 15,890,537 |
| Grizzly(LeaderF) | 16,252,280 |

- 300Bytes + Compact + 100 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 15,879,803 |
| Grizzly(SameIO) | 15,781,153 |
| Grizzly(Worker) | 16,136,977 |
| Grizzly(LeaderF) | 16,437,650 |

- 300Bytes + Compact + 120 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 15,904,968 |
| Grizzly(SameIO) | 15,985,106 |
| Grizzly(Worker) | 16,097,609 |
| Grizzly(LeaderF) | 16,164,636 |

- 300Bytes + Compact + 150 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 15,952,442 |
| Grizzly(SameIO) | 16,109,154 |
| Grizzly(Worker) | 16,261,584 |
| Grizzly(LeaderF) | 15,923,040 |

- 300Bytes + Compact + 500 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 12,463,442 |
| Grizzly(SameIO) | 12,499,963 |
| Grizzly(Worker) | 12,461,131 |
| Grizzly(LeaderF) | 12,532,517 |

- 300Bytes + Compact + 1000 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 11,867,630 |
| Grizzly(SameIO) | 11,903,400 |
| Grizzly(Worker) | 11,906,507 |
| Grizzly(LeaderF) | 11,812,262 |
With many connections (more than 120), most servers did not receive a proper request load from the client, because the client machine in this environment was using too many resources, such as high CPU usage. So I think that more client machines are needed to produce meaningful data for higher connection counts. At 100 connections, the throughput of Netty and of Grizzly's same-thread IO strategy decreased, but the throughput of Grizzly's worker-thread and leader-follower IO strategies increased.
In my test cases and environment, the worker-thread and leader-follower IO strategies were more effective than the same-thread IO strategy when the server had to handle more than 100 connections.
- Conclusion -
- Results of 300Bytes + Compact + 40 Connections

| Server Types | TSocket Client | Netty Client | Grizzly Client |
| --- | --- | --- | --- |
| TServer | 741,417 | | 604,558 |
| TThreadPoolServer | 14,731,560 | | 12,747,230 |
| TNonblockingServer | 6,060,111 | | 6,723,402 |
| Netty | 14,749,519 | | 14,569,820 |
| Grizzly(SameIO) | 14,931,745 | 9,066,525 | 14,770,452 |

- Results of 3KBytes + Compact + 40 Connections

| Server Types | TSocket Client | Netty Client | Grizzly Client |
| --- | --- | --- | --- |
| TServer | 631,300 | | 526,341 |
| TThreadPoolServer | 7,708,088 | | 8,283,705 |
| TNonblockingServer | 5,264,995 | | 5,801,319 |
| Netty | 8,372,804 | | 9,058,550 |
| Grizzly(SameIO) | 8,381,352 | 3,718,431 | 9,098,590 |

- 300Bytes + Compact + 100 Connections

| Server Types | Grizzly Client |
| --- | --- |
| Netty | 15,879,803 |
| Grizzly(SameIO) | 15,781,153 |
| Grizzly(Worker) | 16,136,977 |
| Grizzly(LeaderF) | 16,437,650 |
- Server Module
- Grizzly's same-thread IO strategy was best with few connections. Grizzly's leader-follower IO strategy was best with many connections.
- CPU usage: Netty == GrizzlySameIO < GrizzlyLeaderFollowerIO < GrizzlyWorkerIO
- Client Module
- With small packets, the TSocket client was best. With large packets, the Grizzly client was best.
- Thrift Protocol
- In this scenario, Compact protocol was best.
Finally, I decided that our company, Kakao, would use the worker-thread IO strategy for the Grizzly-Thrift server in production because it was very stable. For the client, I decided to use the same-thread IO strategy because of its efficiency.
If you are already using Thrift or plan to use it for RPC, just try applying Grizzly-Thrift!
(These results are for reference only; you may get different results depending on your environment and benchmarking logic.)



