http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3
In Part 1 and Part 2 of this series we built a comet application using mochiweb, and learned how to route messages to connected users. We managed to squeeze application memory down to 8KB per connection. We did ye olde c10k test, and observed what happened with 10,000 connected users. We made graphs. It was fun, but now it’s time to make good on the claims made in the title, and turn it up to 1 million connections.
This post covers the following:
- Add a pubsub-like subscription database using Mnesia
- Generate a realistic friends dataset for a million users
- Tune mnesia and bulk load in our friends data
- Opening a million connections from one machine
- Benchmark with 1 Million connected users
- Libevent + C for connection handling
- Final thoughts
One of the challenging parts of this test was actually being able to open 1M connections from a single test machine. Writing a server to accept 1M connections is easier than actually creating 1M connections to test it with, so a fair amount of this article is about the techniques used to open 1M connections from a single machine.
Getting our pubsub on
In Part 2 we used the router to send messages to specific users. This is fine for a chat/IM system, but there are sexier things we could do instead. Before we launch into a large-scale test, let’s add one more module - a subscription database. We want the application to store who your friends are, so it can push you all events generated by people on your friends list.
My intention is to use this for Last.fm so I can get a realtime feed of songs my friends are currently listening to. It could equally apply to other events generated on social networks. Flickr photo uploads, Facebook newsfeed items, Twitter messages etc. FriendFeed even have a realtime API in beta, so this kind of thing is definitely topical. (Although I’ve not heard of anyone except Facebook using Erlang for this kind of thing).
Implementing the subscription-manager
We’re implementing a general subscription manager, but we’ll be subscribing people to everyone on their friends list automatically - so you could also think of this as a friends database for now.
The subsmanager API:
- add_subscriptions([{Subscriber, Subscribee},...])
- remove_subscriptions([{Subscriber, Subscribee},...])
- get_subscribers(User)
subsmanager.erl
-module(subsmanager).
-behaviour(gen_server).
-include("/usr/local/lib/erlang/lib/stdlib-1.15.4/include/qlc.hrl").
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).
-export([add_subscriptions/1,
         remove_subscriptions/1,
         get_subscribers/1,
         first_run/0,
         stop/0,
         start_link/0]).

-record(subscription, {subscriber, subscribee}).
-record(state, {}). % state is all in mnesia

-define(SERVER, global:whereis_name(?MODULE)).

start_link() ->
    gen_server:start_link({global, ?MODULE}, ?MODULE, [], []).

stop() ->
    gen_server:call(?SERVER, {stop}).

add_subscriptions(SubsList) ->
    gen_server:call(?SERVER, {add_subscriptions, SubsList}, infinity).

remove_subscriptions(SubsList) ->
    gen_server:call(?SERVER, {remove_subscriptions, SubsList}, infinity).

get_subscribers(User) ->
    gen_server:call(?SERVER, {get_subscribers, User}).

%%

init([]) ->
    ok = mnesia:start(),
    io:format("Waiting on mnesia tables..\n",[]),
    mnesia:wait_for_tables([subscription], 30000),
    Info = mnesia:table_info(subscription, all),
    io:format("OK. Subscription table info: \n~w\n\n",[Info]),
    {ok, #state{}}.

handle_call({stop}, _From, State) ->
    {stop, stop, State};

handle_call({add_subscriptions, SubsList}, _From, State) ->
    % Transactionally is slower:
    % F = fun() ->
    %         [ ok = mnesia:write(S) || S <- SubsList ]
    %     end,
    % mnesia:transaction(F),
    [ mnesia:dirty_write(S) || S <- SubsList ],
    {reply, ok, State};

handle_call({remove_subscriptions, SubsList}, _From, State) ->
    F = fun() ->
            [ ok = mnesia:delete_object(S) || S <- SubsList ]
        end,
    mnesia:transaction(F),
    {reply, ok, State};

handle_call({get_subscribers, User}, From, State) ->
    F = fun() ->
            Subs = mnesia:dirty_match_object(#subscription{subscriber='_', subscribee=User}),
            Users = [ Dude || #subscription{subscriber=Dude, subscribee=_} <- Subs ],
            gen_server:reply(From, Users)
        end,
    spawn(F),
    {noreply, State}.

handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Msg, State) -> {noreply, State}.

terminate(_Reason, _State) ->
    mnesia:stop(),
    ok.

code_change(_OldVersion, State, _Extra) ->
    io:format("Reloading code for ?MODULE\n",[]),
    {ok, State}.

%%

first_run() ->
    mnesia:create_schema([node()]),
    ok = mnesia:start(),
    Ret = mnesia:create_table(subscription,
        [
            {disc_copies, [node()]},
            {attributes, record_info(fields, subscription)},
            {index, [subscribee]}, % index subscribee too
            {type, bag}
        ]),
    Ret.

Noteworthy points:
- I've included qlc.hrl, which is needed for mnesia queries that use list comprehensions, via an absolute path. That can't be best practice, but it wasn't being found otherwise.
- get_subscribers spawns another process and delegates the job of replying to that process, using gen_server:reply. This means the gen_server loop won't block on that call if we throw lots of lookups at it and mnesia slows down. A minimal sketch of this delayed-reply pattern follows this list.
- rr("subsmanager.erl"). in the example below allows you to use record definitions in the erl shell. Putting your record definitions into a records.hrl file and including that in your modules is considered better style. I inlined it for brevity.
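To make the delayed-reply pattern explicit, here it is in isolation - a sketch that works inside any gen_server (the slow_lookup tag and expensive_lookup function are illustrative names, not part of subsmanager):

handle_call({slow_lookup, Key}, From, State) ->
    F = fun() ->
            Result = expensive_lookup(Key), % e.g. a dirty mnesia read
            gen_server:reply(From, Result)  % the worker replies on our behalf
        end,
    spawn(F),
    % returning noreply frees the gen_server loop to service the next
    % call while the worker is still busy
    {noreply, State}.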
Now to test it. first_run() creates the mnesia schema, so it’s important to run that first. Another potential gotcha with mnesia is that (by default) the database can only be accessed by the node that created it, so give the erl shell a name, and stick with it.
$ mkdir /var/mnesia_data
$ erl -boot start_sasl -mnesia dir '"/var/mnesia_data"' -sname subsman
(subsman@localhost)1> c(subsmanager).
{ok,subsmanager}
(subsman@localhost)2> subsmanager:first_run().
...
{atomic,ok}
(subsman@localhost)3> subsmanager:start_link().
Waiting on mnesia tables..
OK. Subscription table info:
...snipped...
{ok,<0.105.0>}
(subsman@localhost)4> rr("subsmanager.erl").
[state,subscription]
(subsman@localhost)5> subsmanager:add_subscriptions([ #subscription{subscriber=alice, subscribee=rj} ]).
ok
(subsman@localhost)6> subsmanager:add_subscriptions([ #subscription{subscriber=bob, subscribee=rj} ]).
ok
(subsman@localhost)7> subsmanager:get_subscribers(rj).
[bob,alice]
(subsman@localhost)8> subsmanager:remove_subscriptions([ #subscription{subscriber=bob, subscribee=rj} ]).
ok
(subsman@localhost)9> subsmanager:get_subscribers(rj).
[alice]
(subsman@localhost)10> subsmanager:get_subscribers(charlie).
[]

We’ll use integer Ids to represent users for the benchmark - but for this test I used atoms (rj, alice, bob) and assumed that alice and bob are both on rj’s friends list. It’s nice that mnesia (and ets/dets) doesn’t care what values you use - any Erlang term is valid. This means it’s a simple upgrade to support multiple types of resource. You could start using {user, 123} or {photo, 789} to represent different things people might subscribe to, without changing anything in the subsmanager module.
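For illustration only (these calls aren't from the article, but follow directly from the API above), mixed resource types could coexist in the same table:

% hypothetical example: tagged tuples as subscriber/subscribee keys
subsmanager:add_subscriptions([
    #subscription{subscriber={user,123}, subscribee={photo,789}},
    #subscription{subscriber={user,456}, subscribee={photo,789}}
]).
subsmanager:get_subscribers({photo,789}).
% -> a list containing {user,123} and {user,456} (order not guaranteed)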
Modifying the router to use subscriptions
Instead of addressing messages to specific users, ie router:send(123, "Hello user 123"), we’ll mark messages with a subject - that is, the person who generated the message (who played the song, who uploaded the photo etc) - and have the router deliver the message to every user who has subscribed to the subject user. In other words, the API will work like this: router:send(123, "Hello everyone subscribed to user 123")
Updated router.erl:
-module(router).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).
-export([send/2, login/2, logout/1]).

-define(SERVER, global:whereis_name(?MODULE)).

% will hold bidirectional mapping between id <--> pid
-record(state, {pid2id, id2pid}).

start_link() ->
    gen_server:start_link({global, ?MODULE}, ?MODULE, [], []).

% sends Msg to anyone subscribed to Id
send(Id, Msg) ->
    gen_server:call(?SERVER, {send, Id, Msg}).

login(Id, Pid) when is_pid(Pid) ->
    gen_server:call(?SERVER, {login, Id, Pid}).

logout(Pid) when is_pid(Pid) ->
    gen_server:call(?SERVER, {logout, Pid}).

%%

init([]) ->
    % set this so we can catch death of logged in pids:
    process_flag(trap_exit, true),
    % use ets for routing tables
    {ok, #state{
            pid2id = ets:new(?MODULE, [bag]),
            id2pid = ets:new(?MODULE, [bag])
         }}.

handle_call({login, Id, Pid}, _From, State) when is_pid(Pid) ->
    ets:insert(State#state.pid2id, {Pid, Id}),
    ets:insert(State#state.id2pid, {Id, Pid}),
    link(Pid), % tell us if they exit, so we can log them out
    %io:format("~w logged in as ~w\n",[Pid, Id]),
    {reply, ok, State};

handle_call({logout, Pid}, _From, State) when is_pid(Pid) ->
    unlink(Pid),
    PidRows = ets:lookup(State#state.pid2id, Pid),
    case PidRows of
        [] ->
            ok;
        _ ->
            IdRows = [ {I,P} || {P,I} <- PidRows ], % invert tuples
            ets:delete(State#state.pid2id, Pid),    % delete all pid->id entries
            [ ets:delete_object(State#state.id2pid, Obj) || Obj <- IdRows ] % and all id->pid
    end,
    %io:format("pid ~w logged out\n",[Pid]),
    {reply, ok, State};

handle_call({send, Id, Msg}, From, State) ->
    F = fun() ->
            % get users who are subscribed to Id:
            Users = subsmanager:get_subscribers(Id),
            io:format("Subscribers of ~w = ~w\n",[Id, Users]),
            % get pids of anyone logged in from Users list:
            Pids0 = lists:map(
                fun(U) ->
                    [ P || { _I, P } <- ets:lookup(State#state.id2pid, U) ]
                end,
                [ Id | Users ] % we are always subscribed to ourselves
            ),
            Pids = lists:flatten(Pids0),
            io:format("Pids: ~w\n", [Pids]),
            % send Msg to them all
            M = {router_msg, Msg},
            [ Pid ! M || Pid <- Pids ],
            % respond with how many users saw the message
            gen_server:reply(From, {ok, length(Pids)})
        end,
    spawn(F),
    {noreply, State}.

% handle death and cleanup of logged in processes
handle_info(Info, State) ->
    case Info of
        {'EXIT', Pid, _Why} ->
            handle_call({logout, Pid}, blah, State);
        Wtf ->
            io:format("Caught unhandled message: ~w\n", [Wtf])
    end,
    {noreply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

And here’s a quick test that doesn’t require mochiweb - I’ve used atoms instead of user ids, and omitted some output for clarity:
(subsman@localhost)1> c(subsmanager), c(router), rr("subsmanager.erl").
(subsman@localhost)2> subsmanager:start_link().
(subsman@localhost)3> router:start_link().
(subsman@localhost)4> Subs = [#subscription{subscriber=alice, subscribee=rj},
                              #subscription{subscriber=bob, subscribee=rj}].
[#subscription{subscriber = alice,subscribee = rj},
 #subscription{subscriber = bob,subscribee = rj}]
(subsman@localhost)5> subsmanager:add_subscriptions(Subs).
ok
(subsman@localhost)6> router:send(rj, "RJ did something").
Subscribers of rj = [bob,alice]
Pids: []
{ok,0}
(subsman@localhost)7> router:login(alice, self()).
ok
(subsman@localhost)8> router:send(rj, "RJ did something").
Subscribers of rj = [bob,alice]
Pids: [<0.46.0>]
{ok,1}
(subsman@localhost)9> receive {router_msg, M} -> io:format("~s\n",[M]) end.
RJ did something
ok

This shows how alice can receive a message when the subject is someone she is subscribed to (rj), even though the message wasn’t sent directly to her. The output shows that the router identified the possible targets as [bob,alice], but only delivered the message to one of them, alice, because bob was not logged in.
Generating a typical social-network friends dataset
We could generate lots of friend relationships at random, but that’s not particularly realistic. Social networks tend to exhibit a power-law distribution: a few super-popular users (some Twitter users have over 100,000 followers) and plenty of people with just a handful of friends. The Last.fm friends data is typical - it fits a Barabási–Albert graph model, so that’s what I’ll use.
To generate the dataset I’m using the python module from the excellent igraph library:
fakefriends.py:
import igraph
g = igraph.Graph.Barabasi(1000000, 15, directed=False)
print "Edges: " + str(g.ecount()) + " Vertices: " + str(g.vcount())
g.write_edgelist("fakefriends.txt")

This will write a file with two user ids per line, space separated. These are the friend relationships we’ll load into our subsmanager. User ids range from 1 to a million.
Bulk loading friends data into mnesia
This small module will read the fakefriends.txt file and create a list of subscription records.
readfriends.erl - to read the fakefriends.txt and create subscription records:
-module(readfriends).
-export([load/1]).

-record(subscription, {subscriber, subscribee}).

load(Filename) ->
    for_each_line_in_file(Filename,
        fun(Line, Acc) ->
            [As, Bs] = string:tokens(string:strip(Line, right, $\n), " "),
            {A, _} = string:to_integer(As),
            {B, _} = string:to_integer(Bs),
            [ #subscription{subscriber=A, subscribee=B} | Acc ]
        end,
        [read], []).

% via: http://www.trapexit.org/Reading_Lines_from_a_File
for_each_line_in_file(Name, Proc, Mode, Accum0) ->
    {ok, Device} = file:open(Name, Mode),
    for_each_line(Device, Proc, Accum0).

for_each_line(Device, Proc, Accum) ->
    case io:get_line(Device, "") of
        eof ->
            file:close(Device),
            Accum;
        Line ->
            NewAccum = Proc(Line, Accum),
            for_each_line(Device, Proc, NewAccum)
    end.

Now in the subsmanager shell, you can read from the text file and add the subscriptions:
$ erl -name router@minifeeds4.gs2 +K true +A 128 -setcookie secretcookie \
      -mnesia dump_log_write_threshold 50000 -mnesia dc_dump_limit 40
erl> c(readfriends), c(subsmanager).
erl> subsmanager:first_run().
erl> subsmanager:start_link().
erl> subsmanager:add_subscriptions( readfriends:load("fakefriends.txt") ).

Note the additional mnesia parameters - these are to avoid the ** WARNING ** Mnesia is overloaded messages you would (probably) otherwise see. Refer to my previous post, On bulk loading data into Mnesia, for alternative ways to load in lots of data. The best solution seems to be (as pointed out in the comments, thanks Jacob!) to set those options. The Mnesia reference manual contains many other settings under Configuration Parameters, and is worth a look.
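If you’d rather not pass the flags on the command line, the same parameters can be set programmatically before mnesia starts - a minimal sketch with the same values as above:

% set mnesia configuration parameters before mnesia:start/0 is called;
% these are the same knobs as the -mnesia command-line flags above
ok = application:set_env(mnesia, dump_log_write_threshold, 50000),
ok = application:set_env(mnesia, dc_dump_limit, 40),
ok = mnesia:start().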
Turning it up to 1 Million
Creating a million tcp connections from one host is non-trivial. I’ve a feeling that people who do this regularly have small clusters dedicated to simulating lots of client connections, probably running a real tool like Tsung. Even with the tuning from Part 1 to increase kernel tcp memory, increase the file descriptor ulimits and set the local port range to the maximum, we will still hit a hard limit on ephemeral ports.
When making a tcp connection, the client end is allocated (or you can specify) a port from the range in /proc/sys/net/ipv4/ip_local_port_range. Whether you specify it manually or use an ephemeral port, you’re still going to run out.
In Part 1, we set the range to “1024 65535” - meaning there are 65535-1024 = 64511 unprivileged ports available. Some of them will be used by other processes, but we’ll never get over 64511 client connections, because we’ll run out of ports.
The local port range is assigned per-IP, so if we make our outgoing connections specifically from a range of different local IP addresses, we’ll be able to open more than 64511 outgoing connections in total.
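Jumping ahead slightly: raw sockets can do this. gen_tcp’s {ip, Address} connect option binds the local end of an outgoing socket, and that is the mechanism the test client further down relies on. A minimal sketch (the addresses are illustrative):

% bind the source address of an outgoing connection with the {ip, ...} option
LocalIP = {10,0,0,5}, % one of the virtual interface IPs we are about to create
{ok, Sock} = gen_tcp:connect("10.1.2.3", 8000,
                             [binary, {ip, LocalIP}, {active, false}]).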
So let’s bring up 17 new IP addresses, with the intention of making 62,000 connections from each - giving us a total of 1,054,000 connections. Safely over the 2^20 mark:
$ for i in `seq 1 17`; do sudo ifconfig eth0:$i 10.0.0.$i up ; done

If you run ifconfig now you should see your virtual interfaces: eth0:1, eth0:2 … eth0:17, each with a different IP address. Obviously you should choose a sensible part of whatever address space you are using.
All that remains now is to modify the floodtest tool from Part 1 to specify the local IP it should connect from… Unfortunately the erlang http client doesn’t let you specify the source IP. Neither does ibrowse, the alternative http client library. Damn.
Crazy Idea.. At this point I considered another option: bringing up 17 pairs of IPs - one on the server and one on the client - each pair in their own isolated /30 subnet. I think that if I then made the client connect to any given server IP, it would force the local address to be the other half of the pair on that subnet, because only one of the local IPs would actually be able to reach the server IP. In theory, this would mean declaring the local source IP on the client machine would not be necessary (although the range of server IPs would need to be specified). I don’t know if this would really work - it sounded plausible at the time. In the end I decided it was too perverted and didn’t try it.
I also poked around in OTP’s http_transport code and considered adding support for specifying the local IP. It’s not really a feature you usually need in an HTTP client though, and it would certainly have been more work.
Note: gen_tcp lets you specify the source address, so I ended up writing a rather crude client using gen_tcp specifically for this test:
floodtest2.erl
-module(floodtest2).
-compile(export_all).

-define(SERVERADDR, "10.1.2.3"). % where mochiweb is running
-define(SERVERPORT, 8000).

% Generate the config in bash like so (choose some available address space):
% EACH=62000; for i in `seq 1 17`; do echo "{ {10,0,0,$i}, $((($i-1)*$EACH+1)), $(($i*$EACH)) }, "; done

run(Interval) ->
    Config = [
        { {10,0,0,1},  1,      62000},
        { {10,0,0,2},  62001,  124000},
        { {10,0,0,3},  124001, 186000},
        { {10,0,0,4},  186001, 248000},
        { {10,0,0,5},  248001, 310000},
        { {10,0,0,6},  310001, 372000},
        { {10,0,0,7},  372001, 434000},
        { {10,0,0,8},  434001, 496000},
        { {10,0,0,9},  496001, 558000},
        { {10,0,0,10}, 558001, 620000},
        { {10,0,0,11}, 620001, 682000},
        { {10,0,0,12}, 682001, 744000},
        { {10,0,0,13}, 744001, 806000},
        { {10,0,0,14}, 806001, 868000},
        { {10,0,0,15}, 868001, 930000},
        { {10,0,0,16}, 930001, 992000},
        { {10,0,0,17}, 992001, 1054000}
    ],
    start(Config, Interval).

start(Config, Interval) ->
    Monitor = monitor(),
    % 'receive after' needs an integer timeout, so round the per-worker delay:
    AdjustedInterval = round(Interval / length(Config)),
    % spawn one worker per local IP; wrap in closures so spawn/1 gets
    % a zero-arity fun
    [ spawn(fun() -> start(Lower, Upper, Ip, AdjustedInterval, Monitor) end)
      || {Ip, Lower, Upper} <- Config ],
    ok.

start(LowerID, UpperID, _, _, _) when LowerID == UpperID -> done;
start(LowerID, UpperID, LocalIP, Interval, Monitor) ->
    Path = "/test/" ++ integer_to_list(LowerID),
    spawn(fun() -> connect(?SERVERADDR, ?SERVERPORT, LocalIP, Path, Monitor) end),
    receive after Interval ->
        start(LowerID + 1, UpperID, LocalIP, Interval, Monitor)
    end.

connect(ServerAddr, ServerPort, ClientIP, Path, Monitor) ->
    Opts = [binary, {packet, 0}, {ip, ClientIP}, {reuseaddr, true}, {active, false}],
    {ok, Sock} = gen_tcp:connect(ServerAddr, ServerPort, Opts),
    Monitor ! open,
    ReqL = io_lib:format("GET ~s\r\nHost: ~s\r\n\r\n", [Path, ServerAddr]),
    Req = list_to_binary(ReqL),
    ok = gen_tcp:send(Sock, [Req]),
    do_recv(Sock, Monitor),
    (catch gen_tcp:close(Sock)),
    ok.

%% The source listing is truncated at this point; the functions below are a
%% minimal reconstruction (a sketch, not necessarily the original code) so
%% that the module compiles and reports progress.
do_recv(Sock, Monitor) ->
    case gen_tcp:recv(Sock, 0) of
        {ok, B} ->
            Monitor ! {bytes, size(B)},
            do_recv(Sock, Monitor);
        {error, closed} ->
            Monitor ! closed,
            closed
    end.

monitor() ->
    Pid = spawn(fun() -> monitor_loop({0, 0, 0}) end),
    timer:send_interval(10000, Pid, report), % report totals every 10 seconds
    Pid.

monitor_loop({Open, Closed, Bytes} = S) ->
    receive
        open       -> monitor_loop({Open + 1, Closed, Bytes});
        closed     -> monitor_loop({Open, Closed + 1, Bytes});
        {bytes, B} -> monitor_loop({Open, Closed, Bytes + B});
        report ->
            io:format("{Open, Closed, Bytes} = ~w\n", [S]),
            monitor_loop(S)
    end.
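Assuming the virtual interfaces from the ifconfig step are up, kicking the test off from a shell on the client machine would look something like this (the 100ms interval is an illustrative value, not taken from the article):

erl> c(floodtest2).
erl> floodtest2:run(100).

Each of the 17 workers then opens connections from its own source IP, pausing the adjusted interval (Interval divided by the number of IPs) between attempts, while the monitor process prints running totals of opened connections, closed connections and bytes received every ten seconds.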