Thu, 24 Jun 2010
Yesterday I wrote some code for parsing an XML file with Erlang's built-in SAX parser. It turned out that the events generated by the SAX parser, combined with Erlang's pattern matching, yielded a really elegant solution.
I'm building a financial information website that is focussed on commodities, so I want to display some graphs with historical commodity prices. There is a great source of financial information, Xignite, which is a paid service that offers good APIs (including REST) to obtain market data. The data is delivered in XML format.
Here's an extract of the XML for the Gold future price:
<?xml version="1.0" encoding="utf-8"?>
<FutureQuotes … >
  …
  <Quotes>
    <FutureQuote>
      …
      <Date>6/22/2010</Date>
      <Open>1234.03</Open>
      <High>1242.6</High>
      <Low>1231.3</Low>
      <Last>1239.9</Last>
      <Settle>1239.9</Settle>
      <Volume>0</Volume>
      <OpenInterest>0</OpenInterest>
      <PreviousClose>1233.59</PreviousClose>
      <Change>6.31</Change>
      <PercentChange>0.512</PercentChange>
      <Currency>USD</Currency>
    </FutureQuote>
    <FutureQuote>
      …
    </FutureQuote>
  </Quotes>
</FutureQuotes>
The elements that I'm interested in at this point are "Date", "Open", "High", "Low" and "Last". Let's see what happens when we throw the built-in SAX parser at it:
xmerl_sax_parser:file(
  "data/test.xml",
  [{event_fun, fun(Event, _Location, _State) ->
                   io:format("~p~n", [Event])
               end}]).
The code above simply means "Parse this XML file with the SAX parser and call this anonymous function every time an event is generated". You will notice below how the SAX parser generates events for each start and end element, characters and ignorable white space:
…
{startElement,"http://www.xignite.com/services/","Date",{[],"Date"},[]}
{characters,"12/6/2006"}
{endElement,"http://www.xignite.com/services/","Date",{[],"Date"}}
{ignorableWhitespace,"\n "}
{startElement,"http://www.xignite.com/services/","Open",{[],"Open"},[]}
{characters,"62.2"}
{endElement,"http://www.xignite.com/services/","Open",{[],"Open"}}
{ignorableWhitespace,"\n "}
{startElement,"http://www.xignite.com/services/","High",{[],"High"},[]}
{characters,"62.2"}
{endElement,"http://www.xignite.com/services/","High",{[],"High"}}
{ignorableWhitespace,"\n "}
{startElement,"http://www.xignite.com/services/","Low",{[],"Low"},[]}
{characters,"62.2"}
{endElement,"http://www.xignite.com/services/","Low",{[],"Low"}}
{ignorableWhitespace,"\n "}
…
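To make the shape of these event tuples explicit, here is a minimal sketch that pattern matches on the three kinds of events used in the rest of this post. The module name sax_shape and the function describe/1 are hypothetical names chosen for illustration; the tuple layouts are the ones visible in the output above.

```erlang
-module(sax_shape).
-export([describe/1]).

%% Hypothetical helper: reduce a SAX event to a short tagged tuple.
%% A start element is a 5-tuple; the third field is the local name.
describe({startElement, _Uri, LocalName, _QName, _Attributes}) ->
    {start, LocalName};
%% An end element is a 4-tuple with the local name in the same place.
describe({endElement, _Uri, LocalName, _QName}) ->
    {stop, LocalName};
%% Text content arrives as a string in a 2-tuple.
describe({characters, Text}) ->
    {text, Text};
%% Everything else (ignorable white space and so on) is not inspected.
describe(_Other) ->
    other.
```

Because the element name is an ordinary string inside the tuple, a function clause can match a specific element simply by writing the name in the pattern.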
Now I'll show you how I extracted the "Date" value for each quote into a list of key-value pairs. I've added the initial event state to the parser, which is a tuple with two elements. The first element will be the list of quotes (as key-value tuples), and the second element will contain the last set of parsed characters (initially an empty string/list):
run(FileName) ->
    {ok, {Quotes, _}, _} = xmerl_sax_parser:file(
        FileName,
        [{event_fun, fun event/3},
         {event_state, {[], ""}}]),
    Quotes.
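To see how the state is threaded from one event to the next, here is a minimal sketch under a simpler assumption: the state is just an integer that counts how often a given element starts. The module sax_count and the function count/2 are hypothetical names, and the sketch parses an in-memory binary with xmerl_sax_parser:stream/2 instead of a file so that it can be tried without test data.

```erlang
-module(sax_count).
-export([count/2]).

%% Hypothetical helper: count how many times the element Name starts
%% in the XML document held in the binary Xml.
count(Xml, Name) when is_binary(Xml) ->
    Count = fun({startElement, _Uri, Local, _QName, _Attrs}, _Location, N)
                  when Local =:= Name ->
                    %% The return value becomes the state for the next event.
                    N + 1;
               (_Event, _Location, N) ->
                    %% All other events pass the state on unchanged.
                    N
            end,
    {ok, Total, _Rest} =
        xmerl_sax_parser:stream(Xml, [{event_fun, Count},
                                      {event_state, 0}]),
    Total.
```

For example, sax_count:count(<<"<Quotes><FutureQuote/><FutureQuote/></Quotes>">>, "FutureQuote") should give 2. The event function in this post works the same way, only with a richer state.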
For the start elements, I'm interested in the "FutureQuote" element. When this element is encountered, I add a new, empty set of key-value pairs to the existing list of quotes. The second element of the state remains an empty string (the return value from the event function is the new state):
%% Start "FutureQuote" creates a new, empty key-value list
%% for the quote
event(_Event = {startElement, _, "FutureQuote", _, _},
      _Location,
      _State = {Quotes, _}) ->
    {[[] | Quotes], ""};
When a "characters" event is encountered, we store its text in the second element of the parser state:
%% Characters are stored in the parser state
event(_Event = {characters, Chars},
      _Location,
      _State = {Quotes, _}) ->
    {Quotes, Chars};
Now, when we encounter the end element of the "Date" tag, the parser state will have the last encountered characters as the second element. So we add that as a key-value pair to the quote at the head of the list:
%% For the "Date" event, use the last set of characters encountered
%% for the "Date" property
event(_Event = {endElement, _, "Date", _},
      _Location,
      _State = {[Quote | Rest], Chars}) ->
    Updated = [{"Date", Chars} | Quote],
    {[Updated | Rest], undefined};
Finally, for all other events, we just pass on the current state:
%% Catch-all. Pass state on as-is
event(_Event, _Location, State) ->
    State.
And now when I run this against the test XML file containing two quotes, the result is a list of key-value pairs with the "Date" values:
1> populate1:run("data/test.xml").
[[{"Date","12/6/2006"}],[{"Date","12/7/2006"}]]
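One caveat: the event function keeps only the most recent "characters" event. SAX-style parsers are in general allowed to report a single run of text as several "characters" events, and when xmerl_sax_parser actually does so is not something this post has checked. Assuming it can happen, a minimal sketch of a variant that accumulates text until the end tag could look like this. The module name populate_acc is hypothetical, and the extra clause that clears the buffer on every start element is an addition that the original code does not have.

```erlang
-module(populate_acc).
-export([event/3]).

%% State is {Quotes, Buffer}; Buffer holds the text seen since the
%% most recent start element.
event({startElement, _, "FutureQuote", _, _}, _Location, {Quotes, _}) ->
    {[[] | Quotes], ""};
%% Assumption: clearing the buffer on any other start tag keeps text
%% from a previous element out of the next value.
event({startElement, _, _, _, _}, _Location, {Quotes, _}) ->
    {Quotes, ""};
%% Append instead of overwrite, so split text is joined again.
event({characters, Chars}, _Location, {Quotes, Buffer}) ->
    {Quotes, Buffer ++ Chars};
event({endElement, _, "Date", _}, _Location, {[Quote | Rest], Buffer}) ->
    {[[{"Date", Buffer} | Quote] | Rest], ""};
%% Catch-all. Pass state on as-is.
event(_Event, _Location, State) ->
    State.
```

For input where each value arrives in one piece, this variant collects the same "Date" values as the original clauses.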
Almost there! The last step is to generalise the elements we are interested in with a macro. I'm using a macro because it allows me to use the pattern matching directly in the function definitions, as opposed to examining a list of fields of interest on each event. Here we go with the complete module:
-module(populate).
-export([run/1]).

run(FileName) ->
    {ok, {Quotes, _}, _} = xmerl_sax_parser:file(
        FileName,
        [{event_fun, fun event/3},
         {event_state, {[], ""}}]),
    Quotes.

%% For the end field event, use the last set of characters
%% encountered as the value for that field
-define(QUOTE_VALUE(Title),
        event(_Event = {endElement, _, Title, _},
              _Location,
              _State = {[Quote | Rest], Chars}) ->
            Updated = [{Title, Chars} | Quote],
            {[Updated | Rest], undefined}).

%% Start "FutureQuote" creates a new, empty key-value list
%% for the quote
event(_Event = {startElement, _, "FutureQuote", _, _},
      _Location,
      _State = {Quotes, _}) ->
    {[[] | Quotes], ""};
%% Characters are stored in the parser state
event(_Event = {characters, Chars},
      _Location,
      _State = {Quotes, _}) ->
    {Quotes, Chars};
?QUOTE_VALUE("Date");
?QUOTE_VALUE("Open");
?QUOTE_VALUE("High");
?QUOTE_VALUE("Low");
?QUOTE_VALUE("Last");
%% Catch-all. Pass state on as-is
event(_Event, _Location, State) ->
    State.
And the result:
2> populate:run("data/test.xml").
[[{"Last","62.2"},
{"Low","62.2"},
{"High","62.2"},
{"Open","62.2"},
{"Date","12/6/2006"}],
[{"Last","62.5"},
{"Low","62.5"},
{"High","62.5"},
{"Open","62.5"},
{"Date","12/7/2006"}]]
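All the values are still strings at this point. For plotting, they would probably need to become dates and numbers, so here is a minimal sketch of that step. The module quote_convert and its functions are hypothetical names, the M/D/YYYY date layout is an assumption read off the sample data, and the integer fallback is there because a value such as "0" is not accepted by string:to_float/1.

```erlang
-module(quote_convert).
-export([to_point/1]).

%% Hypothetical helper: turn one parsed quote (a key-value list of
%% strings) into {Date, Open, High, Low, Last}.
to_point(Quote) ->
    {to_date(proplists:get_value("Date", Quote)),
     to_number(proplists:get_value("Open", Quote)),
     to_number(proplists:get_value("High", Quote)),
     to_number(proplists:get_value("Low", Quote)),
     to_number(proplists:get_value("Last", Quote))}.

%% Assumes the M/D/YYYY layout seen in the sample data and returns
%% a {Year, Month, Day} tuple.
to_date(Text) ->
    [Month, Day, Year] =
        [list_to_integer(Part) || Part <- string:tokens(Text, "/")],
    {Year, Month, Day}.

%% Accepts "1239.9" as well as integer-looking text such as "0".
to_number(Text) ->
    case string:to_float(Text) of
        {Float, []} -> Float;
        {error, no_float} -> float(list_to_integer(Text))
    end.
```

Looking the fields up by key means the order in which they were added to each quote does not matter.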
Voila! I hope that if you love pattern matching, you're nodding. And if you don't (yet), that you try it out for yourself. There are situations such as this where it enables you to write very elegant and maintainable code. But be warned, you'll never want to program without it again!
