<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>GATE Cloud</title>
<meta name="copyright"
content="GATE Team, University of Sheffield - gate.ac.uk"/>
<link rel="stylesheet" type="text/css" media="screen, projection, print"
href="gslidy/slidy.css"/>
<script src="gslidy/slidy.js"
type="text/javascript"></script>
</head>
<body>
<div class="background">
<img id="head-icon" alt="" align="right" src="http://gate.ac.uk/sale/images/gate4/logo-colour.png" width="150"/>
</div>
<div class="slide">
<h1 class="cow-title-heading">GATE Cloud</h1>
<table> <tr><td>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Hamish Cunningham, <br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Valentin Tablan, <br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Ian Roberts <br/> <br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
University of Sheffield <br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a class="cow-url" href="http://gate.ac.uk/">http://gate.ac.uk/</a></p>
</td><td>
<p>&nbsp;&nbsp;&nbsp;&nbsp;
<img src="http://gate.ac.uk/sale/images/gate4/splash.png" alt="GATE" width="280" height="215" align="top" border="0"/></p>
</td><td>
<p><b>Contents</b></p>
<ol>
<li>GATE</li>
<li>GATE Cloud</li>
<li>Futures</li>
</ol>
</td></tr>
</table>

</div><div class="slide"><h1 class="cow-heading">The GATE Family</h1>
<p>Experimental apparatus, software infrastructure, and technology transfer
vehicle for text analysis.</p>
<ul>
<li>an <b>architecture</b></li>
<li>an IDE, <b>GATE Developer</b>: an integrated development environment for language
processing components, bundled with a widely used Information Extraction system
and a comprehensive set of
<a class="cow-url" href="http://gate.ac.uk/gate/doc/plugins.html">other plugins</a></li>
<li>a framework, <b>GATE Embedded</b>: an object library optimised for inclusion in
diverse applications, giving access to all the services used by GATE Developer
and more
<ul>
<li>used worldwide by thousands of scientists, companies, teachers and students
(<u>&gt;30k downloads per year at present</u>, not counting SVN)</li>
<li>open source (LGPL), 100% Java</li>
</ul></li>
<li>a web app, <b>GATE Teamware</b>: a collaborative annotation environment for
factory-style semantic annotation projects, built around a workflow engine
<ul>
<li>a process: not &quot;get this software and it will revolutionise your life&quot; but
&quot;this is how to implement robust and maintainable services&quot;</li>
</ul></li>
<li><b>GATE Cloud</b>: a parallel and distributed service infrastructure running on
Amazon EC2</li>
<li><b>GATE M&iacute;mir</b>: (Multi-paradigm Information Management Index and Repository) a
scalable multi-paradigm index built on <a class="cow-url" href="http://www.ontotext.com/">Ontotext</a>'s
<a class="cow-url" href="http://www.ontotext.com/owlim/">semantic repository family</a>, GATE's
annotation structures database plus full-text indexing from
<a class="cow-url" href="http://mg4j.dsi.unimi.it/">MG4J</a></li>
<li>and finally...
<ul>
<li>related tools from Ontotext (OWLIM, KIM, Linked Data endpoints)</li>
<li><a class="cow-url" href="http://gate.ac.uk/gatewiki/cow/">GATE Wiki</a></li>
<li>a <b>community</b></li>
</ul></li>
</ul>
</div><div class="slide"><h1 class="cow-heading">GATE Cloud (1): the marketing BS</h1>
<p>Cloud computing means many things in many contexts. On <b>GATECloud.net</b> it
means:</p>
<ul>
<li><b>zero fixed costs</b>: you don't buy software licences or server hardware, just
pay for the compute time that you use</li>
<li><b>near zero startup time</b>: in a matter of minutes you can specify, provision
and deploy the type of computation that used to take months of planning</li>
<li><b>easy in, easy out</b>: if you try it and don't like it, go elsewhere! You can
even take the software with you: it's all open source</li>
<li><b>someone else takes the admin load</b>:
<ul>
<li><a class="cow-url" href="http://gate.ac.uk/">the GATE team</a> from the <a class="cow-url" href="http://www.shef.ac.uk/">University of Sheffield</a> make sure you're running best-of-breed
technology for text, search and semantics</li>
<li>cloud providers' data center managers (e.g. at <a class="cow-url" href="http://aws.amazon.com/">Amazon Inc.</a>) make sure the hardware and operating platform for your work
is scalable, reliable and cheap</li>
</ul></li>
</ul>
</div><div class="slide"><h1 class="cow-heading">GATE Cloud (2): engineering</h1>
<ul>
<li><b>parallel</b> execution engine for automatic annotation processes, plus <b>distributed</b>
execution of that parallel engine</li>
<li><b>scalability</b>: automatic scaling of processor swarms running on top of AWS EC2</li>
<li><b>flexibility</b>: parameters configure behaviour and select the GATE application
being executed, the input protocol used for reading documents, the output protocol
used for exporting the resulting annotations, ...</li>
<li><b>robustness</b>: jobs run unattended over large data sets
<ul>
<li>extensively tested and profiled (no memory leaks)</li>
<li>errors and exceptions that occur during processing are trapped and reported</li>
<li>if the process crashes (e.g. through hardware failure), it can be restarted and resumes
execution where it left off</li>
</ul></li>
</ul>
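<p>The robustness points above (trapping per-document errors, and resuming after a
crash) can be illustrated with a minimal sketch. This is not GATE Cloud's actual
implementation: the names <code>run_batch</code>, <code>annotate</code> and
<code>checkpoint_path</code> are hypothetical, and a JSON checkpoint file is assumed
purely for illustration.</p>
<pre>
```python
import json
import os


def run_batch(doc_ids, annotate, checkpoint_path):
    """Annotate each document once, surviving errors and restarts.

    `annotate` is a hypothetical callable standing in for a real annotation
    pipeline; `checkpoint_path` names a JSON file listing finished documents.
    """
    done = set()
    failed = {}
    # Resume: reload the IDs completed by any earlier (possibly crashed) run.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = set(json.load(fh))
    for doc_id in doc_ids:
        if doc_id in done:
            continue  # already processed before the restart
        try:
            annotate(doc_id)
        except Exception as exc:
            # Trap and report the error, then carry on with the next document.
            failed[doc_id] = repr(exc)
            continue
        done.add(doc_id)
        # Persist progress after every document; writing to a temporary file
        # and renaming it keeps the checkpoint intact if we crash mid-write.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(sorted(done), fh)
        os.replace(tmp_path, checkpoint_path)
    return done, failed
```
</pre>
<p>Calling <code>run_batch</code> again with the same checkpoint file skips every
document already recorded as done and retries only those that failed or were never
reached.</p>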
<p><img src="http://gate.ac.uk/sale/talks/gate-course-may11/gatecloud.net-intro/images/annotation-job.png" alt="A cloud annotation job" width="700"/></p>
</div><div class="slide"><h1 class="cow-heading">GATE Cloud (3): a research perspective</h1>
<ul>
<li>something like the <em>facility</em> that the Information Retrieval Facility was
trying to set up for IR more generally</li>
<li>host a growing family of experimental system configurations, data sets,
results</li>
<li>currently biased heavily towards information extraction (perhaps some
mileage in adding more mainstream IR)</li>
<li>persistence and reuse of experimental setups: virtualisation makes it
possible to store not just the data but the entire compute platform needed to run
particular experiments or analyses</li>
</ul>
</div><div class="slide"><h1 class="cow-heading">Futures</h1>
<p>Lessons</p>
<ul>
<li>big data is challenging!</li>
<li>the terms &quot;general purpose&quot; and &quot;large scale&quot; don't sit well together</li>
<li>clouds are moving quickly; how to bet on a winner?</li>
<li>academic clouds need to be as stable as AWS, at least at the API level</li>
<li>commercial clouds are not yet a replacement for all academic facilities (e.g.
running an EC2 VM continuously for one year costs roughly as much as buying the
hardware)</li>
</ul>
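<p>The rent-versus-buy comparison in the last lesson is simple arithmetic, sketched
below. The hourly rate and hardware price are hypothetical placeholders, not real AWS
or vendor prices, and the sketch ignores power, hosting and administration costs.</p>
<pre>
```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours of continuous running


def annual_rental_cost(hourly_rate):
    """Cost of renting one VM around the clock for a year."""
    return hourly_rate * HOURS_PER_YEAR


def break_even_hours(hardware_cost, hourly_rate):
    """Hours of rental after which buying would have been cheaper."""
    return hardware_cost / hourly_rate


# Hypothetical figures, for illustration only.
rate = 0.25        # assumed price per VM-hour
hardware = 2000.0  # assumed purchase price of a comparable machine

print(annual_rental_cost(rate))          # 2190.0
print(break_even_hours(hardware, rate))  # 8000.0
```
</pre>
<p>With these assumed figures, a year of continuous rental costs slightly more than
the machine itself, so renting pays off mainly for bursty workloads.</p>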
<p>Plans</p>
<ul>
<li>Amazon Web Services limitations
<ul>
<li>S3 costs are prohibitively high for multi-TB data sets</li>
<li>upload and download are slow (quickest way: a box full of discs...)</li>
<li>move towards free-for-research cloud</li>
<li>make similar facilities available on academic computing services</li>
</ul></li>
<li>need shared repositories for large text collections
<ul>
<li>repeatability of experiments</li>
<li>reducing costs (the data is already on the cloud)</li>
<li>anywhere availability</li>
</ul></li>
</ul>
</div><div class="slide"><h1 class="cow-heading">Links</h1>
<p>More information</p>
<ul>
<li>GATE home page: <a class="cow-url" href="http://gate.ac.uk/">http://gate.ac.uk/</a></li>
<li>these slides: <a class="cow-url" href="http://gate.ac.uk/hamish/talks/cloud-epsrc-sep-2011-slidy.html">http://gate.ac.uk/hamish/talks/cloud-epsrc-sep-2011-slidy.html</a></li>
<li>GATECloud.net: <a class="cow-url" href="http://gatecloud.net/">http://gatecloud.net/</a></li>
</ul>

</div> </body></html>
