Ville Tuulos
Erlang hacker at Nokia Research and initiator of the Disco Project
Nokia
Ville Tuulos is a researcher with Nokia Research in Palo Alto. He has been working with large data sets since 1999, building solutions for statistical information retrieval. After several misguided attempts to orchestrate highly distributed systems in C and Python, he found Erlang in 2006. He is also co-author of the book "Mobile Python - Rapid application development on the mobile platform". In 2007 he started to build Disco, an Erlang/Python implementation of the MapReduce framework for distributed computing. Disco is now used by Nokia and others for quick prototyping of data-intensive software, using hundreds of gigabytes of real-world data.
- Ville at Nokia Research
- Ville at University of Helsinki
- Disco Project
Ville Tuulos is Giving the Following Talks
Discodex: intuitive data indexing
Disco combines the strengths of Erlang and Python to enable rapid development of massively parallel computational pipelines. Disco implements the MapReduce framework, making it a powerful platform for doing distributed computing on immense datasets.
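For readers new to MapReduce, the model can be illustrated with a minimal single-process sketch in plain Python. This does not use Disco's API and does not distribute anything; the names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical and only show the map, shuffle (group by key) and reduce phases that a framework like Disco runs in parallel across a cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every word in the input line.
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: List[int]) -> int:
    # Hypothetical reducer: combine all counts emitted for the same word.
    return sum(values)


def run_mapreduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # Map + shuffle: run the mapper on each record and group values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


counts = run_mapreduce(["Erlang and Python", "Python and Disco"], map_fn, reduce_fn)
print(sorted(counts.items()))
# [('and', 2), ('disco', 1), ('erlang', 1), ('python', 2)]
```

In a real MapReduce system the map and reduce calls run on many machines and the shuffle moves data over the network, but the contract between the two user-supplied functions stays the same.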
The first step in building a system driven by data is indexing the data so that it can be accessed in logarithmic or constant time. Such random access is crucial for building online systems, but it is also valuable for optimizing many other applications that rely on lookups into the data.
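The two access-time classes mentioned above can be sketched in a few lines of Python. This is an illustration only, assuming a toy list of hypothetical `(key, value)` records: a sorted key array searched with `bisect` gives logarithmic lookups, while a hash table (`dict`) gives expected constant-time lookups. It is not how Discodex stores its indices.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical toy data: (key, value) pairs extracted from raw records.
records: List[Tuple[str, str]] = [
    ("nokia", "r3"), ("disco", "r1"), ("erlang", "r2"), ("python", "r4"),
]

# Logarithmic-time access: keep the keys sorted and binary-search them.
sorted_records = sorted(records)
sorted_keys = [key for key, _ in sorted_records]


def lookup_sorted(key: str) -> Optional[str]:
    # bisect_left returns the first position where key could be inserted.
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return sorted_records[i][1]
    return None


# Expected constant-time access: hash the keys.
hashed = dict(records)


def lookup_hashed(key: str) -> Optional[str]:
    return hashed.get(key)


print(lookup_sorted("erlang"), lookup_hashed("erlang"))  # r2 r2
```

The sorted layout costs a little more per lookup but also supports range and prefix queries, which a plain hash table does not.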
`Discodex` builds on top of Disco, abstracting away some of the most common operations for organizing piles of raw data into distributed, append-only indices and querying them. By adopting Erlang-style immutability of data structures, it is possible to index and query billions of data items efficiently. Discodex adopts a similar strategy to Disco in achieving this goal: making the interface so embarrassingly simple and intuitive that development time is never an excuse for not building an index.
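The idea of an append-only index built from immutable data can be sketched in Python as follows. This is a hypothetical single-machine illustration, not Discodex's API or storage format: `build_index` and `append` are invented names, and the index is frozen with `types.MappingProxyType` and tuples so that an existing version is never modified. Appending produces a new version instead.

```python
from types import MappingProxyType
from typing import Dict, Iterable, List, Mapping, Tuple

Index = Mapping[str, Tuple[str, ...]]


def _freeze(grouped: Dict[str, List[str]]) -> Index:
    # Lists become tuples and the dict is wrapped in a read-only proxy,
    # so code holding the index cannot mutate it.
    return MappingProxyType({k: tuple(v) for k, v in grouped.items()})


def build_index(records: Iterable[Tuple[str, str]]) -> Index:
    # Hypothetical builder: group raw (key, value) records by key.
    grouped: Dict[str, List[str]] = {}
    for key, value in records:
        grouped.setdefault(key, []).append(value)
    return _freeze(grouped)


def append(index: Index, records: Iterable[Tuple[str, str]]) -> Index:
    # Never modify the existing index: build a new version that holds
    # the old entries followed by the new ones.
    merged: Dict[str, List[str]] = {k: list(v) for k, v in index.items()}
    for key, value in records:
        merged.setdefault(key, []).append(value)
    return _freeze(merged)


v1 = build_index([("erlang", "doc1"), ("python", "doc2")])
v2 = append(v1, [("erlang", "doc3")])
print(v1["erlang"], v2["erlang"])  # ('doc1',) ('doc1', 'doc3')
```

Because `v1` never changes, readers holding it keep seeing a consistent snapshot while `v2` is being built. This sketch copies the whole index on every append for simplicity; a system meant for billions of items would instead keep old data as read-only segments and add new records as further segments.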
In this talk we discuss the architecture of this awesome, open-source tool (with Erlang at its heart), and how to use it. We also provide a real-world example of using Discodex for data insight at Nokia, and the reason we built it in the first place.