SGD’s scalability is limited by its inherently sequential nature; it is difficult to parallelize.

In this work, we propose a simple strategy for eliminating the overhead associated with locking: run SGD in parallel without locks.

This strategy, Hogwild!, allows processors access to shared memory with the possibility of overwriting each other’s work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then Hogwild! achieves a nearly optimal rate of convergence.
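The lock-free update pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the sparse least-squares problem, the function names (`make_sparse_problem`, `worker`, `hogwild_sgd`), the step size, and the thread count are all hypothetical choices made for the example. Note also that CPython threads are serialized by the global interpreter lock, so the sketch shows the unsynchronized read-modify-write pattern rather than a real multicore speedup.

```python
import threading
import numpy as np


def make_sparse_problem(n_samples=2000, n_features=200, nnz=3, seed=0):
    """Synthetic sparse least-squares data (an assumption for this sketch).

    Each sample touches only `nnz` of the `n_features` coordinates.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=n_features)
    idx = np.array(
        [rng.choice(n_features, size=nnz, replace=False) for _ in range(n_samples)]
    )
    vals = rng.normal(size=(n_samples, nnz))
    # Noise-free targets: y_i = <x_i, w_true> restricted to the nonzero coordinates.
    y = np.einsum("ij,ij->i", vals, w_true[idx])
    return idx, vals, y, w_true


def worker(w, idx, vals, y, n_steps, step_size, seed):
    """Run SGD steps on the shared vector `w` without taking any lock."""
    rng = np.random.default_rng(seed)
    n = len(y)
    for _ in range(n_steps):
        i = rng.integers(n)
        cols = idx[i]
        # Read the few coordinates this sample depends on (possibly stale).
        residual = vals[i] @ w[cols] - y[i]
        # Write back only those coordinates; other threads may overwrite them.
        w[cols] -= step_size * residual * vals[i]


def hogwild_sgd(idx, vals, y, n_features, n_threads=4, n_steps=5000, step_size=0.05):
    """Start several workers that all update one shared parameter vector."""
    w = np.zeros(n_features)  # shared memory, no lock protects it
    threads = [
        threading.Thread(target=worker, args=(w, idx, vals, y, n_steps, step_size, t))
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


if __name__ == "__main__":
    idx, vals, y, w_true = make_sparse_problem()
    w = hogwild_sgd(idx, vals, y, n_features=len(w_true))
    print("relative error:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))
```

Because each update reads and writes only three of the 200 coordinates in this setup, two workers rarely touch the same entries at the same time, which is the sparsity condition the text refers to.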

MapReduce

As many large data sets are currently pre-processed in a MapReduce-like parallel-processing framework, much of the recent work on parallel SGD has focused naturally on MapReduce implementations. MapReduce is a powerful tool developed at Google for extracting information from huge logs (e.g., “find all the URLs from 100TB of Web data”) that was designed to ensure fault tolerance and to simplify the maintenance and programming of large clusters of machines [9]. But MapReduce is not ideally suited for online, numerically intensive data analysis. Iterative computation is difficult to express in MapReduce, and the overhead to ensure fault tolerance can result in dismal throughput.

Multicore systems

Multicore systems have significant performance advantages, including (1) low-latency, high-throughput shared main memory (a processor in such a system can write and read the shared physical memory at over 12GB/s with latency in the tens of nanoseconds); and (2) high bandwidth from multiple disks (a thousand-dollar RAID can pump data into main memory at over 1GB/s).

A typical MapReduce setup will read incoming data at rates less than tens of MB/s due to frequent checkpointing for fault tolerance. The high rates achievable by multicore systems move the bottlenecks in parallel computation to synchronization (or locking) amongst the processors [2,13].

Thus, to enable scalable data analysis on a multicore machine, any performant solution must minimize the overhead of locking.
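To put the data rates quoted above side by side, here is a back-of-the-envelope calculation. The 100GB dataset size is a hypothetical figure chosen for illustration, and 10MB/s is an assumed representative value for the "less than tens of MB/s" MapReduce rate; the helper name `stream_time_seconds` is likewise made up for this sketch.

```python
GB = 10**9
MB = 10**6


def stream_time_seconds(dataset_bytes, bytes_per_second):
    """Time to stream a dataset once at a fixed sustained rate."""
    return dataset_bytes / bytes_per_second


dataset = 100 * GB  # hypothetical dataset size

memory_s = stream_time_seconds(dataset, 12 * GB)     # shared main memory, 12GB/s
raid_s = stream_time_seconds(dataset, 1 * GB)        # RAID into main memory, 1GB/s
mapreduce_s = stream_time_seconds(dataset, 10 * MB)  # assumed MapReduce rate, 10MB/s

print(f"main memory: {memory_s:.1f} s")
print(f"RAID:        {raid_s:.1f} s")
print(f"MapReduce:   {mapreduce_s:.1f} s")
```

Under these assumptions a single pass takes about 100 seconds from the RAID and about 10,000 seconds in the MapReduce setting. Once data arrives at the faster rates, time spent waiting on locks becomes a larger share of the total.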
