Implicit Parallelism through Deep Language Embedding

Published: 2016-06-02 Issue: 1 Volume: 45 Page: 51-58
ISSN: 0163-5808
Container-title: ACM SIGMOD Record
Language: en
Short-container-title: SIGMOD Rec.

Author:

Alexandrov Alexander¹, Katsifodimos Asterios¹, Krastev Georgi¹, Markl Volker¹

Affiliation:

1. TU Berlin

Abstract

Parallel collection processing based on second-order functions such as map and reduce has been widely adopted for scalable data analysis. Initially popularized by Google, over the past decade this programming paradigm has found its way into the core APIs of parallel dataflow engines such as Hadoop's MapReduce, Spark's RDDs, and Flink's DataSets. We review programming patterns typical of these APIs and discuss how they relate to the underlying parallel execution model. We argue that fixing the abstraction leaks exposed by these patterns will reduce the cost of data analysis by improving programmer productivity. To achieve that, we first revisit the algebraic foundations of parallel collection processing. Based on that, we propose a simplified API that (i) provides proper support for nested collection processing and (ii) alleviates the need for certain second-order primitives through comprehensions, a declarative syntax akin to SQL. Finally, we present a metaprogramming pipeline that performs algebraic rewrites and physical optimizations which allow us to target parallel dataflow engines like Spark and Flink with competitive performance.
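
To make the abstract's contrast concrete, the following is a minimal sketch in plain Python, not the paper's API or the Emma language. It assumes two hypothetical in-memory collections (`orders` and `customers`) and expresses the same join and aggregation twice: once with explicit second-order functions, and once as a comprehension that reads closer to a SQL query.

```python
from functools import reduce
from itertools import chain

# Hypothetical toy data (not from the paper):
# orders are (customer_id, amount), customers are (customer_id, country).
orders = [(1, 20.0), (2, 35.5), (1, 4.5), (3, 10.0)]
customers = [(1, "DE"), (2, "GR"), (3, "BG")]


def flat_map(f, xs):
    """Apply f to every element and concatenate the resulting sequences."""
    return list(chain.from_iterable(map(f, xs)))


# Style 1: explicit second-order functions (flat_map / map / filter / reduce).
# The join logic is buried inside nested lambdas.
joined_explicit = flat_map(
    lambda o: map(
        lambda c: (c[1], o[1]),
        filter(lambda c: c[0] == o[0], customers),
    ),
    orders,
)
total_de_explicit = reduce(
    lambda acc, amount: acc + amount,
    map(lambda pair: pair[1],
        filter(lambda pair: pair[0] == "DE", joined_explicit)),
    0.0,
)

# Style 2: the same join and aggregation written as comprehensions.
# Generators play the role of FROM, the guard plays the role of WHERE,
# and the head expression plays the role of SELECT.
joined_comprehension = [
    (country, amount)
    for (order_cid, amount) in orders
    for (cust_cid, country) in customers
    if order_cid == cust_cid
]
total_de_comprehension = sum(
    amount for (country, amount) in joined_comprehension if country == "DE"
)
```

Both styles compute the same result on this data. The comprehension states the join condition declaratively, which is the kind of structure an optimizer could inspect and rewrite; how the paper's pipeline actually does so is not shown here.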

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2949741.2949754

References: 23 articles.

1. Cascading Project. http://www.cascading.org/.

2. Emma Language. http://www.emma-language.org/.

3. The Stratosphere platform for big data analytics

4. Implicit Parallelism through Deep Language Embedding

5. Jaql

Cited by 12 articles.

1. A Preprocessor for Creation of Large-Scale Computational Applications;2024 IEEE 25th International Conference of Young Professionals in Electron Devices and Materials (EDM);2024-06-28

2. Automatic Decomposition of a Sequential Algorithm for MapReduce Frameworks;2022 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON);2022-11-11

3. A survey on machine learning in array databases;Applied Intelligence;2022-08-12

4. TraNCE;Proceedings of the VLDB Endowment;2021-07

5. Declarative Data Analytics: A Survey;IEEE Transactions on Knowledge and Data Engineering;2021-06-01