- For each execution we collect thousands of data points: Hadoop counters, performance metrics, etc.
- We have thousands of executions, which means far too much data to analyse by hand.
This section presents the benchmark execution repository, which currently holds more than 4,000 executions and counting.
This tool allows you to browse, filter, search, and select individual executions to compare and analyse their execution details.
This section presents the evolution of several metrics within an execution, for instance how many MB are read by each task of the execution's jobs.
The Hadoop Job Counters section allows browsing the counters output by each of the Hadoop executions, filtering them, and ordering the selected runs (or all of them) by a specific counter.
This section allows the user to see the best configuration found for a given benchmark. It also allows finding it by filtering on parameters such as cluster type, number of mappers, block size, etc.
The Configuration Improvement section evaluates the speed-up achieved by different hardware and software configurations of the Hadoop executions.
The page allows filtering and grouping results according to their execution configurations.
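As an illustration, a minimal sketch of the speed-up calculation, assuming hypothetical execution times in seconds (none of these values come from the text):

```python
# Speed-up of a configuration relative to a reference (baseline) configuration.
# Both times are illustrative placeholders, not measurements from the repository.
baseline_time = 2400.0   # e.g. execution time with the default configuration (s)
improved_time = 900.0    # e.g. execution time with a tuned configuration (s)

speedup = baseline_time / improved_time
print(f"speed-up: {speedup:.2f}x")   # -> speed-up: 2.67x
```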
The Cost/Performance evaluation tool presents a scatter plot of the different Hadoop executions and evaluates their cost-effectiveness.
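A minimal sketch of how each point of such a plot could be derived, assuming a hypothetical hourly cluster price and execution times in hours (all values are illustrative, not taken from the text):

```python
# Hypothetical executions: (label, execution time in hours, cluster price per hour in $).
executions = [
    ("execution A", 1.5, 4.0),
    ("execution B", 0.8, 9.0),
]

# Each execution becomes one point: x = execution time, y = total cost of the run.
for name, hours, price_per_hour in executions:
    cost = hours * price_per_hour
    print(f"{name}: time = {hours:.1f} h, cost = {cost:.2f} $")
```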
The Parameter Evaluation presents a column chart showing how long executions took to run for each possible value of a selected parameter. The picture gives an example of how the number of mappers affects the execution time.
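A minimal sketch of the aggregation behind such a chart, assuming hypothetical records with a `mappers` parameter and an `exec_time` in seconds (both field names and values are illustrative):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical executions: value of the selected parameter and execution time (s).
executions = [
    {"mappers": 4, "exec_time": 2100},
    {"mappers": 8, "exec_time": 1300},
    {"mappers": 8, "exec_time": 1250},
    {"mappers": 16, "exec_time": 1100},
]

# One column per parameter value: the average execution time of its runs.
columns = defaultdict(list)
for e in executions:
    columns[e["mappers"]].append(e["exec_time"])

for value in sorted(columns):
    print(f"mappers={value}: avg time = {mean(columns[value]):.0f} s")
```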
The Performance Charts section first gives a visual glance at the full Hadoop execution of the selected runs by analyzing each of the Hadoop phases: map, merge, shuffle, and reduce.
The Performance Metrics section shows all the performance metrics collected during the benchmark executions, allowing the user to see how much CPU, network, I/O, and memory was used.
Frameworks & tools used
Two main tools are used:
Together, these tools allow us to set up servers from scratch in an automated way. A production-ready server is currently created and set up in 5 minutes.
This is done in two ways depending on the environment:
Benchmarks are run using Apache Benchmark (ab) with 10,000 requests and a maximum of 50 concurrent requests, requesting benchmark execution data; a rough reproduction of the invocation is sketched after the result tables below.
Retrieving executions data:

| Average # requests per second | Average time to serve requests | Standard deviation to serve requests | Median |
|---|---|---|---|
| 25 | 1516 ms | 1996 ms | 495 ms |
Retrieving the executions front page (done concurrently with retrieving data):

| Average # requests per second | Average time to serve requests | Standard deviation to serve requests | Median |
|---|---|---|---|
| 216.46 | 178 ms | 230 ms | 240 ms |
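For reference, a minimal sketch of how such a run could be reproduced, assuming ab is installed and using a placeholder URL (the actual endpoint is not given here):

```python
import subprocess

# Reproduce the load test described above: 10,000 requests, 50 of them concurrent.
# The URL is a placeholder; substitute the executions-data endpoint being tested.
url = "http://localhost/executions"
subprocess.run(["ab", "-n", "10000", "-c", "50", url], check=True)
```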