Information about the job

When you run a mapreduce job, you are mostly interested in its result. However, mapreduce jobs may take a long time, and it can be useful to know where that time is spent, how many processes were used, how many temporary files were created, and so on. You can use this information to tweak your mapreduce algorithm or to fine-tune the behavior of the job through its tuning settings.

All information about a finished job is available in the Yothalot::Result class. An instance of this class is returned by the wait() method of the Yothalot::Job class. The class should provide all the information on the behavior of the job that you need, and probably some extra. However, if there is relevant information that you would like to have but that is currently not provided, please contact us by sending an email to info@copernica.com.

The Yothalot::Result class

The Yothalot::Result class provides general information about the job, i.e. the time when the job was started and its runtime. The class also gives you access to objects that hold statistics on the individual steps of the mapreduce algorithm, i.e. the mapper step, the reducer step, and the finalizer step. The interface of the Yothalot::Result class looks as follows:

namespace Yothalot {
class Result
{
public:
    /**
     *  Get the time when the job was started (in Unix time)
     */
    double started();

    /**
     *  Get the runtime of the job (in seconds)
     */
    double runtime();

    /**
     *  Get the statistics class of the mappers of the job
     */
    Stats mappers();

    /**
     *  Get the statistics class of the reducers of the job
     */
    Stats reducers();

    /**
     *  Get the statistics class of the finalizers of the job
     */
    Stats finalizers();
};
}

Using this class is simple: call wait() on your job to retrieve the result, then call the members that you are interested in. For example:

/**
 *  call wait on your job and get the results
 */
Yothalot::Result result = job.wait();

/**
 *  print some of the results
 */
std::cout << "The job started on: " << result.started() << std::endl;
std::cout << "The runtime was:    " << result.runtime() << std::endl;
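
Because started() returns a raw Unix timestamp, the printed value is a large number of seconds rather than a readable date. The helper below is a minimal sketch of one way to format it. The name formatUnixTime is hypothetical and not part of Yothalot; it only uses the standard C++ library and assumes the timestamp counts seconds since the Unix epoch.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper (not part of Yothalot): format a Unix timestamp,
// such as the value returned by result.started(), as a UTC date string.
// Fractional seconds are dropped.
std::string formatUnixTime(double unixSeconds)
{
    // keep only the whole seconds
    std::time_t whole = static_cast<std::time_t>(unixSeconds);

    // break the timestamp down into calendar fields (UTC);
    // note that std::gmtime uses a shared internal buffer
    const std::tm *utc = std::gmtime(&whole);
    if (utc == nullptr) return std::string();

    // write the fields into a fixed-size buffer
    char buffer[32];
    std::size_t size = std::strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S UTC", utc);
    return std::string(buffer, size);
}
```

With this helper you could write formatUnixTime(result.started()) instead of printing result.started() directly.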

The Yothalot::Stats class

A mapreduce job has three basic steps: a mapper step, a reducer step, and a finalizer (or writer) step. Information on each step is stored in the Yothalot::Stats class and can be retrieved via the members mappers(), reducers(), and finalizers() listed above. The interface of this class looks as follows:

namespace Yothalot {
class Stats
{
public:
    /**
     *  get the time the first mapper, reducer, or finalizer was started (in Unix time)
     */
    double first();

    /**
     *  get the time the last mapper, reducer, or finalizer was started (in Unix time)
     */
    double last();

    /**
     *  get the time the last mapper, reducer, or finalizer was finished (in Unix time)
     */
    double finished();

    /**
     *  get the running time of the mappers, reducers, or finalizers (in seconds)
     */
    double runtime();

    /**
     *  get the running time of the fastest mapper, reducer, or finalizer (in seconds)
     */
    double fastest();

    /**
     *  get the running time of the slowest mapper, reducer, or finalizer (in seconds)
     */
    double slowest();

    /**
     *  get the number of mapper, reducer, or finalizer processes
     */
    int64_t processes();

    /**
     *  get an object with information on the input used by the mappers,
     *  reducers, or finalizers
     */
    DataStats input();

    /**
     *  get an object with information on the output generated by the mappers,
     *  reducers, or finalizers
     */
    DataStats output();
};
}

All these members can be called on the object you get from mappers(), reducers(), or finalizers(). You can store that object first, or chain the calls directly as shown below.

/**
 *  call wait on your job and get the results
 */
Yothalot::Result result = job.wait();

/**
 *  print some results on the mappers and reducers
 */
std::cout << "The first mapper started on: " << result.mappers().first() << std::endl;
std::cout << "The runtime of the fastest reducer was: " << result.reducers().fastest() << std::endl;

As you can see, you can get quite a lot of detail about the job. You can, for example, see which step takes the most time. This information may help you to adjust your algorithm for the job or to fine-tune the job behavior.
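
To find the step that takes the most time, you can compare the runtime() values of the three steps. The function below is a minimal sketch of that comparison. The name slowestStep is hypothetical and not part of Yothalot, and it takes plain numbers so that it does not depend on the library. It assumes that the three runtime() values are measured in the same way and are therefore directly comparable.

```cpp
#include <string>

// Hypothetical helper (not part of Yothalot): given the runtime in seconds
// of each step, return the name of the step that ran the longest.
// On a tie, the earlier step in the pipeline is reported.
std::string slowestStep(double mapperRuntime, double reducerRuntime, double finalizerRuntime)
{
    // start by assuming the mappers took the longest
    std::string name = "mappers";
    double longest = mapperRuntime;

    // a later step only wins when it is strictly slower
    if (reducerRuntime > longest)
    {
        name = "reducers";
        longest = reducerRuntime;
    }
    if (finalizerRuntime > longest)
    {
        name = "finalizers";
    }
    return name;
}
```

You could call it as slowestStep(result.mappers().runtime(), result.reducers().runtime(), result.finalizers().runtime()).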

The Yothalot::DataStats class

The mapper, reducer, and finalizer steps may produce temporary files that are used by the step itself or that pass information from one step to the next. You can use the Yothalot::DataStats object to get insight into the number and size of the temporary files that are consumed and created in each step. The interface of the Yothalot::DataStats class is as follows:

namespace Yothalot {
class DataStats
{
public:
    /**
     *  get the number of files
     */
    int64_t files();

    /**
     *  get the number of bytes
     */
    int64_t bytes();
};
}

You can use it like this:

/**
 *  call wait on your job and get the results
 */
Yothalot::Result result = job.wait();

/**
 *  print some of the results on temporary files and bytes
 */
std::cout << "The number of temporary bytes produced by the mapper is:  " << result.mappers().output().bytes() << std::endl;
std::cout << "The number of temporary files consumed by the reducer is: " << result.reducers().input().files() << std::endl;

The information on the number of files and their sizes may again help you to adjust your algorithm and use the tuning settings to increase the performance of your mapreduce job.
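
A derived number that can be handy here is the average size of a temporary file, since many very small files may point in a different tuning direction than a few large ones. The function below is a minimal sketch. The name averageFileSize is hypothetical and not part of Yothalot, and how to interpret the outcome for your own job is left to you.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Yothalot): compute the average size in
// bytes of a set of files, given the values of files() and bytes() from a
// DataStats object. Returns 0 when there are no files.
double averageFileSize(std::int64_t files, std::int64_t bytes)
{
    // avoid dividing by zero when a step used no files
    if (files <= 0) return 0.0;

    // convert before dividing so the fractional part is kept
    return static_cast<double>(bytes) / static_cast<double>(files);
}
```

For the mapper output you could call it as averageFileSize(result.mappers().output().files(), result.mappers().output().bytes()).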