Sunday, April 26, 2009

Splitting autopar more cleanly


1. Finding out why autopar fails after the Graphite pass


I tested under gdb with the following arguments:
set args -O2 -fgraphite -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-final_cleanup ../../gcc/testsuite/gcc.dg/autopar/parallelization-1.c


The autopar pass gives up in the function parallelize_loops:
  FOR_EACH_LOOP (li, loop, 0)
    {
      htab_empty (reduction_list);
      if ((/* Do not bother with loops in cold areas. */
           optimize_loop_nest_for_size_p (loop)
           /* Or loops that roll too little. */
           || expected_loop_iterations (loop) <= n_threads
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The check expected_loop_iterations (loop) <= n_threads is where it fails. I think this is caused by incorrect edge->count and edge->frequency values being set when create_empty_loop_on_edge is called in translate_clast. The check optimize_loop_nest_for_size_p (loop) also fails on some testcases, which might be caused by Graphite not updating loop->header->frequency correctly. Note the TODO next to the frequency and count setup:
/* TODO: Fix frequencies and counts.  */
freq = EDGE_FREQUENCY (entry_edge);
cnt = entry_edge->count;
So in the patch that splits autopar more cleanly, we simply bypass this check. It should be fixed properly later.
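
To make the failure mode concrete, here is a small standalone sketch. It is not GCC code: model_expected_iterations and model_rolls_too_little are hypothetical names, and the formula (latch frequency divided by entry frequency, rounded up) is my simplified assumption of how an iteration estimate can be derived from profile data. It only illustrates why a loop whose latch frequency was never updated looks like it rolls once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC's implementation: estimate the iteration
   count as latch frequency / entry frequency, rounded up.  */
long
model_expected_iterations (long freq_in, long freq_latch)
{
  assert (freq_in >= 0 && freq_latch >= 0);

  /* No usable entry frequency: fall back to a guess from the latch.  */
  if (freq_in == 0)
    return freq_latch * 2;

  return (freq_latch + freq_in - 1) / freq_in;
}

/* Models the "loops that roll too little" test from parallelize_loops.  */
bool
model_rolls_too_little (long freq_in, long freq_latch, long n_threads)
{
  return model_expected_iterations (freq_in, freq_latch) <= n_threads;
}
```

Under this model, a new loop whose latch edge simply inherits the entry edge's frequency (say 100 and 100) is estimated at one iteration, so with n_threads = 4 it is rejected no matter how many times it really runs.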

2. Preparing the patch for splitting autopar

In a previous patch, I simply marked every innermost loop as parallel (by introducing a bool flag can_be_parallel in the loop structure). In this patch, we bypass the failing check when this flag is set, and split autopar more cleanly into two parts: 1. the data dependency check 2. the code generation
Now it looks something like this:
  FOR_EACH_LOOP (li, loop, 0)
    {
      htab_empty (reduction_list);
      if (/* Do not bother with loops in cold areas. */
          optimize_loop_nest_for_size_p (loop)
          /* And of course, the loop must be parallelizable. */
          || !can_duplicate_loop_p (loop)
          || loop_has_blocks_with_irreducible_flag (loop)
          /* FIXME: the check for vector phi nodes could be removed. */
          || loop_has_vector_phi_nodes (loop))
        continue;

      /* FIXME: Bypass this check as graphite doesn't update the
         count and frequency correctly now */
      if (!loop->can_be_parallel
          && (expected_loop_iterations (loop) <= n_threads
              /* Do not bother with loops in cold areas. */
              || optimize_loop_nest_for_size_p (loop)))
        continue;

      if (!try_get_loop_niter (loop, &niter_desc))
        continue;

      if (!try_create_reduction_list (loop, reduction_list))
        continue;

      if (!loop->can_be_parallel && !loop_parallel_p (loop))
        continue;

      changed = true;
      gen_parallel_loop (loop, reduction_list, n_threads, &niter_desc);
    }
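
The decision logic in the excerpt above can be tried outside of GCC with a small model. This is only a sketch under assumptions: struct loop_model and should_parallelize are hypothetical names, and each field stands in for the result of one of the GCC predicates called in the excerpt. It shows how the can_be_parallel flag skips both the profile-based check and the dependency check, while the structural checks still apply.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the facts the excerpt queries about a loop.  */
struct loop_model
{
  bool cold;                 /* models optimize_loop_nest_for_size_p  */
  bool can_duplicate;        /* models can_duplicate_loop_p  */
  bool irreducible;          /* models loop_has_blocks_with_irreducible_flag  */
  bool vector_phi;           /* models loop_has_vector_phi_nodes  */
  long expected_iterations;  /* models expected_loop_iterations  */
  bool niter_known;          /* models try_get_loop_niter  */
  bool reductions_ok;        /* models try_create_reduction_list  */
  bool deps_ok;              /* models loop_parallel_p  */
  bool can_be_parallel;      /* the flag set for Graphite loops  */
};

/* Returns true when the modelled loop would reach gen_parallel_loop.  */
bool
should_parallelize (const struct loop_model *l, long n_threads)
{
  /* Structural checks: these apply to every loop.  */
  if (l->cold || !l->can_duplicate || l->irreducible || l->vector_phi)
    return false;

  /* Profile-based check: bypassed when the flag is set.  */
  if (!l->can_be_parallel
      && (l->expected_iterations <= n_threads || l->cold))
    return false;

  if (!l->niter_known || !l->reductions_ok)
    return false;

  /* Data dependency check: also bypassed when the flag is set.  */
  if (!l->can_be_parallel && !l->deps_ok)
    return false;

  return true;
}
```

With expected_iterations = 1 and n_threads = 4, the model rejects the loop unless can_be_parallel is set, which is exactly the situation after Graphite.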

3. Plan
  • Run the regression tests for this patch on trunk
  • Write testcases for the code generation part, to make sure it works correctly after Graphite
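
As a starting point for the second item, a testcase could look roughly like the sketch below. It is a hypothetical file, not one from the testsuite: the names N, a, init_array and check_array are made up, and the dg-options line just reuses the flags from the gdb session above. Both loops have independent iterations, so they are the easiest kind of candidate.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fgraphite -ftree-parallelize-loops=4 -fdump-tree-parloops-details" } */

#define N 1000

int a[N];

/* Every iteration writes a different element, so there is no
   loop-carried dependence.  */
void
init_array (void)
{
  int i;

  for (i = 0; i < N; i++)
    a[i] = i + 3;
}

/* Returns 1 when the array holds the values init_array stored,
   0 otherwise.  Useful for checking the generated code at run time.  */
int
check_array (void)
{
  int i;

  for (i = 0; i < N; i++)
    if (a[i] != i + 3)
      return 0;

  return 1;
}
```

A run-time variant would call init_array and then check that check_array returns 1, so that wrong code generation shows up as a failure and not only as a missing line in the dump.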

Tuesday, April 21, 2009

A general plan for this project

I will be working with the great Graphite developers this summer to implement parallel code generation in Graphite. You can find a short description of this project here, and my application here (with the personal information removed).

This blog will mainly focus on this summer project: 1. plans 2. what I have done 3. related Graphite internals

A general plan for what I will be doing over the next few weeks of Summer of Code:

  1. Mark the innermost loop parallel [done]
  2. Try to schedule the autopar pass after Graphite, and enable code generation if flag_graphite_force_parallel is set
    • There should be some discussion with Razya about her plans for the autopar part
    • But before that, I'll try to schedule autopar first
  3. I may try to write testcases for the loops that should be parallel, from easy to hard, and check autopar's code generation part to make sure it works as expected.
    • The testcases are important. There should be a detailed discussion, maybe with Sebastian and Konrad, to see what kinds of loops we can or decide to handle.
    • Check autopar's code generation with flag_graphite_force_parallel set on these testcases, and report bugs if it goes wrong.
  4. Try to write code that decides whether a loop can be parallel, using the data dependency test under the polyhedral model.
    • Try to understand the interface of the data dependency test
    • Write the code: if the data dependency test succeeds, mark the loop parallel
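
To illustrate what step 4 has to decide, here is a sketch that uses classic distance vectors in place of the polyhedral dependence test. This is a swapped-in, simpler technique and not Graphite's interface; struct dep_distance, carrying_level and loop_can_be_parallel are hypothetical names. The rule it implements: a loop can run in parallel when no dependence is carried by it, that is, when no distance vector has its first nonzero entry at that loop's level.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 4

/* Hypothetical representation of one dependence: the distance in
   iterations at each loop level, outermost first.  */
struct dep_distance
{
  int dist[MAX_DEPTH];
};

/* Returns the level of the loop that carries the dependence (the first
   nonzero distance), or -1 when the dependence is loop-independent.  */
int
carrying_level (const struct dep_distance *d, int depth)
{
  int level;

  for (level = 0; level < depth && level < MAX_DEPTH; level++)
    if (d->dist[level] != 0)
      return level;

  return -1;
}

/* A loop at LEVEL can be marked parallel when none of the N_DEPS
   dependences is carried by it.  */
bool
loop_can_be_parallel (const struct dep_distance *deps, size_t n_deps,
                      int depth, int level)
{
  size_t i;

  for (i = 0; i < n_deps; i++)
    if (carrying_level (&deps[i], depth) == level)
      return false;

  return true;
}
```

For a nest like a[i][j] = a[i-1][j] + 1 the only distance vector is (1, 0): the outer loop carries the dependence and must stay sequential, while the inner loop could be marked parallel.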