Sunday, April 26, 2009

Splitting autopar more cleanly


1. Investigating why autopar fails after the Graphite pass


I tested with these flags (set from gdb):
set args -O2 -fgraphite -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-final_cleanup ../../gcc/testsuite/gcc.dg/autopar/parallelization-1.c


The autopar pass gives up in the function parallelize_loops:
  FOR_EACH_LOOP (li, loop, 0)
    {
      htab_empty (reduction_list);
      if ((/* Do not bother with loops in cold areas. */
           optimize_loop_nest_for_size_p (loop)
           /* Or loops that roll too little. */
           || expected_loop_iterations (loop) <= n_threads
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The check expected_loop_iterations (loop) <= n_threads is what rejects the loop. I think this is caused by incorrect edge->count and edge->frequency values being set when translate_clast calls create_empty_loop_on_edge. In some testcases optimize_loop_nest_for_size_p (loop) also rejects the loop, which might be caused by Graphite not updating loop->header->frequency correctly. The relevant code still carries a TODO:
/* TODO: Fix frequencies and counts.  */
freq = EDGE_FREQUENCY (entry_edge);
cnt = entry_edge->count;
So in the patch that splits autopar more cleanly, we simply bypass this check. It should be fixed properly later.
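To make the suspected failure mode concrete, here is a toy model. It is not GCC code: the struct and function names are hypothetical, and it assumes the iteration estimate is roughly the back-edge frequency divided by the entry frequency, rounded up. Under that assumption, a freshly created loop whose back edge merely inherits the entry edge's frequency looks like it rolls once, so the "rolls too little" test rejects it.

```c
#include <stdbool.h>

/* Toy stand-in for the profile data attached to a loop.
   Hypothetical type, not GCC's real data structures.  */
struct toy_loop_profile
{
  int entry_freq;   /* estimated frequency of the edge entering the loop */
  int latch_freq;   /* estimated frequency of the back edge */
};

/* Assumed estimate: back-edge frequency per loop entry, rounded up.
   Returns 0 when the entry frequency is unknown.  */
static int
toy_expected_iterations (const struct toy_loop_profile *p)
{
  if (p->entry_freq <= 0)
    return 0;
  return (p->latch_freq + p->entry_freq - 1) / p->entry_freq;
}

/* The profitability test under discussion: reject loops expected to
   roll no more often than there are threads.  */
static bool
toy_rejected_as_too_short (const struct toy_loop_profile *p, int n_threads)
{
  return toy_expected_iterations (p) <= n_threads;
}
```

With entry_freq = 100 and latch_freq = 9900 the loop passes the test for 4 threads. With both set to 100, as might happen if the new loop just copies the entry edge's values, it is rejected however many iterations it really runs.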

2. Preparing the patch for splitting autopar

In a previous patch, I simply marked every innermost loop as parallel by introducing a bool flag, can_be_parallel, in the loop structure. In this patch, the checks that fail are bypassed when this flag is set, and autopar is split more cleanly into two parts:
  1. Checking data dependences
  2. Code generation
parallelize_loops now looks something like this:
  FOR_EACH_LOOP (li, loop, 0)
    {
      htab_empty (reduction_list);
      if (/* Do not bother with loops in cold areas. */
          optimize_loop_nest_for_size_p (loop)
          /* And of course, the loop must be parallelizable. */
          || !can_duplicate_loop_p (loop)
          || loop_has_blocks_with_irreducible_flag (loop)
          /* FIXME: the check for vector phi nodes could be removed. */
          || loop_has_vector_phi_nodes (loop))
        continue;

      /* FIXME: Bypass this check as graphite doesn't update the
         count and frequency correctly now */
      if (!loop->can_be_parallel
          && (expected_loop_iterations (loop) <= n_threads
              /* Do not bother with loops in cold areas. */
              || optimize_loop_nest_for_size_p (loop)))
        continue;
      if (!try_get_loop_niter (loop, &niter_desc))
        continue;
      if (!try_create_reduction_list (loop, reduction_list))
        continue;
      if (!loop->can_be_parallel && !loop_parallel_p (loop))
        continue;
      changed = true;
      gen_parallel_loop (loop, reduction_list, n_threads, &niter_desc);
    }
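The order of the checks in the listing can be modelled in isolation. The sketch below is a toy, not the patch: struct toy_loop and toy_should_parallelize are hypothetical names, and each boolean field stands in for the result of the corresponding GCC predicate. It only shows how can_be_parallel skips the profitability and dependence tests while the structural tests still apply to every loop.

```c
#include <stdbool.h>

/* Hypothetical summary of one loop; each field stands in for the
   result of a predicate from the listing.  */
struct toy_loop
{
  bool can_be_parallel;     /* flag set by an earlier pass */
  bool cold;                /* optimize_loop_nest_for_size_p */
  bool structurally_ok;     /* duplicable, no irreducible blocks,
                               no vector phi nodes */
  int expected_iterations;  /* expected_loop_iterations */
  bool niter_known;         /* try_get_loop_niter succeeded */
  bool reductions_ok;       /* try_create_reduction_list succeeded */
  bool deps_ok;             /* loop_parallel_p */
};

/* Returns true when the loop would reach code generation.  */
static bool
toy_should_parallelize (const struct toy_loop *l, int n_threads)
{
  /* Checks applied to every loop, flagged or not.  */
  if (l->cold || !l->structurally_ok)
    return false;

  /* Profitability: skipped for flagged loops because the profile
     data may be stale.  (The listing repeats the cold-area test
     here; it is omitted because the first group already covers it.)  */
  if (!l->can_be_parallel && l->expected_iterations <= n_threads)
    return false;

  if (!l->niter_known || !l->reductions_ok)
    return false;

  /* Dependence analysis: skipped when the loop is already flagged.  */
  if (!l->can_be_parallel && !l->deps_ok)
    return false;

  return true;
}
```

In this model a flagged loop with an expected iteration count of 1 still reaches code generation, whereas the same loop without the flag is rejected at the profitability test.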

3. Plan
  • Run regression tests for this patch on trunk
  • Write testcases for the code generation part to make sure it works correctly after Graphite
