Saturday, July 25, 2009
We've got our (Graphite) Developers' Maillist.
Status about about autopar in Graphite:
1. It now works.
Autopar in Graphite can triggered by flag -fgraphite-force-parallel -ftree-parallelize-loops=4(thread number). Dependency checking is done during Graphite pass and code generation part will be done in autopar pass.
2. It now can parallel loops with if statement.
3. It now can work with some of the triangle loops.
Yes, some of the triangle loops we can't parallel restricted by the SCoP detection part. It will raise "number of iterations not known." For details, you could refer to this post.
4. It now only works with innermost loop. Due to a tests(the bottom of this page) for the runtime of autopar, it seems that we should always parallel the outermost first. The support for outerloop code generation will be soon developed by Razya.
Currently, I'm working on passing alias information and use this information to filter unnecessary dependency checking between 2pdrs. You can refer to this page and this page. Now in Graphite, it is implemented in a conservative way.
Monday, May 25, 2009
Data dependency algorithms under polyhedral model vs. normal algorithms
Theoretically[1]:
Data dependency algorithms in trunk is mostly with I test and Omega test. Both of them are limited. Here is a comparison between them.
- Precise level
- I test (GCD test and Banerjee test): Only evaluates the ability to prove or disprove dependence between statements.
- Omega test: Can give a precise level (Tell which iteration of statements are in dependent), but less precise when there is one or more integer symbolic constant are present.
- Polyhedral dependence: Give the right precision level ( Will tell which iteration of statements are in dependent).
- Triangle loops
- I test and Omega test: Can not handle.
- Polyhedral dependence: Can handle.
- conditionals with boolean expressions of affine inequalities
- I test and Omega test: Can not handle.
- Polyhedral dependence: Can handle.
- Cost
- I test: It cost less and is always preferred because they capture most dependency under a low computational cost.
- Omega test: Worst exponential complexities, but works good in practice.
- Polyhedral dependence: High cost.
Reference: [1] Violated Dependence Analysis by Nicolas Vasilache, Albert Cohen, Cedric Bastoul, Sylvain Girbal.
Practically (Implement in Graphite):
/* FIXME: Need to be added. */
Tuesday, May 19, 2009
Testing result of run-time after loop-parallelization
using gladder to get access. But it's lucky that blogger has a email
post functionality, so, it's fine.
I have tested some simple loops which parallelized by autopar
to see how much promotion we can got. This test results now
is available in gcc wiki:
http://gcc.gnu.org/wiki/AutoparRelated.
Mainly include:
1. One nested loop, number of threads vs. run-time, with
not all the code parallel.
2. One nested loop, number of threads vs. run-time, almost
all code got parallel.
3. Two nested loops, with innermost parallel, number of threads
vs. run-time. (It got worse after parallel.)
4. Two nested loops, with outer loop parallel, number of threads
vs. run-time. (It's better).
The 4 figures include comparing of:
- Runtime of openmp directive inserted c code.
- Runtime of Graphite autoparallel c code.
- Runtime of non-parallel c code.
For this week, I'm planning to do some work with
the already-done data dependency work. Maybe firstly write
some test cases for the data dependency checking to see what
kinds of loops we could/want parallelize.
Thursday, May 7, 2009
Tips for reading code
Using cscope instead of grep
In gcc code base, the definition of a function is always writen like:
intSo that when you wanner find a certain definition, you can use something like:
some_func(int *)
{
...
}
grep *.c ^some_funcThis is not always convenient especially when you want to find definition in a code based not styled like gcc style. With cscope, you can do more...
- Find global definition
- Find functions calling a function
- Find egrep pattern (I like this, it's super nice)
- More...
Yes, this happens when you trying to find the definition of struct which marked by GTY(()), which is related to a sophisticated memory management techniques. But whenever their is a question, there is always a solution just not far away. We can simply find these definitions with cscope's finding egrep pattern method, then you can find patterns like
struct GTY ((chain_next ("%h.next"))) loop
. It's easy, hmm :)Cscope supported emacs and it's integrated in a kind interface. But I think it'll be better if it can do something like etags works in emacs -- Go back as farther from where I start the find command as it can do... I like this very much, you may want to go back to your place after you read the definition's definition's definition...
Sometimes we need to use find+grep
Sometimes it's more convenient to using unix's super good cmd tool like find grep etc. to search the code base.
If you want to find in gcc/testsuite 's expect file (*.exp) a certain pattern, say DEFAULT_CFLAGS. You can simply use:
find . -iname "*.exp" -exec grep DEFAULT_CFLAGS {} \;
^L
:)Monday, May 4, 2009
write testsuites for autopar's code generation part
testsuites/gcc.dg/graphite/graphite_autopar
. These test cases will PASS all the 9 checkings (3 checkings per test). Unfortunately, the reduction tests (I didn't include in the testsuites) will fail when collecting reduction information. I'll work with Razya to findout why.Basic plans:
- Regression test for the patch(split autopar) for trunk.
- Try to merge/update Graphite's autopar code generation part with trunk (This part need some time, maybe I'll try to findout if we can do this appropriately first).
- Testsuites of autopar in trunk didn't actually run (just compile and analysis dump file), so maybe I'll submit a patch for that.
Sunday, April 26, 2009
Split autopar in a more clearer way
1. Testing the reason why autopar failed after Graphite pass
I test with flag:
set args -O2 -fgraphite -ftree-parallelize-loops=4 -fdump-tree-parloops-details -fdump-tree-final_cleanup ../../gcc/testsuite/gcc.dg/autopar/parallelization-1.c
the autopar part will fail at function
parallelize_loops
:FOR_EACH_LOOP (li, loop, 0)The
{
htab_empty (reduction_list);
if ((/* Do not bother with loops in cold areas. */
optimize_loop_nest_for_size_p (loop)
/* Or loops that roll too little. */
|| expected_loop_iterations (loop) <= n_threads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
expected_loop_iterations (loop)<= n_threads
fails. I think this might be caused by the not correct edge->count
and edge->frequency
when create_empty_loop_on_edge
in translate_clast
. And optimize_loop_nest_for_size_p (loop)
failed at some testcase, this might be caused by not correctly update of loop->header->frequency in Graphite./* TODO: Fix frequencies and counts. */So in the patch splitting autopar in a more clearer way We simply bypass this checking. It should be fixed maybe later.
freq = EDGE_FREQUENCY (entry_edge);
cnt = entry_edge->count;
2. Prepare the patch for splitting autopar
In a previous patch, I simply mark all the innermost loop parallel (introduce a bool flag can_be_parallel in loop structure). In this patch, we simply bypass the failed checking when this flag is set. And split autopar in a more clearer way : 1. Checking data dependency part 2. Code generation part
Now it is something like:
FOR_EACH_LOOP (li, loop, 0)3. Plan
{
htab_empty (reduction_list);
if (/* Do not bother with loops in cold areas. */
optimize_loop_nest_for_size_p (loop)
/* And of course, the loop must be parallelizable. */
|| !can_duplicate_loop_p (loop)
|| loop_has_blocks_with_irreducible_flag (loop)
/* FIXME: the check for vector phi nodes could be removed. */
|| loop_has_vector_phi_nodes (loop))
continue;
/* FIXME: Bypass this check as graphite doesn't update the
count and frequency correctly now */
if (!loop->can_be_parallel
&& (expected_loop_iterations (loop) <= n_threads
/* Do not bother with loops in cold areas. */
|| optimize_loop_nest_for_size_p (loop)))
continue;
if (!try_get_loop_niter (loop, &niter_desc))
continue;
if (!try_create_reduction_list (loop, reduction_list))
continue;
if (!loop->can_be_parallel && !loop_parallel_p (loop))
continue;
changed = true;
gen_parallel_loop (loop, reduction_list, n_threads, &niter_desc);
- Regression test for this patch on trunk
- Write testcases for code generation part, make sure it works correct after Graphite
Tuesday, April 21, 2009
A general plan for this project
I will be working with great Graphtie developers this summer, try to implement the project parallel code generation in Graphite, you can find a short description about this project here. And also you can find my application here where I removed the personal information.
This blog will mainly focus on this summer project: 1. plans 2. what I have done 3. related Graphite internals
A general plan for what I will be doing for the next few weeks during summer of code:
- Mark the innermost loop parallel [done]
- Try to schedule autopar pass after Graphite, and enable code generation if flag_graphite_force_parallel is set
- There should be some discussion with Razya about her plan about the autopar part
- But before that, I'll try to schedule autopar first
- I may try to write testcases for the loops that should be parallel, from easy to hard, and check autopar's code generation part, make sure this works correctly as we expected.
- The testcases is important. There should be some detailed discussion maybe with Sebastian and Konrad. To see what kind of loop we can/decide to handle.
- Check autopar's code generation with flag_graphite_force_parallel set with these testcases, report bugs if it goes wrong.
- Try to write code for deciding if a loop can be parallel with data dependency test under this polyhedral model.
- Try to understand the interface of data dependency test
- Write code, if data dependency success, mark the loop parallel