While refactoring code for a Java Barnes-hut simulation, i came across an explicit unrolling of loops. At a high level, the bodies are stored in an Oct-tree and so each cluster would have eight neighbors for each of the subspace in three dimensions. Once the tree has been constructed, the nodes are compacted. This means, since the neighbors for the Oct-tree are stored in an array, that any non-existent neighbors are removed so a pass over all the neighbors can break on first non-existent neighbor. In order to compute the force acting on each body, we recursively iterate over the tree. This means at each node of the Oct-tree, we have to iterate over all the neighbors, and that means going up to 8 nodes. The simplest way would be to have a loop go over all the bodies and breaking on the first non-existent neighbor. Hence, something like :

```GNode child = null;
for(int i=0;i<8;i++){
child = octree.getNeighbor(nn, i, MethodFlag.NONE);
if(child==null)
break;
RecurseForce(leaf, child, dsq, epssq);
}
```

The other change made was to encapsulate all the coordinate referencing code into a point class. So, there are four variants :

1. Original (without point3, and with explicit loop unrolling)
2. Point3 ( with point3 class, with explicit loop unrolling)
3. For (without point3 class, without explicit loop unrolling)
4. Naive (with point3 class, without explicit loop unrolling)

As a developer, you would write the Naive version. So, comparing the performance of the four variants, here is what we get.

BH Runtimes.

After reading a paper by Maurice Herlihy on transactional boosting, i wrote down the example mentioned in the paper.  The concept is quite simple, given a concurrent data structure and some “regularity” conditions (commutativity and inverses) you can create a high-level transactional data structure. The advantage is that instead of having crude read/write sets to do conflict checking, you have higher level “abstract locks” that can do the conflict detection/checking at the method invocation level.

You should go over the paper and the code to see what this means, but it should be easy to understand with the working example. The higher level idea is instead of saving a whole snapshot of the object and restoring it in case of an abort, you just specify “what” needs to done (hence the undo action) in case of an abort. So in the case of an insert, the appropriate undo is a remove.

It is quite interesting to see how automated tools can speed up development, not only in the context of professional development but also when you just want to do stuff ( or for assignments :() I recently took a graduate compilers course, which was quite a lot of work for me, but it was actually because i didnt know the right tools to provide the right abstractions. So, now i will try to explain how to use CUP/JFlex to get a quick and clean compiler up and running.

# Lexer

The job of a lexer is to read the input files as a stream of characters and produce a stream of tokens. Each token has an associated value which provides a higher level function. So, just as an example, instead of having the later stages deal with 33+43 as a stream of characters, the lexer would say that this is a stream of 3 tokens: INT1 + INT2 , where INT1 has value 33 and INT2 has value 43. JFlex is a tool to generate a lexer using a standard set of rules, and it makes the job really easy. You can find more information about JFlex from http://www.jflex.de. There are 3 pars of a JFlex file:

1. User code: This is the code you want to be copied as such in your Lexer java file. Generally this would include the imports and packages you would need in your lexer code.
2. Options and declarations: The user code section is ended with %% on a line. After that, you can specify options for the JFLex that determine how the lexer code is generated. Generally you would have the following set of options:
• %cup directs JFLex to generate code that would be used by CUP parser (more on this later).
• %class ClassName directs JFlex to name the generated lexer as ClassName
• %line enables you to access the line of the token via yyline
• %column is similar to %line, and you can access the column via yycolumn
• %unicode specifies the character set of the source file, generally, unicode is used.
• %debug is an optional flag that is very useful if your parser (not you lexer, but even for a lexer) is failing and you cannot tell which token it fails on. It will generate debug information about each token as it is passed on to the parser so you can tell exactly what the problem is.
3. And following these is optional user code. Here you would include code that you would like to include in the class definition of the lexer. So, any data members you would like to use the lexer to maintain would be declared here. You can also have member functions. One of the most common things to do here is to include
```%{
private Symbol symbol(int type) {
return new Symbol(type, yyline, yycolumn);
}
private Symbol symbol(int type, Object value) {
return new Symbol(type, yyline, yycolumn, value);
}
%}
```

The two functions above are used in the lexer code to return symbols of the appropriate type and value. These would return instances of java_cup.runtime.Symbol when invoked by CUP based on the type of the symbol and any associated values with it. Having these in the declaration sections helps prevent editing the generated lexer since it would be ready to use once generated. All the code in this section is to be included in %{ and %} appearing on separate lines by themselves.
Following this is the macro declaration section, where you would define macros for terminals. These are regular expressions which would be used in the actual lexical rules later. The reason for having separate macros is to make the lexical rules more readable. These cannot be recursive (nor mutually recursive) and would be reported as errors by JFlex. A macro is specified by a name for the macro, followed by a = sign, followed by the regular expression the macro name should replace. For instance :

```LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]
WhiteSpace     = {LineTerminator} | [ \t\f]
Identifier = [:jletter:] [:jletterdigit:]*
```

Represents some macros for the different white spaces. You can make use of predefined classes to make the macros shorter. For instance, a common definition for identifiers is give above as well. This definition makes use of predefined classes jletter and jletterdigit.
The final part of this section is used to define the state variables which represent the state of the lexer should maintain. This can be any number of state variables separated by comma. The definition starts with %state followed by any number of comma separated state variables. So a common example would be to have a STRING state, which would be set upon entry to a double quoted region, and cleared upon exit. This way, the contents inside the string are not parsed as actual tokens, instead treated as a string.
3. Lexical rules.
This is the core of the lexer, since all we have done so far is to set up how the lexer actually does stuff, and how we want it to facilitate other tools (CUP). Now we will define the actual rules used to convert the stream of characters to a stream of tokens. There are two types of lexical rules; those that require the lexer to be in a particular state, and those that don’t.
• First, let us see how we can toggle between states. This is done by calling the function yybegin(STATE). So, when reading a double quote, to enter into STRING state, you would have a rule as follows:

```\" { string.setLength(0); yybegin(STRING); }
[\sourcecode]
•	For rules that require the lexer to be in a particular state, you would have that state precede the rule. So, if you just want to concatenate all the characters you read into a buffer once you are in STRING state (remember, upon entering a region enclosed by double quotes we would enter STRING state). The rule would be

{
\"  { yybegin(YYINITIAL);
return symbol(sym.STRING_LITERAL,
string.toString());    }
[^\n\r\"\\]+ { string.append( yytext() ); }
\\t { string.append('\t'); }
\\n { string.append('\n'); }
\\r { string.append('\r'); }
\\\"{ string.append('\"'); }
\\  { string.append('\\'); }
}
```

• For rules that should be applied regardless of the state of the lexer, you need not specify an initial state.
Initially the lexer is in YYINITIAL state, and all the rules that should be applied by the lexer are included in a block.
One thing that you might like to do is to store values for different tokens as they are packaged as a single token by the lexer. This is useful for instance in a parser where you would like some information about the actual token text. To do this you can pair the symbol identifier (type) with the value of the text. For instance, in the case of an identifier, the rule would be

```{Identifier} {return symbol(sym.IDENTIFIER, new String(yytext())); }
```

And in your parser (more on this later) you would access the “value” of the token.
That was how you would basically put up a lexer.
Now that we have a lexer up and running, we can move forward to generate a parser that uses the tokens generated by the lexer to verify if the sentence is indeed from a grammar of expression. Keep in mind that there would be two regular expressions here, one for defining the tokens themselves, and another that describes how the tokens can be combined to produce larger constructs.

# CUP

CUP is a recursive descent parser generator. It can be integrated with JFlex very easily as described above. Now we will take a look at how to specify the grammar to CUP in a nice way. A CUP specification has four parts.

1. Preliminary and miscellaneous declarations specifying how parser is to be generated, and supply parts of runtime code. This includes import and package statements, initialization code as well as linking up with the scanner. When using with JFlex, you do not need to specify the initialization code, or scanner code since the %cup flag in flex file takes care of that.
2. Declaring terminals and non-terminals and associate classes with them.Here you would declare the terminals and non terminals to be used in your grammar. The terminals are defined first, followed by the non-terminals. For each terminal/non-terminal you can also define the type of object/value associated with that terminal/non-terminal. This type would determine what return value the parser would expect for each of these terminals/non-terminals.
3. Precedence and associativity of terminals. Here you would define the precedence and associativity of the tokens (for instance operators). One key thing to do here is to add an associativity rule for “ELSE” tokens that handles the dangling else.
4. Grammar. This is the core of the parser since it specifies how the non-terminals are expressed in terms of other non-terminals or terminals. You would also place semantic actions (such as building the AST) in this section.

Example:
Now we shall consider an extended example of how to write the CUP specification for a parser. As described above, the first part is trivial, and you would generally include the following statements in it.

```import java_cup.runtime.*;
import compiler.ast.*;
import java.util.*;
```

This list would include all the packages you would use in your parser, as well as the package statement if desired. Now in the second section, you would describe the terminals and non-terminals. For instance:

```terminal int  STRUCT, CONST, ELSE, WHILE, IF, ABSTRACT, BOOLEAN;
terminal int  BREAK, NEQ, EQ, EQEQ, PLUS, MINUS, ASTERICK, MOD;
terminal int  FSLASH, BSLASH, LESS, LESSEQ, GREATER, GREATEREQ;
terminal int  LPAREN, RPAREN, LBRACE, RBRACE,  LSQPAREN, RSQPAREN;
terminal int  COMMA, SEMICOLON, STRING_LITERAL, VOID ;
terminal int  SYM_INT, INTEGER_LITERAL, SYM_LONG;
terminal String IDENTIFIER;

/*Non-terminals*/
non terminal  Node program, procedure_declaration;
non terminal  Type type;
non terminal  Node procedure_heading, procedure_body, statement_sequence;
non terminal  Node identifier, formal_parameter, formal_parameter_section;
non terminal  Node statement, assignment_statement, procedure_call, if_statement;
non terminal  Node while_statement, expression, expression_list, designator;
non terminal  Node constant_expression, constant_declaration;
non terminal  Node structure_type, field_list;
non terminal  Node equality_expression, simple_expression, term, factor, dummystart;
non terminal  Node comparator_operator, factor_operator, else_statement, expression_listS;
non terminal String identifier_arrayS, identifier_array;

```

Here, you would describe the type of the node (terminal or non terminal) and then the class associated with that node. This class is used by CUP to return appropriate objects for each node. In my opinion, it is easier to first build an acceptor (leaving the classes empty) and then add actions. This way you can be sure that the grammar is correct, and can add actions to it later (which would do something with the program).
Now that you have the terminals and non-terminals, you would specify any precedence/associativity rules for the grammar. This is done as follows;

```precedence left PLUS, MINUS;
precedence left ASTERICK, FSLASH, MOD;
precedence right ELSE;
```

The last precedence rule is required to handle the dangling else problem.
Now we come to the core of the parser specification. This part has the grammar in the form of productions. Each production (or rule) can have an action associated with it. This section starts with an optional “start with” statement directing CUP which is the start non-terminal. If you want the top level non-terminal to be PROGRAM_NODE, you would have a statement start with PROGRAM_NODE;. The basic form of the rule is as follows:
Non-terminal ::= some_node some_node | some_node some_node ;
So, for instance, for a program you would have something like

```program::= var_const_declarationS procedure_declarationS;
var_const_declarationS::= constant_declarationS variable_declarationS;
procedure_declarationS::=
|
procedure_declarationS procedure_declaration;
```

The last rule for procedure_declarationS shows an example of an epsilon production, where the non-terminal goes to a null node. To extend this example so we have some actions executed by the parser when it actually applies the reduction (in other words, when it successfully applies a rule) we can associate semantic actions with the rules, one with each rule. So the above grammar now becomes:

```program::= var_const_declarationS procedure_declarationS  {: System.out.println("Program"); :};
var_const_declarationS::=constant_declarationS variable_declarationS {: System.out.println("Const-Var Decls"); :};
procedure_declarationS::= {: System.out.println(“End of ProcedureList”); :}
|procedure_declarationS procedure_declaration {: System.out.println("ProcedureList"); :};
```

This would associate java code snippets specified between the {: :} that is executed when the rule is applied by the parser. So, good, but when you actually want to do something useful with the semantic actions (instead of just printing out which rule was applied) you need to refer to the terminals and non-terminals in the rule. Suppose you wanted to create a list of all the variables declared in a given type declaration, so for some statement like
int a, b, c[100];
You would like to associate a, b, and c with integer types. To do this you would keep a list of all the values, represented by an identifier_list. The (incomplete) rules for such a production would be as follows:

```variable_declarationS::=
|
variable_declarationS variable_declaration {: System.out.println("VarsList"); :};
variable_declaration ::=
type identifier_list SEMICOLON {: System.out.println ("VarDecl"); :}  ;
constant_declarationS ::=
|
constant_declarationS constant_declaration {::};
identifier_array ::= IDENTIFIER identifier_arrayS {:System.out.println("Array/ID");:};
identifier_arrayS ::=
|  LSQPAREN constant_expression RSQPAREN {:System.out.println("Index");:};
identifier_list::=
identifier_array identifier_listS {::};
identifier_listS ::=
|  COMMA identifier_list {: System.out.println ("ID-Arry"); :}   ;

type ::=      SYM_INT|SYM_LONG {: System.out.println("Type"); :};
constant_declaration ::=
CONST type IDENTIFIER EQ constant_expression SEMICOLON {: System.out.println ("ConstDecl"); :}   ;
```

The rule of interest, that would illustrate the problem is
identifier_array ::= IDENTIFIER identifier_arrayS {:System.out.println(“Array/ID”);:};
Since you are creating an array of identifiers, you would want to store the identifier in a data structure as well as the remaining portion of the sentence in that data structure. This would mean you would need some way to reference the terminals/non-terminals in the production to build up a resultant value; need to access the values of IDENTIFIER and identifier_arrayS to compute the values of identifier_array. The problem is more subtle when you have a recursive rule such as
constant_declarationS ::=
|
constant_declarationS constant_declaration {::};
where you have the same non-terminal nodes on both sides of the production, so there has to be some way to distinguish between the non-terminal on the left and on the right. To do this, we can associate names with the terminals and non-terminals in the right hand side of the production. The non-terminal on the left has the special name RESULT, and this is the value returned by the semantic action. So, now all our rules would have the form

```non_terminal_1::=
{:
RESULT = new SomeObject();
:}
|
non_terminal_2:n_t_2 non_terminal_3:n_t_3
{:
RESULT = new SomeObject(n_t_2.getValue(), n_t_3.getValue());
:};
```

It should be noted that the type of RESULT was defined in the second section (where you also defined the terminal and non-terminals along with their class types). You do not need to add return statement since these are automatically inserted by the parser. You can refer to the objects on the right hand sides as if they were actual objects in Java, so in the above example, getValue are methods of the associated classes. The variable declaration example can now be rewritten as follows :

```variable_declarationS::=
{:
System.out.println("End VarsList");
:}
|
variable_declarationS:lst variable_declaration:v_decl
{:
System.out.println("VarsList");
:};

variable_declaration ::=
type:t_id identifier_list:id_l SEMICOLON
{:
VarDecl n;
Iterator li = id_l.iterator();
while(li.hasNext())
{
n = new VarDecl(t_id, (String)(li.next()));
}
RESULT = r;
System.out.println ("#############VarDecl "+id_l);
:};
constant_declarationS ::=
|
constant_declarationS constant_declaration {::};

identifier_array ::= IDENTIFIER:id identifier_arrayS:id_a
{:
RESULT =  id + id_a;
System.out.println("Array/ID"+id+" "+id_a);
:};
identifier_arrayS ::=
{:
RESULT= new String();
System.out.println("_E_");
:}
|  LSQPAREN constant_expression:c_e RSQPAREN
{:
RESULT = "[EXP]";
System.out.println("Index");
:};
identifier_list::=
identifier_array:id_a identifier_listS:id_l
{:
RESULT = id_l;
System.out.println("IDL");
:};
identifier_listS ::=
{:
System.out.println("EndIDList");
:}
|  COMMA identifier_list:id_l
{:
RESULT = id_l;
System.out.println ("ID-Arry");
:};
```

Using the above notation, you can basically build up a whole parser. The approach I think makes more sense is to first write an acceptor and then annotate the productions with semantic actions. This makes the job easier since you know you have the right set of rules before you get into the semantic actions.
To test your build, you can have a script do the following :

``` java -jar ../jflex/lib/JFlex.jar Lexer.flex
java -jar ../cup/java-cup-11a.jar Parser.cup
javac -cp ../cup/java-cup-11a.jar:../jflex/lib/JFlex.jar:. Tester.java
for d in \$(find  examples/ -name *.c |sort); do
echo "Processing " \$d
java -cp ../cup/java-cup-11a.jar:../jflex/lib/JFlex.jar:. Tester \$d 2>&1
done
```

Where Tester.java is a simple java program that calls the parser and its parse method.
So, that’s all for now, looking forward to your comments on where I can add things to make it more clearer.

Well… thanks to everyone who did show up, it meant a lot to me. And of course, those who did not show up, it meant a lot (monetarily) to me as well 😀

Special thanks to JJ and FK for a lot of help. These two lads put up a lot of great effort. JJ, thanks for all the help in organizing, but then again, I know you were trained by the best in ITEC 😉 And to FK, for that lovely dinner and the 1200 ruppees you didn’t ask for 🙂

Missed a lot of you guys there, and hope to have a real get together some day… take care y’all.

Especially for those mathematicians living in Warsaw 🙂

Sure, you could get a lot of stuff signed by celebrities and put it on ebay, but THIS goes on your resume!

So, here is a first-program in MPI which i did about 6 months ago in Prof. Calvin Lin’s Parallel systems class.

MPI is a communication protocol for multiprocessors. The way it works, in short, is that there is a root process (kind of like the root thread in a multi threaded application). The difference in MPI is all the processes exist simultaneously (just like physical processors would already exist instead of being spawned). The root process would divide up the work to do, and send it to all the other processors via messages, hence message passing.

Each process can send messages and receive messages, but there has to be more control and discipline in message passing and sending. Its more trickier to get the message passing in the right order and structure.  But, hands-on, here is a basic example.

This is what a basic application should be structured like. The main would initialize MPI (5-9), and then get the total number of processes participating in the computation as well as the current process number. This is where it differs from multi-threading because unlike in multithreading where you spawn threads and tell what function they would execute, all MPI processes have to execute the same code. So you need the process ID to “divide” the work. In this case, line 15 does the initialization and division of work and line 17 would do the actual work. So if you were to sum an array, the initialization would read in the array from file, and the line 17 would invoke the worker thread which would sum the data in its allocated data. The initialization can be done by any process, here i have used MY_ROOT_PROCESS to represent the root process.

```int main(int argc, char** argv)
{
int mynode, totalnodes;
int mpiResCode = MPI_Init(&argc, &argv);
if(mpiResCode!=MPI_SUCCESS)
{
printf("\nFailed to initialize MPI.");
MPI_Abort(MPI_COMM_WORLD,mpiResCode);
}
MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
for(int i=0;i<100;i++)
{
if(mynode==MY_ROOT_PROCESS)
doInitialize();
else
doWork();
}
MPI_Finalize();
return 0;
}
```

The trick is getting the messages right. For instance, in the worker process which is waiting for work to be allocated to it, you would have something like follows;

```MPI_Recv(&localDataSize,1,MPI_INT,MY_ROOT_PROCESS,MY_SIGNAL_CONTROL,MPI_COMM_WORLD,&status);
localbuff = new int[localDataSize];
MPI_Recv(localbuff,localDataSize,MPI_INT,MY_ROOT_PROCESS,MY_SIGNAL_CONTROL,MPI_COMM_WORLD,&status);
...
...
...
MPI_Send(&numVals,1,MPI_INT,i+1,MY_SIGNAL_CONTROL,MPI_COMM_WORLD);
MPI_Send(globalData+startIndex,numVals,MPI_INT,i+1,MY_SIGNAL_CONTROL,MPI_COMM_WORLD);

```

and in the root process, which is responsible for sending out work, you would have a similar sending commands. Notice that the sequence must match, since the message passing is stateless, and maintains no information between communications.
The first three lines are in the recipient of the work. It would, in this example, first receive a size of the work it should expect, it then allocates a buffer for that size and then it receives the work. In the sender (root process in our example), the sender first sends the size, and then all the data. After the data transfer has taken place, it can work on the data. But once the data processing has been completed, similar transfer has to take place for the results to be combined. This would be easier to do naively, but if you were to use a tree-structured reduce operation, this becomes just a bit trickier.

I was working on finishing a Parallel systems assignment in which we had to implement a prefix framework. The nice thing about the framework is that if you have N threads (or processes) working on a problem in parallel, then a sequential approach would require N stages to combine the result. But using the prefix framework, we can pair up the results and combine the result in logN stages.

I had worked in Java Threads, and found them quite interesting particularly because you just needed a minimal knowledge of threads to start working with them. pthreads, however, were slightly …un-object like, hence it made a bit of a mess to get an object oriented behaviour from them.

The core of the tree combination is the combineTree function, which is a function of the thread instance (i made up classes to represent different aspects of the problem). The function is as follows;

```void LocalThreadContext::combineTree()
{
int * rs;
{
if((sequenceID%(2*stride))==0)
{
g->combineFunction(&lRes,g->children[sequenceID+stride].getLocalResult());
///Display logic..... just something to make it look like a tree.
#if DEBUG_CONSOLE
printf("\n");
for(int j=1;j<=stride;j*=2)
printf("\t");
printf("Combining %d with %d",sequenceID, sequenceID+stride);
#endif
/////End of display logic!
}
}
}
```

The DEBUG_CONSOLE is something you might use if you are debugging and would like a nice tree-like representation of the combining-logic.

It was fun, but the problem of expressing arbitrary problems in such a framework remains something i would need to work on (if i get some time, which i won’t!).

The threads start off by locking on the result variable, so that no thread attempts to combine its result before the result has been computed. The thread would be given a global thread context, which would define how the thread is to use the data provided to it (int array or double array etc) and how to combine its result with the result of another thread (sum problem has different behavior compared to two_max problem). The thread would perform the function on each of the data elements it is provided and then combine the result using the function provided. The complete code is available here

Wonder if this would work to solve the discrete log problem?

PS: you have to make him talk in polynomial time!

I recently completed reading Simon Singh’s “Fermat’s Last Theorem” and must admire how Simon has managed to trace the historical breakthroughs as well as breakdowns in this battle to solve one of the most innocent yet unsolved riddles in mathematics. Fermats last theorem seems very naive, basically it states that the Pythagoras Theorem is true but if we change the power from 2 to any other whole number, no whole number solutions exist.
Pierre de Fermat proposed this in one of the texts “Diophantus”, but he also wrote that the margins were too narrow for the proof although he had a wonderful proof. It took almost three centuries to prove…but theres more.
A definite read for the curious mind, traces the history of mathematics, especially number theory.

Since my web hosting provider has decided to extend my sites redemption period for eternity, i thought i would make the best of my time and start updating my blog for once.

So, just to start things off, some news from london as well as highlights from the trip to there.

First of all the flight was … uneventful. The guy sitting next to me was the “SILENT TYPE”. My only bad luck during the flight was that the earphone jack next to my seat was broken and the earphone would not completely go in. Now if you have travelled in a 757-300,the type they fly to europe, you would know how the earphones work. They are not stereo jacks rather two mono jacks connected by a plastic frame, it just happened in my case that one of the mono jacks would not go in, so i had to use my own stereo jack and enjoy mono channel sound… on a freakin 757-300. And to add to the irony of it, the captain announces at the begining of the flight “This is the biggest production aircraft, and designed  to travel in weather far worse than this so dont worry”… hello captain… ever  heard of the Airbus A380

During the flight i watched Die Hard 4,which for some reason had all the cool scenes cut out.Apparently for some reason, they decided to leave out all the cool scenes under the understanding that “this movie has been modified to fit your screen and intellect”. But a strong recommendation:  DO NOT WATCH AN ACTION MOVIE YOU HAVE NOT WATCHED BEFORE”. It would really be awful to know the whole plot and miss the shots that cost half of the budget!

In london, the weather isnt as cold as i expected it to be, but the air is chilly, kind of stingy in a sense.

And that is as eventful as my two days in london have been!