Robert Jan - Wednesday, November 05, 2008 - 9:23 AM

Update: Something went wrong with the code snippets; should be shown correctly now!

Now that we have MGrammar mode running correctly in Intellipad, let’s try out some stuff.

Let’s create a language that understands a textual representation of the title, site URL, feed URL and email address of an RSS feed. I also want the language to skip whitespace and comments, and the email address should be validated.

So as sample instance data I wrote this:


 

    Title: inwit.nl
    Url: http://inwit.nl
    RssFeedUrl: http://feeds.feedburner.com/inwitnl
    Email: rj@vanholland.net

    //this is comment
    /*
    this is also comment

    */

    Title: IntellipadBlog
    Url: http://blogs.msdn.com/intellipad
    RssFeedUrl: http://blogs.msdn.com/intellipad/rss.xml
    Email: oslo@microsoft.com

As you can see, these are just two instances of a Feed type with some comments and whitespace in there.
Now, let’s write a language that swallows this data. What I actually did was create three languages:

  • A common language with some stuff you’d want to use more often; perhaps the Email Language should be moved here also.
  • The language that understands an Email Address
  • The actual RSS language

The Common Language
This language covers whitespace and comments. After looking at the demo given at the PDC and looking around in the “C:\Program Files\Microsoft Oslo SDK 1.0\Samples\MGrammar\Languages” directory of your SDK installation, I came up with this:

    language InwitCommon
    {
        token Skippable = Whitespace | Comment;

        token Comment = CommentToken;
        token CommentToken
            = CommentDelimited
            | CommentLine;
        token CommentDelimited = "/*" CommentDelimitedContent* "*/";
        token CommentDelimitedContent
            = ^('*')
            | '*' ^('/');
        token CommentLine = "//" CommentLineContent*;
        token CommentLineContent = ^(
              '\u000A'   // New Line
            | '\u000D'   // Carriage Return
            | '\u0085'   // Next Line
            | '\u2028'   // Line Separator
            | '\u2029'); // Paragraph Separator

        token Whitespace = WhitespaceToken+;
        token WhitespaceToken = WhitespaceCharacter+;
        token WhitespaceCharacter
            = '\u0009' // Horizontal Tab
            | '\u000B' // Vertical Tab
            | '\u000C' // Form Feed
            | '\u0020' // Space
            | NewLineCharacter;

        token NewLineCharacter
            = '\u000A'  // New Line
            | '\u000D'  // Carriage Return
            | '\u0085'  // Next Line
            | '\u2028'  // Line Separator
            | '\u2029'; // Paragraph Separator
    }

Now, in your own language you can declare the Skippable token from this language as an interleave; this lets your language skip whitespace and comments.
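
For readers without the Oslo SDK at hand, the token rules above can be approximated with regular expressions. The following is only a rough Python analogue, not MGrammar: the names `SKIPPABLE` and `is_skippable` are made up for this sketch, and it assumes that each token rule translates one-to-one into a regex.

```python
import re

# The five line-break characters listed under NewLineCharacter.
NEWLINES = "\n\r\u0085\u2028\u2029"

# CommentLine: "//" followed by anything that is not a line break.
COMMENT_LINE = "//[^" + NEWLINES + "]*"

# CommentDelimited: "/*", then either a non-'*' character or a '*'
# followed by a non-'/' character, repeated, then "*/".
COMMENT_DELIMITED = r"/\*(?:[^*]|\*[^/])*\*/"

# Whitespace: tab, vertical tab, form feed, space, or any line break.
WHITESPACE = "[\t\u000b\u000c " + NEWLINES + "]+"

SKIPPABLE = re.compile(
    "(?:" + WHITESPACE + ")|(?:" + COMMENT_DELIMITED + ")|(?:" + COMMENT_LINE + ")"
)


def is_skippable(text):
    """Return True if the whole string is one whitespace run or one comment."""
    return SKIPPABLE.fullmatch(text) is not None


print(is_skippable("//this is comment"))
print(is_skippable("/*\nthis is also comment\n\n*/"))
print(is_skippable("Title"))
```

In this regex translation, a delimited comment whose closing `*/` is directly preceded by another `*` (as in `/* a **/`) does not match, because the `'*' ^('/')` alternative consumes both asterisks. It may be worth checking whether the MGrammar tokenizer behaves the same way.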

The Email Language
I wanted to have some sort of Email address validation within my language. So I came up with this:

    language EmailAddressLanguage
    {
        token EmailAddress =
            localpart
            at
            domainpart;

        token abzABZ = ('A'..'Z' | 'a'..'z')+;
        token digits = ('0'..'9')+;
        token otherChars = ('!' | '#' | '$' | '%' | '&' | "'" | '*' | '+' | '-' | '/' | '=' | '?' | '^' | '_' | '`' | '{' | '|' | '}' | '~')+;
        token allButDot = (abzABZ | digits | otherChars)+;
        token all = (allButDot | dot)+;
        token dot = ('.')#1;

        token localpart =
            (allButDot)+ |
            allButDot dot all* allButDot+;

        token at = "@";

        token domainpart =
            (allButDot)+ dot all* allButDot+;
    }

It’s far from perfect! It validates email addresses, but some invalid ones still get through:
an address like “bla..bla@hotmail..com” will validate, because `all*` allows consecutive dots. I haven’t looked much deeper into it yet, because this was just a small test, but if someone feels like improving this part, please do so and post a comment with your solution!
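
To make the double-dot problem concrete, here is a Python sketch that translates the token rules above into regular expressions, next to one possible stricter variant. This is an analogue for illustration only, not MGrammar; `LOOSE`, `STRICT`, `is_loose` and `is_strict` are hypothetical names, and the stricter rule (atoms separated by single dots) is just one way to tighten it, not a full RFC-style validator.

```python
import re

# Characters accepted by abzABZ | digits | otherChars in the grammar.
ATOM_CHARS = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"
ATOM = "[" + ATOM_CHARS + "]+"   # allButDot
ANY = "[" + ATOM_CHARS + ".]*"   # all* (atoms and dots in any order)

# Literal translation of localpart and domainpart as written in the post.
LOOSE_LOCAL = "(?:" + ATOM + r"\." + ANY + ATOM + "|" + ATOM + ")"
LOOSE_DOMAIN = ATOM + r"\." + ANY + ATOM
LOOSE = re.compile(LOOSE_LOCAL + "@" + LOOSE_DOMAIN)

# Hypothetical stricter variant: atoms separated by single dots only,
# with at least one dot required in the domain part.
STRICT_LOCAL = ATOM + r"(?:\." + ATOM + ")*"
STRICT_DOMAIN = ATOM + r"(?:\." + ATOM + ")+"
STRICT = re.compile(STRICT_LOCAL + "@" + STRICT_DOMAIN)


def is_loose(address):
    """True if the address matches the rules as written in the post."""
    return LOOSE.fullmatch(address) is not None


def is_strict(address):
    """True if the address matches the stricter single-dot variant."""
    return STRICT.fullmatch(address) is not None


for address in ("rj@vanholland.net", "bla..bla@hotmail..com"):
    print(address, is_loose(address), is_strict(address))
```

In MGrammar terms, the stricter variant would presumably replace `dot all* allButDot+` with a repeated `dot allButDot` group, but I have not run that through the compiler.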

The RSS Language
Then, I wrote the RSS language itself, which looks like this:

    language RssLanguage
    {
        syntax Main = f:Feeds => f;

        syntax Feeds = Feed*;

        syntax Feed =
            "Title" ":" t:Title
            "Url" ":" u:Url
            "RssFeedUrl" ":" r:RssFeedUrl
            "Email" ":" e:EmailAddressLanguage.EmailAddress
            =>
            Feed{
                Title{t},
                Url{u},
                RSS{r},
                Email{e}
            };

        @{Classification["Keyword"]} token Title = ('A'..'Z' | 'a'..'z' | '.')+;

        token Url = "http://" ('A'..'Z' | 'a'..'z' | '.' | '/')+;

        token RssFeedUrl = Url;

        interleave WhiteSpacing = " " | "\r" | "\n";
        interleave Skippable = InwitCommon.Skippable;
    }

It defines Main as a sequence called ‘Feeds’, which contains items of type Feed. An input Feed consists of a Title, Url, RssFeedUrl and Email, and is projected to a Feed node with Title, Url, RSS and Email elements.
You can see that I use the EmailAddressLanguage and the InwitCommon language within this language.
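
To give an idea of what this projection produces without running the Oslo tools, here is a small Python sketch that reads the same input format and returns one dictionary per feed. It is an approximation under stated assumptions: `parse_feeds`, `field` and `FEED` are names invented for this sketch, skippable content is only handled between tokens, and the email check is reduced to "something@something" rather than the full EmailAddressLanguage.

```python
import re

# Zero or more skippable items: one whitespace character, a // line comment,
# or a /* ... */ block comment (the role of the interleave rules).
SKIP = r"(?:\s|//[^\r\n]*|/\*[\s\S]*?\*/)*"
URL = r"http://[A-Za-z./]+"  # same character set as the Url token


def field(keyword, name, pattern):
    """Build the regex for one `Keyword : value` pair plus trailing skippables."""
    return keyword + SKIP + ":" + SKIP + "(?P<" + name + ">" + pattern + ")" + SKIP


FEED = re.compile(
    field("Title", "Title", r"[A-Za-z.]+")
    + field("Url", "Url", URL)
    + field("RssFeedUrl", "RSS", URL)
    + field("Email", "Email", r"[^\s@]+@[^\s@]+")  # simplified email check
)
LEADING_SKIP = re.compile(SKIP)


def parse_feeds(text):
    """Return one dict per Feed, mirroring Feed{Title, Url, RSS, Email}."""
    feeds = []
    pos = LEADING_SKIP.match(text).end()
    while pos < len(text):
        match = FEED.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at offset %d" % pos)
        feeds.append(match.groupdict())
        pos = match.end()
    return feeds


sample = """Title: inwit.nl
Url: http://inwit.nl
RssFeedUrl: http://feeds.feedburner.com/inwitnl
Email: rj@vanholland.net
// a comment between feeds
"""
for feed in parse_feeds(sample):
    print(feed["Title"], feed["RSS"])
```

Note how the RssFeedUrl value ends up under the key RSS, just as the grammar projects `r:RssFeedUrl` into `RSS{r}`.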

Full Listing
To simplify, here is the full listing in one module:

    module inwit
    {
        language RssLanguage
        {
            syntax Main = f:Feeds => f;

            syntax Feeds = Feed*;

            syntax Feed =
                "Title" ":" t:Title
                "Url" ":" u:Url
                "RssFeedUrl" ":" r:RssFeedUrl
                "Email" ":" e:EmailAddressLanguage.EmailAddress
                =>
                Feed{
                    Title{t},
                    Url{u},
                    RSS{r},
                    Email{e}
                };

            @{Classification["Keyword"]} token Title = ('A'..'Z' | 'a'..'z' | '.')+;

            token Url = "http://" ('A'..'Z' | 'a'..'z' | '.' | '/')+;

            token RssFeedUrl = Url;

            interleave WhiteSpacing = " " | "\r" | "\n";
            interleave Skippable = InwitCommon.Skippable;
        }

        language EmailAddressLanguage
        {
            token EmailAddress =
                localpart
                at
                domainpart;

            token abzABZ = ('A'..'Z' | 'a'..'z')+;
            token digits = ('0'..'9')+;
            token otherChars = ('!' | '#' | '$' | '%' | '&' | "'" | '*' | '+' | '-' | '/' | '=' | '?' | '^' | '_' | '`' | '{' | '|' | '}' | '~')+;
            token allButDot = (abzABZ | digits | otherChars)+;
            token all = (allButDot | dot)+;
            token dot = ('.')#1;

            token localpart =
                (allButDot)+ |
                allButDot dot all* allButDot+;

            token at = "@";

            token domainpart =
                (allButDot)+ dot all* allButDot+;
        }

        language InwitCommon
        {
            token Skippable = Whitespace | Comment;

            token Comment = CommentToken;
            token CommentToken
                = CommentDelimited
                | CommentLine;
            token CommentDelimited = "/*" CommentDelimitedContent* "*/";
            token CommentDelimitedContent
                = ^('*')
                | '*' ^('/');
            token CommentLine = "//" CommentLineContent*;
            token CommentLineContent = ^(
                  '\u000A'   // New Line
                | '\u000D'   // Carriage Return
                | '\u0085'   // Next Line
                | '\u2028'   // Line Separator
                | '\u2029'); // Paragraph Separator

            token Whitespace = WhitespaceToken+;
            token WhitespaceToken = WhitespaceCharacter+;
            token WhitespaceCharacter
                = '\u0009' // Horizontal Tab
                | '\u000B' // Vertical Tab
                | '\u000C' // Form Feed
                | '\u0020' // Space
                | NewLineCharacter;

            token NewLineCharacter
                = '\u000A'  // New Line
                | '\u000D'  // Carriage Return
                | '\u0085'  // Next Line
                | '\u2028'  // Line Separator
                | '\u2029'; // Paragraph Separator
        }
    }

And this is what it looks like when writing it within Intellipad:

[Screenshot: the grammar in Intellipad]

 

Language Compilation

The next step is to compile the module ‘RSSLanguage.mg’ I just created; we use the mg.exe compiler provided by the Oslo SDK to do this:

[Screenshot: running mg.exe on the module]

We get an .MGX file out of this. I renamed it to a .ZIP extension and tried to open it, but it’s password protected. Does anyone know the secret password? :)

 

Run-time Language utilization

Last but not least, I’d like to use my language within the .NET runtime. Luckily, the Oslo SDK provides some base classes to do this. I created a new C# Console Application to test things out.
First, add references to the System.Dataflow and Microsoft.M.Grammar assemblies, which can be found in the Bin directory of the Oslo SDK:

[Screenshot: the assembly references added to the project]

Then, I wrote this code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Dataflow; // DynamicParser, GraphBuilder
    using Microsoft.M.Grammar; // MGrammarCompiler

    namespace ConsoleApplication
    {
        class Program
        {
            static void Main(string[] args)
            {
                try
                {
                    string imageFileName = @"C:\Users\Robert Jan\Desktop\My Documents\Oslo\MyOslo\ConsoleApplication\RssLanguage.mgx";
                    string inputFileName = @"C:\Users\Robert Jan\Desktop\My Documents\Oslo\MyOslo\ConsoleApplication\FeedsInput.m";
                    //inwit == module name
                    //RssLanguage == language name
                    string parserName = "inwit.RssLanguage";

                    DynamicParser parser = MGrammarCompiler.LoadParserFromMgx(imageFileName, parserName);

                    object output = parser.ParseObject(inputFileName, ErrorReporter.Standard);

                    Helper.WalkMGraphTree(output);
                }
                catch (Exception e)
                {
                    Console.WriteLine(e.Message);
                }
                Console.ReadLine();
            }
        }
    }

First, I create a DynamicParser instance, providing it with the compiled language image file (the .MGX file) and the parser name. The parser name is the module name and the language name joined with a dot (here, inwit.RssLanguage).

I then parse the input file using the ParseObject method, which returns the parsed result.

I wrote a small Helper class that walks the result tree and writes its contents to the Console. Feel free to use it yourself (after leaving me a comment here of course :)).

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Dataflow;

    namespace ConsoleApplication
    {
        class Helper
        {
            public static void WalkMGraphTree(object rootNode)
            {
                IGraphBuilder builder = new GraphBuilder();
                WalkNode(rootNode, builder);
            }

            private static void WalkNode(object node, IGraphBuilder builder)
            {
                if (node.GetType().Name == "SequenceNode")
                {
                    // A sequence: visit each element in order.
                    foreach (object sequenceElement in builder.GetSequenceElements(node))
                    {
                        WalkNode(sequenceElement, builder);
                    }
                    Console.WriteLine();
                }
                else if (node.GetType().Name == "SimpleNode")
                {
                    // A labeled node: print the label, then visit its successors.
                    Identifier id = builder.GetLabel(node) as Identifier;
                    WriteLine(id.Text, false);
                    foreach (object successorElement in builder.GetSuccessors(node))
                    {
                        WalkNode(successorElement, builder);
                    }
                    Console.WriteLine();
                }
                else
                {
                    // Anything else is treated as a leaf value.
                    WriteLine(Convert.ToString(node), true);
                }
            }

            // Writes the text followed by a space, optionally ending the line.
            private static void WriteLine(string line, bool newline)
            {
                Console.Write(line + " ");
                if (newline)
                {
                    Console.Write(Environment.NewLine);
                }
            }
        }
    }
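
The traversal itself is independent of the Oslo classes. As an illustration only, the same three-way recursion can be sketched in Python on a made-up data shape: a list stands in for a SequenceNode, a `(label, successors)` tuple for a labeled SimpleNode, and anything else for a leaf value. The function name `walk` and this data shape are assumptions of the sketch, and unlike the C# helper it indents by depth and yields lines instead of writing to the console.

```python
def walk(node, depth=0):
    """Yield one indented line per labeled node or leaf value."""
    indent = "  " * depth
    if isinstance(node, list):
        # Sequence: visit each element at the same depth.
        for element in node:
            yield from walk(element, depth)
    elif isinstance(node, tuple):
        # Labeled node: emit the label, then visit successors one level deeper.
        label, successors = node
        yield indent + label
        for successor in successors:
            yield from walk(successor, depth + 1)
    else:
        # Leaf value.
        yield indent + str(node)


feed = ("Feed", [("Title", ["inwit.nl"]), ("Email", ["rj@vanholland.net"])])
for line in walk([feed]):
    print(line)
```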

Now when I run the Console App, the output looks like this:

[Screenshot: console output of the tree walk]

 

Summary

Here’s the summary of the steps I took, and the end result accomplished:

  • First, we created our languages; we split some functionality into separate languages and used these within the RssLanguage.
  • We created some input data and tested the combined languages against it within Intellipad.
  • We compiled the languages with MG.exe into an .MGX image file.
  • We created a .NET application that loads the image file and parses the input data with the language.
  • We created a Helper method that walks the result graph and shows us the result in the Console.

Valuable links
Steef-Jan gave some pretty good links last Monday; I’d like to highlight one of those and give you two others:

Go and read what Martin Fowler has to say about Oslo and also check out what MSDN has to say about MGrammar:


Posted in: MGrammar, Oslo

    Disclaimer
    The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

    © Copyright 2010 Inwit.nl