Flex



Flex is a tool that generates C source code (compatible with the C++ compiler) for lexical analyzers (scanners). A scanner recognizes the tokens in its input (lexical analysis), whereas another program, Bison, can be used to generate the parser (syntax analysis). When both are used, much of the work of writing a compiler is done for you. Flex can be used for many other purposes as well, not just for compilers. Often we don't need a parser and Flex is enough. Regardless of whether Bison is used, it is worthwhile to learn about Flex. Flex can be used to do things that might be difficult or impossible to do with regular expressions alone. There is an option to generate C++ code from Flex but it is not clear how good that code is.

There is a lot of material about Flex and Bison available, including books. This page is intended to help with use of the Windows version with Visual Studio. The simple sample I have here works for me using win_flex.exe (Windows Flex) version 2.6.3.

Flex ("Fast Lex") is a free version of Lex. Bison is a free version of YACC ("Yet Another Compiler Compiler"). Lex and (especially) YACC are classic tools with many years of use.

The Windows versions of Flex and Bison are available from Win flex-bison download | SourceForge.net. Download them using the "Download" button and unzip them. Once they are unzipped where you want them to be, see Win flex-bison / Wiki / Visual Studio custom build rules. That will save you from many details of getting Flex and Bison set up and working. You might need to sign out of Windows and sign in again. I was getting an error such as "Windows cannot find Winflex.exe" until I signed out and in. After that, when you create a project you will need to add the Custom Build Rules to the project using "Build Dependencies" | "Build Customizations" for the project and checking the checkbox for the rules. Note that Flex and Bison are moving to GitHub but I don't know where the special Windows versions are in GitHub.

If you already have Flex and/or Bison files in existing projects from before you installed the Custom Build Rules, add the rules to the project and then set the "Item Type" property of each file to "Flex files" or "Bison files" as appropriate.

Creating a Visual C++ Project

The following describes what can be done instead of using the Custom Build Rules. If you use the Custom Build Rules then you can skip to the Sample.

It is possible to set up a VC project to automatically generate the C or CPP file from Flex. I assume that the Flex input file has "l" or "lex" (as in lexical) for an extension and that the output will have a "cpp" extension.

Don't use precompiled headers unless you know them well enough to make them work with generated code. The compiler complained about macro redefinitions and premature end-of-file until I turned off use of precompiled headers.

Generate a "Win32 Console Application" and make it an empty project (no generated source code). For example, I am using "SimpleFlex" for my project name.

Optional: You can customize the Fileview's folders so that the Flex input file is shown in the Source Files folder. In the properties of the Source Files folder, add the extension ("l" or "lex") to the list of extensions.

Then create a file with a "l" (or "lex") extension for the project; for example, "SimpleFlex.l". In the file, use one of the samples from below. Then in the project settings, create a Custom Build Step. If you are not familiar with Custom Builds, then look for the "Custom Build Step" tab in the project settings. Use the following for the Custom Build Step:

Description: Generating lexical analyzer

Commands: C:\Software\FLEX252\flex.exe -o$(ProjDir)\$(InputName).cpp $(InputPath)

Outputs: $(ProjDir)\$(InputName).cpp

Where:

  • Description: can be anything you want to use
  • Commands: consists of the path to Flex, the output file and the input file. You will need to change the path for Flex to whatever is correct for your system. The command as shown generates a C++ ("cpp") scanner file; for a C language scanner, change the "cpp" extension to "c", both here and in Outputs.
  • Outputs: specifies the filename of the output file, that is, the same file named after "-o" in Commands

After providing the code for the Flex input file and creating the Custom Build Step, compile the file. You can use Ctrl-F7 to just compile. Actually, at this point, you can just build the project; there is nothing for the build to do except generate the scanner (the cpp file). The custom build should execute Flex, but the only way you will know that it did is that the description is shown in the Build output. Once the cpp file has been generated, add it to the project. From then on, when you build the project, the cpp file should be generated and compiled. If you get the errors I describe above (macro redefinitions and premature end-of-file) then turn off precompiled headers for the project.

Sample

This sample will simply change all the vowels ("aeiou") to a "|". Create a C++ Win32 console application project. Do not create an "Empty project" and do use Precompiled Headers. Then:

  • Edit the stdafx.h file and add "#include <iostream>"
  • Add the Custom Build Rules to the project as above (just check the checkbox in the "Build Dependencies" | "Build Customizations..." for the project)
  • Create another C++ file but give it the extension "l" (as in Lex) or "lex"; I suggest using "project.lex" where "project" is the name of your project
  • Provide the code shown below for the "project.lex" or "project.l" file
  • If you used "lex" for the extension then go to the properties for the file and change "Item type" to "Flex files"
  • Also in the properties for the file there will be a "Flex files" node on the left; in the "Flex Options" node of "Flex files" you can provide an "Output File Name"; if you use "project.cpp" where "project" is your project's name then Flex will replace the file generated by Visual Studio; if you don't do that then delete the "project.cpp" file or at least comment out the main in that file
  • Compile the file (use Ctrl-F7) but do not build the project yet
  • If you did not specify an "Output File Name" then add the "project.flex.cpp" (the Flex output) to the project

project.lex or project.l

%top{
#include "stdafx.h"
}
%%
[aeiou] fputc('|', yyout);
%%

int yywrap(void) {
	return 1;
}

int main()
{
	const char folder[] = "???????????????????";
	char Buffer[_MAX_PATH];
	errno_t err;
	// open input
	strcpy_s(Buffer, folder);
	int n = strlen(folder);
	strcat_s(Buffer, "flexin.txt");
	std::cout << "Opening " << Buffer << '\n';
	err = fopen_s(&yyin, Buffer, "r");
	if (err != 0) {
		std::cout << "Error opening input file\n";
		return 1;
	}
	// open output
	Buffer[n] = 0;
	strcat_s(Buffer, "flexout.txt");
	std::cout << "Opening " << Buffer << '\n';
	err = fopen_s(&yyout, Buffer, "w");
	if (err != 0) {
		std::cout << "Error opening output file\n";
		return 1;
	}
	// Scan
	int token = yylex();
	while (token != 0)
		token = yylex();
	return 0;
}

Be sure to provide a folder for "folder" (with a trailing "\") and you will need an input file in the folder.

When Flex executes, the output file will have the following contents:

  • a #line directive that helps the compiler and debugger refer to the Flex input instead of the cpp file generated by Flex
  • #include "stdafx.h"
  • a scanner with the statement "fputc('|', yyout);" in it
  • the "yywrap" and "main" functions with #line directives

When the generated program executes, "main" opens the input file "flexin.txt" and the output file "flexout.txt", assigns them to yyin and yyout respectively, and then calls yylex in a loop. The yyin and yyout files and the yylex function exist in the code generated by Flex. The yywrap function is also used by Flex: the scanner calls it when it reaches the end of the input, which allows use of multiple input files, such as when input data has "include" statements. Since we have only one input file, "yywrap" just returns "1" (true) to indicate that there is no more input. The "%top" block specifies code that must go at the top of the generated file. Our #include will be at the top, preceded only by a "#line" directive.
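The yywrap convention can be tried out without Flex. The following is only a sketch of the protocol, not code that Flex generates: "InputQueue", "wrap_sketch" and "scan_all" are hypothetical names, and strings stand in for files. It assumes the usual convention that a return value of 0 means "another input is ready, keep scanning" and a non-zero value means "no more input".

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the scanner's input state; in real Flex code
// the role of "current" is played by yyin.
struct InputQueue {
    std::vector<std::string> sources;  // contents of each pretend input file
    std::size_t next = 0;              // index of the next source to open
    std::istringstream current;        // the input being scanned right now
};

// Plays the role of yywrap: return 0 after switching to another input,
// or 1 when there is nothing left to scan.
int wrap_sketch(InputQueue& q) {
    if (q.next >= q.sources.size())
        return 1;
    q.current.clear();                  // reset the end-of-file state
    q.current.str(q.sources[q.next++]); // "open" the next input
    return 0;
}

// Plays the role of the scanner: copy the current input, and at each end
// of input ask wrap_sketch whether to continue.
std::string scan_all(InputQueue& q) {
    std::string result;
    bool more = (wrap_sketch(q) == 0);  // stands in for main() opening yyin
    while (more) {
        char c;
        while (q.current.get(c))
            result.push_back(c);
        more = (wrap_sketch(q) == 0);   // end of input: continue or stop?
    }
    return result;
}
```

With two sources, "scan_all" returns their contents back to back, just as a scanner would treat several input files as one stream. The yywrap in the sample is the degenerate case that always answers "no more input".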

The scanning is done by the yylex function. In this sample, the rules section specifies that a "|" is to be written when a vowel is encountered. All other characters will be written as-is.
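To make the behavior concrete, here is a hand-written C++ function that does what the sample's scanner does. It is a sketch for comparison only and is not the code that Flex generates; the names "scan_vowels" and "scan_vowels_string" are made up for this illustration.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hand-written stand-in for the sample's scanner: a lowercase vowel
// corresponds to the [aeiou] rule and is written as '|'; any other
// character corresponds to the default rule and is copied as-is.
void scan_vowels(std::istream& in, std::ostream& out) {
    const std::string vowels = "aeiou";
    char c;
    while (in.get(c)) {
        if (vowels.find(c) != std::string::npos)
            out.put('|');
        else
            out.put(c);
    }
}

// Convenience wrapper for trying the function on a string.
std::string scan_vowels_string(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    scan_vowels(in, out);
    return out.str();
}
```

For example, scan_vowels_string("Hello, World") gives "H|ll|, W|rld". Uppercase vowels are left alone because the pattern lists only lowercase letters.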

Quick Introduction

Flex input files consist of three sections, separated by a line containing only "%%", as in the following:

definitions
%%
rules
%%
user code

Each of those is optional; a minimal Flex file is just one line with the two characters "%%". A minimal file such as that would cause the generated program to just copy "stdin" (the keyboard) to "stdout" (the console window).

The definitions section can have:

  • a "%option" statement (or statements) to provide options
  • name definitions that give names to patterns for use later; they are not required, they just make things more convenient
  • declarations of "start conditions"
  • a "%top" block, as in our example, that provides code that must go at the top of the generated file; multiple '%top' blocks are allowed and their order is preserved
  • C code to be copied as-is to the output program

Lines beginning with "/*" begin a comment and "*/" ends the comment. Comments are copied as-is to the output. Also, text that begins with whitespace (that is, indented text) is copied as-is to the output. Text beginning after a line with "%{" and up to a line before "%}" is copied as-is to the output. If the text being copied as-is exists in the rules section prior to any rule then it becomes part of the scanning routine and therefore can contain variables and/or code to be executed at the beginning of the scanning routine.

The user code section is also copied as-is.

The rules (also called patterns) section consists of a series of rules of the form:

pattern action

The pattern must be unindented (begin in the first column) and the action must begin on the same line. Patterns are an extended set of regular expressions. Actions are C code to be executed when the corresponding pattern is matched. The patterns and actions are the important part of the Flex input. In most articles about Flex, most actions are statements that write something out. That however is not what happens when Flex is used with Bison. When Flex is used with Bison, the typical action is a "return" statement that returns a "token" id. The main function generated by Flex is named "yylex". When Flex is used with Bison, Bison calls "yylex" and when the "return" statement is executed as an action, the return value is returned to Bison.
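The token-returning style can be sketched in plain C++ without Flex or Bison. Everything in this sketch is hypothetical: the token ids, the "Lexer" structure and the "next_token" function are made up for illustration, with "next_token" playing the role of yylex and "collect_tokens" playing the role of the caller (the parser). The only convention taken from Flex is that a return value of 0 means end of input.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token ids; the values are chosen above 255 so that they
// cannot collide with plain character codes. 0 means end of input.
enum Token { END = 0, NUMBER = 258, IDENT = 259 };

struct Lexer {
    std::string input;
    std::size_t pos = 0;
    std::string text;  // text of the last token, like yytext in Flex
};

// Plays the role of yylex: each call returns the id of the next token.
int next_token(Lexer& lx) {
    const std::string& s = lx.input;
    // Skip whitespace, like a rule whose action is empty.
    while (lx.pos < s.size() && std::isspace(static_cast<unsigned char>(s[lx.pos])))
        ++lx.pos;
    if (lx.pos >= s.size())
        return END;
    const std::size_t start = lx.pos;
    const unsigned char c = static_cast<unsigned char>(s[lx.pos]);
    int id;
    if (std::isdigit(c)) {
        while (lx.pos < s.size() && std::isdigit(static_cast<unsigned char>(s[lx.pos])))
            ++lx.pos;
        id = NUMBER;
    } else if (std::isalpha(c)) {
        while (lx.pos < s.size() && std::isalnum(static_cast<unsigned char>(s[lx.pos])))
            ++lx.pos;
        id = IDENT;
    } else {
        ++lx.pos;
        id = c;        // any other character is returned as its own code
    }
    lx.text = s.substr(start, lx.pos - start);
    return id;
}

// Plays the role of the caller: keep asking for tokens until END.
std::vector<int> collect_tokens(const std::string& input) {
    Lexer lx;
    lx.input = input;
    std::vector<int> ids;
    for (int t = next_token(lx); t != END; t = next_token(lx))
        ids.push_back(t);
    return ids;
}
```

For the input "x1 = 42" the caller receives IDENT, the character code of '=', and NUMBER, and then 0. In a real Flex file each "return" would be the action of a rule, and the token ids would normally come from the header that Bison generates.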

You can refer to the documentation and other articles for more.

References

Books

The following three books might be helpful.

ISBN | Author | Publisher | Cost (est.) | Title
9780077092214 | J. P. Bennett | McGraw-Hill | $166 | Introduction to compiling techniques : a first course using ANSI C, LEX, and YACC
9780134743967 | Axel T. Schreiner and H. George Friedman | Prentice Hall | $47 | Introduction to Compiler Construction With Unix
9781565920002 | Doug Brown, John Levine and Tony Mason | O'Reilly | $14 | lex & yacc, 2nd Edition

I have the lex & yacc book from O'Reilly. Even though it is categorized as a "UNIX Programming Tools" book, it has very little that is specific to UNIX. It is nearly all about just Lex and YACC, which are very compatible with Flex and Bison.

Online Tutorials

Most (probably all) of the following I found by searching the internet.