Flex



Flex is a tool that generates lexical analyzers (scanners) as C source code that is also compatible with a C++ compiler. A scanner recognizes the tokens in its input, whereas another program, Bison, can be used to generate the parser, which performs the syntax analysis. When both are used, much of the work of writing a compiler is done for you. Flex can be used for many other purposes as well, not just for compilers. Often we don't need a parser and Flex alone is enough. Regardless of whether Bison is used, it is worthwhile to learn about Flex. Flex can be used to do things that might be difficult or impossible to do with regular expressions alone.

There is a lot of material about Flex and Bison available, including books. This page is intended to help with use of the Windows ("WIN32") version. The simple sample I have here works for me using win_flex.exe (Windows Flex) version 2.6.3.

Flex ("Fast Lex") is a free version of Lex and Bison is a free version of YACC ("Yet Another Compiler Compiler"). Lex and (especially) YACC are classic tools with many years of use.

The Windows versions of Flex and Bison are available from Win flex-bison download | SourceForge.net. Use the "Download" button to download the zip file, then unzip it. Once the files are unzipped where you want them to be, see Win flex-bison / Wiki / Visual Studio custom build rules. Following those instructions will save you from many details of getting Flex and Bison set up and working. You might need to sign out of Windows and sign in again; I was getting an error such as "Windows cannot find Winflex.exe" until I did that. After that, whenever you create a project you will need to add the Custom Build Rules to it: use "Build Dependencies" | "Build Customizations" for the project and check the checkbox for the rules. Note that Flex and Bison are moving to GitHub, but I don't know where the special Windows versions are in GitHub.

If a project already contained Flex and/or Bison files before you installed the Custom Build Rules, add the rules to the project and then change the properties of each such file to set the "Item Type" to "Flex files" or "Bison files" as appropriate.

Creating a Visual C++ Project

The following describes what can be done instead of the Custom Build Rules. If you use the Custom Build Rules then you can skip to the Sample.

It is possible to set up a VC project to automatically generate the C or CPP file from Flex. I assume that the Flex input file has "l" or "lex" (as in lexical) for an extension and that the output will have a "cpp" extension.

Don't use precompiled headers unless you know how to use them well enough to use them for this. I had a problem with it complaining about macro redefinitions and premature end-of-file, until I turned off use of precompiled headers.

Generate a "Win32 Console Application" and make it an empty project (no generated source code). For example, I am using "SimpleFlex" for my project name.

Optional: You can customize the Fileview's folders so that the Flex input file is shown in the Source Files folder. In the properties of the Source Files folder, add the extension ("l" or "lex") to the list of extensions.

Then create a file with a "l" (or "lex") extension for the project; for example, "SimpleFlex.l". In the file, use one of the samples from below. Then in the project settings, create a Custom Build Step. If you are not familiar with Custom Builds, then look for the "Custom Build Step" tab in the project settings. Use the following for the Custom Build Step:

Description: Generating lexical analyzer

Commands: C:\Software\FLEX252\flex.exe -o$(ProjDir)\$(InputName).cpp $(InputPath)

Outputs: $(ProjDir)\$(InputName).cpp

Where:

Description
Can be anything you want to use; it is the text shown in the Build output.
Commands
Consists of the path to Flex, the output file (the "-o" option) and the input file. You will need to change the path for Flex to whatever is correct for your system. As shown, the command generates a C++ scanner in a "cpp" file; for a C language scanner, change the "cpp" extension to "c" in both Commands and Outputs.
Outputs
Specifies the filename of the output file.

After providing the code for the Flex input file and creating the Custom Build Step, compile the file; you can use Ctrl-F7 to compile just that file. Actually, at this point you can simply build the project, since there is nothing for the build to do except generate the scanner (the cpp file). The custom build should execute Flex, but the only sign that it did is the description shown in the Build output. Once the cpp file has been generated, add it to the project. From then on, building the project should generate the cpp file and compile it. If you get the errors I describe above (macro redefinitions and premature end-of-file) then turn off precompiled headers for the project.

Sample

This sample will simply change all the vowels ("aeiou") to a "|". Create a C++ Win32 console application project. Do not create an "Empty project" and do use Precompiled Headers. Then:

  • Edit the stdafx.h file and add "#include <iostream>"
  • Add the Custom Build Rules to the project as above (just check the checkbox)
  • Create another C++ file but give it the extension "l" (as in Lex) or "lex"; I suggest using "project.lex" where "project" is the name of your project
  • Provide the code shown below for the "project.lex" or "project.l" file
  • If you used "lex" for the extension then go to the properties for the file and change "Item type" to "Flex files"
  • Compile the file (use Ctrl-F7) but do not build the project yet
  • Add the "project.flex.cpp" (the Flex output) to the project

project.lex or project.l

%option outfile="project.cpp"
%top{
#include "stdafx.h"
}
%%
[aeiou] fputc('|', yyout);
%%
int yywrap(void) {
	return 1;
}

int main()
{
	const char folder[] = "???????????????????";
	char Buffer[_MAX_PATH];
	errno_t err;
	// open input
	strcpy_s(Buffer, folder);
	size_t n = strlen(folder);	// length of the folder part, used to truncate Buffer later
	strcat_s(Buffer, "flexin.txt");
	std::cout << "Opening " << Buffer << '\n';
	err = fopen_s(&yyin, Buffer, "r");
	if (err != 0) {
		std::cout << "Error opening input file\n";
		return 1;
	}
	// open output
	Buffer[n] = 0;
	strcat_s(Buffer, "flexout.txt");
	std::cout << "Opening " << Buffer << '\n';
	err = fopen_s(&yyout, Buffer, "w");
	if (err != 0) {
		std::cout << "Error opening output file\n";
		return 1;
	}
	// Scan
	int token = yylex();
	while (token != 0)
		token = yylex();
	return 0;
}

Be sure to change "project" in the first line to your project's name, so that Flex will replace the file generated by Visual Studio. You will also need to provide a folder for "folder" (with a trailing "\", written as "\\" in the C++ string literal) and you will need an input file named "flexin.txt" in that folder.

When the program executes, "main" opens the input file "flexin.txt" and the output file "flexout.txt", assigns them to yyin and yyout respectively, and then calls yylex in a loop. The yyin and yyout files and the yylex function exist in the code generated by Flex. The yywrap function is also used by Flex: it is called at the end of the input and allows use of multiple input files, such as when input data has "include" statements; returning 1, as here, means there is no more input. The "%top" block specifies code that must go at the top of the generated file. Our #include will be at the top, preceded only by a "#line" directive.

The scanning is done by the yylex function. In this sample, the rules section specifies that a "|" is to be written when a vowel is encountered. All other characters will be written as-is.
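To make the behavior concrete, here is a minimal hand-written C++ sketch of what the sample's scanner does to its input. It is not the code Flex generates; the function name replaceVowels is hypothetical and it works on a string rather than on yyin and yyout.

```cpp
#include <string>

// Hand-written equivalent of the sample's single rule plus Flex's default rule.
// replaceVowels is a hypothetical name used only for this illustration.
std::string replaceVowels(const std::string& input) {
    std::string output;
    output.reserve(input.size());
    for (char c : input) {
        switch (c) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            output += '|';  // same effect as the rule: [aeiou] fputc('|', yyout);
            break;
        default:
            output += c;    // unmatched characters are written as-is
            break;
        }
    }
    return output;
}
```

For example, replaceVowels("hello world") returns "h|ll| w|rld". Uppercase vowels are left alone because the pattern [aeiou] lists only lowercase letters.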

Quick Introduction

Flex input files consist of three sections, separated by a line containing only "%%", as in the following:

definitions
%%
rules
%%
user code

The definitions section lets you give names to patterns for use later. Such definitions are not required; they just make things more convenient. The definitions section can also include declarations of "start conditions". A "%top" block, as in our example, provides code that must go at the top of the generated file. Multiple "%top" blocks are allowed and their order is preserved.

A line beginning with "/*" begins a comment, which "*/" ends. Comments are copied as-is to the output. Text that is indented (begins with whitespace) is also copied as-is to the output, as is the text between a line with "%{" and a line with "%}". If text copied as-is appears in the rules section before the first rule, it becomes part of the scanning routine and can therefore contain variable declarations and/or code to be executed at the beginning of the scanning routine.

The user code section is also copied as-is.

The rules (also called patterns) section consists of a series of rules of the form:

pattern action

The pattern must be unindented and the action must begin on the same line. Patterns are written in an extended set of regular expressions.
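When several rules could match at the same position, Flex uses the rule that matches the most text and, if there is a tie, the rule listed first; text that no rule matches is copied to the output. The following is a rough C++ sketch of that selection logic using std::regex. It only illustrates the idea and is not how Flex is implemented (Flex generates a table-driven scanner). The names Rule, scan and demoRules are hypothetical, and std::regex does not always find the longest match for patterns with alternatives, so the demonstration rules avoid them.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// One rule: a pattern plus an action that receives the matched text
// (the role yytext plays in a Flex action). Hypothetical type, not Flex API.
struct Rule {
    std::regex pattern;
    std::function<std::string(const std::string&)> action;
};

std::string scan(const std::string& input, const std::vector<Rule>& rules) {
    std::string output;
    auto pos = input.cbegin();
    while (pos != input.cend()) {
        std::ptrdiff_t bestLength = 0;
        const Rule* bestRule = nullptr;
        for (const Rule& rule : rules) {
            std::smatch m;
            // match_continuous requires the match to start exactly at pos.
            if (std::regex_search(pos, input.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous)
                && m.length(0) > bestLength) {  // strict '>' keeps the earlier rule on a tie
                bestLength = m.length(0);
                bestRule = &rule;
            }
        }
        if (bestRule != nullptr) {
            output += bestRule->action(std::string(pos, pos + bestLength));
            pos += bestLength;
        } else {
            output += *pos;  // no rule matched: echo one character
            ++pos;
        }
    }
    return output;
}

// Three demonstration rules, in the order they would appear in a rules section.
std::vector<Rule> demoRules() {
    return {
        {std::regex("if"),     [](const std::string&) { return std::string("<IF>"); }},
        {std::regex("[a-z]+"), [](const std::string&) { return std::string("<id>"); }},
        {std::regex("[0-9]+"), [](const std::string&) { return std::string("<number>"); }},
    };
}
```

With these rules, scan("if iffy 42", demoRules()) returns "<IF> <id> <number>": "if" ties between the first two rules, so the first one wins, while "iffy" goes to the second rule because it matches more text.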

References

Books

The following two books might be helpful.

  • ISBN 007709221X: J. P. Bennett, Introduction to compiling techniques: a first course using ANSI C, LEX, and YACC (McGraw-Hill)
  • ISBN 1565920007: John Levine, lex & yacc, 2nd Edition (O'Reilly)

I have the lex & yacc book from O'Reilly. Even though it is categorized as a "UNIX Programming Tools" book, it has very little that is specific to UNIX. It is nearly all about just Lex and YACC, which are very compatible with Flex and Bison.

Online Tutorials

Most (probably all) of the following I found by searching the internet.