In a project at my previous job, I used a product that exposes Tcl as its extension API. Tcl is pretty good and easy to extend, but once a script grows large, things become terrible. In my experience, the biggest weakness of Tcl is that it cannot evaluate expressions directly: you have to call the ‘expr’ command to evaluate anything, which is pretty troublesome. Its flow control is also kind of ugly. Richard Stallman made a similar point long ago in his article “Why you should not use Tcl”, arguing that one should avoid Tcl when choosing an extension language:
As interest builds in extensible application programs and tools, and some programmers are tempted to use Tcl, we should not forget the lessons learned … a language for extensions should not be a mere “extension language”. It should be a real programming language, designed for writing and maintaining substantial programs … Extensions are often large, complex programs in their own right, and the people who write them deserve the same facilities that other programmers rely on.
I couldn’t agree more. But I don’t like Python, because it forces you to put a certain amount of indentation in front of every line… It’s also a pain to integrate Python with C++ or Java code. I don’t like JavaScript (Rhino) either. Since I have had some free time lately, I decided to write an interpreter that I can use as an embedded language tool in the future… I chose BASIC as the model, because it was the first programming language I learned, a long time ago. The expression evaluator I finished earlier was also written for this project.
After some work, I finished the main part of it. It can be downloaded here as an executable: mybasic.zip (please use WinZip or WinRAR to unzip it). Alternatively, here are the unzipped files: mybasic.exe, and the readme in English, Chinese and Japanese.
In its current form it is a command-line tool that interprets BASIC programs or evaluates expressions. It writes the result to stdout and returns 0. When an error occurs, it writes the error message to stderr and returns 1.
This is version 0.6, which does not yet support arrays or date/time types. Since the original purpose is to build an embedded language interpreter, I simplified the BASIC syntax to keep only the important parts (and also borrowed a few newer features from Visual Basic).
This software is written in C++ with the STL, without any third-party tools (such as yacc).
Usage:
mybasic -e <expression>
or
mybasic -f <BASIC_program_file_name>
The sample BASIC program used above (test.bas):
    function mySum (byval x as integer) as integer
        if x <= 1 then
            return x
        else
            return x + mySum(x - 1)
        end if
    end function

    print mysum(100)
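For readers more at home in C++, here is a rough equivalent of the sample program. It is only a sketch: it assumes that byval corresponds to passing by value and that the BASIC integer type maps to int (the interpreter's actual integer width is not stated here). The helper name printSample is made up for this illustration.

```cpp
#include <iostream>

// Rough C++ counterpart of mySum from test.bas.
// Assumption: BYVAL maps to pass-by-value and INTEGER maps to int.
int mySum(int x) {
    if (x <= 1) {
        return x;
    }
    return x + mySum(x - 1);
}

// Counterpart of "print mysum(100)" (hypothetical helper name).
void printSample() {
    std::cout << mySum(100) << '\n';
}
```

mySum(100) adds the integers from 1 to 100, so the result should be 5050.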
Here is the spec of it:
Expression Evaluator
The expression should be written in BASIC syntax. Arithmetic, string, comparison and logical operations are supported.
The spec of it is as follows:
Data types
- boolean
- integer
- double
- string
Operators
- math: ^, +(unary), -(unary), *, /, \, MOD, +, -
- string: +
- relational: =, <>, <, <=, >, >=
- logic: NOT, AND, OR
- others: (, ) (to change precedences of operators)
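To make the operator list a bit more concrete, here is a minimal sketch of how a hand-written (no yacc) evaluator for the math operators could look, using recursive descent. This is not the actual mybasic source. It assumes that precedence follows the order in which the math operators are listed (^ binds tightest, binary + and - loosest), that ^ is left-associative, that \ truncates the quotient, and that MOD behaves like std::fmod. The names Evaluator and evaluate are invented for the illustration; strings, booleans, variables and functions are left out.

```cpp
#include <cctype>
#include <cmath>
#include <stdexcept>
#include <string>

// Sketch of a recursive-descent evaluator for the math operators only.
// Assumption: precedence follows the order of the operator list,
// from ^ (tightest) down to binary + and - (loosest).
class Evaluator {
public:
    explicit Evaluator(const std::string& text) : s_(text), pos_(0) {}

    double run() {
        const double v = additive();
        skipSpaces();
        if (pos_ != s_.size()) throw std::runtime_error("unexpected input");
        return v;
    }

private:
    std::string s_;
    std::size_t pos_;

    void skipSpaces() {
        while (pos_ < s_.size() &&
               std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }

    // Consume a one-character operator if it comes next.
    bool eat(char c) {
        skipSpaces();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // Consume the keyword MOD (case-insensitive) if it comes next.
    bool eatMod() {
        skipSpaces();
        if (pos_ + 3 > s_.size()) return false;
        for (int i = 0; i < 3; ++i) {
            const int c = std::toupper(static_cast<unsigned char>(s_[pos_ + i]));
            if (c != "MOD"[i]) return false;
        }
        pos_ += 3;
        return true;
    }

    double additive() {        // binary + and -
        double v = modulo();
        for (;;) {
            if (eat('+')) v += modulo();
            else if (eat('-')) v -= modulo();
            else return v;
        }
    }
    double modulo() {          // MOD, modelled with std::fmod (assumption)
        double v = intDivide();
        while (eatMod()) v = std::fmod(v, intDivide());
        return v;
    }
    double intDivide() {       // backslash: quotient truncated (assumption)
        double v = multiplicative();
        while (eat('\\')) v = std::trunc(v / multiplicative());
        return v;
    }
    double multiplicative() {  // * and /
        double v = unary();
        for (;;) {
            if (eat('*')) v *= unary();
            else if (eat('/')) v /= unary();
            else return v;
        }
    }
    double unary() {           // unary + and -
        if (eat('-')) return -unary();
        if (eat('+')) return unary();
        return power();
    }
    double power() {           // ^, left-associative here (assumption)
        double v = primary();
        while (eat('^')) v = std::pow(v, primary());
        return v;
    }
    double primary() {         // parenthesised expression or number literal
        if (eat('(')) {
            const double v = additive();
            if (!eat(')')) throw std::runtime_error("missing )");
            return v;
        }
        std::size_t used = 0;
        // std::stod also accepts a leading sign, so "2 ^ -1" works.
        const double v = std::stod(s_.substr(pos_), &used);
        pos_ += used;
        return v;
    }
};

double evaluate(const std::string& text) { return Evaluator(text).run(); }
```

With these assumptions, evaluate("-2 ^ 2") gives -4 because ^ binds tighter than unary minus, and evaluate("7 MOD 3") gives 1.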
Math Functions
abs, sign, int, sqrt, exp, log, log10, rad (change degrees to radians), deg (change radians to degrees), sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, asinh, acosh, atanh, sec, csc, pi (return value of Pi), random (return random number in [0, 1])
String Functions
str(num) (convert number to string), space(n), tab(n), ltrim(str), rtrim(str), trim(str), len(str), ucase(str), lcase(str), val(str) (parse string to number), isNumeric(str), left(str, n), right(str, n), mid(str, from, n), instr(str1, str2) (return index starting from 0; if not found, return -1)
Differences from traditional BASIC language
- some function names are different, for example, “sqrt” instead of “sqr”
- indexes in a string start from 0 instead of 1
- BASIC type characters (such as “2#” for 2 of double type) not supported
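Since the 0-based indexing is the difference most likely to trip people up, here is a small C++ model of instr, left, right and mid as described in the function list. It is a sketch, not the interpreter's code: the namespace mybasic_model is made up, and the handling of out-of-range arguments (clamping instead of reporting an error) is my assumption, because the spec above does not say.

```cpp
#include <string>

// Models of four string functions with 0-based indexes.
// Assumption: out-of-range arguments are clamped instead of raising an error.
namespace mybasic_model {

// instr(str1, str2): index of the first occurrence of str2 in str1, or -1.
int instr(const std::string& str1, const std::string& str2) {
    const std::string::size_type at = str1.find(str2);
    return at == std::string::npos ? -1 : static_cast<int>(at);
}

// left(str, n): the first n characters.
std::string left(const std::string& str, int n) {
    if (n <= 0) return "";
    return str.substr(0, static_cast<std::string::size_type>(n));
}

// right(str, n): the last n characters.
std::string right(const std::string& str, int n) {
    if (n <= 0) return "";
    const std::string::size_type count = static_cast<std::string::size_type>(n);
    return count >= str.size() ? str : str.substr(str.size() - count);
}

// mid(str, from, n): n characters starting at 0-based index from.
std::string mid(const std::string& str, int from, int n) {
    if (from < 0 || n <= 0) return "";
    const std::string::size_type start = static_cast<std::string::size_type>(from);
    if (start >= str.size()) return "";
    return str.substr(start, static_cast<std::string::size_type>(n));
}

}  // namespace mybasic_model
```

For example, in this model instr("hello world", "world") is 6 (traditional BASIC would say 7), and mid("hello", 1, 3) is "ell".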
BASIC Interpreter
This interpreter supports a practically simplified BASIC syntax.
Spec of ‘practically simplified BASIC syntax’ (besides things listed above):
Supported
- comment: REM, ' (single quotation mark)
- variable declaration: DIM … AS …
- condition (block IF): IF, THEN, ELSEIF, ELSE, END IF
- FOR loop: FOR, TO, STEP, NEXT, CONTINUE FOR, EXIT FOR
- DO loop: DO, WHILE, UNTIL, LOOP, CONTINUE DO, EXIT DO
- function & sub-routine: FUNCTION, END FUNCTION, SUB, END SUB, RETURN, BYVAL, BYREF
- output: PRINT
- others: END
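For anyone who has not used BYVAL and BYREF before, the closest C++ analogy is passing by value versus passing by reference. The sketch below uses made-up function names, and the SUB headers quoted in the comments are only my guess at the matching BASIC form, modelled on the function header in the sample program.

```cpp
// C++ analogy for BYVAL and BYREF parameters (hypothetical function names).

// Roughly like "SUB bumpByVal (BYVAL n AS INTEGER)": the callee gets a copy.
void bumpByVal(int n) {
    n = n + 1;  // changes only the local copy
}

// Roughly like "SUB bumpByRef (BYREF n AS INTEGER)": the callee works on
// the caller's variable.
void bumpByRef(int& n) {
    n = n + 1;  // the change is visible to the caller
}
```

Starting from a variable holding 1, calling bumpByVal leaves it at 1, while calling bumpByRef changes it to 2.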
Not supported
- LET (most useless command. ignored by the interpreter.)
- line number (too old)
- GOTO (evil)
- GOSUB…RETURN (too old)
- DEF FN (too old)
- label (since GOTO/GOSUB/etc are not supported, there is no need to have labels)
- Single-line IF
- READ…DATA (too old)
- SELECT CASE (don’t like the syntax. don’t want to introduce something too different, such as the ‘switch/case’ in C/C++/Java. can be replaced with IF)
- ON [ERROR] … (too old)
- WHILE … WEND (old but not good; replaced by the more powerful & flexible DO … LOOP)
- EXIT SUB, EXIT FUNCTION (use RETURN instead)
- CALL (but functions can be called directly as sub-routines)
- variant data type (decreases code quality)
- user defined type (class)
- PRINT USING (if it is really necessary, will consider adding it)
- file I/O (OPEN, etc… will provide new set of file I/O API in later versions)
Differences from normal BASIC language
- variables must be declared (with DIM) before use (similar to always declaring ‘OPTION EXPLICIT’ in Visual Basic; to help improve code quality)
- “DIM a, b, c AS DOUBLE” is OK, but “DIM a AS INTEGER, b AS DOUBLE” not supported (to help improve code quality)
- Boolean values cannot be converted to or from Integer (to help improve code quality)
- functions: BYVAL or BYREF must be specified (no default) (to help improve code quality)
- in PRINT statement:
  - “,” means inserting a tab, instead of starting to print at a certain place (too old-fashioned and not practical any more)
  - “;” means to start right after the previous place (same as original BASIC spec), but won’t insert a space before a number (programmer should be responsible for this when necessary!)
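The two PRINT separator rules can be modelled in a few lines of C++. This is a sketch under assumptions: the names PrintItem and renderPrint are invented, every item is taken to be already converted to text with no leading space, and what happens at the end of the line (for example a trailing “;”) is left out because it is not described above.

```cpp
#include <string>
#include <utility>
#include <vector>

// One PRINT item: the already-formatted text plus the separator that
// follows it (',' or ';', or '\0' after the last item).
using PrintItem = std::pair<std::string, char>;

// Builds the output for one PRINT statement (hypothetical helper).
std::string renderPrint(const std::vector<PrintItem>& items) {
    std::string line;
    for (const PrintItem& item : items) {
        line += item.first;          // numbers get no leading space
        if (item.second == ',') {
            line += '\t';            // "," inserts a tab
        }
        // ";" adds nothing: the next item starts right after this one
    }
    return line;
}
```

In this model, the items "x=" followed by “;”, then "42" followed by “,”, then "done" come out as "x=42", a tab, and "done".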
Not yet supported (will be supported in later versions)
- data type:
  - array (multi-dimensional)
    - array index starts from 0 and cannot be changed to another base
    - in ‘DIM A(N) AS INTEGER’, N is the size of the array, not its upper bound
  - date
  - time
  - datetime
- built-in functions
  - some statistical functions
  - some algorithms
  - file I/O
  - other interoperabilities (?)
- commands
  - INPUT (?)