Announcing loccount 2.0 – now up to 74 languages

By Eric Raymond

I just released the 2.0 version of loccount.

This is a major release with many new features and upgrades. It’s gone well beyond just being a faster, cleaner, bug-fixed port of David A. Wheeler’s sloccount. The count of supported languages is now up to 74 from sloccount’s 30. But the bigger change is that for 33 of those languages the tool can now deliver a statement count (LLOC = Logical Lines Of Code) as well as a line count (SLOC = Source Lines of Code, ignoring whitespace and comments).

To go with this, the tool can now perform COCOMO II cost and schedule estimation based on LLOC as well as COCOMO I based on SLOC.
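For readers who haven't met COCOMO, here is a minimal sketch of the SLOC-based arithmetic. It assumes the textbook Basic COCOMO (1981) "organic mode" constants, which are the ones sloccount's reports are built on; the function names are made up for illustration, and none of this is lifted from loccount's source. COCOMO II has the same general shape but adds scale factors and effort multipliers, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"math"
)

// Textbook Basic COCOMO constants for "organic" projects.
// Assumption: these are illustrative defaults, not necessarily
// the exact values loccount 2.0 uses.
const (
	effortCoeff   = 2.4  // person-months at 1 KSLOC
	effortExp     = 1.05 // diseconomy of scale
	scheduleCoeff = 2.5  // months at 1 person-month of effort
	scheduleExp   = 0.38
)

// effortPersonMonths estimates development effort from a SLOC count.
func effortPersonMonths(sloc int) float64 {
	ksloc := float64(sloc) / 1000.0
	return effortCoeff * math.Pow(ksloc, effortExp)
}

// scheduleMonths estimates calendar time from an effort estimate.
func scheduleMonths(effort float64) float64 {
	return scheduleCoeff * math.Pow(effort, scheduleExp)
}

func main() {
	effort := effortPersonMonths(10000)
	fmt.Printf("effort:   %.1f person-months\n", effort)
	fmt.Printf("schedule: %.1f months\n", scheduleMonths(effort))
}
```

Because the exponent on size is greater than 1, doubling the line count more than doubles the estimated effort, which is one reason shrinking a codebase pays off disproportionately.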

The manual page includes the following cautions:

SLOC/LLOC figures should be used with caution. While they do predict project costs and defect incidence reasonably well, they are not appropriate for use as ‘productivity’ measures; good code is often less bulky than bad code. Comparing SLOC across languages is also dubious, as differing languages can have very different complexity per line.

With these qualifications, SLOC/LLOC does have some other uses. It is quite effective for tracking changes in complexity and attack surface as a codebase evolves over time.

That’s how I’ve used it – to track the drastic reduction in NTPsec’s codebase size. It was also my finger exercise in learning Go. For those of you who enjoy reading code, there are a couple of points of interest…

One is how much of the program’s intelligence is in tables rather than executable code. Adding support for a new language is usually as simple as adding a new table entry and a test load. Some of the internal changes in 2.0 relate to enriching the table format – for example, two things you can now declare are that a language uses the C preprocessor and that C-style backslash escapes are implemented in its string constants. I’ve often advocated this kind of programming by declarative specification in my writings; here you can see it in action.
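To give a flavor of what a table-driven design like this can look like, here is a small sketch. Everything in it is hypothetical: the langSpec type, its field names, and the two entries are invented for illustration and are not loccount's actual table format.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// langSpec is a hypothetical table entry describing one language.
// The real loccount tables are shaped differently.
type langSpec struct {
	name             string
	extensions       []string
	blockOpen        string // block comment opener, "" if none
	blockClose       string // block comment closer, "" if none
	winged           string // leader of a comment that runs to end of line
	terminator       string // statement terminator for LLOC, "" if unsupported
	cPreprocessor    bool   // source is run through the C preprocessor
	backslashEscapes bool   // string constants use C-style backslash escapes
}

// languages is the table. Supporting a new language means adding a row.
var languages = []langSpec{
	{
		name:             "c",
		extensions:       []string{".c"},
		blockOpen:        "/*",
		blockClose:       "*/",
		winged:           "//",
		terminator:       ";",
		cPreprocessor:    true,
		backslashEscapes: true,
	},
	{
		name:       "ada",
		extensions: []string{".adb", ".ads"},
		winged:     "--",
		terminator: ";",
	},
}

// lookup finds the table entry for a file by its extension.
func lookup(path string) (langSpec, bool) {
	ext := filepath.Ext(path)
	for _, spec := range languages {
		for _, e := range spec.extensions {
			if e == ext {
				return spec, true
			}
		}
	}
	return langSpec{}, false
}

func main() {
	if spec, ok := lookup("src/main.c"); ok {
		fmt.Printf("%s: cpp=%v escapes=%v\n",
			spec.name, spec.cPreprocessor, spec.backslashEscapes)
	}
}
```

The point of the shape is that the counting code never mentions a language by name; it only reads fields, so a new language costs one table row plus a test load.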

The contrast with sloccount is extreme; sloccount was clever, but it’s a messy pile of scripting-language hacks that makes adding a new language – or, really, changing anything about it at all – rather difficult. This is probably why it hasn’t been updated since 2004.

Another point of interest is the way the program uses Go’s CSP concurrency primitives. It’s a pattern that could generalize to other ways of data-mining file trees. Walk the tree, spawning a thread to gather stats on each file; each thread writes a summary to one single rendezvous channel; the main thread reads summary blocks off the rendezvous channel and aggregates them into a report. There’s no explicit lock or release anywhere; all the synchronization is supplied by the channel’s atomicity guarantees.

That’s pretty simple, but as a readily accessible demonstration of why CSP rocks compared to conventional mailbox-and-mutex synchronization its very simplicity makes it hard to beat.
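The walk-and-aggregate pattern described above can be sketched in a few dozen lines. This is a simplified stand-in rather than loccount's actual code: the summary type and the function names are hypothetical, and counting non-blank lines substitutes for real per-language analysis.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// summary is a hypothetical per-file result block.
type summary struct {
	path  string
	lines int
}

// countNonBlank stands in for real per-language analysis.
func countNonBlank(text string) int {
	n := 0
	for _, line := range strings.Split(text, "\n") {
		if strings.TrimSpace(line) != "" {
			n++
		}
	}
	return n
}

// totalLines walks root, starts one goroutine per regular file, and
// aggregates their summaries. It returns the number of files seen and
// their combined non-blank line count.
func totalLines(root string) (files, lines int) {
	results := make(chan summary) // the single rendezvous channel
	spawned := 0
	filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil || !d.Type().IsRegular() {
			return nil // skip unreadable entries and anything not a plain file
		}
		spawned++
		go func() {
			data, _ := os.ReadFile(path) // an unreadable file counts as empty
			results <- summary{path, countNonBlank(string(data))}
		}()
		return nil
	})
	// Each worker sends exactly once, so reading `spawned` summaries
	// collects every result with no lock, mutex, or wait group.
	for i := 0; i < spawned; i++ {
		s := <-results
		files++
		lines += s.lines
	}
	return files, lines
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	files, lines := totalLines(root)
	fmt.Printf("%d files, %d non-blank lines\n", files, lines)
}
```

Note what is absent: only the main goroutine touches the running totals, so there is nothing to protect. A production version would probably want to bound the number of in-flight workers on very large trees, since this sketch reads every file into memory at once.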

Ironically, one of the source languages this tool written in Go cannot deliver LLOC reports for is Go itself. That’s because Go doesn’t have an explicit end-of-statement marker; counting statements would in principle require a full parse. Something might be done with the assumption that the input source is in the canonical form that go fmt produces.

Plans for future releases:

* Beat Go into not barfing on ABC comments? (We have a test load)

* Tell Objective-C .m from MUMPS .m with a verifier?

* sloccount handles different asm comment conventions. We should too.

* How to weight EAF:

* Check that we handle every extension in sloccount’s list.

* Detect inline assembler in C?

Here’s the set of languages supported: ada algol60 asm autotools awk c c# c++ c-header clojure clojurescript clu cobol cobra csh css d dart eiffel elisp erlang expect f# fortran fortran03 fortran90 fortran95 go haskell icon java javascript kotlin lex lisp lua m4 makefile ml modula2 modula3 mumps oberon objective-c occam pascal perl php php3 php4 php5 php6 php7 pl/1 pop11 prolog python rebol ruby rust sather scheme scons sed shell simula sql swift tcl verilog vhdl vrml waf yacc.

If something you actually use isn’t on there, send me a code sample with the name of the language in it. The code samples should contain a block comment and a winged comment as well as code comprising at least one full statement. “Hello, world” will do.
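As a rough illustration of the shape of a useful sample, here is one in Go (chosen only for familiarity, since Go is already on the list). It assumes "winged comment" means a comment that runs from its leader to the end of the line.

```go
/*
 * Language: Go
 * This block comment names the language, as requested.
 */
package main

import "fmt"

// greeting returns the text to print.
func greeting() string {
	return "Hello, world" // a winged comment, running to end of line
}

func main() {
	fmt.Println(greeting())
}
```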