Saturday, June 23, 2012

lldb: VI mode and tab-completion

lldb uses libedit, which is a BSD licensed alternative to GNU readline. A feature of libedit is the ability to configure per-application settings for command-line bindings via ~/.editrc. Detailed documentation for this configuration file can be found using man editrc or via online documentation, such as

As I prefer VI bindings, I initially configured my .editrc as follows, to replace the default emacs bindings:

lldb:bind -v

Unfortunately, when I re-ran lldb, tab-completion was not functioning.  I added the following line, binding tab to rl_complete, a function exported by libedit that I assumed would be the default completer:

lldb:bind ^I rl_complete

Although tab-completion started working again, I was seeing frequent segfaults in lldb, along with an error that it could not bind the rl_complete command.  How was I going to find this elusive tab-completion command?  I could dig through lldb source, but figured there must be a quicker way.  I reviewed all the commands in the editrc docs and found:

           ed-command    Editline extended command.

I cleared my .editrc and added:

lldb:bind ^P ed-command

Tab-completion was working again, and pressing ^P presented me with a : prompt, putting me in Editline's extended command mode. Typing bind listed all the current bindings; I was looking for ^I, which showed up as

"^I"           ->  lldb_complete

lldb_complete was the elusive command I needed to fix tab-completion in VI mode, so I settled on the following for my .editrc:

lldb:bind -v
lldb:bind ^I lldb_complete
lldb:bind ^P ed-command

I suspect this technique will come in handy for other programs that use libedit and a non-standard completion function, hence this long post.

Wednesday, June 06, 2012

llvm / Clang hacking: Part 3

Part 3 in my N-part series on my exploration of hacking on the llvm and Clang (C-language) tool chain.


This post assumes you've successfully completed Part 1 and Part 2 of the series.  I'm also going to assume that if you're interested in hacking on Clang, you have an understanding of compilation and are familiar with terms such as lexing, parsing, syntactic analysis, semantic analysis and code generation.  If not, then you need to purchase a copy of Compilers: Principles, Techniques, and Tools, also known as the Dragon Book, and read through it.  There are also plenty of resources on Google.

Objective-C Extension: NSURL Literals

Objective-C literals are an exciting syntactic feature coming to the next release of Clang.  This will be available in Xcode 4.4 and presumably the next iOS update.  I was indirectly presented with the challenge on Twitter from @casademora when querying what an NSURL literal might look like.  Truthfully, I've wanted an excuse to hack on Clang and this seemed small enough in scope to achieve in a day.  I threw out the idea of NSURL literals being represented by a @@ prefix, so the following line would compile:

NSURL *url = @@""

NOTE: I'm not suggesting NSURL literals should be introduced into Objective-C.  This merely serves as a reasonable feature for academic exploration.

Parsing: libparse

Armed with the knowledge that these new literals were available, I started exploring the libparse code in Clang.  ParseObjc.cpp seemed like a good place to start, which turned out to be correct and led me to the rather aptly named Parser::ParseObjCAtExpression method.  The implementation of this method is straightforward: it determines the next token and delegates parsing to various methods depending on the type of expression encountered.  Our syntax requires a second @ token, so I added the following code to the switch statement:

case tok::at:
    // Objective-C NSURL expression
    ConsumeToken(); // Consume the additional @ token.
    if (! {
      return ExprError(Diag(AtLoc, diag::err_unexpected_at));
    }
    return ParsePostfixExpressionSuffix(ParseObjCURLLiteral(AtLoc));

In English, if we find another @ token, we'll assume an NSURL literal and attempt to parse it by delegating to our new ParseObjCURLLiteral method. The implementation of ParseObjCURLLiteral is again quite simple:

ExprResult Parser::ParseObjCURLLiteral(clang::SourceLocation AtLoc) {
    ExprResult Res(ParseStringLiteralExpression());
    if (Res.isInvalid()) return move(Res);
    return Owned(Actions.BuildObjCURLLiteral(AtLoc, Res.take()));
}

The first thing it does is attempt to parse a C-string literal using the existing ParseStringLiteralExpression method.  If the result of this is invalid, fail; otherwise, call our new Actions.BuildObjCURLLiteral method.  Actions is an instance of the Semantic Analysis class Sema in libsema, responsible for generating the Abstract Syntax Tree (AST), which is consumed by the code generator in libcodegen.
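The dispatch described above can be sketched outside of Clang. The following is a toy model only, assuming the input is a single expression held in a std::string; the names Literal, LiteralKind, parseQuoted and parseAtExpression are invented for this illustration and are not part of Clang, which works on a token stream and returns ExprResult values.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not a Clang type).
enum class LiteralKind { Invalid, String, URL };

struct Literal {
  LiteralKind kind;
  std::string text;  // contents between the quotes
};

// Parses "..." starting at pos and advances pos past the closing quote.
// Escape sequences are deliberately not handled in this sketch.
static bool parseQuoted(const std::string &src, std::size_t &pos,
                        std::string &out) {
  if (pos >= src.size() || src[pos] != '"')
    return false;
  std::size_t close = src.find('"', pos + 1);
  if (close == std::string::npos)
    return false;
  out = src.substr(pos + 1, close - pos - 1);
  pos = close + 1;
  return true;
}

// Mirrors the shape of the parser code: consume the first '@', then decide
// which literal follows by checking for an additional '@'.
Literal parseAtExpression(const std::string &src) {
  const Literal invalid = {LiteralKind::Invalid, ""};
  std::size_t pos = 0;
  if (src.empty() || src[pos] != '@')
    return invalid;
  ++pos;  // consume the leading '@'

  LiteralKind kind = LiteralKind::String;
  if (pos < src.size() && src[pos] == '@') {
    ++pos;  // consume the additional '@'
    kind = LiteralKind::URL;
  }

  std::string text;
  if (!parseQuoted(src, pos, text))
    return invalid;  // the real parser reports a diagnostic here
  Literal result = {kind, text};
  return result;
}
```

Calling parseAtExpression on an input such as @@"http://example.com" (a placeholder URL) yields a URL literal, while @"abc" yields a plain string literal and @@42 is rejected. Anything after the closing quote is ignored in this sketch.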

Semantic Analysis: libsema

This library is responsible for converting parsed code into an AST.  I looked at the BuildObjCStringLiteral and BuildObjCNumericLiteral methods to gain a better understanding of the responsibility of libsema.  These methods return an ExprResult which will ultimately be used by the code generator to write llvm IR. 

The first thing to determine is how to generate an NSURL instance.  The NSURL URLWithString: class method is the best candidate, following the lead of the other Obj-C literals.  This class method requires an NSString as its first and only argument, so the first thing we need is to generate this expression, as can be seen in the first few lines of our BuildObjCURLLiteral method:

ExprResult Sema::BuildObjCURLLiteral(SourceLocation AtLoc, Expr *String) {
  StringLiteral *S = reinterpret_cast<StringLiteral *>(String);
  if (CheckObjCString(S))
      return true;
  ExprResult ObjCString = BuildObjCStringLiteral(AtLoc, S);

We take our string expression and build an NSString. Next we need to confirm the existence of, and cache, the NSURL class declaration

if (!NSURLDecl) {
  NamedDecl *IF = LookupSingleName(TUScope,
  NSURLDecl = dyn_cast_or_null<ObjCInterfaceDecl>(IF);
  if (!NSURLDecl && getLangOpts().DebuggerObjCLiteral)
    NSURLDecl =  ObjCInterfaceDecl::Create (Context,
                                              0, SourceLocation());
  if (!NSURLDecl) {
    Diag(AtLoc, diag::err_undeclared_nsurl);
    return ExprError();
  }

  // generate the pointer to NSURL type.
  QualType NSURLObject = CX.getObjCInterfaceType(NSURLDecl);
  NSURLPointer = CX.getObjCObjectPointerType(NSURLObject);
}

and finally the URLWithString: selector

if (!URLWithStringMethod) {
  Selector Sel = NSAPIObj->getNSURLLiteralSelector(NSAPI::NSURLWithString);
  URLWithStringMethod = NSURLDecl->lookupClassMethod(Sel);
}

NOTE: I implemented the NSURL features on the NSAPI class, which were easily determined by examining the other APIs exposed, such as NSArray and NSDictionary.
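The look-up-once-and-cache pattern used for the class declaration and the selector can be shown in isolation. This is a minimal sketch, assuming a toy symbol table in place of Clang's lookup machinery; ToySema, InterfaceDecl, requireNSURL and the diagnostic wording are all hypothetical names made up for the example.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an Objective-C @interface declaration.
struct InterfaceDecl {
  std::string name;
  std::set<std::string> classMethods;  // selectors, e.g. "URLWithString:"
};

// Toy stand-in for Sema: owns the visible declarations, the cached lookup
// and the diagnostics recorded so far. None of these names come from Clang.
struct ToySema {
  std::map<std::string, InterfaceDecl> visibleDecls;
  std::vector<std::string> diagnostics;
  int lookups = 0;  // counts how often the slow lookup path runs

  const InterfaceDecl *cachedNSURL = nullptr;

  // Returns the NSURL declaration if it declares +URLWithString:, or
  // nullptr after recording a diagnostic.
  const InterfaceDecl *requireNSURL() {
    if (!cachedNSURL) {
      ++lookups;
      auto it = visibleDecls.find("NSURL");
      if (it == visibleDecls.end()) {
        diagnostics.push_back("NSURL must be declared before use");
        return nullptr;
      }
      cachedNSURL = &it->second;  // std::map keeps element addresses stable
    }
    if (!cachedNSURL->classMethods.count("URLWithString:")) {
      diagnostics.push_back("NSURL does not declare +URLWithString:");
      return nullptr;
    }
    return cachedNSURL;
  }
};
```

The first failed call records a diagnostic and leaves the cache empty, so a later call retries the lookup; once the declaration is found, further calls skip the lookup entirely.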

The remaining requirement of this method is to return an expression that will ultimately result in a call to objc_msgSend with the arguments: NSURL class, URLWithString: selector and NSString constant.  Conveniently, the ObjCBoxedExpr provides just what we need, resulting in this final call

SourceRange SR(S->getSourceRange());

// Use the effective source range of the literal, including the leading '@'.
return MaybeBindToTemporary(
                            new (Context) ObjCBoxedExpr(ObjCString.take(), NSURLPointer, URLWithStringMethod,
                                                        SourceRange(AtLoc, SR.getEnd())));

The SR (SourceRange) argument is used to associate this AST node with the source code it is generated from.  The ObjCBoxedExpr class contains all the pieces needed to execute the objc_msgSend, which will be later consumed by the code generator.  By reusing this class, we've avoided the need to write our own code generation.
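To make the role of the boxed expression concrete, here is a toy model of the information such a node carries. It is a sketch under stated assumptions, not Clang's ObjCBoxedExpr: BoxedExpr, Range, buildURLLiteral and emitMessageSend are hypothetical names, source locations are plain character offsets, and the "code generator" only renders text.

```cpp
#include <string>

// Toy stand-in for a source range: character offsets into the source text.
struct Range {
  unsigned begin;
  unsigned end;
};

// Toy counterpart of a boxed expression: it records only what a code
// generator needs in order to emit a class-method call on the boxed value.
struct BoxedExpr {
  std::string subExpr;        // the argument, e.g. an NSString constant
  std::string receiverClass;  // class that declares the boxing method
  std::string selector;       // boxing selector, e.g. "URLWithString:"
  Range range;                // covers the literal including the leading '@'
};

// Builds the node for a literal whose first '@' is at atLoc and whose string
// part ends at stringEnd, in the spirit of SourceRange(AtLoc, SR.getEnd()).
BoxedExpr buildURLLiteral(unsigned atLoc, unsigned stringEnd,
                          const std::string &stringConstant) {
  return BoxedExpr{stringConstant, "NSURL", "URLWithString:",
                   Range{atLoc, stringEnd}};
}

// Renders the call as message-send pseudo-code. A real code generator would
// emit llvm IR here instead of text.
std::string emitMessageSend(const BoxedExpr &e) {
  return "objc_msgSend(" + e.receiverClass + ", @selector(" + e.selector +
         "), " + e.subExpr + ")";
}
```

Because the node already names the receiver class, the selector and the argument, a generic emitter needs no NSURL-specific knowledge, which is the reason reusing the existing node type avoids new code generation work.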

Code Generation

To conclude this article and clarify what happens with our AST node (ObjCBoxedExpr), let's take a look at libcodegen to see how our NSURL call is converted to executable code.  The CGObjC.cpp file contains a rather curiously named method, EmitObjCBoxedExpr, taking a single parameter ObjCBoxedExpr.  It is quite succinct, making it very easy to understand (code removed for clarity). Worth noting is that this function requires the boxing selector to be a class method.

  // Grab the NSString expression that will be the argument to URLWithString:
  const Expr *SubExpr = E->getSubExpr();
  // Grab the URLWithString: selector
  const ObjCMethodDecl *BoxingMethod = E->getBoxingMethod();
  Selector Sel = BoxingMethod->getSelector();
  // Generate a reference to the class pointer, which will be the receiver.
  // Assumes that the method was introduced in the class that should be
  // messaged (avoids pulling it out of the result type).
  CGObjCRuntime &Runtime = CGM.getObjCRuntime();
  const ObjCInterfaceDecl *ClassDecl = BoxingMethod->getClassInterface();
  llvm::Value *Receiver = Runtime.GetClass(Builder, ClassDecl);

  // adds the NSString to the call arguments list
  const ParmVarDecl *argDecl = *BoxingMethod->param_begin();
  QualType ArgQT = argDecl->getType().getUnqualifiedType();
  RValue RV = EmitAnyExpr(SubExpr);
  CallArgList Args;
  Args.add(RV, ArgQT);
  // Generates the llvm IR code to execute the objc_msgSend function
  RValue result = Runtime.GenerateMessageSend(*this, ReturnValueSlot(),
                                              BoxingMethod->getResultType(), Sel, Receiver, Args,
                                              ClassDecl, BoxingMethod);
  return Builder.CreateBitCast(result.getScalarVal(),

Limitations and Improvements

This implementation is fairly rigid, only allowing a single-line NSString, whereas a more robust, production-quality implementation should support multi-line NSString declarations.

As per the overview for NSURL, the URLs it understands are described in RFCs 1808, 1738 and 2732.  Adding code to interpret the contents of the string, validate it against these RFCs and report a warning to the engineer would add considerable value to this feature, much like the warnings provided when a format string and its arguments are potentially invalid.
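As a rough idea of what such a check might look like, the sketch below validates a small subset of the URL grammar. It assumes the literal is meant to be an absolute URL (relative URLs, which RFC 1808 allows, would be flagged as missing a scheme) and only checks the scheme characters and the absence of whitespace or control characters. URLCheck and checkURLLiteral are hypothetical names, and nothing here is part of Clang or of the implementation in this post.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Possible outcomes of the check; each non-Ok value would map to a warning.
enum class URLCheck { Ok, MissingScheme, InvalidScheme, InvalidCharacter };

// Checks a string literal's contents against a small subset of the URL
// grammar: "<scheme>:<rest>", where the scheme is made of letters, digits,
// '+', '-' and '.', and the whole string has no spaces or control characters.
URLCheck checkURLLiteral(const std::string &url) {
  std::size_t colon = url.find(':');
  if (colon == std::string::npos || colon == 0)
    return URLCheck::MissingScheme;

  for (std::size_t i = 0; i < colon; ++i) {
    unsigned char c = static_cast<unsigned char>(url[i]);
    if (!std::isalnum(c) && c != '+' && c != '-' && c != '.')
      return URLCheck::InvalidScheme;
  }

  for (unsigned char c : url) {
    if (c <= 0x20 || c == 0x7f)
      return URLCheck::InvalidCharacter;
  }
  return URLCheck::Ok;
}
```

With this subset, a string such as "http://example.com/a?b=1" passes, "example.com" is reported as missing a scheme, and "http://exa mple.com" is rejected because of the embedded space. A production check would need the full grammars from the RFCs.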

Source Code

Complete source to this series is available on github in my clang repository fork.

Next Up

I may explore compiling a release build and using the compiler within Xcode as an alternate.  Suggestions are welcome; message me on twitter, @stuartcarnie


Follow me on twitter, @stuartcarnie.

llvm / Clang hacking: Part 2

Part 2 in my N-part series on my exploration of hacking on the llvm and Clang (C-language) tool chain.


This post assumes you've successfully completed Part 1 of the series.  


By default, Clang presents a gcc-compatible command-line interface.  In most circumstances, this allows Clang to be a drop-in replacement for gcc for rapid testing and easier adoption.  When using the gcc interface, Clang spawns a new job to handle the compilation, which prevents debugging the various stages of the compilation process.  You can see this by running the following command:

clang -### test.m -o test

With that in mind, you should invoke Clang with the -cc1 option as the first argument, which directly executes the Clang cc1 tool.  TIP: running the following command outputs very useful command-line documentation:

clang -cc1 --help

Also noteworthy is the following command

clang -cc1as --help

which documents many of the arguments available to the Clang Integrated Assembler.

If you're using Xcode, you'll find main() in Clang executables » clang » Source Files » driver.cpp.  Notice in this folder you'll also find cc1_main.cpp, which is the entry point for the clang compiler.

Clang executable source files

To set these command-line options, select Product | Edit Scheme:

Edit Scheme

Note again that the first argument must be -cc1.  Next is the input file to compile, which you will substitute for your own source file.  Finally, you must specify the include path for your current clang headers, as this development build will not find them automatically.

With these set, you should be able to successfully run and debug the entire Clang tool chain.
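The reason -cc1 has to come first can be pictured as a simple dispatch on the first argument. The sketch below is a toy model of that idea only; EntryPoint and selectEntryPoint are hypothetical names, and the real logic in driver.cpp is more involved.

```cpp
#include <string>
#include <vector>

// The three ways this toy model can handle an invocation.
enum class EntryPoint { Driver, CC1, CC1As };

// Chooses an entry point from the arguments that follow the program name.
// Only the first argument is inspected, which is why -cc1 has no effect
// anywhere else on the command line in this model.
EntryPoint selectEntryPoint(const std::vector<std::string> &args) {
  if (!args.empty()) {
    if (args[0] == "-cc1")
      return EntryPoint::CC1;
    if (args[0] == "-cc1as")
      return EntryPoint::CC1As;
  }
  // Everything else goes through the gcc-compatible driver, which plans and
  // spawns the compilation jobs.
  return EntryPoint::Driver;
}
```

In this model, passing -cc1 first selects the compiler entry point directly in the current process, which is what makes it possible to step through the compilation stages in a debugger.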

Recommended Reading

Now that you have an environment and presumably can debug the compiler, I'd recommend you read the following articles for clarification on the design and internals of Clang:

Next Up

Next up I'll walk through creating a language extension to Objective-C, supporting NSURL literals, following (in principle) the new Objective-C Literals coming in the next release of Clang.



Saturday, June 02, 2012

llvm / Clang hacking: Part 1

Part 1 in my N-part series on my exploration of hacking on the llvm and Clang (C-language) tool chain.  I am running OS X 10.7; however, I will try to highlight the steps where you should consider substituting for your platform.

Getting Started

Follow these steps, with the following exceptions if you prefer git (it is a lot faster); I am using the official llvm mirror on

  • Step 2, replace the svn command with
    git clone
  • Step 3, replace the svn command with
    git clone
  • Step 4, replace the svn command with
    git clone
  • Step 5, I am using CMake, so instead of ../llvm/configure
    cmake -G "Unix Makefiles" ../llvm

Creating Xcode project for Clang

You could run

  • mkdir llvm
  • cd llvm
  • cmake -G Xcode ../llvm

to create an Xcode project for the entire llvm/Clang toolchain; however, it ends up being thousands of source files and 223 targets! As I'm only interested in hacking on Clang, let's generate a project file for Clang and related projects only. Starting from the folder which contains build and llvm:

  • mkdir clang
  • cd clang
  • cmake -G Xcode -DCLANG_PATH_TO_LLVM_SOURCE=../llvm -DCLANG_PATH_TO_LLVM_BUILD=../build -DCMAKE_BUILD_TYPE=Debug ../llvm/tools/clang

Assuming all went well, you'll now have an Xcode project called Clang.xcodeproj with about 400 source files and 60 targets. Open it up and let Xcode index everything, which may take a few minutes depending on your hardware.  Once completed, switch to the clang target and build!

Note: On Windows, it is likely cmake will auto-detect your Visual Studio environment and the above commands will Just Work™

Xcode clang target

Assuming all goes well, a few minutes later you should see:

Clang build Succeeded

Back in Terminal, you can run the following command, which creates a Hello World program and tests your build of Clang.

From the clang folder

  • cd bin/Debug
  • printf '#include <stdio.h>\nint main(int argc, char **argv) { printf("hello world\\n"); return 0; }\n' > hello.c && ./clang hello.c -o hello && ./hello

If you see hello world after running the second command, pat yourself on the back, as you've successfully set up a working llvm / Clang development environment to start your hacking.

Next Up

Part 2 and debugging Clang.

