Hi there!

Today I want to show you one of the projects I've been working on lately: nut-tree/nut.js

Put simply, nut.js (short for Native UI Toolkit) allows you to remote-control your mouse, your keyboard and your clipboard. Nothing new so far; there are quite a few packages which already provide this functionality.

The reason I started building nut.js is that none of the existing tools allowed me to steer my cursor based on images. One can do quite a lot with keyboard shortcuts, but, let's be honest, many applications are designed for mouse interaction. And doing this using only coordinates is doomed to be a PITA.

Since I did quite a lot of image processing and computer vision work back in university, I sat down one weekend and started tinkering with existing tools. What can I say, it worked out pretty well and I was able to draft a first prototype.

A few weeks went by and I kept working on my little side project while working full-time on a customer project. Things slowly started coming together and oh boy, working on a cross-platform native tool teaches you A LOT.

The Stack

nut.js is built using the current LTS version of node (at the time of writing: node 10, a.k.a. lts/dubnium), with support for node 12 (the next LTS version) right around the corner. I decided to use TypeScript because type safety is a cool thing to have :).

I'm currently only using Travis for CI, but I might add AppVeyor in the near future (more on that later). The CI build uses a VNC Docker container to run headless E2E tests with a defined UI state, a nice way to verify that everything works as expected.

SonarCloud provides some metrics and quality gates, Greenkeeper keeps my dependencies up to date.

All in all a pretty decent setup which is worth a separate post.

Going Native

nut.js makes heavy use of native addons, written both as classic Node.js C++ addons and, more recently, using N-API. Automating native keyboard and mouse control requires system-level API calls, something which is only possible via node C++ addons. The current release of nut.js uses a fork of octalmage/robotjs. I initially forked the repo because there was no robotjs release for node v10.x and no roadmap regarding upcoming releases. For the upcoming release of nut.js I ported this fork to N-API, which makes it easier to use with future node versions and allows me to extend it at my own pace.
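To give you an idea of what the JavaScript side of such an addon looks like, here's a minimal, purely illustrative sketch. The module name and loading mechanism are hypothetical (the bindings helper package is one common way to resolve compiled addons), and the exported functions follow the robotjs-style API of my fork:

"use strict";

// Hypothetical: resolve the compiled addon binary via the "bindings" package.
const native = require("bindings")("nutjs");

// The native layer exposes plain functions which perform the actual
// system-level API calls, e.g. moving the cursor:
native.moveMouse(100, 100);
console.log(native.getMousePos()); // { x: 100, y: 100 }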

For image based mouse movement I'm using opencv4nodejs. After comparing multiple node bindings for OpenCV I can safely say that this library provides the best and most up-to-date OpenCV bindings for node. Once again, I'm using my own fork here.

opencv4nodejs comes with a mechanism which builds OpenCV from scratch when installing the package and afterwards compiles the C++ addon.

While this approach might be the most flexible, installing the package takes 30+ minutes.

With opencv4nodejs-prebuilt I spent quite some time enabling a quick cross-platform installation. opencv4nodejs-prebuilt ships an opinionated, ready-to-use build for Windows, Linux and macOS, but can be re-compiled if required. The setup to achieve this consists of multiple packages:

  • Platform specific npm packages for Windows, Linux and macOS which ship pre-compiled OpenCV libs
  • Platform and node version specific pre-compiled OpenCV bindings using prebuild

After a little tweaking, each pre-built binding also ships the required OpenCV libs and can be installed from GitHub releases. prebuild-install tries to download the correct binding for a given platform + node version combination, so no compilation is required. If no suitable binding is available, or the downloaded binding fails a runtime check, a rebuild is triggered.
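In case you're wondering how this plugs together: prebuild-install typically hooks into the package's install script, falling back to a local build if no pre-built binding can be fetched. A simplified sketch (not the exact opencv4nodejs-prebuilt configuration):

{
  "scripts": {
    "install": "prebuild-install || node-gyp rebuild"
  }
}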

What's in the box?

nut.js exports objects to access certain OS functionality:

export {
  clipboard,
  keyboard,
  mouse,
  screen,
  ...
};

Most of these objects (except clipboard) hold a public config object which allows you to tweak certain parts like typing speed, mouse speed or paths to images for screen matching.
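Tweaking these configs looks like this; treat the exact values as illustrative:

"use strict";

const { keyboard, mouse, screen } = require("@nut-tree/nut-js");

keyboard.config.autoDelayMs = 50; // pause between keystrokes
mouse.config.mouseSpeed = 1000; // cursor speed in pixels per second
screen.config.resourceDirectory = "../../e2e/assets"; // where to load template images from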

Keyboard

A little demo showing the use of keyboard:

"use strict";

const { keyboard, Key, sleep } = require("@nut-tree/nut-js");

const openLauncher = async () => {
  await keyboard.type(Key.LeftAlt, Key.F2);
};

describe("Keyboard test", () => {
  it("should open Thunar", async () => {
    await openLauncher();
    await keyboard.type("thunar");
    await keyboard.type(Key.Return);
    await sleep(1000);
    await keyboard.type(Key.LeftAlt, Key.F4);
  });
});

As you can see, via keyboard it's possible to type either text, single keys or key combos.

Mouse

Mouse movement follows a simple pattern:

mouse.move(...);

takes a sequence of Point ({x, y}) coordinates which describe a path to follow. Additionally, nut.js exports high-level movement functions:

"use strict";

const { mouse, right, down, left, up } = require("@nut-tree/nut-js");

const square = async () => {
  await mouse.move(right(500));
  await mouse.move(down(500));
  await mouse.move(left(500));
  await mouse.move(up(500));
};

describe("Basic mouse test", () => {
    it("should move the mouse in square shape", async () => {
        jest.setTimeout(10000);
        await square();
    });
});
left(x), right(x), up(x) and down(x) each return a path of x pixels in the respective direction, relative to the current mouse position.
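And since mouse.move(...) simply consumes a list of Points, nothing stops you from rolling your own path. A minimal sketch (the coordinates are just for illustration):

"use strict";

const { mouse, Point } = require("@nut-tree/nut-js");

// Hand-rolled path: the mouse visits each Point in order.
const diagonal = async () => {
  await mouse.move([new Point(100, 100), new Point(150, 150), new Point(200, 200)]);
};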

Screen

When it comes to screenshot based mouse movement, this pattern extends even further:

"use strict";

const { screen, mouse, centerOf, straightTo } = require("@nut-tree/nut-js");

describe("Basic mouse test", () => {
    it("should move the mouse in square shape", async () => {
        jest.setTimeout(10000);
        screen.config.resourceDirectory = "../../e2e/assets";

        await mouse.move(straightTo(centerOf(screen.find("mouse.png"))));
    });
});

screen.config.resourceDirectory = "../../e2e/assets"; configures the path to load image files from.

Now, in order to move the mouse to the location of our template image on the screen, nut.js applies the following pattern (unrolled in the sketch after this list):

  1. screen.find("mouse.png"); returns a Region ({left, top, width, height}) object which holds the coordinates of our template image on our screen
  2. centerOf(x) returns the center Point p of a given Region x
  3. straightTo(p) calculates a straight path from our current mouse position to the given Point p
  4. mouse.move(...) follows this path as we have already seen before
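Written out step by step, the one-liner from above unrolls into the following sketch:

"use strict";

const { screen, mouse, centerOf, straightTo } = require("@nut-tree/nut-js");

const moveToTemplate = async () => {
  const match = await screen.find("mouse.png"); // 1. locate the template image
  const target = await centerOf(match); // 2. determine its center Point
  const path = await straightTo(target); // 3. compute a straight path towards it
  await mouse.move(path); // 4. follow that path
};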

When searching for images, nut.js takes care of multiple image scales as well as pixel densities. This makes image based movement robust against scaling and different display types.

Jest Matchers

All the sample snippets shown earlier are regular Jest tests. Jest uses "matchers" to assert on test values. While writing e2e tests for nut.js I got curious whether it would be possible to write my own matchers for use with nut.js. This would be a nice feature to verify the mouse position or whether the screen shows an image or not:

"use strict";

const { jestMatchers, mouse, screen, Point, Region } = require("@nut-tree/nut-js");

beforeAll(() => {
  expect.extend(jestMatchers);
});

describe("Basic test with custom Jest matchers", () => {
  it("should verify that cursor is at a certain position", async () => {
    const targetPoint = new Point(10, 10);
    const targetRegion = new Region(20, 20, 30, 30);

    await mouse.setPosition(targetPoint);
    await expect(mouse).toBeAt(targetPoint);
    await expect(mouse).not.toBeIn(targetRegion);
  });

  it("should verify that the screen shows a certain image", async () => {
    screen.config.resourceDirectory = "../../e2e/assets";

    await expect(screen).toShow("mouse.png");
  });
});

Extending Jest was easily possible thanks to its great documentation! :)
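In case you're curious what such a matcher boils down to: here's a rough sketch (not the actual nut.js implementation) of an async toBeAt matcher, built on Jest's expect.extend and the public mouse.getPosition() API:

"use strict";

expect.extend({
  async toBeAt(received, expected) {
    // "received" is the mouse object passed to expect(); query its
    // current position and compare it to the expected Point.
    const position = await received.getPosition();
    const pass = position.x === expected.x && position.y === expected.y;
    return {
      pass,
      message: () =>
        `Expected cursor ${pass ? "not " : ""}to be at (${expected.x}, ${expected.y}), but it is at (${position.x}, ${position.y})`,
    };
  },
});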

What's to come?

For future releases of nut.js I'm planning to add OCR support. The next release will use the latest OpenCV 4.x as well as libnut. I'm also looking into ways to provide cross-platform highlighting functionality, which would be useful for visual debugging.

If you have any questions or ideas for possible features, don't hesitate to open an issue! :)

So long

Simon