Async GraphQL with Rust: Part Four

(This article will be cross-posted to the Formidable Blog.)

Part 4: Unit & Integration Testing

Welcome back to my series covering Async GraphQL in Rust! This series is aimed at intermediate Rust developers who have grasped the basics and are ready to build production-ready server applications. Today's entry will cover unit and integration testing. Though Rust has a strict and sound type system, I believe that strong test coverage is still important to check my assumptions and prove that my code will work the way I think it will. It also serves as an example for maintainers who come along after you, showing how you intend your code to be used. Testing can be a challenge, but the confidence it can build as you make changes over time is well worth the effort.

Unit Testing

Unit testing is the validation of individual units of functionality by isolating those units from anything they depend upon outside themselves. This is typically achieved through mocking, which allows you to replace those dependencies with fake versions that track how they are called and return pre-programmed responses.

If you're coming to Rust from the JavaScript ecosystem, unit testing will feel a bit different. Because of the type system's strictness, mocking libraries must be proactive rather than reactive. In JavaScript, you can fill in the details of how a dependency should be called and what it returns later, or omit them altogether if you want. With the popular mockall library for Rust, however, you must define strict call and response expectations up front. As the documentation describes it, "Any accesses contrary to your expectations will cause a panic."

Integration Testing

There are various competing and sometimes overlapping definitions for things like "integration testing", "functional testing", "end-to-end testing", etc. I view integration testing as the next step up from unit testing, where you get rid of the internal mocks and allow your application to work together as a whole. The external integrations - things like HTTP endpoints, databases, search indexes, or 3rd party APIs - are either mocked or replaced with test-only versions that work well within a Docker Compose formation. This is to allow you to run integration tests the same way locally and within your CI build pipeline, so that you don't need to depend on any persistent deployed resources to run your integration tests. End-to-end tests come along after that, exercising your deployed resources (usually within the context of a client application) using realistic access patterns.

Integration testing comes with some extra concerns in Rust as well. You'll need to pick an async/threading model that works well with the libraries and tools you're using. Since my GraphQL server is based on Tokio, I'll use that for my tests as well. You'll then need to make sure you account for the resulting quirks based on that model, like Tokio's event loop stopping and starting in between tests. For my API, this meant that I couldn't reuse my Tokio-based database connection between tests - I needed to re-initialize it each time.
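Here's a minimal sketch of that pattern - assuming SeaORM and a DATABASE_URL environment variable, with the helper name made up for illustration:

use anyhow::Result;
use sea_orm::{Database, DatabaseConnection};

/// Create a fresh connection for each test, since anything bound to Tokio's
/// event loop is torn down when each #[tokio::test] runtime completes
async fn init_test_db() -> Result<DatabaseConnection> {
    let url = std::env::var("DATABASE_URL")?;

    Ok(Database::connect(url.as_str()).await?)
}

#[tokio::test]
async fn test_with_a_fresh_connection() -> Result<()> {
    let db = init_test_db().await?;

    // ... run queries against `db` for this test only

    Ok(())
}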

Project Organization

Before I dive into mocking dependencies, I first want to take a moment to talk about structuring your project to make testing easier. Rust's module system is rather strict, and it can introduce some difficulty around things like circular dependencies if you break your project up into too many crates. This is a mistake I made in the initial project layout I presented in the previous articles in this series.

Domains & Unit Tests

I'm a big fan of co-locating related concerns rather than separating them out by functional layer. This means keeping the models, services, resolvers, tests, etc. for a particular business logic domain together in the same place instead of keeping all the models together, all the services together, and so on. Things like HTTP endpoints or serverless functions act as entry points that make use of those domains. I took advantage of Rust workspaces to group related domains together into separate crates - keeping Users, Profiles, and Roles together in one crate, for example. Over time, this began to introduce cyclic dependencies between my internal crates.

To correct this problem, I merged all of my business logic crates together into a single crate called "domains", organized into folders using Rust submodules. Here's what the new structure looks like:

libs
├── ...
├── domains
│   ├── Cargo.toml
│   └── src
│       ├── episodes
│       │   ├── authorization.polar
│       │   ├── model.rs
│       │   ├── mutations.rs
│       │   ├── queries.rs
│       │   ├── resolver.rs
│       │   ├── service.rs
│       │   ├── tests
│       │   │   ├── resolver_test.rs
│       │   │   └── service_test.rs
│       │   └── tests.rs
│       ├── profiles
│       │   └── ..
│       ├── role_grants
│       │   └── ..
│       ├── shows
│       │   └── ..
│       ├── users
│       │   └── ..
│       ├── episodes.rs
│       ├── profiles.rs
│       ├── role_grants.rs
│       ├── shows.rs
│       ├── users.rs
│       └── lib.rs
└── ...

Each submodule consists of a module definition - like episodes.rs - and a folder with the same name containing the nested modules - like episodes/. I've separated my unit tests into an additional nested module defined in episodes/tests.rs, with the actual test modules going in the episodes/tests/ folder. With this structure, entities that depend on each other can exist without causing cyclic dependency problems.
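As a quick sketch of how the wiring looks (exact visibility and exports may differ in the repo), episodes.rs just declares the nested modules, and the tests module is only compiled for test builds:

// libs/domains/src/episodes.rs
pub mod model;
pub mod mutations;
pub mod queries;
pub mod resolver;
pub mod service;

#[cfg(test)]
mod tests;

// libs/domains/src/episodes/tests.rs
mod resolver_test;
mod service_test;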

Applications & Integration Tests

The entry points I mentioned earlier, such as HTTP endpoints or serverless functions, are separated into the apps folder rather than being placed in the libs folder. This is because each one of them represents a separate, independent process that makes use of the crates in libs. The entry points are easy to find, and they often correlate directly to your deployable resources. Each application process will typically have its own compiled binary, and its own container or serverless function.

These are typically the best hubs to orient your integration tests around. Each entry point represents one or more specific usages of your libs within a particular context. Each test will consist of some setup, a call with specific inputs, and a response with specific outputs. Your mocked external integrations typically function in ways that are similar to the actual services, so you can validate functionality by checking your responses.

My folder structure for integration tests looks like this:

apps
└── api
    ├── Cargo.toml
    ├── ...
    ├── src
    │   ├── bin
    │   │   └── ...
    │   ├── ...
    │   ├── lib.rs
    │   └── main.rs
    └── tests
        ├── test_episode_integration.rs
        ├── test_profile_integration.rs
        ├── test_show_integration.rs
        ├── test_user_integration.rs
        └── test_utils.rs

The tests folder within a crate is recognized by Cargo, and each module within it is compiled as a separate crate with access only to the public exports of the api parent crate. By default these tests are executed alongside your unit tests, but since these are integration tests that depend on a running Docker Compose formation, I add the #[ignore] attribute to each test to skip them during unit test execution. In my Makefile.toml (which is used by cargo-make), I have a task that sets up the integration test context I need and then uses the --ignored flag when running the cargo test command. This executes just my ignored integration tests.
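In practice, each integration test simply carries both attributes - a quick sketch, with the function name made up for illustration:

/// Skipped during the normal `cargo test` run; the cargo-make integration task
/// runs `cargo test -- --ignored` against a running Docker Compose formation
#[tokio::test]
#[ignore]
async fn test_user_integration() -> anyhow::Result<()> {
    // ...

    Ok(())
}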

Test Data Factories

Creating fake data for testing is often done ad-hoc inside each test, in whatever way was convenient at the time the test was written. This inconsistency can cause headaches down the line when you or other maintainers want to make changes.

To make it easier to stay consistent, I like to use the fake library in Rust. It allows you to derive a "Dummy" test factory with random properties, which you can then make use of in your unit and integration tests:

let mut show_input: CreateShowInput = Faker.fake();
show_input.title = "Test Show".to_string();
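For this to work, the input type derives Dummy alongside its other traits, which requires the fake crate's "derive" feature. A simplified sketch - the real CreateShowInput has its own set of fields:

use fake::Dummy;

/// A simplified stand-in for the real input type
#[derive(Clone, Debug, Dummy)]
pub struct CreateShowInput {
    pub title: String,
    pub summary: Option<String>,
    pub picture: Option<String>,
}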

In unit tests, it's easy enough to modify these fakes after creation to create the conditions of the specific scenario you want to test. In integration tests, you'll often build up app-specific test utils for developer convenience that use these dummy data structures to create objects with consistent relationships in your data store.

If you view the apps/api/tests/test_utils.rs module inside the example GitHub repository, you'll see utility methods like TestUtils::create_user_and_profile() that use this strategy:

    /// Create a User and Profile together
    pub async fn create_user_and_profile(
        &self,
        username: &str,
        email: &str,
    ) -> Result<(User, Profile)> {
        let user = self.ctx.users.create(username).await?;

        let mut profile_input: CreateProfileInput = Faker.fake();
        profile_input.user_id = user.id.clone();
        profile_input.email = email.to_string();

        let profile = self.ctx.profiles.create(&profile_input, &false).await?;

        Ok((user, profile))
    }

This example uses the UsersService and ProfilesService instances that are initialized within the init() method at the top of the file and placed within the caster_api::Context instance at self.ctx.

Mocking Dependencies

To replace dependencies with mock objects, I use the mockall library. It can handle a wide variety of different scenarios, as outlined in the user guide within the documentation. How you use the library will vary somewhat depending on whether you're mocking a struct or a trait, whether you're using generic types or lifetimes, and whether it's internal to your codebase or in an external 3rd-party crate.

As mentioned above, mocking in a statically typed language like Rust (and in similar languages, like Go) is typically more proactive and strict than in dynamic languages like JavaScript. You'll need to provide expectations and return values for each call, and if the call never happens or the arguments don't match exactly, your test will panic.

Internal Code

For things that you control within your codebase, it's generally easy to automatically generate mocks. In my example repo within the libs/domains crate, I define traits for each of my Services so that I can replace them with mocks inside my unit tests. For example, the EpisodesService trait in libs/domains/src/episodes/service.rs defines methods for all of the operations you can perform for the Episode entity. The DefaultEpisodesService implementation defines a db property to hold the database connection, and its methods use that connection to complete the requested operations.
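The implementation side is shaped roughly like this - a sketch based on how it's used in the tests below, rather than the repo's exact code:

use std::sync::Arc;

use sea_orm::DatabaseConnection;

/// The default implementation holds a reference-counted database connection
pub struct DefaultEpisodesService {
    db: Arc<DatabaseConnection>,
}

impl DefaultEpisodesService {
    /// Create a new service instance from a shared connection
    pub fn new(db: &Arc<DatabaseConnection>) -> Self {
        Self { db: db.clone() }
    }
}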

To mock this behavior for unit testing, such as when you want to test the Resolvers that use each Service in isolation, I add the following attribute to derive an automock:

/// An EpisodesService applies business logic to a dynamic EpisodesRepository implementation.
#[cfg_attr(test, automock)]
#[async_trait]
pub trait EpisodesService: Sync + Send {
    // ...
}

I then use this in the libs/domains/src/episodes/tests/resolver_test.rs module to define expectations for EpisodesService behavior when testing the Resolver:

    let mut service = MockEpisodesService::new();
    service
        .expect_get()
        .with(eq(episode_id), eq(&true))
        .times(1)
        .returning(move |_, _| Ok(Some(episode.clone())));

External Dependencies

External libraries require a little more effort to mock, because you need to manually specify the types for each method. For an example, take a look at the libs/testing/src/oso.rs module. Here I use the mock!() macro to define a MockOso struct, spelling out the type signature for the is_allowed() method.
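The mock!() invocation is shaped roughly like this - a sketch of the idea, with 'static bounds on the generic parameters so that mockall can store expectations for them; see the repo for the full version:

use mockall::mock;
use oso::OsoError;

mock! {
    pub Oso {
        // Mirrors Oso::is_allowed(), which checks whether an actor may perform
        // an action on a resource
        fn is_allowed<Actor: 'static, Action: 'static, Resource: 'static>(
            &self,
            actor: Actor,
            action: Action,
            resource: Resource,
        ) -> Result<bool, OsoError>;
    }
}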

This can then be used later just like the automock above:

    let mut oso = MockOso::new();
    oso
        .expect_is_allowed()
        .returning(
        |_: User, _: &str, _: Episode| -> Result<bool, oso::OsoError> {
                Ok(true)
            }
        );

The problem you'll face with this, however, is that the code you are testing won't accept a MockOso where it expects an Oso, and your test will fail to compile. To solve this, I use the #[cfg()] attribute to swap Oso out for MockOso only when building for tests:

#[cfg(not(test))]
use oso::Oso;
#[cfg(test)]
use caster_testing::oso::MockOso as Oso;

Unit Tests

Now we can put everything together and write some unit tests! The goal of unit testing is to isolate individual units of functionality by mocking any outside dependencies. These outside dependencies could be external libraries, or they could be internal services or utilities. Rather than allowing code execution to flow through to the actual implementations of these dependencies, we want to provide pre-programmed "mocked" versions that behave the way we expect these dependencies to. This eliminates them as a possible cause of failure during a test. If the test fails, it must be because of something within the unit of functionality that you are testing.

These tests are valuable because it is much easier to arrange the exact initial conditions that you need to test a particular scenario. Integration tests are arguably more valuable than unit tests, but the harder they are to set up for each edge case and the harder they are to write and maintain, the more fragile and less valuable they become. Unit tests allow you to test every permutation of your code's logic much more easily, covering many cases that would be quite difficult to reproduce in an integration test. In addition, they run faster and require no outside infrastructure, making them ideal for quick CI checks against every commit in a repository.

Service Tests

The demo API on GitHub uses "Services" to refer to the code that lives in between the entry points and the data access code. For the Caster application, the entry point is a GraphQL endpoint that supports a variety of operations via Resolvers. The data access code is handled by the SeaORM library, which acts as an interface to a PostgreSQL database via Models. The Service code lives in between the Resolvers and the Models to manage business logic.

In my libs/domains/src/episodes/tests/service_test.rs module, each test typically begins by creating a mocked database connection (conveniently provided by SeaORM) and loading it with pre-programmed query results.

#[tokio::test]
async fn test_episodes_service_get() -> Result<()> {
    let mut show: Show = Faker.fake();
    show.title = "Test Show".to_string();

    let mut episode: Episode = Faker.fake();
    episode.title = "Test Episode".to_string();
    episode.show = None;

    let db = Arc::new(
        MockDatabase::new(DatabaseBackend::Postgres)
            .append_query_results(vec![vec![episode.clone()]])
            .into_connection(),
    );

    // ...

    Ok(())
}

Then, I initialize the DefaultEpisodesService with the mocked db connection, and call a Service method to perform an operation.

    let service = DefaultEpisodesService::new(&db);

    let result = service.get(&episode.id, &false).await?;

Now comes a tricky part - I need to drop() the Service instance so that its reference to my database connection (tracked by an Arc) is released. The reason for this is that I then call Arc::try_unwrap(db), which attempts to extract the connection from the Arc and return ownership to the test function. That, in turn, lets me call db.into_transaction_log(), which consumes the connection and transforms it into a transaction log. If there are any references still tracked by the Arc, the connection cannot be moved out of it, and the operation fails at runtime with a Result::Err.

    // Destroy the service to clean up the reference count
    drop(service);

    let db = Arc::try_unwrap(db).expect("Unable to unwrap the DatabaseConnection");

Now I can finally check my expectations! I use the trusty assert_eq!(), importing pretty_assertions::assert_eq for much more readable comparison output.

    assert_eq!(result, Some(episode.clone()));

    // Check the transaction log
    assert_eq!(
        db.into_transaction_log(),
        vec![Transaction::from_sql_and_values(
            DatabaseBackend::Postgres,
            r#"SELECT "episodes"."id", "episodes"."created_at", "episodes"."updated_at", "episodes"."title", "episodes"."summary", "episodes"."picture", "episodes"."show_id" FROM "episodes" WHERE "episodes"."id" = $1 LIMIT $2"#,
            vec![episode.id.into(), 1u64.into()]
        )]
    );

This allows me to check the exact SQL queries generated by my operations without relying on a Postgres database connection somewhere.

Resolver Tests

I test my Async GraphQL resolvers by initializing a Schema with dummy requirements. In libs/domains/src/episodes/tests/resolver_test.rs, I first set up a GraphQL query definition:

/***
 * Query: `getEpisode`
 */

const GET_EPISODE: &str = "
    query GetEpisode($id: ID!) {
        getEpisode(id: $id) {
            id
            title
            summary
            picture
            show {
                id
            }
        }
    }
";

In each test, I use the automock-generated MockEpisodesService to set up a preprogrammed call expectation and response:

#[tokio::test]
async fn test_episodes_resolver_get_simple() -> Result<()> {
    let episode_id = "Test Episode";
    let episode_title = "Test Episode 1";

    let mut episode: Episode = Faker.fake();
    episode.id = episode_id.to_string();
    episode.title = episode_title.to_string();

    let mut service = MockEpisodesService::new();
    service
        .expect_get()
        .with(eq(episode_id), eq(&true))
        .times(1)
        .returning(move |_, _| Ok(Some(episode.clone())));
    
    // ...
    
    Ok(())
}

Then I initialize the Schema and execute the query, optionally providing variables:

    let schema = init(service);

    let result = schema
        .execute(
            Request::new(GET_EPISODE).variables(Variables::from_json(json!({ "id": episode_id }))),
        )
        .await;

Finally, I can parse the result and check expectations:

    let data = result.data.into_json()?;
    let json_episode = &data["getEpisode"];

    assert_eq!(json_episode["id"], episode_id);
    assert_eq!(json_episode["title"], episode_title);
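For reference, the init() helper used above can be as small as building a Schema with the mocked service injected as context data. Here's a sketch, where EpisodesQuery stands in for the real resolver type and the resolver is assumed to read the service back out of the context:

use std::sync::Arc;

use async_graphql::{EmptyMutation, EmptySubscription, Schema};

fn init(service: MockEpisodesService) -> Schema<EpisodesQuery, EmptyMutation, EmptySubscription> {
    // Coerce the mock into the trait object the resolver expects
    let service: Arc<dyn EpisodesService> = Arc::new(service);

    Schema::build(EpisodesQuery::default(), EmptyMutation, EmptySubscription)
        .data(service)
        .finish()
}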

Integration Tests

When you're ready to put things together and test your components in concert, you can go in several different directions. You may go straight to End-to-End testing along with a client application, using a testing tool designed for something like web or native mobile applications. You may write automated integration tests that aren't in the context of a particular application but exercise your API in a realistic way by targeting deployed resources in a dev or staging environment. Those tests can be challenging to write and maintain, and they're often fragile for a variety of reasons - outages in external integrations or 3rd-party APIs, or conflicts when two developers try to integration test different branches at the same time.

Using Docker Compose and tools like LocalStack and the official Postgres image, you can write integration tests that don't require deployed resources in order to function. Everything from database queries to SQS or Kinesis events to Lambda function invocations and beyond can be tested both on your local machine and within a CI pipeline context without needing to reach out to any deployed resources. This strategy gives you the independence to run many integration tests at the same time for as many different branches as you need, without collisions or lengthy queues.

GraphQL Queries & Mutations

I put integration tests targeting GraphQL calls via HTTP in the apps/api crate. They are specific to this entry point, and any other usages of these domains in a different context should be integration tested separately within their respective apps.

Similarly to my unit tests, I first define a GraphQL query document:

const CREATE_EPISODE: &str = "
    mutation CreateEpisode($input: CreateEpisodeInput!) {
        createEpisode(input: $input) {
            episode {
                id
                title
                summary
                picture
                show {
                    id
                }
            }
        }
    }
";

Then, I initialize my TestUtils, which is a module that lives alongside my integration tests for the api application:


/// Test creation of a new Episode
#[tokio::test]
#[ignore]
async fn test_create_episode() -> Result<()> {
    let utils = TestUtils::init().await?;
    let ctx = utils.ctx.clone();
    
    // ...

    Ok(())
}

The apps/api/tests/test_utils.rs module holds some very useful tools. The run_server() function uses tokio::spawn() to kick off the HTTP server in the background:

pub async fn run_server(context: Arc<Context>) -> Result<SocketAddr> {
    let (addr, server) = run(context).await?;

    // Spawn the server in the background
    tokio::spawn(server);

    // Wait for it to initialize
    sleep(Duration::from_millis(200)).await;

    // Return the bound address
    Ok(addr)
}

Then, I provide the async init() function that you see me use at the top of test_create_episode(). This sets up LocalStack-compatible versions of each dependency, returning this struct as a result:

/// Common test utils
pub struct TestUtils {
    pub http_client: &'static Client<HttpsConnector<HttpConnector>>,
    pub oauth: &'static OAuth2Utils,
    pub graphql: GraphQL,
    pub ctx: Arc<Context>,
}

Next, I use the OAuth2 testing utils to generate a JWT token for my test user:

    let Credentials {
        access_token: token,
        username,
        ..
    } = utils.oauth.get_credentials(TestUser::Test).await;

Then I create a User and a Show, and grant some Roles to the User for the Show:

    // Create a user and a show
    let user = ctx.users.create(username).await?;

    let mut show_input: CreateShowInput = Faker.fake();
    show_input.title = "Test Show".to_string();

    let show = ctx.shows.create(&show_input).await?;

    // Grant the manager role to this user for this episode's show
    ctx.role_grants
        .create(&CreateRoleGrantInput {
            role_key: "manager".to_string(),
            user_id: user.id.clone(),
            resource_table: "shows".to_string(),
            resource_id: show.id.clone(),
        })
        .await?;

I use the GraphQL testing utils to send a GraphQL query to my server running in the background:

    let req = utils.graphql.query(
        CREATE_EPISODE,
        json!({
            "input": {
                "title": "Test Episode 1",
                "showId": show.id.clone(),
            }
        }),
        Some(token),
    )?;

    let resp = utils.http_client.request(req).await?;
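Under the hood, a helper like utils.graphql.query() just builds a POST request with the query and variables as a JSON body and the token as a bearer Authorization header. Here's a sketch of that idea (assuming hyper, with a made-up function name and the endpoint URL passed in explicitly):

use anyhow::Result;
use hyper::header::{AUTHORIZATION, CONTENT_TYPE};
use hyper::{Body, Request};
use serde_json::{json, Value};

/// Build a GraphQL POST request with an optional bearer token
pub fn graphql_request(
    url: &str,
    query: &str,
    variables: Value,
    token: Option<&str>,
) -> Result<Request<Body>> {
    let mut builder = Request::post(url).header(CONTENT_TYPE, "application/json");

    if let Some(token) = token {
        builder = builder.header(AUTHORIZATION, format!("Bearer {}", token));
    }

    let body = serde_json::to_vec(&json!({ "query": query, "variables": variables }))?;

    Ok(builder.body(Body::from(body))?)
}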

And now I'm ready to check expectations!

    let status = resp.status();

    let body = to_bytes(resp.into_body()).await?;
    let json: Value = serde_json::from_slice(&body)?;

    let json_episode = &json["data"]["createEpisode"]["episode"];
    let json_show = &json_episode["show"];

    assert_eq!(status, 200);
    assert_eq!(json_episode["title"], "Test Episode 1");
    assert_eq!(json_show["id"], show.id.clone());

Now, before I get to cleaning up after myself for this test, I take a moment to consider whether I need to catch failures. If a test is particularly error-prone and you don't want a failed assertion to skip the cleanup, you can use panic::catch_unwind() to run the cleanup regardless of whether your expectations pass or fail:

    let result = panic::catch_unwind(|| {
        block_on(async {
            let json: Value = serde_json::from_slice(&body)?;

            let json_episode = &json["data"]["createEpisode"]["episode"];
            let json_show = &json_episode["show"];

            assert_eq!(status, 200);
            assert_eq!(json_episode["title"], "Test Episode 1");
            assert_eq!(json_show["id"], show.id.clone());

            Ok(()) as Result<()>
        })
    });

Either way, you clean up by deleting the test instances you created so that you don't collide with other tests:

    // Clean up
    ctx.users.delete(&user.id).await?;
    ctx.episodes
        .delete(json_episode["id"].as_str().unwrap())
        .await?;
    ctx.shows.delete(&show.id).await?;

If you used panic::catch_unwind() above, you can then resume the panic after you're done with cleanup:

    if let Err(err) = result {
        panic::resume_unwind(err);
    }

Next Time

With this strategy, you can do a great deal of validation without ever needing to target deployed resources. These tests can run quickly without fear of colliding with other developers testing their own branches, and they can provide a great backbone for test-driven development. This allows you to focus on just the most critical paths for your more expensive and hard-to-maintain End-to-End tests.

Next time, I'll cover WebSockets and two-way event-driven interfaces - not just one-way Subscription functionality, though Async GraphQL provides that feature if you need it. With the open connections and push capabilities that WebSockets provide, you can power highly dynamic "realtime" event systems. After that, I'll finish up my series with a guide for deploying to an environment based on Docker containers, including a CI pipeline based on GitHub Actions.

I hope you're enjoying my series and you're excited to begin building high-performance GraphQL applications with the sound type safety and memory protection that Rust is famous for! You can find me at Formidable Labs, and if you're interested in potential contract work I'd love to set up a conversation with our team! You can also find me on Twitter @bkonkle, and on Discord at bkonkle#0217. I lurk around a few Rust and TypeScript Discord servers, and I'd love to hear from you! Thanks for reading!