Create Massive Amounts of Fake Data Using GraphQL Schemas
Have you ever found yourself in need of fake user profiles for testing your app? Perhaps you’re racing
against the clock in a hackathon, striving to develop a proof of concept without the necessary data for a demo.
Enter gqlfake.
gqlfake is your command-line companion, simplifying the creation of structured, synthetic data using GraphQL schemas to define fields and data types.
Installation
To install and use gqlfake, you must have Node.js installed. We can install gqlfake with npm:
npm install gqlfake --location=global
This command installs gqlfake globally so you can access the CLI tool in the terminal from any path.
Now that we have gqlfake installed, let's take a look at a quick example of how to use it.
Generating Shaped Fake Data
Say we have a GraphQL schema file titled schema.graphql with the following content:
type User {
name: String
avatar_url: String
}
The above schema defines a User type with specific attributes. Now, let's see what we'd have to do if we wanted to generate 100 such fake User objects.
Adding Directives to the Schema
To let gqlfake know what kind of data to generate for each field, we'll have to use directives and a little FakerJS magic.
Edit schema.graphql to contain the following:
type User {
name: String @generate(code: "return faker.person.fullName()")
avatar_url: String @generate(code: "return faker.internet.avatar()")
}
Here, we attach the @generate directive to both fields and pass in the code argument. The code argument is a string containing any valid JavaScript. It tells gqlfake what kind of data to populate each field with (for example, you wouldn't want the name field to be populated by an email address, so we use the code argument to specify exactly what kind of data each field should hold).
Now that we've set up our schema file correctly, we can use gqlfake to generate fake but realistic data. The gqlfake generate command allows us to do this:
gqlfake generate --schema-path ./schema.graphql --num-documents 100
This creates a JSON file containing 100 User objects. The file is stored in a newly created directory titled datagen (if you run this command twice, the datagen directory won't be deleted, but the JSON file will be overwritten with new fake data).
INFO: --schema-path can be abbreviated to -s and --num-documents to -n when passing in command line arguments to gqlfake.
Here is an example of what this file will look like:
[
{
"name": "Beverly Block",
"avatar_url": "https://cloudflare-ipfs.com/ipfs/Qmd3W5DuhgHirLHGVixi6V76LhCkZUz6pnFt5AJBiyvHye/avatar/624.jpg"
},
{
"name": "Wilson Zulauf",
"avatar_url": "https://cloudflare-ipfs.com/ipfs/Qmd3W5DuhgHirLHGVixi6V76LhCkZUz6pnFt5AJBiyvHye/avatar/684.jpg"
},
{
"name": "Kirk Kris",
"avatar_url": "https://cloudflare-ipfs.com/ipfs/Qmd3W5DuhgHirLHGVixi6V76LhCkZUz6pnFt5AJBiyvHye/avatar/866.jpg"
},
...97 more
]
Sharing Variables
You can also define and share code snippets across multiple fields and types. Let's take a look at an example with our User type. Modify your schema.graphql file to contain the following:
type User {
firstName: String
@generate(
code: """
firstNameOfUser = faker.person.firstName()
return firstNameOfUser
"""
)
lastName: String
@generate(
code: """
lastNameOfUser = faker.person.lastName()
return lastNameOfUser
"""
)
emailID: String
@generate(
code: """
return faker.internet.email({
firstName: firstNameOfUser,
lastName: lastNameOfUser
})
"""
)
}
In the above example, we generate a fake firstName and lastName for each User. We also store these values in variables called firstNameOfUser and lastNameOfUser. This allows us to use the generated first and last names in the emailID field, where we produce a realistic email address from the two pieces of data.
CAUTION: Do not use const, let, or var when defining variables you intend to share. If you use these variable keywords, the variables you define will only be usable within that specific code snippet.
Compiling Generated Data Into One File
There may be cases where you want to compile all the generated data into one file.
This is especially useful if you want to, for example, start up a fake API
with tools such as json-server.
We can compile data generated by the gqlfake generate command using the gqlfake compile command. Let's say our schema.graphql contains the following:
type Book {
id: ID!
title: String!
authorID: ID!
publicationYear: Int!
}
type Movie {
id: ID!
title: String!
directorID: ID!
releaseYear: Int!
genre: String!
}
We can now run the gqlfake generate command with the following options:
gqlfake generate --schema-path ./schema.graphql --num-documents 2
We get two resulting JSON files, both in the datagen directory:
Book.json:
[
{
"id": "0bbf3f82-794e-4f05-bf30-9f269683c5a1",
"title": "odit sint veniam",
"authorID": "085d8b32-b0f9-4883-aef7-f16233c6a235",
"publicationYear": 1987
},
{
"id": "b44d9e5e-17e4-43e6-ad66-8d4ad9b305aa",
"title": "perspiciatis magnam ea",
"authorID": "9b7a9732-6a42-45ef-9948-51632fafeb7a",
"publicationYear": 1987
}
]
Movie.json:
[
{
"id": "640a0d20-3e37-477e-88f9-0ecbd22b8176",
"title": "repudiandae id eius",
"directorID": "3b67a142-29bb-4a5a-bdab-bbb6bae7d118",
"releaseYear": 1987,
"genre": "exercitationem repellat"
},
{
"id": "c6711541-360f-470d-8b89-60f5a7ad8700",
"title": "repudiandae placeat voluptates",
"directorID": "f8643b5e-c464-4ebc-9848-29661418aa77",
"releaseYear": 1987,
"genre": "quibusdam accusamus"
}
]
Let us now compile all the generated data into one file with gqlfake compile. Run the following command:
gqlfake compile --output-path ./data.json
Executing this command creates a data.json file containing the following:
{
"books": [
{
"id": "0bbf3f82-794e-4f05-bf30-9f269683c5a1",
"title": "odit sint veniam",
"authorID": "085d8b32-b0f9-4883-aef7-f16233c6a235",
"publicationYear": 1987
},
{
"id": "b44d9e5e-17e4-43e6-ad66-8d4ad9b305aa",
"title": "perspiciatis magnam ea",
"authorID": "9b7a9732-6a42-45ef-9948-51632fafeb7a",
"publicationYear": 1987
}
],
"movies": [
{
"id": "640a0d20-3e37-477e-88f9-0ecbd22b8176",
"title": "repudiandae id eius",
"directorID": "3b67a142-29bb-4a5a-bdab-bbb6bae7d118",
"releaseYear": 1987,
"genre": "exercitationem repellat"
},
{
"id": "c6711541-360f-470d-8b89-60f5a7ad8700",
"title": "repudiandae placeat voluptates",
"directorID": "f8643b5e-c464-4ebc-9848-29661418aa77",
"releaseYear": 1987,
"genre": "quibusdam accusamus"
}
]
}
We can now use json-server to serve up a mock API using data.json. Install json-server with:
npm install json-server --location=global
We can start the server with:
json-server ./data.json --watch
This mock API can now be used by, for example, your frontend code to display the generated data.
Executing Initial Code per Type
There may be cases where you want to execute some initial code in each type before the data for each field is generated.
Here's an example where we want the id of each User object to be incremented every time one is generated.
schema.graphql:
type User
@init(
code: """
count = 0
"""
) {
id: Int
@generate(
code: """
count += 1
return count
"""
)
fullName: String @generate(code: "return faker.person.fullName()")
}
The above example uses the @init directive on the User type to initialize the variable count to 0. (Notice how we don't use any variable initialization keywords like const, let, or var, because we want the count variable to be accessible in the different fields.) After this, every time an id is generated, we run code that increments the count by 1 and returns its value.
When gqlfake generate is run with the appropriate options, the following JSON file is generated:
[
{
"id": 1,
"fullName": "Elizabeth Ankunding"
},
{
"id": 2,
"fullName": "Ashley Stehr"
},
{
"id": 3,
"fullName": "Rosalie Kessler"
}
...
]
As you can see, the id field is incremented on the generation of each User object.
Using External Dependencies and Libraries
You may also want to use external dependencies in the code you write within @generate directives. In that case, you can point gqlfake to a JavaScript file which exports your dependencies. To do this, we first need to create a JavaScript file that imports the necessary dependencies and exports them at the bottom of the file.
myDependencies.js:
const axios = require('axios')
// Export the required dependencies
module.exports = {
axios: axios
}
Now that we've exported our dependencies, we can use them in our GraphQL schema's @generate directives:
type User {
fullName: String @generate(code: "return faker.person.fullName()")
favoriteQuote: String
@generate(
code: """
const response = await axios.get('https://api.quotable.io/random')
return response.data.content
"""
)
}
In the above schema, we use axios to fetch a random quote and set it as the favoriteQuote for a User. gqlfake allows you to use top-level await, so you don't have to create an async function to use the await keyword.
To generate our data, we use the gqlfake generate command with the --dependency-script option:
gqlfake generate -s ./schema.graphql -n 3 --dependency-script ./myDependencies.js
gqlfake imports the dependencies exported by the file pointed to by --dependency-script so they can be used in your GraphQL schema. The above command generates a file called User.json with the following content:
[
{
"fullName": "Jimmie Gleason",
"favoriteQuote": "The highest stage in moral culture at which we can arrive is when we recognize that we ought to control our thoughts."
},
{
"fullName": "Jody Rogahn II",
"favoriteQuote": "If you accept the expectations of others, especially negative ones, then you never will change the outcome."
},
{
"fullName": "Dr. Jody Thompson",
"favoriteQuote": "I love wisdom. And you can never be great at anything unless you love it. Not be in love with it, but love the thing, admire the thing. And it seems that if you love the thing, and you don't just want to possess it, it will find you."
}
]
The favoriteQuote property for each User object was fetched using axios from api.quotable.io.
Exporting Data to Cloud Databases
Exporting data generated by gqlfake to cloud databases can be useful if you are trying to mock features that require web services (CRON jobs, Cloud Functions, etc.).
Supported Cloud Databases
Google Cloud Firestore
Exporting to Cloud Firestore is as easy as running a single command:
gqlfake export-firestore --keypath ./serviceAccountKey.json
INFO: Note that --keypath is a required option that can be abbreviated to -k. It points gqlfake to your service account key file. A service account key is a form of credential that allows access to your web resources from any environment. Google allows you to generate such keys for your Google Cloud Projects. Learn more about service account keys and how to generate them here.
Conclusion
gqlfake is a powerful tool for generating massive amounts of fake data using GraphQL schemas. Whether you're in a hackathon crunch or need synthetic data for testing and development, gqlfake simplifies the process. By combining GraphQL schemas, directives, and the flexibility of JavaScript code snippets, you can generate structured and realistic data.