Setting Up a Vector Database for a New Project Without Getting Lost in Account Screens

Starting a new project with a vector database should be simple. In practice, it rarely is. You click through a signup form, verify an email, land on a dashboard, then spend the next twenty minutes hunting for the API key, the right environment name, the console URL, and whatever the platform calls its main workspace. By the time you have written a single line of code, you have already made four navigational decisions that had nothing to do with your actual project.

This article is about cutting through that. It explains how a vector database project gets set up, what the common friction points are, and how to start building immediately without getting tangled in account infrastructure — using Weaviate running locally as the concrete example.

Why Setup Feels Complicated

Most vector database platforms are built for production teams. Their dashboards reflect that. You see organization settings, billing panels, region selectors, project namespaces, and API credential screens before you have done anything useful. That is not bad design for an enterprise user who needs those controls. But for someone who just wants to build something and understand how vector search actually works, the layered account structure creates real friction.

The other problem is terminology. Platforms use terms like console, workspace, app, dashboard, and portal interchangeably. One platform’s console is another’s app. One’s workspace is another’s project. When you search for help and the answer refers to “the console” but your screen says “app.io,” it is easy to assume you are in the wrong place.

The path out of this is not learning the specific vocabulary of whichever platform you picked. It is starting somewhere that does not require any of it.

The Simplest Possible Starting Point

When you are beginning a new vector database project, you do not need a cloud account. You need three things:

A running vector database instance you can send requests to
A client library that can talk to it
A collection (or index) to store your data in

Everything else — organizations, API keys, billing, region selection — is real infrastructure that matters at scale, but has no bearing on whether your project works at the prototype stage. If you start locally, you skip all of that and get directly to the part where you learn something.

Running locally also means you can move as fast as your code allows. No rate limits on a free tier, no surprises about what features are gated behind a paid plan, and no debugging sessions that turn out to be credential configuration problems rather than actual code bugs.

What Happens During a Typical Cloud Setup

Even if you plan to move to a managed cloud service eventually, understanding the typical account flow helps you recognize which steps actually matter.

Most platforms follow a pattern like this: create an account, verify your email, land on a dashboard, create a project or workspace, then create an index or collection inside that workspace. The API key is usually tied to the project level, sometimes to a specific environment within the project, and occasionally to both. If you pair a key with the wrong environment, you usually get an authentication error that is indistinguishable from the one a mistyped key produces.
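One way to make the key-and-environment pairing harder to get wrong is to load both from the same place and fail early with a message that names what is missing. The sketch below is only an illustration: the VECTOR_DB_URL_* and VECTOR_DB_API_KEY_* variable names, the VectorDbSettings class, and load_settings are all hypothetical and do not come from any particular platform.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorDbSettings:
    """URL and API key that belong to one named environment."""
    environment: str
    url: str
    api_key: str


def load_settings(environment, env=os.environ):
    """Read the URL and key for one environment, failing loudly if either is missing."""
    suffix = environment.upper()
    # Hypothetical naming scheme: one pair of variables per environment.
    url_name = f"VECTOR_DB_URL_{suffix}"
    key_name = f"VECTOR_DB_API_KEY_{suffix}"
    missing = [name for name in (url_name, key_name) if not env.get(name)]
    if missing:
        raise LookupError(
            f"Missing settings for environment {environment!r}: {', '.join(missing)}"
        )
    return VectorDbSettings(environment, env[url_name], env[key_name])
```

Because the URL and the key are resolved from the same environment name, a "dev" key cannot silently end up paired with a "prod" endpoint, and a missing variable is reported by name before any request is sent.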

The console or app is usually a browser-based interface that lets you inspect your indexes, run queries, and manage settings. It is useful once you have real data in it. Before that, it is mostly another screen to navigate.

None of this is hard, but it is all friction that competes with the actual work of building something.

Starting Locally with Weaviate

Weaviate can run entirely in Docker. You do not need an account, an API key, or a cloud workspace. You run two containers, one for Weaviate itself and one for Ollama, which serves the embedding model locally, and you have a fully functional vector database accessible at http://localhost:8080.

Here is the docker-compose.yml that gets everything running:

services:
  weaviate:
    command:
    - --host
    - 0.0.0.0
    - --port
    - '8080'
    - --scheme
    - http
    image: cr.weaviate.io/semitechnologies/weaviate:1.37.2
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    restart: on-failure:0
    environment:
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_MODULES: 'text2vec-ollama,generative-ollama'
      CLUSTER_HOSTNAME: 'node1'
      OLLAMA_API_ENDPOINT: 'http://ollama:11434'
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:0.12.9
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

volumes:
  weaviate_data:
  ollama_data:

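Save that file in your project directory and run:

docker compose up -d

The -d flag runs the containers in the background. (If you only have the older standalone Compose binary installed, the equivalent command is docker-compose up -d.) Once they start, Weaviate is live at http://localhost:8080 and Ollama is available at http://localhost:11434. The Ollama container starts with no models downloaded, so pull the embedding model used later in this article before importing any data, for example with docker compose exec ollama ollama pull nomic-embed-text.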
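The containers can take a few seconds to accept requests after the command returns. As a minimal sketch, assuming the default port mapping from the compose file, the following polls Weaviate's REST readiness endpoint (/v1/.well-known/ready) using only the standard library. The function names weaviate_is_ready and wait_until are illustrative, not part of any Weaviate tooling.

```python
import time
import urllib.error
import urllib.request


def weaviate_is_ready(base_url="http://localhost:8080"):
    """Return True if the readiness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or an error status: the instance is not ready yet.
        return False


def wait_until(check, timeout_s=60.0, interval_s=1.0):
    """Call check() repeatedly until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_until(weaviate_is_ready) at the top of a script gives the containers up to a minute to come up before any real work starts, and returns False rather than raising if they never do.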

Two things to notice in the configuration. First, AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true' means the local instance accepts connections without any credentials. There is no login screen and no API key to manage. You connect and start working. Second, ENABLE_MODULES includes text2vec-ollama (alongside generative-ollama), which enables the module Weaviate uses to request embeddings from the Ollama container. Your text gets converted to vectors locally too, without any external API calls.

Connecting from Python

Install the Weaviate Python client:

pip install weaviate-client

Then connect:

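import weaviate

client = weaviate.connect_to_local()
# ... work with the client ...
client.close()  # Release the connection when you are done

That is the entire connection step. No environment variables, no key lookup, no project ID. By default, connect_to_local() targets localhost on port 8080 for HTTP and port 50051 for gRPC, which is why the compose file publishes both ports.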

For any real code, wrap the connect_to_local() call in a context manager instead of closing the client by hand. The connection is then closed automatically when the block exits, even if an error occurs inside it:

import weaviate

with weaviate.connect_to_local() as client:
    print(client.is_ready())  # Should print True

If is_ready() returns True, your Weaviate instance is running and accepting connections. You are past the setup phase.

Creating a Collection

In Weaviate, data lives in collections. A collection is roughly equivalent to a table in a relational database, or an index in Pinecone or Elasticsearch. It has a name, a schema that defines the properties your objects will carry, and a vector configuration that tells Weaviate how to generate or receive embeddings.

Here is how to create a collection for a movie dataset:

import weaviate
from weaviate.classes.config import Configure

with weaviate.connect_to_local() as client:
    movies = client.collections.create(
        name="Movie",
        vector_config=Configure.Vectors.text2vec_ollama(
            api_endpoint="http://ollama:11434",
            model="nomic-embed-text",
        ),
    )
    print("Collection created:", movies.name)

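The vector_config argument is where the embedding model gets wired in. text2vec_ollama tells Weaviate to call Ollama whenever it needs to vectorize text, using the nomic-embed-text model. The api_endpoint is resolved from inside the Weaviate container, which is why it uses the Compose service name (http://ollama:11434) rather than localhost. You do not have to generate embeddings yourself: Weaviate handles that when you insert objects or run queries. No properties are declared here, so Weaviate's auto-schema, which is enabled by default, infers them from the objects you insert.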
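To get a feel for what "calling Ollama" amounts to, or to confirm from the host that the model is available, you can request an embedding from Ollama directly. The sketch below targets Ollama's /api/embed endpoint through the port published in the compose file. The helper names build_embed_request and embed are hypothetical, and the request Weaviate sends internally may differ in its details.

```python
import json
import urllib.request


def build_embed_request(text, model="nomic-embed-text", base_url="http://localhost:11434"):
    """Build a POST request asking Ollama to embed a single piece of text."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, **kwargs):
    """Send the request and return the first embedding as a list of floats."""
    request = build_embed_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["embeddings"][0]
```

Calling embed("a short test sentence") against the running container should return a list of floats. An error response at this point usually means the model has not been pulled into the Ollama container yet.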

If you run this twice, you will get an error because the collection already exists. To check whether a collection exists before creating it:

import weaviate
from weaviate.classes.config import Configure

with weaviate.connect_to_local() as client:
    # Only create the collection if it is not already there
    if not client.collections.exists("Movie"):
        client.collections.create(
            name="Movie",
            vector_config=Configure.Vectors.text2vec_ollama(
                api_endpoint="http://ollama:11434",
                model="nomic-embed-text",
            ),
        )

Importing Data

Once the collection exists, you can start inserting objects. Weaviate’s batch API is designed for inserting many objects efficiently. The fixed_size batch mode collects objects until the batch reaches the specified size, then flushes them to the database:

import weaviate

data_objects = [
    {
        "title": "The Matrix",
        "description": "A computer programmer discovers that reality is a simulation and joins a rebellion against the machines running it.",
        "genre": "Science Fiction",
    },
    {
        "title": "Spirited Away",
        "description": "A young girl gets trapped in a spirit world and must work at a bathhouse to find a way back to her parents.",
        "genre": "Animation",
    },
    {
        "title": "Parasite",
        "description": "A poor family schemes to become employed by a wealthy household, setting off a series of unexpected events.",
        "genre": "Thriller",
    },
]

with weaviate.connect_to_local() as client:
    movies = client.collections.get("Movie")

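    with movies.batch.fixed_size(batch_size=200) as batch:
        for obj in data_objects:
            batch.add_object(properties=obj)

    # Per-object errors are collected rather than raised, so check for them explicitly
    if movies.batch.failed_objects:
        print(f"Failed imports: {len(movies.batch.failed_objects)}")

    print(f"Objects in collection: {movies.aggregate.over_all().total_count}")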

When this runs, Weaviate calls Ollama in the background to generate a vector for each object based on its text properties. You do not pass vectors manually — the text2vec-ollama module takes care of it. The batch context manager (with movies.batch.fixed_size) automatically flushes remaining objects when the with block exits, so you do not need to call any flush method yourself.
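The buffering behavior described above can be illustrated with a toy class in plain Python. This is not Weaviate's implementation, only a sketch of the idea: objects accumulate in a list, a full list is handed to a send function, and leaving the with block sends whatever remains. FixedSizeBatcher and its send callback are hypothetical names.

```python
class FixedSizeBatcher:
    """Toy buffer that illustrates fixed-size batching with flush-on-exit."""

    def __init__(self, send, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send = send            # Callable that receives one full batch (a list)
        self._batch_size = batch_size
        self._buffer = []

    def add_object(self, properties):
        """Queue one object and flush as soon as the batch is full."""
        self._buffer.append(properties)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send the buffered objects, if any, and start a new batch."""
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Send the final partial batch unless the block failed with an exception.
        if exc_type is None:
            self.flush()
        return False
```

Adding five objects with batch_size=2 results in three calls to send, carrying two, two, and one object. With only three movies and a batch size of 200, the import in this article is sent entirely by the flush on exit.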

Running a Query

With data in the collection, you can run a semantic search. Vector search takes a text query, converts it to a vector using the same embedding model, and finds the objects whose vectors are closest:

import weaviate

with weaviate.connect_to_local() as client:
    movies = client.collections.get("Movie")

    results = movies.query.near_text(
        query="animation about a child in a strange world",
        limit=2,
    )

    for obj in results.objects:
        print(obj.properties["title"], "—", obj.properties["genre"])

Running this against the three objects above should surface “Spirited Away” near the top, even though the query shares only a couple of words with that entry and paraphrases the rest. That is the core behavior you are building toward: finding semantically relevant matches without requiring exact string overlap.
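What "closest" means can be shown with a toy example. The sketch below ranks hand-made three-dimensional vectors by cosine similarity using a full scan. The numbers are invented for illustration and are not real nomic-embed-text embeddings, which have many more dimensions, and Weaviate itself uses an approximate index rather than comparing against every object.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query_vector, items, limit=2):
    """Return the titles of the items most similar to the query, best first."""
    ranked = sorted(
        items,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:limit]]


# Invented vectors: each movie points mostly along a different axis.
toy_movies = [
    ("The Matrix", [0.9, 0.1, 0.2]),
    ("Spirited Away", [0.1, 0.9, 0.3]),
    ("Parasite", [0.2, 0.2, 0.9]),
]
toy_query = [0.2, 0.8, 0.2]  # Stands in for the embedded query text

print(nearest(toy_query, toy_movies))  # ['Spirited Away', 'Parasite']
```

The query vector shares no exact values with any movie vector, yet it points in nearly the same direction as the one for “Spirited Away”, which is why that title ranks first.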

Where to Go from Here

Once you have this running locally, the path forward depends on what you are building. If the project is a prototype or a tool for personal use, local Docker may be all you ever need. If it is headed toward production, you will eventually want a managed service with persistence guarantees, monitoring, and support — and at that point, the account setup process that seemed like friction earlier becomes the appropriate infrastructure for what you are doing.

The value of starting locally is not that it avoids cloud setups forever. It is that it separates learning the tool from learning the platform. You understand how collections work, how embedding models connect, how queries behave — before you have to make any decisions about regions, replicas, or pricing tiers. That ordering matters. When you eventually do create a cloud account, you know what you are configuring and why.