\documentclass[11pt,a4paper]{article}
\usepackage{strath-assignment}
\usepackage{amsmath}
\usepackage{listings}
\usepackage[parfill]{parskip}
\usepackage{microtype}
\usepackage{setspace}
\usepackage{enumitem}
\usepackage{lipsum} % For demonstration only.
\usepackage{babel}
\usepackage{lastpage}
%--------------------------------
% Line separation.
%--------------------------------
% Adjust the list item separation.
\setlist[itemize]{itemsep=-0.2cm}
\setlist[enumerate]{itemsep=-0.2cm}
%----------------------------------------
% Header and footer.
%----------------------------------------
\newcommand{\footlabel}{Introduction to MongoDB}
\lfoot{{\bf \small \footlabel}}
\rfoot{{\bf \small Page \thepage\ of \pageref{LastPage}}}
\cfoot{}
% Needed for the first page, when \maketitle is used.
\fancypagestyle{plain}{
\renewcommand{\headrulewidth}{0pt}
\fancyhf{}
\rhead{\includegraphics[width=37mm]{images/strath_main.jpg}}
\fancyfoot[L]{\footnotesize {\bf \small \footlabel}}
\fancyfoot[R]{\footnotesize {\bf \small Page \thepage\ of \pageref{LastPage}}}
}
%----------------------------------------
% The title
%----------------------------------------
\begin{document}
\sloppy %% Prevent \texttt going beyond the page margin.
\setstretch{1.0}
\title{\footlabel}
\author{W. H. Bell}
\date{
\today
}
\maketitle
%%\begin{abstract}
%%\noindent dd
%%\end{abstract}
%%\tableofcontents
%%\clearpage
% The line spacing.
\setstretch{1.2}
\section{Introduction}
\href{https://www.mongodb.com/}{MongoDB} is a NoSQL document database that stores documents in BSON (Binary JSON) format. Documents can be inserted using JavaScript Object Notation (JSON), which is converted to BSON when it is stored. Within the \texttt{mongo} shell, JavaScript can be used to query documents and to define a schema that is used to validate them.
MongoDB provides interfaces for several programming languages, as well as a \texttt{mongo} shell. Details of the interfaces are given at \href{https://api.mongodb.com/}{https://api.mongodb.com/}. In this document, the Python programming interface is discussed. The Python programming interface is documented at \href{https://api.mongodb.com/python/current/tutorial.html}{https://api.mongodb.com/python/current/tutorial.html}.
\section{Typesetting and spaces}
A fixed width font is used for commands or source code. To aid the reader, a small bucket character is used to indicate spaces in commands or source code. This character should be replaced with a space when the command or source code is entered. An example of the bucket that is used to indicate spaces is demonstrated in Listing~\ref{listing:bucket-space}, where there is a single space between the \texttt{command} and \texttt{argument}.
\begin{lstlisting}[caption={Demonstrating the bucket character used to indicate spaces in source code and commands.},label=listing:bucket-space,numbers=none,showspaces=true]
command argument
\end{lstlisting}
\section{Assumptions}
It is assumed that these exercises are being run on a Linux PC in the Computer and Information Sciences labs. However, these exercises can be run on another Linux installation, provided that the MongoDB Python client \texttt{pymongo} has been installed and a MongoDB server is accessible. They can also be run on Windows or macOS, where the commands used to change directory and to run the Python programs may need to be adapted.
This document includes Linux Bash commands to clone a software repository and change directory. A separate sheet of Linux Bash commands is provided to Computer and Information Sciences students. It is assumed that the reader is either familiar with these commands or has access to the reference document.
The commands that are given in this document have been tested with MongoDB server version 5.0. They may work with other recent versions of MongoDB.
\section{Server connection details \label{section:connection-details}}
These exercises can be run using a MongoDB server on the local PC or using a remote MongoDB server. If a remote MongoDB server is used, then the connection details must be provided by setting environment variables as demonstrated in Listing~\ref{listing:envs}, where \texttt{server\_name}, \texttt{username}, \texttt{password} and \texttt{db\_name} should be replaced by the corresponding connection values. These variables must be set in the terminal window where the Python example programs are run. If the environment variables are not set, the default values from Table~\ref{table:env-defaults} are used.
\begin{lstlisting}[caption={Setting connection Bash environment variables.},label=listing:envs,numbers=none,language=Bash,showspaces=true]
export MONGODB_SERVER=server_name
export MONGODB_USERNAME=username
export MONGODB_PASSWD=password
export MONGODB_DB=db_name
\end{lstlisting}
\begin{table}[h!]
\begin{center}
\caption{Default settings for environment variables.}
\label{table:env-defaults}
\begin{tabular}{l|l} \hline
\textbf{Variable} & \textbf{Default/Action}\\
\hline
\texttt{MONGODB\_SERVER} & \texttt{localhost} \\
\texttt{MONGODB\_USERNAME} & No authentication. \\
\texttt{MONGODB\_PASSWD} & No authentication. \\
\texttt{MONGODB\_DB} & \texttt{test} \\ \hline
\end{tabular}
\end{center}
\end{table}
The environment variables are read by the functions that are defined in \texttt{mongo\_connect.py}. The functions in \texttt{mongo\_connect.py} are used to connect to the MongoDB database server and select a database.
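The file \texttt{mongo\_connect.py} is not reproduced in this document. As a minimal sketch only, which assumes nothing about how that file is actually written, the functions below show one way that the environment variables could be read with the defaults in Table~\ref{table:env-defaults} and turned into a connection URI. The names \texttt{read\_connection\_settings} and \texttt{build\_uri} are hypothetical. The username and password are escaped with \texttt{urllib.parse.quote\_plus}, which the \texttt{pymongo} documentation recommends for credentials that contain characters such as \texttt{@} or \texttt{:}.
\begin{lstlisting}[caption={A sketch of reading the connection environment variables (hypothetical helper functions).},language=python,numbers=none]
import os
from urllib.parse import quote_plus

# Defaults from the table of environment variable defaults.
DEFAULT_SERVER = "localhost"
DEFAULT_DB = "test"


def read_connection_settings(environ=os.environ):
    """Return (server, username, password, db_name) from the environment.

    A username or password that is not set is returned as None, which is
    taken to mean that no authentication is used.
    """
    server = environ.get("MONGODB_SERVER", DEFAULT_SERVER)
    username = environ.get("MONGODB_USERNAME")
    password = environ.get("MONGODB_PASSWD")
    db_name = environ.get("MONGODB_DB", DEFAULT_DB)
    return server, username, password, db_name


def build_uri(server, username=None, password=None):
    """Build a mongodb:// URI, escaping the credentials when both are given."""
    if username is None or password is None:
        return f"mongodb://{server}"
    return f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{server}"
\end{lstlisting}
The resulting URI could then be passed to \texttt{pymongo.MongoClient}, and the database selected with \texttt{client[db\_name]}.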
\clearpage
\section{Exercises}
\begin{enumerate}
\item Open a terminal window.
\item Clone the exercises repository by typing the command that is given in Listing~\ref{listing:git-clone} on one line.
\begin{lstlisting}[caption={Cloning the MongoDB exercises.},label=listing:git-clone,numbers=none,language=Bash,showspaces=true]
git clone https://gitlab.cis.strath.ac.uk/gxb20157/introduction-to-mongodb.git
\end{lstlisting}
\item Change directory to the \texttt{python} folder by typing the command given in Listing~\ref{listing:cd-python}.
\begin{lstlisting}[caption={Changing directory to the \texttt{python} directory.},label=listing:cd-python,numbers=none,language=Bash,showspaces=true]
cd introduction-to-mongodb/python
\end{lstlisting}
\item Set the environment variables as discussed in Section~\ref{section:connection-details}.
\item Test the connection to the MongoDB server by running the command given in Listing~\ref{listing:test-connection}. If the client successfully connects to the MongoDB server, the program will print ``Successfully connected to MongoDB server.''
\begin{lstlisting}[caption={Testing the connection to the MongoDB server.},label=listing:test-connection,numbers=none,language=Bash,showspaces=true]
./test_connection.py
\end{lstlisting}
\item List the databases that are available by typing the command given in Listing~\ref{listing:list-databases}.
\begin{lstlisting}[caption={Listing available MongoDB databases.},label=listing:list-databases,numbers=none,language=Bash,showspaces=true]
./list_databases.py
\end{lstlisting}
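The program \texttt{list\_databases.py} is not reproduced here. A minimal sketch of the same idea uses the \texttt{list\_database\_names} method of a \texttt{pymongo} \texttt{MongoClient}. The helper names are hypothetical, and hiding the \texttt{admin}, \texttt{config} and \texttt{local} databases, which MongoDB uses internally, is a choice made for this sketch rather than something the exercise program is known to do.
\begin{lstlisting}[caption={A sketch of listing user databases with pymongo (hypothetical helper functions).},language=python,numbers=none]
# Databases that MongoDB creates and uses for its own purposes.
SYSTEM_DATABASES = {"admin", "config", "local"}


def user_database_names(all_names):
    """Return the names that are not MongoDB system databases, sorted."""
    return sorted(name for name in all_names if name not in SYSTEM_DATABASES)


def print_user_databases(client):
    """Print the user databases visible through a pymongo MongoClient."""
    for name in user_database_names(client.list_database_names()):
        print(name)
\end{lstlisting}
For example, \texttt{print\_user\_databases(MongoClient('localhost'))} would print the remaining database names, one per line.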
\item Create a database by typing the command that is given in Listing~\ref{listing:create-database-run}.
%
\begin{lstlisting}[caption={Running a Python program to create a database.},label=listing:create-database-run,numbers=none,language=Bash,showspaces=true]
./create_database.py
\end{lstlisting}
%
The contents of \texttt{create\_database.py} are given in Listing~\ref{listing:create_database.py}. This Python program opens a client connection to the MongoDB server. It creates a database using the value of the \texttt{MONGODB\_DB} environment variable, or the default name \texttt{'test'} if the environment variable has not been set. Line 8 either creates the database or connects to it, if it already exists. A collection is created in a similar manner to a database. A document is defined as a Python dictionary at Lines 14 and 15. Finally, the Python dictionary is passed to the \texttt{insert\_one} function to insert it into the collection.
A collection that has been created is not saved to the MongoDB server until it contains at least one document. Likewise, a database is not saved to the MongoDB server until it contains at least one collection.
\clearpage
\lstinputlisting[caption={The file create\_database.py.},label=listing:create_database.py,language=python,showspaces=true]{"../python/create_database.py"}
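One detail of \texttt{insert\_one} is worth knowing: if the dictionary that is passed to it has no \texttt{'\_id'} key, \texttt{pymongo} adds one to that same dictionary before sending it to the server. Inserting the same dictionary a second time therefore reuses the \texttt{\_id} and fails. A minimal sketch, with a hypothetical helper name, that inserts a copy instead is given below. As described above, the database and collection are only written to the server when the first document is inserted.
\begin{lstlisting}[caption={A sketch of inserting a copy of a document (hypothetical helper function).},language=python,numbers=none]
def insert_copy(collection, document):
    """Insert a shallow copy of document and return the _id assigned to it.

    insert_one adds an '_id' key to the dictionary it receives if the key
    is missing, so passing a copy leaves the caller's dictionary unchanged
    and the same dictionary can be inserted again as a new document.
    """
    result = collection.insert_one(dict(document))
    return result.inserted_id
\end{lstlisting}
For example, with \texttt{collection = client[db\_name]['Customers']}, calling \texttt{insert\_copy(collection, customer)} twice would store two separate documents with different \texttt{\_id} values.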
\item At the Bash prompt, run the \texttt{read\_database.py} program in the same way as the command in Listing~\ref{listing:create-database-run}.
The contents of \texttt{read\_database.py} are given in Listing~\ref{listing:read_database.py}. This Python program opens a client connection to the MongoDB server and connects to the database. It loops over the collection names in the database, printing each collection name. If the \texttt{'Customers'} collection does not exist, an error message is printed and the program stops. If the \texttt{'Customers'} collection does exist, each document in the collection is printed. The \texttt{find} function is used at Line 23 without arguments, selecting all documents and all of their fields.
\clearpage
\lstinputlisting[caption={The file read\_database.py.},label=listing:read_database.py,language=python,showspaces=true]{"../python/read_database.py"}
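As a minimal sketch building on the same calls, the hypothetical function below uses \texttt{list\_collection\_names} and \texttt{count\_documents} to report how many documents each collection holds. Passing an empty filter, \texttt{\{\}}, to \texttt{count\_documents} matches every document, in the same way as calling \texttt{find} without arguments.
\begin{lstlisting}[caption={A sketch of counting the documents in each collection (hypothetical helper function).},language=python,numbers=none]
def collection_sizes(db):
    """Return {collection name: number of documents} for a pymongo Database."""
    sizes = {}
    for name in sorted(db.list_collection_names()):
        # The empty filter {} matches every document in the collection.
        sizes[name] = db[name].count_documents({})
    return sizes
\end{lstlisting}
For example, \texttt{collection\_sizes(client[db\_name])} could be used to check whether a new document or collection has been added in the next exercise.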
\item Add another document to MongoDB. First, try to add a document to the \texttt{'Customers'} collection. Then try to add another collection to the database.
\item Run the \texttt{drop\_database.py} program in the same way as the command in Listing~\ref{listing:create-database-run}.
\item Run the \texttt{planets\_database.py} program in the same way as the command in Listing~\ref{listing:create-database-run}.
The contents of \texttt{planets\_database.py} are given in Listing~\ref{listing:planets_database.py}. This program creates a database and an \texttt{'OrbitData'} collection. It reads JSON data from a text file named \texttt{'planets.json'} and inserts these data into MongoDB.
The first section of the \texttt{'planets.json'} file is given in Listing~\ref{listing:planets.json}. This file contains a JSON array, where each element is an object. The Python program reads these data into a list whose elements are dictionaries. The \texttt{insert\_many} function is called at Line 20 to insert these data into the MongoDB database.
\clearpage
\lstinputlisting[caption={The file planets\_database.py.},label=listing:planets_database.py,language=python]{"../python/planets_database.py"}
\lstinputlisting[caption={The first 24 lines of the file planets.json.},label=listing:planets.json,language=java,lastline=24]{"../python/planets.json"}
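The program \texttt{planets\_database.py} reads the JSON file into a list and passes it to \texttt{insert\_many}. As a hedged sketch of the same pattern, not a copy of that program, the hypothetical functions below add a check that the file really holds a JSON array of objects, since \texttt{insert\_many} expects a non-empty list of documents.
\begin{lstlisting}[caption={A sketch of loading documents from a JSON file (hypothetical helper functions).},language=python,numbers=none]
import json


def parse_documents(json_text):
    """Parse JSON text holding an array of objects into a list of dictionaries.

    Raises ValueError (json.JSONDecodeError is a subclass of it) if the text
    is not valid JSON, is not a non-empty array, or contains non-objects.
    """
    documents = json.loads(json_text)
    if not isinstance(documents, list) or not documents:
        raise ValueError("expected a non-empty JSON array")
    if not all(isinstance(document, dict) for document in documents):
        raise ValueError("every array element must be a JSON object")
    return documents


def insert_documents_from_file(collection, path):
    """Insert the documents in the JSON file at path; return their _id values."""
    with open(path) as json_file:
        documents = parse_documents(json_file.read())
    return collection.insert_many(documents).inserted_ids
\end{lstlisting}
For example, \texttt{insert\_documents\_from\_file(db['OrbitData'], 'planets.json')} would insert one document per planet.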
\item Run the \texttt{select\_planet.py} program in the same way as the command in Listing~\ref{listing:create-database-run}.
The contents of \texttt{select\_planet.py} are given in Listing~\ref{listing:select_planet.py}. This Python program prints the names of the collections and then checks if the \texttt{'OrbitData'} collection is present. If it is present, the program prints the name of each planet and the complete document for Pluto.
The \texttt{find} function at Line 26 takes two arguments, each of which is given within \texttt{\{\}} braces. The first pair of braces is empty. This is the query condition, which is optional. Since the condition is empty, all documents are matched. The second argument, \texttt{\{'name':1, '\_id':0 \}}, is a projection: a dictionary of the document fields that should be included or excluded. Each entry consists of the field name, a colon and an integer, where 0 excludes the field and 1 includes it. In general, the \texttt{find} command allows up to three input arguments, as discussed in the \href{https://www.mongodb.com/docs/manual/reference/method/db.collection.find/#mongodb-method-db.collection.find}{MongoDB documentation}.
The \texttt{find\_one} function at Line 32 includes a condition that the field \texttt{'name'} must be equal to \texttt{'Pluto'}. It returns a single matching document, rather than a cursor over all matches. Since no projection is given, all fields of the document are returned.
\lstinputlisting[caption={The file select\_planet.py.},label=listing:select_planet.py,language=python]{"../python/select_planet.py"}
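The filter and projection arguments are ordinary Python dictionaries, so they can also be built by small helper functions. The sketch below, with hypothetical function names, builds the projection used at Line 26 and wraps the \texttt{find\_one} query for Pluto. It relies on a MongoDB rule: a projection may not mix included (1) and excluded (0) fields, except for \texttt{\_id}, which is returned unless it is explicitly set to 0.
\begin{lstlisting}[caption={A sketch of building a projection and querying one planet (hypothetical helper functions).},language=python,numbers=none]
def include_fields(*fields):
    """Build a projection returning only the named fields, without _id."""
    projection = {field: 1 for field in fields}
    projection["_id"] = 0  # _id is the only field that may be excluded here
    return projection


def planet_names(collection):
    """Return the name of every planet; the empty filter {} matches all."""
    cursor = collection.find({}, include_fields("name"))
    return [document["name"] for document in cursor]


def planet_by_name(collection, name):
    """Return the complete document for one planet, or None if there is none."""
    return collection.find_one({"name": name})
\end{lstlisting}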
\item Run the \texttt{planets\_with\_moons.py} program in the same way as the command in Listing~\ref{listing:create-database-run}.
The contents of \texttt{planets\_with\_moons.py} are given in Listing~\ref{listing:planets_with_moons.py}. This program reads data from the \texttt{'OrbitData'} collection. The \texttt{find} function at Line 25 is used to list the names of the planets and their number of moons. The \texttt{find} function at Line 32 takes two arguments. In the first argument, the operator \texttt{\$gt} means ``greater than''. Therefore, the condition selects only the documents that have more than zero moons. The second argument is a projection that selects the planet name and number of moons for printing. Finally, the \texttt{sort} function is used to sort the documents by the number of moons.
\lstinputlisting[caption={The file planets\_with\_moons.py.},label=listing:planets_with_moons.py,language=python]{"../python/planets_with_moons.py"}
The comparison query operators are listed in Table~\ref{table:comp-operators} for reference. When these operators are used in Python with \texttt{pymongo}, they must be written as strings, within single or double quotes, since they are used as dictionary keys.
\begin{table}[h!]
\begin{center}
\caption{Comparison query operators.}
\label{table:comp-operators}
\begin{tabular}{l|l} \hline
\textbf{Operator} & \textbf{Description}\\
\hline
\texttt{\$eq} & Matches values that are equal to a specified value\\
\texttt{\$gt} & Matches values that are greater than a specified value\\
\texttt{\$gte} & Matches values that are greater than or equal to a specified value\\
\texttt{\$in} & Matches any of the values specified in an array\\
\texttt{\$lt} & Matches values that are less than a specified value\\
\texttt{\$lte} & Matches values that are less than or equal to a specified value\\
\texttt{\$ne} & Matches all values that are not equal to a specified value\\
\texttt{\$nin} & Matches none of the values specified in an array\\ \hline
\end{tabular}
\end{center}
\end{table}
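The operators in Table~\ref{table:comp-operators} are used as dictionary keys inside a condition. The sketch below builds a few such conditions. The field name \texttt{moons} is an assumption, since the key that \texttt{planets.json} uses for the number of moons is not shown in this document, and the helper names are hypothetical. In \texttt{pymongo}, a sort direction of 1 is ascending and $-1$ is descending, matching the constants \texttt{pymongo.ASCENDING} and \texttt{pymongo.DESCENDING}.
\begin{lstlisting}[caption={A sketch of building conditions with comparison operators (assumed field names).},language=python,numbers=none]
# Assumed key for the number of moons; check planets.json for the real one.
MOONS_FIELD = "moons"


def greater_than(field, value):
    """Condition matching documents where field > value ($gt)."""
    return {field: {"$gt": value}}


def in_range(field, low, high):
    """Condition matching low <= field < high ($gte combined with $lt)."""
    return {field: {"$gte": low, "$lt": high}}


def one_of(field, values):
    """Condition matching documents where field equals any listed value ($in)."""
    return {field: {"$in": list(values)}}


def planets_with_moons(collection):
    """Return (name, moons) pairs for planets with moons, fewest moons first."""
    cursor = collection.find(
        greater_than(MOONS_FIELD, 0),
        {"name": 1, MOONS_FIELD: 1, "_id": 0},
    ).sort(MOONS_FIELD, 1)  # 1 sorts ascending, -1 descending
    return [(doc["name"], doc[MOONS_FIELD]) for doc in cursor]
\end{lstlisting}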
\clearpage
\item Select planets by another document field.
\item Order planets by their mass.
\item Run the \texttt{drop\_database.py} program in the same way as the command in Listing~\ref{listing:create-database-run}.
\item Run the \texttt{index\_database.py} program in the same way as the command in Listing~\ref{listing:create-database-run}.
The contents of \texttt{index\_database.py} are given in Listing~\ref{listing:index_database.py}. This program creates a database and a \texttt{'Customers'} collection. Before any documents are added, an index is defined at Line~15. This index requires that \texttt{CustomerName} entries are unique within the collection. A customer is defined at Line~18 and inserted at Line~23. If the insert operation fails, the program prints the warning message at Line~26.
\lstinputlisting[caption={The file index\_database.py.},label=listing:index_database.py,language=python]{"../python/index_database.py"}
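The effect of the unique index can also be sketched in a few lines. The hypothetical helpers below assume the \texttt{CustomerName} field used by \texttt{index\_database.py}. \texttt{create\_index} returns the name of the index, which for an ascending index on \texttt{CustomerName} is \texttt{CustomerName\_1}; calling it again with the same specification does not create a second index. A second insert with an existing \texttt{CustomerName} raises \texttt{pymongo.errors.DuplicateKeyError}, which the sketch turns into a return value of \texttt{False}.
\begin{lstlisting}[caption={A sketch of a unique index and duplicate handling (hypothetical helper functions).},language=python,numbers=none]
def create_unique_name_index(collection):
    """Create an ascending index that requires CustomerName to be unique."""
    return collection.create_index([("CustomerName", 1)], unique=True)


def insert_unless_duplicate(collection, customer):
    """Insert customer; return False instead of raising on a duplicate name."""
    # Imported inside the function so that this sketch loads without pymongo.
    from pymongo.errors import DuplicateKeyError

    try:
        collection.insert_one(customer)
    except DuplicateKeyError:
        return False
    return True
\end{lstlisting}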
\item Run the \texttt{index\_database.py} program again, in the same way as the command in Listing~\ref{listing:create-database-run}. What happens the second time that the program is run?
\item Run the \texttt{drop\_database.py} program and then the \texttt{index\_database.py} program, in the same way as the command in Listing~\ref{listing:create-database-run}. What happens this time?
\item Run the \texttt{drop\_database.py} program in the same way as the command in Listing~\ref{listing:create-database-run}.
\end{enumerate}
Selecting documents with a query condition and choosing fields with a projection in MongoDB is similar to the functionality of the SQL \texttt{WHERE} and \texttt{SELECT} clauses in a relational database. MongoDB also allows a schema to be defined for document validation.
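As a rough comparison, and assuming the field names \texttt{name} and \texttt{moons} and an ascending sort, the query in \texttt{planets\_with\_moons.py} corresponds to an SQL statement of the form \texttt{SELECT name, moons FROM OrbitData WHERE moons > 0 ORDER BY moons}. For the schema mentioned above, MongoDB accepts a \texttt{\$jsonSchema} validator when a collection is created. The sketch below is hypothetical: the required fields and types are assumptions about \texttt{planets.json}. It could be applied with \texttt{db.create\_collection('OrbitData', validator=planet\_validator())}, which raises an error if the collection already exists.
\begin{lstlisting}[caption={A sketch of a schema validator for planet documents (assumed fields).},language=python,numbers=none]
def planet_validator():
    """Return a $jsonSchema validator for planet documents (assumed fields)."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["name"],
            "properties": {
                "name": {
                    "bsonType": "string",
                    "description": "the name of the planet",
                },
                "moons": {
                    "bsonType": "int",
                    "minimum": 0,
                    "description": "the number of moons, if present",
                },
            },
        }
    }
\end{lstlisting}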
\end{document}