by tirthajyoti


Simple SQLite demo: Reading a text file, inserting email id's into the Database, and counting average emails received from a particular domain

Dr. Tirthajyoti Sarkar, Fremont, CA 94536

Import SQLite library of Python (it is built-in)

  • Create a connection
  • Create a cursor object
import sqlite3
conn = sqlite3.connect('emaildb.sqlite')

Run SQL commands through the cursor

  • Drop the previous Table if it exists
  • Create a Table with email and counts as attributes
cur.execute('DROP TABLE IF EXISTS Counts')
CREATE TABLE Counts (email TEXT, count INTEGER)''')
<sqlite3.Cursor at 0x7fcb6c0eea40>

Use urllib method to establish a connection with a remote server to read the inbox data

import urllib.request, urllib.parse, urllib.error

print("Opening the file connection...")
# Following example reads Project Gutenberg EBook of Pride and Prejudice
fhand = urllib.request.urlopen('')
Opening the file connection...

Read the text data from the remote server to create a local text file

txt_dump = ''
# Iterate over the lines in the file handler object and dump the data into the text string. 
# Also increment line and word counts
for line in fhand:
# Use decode method to convert the UTF-8 to Unicode string
    # Count the length of words in the line and add to the running count
    word_count+=len(line.decode().split(' '))

Prints basic informationn about the text data

print("Printing some info on the text dump\n"+"-"*60)
print("Total characters:",len(txt_dump))
print("Total words:",word_count)
print(f"Total lines: {line_count}")
Printing some info on the text dump
Total characters: 6687002
Total words: 630427
Total lines: 132045

Open a local file handler with the text file

file = open('mbox.txt','w') 

Show first few lines of the text data
From Sat Jan  5 09:14:16 2008
Return-Path: <>
Received: from murder ( [])
	 by (Cyrus v2.3.8) with LMTPA;
	 Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
Received: from murder ([unix socket])
	 by (Cyrus v2.2.12) with LMTPA;
	 Sat, 05 Jan 2008 09:14:16 -0500
Received: from ( [])
	by () with ESMTP id m05EEFR1013674;
	Sat, 5 Jan 2008 09:14:15 -0500
Received: FROM ( [])
	BY ID 477F90B0.2DB2F.12494 ; 
	 5 Jan 2008 09:14:10 -0500
Received: from (localhost [])
	by (Postfix) with ESMTP id 5F919BC2F2;
	Sat,  5 Jan 2008 14:10:05 +0000 (GMT)
Message-ID: <>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit

Read the text file line by line to extract the email address and INSERT INTO/UPDATE the SQL Table

for line in fh:
    if not line.startswith('From: '): continue
    pieces = line.split()
    email = pieces[1]
    cur.execute('SELECT count FROM Counts WHERE email = ? ', (email,))
    row = cur.fetchone()
    if row is None:
        cur.execute('''INSERT INTO Counts (email, count)
                VALUES (?, 1)''', (email,))
        cur.execute('UPDATE Counts SET count = count + 1 WHERE email = ?',

Execute SELECT-FROM command to read email count from the database and ORDER BY the count

sqlstr = 'SELECT email,count FROM Counts ORDER BY count DESC LIMIT 20'

for row in cur.execute(sqlstr):
    print(str(row[0]), row[1]) 195 161 158 111 110 96 93 90 84 67 48 44 44 43 32 29 28 28 28 27

Run AVG command with a LIKE matching string to count the average number of emails from a particular source

sqlstr = 'SELECT AVG(count) FROM Counts WHERE email LIKE "%umich%"'
for row in cur.execute(sqlstr):
    print("Average email received from Univ. of Michigan address:",float(row[0]))
Average email received from Univ. of Michigan address: 44.63636363636363

Close the remote and local file handler and the cursor connection