FreeREG logo
Hosted by The Bunker   RootsWeb, the oldest and largest FREE genealogy site Ancestry.co.uk Logo

Converting data from a web page or word processor or other type of file into a CSV file

Many people already have transcriptions of registers, unfortunately their data files are usually in a different format to that required by FreeREG and most commonly being stored in a different piece of software such as Microsoft Works or Word or Excel. Many also have them as web pages. They

a) have all sorts of different layouts
b) have different data fields included, and in a different order.

The data in these files DOES NOT have to be retyped to be used in FreeREG.But it does have to be converted into the correct format and file type so as to be entered into the database. The process of conversion is relatively straight forward and is described in the following material. Alternatively there are people working with FreeREG as Data Editors who are very familiar with the process and can undertake it for you. Basically we can use exactly the same technique to convert all of them into a csv file for use in the database. Depending upon the type of file you start at the appropriate place in these instructions.If you are in doubt about these proceedures contact your County Co-ordinator or one of the Data Editors.

Lets start with the most complex type of file. A webpage. Some people have transcriptions of data from registers on their web pages, and no other copy in a database file. They may have given FreeREG permission to include that data into our database, but the problem is one of converting the data into a format which can be incorporated into the database itself.

What we are going to do, is use normal standard software to do it in several (lots) of stages, rather than writing a computer program specifically to do it. What is described is the simplest of proceedures. There are more efficient ways of doing many of these steps within Excel using formulea but they are quite difficult of those who are unfamiliar with working with Excel.

IMPORTANT - THIS MUST NEVER BE DONE WITHOUT PERMISSION FROM THE OWNER OF THE DATA

As an example, I'm working first with one file of 1832 burials which can be seen as a web page here

This is a test run with just this one file to get you used to it. What we'll be doing with a real run is to use real, and a larger set of data. If we start with several separate files all in the same format then it is much easier to combine them all into one word processor file first before doing all the search and replace stuff... so that we only have to do it once! This all looks a lot... but actually it doesn't take that long!


Stage 1.
Saving the web page


  1. create a directory for the files to go in
  2. whilst viewing the web page save the page as: 1832bur.htm into this directory.

Stage 2.
Editing the file to convert it into text and getting rid of unwanted text


  1. start MS Word (or your favourite word processor
  2. open the file 1832bur.htm (yep... word can open it)
  3. click/drag mouse over the top of the cells in the table to highlight them all.
  4. click on menu "Table" and "Convert table to text"
  5. in Window specify "separated by commas"
  6. delete everything else on the page


Stage 3.
First manipulation of the data using search and replace


We now need to do several runs of search and replace. Test it happening with just this one little file first, so that you can get the hang of it. The problems are:

  • spaces after each comma
  • ", dd, MON, yyyy" where we want ",dd Mon yyyy"

so... Edit... replace

,[space]1 with ,1
,[space]2 with ,2
etc. up to 9

then replace:

,[space]AUG[space], with [space]Aug[space]
,[space]SEP[space], with [space]Sep[space]
etc, for all months

and finally get rid of the remaining spaces

,[space] replaced with ,

that leaves us with

LINNEY,Godfrey, ,52,Brampton Moor ,16 Aug 1832
ELLIOTT,Francis, ,21,Brampton Moor ,17 Aug 1832
MADIN,Samuel, ,2,Brampton Moor ,28 Aug 1832
MADIN,Elizabeth, ,8 Months,Brampton Moor ,28 Aug 1832
ASTWOOD,Mary, ,29,Little Brampton ,6 Sep 1832
WILSON,Ann, ,35,Walton ,13 Nov 1832
TUPMAN,Andrew, ,3 Weeks,Brampton ,2 Dec 1832
HALL,Maria, ,2yr 7m,Brampton Moor ,14 Dec 1832


Stage 4.
What is left?


Note the empty third field. That may well sometimes contain a second Christian name. (That's what it was intended to contain in this data set.)

This is going to be harder to manipulate. What we want to do is to convert:

surname,first1,first2,age, etc to:
surname,bothfirstnames,age, etc.

Frankly, the easiest way to do this is going to be later in Excel (or any spreadsheet).

I would tend to leave these in for the moment, and handle them later, when we can see them in columns in a spreadsheet. They will then be immediately obvious.

  1. OK... ready to save that file now from Word as a .csv file
  2. File... Save As..
  3. pick file type "text only *.txt"
  4. file name 1832bur.csv
  5. OK
  6. Close MS Word


Stage 5
More manipulation using a spreadsheet.


Start Excel (or your favourite spreadsheet)

Open the file 1832bur.csv

Here we go... we now have it in columns. Note the column "C" which will be mostly empty, except for a few with the second Christian name. We can deal with those first. (note that the sample of data we are playing with doesn't have any second Christian names, but other data sets might!)

The easiest way is just to do it manually:

  1. click on the second name, cut,
  2. click on the end of the first name, add a space, and paste it in behind it.

The hard bit has now all been done!!


Stage 6
Adding the FreeREG data fields


We just need to add the extra fields (columns) that can contain the other data we need to be included. In some data sets the contents of some columns (fields) may be empty, but we still need them there!

  1. click on the number "1" at the left end of row 1
  2. on the menu... Insert... Row
  3. we have a new empty row 1
  4. click on the head of column A on the "A"
  5. on the menu... insert... columns
  6. and several times more until we have empty new columns A-N

Now, in row 1, label each of the columns (note all lower case, and must be exactly as written e.g. "rel1-male-first"

A county
B place
C church
D no
E burdate
F firstname
G relationship
H rel1-male-first
I rel2-female-first
J rel1-surname
K surname
L age
M abode
N notes

The existing columns with data will have been relegated to columns O through T.


Stage 7
Adding some data into our fields


Now... this file of data doesn't have anything in column 1 yet

  1. in A2 type in: DBY and press enter
  2. click on the bottom right hand corner of cell A2 and drag it down to highlight the whole column (for all the rows containing data) and let go the mouse. DBY will be copied into all cells in column A
  3. in B2 type in : Brampton and copy it down the column
  4. in C2 type in : St. Thomas and copy it down the column
  5. column D doesn't have register entry numbers, so just leave it empty except the heading "no".


Stage 8
Now we are going to get the real data from those other columns, copy it and paste it into our new data columns


But first:

type into row 1 above the data in column:

O: surname
P: firstname
Q: (this is now an empty column)
R: age
S: abode
T: date

.

  1. Click on the "O" at the top of column O (to select the whole column)
  2. click on copy
  3. click on the "K" at the top of column K (surname column)
  4. click on paste - bingo!
  5. do the same for the other data columns copying, and pasting to the correct column
  6. Finally click on the top of column O to highlight it, and whilst holding down the mouse button, drag right to highlight all columns O through T, and then on the menu, select Edit... and then Delete.

The columns with the old copy of the data is deleted.

We are nearly done; we just need to adjust the Notes column for a weakness(Dare we say a bug) in Excel. It has been found that Excel does NOT ALWAYS put out the correct number of empty fields if the last field (Always the notes field) and some preceding fields are blank. There is a relatively simple work around. If the NOTES field is blank enter the characters zzz; easily done by copying a line cell that contains the zzz and then pasting it to all NOTE fields that are empty. Now you are finished with Excel.


Stage 9:
saving the data file as a csv file


All done! We just need to save it again as a .csv file. The file in your spreadsheet will still be a csv file. All you need do is save it.

Exit Excel... it says "do you want to save it as an Excel Spreadsheet".

No. There is no need to. We want that .csv file.


Now... although we have several columns "empty" except for their headings, we will have created a complete comma separated variable file, with all the right data fields, which can now be inserted into our FreeREG database.


Stage 10:
Correcting the CSV file.

As noted above it is necessary to correct the CSV file for an error caused by Excel. Also after you have loaded the file into FreeREG you may have to correct entries. The best way of proceeding is to open the CSV file with Notepad or some other simple text editor. DO NOT USE WORD OR WORD PERFECT or any other word processing software. Having opened the file, select the Replace option in the Edit menu, enter zzz in the find what field, leave the Replace with field blank, select Replace All, entries disappear, when the Find and Replace is finished close it of and save the file. You are now ready for the last stage of adding the file Headers.
Stage 11:
Adding the File Headers.

You are now at the penultimate stage. Before submitting the data to FreeREG it is necessary to tell FreeREG something about the data that you have entered. This is described in detail on the headers web page. As noted the headers are added using Notepad or something similar. It CANNOT be done with Excel!


Copyright ©2000-6 FreeREG
We make no warranty whatsoever as to the accuracy and completeness of the FreeREG data.

Last update
19 March 2008 / RN/DKD