
UniversityLite Overview

UniversityLite Design Overview

I developed UniversityLite as a rapid-deployment e-commerce tool, written in PHP, to market university products and information to university students over the web. In other words, UniversityLite creates, deploys and maintains university websites automatically using custom PHP functions. As of 2016, the tool has generated and maintains over 7,000 websites, each receiving about 1,000 views a month. The tool is based on the Model-View-Controller (MVC) architectural pattern.

UniversityLite as a Function f


In its simplest form, UniversityLite can be understood as a function whose input is simply the name of a university and whose output is a complete website tailored to that university. The job of the function is therefore to create the database and website code for that site automatically. It accomplishes this by systematically gathering information, then interpreting and presenting that information using complex but predictable logic.

UniversityLite is written primarily in PHP and uses SQL, JavaScript and BASH to collect data automatically from sources such as Wikipedia, Google, Bing, YouTube, Amazon, eBay and NCES (and other governmental data sources), gradually building an understanding of the environment in which to create the site. Most data is gathered through open APIs, such as Bing’s Image API, which is used to automatically gather images relevant to a topic, or Amazon’s AWS API, which is used to generate products.

UniversityLite Sample REST Request

UniversityLite Sample API REST Request with Amazon AWS

Generating Variables

As the base input for content generation, the tool was fed 7,000 university names that I gathered from the National Center for Education Statistics (NCES).

Let’s look at one of those university names, “Monroe Community College”, as an example to see how the tool generates basic information about the university.

Iterate Through Each University Name

Insert "Monroe Community College" into Algorithm


First, the tool must determine basic information about the string “Monroe Community College”. We can reasonably assume the string is the name of a university, so the tool develops an array of queries that represent how the university might be referenced on the internet. For example, the tool sees the substring “Community College” and can reasonably determine that the college might also be called by its acronym, “MCC”. This logic is used to generate up to four unique names each university might go by, such as “Brockport University” from the string “State University of New York at Brockport”.
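The name-variant step described above can be sketched roughly as follows. This is an illustration in Python rather than the tool's actual PHP, and the stop-word list, the regular expression and the function names (`acronym`, `name_variants`) are all assumptions made for the example.

```python
import re

# Words skipped when building an acronym (an assumed stop list).
STOP_WORDS = {"of", "at", "the", "and", "in", "for"}


def acronym(name):
    """Build an acronym from the significant words of a name."""
    words = re.findall(r"[A-Za-z]+", name)
    return "".join(w[0].upper() for w in words if w.lower() not in STOP_WORDS)


def name_variants(name, limit=4):
    """Return up to `limit` unique names a university might go by."""
    variants = [name]

    # "Monroe Community College" -> "MCC"
    if "Community College" in name:
        variants.append(acronym(name))

    # "State University of New York at Brockport" -> "Brockport University"
    match = re.search(r"\bUniversity\b.*\bat\s+(.+)$", name)
    if match:
        variants.append(f"{match.group(1)} University")
        variants.append(acronym(name))

    # Drop duplicates while keeping order, then cap the list.
    return list(dict.fromkeys(variants))[:limit]


print(name_variants("Monroe Community College"))
print(name_variants("State University of New York at Brockport"))
```

Each variant then becomes a search term for the API calls that follow.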

Now that the tool has an array of search terms, it can get to work using intelligent functions with several of the open APIs mentioned to build a SQL database of variables associated with this university. It would be exhausting to go through every variable, as each university name generates thousands of them. Here are some of the obvious ones, along with the techniques used to gather them:

  • Preferred subdomain such as “mcc.universitylite.com” (query NCES API for website address and extract the subdomain, ‘mcc’)
  • University Colors such as “Blue and Gold”, as hex values (stripping relevant color names from the wikipedia_infobar_request function and sending them to a color_names_to_hex function).
  • City, State and Zip of university, such as “Rochester”, “NY” and “14626” (query NCES API for these values)
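Two of the techniques in this list can be sketched as below, in Python rather than the tool's PHP. The website address is a made-up placeholder, the color table is a tiny assumed lookup, and `preferred_subdomain` is a hypothetical helper name. `color_names_to_hex` borrows the name mentioned above, but its body here is only a guess at the idea.

```python
import re
from urllib.parse import urlparse

# A tiny assumed lookup table; a real one would cover many more color names.
COLOR_HEX = {
    "blue": "#0000FF",
    "gold": "#FFD700",
    "red": "#FF0000",
    "white": "#FFFFFF",
}


def preferred_subdomain(website):
    """Return the first host label after an optional leading 'www.'."""
    if "//" not in website:
        website = "//" + website  # let urlparse see a network location
    host = urlparse(website).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]


def color_names_to_hex(text):
    """Pick out known color names from free text and map them to hex."""
    words = re.findall(r"[a-z]+", text.lower())
    return [COLOR_HEX[w] for w in words if w in COLOR_HEX]


print(preferred_subdomain("https://www.mcc.example.edu/"))  # placeholder address
print(color_names_to_hex("Blue and Gold"))
```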

…and this process goes on and on to gather such information as demographics, weather, local attractions (such as “Monroe County Museum”), apparel categories (such as “MCC Men’s Pants”), degree programs and product categories based on degree programs (such as “Nursing Supplies”).

UniversityLite builds the SQL database with these variables, organizing each major category into a relevant table. To streamline access and storage, each table is split into three parts: the data itself, a data dictionary describing the data, and a variable dictionary (the DD and VARS tables below).

Universitylite Database Sample


Generating Content

Now that UniversityLite has created our SQL database of a couple thousand variables associated with “Monroe Community College”, we can begin to generate content for Monroe Community College as a university.

For starters, the functions will begin to create the skeleton content for the website, such as colors, titles, logos and basic content. When displaying products, for example men’s apparel, we use variables such as the university’s name, colors, demographics and degree programs to query the Amazon AWS API.

Function for Sports Weight


In this example, we determine that the university’s total student population is 52% male, which indicates it is fine to go ahead and generate the REST request to Amazon for a male category (as opposed to an all-female university, for example). The variables also show us that this university has several sports teams, so we decide to put weight on men’s apparel that is sports related. This weight takes the form of an array of possible search terms, each given a mathematical weight based on the funding its sports team receives (that is, the team’s size relative to the university’s overall budget).
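A minimal sketch of this weighting idea follows, in Python rather than the tool's PHP. The funding figures, the search-term template and the function names are invented for illustration. The only parts taken from the text are the gender check and the idea of weighting each term by team funding relative to the university budget.

```python
import random


def should_show_mens_category(male_share):
    """Skip the category only when the school has no male students."""
    return male_share > 0


def weighted_search_terms(team_funding, university_budget, prefix):
    """Map each candidate search term to its team's share of the budget."""
    return {
        f"{prefix} Men's {sport} Activewear": amount / university_budget
        for sport, amount in team_funding.items()
    }


def pick_search_term(weights, rng):
    """Draw one search term at random, in proportion to its weight."""
    terms = list(weights)
    return rng.choices(terms, weights=[weights[t] for t in terms], k=1)[0]


# Hypothetical figures, for illustration only.
funding = {"Basketball": 50_000, "Soccer": 30_000, "Baseball": 20_000}
weights = weighted_search_terms(funding, university_budget=1_000_000, prefix="MCC")

if should_show_mens_category(0.52):
    print(pick_search_term(weights, random.Random()))
```

Repeating the draw many times yields a product list in which better-funded sports appear proportionally more often.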

The chosen search term from the array, let’s say “MCC Men’s Basketball Activewear”, is used to build an Amazon REST request by referencing a custom REST generator built for Men’s Apparel (to ensure we only return items within a certain scope of men’s apparel). This logic is repeated across the range of probabilities we determine from the sports weight function (and other similar functions) to generate a product list that reflects the weight of sports and other variables actually represented in the university’s statistical data.
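The shape of such a category-specific request generator might look like the sketch below (Python, not the tool's PHP). The endpoint and every parameter name are placeholders, not Amazon's actual API. A real request to Amazon would also need credentials and a request signature, which are omitted here.

```python
from urllib.parse import urlencode

# Placeholder endpoint and parameter names; not Amazon's real API.
BASE_URL = "https://api.example.com/products"

# Fixed parameters that keep results inside the Men's Apparel scope.
MENS_APPAREL_DEFAULTS = {"category": "Apparel", "department": "mens"}


def build_request_uri(keywords, page=1):
    """Combine the fixed category scope with the chosen search term."""
    params = dict(MENS_APPAREL_DEFAULTS, keywords=keywords, page=page)
    # Sorting keeps the URI stable, which helps when it doubles as a cache key.
    return f"{BASE_URL}?{urlencode(sorted(params.items()))}"


print(build_request_uri("MCC Men's Basketball Activewear"))
```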

REST Request URI Generation


Once we have created our REST request as a string, we pass it to Amazon’s API, which in turn returns an XML document of product information. This is saved on the UniversityLite server as a local file so that the same product string can be served quickly when it is requested again (the file is dumped after 24 hours so that pricing stays current). Each XML file comes back at between 10MB and 30MB, which can quickly grow to over 10GB of XML data in a couple of days, depending on how actively the site is being used by actual users. Managing this data across 7,000 active sites (not to mention all the other generated data, such as images) requires tightly controlled data purging to ensure the servers that run UniversityLite do so smoothly.
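The 24-hour local cache and the purge step can be sketched as follows, in Python rather than the tool's PHP. The hashed file naming and the function names are assumptions, and `fetch` stands in for whatever code actually calls Amazon.

```python
import hashlib
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # responses are kept for 24 hours


def cache_path(cache_dir, request_uri):
    """Name the cache file after a hash of the request string."""
    digest = hashlib.sha256(request_uri.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.xml"


def fetch_cached(cache_dir, request_uri, fetch, now=None):
    """Return the cached XML if it is fresh, otherwise fetch and store it."""
    now = time.time() if now is None else now
    path = cache_path(cache_dir, request_uri)
    if path.exists() and now - path.stat().st_mtime < CACHE_TTL_SECONDS:
        return path.read_text(encoding="utf-8")
    body = fetch(request_uri)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return body


def purge_expired(cache_dir, now=None):
    """Delete cached XML files older than the TTL; return how many went."""
    now = time.time() if now is None else now
    removed = 0
    for path in Path(cache_dir).glob("*.xml"):
        if now - path.stat().st_mtime >= CACHE_TTL_SECONDS:
            path.unlink()
            removed += 1
    return removed
```

Running `purge_expired` on a schedule is one way to keep disk usage bounded; how the original tool schedules its purges is not stated here.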

Merging all of the XMLs related to the probability function for sports weight, etc. (using a custom XML Merge function just for this purpose) creates a final XML locally on our server that represents the final collection of information that will be used to display one collection of MCC apparel (in this case, MCC Men’s Apparel).
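A merge of this kind could look roughly like the sketch below, using Python's standard `xml.etree.ElementTree` rather than the tool's own PHP merge function. The `Item` and `ASIN` element names, the `MergedItems` root and the de-duplication rule are assumptions, and XML namespaces are ignored for simplicity.

```python
import xml.etree.ElementTree as ET


def merge_item_xml(documents, item_tag="Item", id_tag="ASIN"):
    """Collect every item element from several XML strings under one root.

    Items that repeat an identifier already seen are skipped.
    """
    merged = ET.Element("MergedItems")
    seen = set()
    for doc in documents:
        root = ET.fromstring(doc)
        for item in root.iter(item_tag):
            key = item.findtext(id_tag)
            if key is not None:
                if key in seen:
                    continue
                seen.add(key)
            merged.append(item)
    return ET.tostring(merged, encoding="unicode")


basketball = "<Items><Item><ASIN>A1</ASIN></Item><Item><ASIN>A2</ASIN></Item></Items>"
soccer = "<Items><Item><ASIN>A2</ASIN></Item><Item><ASIN>A3</ASIN></Item></Items>"
print(merge_item_xml([basketball, soccer]))
```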

A sampled section of such a merged XML can be seen below.

Amazon XML Data Return Sample


This type of process is used for a couple dozen different product categories that appear throughout the site. In Monroe Community College’s case, it is used for both men’s and women’s apparel and merchandise, in addition to several product categories derived from variables specific to the university.

Of the greatest importance is applying this logic to textbooks a student could purchase. Because we don’t know what textbooks a student might want, the student’s search query becomes the basis for the REST request. The only major difference is that the query is passed to a function expecting book titles or ISBNs specifically. One unique aspect, however, is the inclusion of the UniversityLite Aggresearch™. This is a custom solution I put in place to ensure that when UniversityLite Aggresearch™ is chosen, the lowest priced book offers are returned. The logic searches several possibilities from multiple vendors and inserts them into an array, which is then sorted by lowest price. As a result, UniversityLite Aggresearch™ often returns books at competitive prices compared with other book search algorithms found on the internet.
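The aggregate-then-sort idea can be sketched as follows, in Python rather than the tool's PHP. The `Offer` structure, the vendor names and the prices are all made up for illustration, and the real Aggresearch logic is presumably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    vendor: str
    isbn: str
    price: float


def cheapest_offers(vendor_results, limit=5):
    """Flatten per-vendor result lists and return the lowest-priced offers."""
    offers = [offer for results in vendor_results for offer in results]
    return sorted(offers, key=lambda offer: offer.price)[:limit]


# Hypothetical results from two vendors for the same ISBN.
vendor_a = [Offer("VendorA", "9780000000002", 74.50), Offer("VendorA", "9780000000002", 61.00)]
vendor_b = [Offer("VendorB", "9780000000002", 58.25)]

for offer in cheapest_offers([vendor_a, vendor_b], limit=2):
    print(offer.vendor, offer.price)
```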

UniversityLite Aggresearch (TM) Search Example


One other major area of the site that falls into the same category is the ability for students to post and sell their own books. Although the process is intended to look seamless to a user, this area has completely different functionality. The buying and selling mechanism is based on the WPAdverts – Classifieds Plugin. A major change I made to the plugin is the ability to automatically fill in the classified posting with the author, title, ISBN, pictures of the book and suggested prices, so the user does not have to. Simply entering either the ISBN or the title will auto-populate all relevant fields. This is a major improvement to the WPAdverts Plugin, and one that makes the plugin unique to UniversityLite. In addition, the e-mail domain variable is used to force users to log in with a valid university email address to sell a book. This creates a niche, protected market for students at a particular university, where they know books are being bought and sold by verified students.
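The e-mail domain check could be as simple as the sketch below (Python, not the tool's PHP). The function name and the decision to accept subdomains of the university's domain are assumptions, and `example.edu` is a placeholder.

```python
def is_university_email(email, university_domain):
    """Accept addresses at the university's domain or one of its subdomains."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local:
        return False  # no "@" at all, or nothing before it
    allowed = university_domain.lower()
    return domain == allowed or domain.endswith("." + allowed)


print(is_university_email("jane@student.example.edu", "example.edu"))
print(is_university_email("jane@notexample.edu", "example.edu"))
```

A check like this only validates the address format and domain; confirming that the user controls the mailbox would still need a verification e-mail.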

UniversityLite Automated Book Sale Feature


Aside from product generation, UniversityLite also determines useful characteristics about the university in order to create community pages that update automatically. One such page displays recent YouTube sports videos associated with the university, while another displays recent videos from university students based on a geolocation search sent to the API. In the former case, the sports team used for the YouTube API search is determined using the Wikipedia API. In the latter case, the geolocation is based on an NCES API call where we query the longitude and latitude. The resulting geolocation is passed to the code snippet below, which in turn builds a list of YouTube videos we can insert.
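As a rough illustration of the geolocation search (in Python, not the tool's PHP), the sketch below builds a YouTube Data API v3 `search` request with the `location` and `locationRadius` parameters and turns a response into links. The radius, result count, API key placeholder and helper names are assumptions. The coordinates appear to match the geosearch file name in the directory listing later in this post.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_geo_search_url(lat, lng, api_key, query="", radius="10km", max_results=10):
    """Build a search request for recent videos near a coordinate."""
    params = {
        "part": "snippet",
        "type": "video",  # the API requires type=video for location searches
        "location": f"{lat},{lng}",
        "locationRadius": radius,
        "order": "date",
        "maxResults": max_results,
        "q": query,
        "key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def video_links(response):
    """Turn a decoded search response into (title, watch URL) pairs."""
    links = []
    for item in response.get("items", []):
        video_id = item.get("id", {}).get("videoId")
        title = item.get("snippet", {}).get("title", "")
        if video_id:
            links.append((title, f"https://www.youtube.com/watch?v={video_id}"))
    return links


print(build_geo_search_url(43.101265, -77.608488, api_key="YOUR_API_KEY"))
```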

UniversityLite Youtube GeoLocation Code Snippet


Another major component of the generated site is the dozens of articles, reports and graphs that are automatically “written” to display information about the university. These drive traffic and interest to the site with unique content. A sample of such an article can be seen below.

UniversityLite News Article Generation Sample


The many graphs for the article are created using several different methods, most of which use the NCES and Wikipedia variable results now in the SQL Database. In the sample below, we iterate through available variables to build the foundation for a JPEG graph that will be displayed on the site.

A sentence within the article might follow logic like the code snippet below.
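To give a sense of what template-driven sentence generation can look like, here is a small sketch in Python (the tool itself is PHP). The wording, the four-point threshold and the function name are invented for illustration. Only the 52% male figure comes from the example earlier in this post.

```python
def enrollment_sentence(name, male_pct):
    """Write one sentence about gender balance from a single variable."""
    female_pct = 100 - male_pct
    if abs(male_pct - female_pct) <= 4:
        lead = "is roughly evenly split"
    elif male_pct > female_pct:
        lead = "is majority male"
    else:
        lead = "is majority female"
    return (
        f"{name}'s student body {lead}, with {male_pct:g}% male "
        f"and {female_pct:g}% female students."
    )


print(enrollment_sentence("Monroe Community College", 52))
```

Branching on the data this way lets the same template read naturally for very different universities.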

UniversityLite Sentence Generation Sample


Load Balancing

UniversityLite is load balanced across three Ubuntu 14.04 LTS virtual servers located in New York, NY, with one backup server in Webster, NY. The three primary servers are balanced alphabetically by university name. There are 7,000 websites available as subdomains of the universitylite.com domain (such as rit.universitylite.com). Each subdomain is given resources by the server only while it is actively being viewed. As seen in the image below, an arbitrary university named ‘David’s University College’ would be hosted on Server1. Because the functions and algorithms to create any of the A-Z universities reside on each of the servers, the balancing is done for resource balancing, not necessarily content balancing.
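The alphabetical assignment can be sketched as below, in Python rather than the tool's PHP. The exact letter ranges are an assumption; all this post establishes is that a name starting with “D” lands on Server1.

```python
# Assumed letter ranges; the actual split used by the tool is not stated.
SERVER_RANGES = [
    ("A", "I", "server1"),
    ("J", "R", "server2"),
    ("S", "Z", "server3"),
]


def server_for(university_name):
    """Pick a server from the first ASCII letter of the university name."""
    first = next((ch for ch in university_name.upper() if "A" <= ch <= "Z"), None)
    if first is None:
        raise ValueError(f"no letter found in {university_name!r}")
    for low, high, server in SERVER_RANGES:
        if low <= first <= high:
            return server
    raise ValueError(f"no server covers {first!r}")


print(server_for("David's University College"))
```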

UniversityLite Server Balance


Because of the large set of active subdomains, load balancing through DNS would be cumbersome and time consuming: partial wildcards (such as a*.universitylite.com) in a DNS zone are not defined behavior in the DNS RFCs, which only treat an asterisk as a wildcard when it is the entire leftmost label. Manually adding a record for rit.universitylite.com, for example, would also require a manual record for each of the 6,999 other subdomains. Therefore, a DNS zone can only practically be used with a full wildcard such as *.universitylite.com pointing to one of the universitylite.com servers. As a result, load balancing is handled first by the Apache virtualhost file, and then by PHP logic once the subdomain is called.

As seen in the function below, we can query the SQL table once a subdomain is passed to the UniversityLite webserver. If the subdomain is found as a SQL entry, we can assume the server that received the request is the correct load-balanced server. If it isn’t found, then either a) it is load balanced on one of the other servers, or b) it is simply not a valid subdomain anywhere on the site. For example, trying to go to gaboldygooky.universitylite.com will fail all load balancing checks and return an error to the browser.
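The lookup flow described above might be sketched like this, using Python and an in-memory SQLite database in place of the tool's PHP and its real SQL server. The `sites` table, its single column and the idea of checking peer servers through their own tables are assumptions made for the example.

```python
import sqlite3

BASE_DOMAIN = "universitylite.com"


def make_db(subdomains):
    """Create an in-memory stand-in for one server's table of hosted sites."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sites (subdomain TEXT PRIMARY KEY)")
    db.executemany("INSERT INTO sites VALUES (?)", [(s,) for s in subdomains])
    return db


def subdomain_of(host):
    """Return the part of the host before the base domain, or None."""
    host = host.lower().split(":")[0]  # drop any port
    suffix = "." + BASE_DOMAIN
    return host[: -len(suffix)] if host.endswith(suffix) else None


def is_hosted(db, subdomain):
    row = db.execute(
        "SELECT 1 FROM sites WHERE subdomain = ?", (subdomain,)
    ).fetchone()
    return row is not None


def route(host, local_db, peer_dbs):
    """Return 'local', the name of the peer hosting the site, or 'error'."""
    subdomain = subdomain_of(host)
    if subdomain is None:
        return "error"
    if is_hosted(local_db, subdomain):
        return "local"
    for peer_name, peer_db in peer_dbs.items():
        if is_hosted(peer_db, subdomain):
            return peer_name
    return "error"


local = make_db(["mcc", "rit"])
peers = {"server1": make_db(["brockport"])}
print(route("mcc.universitylite.com", local, peers))
print(route("gaboldygooky.universitylite.com", local, peers))
```

Because the decision is made from table contents at request time, adding or removing a site only requires changing a row, not a DNS record.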

UniversityLite PHP Load Balance Example


In this way, I have created a dynamic load balancing system that allows sites to be added and removed ad hoc without having to alter DNS. Of course, this is just general domain load balancing, and we must further balance the content itself. There are, after all, 7,000 different subdomains with uniquely generated HTML, photos, graphs, etc. The easiest way I found to handle this is to have UniversityLite be one central source of programming code, and to allow other content to be displayed dynamically when required. That is not to say that each site can be dynamically created each time it is viewed, but we can remove some redundancy from the equation.

Below are areas of daily generated content that can be shared between subdomains, on the assumption that the same item (a calculator with its description, for example) might be viewed on more than one subdomain. This data alone amounts to about 10GB of freshly generated content in a 24-hour period for the sites on server1. If we did not consolidate this data, it could easily have grown to 23TB in just 24 hours on one server.

Directory List of ~/www/universitylite.com/public_html/sites/shared$:

drwxr-xr-x 2 maiolo99 maiolo99 4096 Oct 20 16:49 adpics
drwxr-xr-x 7 maiolo99 maiolo99 4096 Oct 20 21:29 amazon_xml
drwxr-xr-x 2 maiolo99 maiolo99 4096 Sep 23 18:07 apparel_images
drwxrwx--- 2 maiolo99 maiolo99 4096 Sep 18 16:54 avatars
drwxrwx--- 2 maiolo99 maiolo99 4096 Jul 26 23:29 holidays
drwxr-xr-x 356 maiolo99 maiolo99 20480 Oct 8 00:30 ms_images
drwxr-xr-x 2 maiolo99 maiolo99 7966720 Oct 21 16:51 product_images

However, sometimes there is only so much we can do. As seen in the directory listing below, some files are created uniquely for a single subdomain (here, mcc.universitylite.com). This information is entirely unique to Monroe Community College, and there is nothing we can really do to share it.

Directory Listing of ~/www/universitylite.com/public_html/sites/MCCTextbooks:

./daily_message:
MonroeCommunityCollege_20161019_daily.txt

./graphs-2016-09-19:
graph_MCC_1473753800.jpg
graph_MCC_1473813698.jpg
graph_MCC_1474457188.jpg
graph_MCC_1474581272.jpg
graph_MCC_All Instructional Staff Total_1473662381.jpg
graph_MCC_All Instructional Staff Total_1473662723.jpg
graph_MCC_All Instructional Staff Total_1473722731.jpg
graph_MCC_All Instructional Staff Total_1473773589.jpg
graph_MCC_All Instructional Staff Total_1473791287.jpg
graph_MCC_All Instructional Staff Total_1473813691.jpg
graph_MCC_All Instructional Staff Total_1474227934.jpg
graph_MCC_All Instructional Staff Total_1474457249.jpg
graph_MCC_All Instructional Staff Total.jpg
graph_MCC_Assistant Professor_1473662380.jpg
graph_MCC_Assistant Professor_1473662722.jpg
graph_MCC_Assistant Professor_1473722730.jpg
graph_MCC_Assistant Professor_1473773588.jpg
graph_MCC_Assistant Professor_1473791286.jpg
graph_MCC_Assistant Professor_1473813690.jpg
graph_MCC_Assistant Professor_1474227933.jpg
graph_MCC_Assistant Professor_1474457248.jpg
graph_MCC_Assistant Professor.jpg
graph_MCC_Associate Professor_1473662379.jpg
graph_MCC_Associate Professor_1473662722.jpg
graph_MCC_Associate Professor_1473722730.jpg
graph_MCC_Associate Professor_1473773588.jpg
graph_MCC_Associate Professor_1473791286.jpg
graph_MCC_Associate Professor_1473813690.jpg
graph_MCC_Associate Professor_1474227932.jpg
graph_MCC_Associate Professor_1474457248.jpg
graph_MCC_Associate Professor.jpg
graph_MCC_degrees_by_race.jpg
graph_MCC_demographics_1474316106.jpg
graph_MCC_demographics_1474316169.jpg
graph_MCC_demographics_1474316231.jpg
graph_MCC_demographics_1474316252.jpg
graph_MCC_demographics_1474316321.jpg
graph_MCC_demographics_1474316741.jpg
graph_MCC_demographics_1474316754.jpg
graph_MCC_demographics_1474316757.jpg
graph_MCC_demographics_1474316908.jpg
graph_MCC_demographics_1474317796.jpg
graph_MCC_demographics_1474318228.jpg
graph_MCC_demographics_1474318400.jpg
graph_MCC_demographics_1474318536.jpg
graph_MCC_demographics_1474318968.jpg
graph_MCC_demographics_1474318969.jpg
graph_MCC_demographics_1474321987.jpg
graph_MCC_demographics_1474323294.jpg
graph_MCC_demographics_1474329291.jpg
graph_MCC_demographics_1474329312.jpg
graph_MCC_demographics_1474332642.jpg
graph_MCC_female_students_by_age.jpg
graph_MCC_graduates_to_enrolled_by_race_ratio.jpg
graph_MCC_Instructor_1473662380.jpg
graph_MCC_Instructor_1473662722.jpg
graph_MCC_Instructor_1473722731.jpg
graph_MCC_Instructor_1473773588.jpg
graph_MCC_Instructor_1473791286.jpg
graph_MCC_Instructor_1473813690.jpg
graph_MCC_Instructor_1474227933.jpg
graph_MCC_Instructor_1474457249.jpg
graph_MCC_instructor_annual_income.jpg
graph_MCC_Instructor.jpg
graph_MCC_Lecturer_1473662380.jpg
graph_MCC_Lecturer_1473662722.jpg
graph_MCC_Lecturer_1473722731.jpg
graph_MCC_Lecturer_1473773589.jpg
graph_MCC_Lecturer_1473791287.jpg
graph_MCC_Lecturer_1473813691.jpg
graph_MCC_Lecturer_1474227933.jpg
graph_MCC_Lecturer_1474457249.jpg
graph_MCC_Lecturer.jpg
graph_MCC_male_students_by_age.jpg
graph_MCC_Professor_1473662379.jpg
graph_MCC_Professor_1473662721.jpg
graph_MCC_Professor_1473722730.jpg
graph_MCC_Professor_1473773588.jpg
graph_MCC_Professor_1473791286.jpg
graph_MCC_Professor_1473813690.jpg
graph_MCC_Professor_1474227932.jpg
graph_MCC_Professor_1474457247.jpg
graph_MCC_Professor.jpg
graph_MCC_students_by_age.jpg
staff_sallary_report_graph_MCC_0.jpg

./weather:
Rochester_NY_Weather_2016-09-19.txt

./wiki_infobox-2016-10-19:
Bevier20Memorial20Building(BevierMemorialBuilding)_intro_request_phase2.xml
Bevier20Memorial20Building_intro_request_phase1.xml
Bridge20Square20Historic20District(BridgeSquareHistoricDistrict)_intro_request_phase2.xml
Bridge20Square20Historic20District_intro_request_phase1.xml
Campbell-Whittlesey20House(Campbell-WhittleseyHouse)_intro_request_phase2.xml
Campbell-Whittlesey20House_intro_request_phase1.xml
First20Presbyterian20Church20(Rochester20New20York)(FirstPresbyterianChurch(RochesterNewYork))_intro_request_phase2.xml
First20Presbyterian20Church20Rochester20New20York_intro_request_phase1.xml
Hervey20Ely20House(HerveyElyHouse)_intro_request_phase2.xml
Hervey20Ely20House_intro_request_phase1.xml
Immaculate20Conception20Church20(Rochester20New20York)(ImmaculateConceptionChurch(RochesterNewYork))_intro_request_phase2.xml
Immaculate20Conception20Church20Rochester20New20York_intro_request_phase1.xml
Jonathan20Child20House2020BrewsterBurke20House20Historic20District_intro_request_phase1.xml
Jonathan20Child20House2020BrewsterBurke20House20Historic20District(JonathanChildHouseampBrewsterBurkeHouseHistoricDistrict)_intro_request_phase2.xml
Main2020Oak20RIRTR20station_intro_request_phase1.xml
Main2020Oak20(RIRTR20station)(Main)_intro_request_phase2.xml
Monroe20Community20College20_intro_request_phase1.xml
Monroe20Community20College20(MonroeCommunityCollege)_intro_request_phase2.xml
Monroe20Community20College20Sports_intro_request_phase1.xml
Monroe20Community20College20Sports(UniversityofLouisianaatMonroe)_intro_request_phase2.xml
Monroe20Community20College20Team_intro_request_phase1.xml
Monroe20Community20College20Team(UniversityofLouisianaatMonroe)_intro_request_phase2.xml
Monroe20Community20College20Tribunes20_intro_request_phase1.xml
Monroe20Community20College20Tribunes20(ListofcollegeathleticprogramsinNewYork)_intro_request_phase2.xml
Monroe20Community20College20Tribunes_intro_request_phase1.xml
Monroe20Community20College20Tribunes(ListofcollegeathleticprogramsinNewYork)_intro_request_phase2.xml
Monroe20Community20College_intro_request_phase1.xml
Monroe20Community20College(MonroeCommunityCollege)_intro_request_phase2.xml
MonroeCommunityCollege_infobox.xml
Nick20Tahou20Hots_intro_request_phase1.xml
Nick20Tahou20Hots(NickTahouHots)_intro_request_phase2.xml
RochesterNewYork_infobox.xml
RochesterNY_geosearch.json
Third20Ward20Historic20District20Rochester20New20York_intro_request_phase1.xml
Third20Ward20Historic20District20(Rochester20New20York)(ThirdWardHistoricDistrict(RochesterNewYork))_intro_request_phase2.xml

./youtube-2016-10-19:
BevierMemorialBuildingRochesterNY_youtube_2016-10-19.txt
BridgeSquareHistoricDistrictRochesterNY_youtube_2016-10-19.txt
Campbell-WhittleseyHouseRochesterNY_youtube_2016-10-19.txt
FirstPresbyterianChurchRochesterNewYork_youtube_2016-10-19.txt
geosearch-431012652C-77608488--searchquery-_youtube.txt
geosearch-Rochester-NY--searchquery-RochesterNYMusic_youtube.txt
HerveyElyHouseRochesterNY_youtube_2016-10-19.txt
ImmaculateConceptionChurchRochesterNewYork_youtube_2016-10-19.txt
JonathanChildHouseBrewsterBurkeHouseHistoricDistrictRochesterNY_youtube_2016-10-19.txt
MainOakRIRTRstationRochesterNY_youtube_2016-10-19.txt
MonroeCommunityCollegeSports_youtube_2016-10-19.txt
MonroeCommunityCollege_youtube_2016-10-19.txt
NickTahouHotsRochesterNY_youtube_2016-10-19.txt
ThirdWardHistoricDistrictRochesterNewYork_youtube_2016-10-19.txt

The best we can do with this data is purge it (beyond what we might want to keep for historical purposes). We do this by adding historically usable data into a SQL table for that university, and simply deleting the rest.

The load balancing techniques used on UniversityLite have allowed us to maintain 7,000 unique websites on only three servers.

Summary

In all, UniversityLite comprises over 800 PHP functions and algorithms, written in conjunction with JavaScript, SQL, BASH and HTML, that automate the creation of a website from a search string such as “Monroe Community College”. The result is a collection of over 7,000 automatically created and maintained websites that generate traffic and income in a mostly automated fashion.

Now that I have the groundwork for this structure in place, I am already applying it to other markets such as automobile parts and niche Hawaiian products.

For more information, or a closer demo of my work or code, please feel free to contact me.
