Using R Shiny, a web application framework, the user can explore the movies dataset based on language, genre, minimum number of reviews on Rotten Tomatoes, year released, and Oscar wins.

The movies dataset includes the following information:

[Figure: movie database structure]

X-axis variable: Year; Y-axis variable: Dollars at box office


In the top right-hand corner of the plot we find The Dark Knight (2008), which had the most impressive record: $533,300,000 at the box office and a high numeric rating of 8.5. However, plotting rating against box office shows that the two variables do not always move in the same direction. The King’s Speech also had a numeric rating of 8.5 but took only $138,800,000 at the box office. Life of Pi, a 2012 blockbuster with a numeric rating of 8, took only $125,000,000, compared with The Dark Knight's $533,300,000.
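To make the comparison concrete, the three films mentioned above can be put side by side in a small data frame. This is only a sketch using the figures quoted in this post; the column names are my own choice for illustration and are not the ones used in the app's database.

```r
# Figures quoted in the paragraph above; column names are illustrative.
films <- data.frame(
  title      = c("The Dark Knight", "The King's Speech", "Life of Pi"),
  rating     = c(8.5, 8.5, 8.0),
  box_office = c(533300000, 138800000, 125000000),
  stringsAsFactors = FALSE
)

# Box office of each film as a share of the largest total in the table
films$share_of_top <- round(films$box_office / max(films$box_office), 2)

# Two films share the same rating but differ widely in box office
same_rating <- films[films$rating == 8.5, ]
ratio <- same_rating$box_office[1] / same_rating$box_office[2]

print(films)
print(ratio)
```

The two films rated 8.5 differ in box office by a factor of almost four, which is the point the scatter plot makes visually.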


If we use the filter to select the 12 movies with box office over 400 million dollars, we find that most of them are action and adventure films. It is also fascinating that sequels matched or beat the records of their predecessors. For example: Iron Man 3, Shrek 2, The Dark Knight 1, The Dark Knight 2, Toy Story 3.



If we set the X variable to year and the Y variable to dollars at the box office, we can see that the film market has grown rapidly since 2000. The Lion King (1994) was the first animated film to come close to $94,200,000 at the box office, and it opened up the Animation & Adventure film market.

After battling a fire-breathing dragon and the evil Lord Farquaad to win the hand of Princess Fiona, Shrek now faces his greatest challenge: the in-laws. Shrek and Princess Fiona return from their honeymoon to find an invitation to visit Fiona’s parents. Shrek 2 ranked first among 2004 movies at the worldwide box office, with $436,500,000 at the box office, nearly five times that of The Lion King, and $937,008,132 in revenue. Frozen and Toy Story 3 (2010, $415,000,000) were also well received by the film market as Animation & Adventure titles. (For comparison, Black Swan, a 2010 Drama & Mystery film, had $107,000,000 at the box office.)
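As a quick check of the comparison between Shrek 2 and The Lion King, the ratio can be computed directly from the two box office figures quoted in this post. The variable names are just for illustration.

```r
# Box office figures as quoted in this post
shrek2_box_office    <- 436500000
lion_king_box_office <- 94200000

# How many times larger was Shrek 2's box office?
ratio <- shrek2_box_office / lion_king_box_office
print(round(ratio, 1))  # 4.6
```

So the gap is a little over four and a half times, which rounds to roughly five.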

It seems that fairy tales are not only for children: adults also enjoy funny, laid-back animation. Animated movies are no longer targeted only at children, because many grown-ups are fond of them as well. Perhaps watching an animated movie brings back nostalgia and memories of their own childhood.


Conclusion: Data visualization is not an end goal; it is a process. Visualization helps users absorb the data and see new paths, and it lets them identify patterns and trends such as the relationship between numeric rating and dollars at the box office. Even large amounts of complicated data start to make sense when presented graphically, and they reveal insights into the film market. Identifying those relationships helps organizations focus on the areas most likely to influence their most important goals.


How to visualize your data: R programming

In this project, I used an R Shiny application to clean the data and explore the movies dataset. Shiny applications have two components: a user-interface definition and a server script. The source code for both of these components is listed below.

The app is based on the open-source example in rstudio/shiny-examples:

https://github.com/rstudio/shiny-examples/tree/master/051-movie-explorer

The user interface is defined in a source file named ui.R. I used some HTML/CSS to make the interface more attractive, for example setting font-family: 'Lobster' and changing the color to light green (color: #48ca3b).

ui.R
library(ggvis)

# For dropdown menu
actionLink <- function(inputId, ...) {
  tags$a(href='javascript:void',
         id=inputId,
         class='action-button',
         ...)
}

fluidPage(
  
  
  tags$head(
    tags$style(HTML("
                    @import url('//fonts.googleapis.com/css?family=Lobster|Cabin:400,700');
                    
                    h1 {
                    font-family: 'Lobster';
                    font-weight: 500;
                    line-height: 1.1;
                    color: #48ca3b;
                    }
                    h2{
                    font-family: 'Lobster';
                    font-weight: 100;
                    line-height: 1;
                    color: #48ca3b;
                    
                    }
                    
                    "))
    ),

Secondly, I created two select inputs (language and genre) and four slider inputs (reviews, year, Oscars, and box office). I also used the ggvisOutput function to show the plot.

ui.R

  
  
  titlePanel(h1("Annette's Movie explorer")),
  fluidRow(
    column(3,
           wellPanel(
             
             
             selectInput("language", "Language",
                         c("All", "English", "Vietnamese", "Turkish", "Italian", "Kurdish",
                           "German", "Japanese", "Thai", "Mandarin", "Indian Sign Language",
                           "Spanish", "Latin", "French", "Swedish", "Korean")),
             
             selectInput("genre", "Genre (a movie can have multiple genres)",
                         c("All", "Action", "Adventure", "Animation", "Biography", "Comedy",
                           "Crime", "Documentary", "Drama", "Family", "Fantasy", "History",
                           "Horror", "Music", "Musical", "Mystery", "Romance", "Sci-Fi",
                           "Short", "Sport", "Thriller", "War", "Western")),
             sliderInput("reviews", "Minimum number of reviews on Rotten Tomatoes",
                         10, 300, 80, step = 10),
             sliderInput("year", "Year released", 1940, 2014, value = c(1970, 2014)),
             sliderInput("oscars", "Minimum number of Oscar wins (all categories)",
                         0, 4, 0, step = 1),
             sliderInput("boxoffice", "Dollars at Box Office (millions)",
                         0, 800, c(0, 800), step = 1)
                         
             
             
           ),
           wellPanel(
             selectInput("xvar", "X-axis variable", axis_vars, selected = "Meter"),
             selectInput("yvar", "Y-axis variable", axis_vars, selected = "Reviews"),
             tags$small(paste0(
               "Note: The Tomato Meter is the proportion of positive reviews",
               " (as judged by the Rotten Tomatoes staff), and the Numeric rating is",
               " a normalized 1-10 score of those reviews which have star ratings."
             ))
           )
    ),
    column(9,
           ggvisOutput("plot1"),
           wellPanel(
             span("Number of movies selected:",
                  textOutput("n_movies")
             )
           )
    )
  )
    )

The server side of the application is shown below. It filters the movies according to the inputs and plots the result as an interactive ggvis scatter plot.

First, join the tables and do some data cleaning: filter out movies with fewer than 10 reviews and select the columns we need.

server.R
library(shiny)
library(ggvis)
library(dplyr)
if (FALSE) library(RSQLite)

# Set up handles to database tables on app start
db <- src_sqlite("~/Desktop/R-Shiny-project/movie/movies.db")
omdb <- tbl(db, "omdb")
tomatoes <- tbl(db, "tomatoes")

# Join tables, filtering out those with <10 reviews, and select specified columns
all_movies <- inner_join(omdb, tomatoes, by = "ID") %>%
  filter(Reviews >= 10) %>%
  select(ID, imdbID, Title, Year, Rating_m = Rating.x, Runtime, Genre, Released,
         Writer, imdbRating, imdbVotes, Language, Country, Oscars,
         Rating = Rating.y, Meter, Reviews, Fresh, Rotten, userMeter, userRating, userReviews,
         BoxOffice, Production)
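
The join-then-filter step can be tried without the SQLite database. The sketch below uses base R's merge() on two tiny, made-up tables (the rows and values are hypothetical, not taken from movies.db, and the meaning I give to each Rating column is an assumption). Because both tables have a Rating column, the joined result carries Rating.x and Rating.y, which is why the select() call above renames them.

```r
# Hypothetical stand-ins for the omdb and tomatoes tables
omdb_demo <- data.frame(
  ID     = c(1, 2, 3),
  Title  = c("Movie A", "Movie B", "Movie C"),
  Rating = c("PG", "R", "PG-13"),   # assumed: a content rating
  stringsAsFactors = FALSE
)
tomatoes_demo <- data.frame(
  ID      = c(1, 2, 4),
  Rating  = c(7.1, 6.3, 8.0),       # assumed: a numeric critic rating
  Reviews = c(120, 8, 45)
)

# Inner join on ID: only IDs present in both tables survive (1 and 2)
joined <- merge(omdb_demo, tomatoes_demo, by = "ID")

# Keep movies with at least 10 reviews, then rename the clashing columns
kept <- joined[joined$Reviews >= 10, ]
all_movies_demo <- data.frame(
  ID       = kept$ID,
  Title    = kept$Title,
  Rating_m = kept$Rating.x,
  Rating   = kept$Rating.y,
  Reviews  = kept$Reviews,
  stringsAsFactors = FALSE
)
print(all_movies_demo)
```

Only "Movie A" remains: "Movie C" has no match in the second table, and "Movie B" has fewer than 10 reviews.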

# Filter the movies, returning a data frame

server.R
function(input, output, session) {
  
  # Filter the movies, returning a data frame
  movies <- reactive({
    # Due to dplyr issue #318, we need temp variables for input values
    reviews <- input$reviews
    oscars <- input$oscars
    minyear <- input$year[1]
    maxyear <- input$year[2]
    minboxoffice <- input$boxoffice[1] * 1e6
    maxboxoffice <- input$boxoffice[2] * 1e6
    
    # Apply filters
    m <- all_movies %>%
      filter(
        Reviews >= reviews,
        Oscars >= oscars,
        Year >= minyear,
        Year <= maxyear,
        BoxOffice >= minboxoffice,
        BoxOffice <= maxboxoffice
      ) %>%
      arrange(Oscars)
    
    # filter by genre
    if (input$genre != "All") {
      genre <- paste0("%", input$genre, "%")
      m <- m %>% filter(Genre %like% genre)
    }
    
    # filter by language
    if (input$language != "All") {
      language <- paste0("%", input$language, "%")
      m <- m %>% filter(Language %like% language)
    }
    
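    # filter by director
    # Note: this branch never runs in this version of the app, because the UI
    # defines no "director" input (input$director is NULL) and the Director
    # column is not among the columns selected into all_movies.
    if (!is.null(input$director) && input$director != "") {
      director <- paste0("%", input$director, "%")
      m <- m %>% filter(Director %like% director)
    }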
    
    m <- as.data.frame(m)
    
    # Add column which says whether the movie won any Oscars
    # Be a little careful in case we have a zero-row data frame
    m$has_oscar <- character(nrow(m))
    m$has_oscar[m$Oscars == 0] <- "No"
    m$has_oscar[m$Oscars >= 1] <- "Yes"
    m
  })
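
The genre and language filters above wrap the selected value in % wildcards so that the database matches it anywhere in the column, which matters because a movie can have multiple genres. For a plain in-memory data frame, a roughly equivalent check is grepl() with fixed = TRUE. The sketch below uses made-up rows, assumes Genre is stored as a comma-separated string, and the helper name filter_contains is hypothetical.

```r
# Hypothetical rows; Genre is assumed to hold a comma-separated list
movies_demo <- data.frame(
  Title     = c("Movie A", "Movie B", "Movie C"),
  Genre     = c("Action, Adventure", "Drama", "Animation, Adventure, Comedy"),
  BoxOffice = c(450e6, 90e6, 410e6),
  stringsAsFactors = FALSE
)

# Keep rows whose column contains `value` anywhere; "All" disables the filter.
# (SQLite's LIKE ignores ASCII case; this fixed match is case-sensitive.)
filter_contains <- function(df, column, value) {
  if (value == "All") return(df)
  df[grepl(value, df[[column]], fixed = TRUE), , drop = FALSE]
}

adventure <- filter_contains(movies_demo, "Genre", "Adventure")

# The slider reports millions, so scale it to dollars before comparing
min_box_office <- 400 * 1e6
big_adventure <- adventure[adventure$BoxOffice >= min_box_office, ]
print(big_adventure$Title)
```

Choosing "All" returns the data frame untouched, which mirrors the if (input$genre != "All") guard in the server code.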

# Function for generating tooltip text

movie_tooltip <- function(x) {
    if (is.null(x)) return(NULL)
    if (is.null(x$ID)) return(NULL)
    
    # Pick out the movie with this ID
    all_movies <- isolate(movies())
    movie <- all_movies[all_movies$ID == x$ID, ]
    
    # Build the tooltip HTML: title, year, formatted box office, then genre
    paste0("<b>", h1(movie$Title), "</b><br>",
           h2(movie$Year),
           h2("Boxoffice$"),
           "<b>", format(movie$BoxOffice, big.mark = ",", scientific = FALSE), "</b><br>",
           movie$Genre
    )
  }
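
The tooltip relies on format() to print large dollar amounts readably. Here is a minimal illustration using The Dark Knight's figure from earlier in this post; the variable names are only for the example.

```r
box_office <- 533300000

# big.mark inserts thousands separators; scientific = FALSE keeps round
# values such as 500000000 from being printed as 5e+08
formatted <- format(box_office, big.mark = ",", scientific = FALSE)
tooltip_line <- paste0("Box office: $", formatted)
print(tooltip_line)  # "Box office: $533,300,000"
```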

# A reactive expression with the ggvis plot

server.R
# A reactive expression with the ggvis plot
  vis <- reactive({
    # Labels for axes
    xvar_name <- names(axis_vars)[axis_vars == input$xvar]
    yvar_name <- names(axis_vars)[axis_vars == input$yvar]
    
    # Normally we could do something like props(x = ~BoxOffice, y = ~Reviews),
    # but since the inputs are strings, we need to do a little more work.
    xvar <- prop("x", as.symbol(input$xvar))
    yvar <- prop("y", as.symbol(input$yvar))
    
    movies %>%
      ggvis(x = xvar, y = yvar) %>%
      layer_points(size := 50, size.hover := 200,
                   fillOpacity := 0.2, fillOpacity.hover := 0.5,
                   stroke = ~has_oscar, key := ~ID) %>%
      add_tooltip(movie_tooltip, "hover") %>%
      add_axis("x", title = xvar_name) %>%
      add_axis("y", title = yvar_name) %>%
      
      set_options(width = 500, height = 500)
  })
  
  vis %>% bind_shiny("plot1")
  
  output$n_movies <- renderText({ nrow(movies()) })
}
  

# Variables that can be put on the x and y axes

global.R
# Variables that can be put on the x and y axes
axis_vars <- c(
  "Tomato Meter" = "Meter",
  "Numeric Rating" = "Rating",
  "Number of reviews" = "Reviews",
  "Dollars at box office" = "BoxOffice",
  "Year" = "Year",
  "Length (minutes)" = "Runtime"
)
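
The server looks up the display label for the selected axis by matching the value in this named vector, and turns the selected string into a symbol for ggvis. The lookup itself is plain base R and can be tried on its own. In the sketch below, selected stands in for input$xvar, and the shortened vector is only for illustration.

```r
# A shortened copy of axis_vars: names are labels, values are column names
axis_vars_demo <- c(
  "Tomato Meter" = "Meter",
  "Dollars at box office" = "BoxOffice",
  "Year" = "Year"
)

selected <- "BoxOffice"  # stands in for input$xvar

# Find the label whose value matches the selected column name
label <- names(axis_vars_demo)[axis_vars_demo == selected]

# as.symbol() turns the string into a name that can refer to a column
column <- as.symbol(selected)

print(label)  # "Dollars at box office"
```

Keeping labels and column names in one named vector means the dropdown choices and the axis titles cannot drift apart.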

Finally, use RStudio to run your app, and you can explore the beauty of data visualization.
